Convert Index of a Pandas DataFrame into a Column in Python
Introduction
There are times when using Pandas that you may find yourself needing to convert the row index to a column of its own. This may be a useful operation for a couple of reasons, which we'll see later in this Byte.
DataFrames and Indexing in Pandas
Pandas is a very popular data manipulation library in Python. It has two key data structures - Series and DataFrame. A DataFrame is basically just a table of data, similar to an Excel spreadsheet. Each DataFrame has an index, which you can think of as a special column that identifies each row. By default, the index is a range of integers from 0 to n-1, where n is the number of rows in the DataFrame.
Here's a basic DataFrame:
import pandas as pd
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Paris', 'Berlin']
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age      City
0   John   28  New York
1   Anna   24    London
2  Peter   35     Paris
3  Linda   32    Berlin
The leftmost column (0,1,2,3) is the index of this DataFrame.
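As a quick check, the index can also be inspected directly through the `df.index` attribute. The sketch below rebuilds the same DataFrame so that it runs on its own; for a default index, Pandas reports a `RangeIndex`.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Paris', 'Berlin']
})

# The default index is a lazy range of integers from 0 to n-1
print(df.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(df.index))  # [0, 1, 2, 3]
```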
Why Convert the Index into a Column?
So why would we want to convert the index into a column? Well, sometimes the index of a DataFrame can contain valuable information that we want to utilize as part of our data analysis. If our DataFrame is time series data, it's possible that the index could be the timestamp (or relative time since the start of the series). By converting the index into a column, we can then perform operations on it just like any other column.
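To make the time series case concrete, here is a minimal sketch. The data, the `temperature` column, and the `timestamp` index name are all made up for illustration. Once the timestamps are a regular column, they can be used to derive other columns.

```python
import pandas as pd

# Hypothetical time series: three daily readings indexed by timestamp
ts = pd.DataFrame(
    {'temperature': [21.5, 22.1, 19.8]},
    index=pd.date_range('2024-01-01', periods=3, freq='D'),
)
ts.index.name = 'timestamp'

# Move the timestamp index into a regular column
ts_flat = ts.reset_index()

# The timestamp can now be used like any other column,
# for example to derive the day of the week
ts_flat['weekday'] = ts_flat['timestamp'].dt.day_name()
print(ts_flat)
```

Because the index was given a name before the reset, the new column is called `timestamp` rather than the generic `index`.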
Convert Index into a Column
Now let's see how we can actually convert the index of a DataFrame into a column. We'll use the reset_index() method provided by Pandas, which generates a new DataFrame or Series with the index reset.
Here are the steps:
- Create a DataFrame (or use an existing one).
- Call the reset_index() method on the DataFrame.
- By default (drop=False), the old index is kept as a new column. Pass drop=True only if you want to discard it instead.
Here's an example:
import pandas as pd
# Step 1: Create a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Paris', 'Berlin']
}
df = pd.DataFrame(data)
# Step 2: Reset the index
df_reset = df.reset_index()
print(df_reset)
Output:
   index   Name  Age      City
0      0   John   28  New York
1      1   Anna   24    London
2      2  Peter   35     Paris
3      3  Linda   32    Berlin
As you can see, the old index has been converted into a column named "index". The DataFrame now has a new default integer index.
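The effect of the drop parameter, and one way to control the name of the resulting column, can be sketched as follows. The `row_id` name is only an illustrative choice, and a smaller DataFrame is used to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Default (drop=False): the old index becomes a column named "index"
kept = df.reset_index()

# drop=True: the old index is discarded instead of being kept
dropped = df.reset_index(drop=True)

# Naming the index first controls the name of the resulting column
# ("row_id" is a made-up name for this example)
named = df.rename_axis('row_id').reset_index()

print(list(kept.columns))     # ['index', 'Name', 'Age']
print(list(dropped.columns))  # ['Name', 'Age']
print(list(named.columns))    # ['row_id', 'Name', 'Age']
```

Note that none of these calls modify `df` itself; each returns a new DataFrame.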
Other Ways to Convert Index into a Column
While the reset_index() method is a perfectly good way to convert the index into a column, there are also some other ways to do the same thing.
Another way to do this is to manually create and assign a new column. We can create a new column by using the syntax:
df['new_column'] = data
Assuming data is a series of data, we'll now have a new column containing that data. We can leverage this, along with df.index, to create a new column of index values:
import pandas as pd
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': ['one', 'two', 'three']
})
df['idx'] = df.index
print(df)
This will also result in a new column with the index values:
     A      B  idx
0  foo    one    0
1  bar    two    1
2  baz  three    2
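Unlike reset_index(), plain assignment leaves the original index in place and appends the new column at the end. If the column should come first, one option is the DataFrame's insert() method. The sketch below assumes a made-up string index ('x', 'y', 'z') so that the kept index is easy to tell apart from the new column.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['foo', 'bar', 'baz'], 'B': ['one', 'two', 'three']},
    index=['x', 'y', 'z'],  # hypothetical labels for illustration
)

# Insert the index values as the first column (position 0), in place
df.insert(0, 'idx', df.index)

print(list(df.columns))  # ['idx', 'A', 'B']
print(list(df.index))    # ['x', 'y', 'z'] (the original index is unchanged)
```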
Conclusion
In this Byte, we've seen how to convert the index of a DataFrame into a column using Python's Pandas library. We've seen how to use the reset_index() method, and also an alternative approach that assigns df.index to a new column. We've also discussed some of the situations where converting the index into a column can be useful.