Differences Between iloc and loc in Pandas
Introduction
When working with data in Python, Pandas is a library that often comes to the rescue, especially when dealing with large datasets. One of the most common tasks you'll be performing with Pandas is data indexing and selection. This Byte will introduce you to two powerful tools provided by Pandas for this purpose: iloc and loc. Let's get started!
Indexing in Pandas
Pandas provides several methods to index data. Indexing is the process of selecting particular rows and columns of data from a DataFrame. This can be done in Pandas either by integer position or by label. This Byte covers both approaches through the iloc and loc indexers.
What is iloc?
iloc is a Pandas indexer used for position-based selection. This means it selects rows and columns by their integer positions. For instance, in a DataFrame with n rows, the position of the first row is 0, and the position of the last row is n-1.
Note: iloc stands for "integer location", so it works with integer positions: a single integer, a list of integers, or an integer slice. It also accepts a boolean array, but it never accepts labels.
Example: Using iloc
Let's create a simple DataFrame and use iloc to select data.
import pandas as pd
# Creating a simple DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Profession': ['Engineer', 'Doctor', 'Lawyer', 'Writer']}
df = pd.DataFrame(data)
print(df)
This will output:
    Name  Age Profession
0   John   28   Engineer
1   Anna   24     Doctor
2  Peter   35     Lawyer
3  Linda   32     Writer
Let's use iloc to select the first row of this DataFrame:
first_row = df.iloc[0]
print(first_row)
This will output:
Name              John
Age                 28
Profession    Engineer
Name: 0, dtype: object
Here, df.iloc[0] returned the first row of the DataFrame. Similarly, you can use iloc to select any row or column by its integer position.
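As a minimal sketch of selecting by position beyond a single row, the snippet below rebuilds the same DataFrame and picks a cell, a column, several rows, and the last row. The variable names (cell, ages, picked, last_row) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Profession': ['Engineer', 'Doctor', 'Lawyer', 'Writer'],
})

# Single cell: row position 1, column position 2
cell = df.iloc[1, 2]

# Whole column by position (position 1 is 'Age')
ages = df.iloc[:, 1]

# Several rows at once, given as a list of positions
picked = df.iloc[[0, 2]]

# Negative positions count from the end, as in plain Python
last_row = df.iloc[-1]

print(cell)
print(picked)
```

The first value inside the brackets always refers to rows and the second to columns, so df.iloc[:, 1] reads as "all rows, column at position 1".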
What is loc?
loc is another powerful data selection method provided by Pandas. It does label-based indexing, which means you select data by its actual label, not its position. It's one of the two primary ways of indexing in Pandas, along with iloc.
Unlike iloc, which uses integer-position indexing, loc uses label-based indexing. The label can be a string or an integer, but it is always matched against the index itself, not against the row's position.
Note: Label-based indexing means that if your DataFrame's index is a list of strings, for example, you'd use those strings to select data, not their position in the DataFrame.
Example: Using loc
Let's look at a simple example of how to use loc to select data. First, we'll create a DataFrame:
import pandas as pd
data = {
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15]
}
df = pd.DataFrame(data)
df.set_index('fruit', inplace=True)
print(df)
Output:
         color  weight
fruit
apple      red     120
banana  yellow     150
cherry     red      10
date     brown      15
Now, let's use loc to select data:
print(df.loc['banana'])
Output:
color     yellow
weight       150
Name: banana, dtype: object
As you can see, we used loc to select the row for "banana" based on its label.
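loc can also take a column label, a list of labels, or a boolean mask. The following is a small sketch using the same fruit data; the variable names (banana_weight, two_rows, red_fruit) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
}).set_index('fruit')

# Single cell: row label 'banana', column label 'weight'
banana_weight = df.loc['banana', 'weight']

# Lists of labels select several rows and columns at once
two_rows = df.loc[['apple', 'date'], ['color']]

# A boolean mask keeps only the rows where the condition holds
red_fruit = df.loc[df['color'] == 'red']

print(banana_weight)
print(red_fruit)
```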
Differences Between iloc and loc
The primary difference between iloc and loc comes down to label-based vs integer-based indexing. iloc uses integer-based indexing, meaning you select data based on its numerical position in the DataFrame. loc, on the other hand, uses label-based indexing, meaning you select data based on its label.
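The distinction is easiest to see when the index holds integers that do not match the row order. Here is a minimal sketch with a hypothetical DataFrame whose index is deliberately out of order:

```python
import pandas as pd

# Integer labels that do not line up with row positions
df = pd.DataFrame({'value': ['a', 'b', 'c']}, index=[2, 0, 1])

by_position = df.iloc[0]   # first row in the frame (label 2)
by_label = df.loc[0]       # row whose label is 0 (second row)

print(by_position['value'])
print(by_label['value'])
```

With the default 0 to n-1 index, the two calls happen to return the same row, which is why the difference is easy to miss until the DataFrame has been filtered, sorted, or re-indexed.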
Another key difference is how they handle slices. With iloc, the end point of a slice is not included, just like with regular Python slicing. But with loc, the end point is included.
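A short sketch of the slicing behavior, using a cut-down version of the fruit data (only the weight column, with the fruit names as the index):

```python
import pandas as pd

df = pd.DataFrame(
    {'weight': [120, 150, 10, 15]},
    index=['apple', 'banana', 'cherry', 'date'],
)

# Position slice: stop position 2 is excluded, so two rows come back
by_position = df.iloc[0:2]

# Label slice: stop label 'cherry' is included, so three rows come back
by_label = df.loc['apple':'cherry']

print(by_position.index.tolist())
print(by_label.index.tolist())
```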
Conclusion
In this short Byte, we looked at the iloc and loc indexers in Pandas, saw each in action, and compared them. Both are useful tools for selecting data, but they work differently: iloc selects by integer position, while loc selects by label.