How to Sort a Pandas DataFrame by Date - Stack Abuse

How to Sort a Pandas DataFrame by Date

Introduction

Pandas is an extremely popular data manipulation and analysis library. It's the go-to tool for loading in and analyzing datasets for many.

Correctly sorting data is a crucial element of many tasks regarding data analysis. In this tutorial, we'll take a look at how to sort a Pandas DataFrame by date.

Let's start off with making a simple DataFrame with a few dates:

import pandas as pd

data = {'Name':["John", "Paul", "Dhilan", "Bob", "Henry"], 
'Date of Birth': ["01/06/86", "05/10/77", "11/12/88", "25/12/82", "01/06/86"]}
df = pd.DataFrame(data) 

print(df)

By default our output is sorted by the DataFrames index:

    Name Date of Birth
0    John      01/06/86
1    Paul      05/10/77
2  Dhilan      11/12/88
3     Bob      25/12/82
4   Henry      01/06/86

The eagle-eyed may notice that John and Paul have the same date of birth - this is on-purpose as we'll see in a moment.

Convert Strings to Datetime in Pandas DataFrame

We have input Date of Birth in date format and it appears to be formatted as such. However, the first thing we need to do is ensure Pandas recognises and understands that this date is in fact a date.

The way Pandas stores and manipulates data in a DataFrame is determined by its data type.

The data type of each value is assigned automatically, based on what it looks like. 60 will be assigned an integer type, while John will be assigned a string type. Let's check the current data type of each column:

print(df.dtypes)

This gives us our list of data types:

Name             object
Date of Birth    object
dtype: object

We can see our Date of Birth column has been assigned a basic string object type by default. However, in order to correctly sort, analyse or manipulate our dates correctly we need Pandas to recognise this column contains dates.

Let's explicitly change the data type in our Date of Birth column from an object type to a datetime type.

The easiest way to do this is to use the to_datetime() function:

df["Date of Birth"] = pd.to_datetime(df["Date of Birth"])
print(df.dtypes)

Now, if we check our output:

Name                     object
Date of Birth    datetime64[ns]
dtype: object

So, we can see we have successfully changed our data type to datetime.

Alternatively, we can manually specify the data type of our column, provided of course we know what data type we want it to be:

df["Date of Birth"] = df["Date of Birth"].astype('datetime64[ns]')

Output:

Name                     object
Date of Birth    datetime64[ns]
dtype: object

Whilst both of these methods produce the same result, the to_datetime() method is preferred as it was explicitly designed for this purpose.

Sorting a DataFrame by Date in Pandas

Now that Pandas correctly recognizes our data types, let's sort the DataFrame.

Note: All of the methods we'll use don't sort in-place, so you'll have to either reassign the changed DataFrame to a new (or the same) reference variable to persist the change - or you can use the inplace argument to change the default behavior.

Sort by Single Date Column in Ascending Order

The sort_values() method will, by default, sort data in ascending order. For dates this would mean the first or earliest in order will appear at the top of the list:

df.sort_values(by='Date of Birth', inplace=True)
print(df)

Running this code results in:

Better understand your data with visualizations.

  •  30-day no-questions money-back guarantee
  •  Beginner to Advanced
  •  Updated regularly (latest update June 2021)
  •  Updated with bonus resources and guides
    Name	Date of Birth
1	Paul	1977-05-10
3	Bob	1982-12-25
0	John	1986-01-06
4	Henry	1986-01-06
2	Dhilan	1988-11-12

Alternatively, if you don't want to use the inplace argument, you can simply re-assign the returned DataFrame from the sort_values() method to df (or any other reference variable:

df = df.sort_values(by='Date of Birth')

As we gave John and Henry have the same birthday, the order is based on their corresponding index number.

Sort by Single Date Column in Descending Order

Changing our order of sort to descending can be done by setting the ascending argument to False when calling the sort_values() function:

df.sort_values(by='Date of Birth', ascending = False, inplace=True)

This time we get our data sorted in descending order, meaning the last or most recent will appear at the top of our list. Again as John and Henry have the same birthday their order is based on their index number:

    Name	Date of Birth
2	Dhilan	1988-11-12
0	John	1986-01-06
4	Henry	1986-01-06
3	Bob	1982-12-25
1	Paul	1977-05-10

Sort by Multiple Date Columns

So, what happens if we have multiple date columns that we want to sort by?

Let’s add another date-related column to our DataFrame and make sure both our data types are correctly assigned:

# Values for the new column
employment_start = ["22/05/16", "17/08/10", "22/05/16", "11/06/19", "16/06/05"]
# Adding columns to DataFrame
df['Employment Start'] = employment_start

# Applying to_datetime() function to multiple columns at once
df[['Date of Birth', 'Employment Start']] = df[['Date of Birth', 'Employment Start']].apply(pd.to_datetime)

print(df.dtypes)
print(df)

Now, let's check if things look good:

Name                        object
Date of Birth       datetime64[ns]
Employment Start    datetime64[ns]
dtype: object

     Name Date of Birth Employment Start
0    John    1986-01-06       2016-05-22
1    Paul    1977-05-10       2010-08-17
2  Dhilan    1988-11-12       2016-05-22
3     Bob    1982-12-25       2019-11-06
4   Henry    1986-01-06       2005-06-16

Sort by Multiple Date Columns in Ascending Order

To sort the DataFrame by both Date of Birth and Employment Start in ascending order, we simply need to add both column names to our sort_values() method. Just bear in mind the priority of the sort is determined by which column is entered first:

df.sort_values(by=['Date of Birth', 'Employment Start'], inplace=True)

As this method defaults to ascending order, our output will be:

	Name	Date of Birth	Employment Start
1	Paul	1977-05-10	    2010-08-17
3	Bob	1982-12-25	    2019-11-06
4	Henry	1986-01-06	    2005-06-16
0	John	1986-01-06	    2016-05-22
2	Dhilan	1988-11-12	    2016-05-22

As Date of Birth is the first column entered in our method, Pandas is prioritizing it. Since John and Henry have the same Date of Birth, they're sorted by the Employment Start column instead.

Sort by Multiple Date Columns in Descending Order

As with the single column sort, we can change the order to descending order by changing the ascending parameter to False:

df.sort_values(by=['Date of Birth', 'Employment Start'], ascending = False, inplace=True)

Now, our output in descending order is:

Name	Date of Birth	Employment Start
2	Dhilan	1988-11-12	2016-05-22
0	John	1986-01-06	2016-05-22
4	Henry	1986-01-06	2005-06-16
3	Bob 	1982-12-25	2019-11-06
1	Paul	1977-05-10	2010-08-17

As we can see John and Henry both appear higher in the list as the birthdays are displayed in descending order. This time though, John takes priority over Henry due to his more recent Employment Start date.

Sort by Multiple Date Columns and Variable Order Sorts

Now, what if we not only want to sort using multiple columns but also have these columns sorted using different ascending criteria? With Pandas, this can be implemented within the same sort_values() method we've used so far. We just have to pass the correct and corresponding list of values in the ascending parameter.

In this example let’s assume we want to sort our Employment Start in ascending order, i.e. longest serving first, but then their Date of Birth in descending order i.e. youngest first:

df.sort_values(by=['Employment Start', 'Date of Birth'], ascending = [True, False], inplace=True)

The data is first sorted by Employment Start in ascending order, this takes priority as this was the first column passed in our method. We then sort Date of Birth in descending order. As Dhilan and John share the same Employment Start date, Dhilan now takes priority as he is younger than John:

Name	Date of Birth	Employment Start
4	Henry	1986-01-06	2005-06-16
1	Paul	1977-05-10	2010-08-17
2	Dhilan	1988-11-12	2016-05-22
0	John	1986-01-06	2016-05-22
3	Bob	1982-12-25	2019-11-06

Conclusion

Given the popularity of the Pandas library, it is hardly surprising that sorting data based on columns is a straightforward process. We taken a look at the flexibility of using the sort_values() method across single and multiple columns, in ascending, descending and even a variable order. Whilst we have focused on sorting by date, this method can be used across multiple data types.

When looking to sort by date in particular, the first, and arguably most important step, is making sure we have correctly assigned the datetime type to our data. Without correctly defining our data type we risk Pandas not recognising our dates at all.

Last Updated: March 3rd, 2021

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Want a remote job?

    Prepping for an interview?

    • Improve your skills by solving one coding problem every day
    • Get the solutions the next morning via email
    • Practice on actual problems asked by top companies, like:
     
     
     

    © 2013-2021 Stack Abuse. All rights reserved.