Parse Datetime Strings with parsedatetime in Python

Introduction

In this tutorial, we'll take a look at how to parse datetime strings with the parsedatetime package in Python.

To use the parsedatetime package we first need to install it using pip:

$ pip install parsedatetime

If installing with pip fails, note that the package is open-source and its source code is also available on GitHub.

Convert String to Python's Datetime Object with parsedatetime

The first and most common way to use parsedatetime is to parse a string into a datetime object. First, import the parsedatetime library and instantiate a Calendar object, which handles the actual input, parsing, and manipulation of dates:

import parsedatetime
calendar = parsedatetime.Calendar()

Now we can call the parse() method of the calendar instance with a string as an argument. You can pass in regular date-formatted strings, such as 1-1-2021, or human-readable values such as tomorrow, yesterday, next year, last week, lunch tomorrow, etc. "End of day" expressions work too, for example tomorrow eod.

Let's parse a date string and a human-readable string using parsedatetime:

import parsedatetime
from datetime import datetime

calendar = parsedatetime.Calendar()

print(calendar.parse('tomorrow'))
print(calendar.parse('1-1-2021'))

This results in two printed tuples:

(time.struct_time(tm_year=2021, tm_mon=3, tm_mday=19, tm_hour=9, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=78, tm_isdst=-1), 1)
(time.struct_time(tm_year=2021, tm_mon=1, tm_mday=1, tm_hour=18, tm_min=5, tm_sec=14, tm_wday=3, tm_yday=77, tm_isdst=0), 1)

This isn't very human-readable. Each returned tuple consists of a time.struct_time object, which holds fields such as the year, month, and day of the month, followed by a status code: an integer denoting how the parsing went.

A status of 0 means parsing failed, 1 means the string was parsed as a date, 2 means it was parsed as a time, and 3 means it was parsed as a datetime.

Let's extract a single field from this output:

print(calendar.parse('tomorrow')[0].tm_mday)
print(calendar.parse('1-1-2021')[0].tm_mday)

This code results in:

19
1

Then again, we're only getting the day of the month here. Usually, we'd like output in a format similar to YYYY-mm-dd HH:mm:ss, or some variation of it.

Thankfully, we can easily use the time.struct_time result and generate a regular Python datetime with it:

import parsedatetime
from datetime import datetime

calendar = parsedatetime.Calendar()

time_structure_tomorrow, parse_status_tomorrow = calendar.parse('tomorrow')
time_structure_2021, parse_status_2021 = calendar.parse('1-1-2021')

print(datetime(*time_structure_tomorrow[:6]))
print(datetime(*time_structure_2021[:6]))

The datetime() constructor only takes the year, month, day, hour, minute, and second from the time structure (the remaining fields are the weekday, day of the year, and DST flag), so we pass in just the first six values.

This code results in:

2021-03-19 09:00:00
2021-01-01 18:11:06

Keep in mind that the result for the 1st of January took its time of day from the moment of execution, since the string only specified a date.

Handling Timezones

Sometimes, your application might have to take your end users' timezones into consideration. For timezone support, we usually use the pytz package, though you can use other packages as well.

Let's install pytz via pip:

$ pip install pytz

Now, we can import the parsedatetime and pytz packages into a script, and create a standard Calendar instance:

import parsedatetime
import pytz
from pytz import timezone

calendar = parsedatetime.Calendar()

Let's take a look at the supported timezones by printing out all_timezones:

print(pytz.all_timezones)

This code will result in a huge list of all available timezones:

['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', ...]

Let's choose one of these, such as the first one, and pass it in as the tzinfo argument of Calendar's parseDT() method. We'll also supply a datetimeString argument, which is the actual string we want to parse:

datetime_object, status = calendar.parseDT(datetimeString='tomorrow', tzinfo=timezone('Africa/Abidjan'))

This method returns a tuple of a datetime object and the status code of the conversion, which follows the same convention as parse(): 0 means parsing failed, and a non-zero value means it succeeded.

Let's go ahead and print the datetime_object:

print(datetime_object)

This code results in:

2021-03-16 09:00:00+00:00

Calendar.parseDate()

While Calendar.parse() is a general-purpose parsing method that returns a tuple of a time.struct_time and a status code, the parseDate() method is dedicated to short-form date strings and returns only the parsed date values, without a status code:

import parsedatetime
calendar = parsedatetime.Calendar()

result = calendar.parseDate('5/5/91')
print(result)

The result contains the calculated date as a nine-element tuple, in the same field order as struct_time:

(1991, 5, 5, 14, 31, 18, 0, 74, 0)

But what do we do when we want to parse the 5th of May 2077? We can try to run the following code:

import parsedatetime
calendar = parsedatetime.Calendar()
result = calendar.parseDate('5/5/77')
print(result)

However, this code will result in:

(1977, 5, 5, 14, 36, 21, 0, 74, 0)

Calendar.parseDate() interpreted the two-digit year 77 as 1977 rather than 2077. We can solve this in two ways:

  • Simply specify the full year, 2077:
import parsedatetime
calendar = parsedatetime.Calendar()
result = calendar.parseDate('5/5/2077')
print(result)
  • Set a custom BirthdayEpoch:
import parsedatetime
constants = parsedatetime.Constants()
constants.BirthdayEpoch = 80

# Pass our new constants to the Calendar
calendar = parsedatetime.Calendar(constants)
result = calendar.parseDate('5/5/77')
print(result)

This code will result in:

(2077, 5, 5, 14, 39, 47, 0, 74, 0)

You can access the constants of the parsedatetime library through the Constants object. Here, we've set BirthdayEpoch to 80.

BirthdayEpoch controls how the package handles two-digit years, such as 77. If the parsed year is less than the BirthdayEpoch value, 2000 is added to it. Since we've set BirthdayEpoch to 80 and parsed 77, it's converted to 2077.

Otherwise, it'll add the parsed value to 1900.

Calendar.parseDateText()

Another way to avoid mistaken short-form dates is to use long-form dates instead. For long-form dates, you can use the parseDateText() method:

import parsedatetime
calendar = parsedatetime.Calendar()

result = calendar.parseDateText('May 5th, 1991')
print(result)

This code will result in:

(1991, 5, 5, 14, 31, 46, 0, 74, 0)

Using Locales

Finally, we can use parsedatetime with locale information. The locale information comes either from PyICU or from the Constants class we used earlier.

The Constants class has many attributes besides BirthdayEpoch. Two of them, localeID and usePyICU, can also be passed to its constructor.

Let's set localeID to Spanish ('es') and set usePyICU to False, since we won't use PyICU:

import parsedatetime
from datetime import datetime

constants = parsedatetime.Constants(localeID='es', usePyICU=False)
calendar = parsedatetime.Calendar(constants)
result, code = calendar.parse('Marzo 28')
print(result)

This results in:

time.struct_time(tm_year=2021, tm_mon=3, tm_mday=28, tm_hour=15, tm_min=0, tm_sec=5, tm_wday=0, tm_yday=74, tm_isdst=0)

Since result is a struct_time, we can easily convert it into a datetime:

print(datetime(*result[:6]))

This results in:

2021-03-28 15:00:05

Conclusion

In this tutorial, we've gone over several ways to parse datetime strings using the parsedatetime package in Python.

We went over converting strings to datetime objects with parsedatetime, handling timezones with pytz, and configuring two-digit years and locales through parsedatetime's Constants class.