Introduction
Converting an object into a savable form (such as a byte stream or a textual representation) is called serialization, whereas deserialization converts data in that form back into an object. A serialized format retains all the information required to reconstruct the object in memory, in the same state it was in when serialized.
In this guide, you will learn how to serialize and deserialize data in Python with the pickle module. We'll also pickle and unpickle a Pandas DataFrame.
Note: This guide assumes some familiarity with file handling in Python. If you are a complete beginner to the language, read our Guide to Saving Text, JSON and CSV to a File in Python first.
What is Pickling And Unpickling in Python?
Python comes with a built-in module, pickle, that can be used to perform “pickling” and “unpickling” operations.
Pickling and unpickling are Python's terms for serialization and deserialization: converting objects into byte streams and back, using Python's pickle module. Let's take a look at a few examples!
Note: The pickle module is part of Python's standard library, so there is nothing extra to install.
Consider the following code that prints the contents of a dictionary:
import pickle
athletes = {
"Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard", "Luis Suarez", "Neymar"],
"Club": ["Manchester United", "PSG", "Real Madrid", "Atletico Madrid", "PSG"]
}
print(athletes)
This would result in:
{'Name': ['Cristiano Ronaldo', 'Lionel Messi', 'Eden Hazard', 'Luis Suarez', 'Neymar'], 'Club': ['Manchester United', 'PSG', 'Real Madrid', 'Atletico Madrid', 'PSG']}
Let's try to "pickle" the athletes object to a binary file. We can do this with the dump() function. It takes two parameters: the object being “pickled” and a file object, opened for binary writing, to write the data to. The following code “pickles” the data to a new file, athletes.txt, that will be created in the current working directory:
athletes_file = open('athletes.txt', 'wb')
pickle.dump(athletes, athletes_file)
athletes_file.close()
Note: We opened the file in "wb" mode, which is used to write binary files. Because pickling converts the object to a byte stream, add "b" to the file mode whenever you read or write “pickled” files.
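As a minimal sketch of why the binary mode matters, the snippet below writes the pickle inside a with block, which closes the file automatically, and then shows what happens in text mode. The temporary directory and the athletes.pkl file name are assumptions made for illustration, so the example does not touch your own files.

```python
import os
import pickle
import tempfile

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi"],
    "Club": ["Manchester United", "PSG"],
}

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "athletes.pkl")

# The with statement closes the file even if dump() raises an exception.
with open(path, "wb") as f:
    pickle.dump(athletes, f)

# In text mode ("w") the write fails, because pickle produces bytes, not str.
text_mode_error = None
try:
    with open(path + ".bad", "w") as f:
        pickle.dump(athletes, f)
except TypeError as exc:
    text_mode_error = exc
    print("Text mode failed:", exc)
```

The with form is equivalent to calling open() and close() by hand, but it cannot leave the file open by accident.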
The created file's contents can't be viewed in a regular text editor, because it's binary data that isn't meant to be stored in a human-readable format. To read this information, we'll have to “unpickle” or deserialize this data. We can do this with the load() function!
The load() function reads the contents of a “pickled” file and returns the object constructed from that data. The type of the object, as well as its state, depends on the contents of the file. Since we've saved a dictionary with athlete names, a dictionary with the same entries is reconstructed. Let's read the “pickled” file you just created back into a Python object and print its contents:
import pickle
athletes_file = open("athletes.txt", "rb")
athletes = pickle.load(athletes_file)
athletes_file.close()
print(athletes)
This results in:
{'Name': ['Cristiano Ronaldo', 'Lionel Messi', 'Eden Hazard', 'Luis Suarez', 'Neymar'], 'Club': ['Manchester United', 'PSG', 'Real Madrid', 'Atletico Madrid', 'PSG']}
As you can see, we get back all the data that was “pickled”.
Note: Just as we used "wb" to write binary data, we used the "rb" mode to read binary data.
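To check the round trip rather than compare the printed output by eye, you can compare the restored object with the original. The sketch below assumes a temporary directory and a hypothetical athletes.pkl file name. It shows that the restored dictionary is equal to the original, yet is a separate object in memory.

```python
import os
import pickle
import tempfile

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard"],
    "Club": ["Manchester United", "PSG", "Real Madrid"],
}

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "athletes.pkl")

# Serialize the dictionary to disk.
with open(path, "wb") as f:
    pickle.dump(athletes, f)

# Deserialize it into a new object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == athletes)  # same contents
print(restored is athletes)  # but not the same object in memory
```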
Now that we've covered the process of “pickling” and “unpickling” in Python, let's read “pickled” files so that we can put their contents in a Pandas DataFrame!
How to Read a Pickle File in a Pandas DataFrame?
We'll be using the same data as we did in the earlier examples. First, ensure that you have the Pandas library installed:
$ pip install pandas
Now let's start by converting the dictionary into a Pandas DataFrame:
import pickle
import pandas as pd
athletes = {
"Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard", "Luis Suarez", "Neymar"],
"Club": ["Manchester United", "PSG", "Real Madrid", "Atletico Madrid", "PSG"]
}
df = pd.DataFrame(athletes)
print(df)
This results in:
                Name               Club
0  Cristiano Ronaldo  Manchester United
1       Lionel Messi                PSG
2        Eden Hazard        Real Madrid
3        Luis Suarez    Atletico Madrid
4             Neymar                PSG
As you can see in the output, we get a Pandas DataFrame object with 2 columns and 5 rows, plus the index on the left. After this, the process is the same as for the plain, non-DataFrame objects. We will use file handling along with the dump() and load() functions to first create a “pickle” file from a Pandas DataFrame, and then read the byte stream back to get the DataFrame:
# ...
df = pd.DataFrame(athletes)
athletes_df_file = open("athletes_df.txt", "wb")
pickle.dump(df, athletes_df_file)
athletes_df_file.close()
The above code will create a “pickle” file, athletes_df.txt, in our current directory. It stores the Pandas DataFrame as a byte stream.
When we want to use this DataFrame again, we can just “unpickle” this file to get it back:
import pickle
athletes_df_file = open("athletes_df.txt", "rb")
athletes = pickle.load(athletes_df_file)
athletes_df_file.close()
print(athletes)
This results in:
                Name               Club
0  Cristiano Ronaldo  Manchester United
1       Lionel Messi                PSG
2        Eden Hazard        Real Madrid
3        Luis Suarez    Atletico Madrid
4             Neymar                PSG
That's the awesome thing about “pickled” files! When we load one, we don't just get the contents stored in the DataFrame object, we get the DataFrame object itself. Without this capability, programmers commonly save the data in an accessible format like JSON, and then load the JSON data into a new object to use it.
Pickling and unpickling save us from using intermediate data formats and writing our own methods to load our data.
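Pandas also ships its own shortcuts for this, DataFrame.to_pickle() and pd.read_pickle(), which handle opening and closing the file for you. The sketch below is one possible way to use them. The temporary directory and the athletes_df.pkl file name are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard", "Luis Suarez", "Neymar"],
    "Club": ["Manchester United", "PSG", "Real Madrid", "Atletico Madrid", "PSG"],
}
df = pd.DataFrame(athletes)

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "athletes_df.pkl")

# Write the DataFrame to a pickle file without calling open() ourselves.
df.to_pickle(path)

# Read it back as a DataFrame.
restored_df = pd.read_pickle(path)

print(restored_df.equals(df))  # True when values, dtypes and shape all match
```

Both approaches produce a pickle file, so choosing between the pickle module and the Pandas helpers is mostly a matter of convenience.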
Pickling into Strings and Unpickling from Strings
It's good to know that the pickle module also provides the dumps() and loads() functions. They “pickle” and “unpickle” Python objects as well, but instead of writing to and reading from a binary file, dumps() returns a bytes object and loads() accepts one.
Let's take a look at a simple example to understand how the dumps() and loads() functions work in Python:
import pickle
simple_obj = {1: ['o', 'n', 'e'], "two": (1, 2), 3: "Three"}
pickled_obj = pickle.dumps(simple_obj)
print(pickled_obj)
This results in:
b'\x80\x04\x95-\x00\x00\x00\x00\x00\x00\x00}\x94(K\x01]\x94(\x8c\x01o\x94\x8c\x01n\x94\x8c\x01e\x94e\x8c\x03two\x94K\x01K\x02\x86\x94K\x03\x8c\x05Three\x94u.'
As you can see in the output, a bytes object is returned instead of the “pickled” file that dump() creates. We can pass these bytes to loads() and load the object into a new variable:
out = pickle.loads(pickled_obj)
print(out)
This results in:
{1: ['o', 'n', 'e'], 'two': (1, 2), 3: 'Three'}
These two functions make it easier to transfer objects between Python-based applications, and you could send “pickled” data through APIs. Only unpickle data that comes from a source you trust, though, because loading a pickle can execute arbitrary code. More commonly, for Python-based web applications, you'll be using JSON-serialized objects instead.
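To make the difference between the two formats concrete, here is a small sketch that sends the same dictionary through pickle and through the standard json module. The variable names via_pickle and via_json are illustrative only. Pickle restores the original Python types, while JSON turns the integer keys into strings and the tuple into a list.

```python
import json
import pickle

simple_obj = {1: ["o", "n", "e"], "two": (1, 2), 3: "Three"}

# Round trip through pickle: dumps() returns bytes, loads() rebuilds the object.
pickled_obj = pickle.dumps(simple_obj)
via_pickle = pickle.loads(pickled_obj)

# Round trip through JSON: dumps() returns str, and some types are not preserved.
via_json = json.loads(json.dumps(simple_obj))

print(type(pickled_obj))         # <class 'bytes'>
print(via_pickle == simple_obj)  # True
print(via_json)                  # {'1': ['o', 'n', 'e'], 'two': [1, 2], '3': 'Three'}
```

JSON gives up some type fidelity in exchange for being readable by other languages and tools, which is why it is the usual choice between services.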
Conclusion
In this article, we learned about the “pickling” and “unpickling” operations in Python, which are useful for storing your objects for later use. The built-in pickle module provides the load(), loads(), dump() and dumps() functions to convert Python objects to and from byte streams.
Saving a Pandas DataFrame object and loading it back can be done easily using the pickle module in Python.
Note that “pickling” and “unpickling” are not recommended if you plan to use the objects in other programming languages, as the pickle format is specific to Python and does not guarantee cross-language compatibility.