Pickling is a popular method of preserving food. According to Wikipedia, it is also a pretty ancient procedure – although the origins of pickling are unknown, the ancient Mesopotamians probably used the process 4400 years ago. By placing a product in a specific solution, it is possible to drastically increase its shelf life. In other words, it's a method that lets us store food for later consumption.
If you're a Python developer, you might one day find yourself in need of a way to store your Python objects for later use. Well, what if I told you, you can pickle Python objects too?
Serialization is the process of transforming objects or data structures into byte streams or strings. A byte stream is, well, a stream of bytes – each byte consists of 8 bits, and each bit is either a zero or a one. These byte streams can then be stored or transferred easily. This allows developers to save, for example, configuration data or a user's progress, and then store it (on disk or in a database) or send it to another location.
Python objects can also be serialized using a module called Pickle.
One of the main differences between pickling Python objects and pickling vegetables is the inevitable and irreversible change of the pickled food's flavor and texture. Meanwhile, pickled Python objects can be easily unpickled back to their original form. This process, by the way, is universally known as deserialization.
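To make the round trip concrete, here is a minimal sketch that pickles an object to an in-memory byte stream and unpickles it again. The dictionary and its keys are made up for illustration; pickle.dumps() and pickle.loads() are the in-memory counterparts of the file-based functions used later in this article.

```python
import pickle

# A small nested structure made only of built-in types (hypothetical data).
original = {"name": "cucumber", "tags": ["green", "crunchy"], "weight_kg": 0.3}

# Serialize: the object becomes a bytes object (the byte stream).
payload = pickle.dumps(original)
print(type(payload))

# Deserialize: the byte stream becomes an equal, but separate, object.
restored = pickle.loads(payload)
print(restored == original)
print(restored is original)
```

The unpickled object compares equal to the original, but it is a new object in memory rather than the same one.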
Pickling (or serialization in general) should not be confused with compression. The purpose of pickling is to translate data into a format that can be transferred from RAM to disk. Compression, on the other hand, is a process of encoding data using fewer bits (in order to save disk space).
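The difference is easy to see when the two steps are combined. The sketch below pickles some made-up, highly repetitive data and then compresses the resulting bytes with the standard zlib module; the variable names are assumptions chosen for this example.

```python
import pickle
import zlib

# Hypothetical save-game data with a lot of repetition in it.
progress = {"level": 3, "scores": [0] * 10_000}

# Step 1: serialization turns the object into bytes.
raw = pickle.dumps(progress)

# Step 2: compression encodes those bytes using fewer bytes.
packed = zlib.compress(raw)
print(len(raw), len(packed))

# To get the object back, undo the steps in reverse order.
restored = pickle.loads(zlib.decompress(packed))
print(restored == progress)
```

Pickling alone makes no attempt to shrink the data; how much compression helps afterwards depends on how repetitive the data is.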
Serialization is especially useful in any software where it's important to be able to save some progress on disk, quit the program and then load the progress back after reopening the program. Video games might be the most intuitive example of serialization's usefulness, but there are many other programs where saving and loading a user's progress or data is crucial.
Pickle vs JSON
JSON is a popular, language-independent alternative to Pickle, but it has some important limitations. Most importantly, by default, only a limited subset of Python built-in types can be represented by JSON. With Pickle, we can easily serialize a very wide range of Python types and, importantly, custom classes. This means we don't need to create a custom schema (as we would for JSON) and write error-prone serializers and parsers. All of the heavy lifting is done for you by Pickle.
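As a small illustration of that limitation, the sketch below tries to serialize a dictionary containing a set and a complex number, two built-in types that the json module does not handle by default. The data is invented for the example.

```python
import json
import pickle

# A set and a complex number: both are built-in Python types.
basket = {"veggies": {"cucumber", "carrot"}, "ratio": 2 + 3j}

# json.dumps() rejects types it has no default encoding for.
try:
    json.dumps(basket)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json failed:", exc)

# pickle handles the same object without any extra code.
restored = pickle.loads(pickle.dumps(basket))
print(restored == basket)
```

With JSON you would have to convert the set to a list and the complex number to some custom representation yourself, and convert them back when loading.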
What can be Pickled and Unpickled
The following types can be serialized and deserialized using the Pickle module:
- All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
- Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects
- Functions and classes that are defined at the top level of a module
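The list above can be checked with a quick experiment. The sketch below round-trips a few sample values of each kind and then tries a lambda, which is a common example of something that cannot be pickled because it has no importable name. The sample values are arbitrary.

```python
import pickle

# One sample per category from the list above; len and dict stand in for
# a function and a class defined at the top level of a module (builtins).
samples = [True, None, 42, 3.14, 1 + 2j, "text", b"raw", bytearray(b"ab"),
           {"key": (1, 2)}, {1, 2, 3}, [len, dict]]

for obj in samples:
    # Every sample survives the round trip unchanged.
    assert pickle.loads(pickle.dumps(obj)) == obj

# A lambda cannot be looked up by name, so pickling it fails.
try:
    pickle.dumps(lambda x: x * x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print("cannot pickle lambda:", exc)
```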
It is important to remember that pickling is not a language-independent serialization method, so your pickled data can only be unpickled using Python. Python versions matter too: newer versions can generally read pickles written by older ones, but a pickle written with a newer pickle protocol cannot be read by a Python version that predates that protocol. The simplest way to avoid problems is to pickle and unpickle with the same version of Python.
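One place where the Python version shows up is the pickle protocol, the version number of the binary format itself. The sketch below, which assumes nothing beyond the standard pickle module, prints the protocols your interpreter uses and shows how to request an older one explicitly when a pickle has to be read by an older Python.

```python
import pickle

data = ["cucumber", "pumpkin", "carrot"]

# The protocol used by default and the newest one this interpreter supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Ask for an older protocol explicitly, or for the newest one available.
compatible = pickle.dumps(data, protocol=2)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# For protocol 2 and later, the second byte of the stream is the protocol number.
print(compatible[1], newest[1])

# Both streams describe the same list.
print(pickle.loads(compatible) == pickle.loads(newest) == data)
```

Choosing a lower protocol trades some efficiency for wider compatibility; it does not help with classes or modules that are missing in the environment that loads the pickle.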
Additionally, functions are pickled by their name references, and not by their value. The resulting pickle does not contain information on the function's code or attributes. Therefore, you have to make sure that the environment where the function is unpickled is able to import the function. In other words, if we pickle a function and then unpickle it in an environment where it's either not defined or not imported, an exception will be raised.
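You can see this name-based behavior by pickling a function and inspecting the result. The sketch below uses json.dumps from the standard library as the function to pickle, simply because it is guaranteed to be importable anywhere; any top-level function would behave the same way.

```python
import json
import pickle

# json.dumps is an ordinary function defined at the top level of the json module.
payload = pickle.dumps(json.dumps)

# The payload is tiny: it holds the module and function names, not the code.
print(len(payload))
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again,
# so we get back the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```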
It is also very important to note that pickled objects can be used in malevolent ways. For instance, unpickling data from an untrusted source can result in the execution of a malicious piece of code.
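If you cannot fully avoid pickles from other sources, one mitigation is to restrict which globals the unpickler may load by overriding Unpickler.find_class(), an approach adapted here from the pickle documentation. The allow-list and the helper name restricted_loads are choices made for this sketch. It narrows what a pickle can reach, but it should not be treated as making untrusted pickles safe in general.

```python
import builtins
import io
import pickle

# Only these harmless built-ins may be referenced by a pickle (illustrative list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as a class or function.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data and allow-listed built-ins load normally.
print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# This hand-written pickle asks for os.system; it is rejected before anything runs.
hostile = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(hostile)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```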
Pickling a Python List
The following very simple example shows the basics of using the Pickle module in Python 3:
    import pickle

    test_list = ['cucumber', 'pumpkin', 'carrot']

    with open('test_pickle.pkl', 'wb') as pickle_out:
        pickle.dump(test_list, pickle_out)
First, we have to import the pickle module, which is done in line 1. In line 3 we define a simple, three-element list that will be pickled.
In line 5 we state that our output pickle file's name will be test_pickle.pkl. By using the wb option, we tell the program that we want to write (w) binary data (b) to it (because we want to create a byte stream). Note that the pkl extension is not necessary – we're using it in this tutorial because it is a common convention for pickle files.
In line 6 we use the pickle.dump() function to pickle our test list and store it inside the test_pickle.pkl file.
I encourage you to try and open the generated pickle file in your text editor. You'll quickly notice that a byte stream is definitely not a human-readable format.
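If the raw bytes look too cryptic, the standard pickletools module can translate them into a readable list of instructions. The sketch below pickles the same three-element list in memory rather than reading the file, which is an assumption made to keep the example self-contained.

```python
import io
import pickle
import pickletools

test_list = ["cucumber", "pumpkin", "carrot"]
payload = pickle.dumps(test_list)

# The raw bytes: mostly opcodes and length prefixes, with the strings embedded.
print(payload)

# pickletools.dis() writes one line per opcode; here we capture it in a string.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())
```

The exact opcodes in the listing depend on the pickle protocol your Python version uses by default.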
Unpickling a Python List
Now, let's unpickle the contents of the test pickle file and bring our object back to its original form.
    import pickle

    with open('test_pickle.pkl', 'rb') as pickle_in:
        unpickled_list = pickle.load(pickle_in)

    print(unpickled_list)
As you can see, this procedure is no more complicated than pickling the object. In line 3 we open our test_pickle.pkl file again, but this time our goal is to read (r) the binary data (b) stored within it.
Next, in line 4, we use the pickle.load() function to unpickle our list and store it in the unpickled_list variable.
You can then print the contents of the list to see for yourself that it is identical to the list we pickled in the previous example. Here is the output from running the code above:
    $ python unpickle.py
    ['cucumber', 'pumpkin', 'carrot']
Pickling and Unpickling Custom Objects
As I mentioned before, using Pickle, you can serialize your own custom objects. Take a look at the following example:
    import pickle

    class Veggy():
        def __init__(self):
            self.color = ''
        def set_color(self, color):
            self.color = color

    cucumber = Veggy()
    cucumber.set_color('green')

    with open('test_pickle.pkl', 'wb') as pickle_out:
        pickle.dump(cucumber, pickle_out)

    with open('test_pickle.pkl', 'rb') as pickle_in:
        unpickled_cucumber = pickle.load(pickle_in)

    print(unpickled_cucumber.color)
As you can see, this example is almost as simple as the previous one. In lines 3 to 7 we define a simple class that contains one attribute and one method that changes this attribute. In line 9 we create an instance of that class and store it in the cucumber variable, and in line 10 we set its color attribute to "green".
Then, using the exact same functions as in the previous example, we pickle and unpickle our freshly created cucumber object. Running the code above results in the following output:
    $ python unpickle_custom.py
    green
Remember that we can only unpickle the object in an environment where the Veggy class is either defined or imported. If we create a new script and try to unpickle the object without defining or importing the Veggy class, we'll get an AttributeError. For example, execute the following script:
    import pickle

    with open('test_pickle.pkl', 'rb') as pickle_in:
        unpickled_cucumber = pickle.load(pickle_in)

    print(unpickled_cucumber.color)
In the output of the script above, you will see the following error:
    $ python unpickle_simple.py
    Traceback (most recent call last):
      File "<pyshell#40>", line 2, in <module>
        unpickled_cucumber = pickle.load(pickle_in)
    AttributeError: Can't get attribute 'Veggy' on <module '__main__' (built-in)>
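The reason for this error is that a pickled instance stores a reference to its class (module name plus class name) together with the instance's data, not the class definition itself. The sketch below illustrates both halves under two assumptions: it uses fractions.Fraction from the standard library as a stand-in for a custom class, and it uses a short hand-written pickle that refers to a Veggy class that does not exist in the running script.

```python
import pickle
from fractions import Fraction

# The pickle of an instance names the class's module and the class itself.
payload = pickle.dumps(Fraction(3, 4))
print(b"fractions" in payload, b"Fraction" in payload)
print(pickle.loads(payload))

# Hand-written pickle meaning "load the global __main__.Veggy" (illustration only).
missing_class = b"c__main__\nVeggy\n."

# Veggy is not defined here, so the lookup fails and we can handle it gracefully.
try:
    pickle.loads(missing_class)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print("could not unpickle:", exc)
```

Catching the exception lets a program report a missing class clearly, for example by telling the user which module needs to be imported, instead of crashing with a traceback.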
As you can see, thanks to the Pickle module, serialization of Python objects is pretty simple. In our examples, we pickled a simple Python list and a small custom object – but you can use the exact same method to save a wide range of Python data types, as long as you make sure your objects contain only other pickleable objects.
Pickling has some disadvantages, the biggest of which might be the fact that you can only unpickle your data using Python – if you need a cross-language solution, JSON is definitely a better option. And finally, remember that pickles can carry code that you don't necessarily want to execute. As with pickled food, as long as you get your pickles from trusted sources, you should be fine.