Introduction
Pickling is a popular method of preserving food. According to Wikipedia, it is also a pretty ancient procedure – although the origins of pickling are unknown, the ancient Mesopotamians probably used the process 4400 years ago. By placing a product in a specific solution, it is possible to drastically increase its shelf life. In other words, it's a method that lets us store food for later consumption.
If you're a Python developer, you might one day find yourself in need of a way to store your Python objects for later use. Well, what if I told you, you can pickle Python objects too?
Serialization
Serialization is the process of transforming objects or data structures into byte streams or strings. A byte stream is, well, a stream of bytes, where each byte is made up of 8 bits (zeros and ones). These byte streams can then be stored or transferred easily. This allows developers to save, for example, configuration data or a user's progress, and then store it (on disk or in a database) or send it to another location.
Python objects can also be serialized, using a standard library module called pickle.
One of the main differences between pickling Python objects and pickling vegetables is the inevitable and irreversible change of the pickled food's flavor and texture. Meanwhile, pickled Python objects can be easily un-pickled back to their original form. This process, by the way, is universally known as deserialization.
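As a minimal sketch of this round trip, the snippet below uses pickle.dumps() and pickle.loads(), which work on bytes in memory instead of on files. The settings dictionary is a made-up value used only for illustration.

```python
import pickle

# Hypothetical example data; any pickle-able object would do
settings = {'volume': 7, 'fullscreen': True, 'player': 'Ann'}

# Serialize: the object becomes a bytes object (a byte stream)
serialized = pickle.dumps(settings)
print(type(serialized))

# Deserialize: the byte stream becomes an equal Python object again
restored = pickle.loads(serialized)
print(restored == settings)
```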
Pickling (or serialization in general) should not be confused with compression. The purpose of pickling is to translate data into a format that can be stored on disk or sent elsewhere; it does not try to make the data smaller. Compression, on the other hand, is a process of encoding data using fewer bits (in order to save disk space).
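To make the distinction concrete, here is a small sketch that treats the two as separate steps. It assumes the standard library's zlib module for the compression part, and the repetitive list is invented for the example.

```python
import pickle
import zlib

# Hypothetical, highly repetitive example data
data = ['cucumber'] * 1000

pickled = pickle.dumps(data)          # serialization only
compressed = zlib.compress(pickled)   # compression is a separate step

# Compare the two sizes in bytes
print(len(pickled), len(compressed))

# To get the object back, undo the steps in reverse order
restored = pickle.loads(zlib.decompress(compressed))
print(restored == data)
```

The pickled bytes are what makes the object storable; zlib only shrinks those bytes afterwards.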
Serialization is especially useful in any software where it's important to be able to save some progress on disk, quit the program and then load the progress back after reopening the program. Video games might be the most intuitive example of serialization's usefulness, but there are many other programs where saving and loading a user's progress or data is crucial.
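A save-and-load cycle of this kind could look roughly like the sketch below. The save_progress() and load_progress() helpers, the game state and the savegame.pkl file name are all hypothetical, and a temporary directory is used so that nothing is left behind on disk.

```python
import pickle
import tempfile
from pathlib import Path

def save_progress(progress, path):
    """Write the progress object to disk as a pickle."""
    with open(path, 'wb') as save_file:
        pickle.dump(progress, save_file)

def load_progress(path):
    """Read a previously saved progress object back from disk."""
    with open(path, 'rb') as save_file:
        return pickle.load(save_file)

# Hypothetical game state, used only for illustration
progress = {'level': 3, 'lives': 2, 'inventory': ['key', 'map']}

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = Path(tmp_dir) / 'savegame.pkl'
    save_progress(progress, save_path)   # what happens before quitting
    loaded = load_progress(save_path)    # what happens after reopening

print(loaded)
```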
Pickle vs JSON
There is a chance that you have heard of JSON (JavaScript Object Notation), which is a popular format that also lets developers save and transmit objects encoded as strings. This method of serialization has some advantages over pickling: JSON is human-readable and language-independent. It is also often described as faster than pickle, although the actual difference depends on the data and on the pickle protocol in use.
It does have, however, some important limitations as well. Most importantly, by default, only a limited subset of Python built-in types can be represented by JSON. With Pickle, we can easily serialize a very large spectrum of Python types, and, importantly, custom classes. This means we don't need to create a custom schema (like we do for JSON) and write error-prone serializers and parsers. All of the heavy lifting is done for you with Pickle.
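Here is a quick sketch of that limitation. The dictionary below is an invented example that holds a set and a complex number, two built-in types that the json module does not handle by default.

```python
import json
import pickle

# Hypothetical example data containing a set and a complex number
data = {'tags': {'green', 'fresh'}, 'impedance': 3 + 4j}

# json.dumps() raises TypeError for types it cannot represent
try:
    json.dumps(data)
    json_ok = True
except TypeError as error:
    json_ok = False
    print('json failed:', error)

# pickle handles the same object without any extra code
restored = pickle.loads(pickle.dumps(data))
print(restored == data)
```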
What can be Pickled and Unpickled
The following types can be serialized and deserialized using the Pickle module:
- All native data types supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
- Dictionaries, sets, lists, and tuples - as long as they contain pickle-able objects
- Functions and classes that are defined at the top level of a module, and instances of such classes (as long as their attributes are pickle-able)
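If you are not sure whether an object falls into one of these groups, you can simply try to pickle it. The is_picklable() helper below is a hypothetical convenience function, not part of the pickle module, and the exceptions it catches are the ones I would expect for unsupported objects.

```python
import pickle

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(is_picklable([1, 2.5, 'carrot', None]))    # built-in types
print(is_picklable({'a': (1, 2), 'b': {3}}))     # containers of built-in types
print(is_picklable(len))                         # a function found by its name
print(is_picklable(lambda x: x))                 # a lambda has no importable name
print(is_picklable([1, (n for n in range(3))]))  # a list holding a generator
```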
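It is important to remember that pickling is not a language-independent serialization method, therefore your pickled data is meant to be un-pickled using Python. The Python version matters as well. The pickle format is backwards compatible, so a newer Python release can generally read pickles written by an older one, but an older release may not understand a pickle written with a newer protocol. If your data has to be read by an older Python, pick a protocol that it supports when pickling; mixing versions carelessly can cause many problems.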
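As a rough sketch of how the pickle format version can be controlled, the snippet below passes the protocol argument to pickle.dumps(). Protocol 2 is chosen here only as an example of an older format; the Python documentation lists it as available since Python 2.3.

```python
import pickle

test_list = ['cucumber', 'pumpkin', 'carrot']

# Protocol numbers known to the running interpreter
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Explicitly pick an older protocol for the sake of older readers
portable = pickle.dumps(test_list, protocol=2)

# Or use the newest protocol the current interpreter offers
newest = pickle.dumps(test_list, protocol=pickle.HIGHEST_PROTOCOL)

# Both byte streams decode to the same list
print(pickle.loads(portable) == pickle.loads(newest) == test_list)
```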
Additionally, functions are pickled by their name references, and not by their value. The resulting pickle does not contain information on the function's code or attributes. Therefore, you have to make sure that the environment where the function is unpickled is able to import the function. In other words, if we pickle a function and then unpickle it in an environment where it's either not defined or not imported, an exception will be raised.
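You can see the name reference for yourself in the short sketch below. It uses textwrap.dedent from the standard library purely as a convenient example of a top-level function; any importable function should behave the same way.

```python
import pickle
import textwrap

# Pickle a function that lives at the top level of a standard library module
pickled_function = pickle.dumps(textwrap.dedent)

# The pickle holds the module and function names, not the function's code
print(len(pickled_function))
print(b'textwrap' in pickled_function, b'dedent' in pickled_function)

# Unpickling imports the module and looks the function up by name
restored = pickle.loads(pickled_function)
print(restored is textwrap.dedent)
```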
It is also very important to note that pickled objects can be used in malevolent ways. For instance, un-pickling data from an untrusted source can result in the execution of a malicious piece of code.
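One way to reduce this risk, sketched below, follows the "restricting globals" pattern from the Python documentation: a pickle.Unpickler subclass whose find_class() method only allows a short list of names. The SAFE_BUILTINS allow-list and the restricted_loads() helper are assumptions made for this example, and this is not a complete defense against hostile data.

```python
import builtins
import io
import os
import pickle

# Hypothetical allow-list for this sketch; adjust it to your own needs
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve names that are explicitly allowed
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data and allowed built-ins load as usual
print(restricted_loads(pickle.dumps([1, 2, range(3)])))

# A pickle that refers to os.system is rejected before anything can be called
blocked = False
try:
    restricted_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as error:
    blocked = True
    print('Blocked:', error)
```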
Pickling a Python List
The following very simple example shows the basics of using the Pickle module in Python 3:
import pickle

test_list = ['cucumber', 'pumpkin', 'carrot']

with open('test_pickle.pkl', 'wb') as pickle_out:
    pickle.dump(test_list, pickle_out)
First, we have to import the pickle module, which is done in line 1. In line 3 we define a simple, three-element list that will be pickled.
In line 5 we state that our output pickle file's name will be test_pickle.pkl. By using the wb option, we tell the program that we want to write (w) binary data (b) inside of it (because we want to create a byte stream). Note that the pkl extension is not necessary; we're using it in this tutorial because that's the extension included in Python's documentation.
In line 6 we use the pickle.dump() function to pickle our test list and store it inside the test_pickle.pkl file.
I encourage you to try and open the generated pickle file in your text editor. You'll quickly notice that a byte stream is definitely not a human-readable format.
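If you'd rather stay inside Python, the sketch below shows one way to look at the same bytes. It uses pickle.dumps() to get the byte stream in memory and the standard library's pickletools.dis() to print a listing of what the pickle contains.

```python
import pickle
import pickletools

test_list = ['cucumber', 'pumpkin', 'carrot']

# The same kind of bytes that pickle.dump() writes to the file
raw_bytes = pickle.dumps(test_list)
print(raw_bytes)

# pickletools.dis() prints a readable listing of the opcodes in the pickle
pickletools.dis(raw_bytes)
```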
Un-pickling a Python List
Now, let's unpickle the contents of the test pickle file and bring our object back to its original form.
import pickle

with open('test_pickle.pkl', 'rb') as pickle_in:
    unpickled_list = pickle.load(pickle_in)

print(unpickled_list)
As you can see, this procedure is no more complicated than pickling the object. In line 3 we open our test_pickle.pkl file again, but this time our goal is to read (r) the binary data (b) stored within it.
Next, in line 4, we use the pickle.load() function to unpickle our list and store it in the unpickled_list variable.
You can then print the contents of the list to see for yourself that it is identical to the list we pickled in the previous example. Here is the output from running the code above:
$ python unpickle.py
['cucumber', 'pumpkin', 'carrot']
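Note that "identical" here means equal in content. The unpickled list is a new object, which the following sketch tries to show. It uses io.BytesIO as an in-memory stand-in for the file, which is my own substitution and not part of the example above.

```python
import io
import pickle

test_list = ['cucumber', 'pumpkin', 'carrot']

# io.BytesIO behaves like a binary file that lives in memory
buffer = io.BytesIO()
pickle.dump(test_list, buffer)

buffer.seek(0)  # rewind before reading, like reopening the file
unpickled_list = pickle.load(buffer)

print(unpickled_list == test_list)  # compares the contents
print(unpickled_list is test_list)  # checks whether it is the same object

# Changing the copy does not touch the original list
unpickled_list.append('radish')
print(test_list)
```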
Pickling and Unpickling Custom Objects
As I mentioned before, using Pickle, you can serialize your own custom objects. Take a look at the following example:
import pickle

class Veggy():
    def __init__(self):
        self.color = ''
    def set_color(self, color):
        self.color = color

cucumber = Veggy()
cucumber.set_color('green')

with open('test_pickle.pkl', 'wb') as pickle_out:
    pickle.dump(cucumber, pickle_out)

with open('test_pickle.pkl', 'rb') as pickle_in:
    unpickled_cucumber = pickle.load(pickle_in)

print(unpickled_cucumber.color)
As you can see, this example is almost as simple as the previous one. In lines 3 through 7 we define a simple class that contains one attribute and one method that changes this attribute. In line 9 we create an instance of that class and store it in the cucumber variable, and in line 10 we set its color attribute to "green".
Then, using the exact same functions as in the previous example, we pickle and unpickle our freshly created cucumber object. Running the code above results in the following output:
$ python unpickle_custom.py
green
Remember that we can only unpickle the object in an environment where the class Veggy is either defined or imported. If we create a new script and try to unpickle the object without importing the Veggy class, we'll get an AttributeError. For example, execute the following script:
import pickle

with open('test_pickle.pkl', 'rb') as pickle_in:
    unpickled_cucumber = pickle.load(pickle_in)

print(unpickled_cucumber.color)
In the output of the script above, you will see the following error:
$ python unpickle_simple.py
Traceback (most recent call last):
  File "<pyshell#40>", line 2, in <module>
    unpickled_cucumber = pickle.load(pickle_in)
AttributeError: Can't get attribute 'Veggy' on <module '__main__' (built-in)>
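The same situation can be reproduced inside a single script, which may help when experimenting. The sketch below is only a simulation: it places the class in a made-up module called veggies (built on the fly with types.ModuleType) and then deletes the class from it to mimic a script that never defined or imported Veggy.

```python
import pickle
import sys
import types

class Veggy:
    def __init__(self):
        self.color = ''

# Pretend Veggy lives in a separate module named "veggies" (a made-up name)
veggies = types.ModuleType('veggies')
Veggy.__module__ = 'veggies'
veggies.Veggy = Veggy
sys.modules['veggies'] = veggies

cucumber = Veggy()
cucumber.color = 'green'
pickled_cucumber = pickle.dumps(cucumber)

# While the class can be found, unpickling works
print(pickle.loads(pickled_cucumber).color)

# Remove the class to mimic an environment that does not know about it
del veggies.Veggy

error_name = None
try:
    pickle.loads(pickled_cucumber)
except (AttributeError, ImportError) as error:
    error_name = type(error).__name__
    print('Could not unpickle:', error)
```

In real code, the fix is simply to import the class (or the module that defines it) before calling pickle.load().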
Conclusion
As you can see, thanks to the pickle module, serialization of Python objects is pretty simple. In our examples, we pickled a simple Python list and an instance of a custom class, but you can use the exact same method to save a large spectrum of Python data types, as long as you make sure your objects contain only other pickle-able objects.
Pickling has some disadvantages, the biggest of which might be the fact that you can only unpickle your data using Python. If you need a cross-language solution, JSON is definitely a better option. And finally, remember that pickles can carry code that you don't necessarily want to execute. As with pickled food, as long as you get your pickles from trusted sources, you should be fine.