Introduction
Python programmers make heavy use of arrays, lists, and dictionaries. Storing these data structures persistently requires serializing them to either a file or a database.
In this article, we'll take a look at how to write a list to a file, and how to read that list back into memory.
To write data to a file and read data from a file, Python offers the standard file methods write() and read(), which write a string and read the whole file content as a string, as well as writelines() and readlines() for dealing with multiple lines. Furthermore, both the pickle and the json modules provide convenient ways of serializing data sets.
Using the read() and write() Methods
For data made up of characters (strings), the basic read() and write() methods work well. Saving a list of strings line by line into the file listfile.txt can be done as follows:
# Define a list of places
places = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
with open('listfile.txt', 'w') as filehandle:
    for listitem in places:
        filehandle.write(f'{listitem}\n')
Each listitem is first extended by a line break "\n" and then written to the output file. Now we can take a look at how to read the entire list from the file listfile.txt back into memory:
# Define an empty list
places = []
# Open the file and read the content into a list
with open('listfile.txt', 'r') as filehandle:
    for line in filehandle:
        # Remove linebreak which is the last character of the string
        curr_place = line[:-1]
        # Add item to the list
        places.append(curr_place)
Keep in mind that you'll need to remove the line break from the end of each string. Python strings are sequences, so they support slicing just like lists do: line[:-1] keeps everything but the last character, which here is the line break "\n". This relies on every line ending with "\n", which holds because each item was written with one. In text mode, Python translates the platform's line endings to "\n" when reading, so the same code works on Windows as well as on UNIX/Linux systems.
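The reading loop above iterates over the file handle rather than calling read() itself. If you prefer to use read(), a minimal sketch is to load the whole file into one string and split it with str.splitlines(), which drops the line breaks and does not assume that the last line ends with "\n". The sketch rewrites listfile.txt first so that it runs on its own; places_from_file is just an illustrative name.

```python
# Write the sample list as in the example above, one item per line
places = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
with open('listfile.txt', 'w') as filehandle:
    for listitem in places:
        filehandle.write(f'{listitem}\n')

# Read the whole file into a single string with read(),
# then split it into lines; splitlines() removes the line breaks
with open('listfile.txt', 'r') as filehandle:
    places_from_file = filehandle.read().splitlines()

print(places_from_file)  # ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
```

Note that splitlines() also treats a few other characters, such as "\r" and "\x0c", as line boundaries, which is harmless for a list of place names like this one.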
Using the writelines() and readlines() Methods
As mentioned at the beginning of this article, Python also provides the two methods writelines() and readlines() to write and read multiple lines in one step, respectively. Let's write the entire list to a file on disk:
# Define a list of places
places_list = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
with open('listfile.txt', 'w') as filehandle:
    # Append a line break to every item; writelines() does not add one itself
    filehandle.writelines(f"{place}\n" for place in places_list)
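writelines() writes each string from an iterable exactly as given and does not insert any separators, so the line break has to be appended to every item. A small sketch, using io.StringIO as an in-memory stand-in for a text file opened for writing (the names no_separators and one_per_line are illustrative):

```python
import io

places_list = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']

# Without separators, the items are simply concatenated
no_separators = io.StringIO()
no_separators.writelines(places_list)
print(repr(no_separators.getvalue()))  # 'BerlinCape TownSydneyMoscow'

# Appending '\n' to each item yields one place per line
one_per_line = io.StringIO()
one_per_line.writelines(f'{place}\n' for place in places_list)
print(repr(one_per_line.getvalue()))  # 'Berlin\nCape Town\nSydney\nMoscow\n'
```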
To read the entire list back from the file on disk, we can do the following:
# Define an empty list
places = []
# Open the file and read the content into a list
with open('listfile.txt', 'r') as filehandle:
    filecontents = filehandle.readlines()
    for line in filecontents:
        # Remove linebreak which is the last character of the string
        curr_place = line[:-1]
        # Add item to the list
        places.append(curr_place)
The code above follows a more traditional approach borrowed from other programming languages. Let's write it in a more Pythonic way:
# Open the file and read the content into a list
with open('listfile.txt', 'r') as filehandle:
    places = [current_place.rstrip() for current_place in filehandle.readlines()]
Firstly, the file content is read via readlines(). Secondly, the list comprehension removes the line break from each line using the rstrip() method. Thirdly, each stripped string becomes an item of the new list places.
In comparison with the listing before, the code is much more compact, but it may be more difficult to read for beginner Python programmers.
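Called without arguments, rstrip() removes all trailing whitespace, not only the line break. That is usually what you want, but if trailing spaces or tabs in your items are significant, passing "\n" restricts the stripping to line breaks. A small comparison on two hardcoded lines as readlines() might return them (the trailing space in 'Cape Town ' is made up for illustration):

```python
# Two lines as readlines() might return them; note the trailing space
lines = ['Berlin\n', 'Cape Town \n']

# rstrip() without arguments removes all trailing whitespace
stripped_all = [line.rstrip() for line in lines]
print(stripped_all)  # ['Berlin', 'Cape Town']

# rstrip('\n') removes only the line break and keeps the space
stripped_newline = [line.rstrip('\n') for line in lines]
print(stripped_newline)  # ['Berlin', 'Cape Town ']
```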
Using the Joblib Module
The methods explained so far store the list in a human-readable way: quite literally a sequential list in a file. This is great for creating simple reports or export files, such as CSV files, that users can process further. However, if your aim is just to serialize a list into a file that can be loaded later, there's no need to store it in a human-readable format.
The third-party joblib module (installable with pip install joblib) provides one of the easiest ways to dump a Python object, and practically any object that can be pickled will do:
import joblib
places = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
# Dumps into file
joblib.dump(places, 'places.sav')
# Loads from file
places = joblib.load('places.sav')
print(places) # ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
joblib is a simple and clean way to serialize objects in an efficient format and load them later. The file extension is mostly a convention, so .sav, .data, etc. all work; the exception is that joblib automatically applies compression for extensions such as .gz or .bz2. For plain Python objects saved without compression, like this list, the standard pickle module can read the file back as well.
Using the pickle Module
As an alternative to joblib, we can use pickle from the standard library! Its dump() method stores the list efficiently as a binary data stream. Firstly, the output file listfile.data is opened for writing in binary mode ("wb"). Secondly, the list is stored in the opened file using the dump() method:
import pickle

places = ['Berlin', 'Cape Town', 'Sydney', 'Moscow']
with open('listfile.data', 'wb') as filehandle:
    # Store the data as a binary data stream
    pickle.dump(places, filehandle)
As the next step, we read the list back from the file. Firstly, the file listfile.data is opened for reading in binary mode ("rb"). Secondly, the list of places is loaded from the file using the load() method:
import pickle

with open('listfile.data', 'rb') as filehandle:
    # Read the data as a binary data stream
    places_list = pickle.load(filehandle)
The two examples here use a list of strings. However, pickle works with most Python objects, such as strings, numbers, instances of self-defined classes, and the built-in data structures Python provides.
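To see this in practice, here is a minimal sketch that pickles a list mixing numbers, strings, None, and several built-in containers, then loads it back. The file name mixed.data and the sample values are illustrative assumptions; the dump() and load() calls are the same ones used above.

```python
import pickle

# A list mixing numbers, strings, None, and several built-in containers
mixed = [42, 3.14, 'Sydney', None, ('Berlin', 'Moscow'),
         {'Cape Town', 'Sydney'}, {'visited': ['Berlin', 'Moscow']}]

# Store the list in a binary file, exactly as in the example above
with open('mixed.data', 'wb') as filehandle:
    pickle.dump(mixed, filehandle)

# Load it back from the same file
with open('mixed.data', 'rb') as filehandle:
    restored = pickle.load(filehandle)

print(restored == mixed)  # True
# pickle keeps tuples and sets as their original types
print(type(restored[4]).__name__, type(restored[5]).__name__)  # tuple set
```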
Advice: For a detailed guide on pickling objects in general, read our "How to Pickle and Unpickle Objects in Python"!
Using the JSON Format
The binary data format that pickle uses is specific to Python. To improve interoperability between different programs, the JavaScript Object Notation (JSON) provides an easy-to-use, human-readable format, and it has therefore become very popular for storing data in files and exchanging it over APIs.
The following example demonstrates how to write a list of mixed variable types to an output file using the json module. Having opened the output file for writing, the dump() method stores the basic list in the file using JSON notation:
import json

# Define list with values
basic_list = [1, "Cape Town", 4.6]

# Open output file for writing
with open('listfile.txt', 'w') as filehandle:
    json.dump(basic_list, filehandle)
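To see what ends up in the file, a quick sketch can use json.dumps(), which returns as a string the same JSON text that json.dump() writes to a file handle. The optional indent parameter spreads the output over several lines for easier reading.

```python
import json

basic_list = [1, "Cape Town", 4.6]

# json.dumps() returns the JSON text as a string instead of writing it
json_text = json.dumps(basic_list)
print(json_text)  # [1, "Cape Town", 4.6]

# indent= puts each element on its own, indented line
print(json.dumps(basic_list, indent=2))
```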
Reading the contents of the output file back into memory is as simple as writing the data. The corresponding method to dump() is named load():
import json

# Open the file for reading
with open('listfile.txt', 'r') as filehandle:
    basic_list = json.load(filehandle)
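JSON only knows a handful of types (objects, arrays, strings, numbers, booleans, and null), so a round trip does not always reproduce the original Python types. A small sketch using json.dumps() and json.loads(), the string-based counterparts of dump() and load(); the variable set_supported is just an illustrative flag:

```python
import json

# A tuple is written as a JSON array and comes back as a list
restored = json.loads(json.dumps([1, ("Berlin", "Sydney")]))
print(restored)  # [1, ['Berlin', 'Sydney']]

# Types without a JSON equivalent, such as sets, raise a TypeError
try:
    json.dumps([{"Berlin", "Sydney"}])
    set_supported = True
except TypeError as error:
    set_supported = False
    print(f"Cannot serialize: {error}")
```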
Conclusion
The methods shown above range from simply writing and reading text line by line to dumping and loading data with joblib, with pickle as a binary stream, and with JSON as text. Each of them makes it straightforward to store a list persistently and read it back into memory.