File handling is core knowledge for every professional Python programmer who works with stored data. Reading and writing files have been built-in Python features since its earliest releases. Compared to other programming languages like C or Java, it is pretty simple and only requires a few lines of code. Furthermore, no extra module has to be imported to do it properly.
Basics of Files in Python
The common ways to operate on files are open() to open a file, seek() to set the file's current position at the given offset, and close() to close the file object when you're done using it. The built-in open() function returns a file handle that represents a file object to be used to access the file for reading, writing, or appending.
When opening a file for reading, Python needs to know how the file should be opened. Two access modes are available: reading in text mode, and reading in binary mode. The respective flags are r and rb, and they have to be specified when opening a file with the built-in open() function. Text mode decodes the stored bytes into strings and interprets special characters like "CR" (carriage return) and "LF" (linefeed) as line breaks, whereas binary mode lets you read the data raw, exactly as it is stored, without further interpretation.
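The difference between the two modes can be seen in a small sketch. It assumes a throwaway sample file created with the tempfile module; the name sample_path and the file content are made up for this illustration.

```python
import os
import tempfile

# Create a small sample file with Windows-style line endings (CR LF).
# The file name and its content are assumptions made for this example.
with tempfile.NamedTemporaryFile(mode="wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\r\nsecond line\r\n")
    sample_path = tmp.name

# Text mode ('r'): bytes are decoded to str, line endings become "\n".
with open(sample_path, "r", encoding="utf-8") as text_file:
    text_content = text_file.read()

# Binary mode ('rb'): the raw bytes are returned unchanged.
with open(sample_path, "rb") as binary_file:
    raw_content = binary_file.read()

print(repr(text_content))
print(repr(raw_content))

# Remove the sample file again.
os.remove(sample_path)
```

Text mode hands back a str with normalized line breaks, while binary mode hands back a bytes object that still contains the carriage returns.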
Once you've opened a file, open() returns a file object to you. These file objects have methods like read(), readline(), write(), tell(), and seek(). While some file objects (or file-like objects) have more methods than those listed here, these are the most common. Not all file objects need to implement all of the file methods.
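As a rough illustration of how these methods interact, the following sketch reads a line, remembers the position with tell(), and later jumps around with seek(). The sample file and the variable names are assumptions made for this example.

```python
import os
import tempfile

# Create a small sample file (name and content are made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

with open(sample_path, "r") as filehandle:
    first_line = filehandle.readline()   # read the first line
    position = filehandle.tell()         # remember where the second line starts
    rest = filehandle.read()             # read everything that is left
    filehandle.seek(0)                   # jump back to the start of the file
    first_again = filehandle.readline()  # the first line is read a second time
    filehandle.seek(position)            # jump to the remembered position
    second_line = filehandle.readline()

print(repr(first_line), repr(second_line))

os.remove(sample_path)
```

In text mode the value returned by tell() should be treated as an opaque marker: pass it back to seek() unchanged rather than doing arithmetic on it.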
In this article we explain how to read files with Python through examples. The examples include reading a file line by line, in chunks (a defined number of lines at a time), and in one go. We also show you a way to read only a specific line from a file.
Reading a File Line by Line
The first example is inspired by the programming languages C and C++. It is pretty simple: open the file using the open() function, read the file line by line using the readline() method, and output each line immediately after reading it. In use here is a while loop that keeps reading from the file as long as the readline() method returns data. Once the end of file (EOF) is reached, readline() returns an empty string, the while loop stops, and the file object is closed, freeing up the resources for other programs to use.
    # define the name of the file to read from
    filename = "test.txt"

    # open the file for reading
    filehandle = open(filename, 'r')
    while True:
        # read a single line
        line = filehandle.readline()
        if not line:
            break
        print(line)

    # close the pointer to that file
    filehandle.close()
As you may have noted in Listing 1, we have explicitly opened and closed the file (lines 5 and 14, respectively). Although the Python interpreter closes open files automatically when the program ends, explicitly closing the file via close() is good programming style, and should not be forgotten.
As an improvement, Python 2.2 introduced the convenient iterator protocol, which file objects support. This allows you to simplify the readline() loop as follows:
    # define the name of the file to read from
    filename = "test.txt"

    for line in open(filename, 'r'):
        print(line)
In use here is a for loop that iterates directly over the file object. The file is opened in line 4 of Listing 2. On each iteration the for loop fetches the next line from the file, and its content is output to stdout in line 5. Python closes the file for you once the file object is no longer referenced and gets garbage collected. There is no guarantee when exactly that happens, but it saves you from dealing with file handles yourself.
Unfortunately, the code above is less explicit and relies on Python's internal garbage collection to handle closing the file. Introduced in Python 2.5, the with statement encapsulates the entire process even more: the file is opened once and closed automatically as soon as the enclosed code block is left. Listing 3 shows how to use the with statement:
    # define the name of the file to read from
    filename = "test.txt"

    with open(filename, 'r') as filehandle:
        for line in filehandle:
            print(line)
The combination of the with statement and the open() function opens the file only once (line 4). If that succeeds, the for loop is executed, and the content of each line is printed to stdout (lines 5 and 6).
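One detail worth knowing when printing lines this way: every line returned by the file iterator still ends with its newline character, so print(line) produces an extra blank line after each one. A minimal sketch of one way to handle this, assuming a made-up sample file, is to strip the trailing newline first:

```python
import os
import tempfile

# Create a small sample file (name and content are made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

clean_lines = []
with open(sample_path, "r") as filehandle:
    for line in filehandle:
        # Each line still ends with "\n"; remove it before storing the text.
        clean_lines.append(line.rstrip("\n"))

for line in clean_lines:
    print(line)

os.remove(sample_path)
```

An alternative that leaves the data untouched is print(line, end=""), which tells print() not to add a newline of its own.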
Furthermore, the with statement has a useful side effect. Internally, the Python interpreter wraps the code that reads from the file in the equivalent of a try/finally block, so the file is closed even if an exception is raised while reading. Listing 4 shows what is essentially happening internally in Python with the with code blocks:
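    # define the name of the file to read from
    filename = "test.txt"

    # open the file before entering the try block, so that
    # filehandle is always defined when finally runs
    filehandle = open(filename, 'r')
    try:
        # do something with the file
        pass
    finally:
        filehandle.close()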
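To see this guarantee in action, the following sketch raises an exception inside a with block on purpose and then checks the closed attribute of the file object. The sample file and the simulated error are assumptions made for this example.

```python
import os
import tempfile

# Create a small sample file (name and content are made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("some data\n")
    sample_path = tmp.name

try:
    with open(sample_path, "r") as filehandle:
        first_line = filehandle.readline()
        # Simulate a failure while processing the file.
        raise ValueError("simulated processing error")
except ValueError:
    pass

# The with statement has closed the file despite the exception.
closed_after_error = filehandle.closed
print(closed_after_error)

os.remove(sample_path)
```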
Reading a File as Chunks of Lines
Up to now we have processed a file line by line. For huge files it is often more practical to work on several lines at a time. To achieve that, the islice() function from the itertools module comes into play. It works as an iterator and returns a chunk of data that consists of at most n lines. At the end of the file the chunk might be shorter, and once the file is exhausted islice() yields nothing at all.
    from itertools import islice

    # define the name of the file to read from
    filename = "test.txt"

    # define the number of lines to read
    number_of_lines = 5

    with open(filename, 'r') as input_file:
        lines_cache = islice(input_file, number_of_lines)

        for current_line in lines_cache:
            print(current_line)
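The listing above reads only the first chunk. To walk through a whole file chunk by chunk, islice() can be called repeatedly until it returns nothing. The following is a minimal sketch of that idea; the helper name read_in_chunks and the sample file are hypothetical and not part of the original listing.

```python
import os
import tempfile
from itertools import islice


def read_in_chunks(path, number_of_lines):
    """Yield lists of at most number_of_lines lines until the file is exhausted."""
    with open(path, "r") as input_file:
        while True:
            # islice() continues where the previous call stopped reading.
            chunk = list(islice(input_file, number_of_lines))
            if not chunk:
                break
            yield chunk


# Create a sample file with 12 lines (made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %i\n" % number for number in range(1, 13))
    sample_path = tmp.name

chunk_sizes = [len(chunk) for chunk in read_in_chunks(sample_path, 5)]
print(chunk_sizes)

os.remove(sample_path)
```

With 12 lines and a chunk size of 5, the last chunk is shorter than the others because the file runs out.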
Reading a Specific Line from a File
Using the methods shown above we can also perform other useful actions, like reading a specific line from a file. To do this, we make use of a counter and print the appropriate line when we come to it while iterating through the file.
    # define the name of the file to read from
    filename = "test.txt"

    # define the line number
    line_number = 3

    print("line %i of %s is: " % (line_number, filename))

    with open(filename, 'r') as filehandle:
        current_line = 1
        for line in filehandle:
            if current_line == line_number:
                print(line)
                break
            current_line += 1
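The manual counter can also be replaced by the built-in enumerate() function. The following sketch wraps the same idea in a function; the name read_specific_line, the choice to return None for a missing line, and the sample file are assumptions made for this example.

```python
import os
import tempfile


def read_specific_line(path, line_number):
    """Return the line with the given 1-based number, or None if it does not exist."""
    with open(path, "r") as filehandle:
        # enumerate() does the counting; start=1 makes the numbering 1-based.
        for current_line, line in enumerate(filehandle, start=1):
            if current_line == line_number:
                return line
    return None


# Create a sample file with five lines (made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %i\n" % number for number in range(1, 6))
    sample_path = tmp.name

third_line = read_specific_line(sample_path, 3)
missing_line = read_specific_line(sample_path, 10)
print(repr(third_line), repr(missing_line))

os.remove(sample_path)
```

Like the counter version, this stops reading as soon as the requested line has been found.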
Listing 6 should be simple to understand, but it's a bit longer than the previous examples. It can be shortened using the linecache module. Listing 7 shows how to simplify the code using the getline() function. If the requested line number falls outside the range of valid lines in the file, then getline() returns an empty string instead.
    # import linecache module
    import linecache

    # define the name of the file to read from
    filename = "test.txt"

    # define line_number
    line_number = 3

    # retrieve specific line
    line = linecache.getline(filename, line_number)
    print("line %i of %s:" % (line_number, filename))
    print(line)
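The behavior for out-of-range line numbers can be checked with a short sketch. It assumes a made-up sample file; note that linecache keeps the file's lines in an internal cache, which clearcache() empties again.

```python
import linecache
import os
import tempfile

# Create a sample file with three lines (made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    sample_path = tmp.name

existing_line = linecache.getline(sample_path, 2)   # a valid line number
missing_line = linecache.getline(sample_path, 10)   # beyond the end of the file

print(repr(existing_line))
print(repr(missing_line))

# Drop the cached lines before deleting the file.
linecache.clearcache()
os.remove(sample_path)
```

Because getline() never raises an error for a missing line, comparing the result against the empty string is the way to detect that case.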
Reading the Entire File at Once
Last but not least, we will have a look at a very different case from the previous examples: reading an entire file in one go. Keep in mind that the whole file has to fit into your computer's memory, which is true in most cases but not for very large files. Listing 8 uses a combination of the with statement and the read() method. In this case we'll use read() to load the file content as a single string.
    # define the name of the file to read from
    filename = "test.txt"

    with open(filename, 'r') as filehandle:
        filecontent = filehandle.read()
        print(filecontent)
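If a file might be too large to hold in memory, read() also accepts a size argument that limits how much is read per call. The following sketch counts the characters of a file in fixed-size blocks; the function name count_characters, the block size, and the sample file are assumptions made for this example.

```python
import os
import tempfile


def count_characters(path, block_size=1024):
    """Count the characters in a text file without loading it all at once."""
    total = 0
    with open(path, "r") as filehandle:
        while True:
            # read(block_size) returns at most block_size characters,
            # and an empty string once the end of the file is reached.
            block = filehandle.read(block_size)
            if not block:
                break
            total += len(block)
    return total


# Create a sample file with 1000 short lines (made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("abc\n" * 1000)
    sample_path = tmp.name

character_count = count_characters(sample_path)
print(character_count)

os.remove(sample_path)
```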
Python also offers the readlines() method, which is similar to the readline() method from the first example. In contrast to read(), the file content is stored in a list, where each line of the content is an item. Listing 9 shows how to access that data:
    # define the name of the file to read from
    filename = "test.txt"

    with open(filename, 'r') as filehandle:
        filecontent = filehandle.readlines()
        for line in filecontent:
            print(line)
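readlines() reads content from the file until it hits EOF. Keep in mind that you can also limit the amount of content read by providing the sizehint parameter (called hint in Python 3), which is the approximate number of bytes or characters to read: whole lines are returned, and no further lines are read once the total size of the lines read so far exceeds that value.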
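A short sketch of the size limit in practice, assuming a made-up sample file with ten lines. The exact number of lines returned depends on the line lengths, so the example only prints the counts.

```python
import os
import tempfile

# Create a sample file with ten lines (made up for this example).
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %i\n" % number for number in range(1, 11))
    sample_path = tmp.name

with open(sample_path, "r") as filehandle:
    # Read only roughly the first 10 characters' worth of complete lines.
    first_batch = filehandle.readlines(10)
    # A second call without a limit returns all remaining lines.
    remaining = filehandle.readlines()

print(len(first_batch), len(remaining))

os.remove(sample_path)
```

The limit never splits a line: the first call returns complete lines only, and the second call continues right after them.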
As usual, there is more than one way to read the contents of a file. In terms of speed, all of them are more or less in the same category. Which solution works best for you depends on your specific use case. We think it is quite helpful to see what is possible and then to choose the solution that suits best.
While Python greatly simplifies the process of reading files, it can still become tricky at times, in which case we recommend you take a look at the official Python documentation for more info.
The author would like to thank Zoleka Hatitongwe for her support while preparing the article.