Reading Files with Python

File handling is core knowledge for every professional Python programmer who works with stored data. Reading and writing files have been built-in Python features since its earliest releases. Compared to other programming languages like C or Java, file handling in Python is pretty simple and only requires a few lines of code. Furthermore, no extra module has to be loaded to do it properly.

Basics of Files in Python

The most common tools for working with files are the built-in open() function to open a file, the seek() method to set the file's current position to a given offset, and the close() method to close the file object when you're done using it. The open() function returns a file object (often called a file handle) that is used to access the file for reading, writing, or appending.

When opening a file for reading, Python needs to know how the file should be opened. Two access modes are available for reading: text mode and binary mode. The respective flags are r and rb, and they are passed to the built-in open() function (r is the default). In text mode, Python decodes the stored bytes into strings and translates line endings such as "CR" (carriage return) and "LF" (linefeed) into a single newline character, whereas binary mode lets you read the data in raw form, exactly as it is stored and without further interpretation.
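The difference between the two modes can be illustrated with a minimal sketch. The sample file and its content are assumptions made for this illustration; it is created in a temporary location so that the example runs on its own.

```python
import os
import tempfile

# create a small sample file with Windows-style line endings (hypothetical content)
with tempfile.NamedTemporaryFile(mode='wb', suffix='.txt', delete=False) as sample:
    sample.write(b"first line\r\nsecond line\r\n")
    filename = sample.name

# text mode: bytes are decoded to str and line endings are translated to "\n"
with open(filename, 'r', encoding='ascii') as filehandle:
    text_content = filehandle.read()

# binary mode: the bytes are returned exactly as they are stored
with open(filename, 'rb') as filehandle:
    binary_content = filehandle.read()

print(repr(text_content))
print(repr(binary_content))

# remove the sample file again
os.remove(filename)
```

In text mode the result is a str, in binary mode a bytes object that still contains the original "\r\n" sequences.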

Once you've opened a file, the open() function returns a file object to you. These file objects have methods like read(), readline(), write(), tell(), and seek(). While some file objects (or file-like objects) have more methods than those listed here, these are the most common. Not all file objects need to implement all of the file methods.
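As a rough sketch of how some of these methods work together, the following example reads a line, remembers the position with tell(), and later jumps back with seek(). The sample file and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

# create a small sample file (hypothetical content)
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    sample.write("alpha\nbeta\ngamma\n")
    filename = sample.name

with open(filename, 'r') as filehandle:
    first_line = filehandle.readline()   # read the first line
    position = filehandle.tell()         # remember the position after line 1
    rest = filehandle.read()             # read everything that is left

    filehandle.seek(0)                   # jump back to the start of the file
    first_again = filehandle.readline()

    filehandle.seek(position)            # jump to the remembered position
    second_line = filehandle.readline()

print(repr(first_line), repr(second_line))

# remove the sample file again
os.remove(filename)
```

In text mode the value returned by tell() should be treated as an opaque marker: pass it back to seek() unchanged rather than calculating with it.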

Examples

In this article we explain how to read files with Python through examples. The examples include reading a file line by line, in chunks (a defined number of lines at a time), and in one go. We will also show you how to read just one specific line from a file.

Reading a File Line by Line

The first example is inspired by the programming languages C and C++. It is pretty simple: open the file using the open() function, read the file line by line using the readline() method, and output each line immediately after reading it. A while loop keeps reading from the file as long as the readline() method returns data. When the end of file (EOF) is reached, readline() returns an empty string, the while loop stops, and the file object is closed, freeing up the resources for other programs to use.

# define the name of the file to read from
filename = "test.txt"

# open the file for reading
filehandle = open(filename, 'r')  
while True:  
    # read a single line
    line = filehandle.readline()
    if not line:
        break
    print(line)

# close the pointer to that file
filehandle.close()  

Listing 1

As you may have noted in Listing 1 we have explicitly opened and closed the file (lines 5 and 14, respectively). Although the Python interpreter closes the opened files automatically at the end of the execution of the Python program, explicitly closing the file via close() is good programming style, and should not be forgotten.

As an improvement, file objects support the convenient iterator protocol, which was introduced in Python 2.2. This allows you to simplify the readline loop as follows:

# define the name of the file to read from
filename = "test.txt"

for line in open(filename, 'r'):  
    print(line)

Listing 2

In use here is a for loop that iterates directly over the file object. The file is opened in line 4 of Listing 2. On each pass the loop fetches the next line from the file, and its content is output to stdout in line 5. Python closes the file for you once the file object is no longer referenced. This leaves the exact moment of closing up to the interpreter, but you no longer have to deal with file handles yourself.

Unfortunately, the code above is less explicit and relies on Python's internal garbage collection to handle closing the file. Introduced in Python 2.5, the with statement encapsulates the entire process even more: the file is opened once when the code block is entered and closed automatically when the block is left. Listing 3 shows how to use the with statement.

# define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:  
    for line in filehandle:
        print(line)

Listing 3

The combination of the with statement and the open() function opens the file only once (line 4). If this succeeds, the for loop is executed and the content of each line is printed on stdout (lines 5 and 6).
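One detail worth knowing when looping over a file: each line keeps its trailing newline character, so print() adds a second line break and the output appears double-spaced. A minimal sketch of one way to deal with this, assuming a small sample file created only for this illustration, is to strip the newline before printing:

```python
import os
import tempfile

# create a small sample file (hypothetical content, last line has no newline)
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    sample.write("first\nsecond\nthird")
    filename = sample.name

cleaned_lines = []
with open(filename, 'r') as filehandle:
    for line in filehandle:
        # remove only the trailing newline and keep any other whitespace
        cleaned_lines.append(line.rstrip('\n'))

for line in cleaned_lines:
    print(line)

# remove the sample file again
os.remove(filename)
```

An alternative is to leave the line untouched and call print(line, end='') instead.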

Furthermore, the with statement has a useful property: the file is closed even if an exception occurs inside the code block. Conceptually, it behaves like a try-finally block that encapsulates reading from the file. Listing 4 shows roughly what happens behind the scenes of a with code block:

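# open the file before entering the try block, so that close()
# is only called on a file that was opened successfully
filehandle = open(filename, 'r')
try:
    # do something
    pass
finally:
    filehandle.close()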

Listing 4

Reading a File as Chunks of Lines

Up to now we have processed a file line by line. For huge files it can be more convenient to handle multiple lines at a time. To achieve that, the islice() function from the itertools module comes into play. It works as an iterator and yields a chunk of data that consists of at most n lines. At the end of the file the chunk might be shorter, and once the file is exhausted islice() yields nothing at all. Listing 5 reads the first chunk of five lines from the file.

from itertools import islice

# define the name of the file to read from
filename = "test.txt"

# define the number of lines to read
number_of_lines = 5

with open(filename, 'r') as input_file:  
    lines_cache = islice(input_file, number_of_lines)

    for current_line in lines_cache:
        print(current_line)

Listing 5
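Listing 5 fetches a single chunk only. To walk through a whole file chunk by chunk, islice() can be called repeatedly until it returns nothing. The following is a minimal sketch; the helper read_in_chunks() and the sample file are hypothetical names introduced for this illustration and are not part of itertools.

```python
from itertools import islice
import os
import tempfile


def read_in_chunks(filehandle, number_of_lines):
    """Yield lists of up to number_of_lines lines until the file is exhausted."""
    while True:
        chunk = list(islice(filehandle, number_of_lines))
        if not chunk:
            # an empty list means there is nothing left to read
            break
        yield chunk


# create a sample file with twelve lines (hypothetical content)
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    for number in range(1, 13):
        sample.write("line %i\n" % number)
    filename = sample.name

chunk_sizes = []
with open(filename, 'r') as input_file:
    for chunk in read_in_chunks(input_file, 5):
        chunk_sizes.append(len(chunk))

print(chunk_sizes)

# remove the sample file again
os.remove(filename)
```

With twelve lines and a chunk size of five, the last chunk is shorter than the others.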

Reading a Specific Line from a File

Using the methods shown above we can also perform other useful actions, like reading a specific line from a file. To do this, we make use of a counter and print the appropriate line when we come to it while iterating through the file.

# define the name of the file to read from
filename = "test.txt"

# define the line number
line_number = 3

print("line %i of %s is: " % (line_number, filename))

with open(filename, 'r') as filehandle:
    # start counting at the first line of the file
    current_line = 1
    for line in filehandle:
        if current_line == line_number:
            print(line)
            break
        current_line += 1

Listing 6

Listing 6 should be simple to understand, but it's a bit longer than the previous examples. It can be shortened using the linecache module. Listing 7 shows how to simplify the code using the getline() function. If the requested line number falls outside the range of valid lines in the file, getline() returns an empty string instead.

# import linecache module
import linecache

# define the name of the file to read from
filename = "test.txt"

# define line_number
line_number = 3

# retrieve specific line
line = linecache.getline(filename, line_number)  
print("line %i of %s:" % (line_number, filename))
print(line)

Listing 7
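The behaviour for line numbers outside the valid range can be checked with a short sketch. The sample file is an assumption made for this illustration; clearcache() is called at the end because linecache keeps the lines of every file it has read in memory.

```python
import linecache
import os
import tempfile

# create a sample file with three lines (hypothetical content)
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    sample.write("first\nsecond\nthird\n")
    filename = sample.name

# line numbers start at 1
existing_line = linecache.getline(filename, 2)

# a line number beyond the end of the file yields an empty string
missing_line = linecache.getline(filename, 10)

if missing_line == "":
    print("line 10 does not exist in %s" % filename)

# free the cached file content and remove the sample file again
linecache.clearcache()
os.remove(filename)
```

Because an empty string is returned instead of an exception being raised, it is worth checking the result before using it.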

Reading the Entire File at Once

Last but not least, we will have a look at a very different case from the previous examples: reading an entire file in one go. Keep in mind that the whole content is loaded into memory, so you need enough free memory to hold the entire file. Listing 8 uses a combination of the with statement and the read() method. In this case read() returns the complete file content as a single string.

# define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:  
    filecontent = filehandle.read()
    print(filecontent)

Listing 8
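If a file is too large to fit into memory comfortably, read() also accepts a size argument that limits how much is read per call. The following minimal sketch assumes a tiny sample file and an arbitrarily chosen chunk size of eight characters, both picked only for this illustration.

```python
import os
import tempfile

# create a small sample file (hypothetical content)
content = "0123456789" * 2
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    sample.write(content)
    filename = sample.name

# number of characters to request per read() call
chunk_size = 8

chunks = []
with open(filename, 'r') as filehandle:
    while True:
        chunk = filehandle.read(chunk_size)
        if not chunk:
            # an empty string signals the end of the file
            break
        chunks.append(chunk)

print(chunks)

# remove the sample file again
os.remove(filename)
```

In practice each chunk would be processed right away rather than collected in a list, so that only one chunk is held in memory at a time.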

Python also offers the readlines() method, which is related to the readline() method from the first example but reads all remaining lines at once. In contrast to read(), it returns the file content as a list in which each line of the content is one item. Listing 9 shows how to access that data:

# define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:  
    filecontent = filehandle.readlines()
    for line in filecontent:
        print(line)

Listing 9

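While readlines() reads content from the file until it hits EOF, keep in mind that you can also limit the amount of content read by providing the optional size hint parameter (named sizehint in Python 2 and hint in Python 3). The value is an approximate limit rather than an exact one: readlines() always returns complete lines and stops reading further lines once the total size of the lines read so far exceeds the hint.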
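A minimal sketch of the size hint in Python 3 follows. The sample file and the hint value of 10 are assumptions chosen for this illustration; note that the hint is passed as a positional argument.

```python
import os
import tempfile

# create a sample file with five short lines (hypothetical content)
expected_lines = ["line %i\n" % number for number in range(1, 6)]
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as sample:
    sample.writelines(expected_lines)
    filename = sample.name

with open(filename, 'r') as filehandle:
    # stop reading once the lines read so far exceed roughly 10 characters
    first_lines = filehandle.readlines(10)
    # without a hint, the rest of the file is read
    remaining_lines = filehandle.readlines()

print(len(first_lines), len(remaining_lines))

# remove the sample file again
os.remove(filename)
```

The first call returns only a few complete lines, and the second call continues from where the first one stopped.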

Conclusion

As usual, there is more than one way to read the contents of a file. In terms of speed, all of them are more or less in the same category. Which solution works best for you depends on your specific use case. We think it is quite helpful to see what is possible and then to choose the solution that suits best.

While Python greatly simplifies the process of reading files, it can still become tricky at times, in which case we recommend you take a look at the official Python documentation for more info.

Acknowledgements

The author would like to thank Zoleka Hatitongwe for her support while preparing the article.

Berlin -- Genève -- Cape Town
IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).