Reading Files with Python

Introduction

Working with stored data makes file handling core knowledge for every professional Python programmer. Reading from and writing to files have been built-in Python features since its earliest release. Compared to other programming languages like C or Java, file handling in Python is simple and requires only a few lines of code. Furthermore, no extra module has to be loaded for the basic operations.

In this article we explain how to read files with Python through examples. These include reading a file line-by-line, as a chunk (a defined number of lines at a time), and in one go. We also show you how to read only a specific line from a file.

Basics of Files in Python

The most common operations on files are open() to open a file, seek() to move the file's current position to a given offset, and close() to close the file object when you're done using it. The built-in open() function returns a file object (often called a file handle) that is used to access the file for reading, writing, or appending.

When opening a file for reading, Python needs to know how the file should be opened. Two access modes are available for reading: text mode and binary mode. The respective flags are 'r' and 'rb', and they are passed as the mode argument of the built-in open() function ('r' is the default). Text mode decodes the data into strings and interprets special characters like "CR" (carriage return) and "LF" (linefeed) as line breaks, whereas binary mode returns the raw data exactly as it is stored, without further interpretation.
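The difference between the two modes can be sketched as follows. This is a minimal illustration, not part of the original examples: the temporary file, its name, and its content are assumptions made up so that the snippet is self-contained.

```python
import os
import tempfile

# Create a small sample file with Windows-style (CRLF) line endings.
# The path and the content are hypothetical, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

# Text mode: bytes are decoded to str and line endings are translated to "\n"
with open(path, "r", encoding="utf-8") as f:
    text_data = f.read()

# Binary mode: the raw bytes are returned unchanged
with open(path, "rb") as f:
    raw_data = f.read()

print(repr(text_data))  # 'first\nsecond\n'
print(repr(raw_data))   # b'first\r\nsecond\r\n'
```

Text mode is usually what you want for human-readable files, while binary mode suits images, archives, or any data where every byte matters.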

Once you've opened a file, the open() function will return a file object to you. These file objects have methods like read(), readline(), write(), tell(), and seek(). While some file objects (or file-like objects) have more methods than those listed here, these are the most common. Not all file objects need to implement all of the file methods.
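As a short sketch of how these methods interact, the snippet below reads a line, asks for the current position with tell(), and jumps back to the start with seek(). The temporary file and its content are assumptions for illustration; binary mode is used here so that the positions are plain byte offsets.

```python
import os
import tempfile

# Hypothetical sample file, written with explicit "\n" line endings
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "rb") as filehandle:
    first = filehandle.readline()   # reads b"alpha\n"
    position = filehandle.tell()    # byte offset just after the first line
    filehandle.seek(0)              # jump back to the beginning of the file
    again = filehandle.readline()   # reads the first line a second time

print(position)        # 6
print(first == again)  # True
```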

Reading a File Line-by-Line

The first example is inspired by the two programming languages - C and C++. It is probably the most intuitive approach - open the file using the open() function, read the file line-by-line using the readline() method, and output the line immediately after reading.

In use here is a while loop that continuously reads from the file as long as the readline() method keeps returning data. In case the end of file (EOF) is reached the while loop stops and the file object is closed, freeing up the resources for other programs to use:

# Define the name of the file to read from
filename = "test.txt"

# Open the file for reading
filehandle = open(filename, 'r')

while True:
    # read a single line
    line = filehandle.readline()
    if not line:
        break
    print(line)

# Close the pointer to that file
filehandle.close()

As you may have noted, we explicitly opened and closed the file in this example. Although the Python interpreter closes open files automatically when the program ends, explicitly closing the file via close() is good programming style and should not be forgotten.

As an improvement, the convenient iterator protocol was introduced in Python 2.3. This allows you to simplify the readline loop:

# Define the name of the file to read from
filename = "test.txt"

for line in open(filename, 'r'):
    print(line)

In use here is a for loop that iterates directly over the file object. On each iteration the next line is read from the file and its content is output to stdout. Python takes care of closing the file for you once the file object falls out of scope, so you do not have to deal with file handles anymore.

Unfortunately, the code above is less explicit and relies on Python's internal garbage collection to handle closing the file.

Introduced in Python 2.5, the with statement encapsulates the entire process even more. It opens the file once and makes sure it is closed again when the scoped code block is left:

# Define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:
    for line in filehandle:
        print(line)

The combination of the with statement and the open() function opens the file only once. If that succeeds, the for loop is executed and the content of each line is printed to stdout.

Furthermore, the with statement guarantees cleanup: the file is closed even if an exception is raised while reading from it. The with block behaves much like a try-finally block wrapped around the file access. The following example shows what is essentially happening internally:

# Open the file first, so that close() is only called on a valid file object
filehandle = open(filename, 'r')
try:
    # Do something...
    pass
finally:
    filehandle.close()
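To see this guarantee in action, the following sketch raises an exception inside a with block and then checks the closed attribute of the file object. The temporary file and the simulated error are assumptions made up for this illustration.

```python
import os
import tempfile

# Hypothetical sample file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("one line\n")

try:
    with open(path, "r", encoding="utf-8") as filehandle:
        first_line = filehandle.readline()
        # Simulate a failure in the middle of processing the file
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with statement closed the file although an exception was raised
closed_after_error = filehandle.closed
print(closed_after_error)  # True
```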

Reading a File as Chunks of Lines

Up to now, we have processed a file line by line. For huge files it can be preferable to handle several lines at a time. To achieve that, the islice() function from the itertools module comes into play. It returns an iterator that yields at most n lines from the file. Near the end of the file a chunk may be shorter, and once the file is exhausted the iterator yields nothing at all. The following example reads the first chunk of five lines:

from itertools import islice

# Define the name of the file to read from
filename = "test.txt"

# Define the number of lines to read
number_of_lines = 5

with open(filename, 'r') as input_file:
    lines_cache = islice(input_file, number_of_lines)

    for current_line in lines_cache:
        print(current_line)
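The example above reads a single chunk. To walk through a whole file chunk by chunk, islice() can be called repeatedly until it yields nothing. The following is a minimal sketch under stated assumptions: the temporary twelve-line file and the chunk_sizes list are made up for illustration.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample file with 12 lines
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.writelines(f"line {i}\n" for i in range(1, 13))

number_of_lines = 5
chunk_sizes = []

with open(path, "r", encoding="utf-8") as input_file:
    while True:
        # Materialize the next chunk; an empty list means the file is exhausted
        chunk = list(islice(input_file, number_of_lines))
        if not chunk:
            break
        chunk_sizes.append(len(chunk))
        for current_line in chunk:
            print(current_line, end="")

print(chunk_sizes)  # [5, 5, 2]
```

Because the file object keeps its position between calls, each islice() call continues where the previous one stopped.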

Reading a Specific Line from a File

Using the methods shown above we can also perform other useful actions, like reading a specific line from a file. To do this, we make use of a counter and print the appropriate line when we come to it while iterating through the file:

# Define the name of the file to read from
filename = "test.txt"

# Define the line number
line_number = 3

print(f"line {line_number} of {filename} is: ")

with open(filename, 'r') as filehandle:
    current_line = 1
    for line in filehandle:
        if current_line == line_number:
            print(line)
            break
        current_line += 1

This should be simple to understand, but it's a bit longer than the previous examples. It can be shortened using the linecache module.

The following example shows how to simplify the code using the getline() function of the linecache module. If the requested line number falls outside the range of valid lines in the file, getline() returns an empty string instead:

import linecache

# Define the name of the file to read from
filename = "test.txt"

# Define line_number
line_number = 3

# Retrieve specific line
line = linecache.getline(filename, line_number)

print(f"line {line_number} of {filename}:")
print(line)
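The out-of-range behavior can be checked with a short sketch. The temporary three-line file is an assumption for illustration; note that the returned line keeps its trailing newline, and that linecache caches file contents, which clearcache() releases again.

```python
import linecache
import os
import tempfile

# Hypothetical sample file with three lines
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

existing = linecache.getline(path, 3)   # a valid line number
missing = linecache.getline(path, 99)   # beyond the end of the file

print(repr(existing))  # 'third\n'
print(repr(missing))   # ''

# Release the cached file content once it is no longer needed
linecache.clearcache()
```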

Reading the Entire File at Once

Last but not least, we will have a look at a very different case from the previous examples: reading an entire file in one go.

Keep in mind that in most cases you will have enough memory to read the entire file, since text does not take up much space, but be wary of large files. The following example uses a combination of the with statement and the read() method. In this case, read() returns the entire file content as a single string:

# Define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:
    filecontent = filehandle.read()
    print(filecontent)
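For files that are too large to load comfortably, read() also accepts an optional size argument that limits how much is returned per call. The following sketch reads a file in fixed-size pieces. The temporary file and the tiny chunk_size are assumptions chosen so the result is easy to follow; a real program would likely use a much larger size.

```python
import os
import tempfile

# Hypothetical sample file containing 30 characters
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abcdefghij" * 3)

chunk_size = 8  # deliberately tiny, for illustration only
pieces = []

with open(path, "r", encoding="utf-8") as filehandle:
    while True:
        # Read at most chunk_size characters; "" signals the end of the file
        piece = filehandle.read(chunk_size)
        if not piece:
            break
        pieces.append(piece)

lengths = [len(piece) for piece in pieces]
print(lengths)  # [8, 8, 8, 6]
```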

Python also offers the readlines() method, which is similar to the readline() method from the first example. In contrast to read(), the file content is stored in a list, where each line of the content is an item:

# Define the name of the file to read from
filename = "test.txt"

with open(filename, 'r') as filehandle:
    filecontent = filehandle.readlines()
    for line in filecontent:
        print(line)

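While readlines() reads content from the file until it hits EOF, keep in mind that you can also limit the amount of content read by providing the optional hint parameter (called sizehint in Python 2). It is an approximate size limit rather than an exact count: only complete lines are returned, and no further lines are read once the total size of the lines read so far exceeds the hint.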
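A minimal sketch of this size limit follows. The temporary ten-line file and the hint value of 10 are assumptions for illustration; how many lines come back for a given hint depends on the line lengths, so the snippet only prints the counts instead of promising a specific number.

```python
import os
import tempfile

# Hypothetical sample file with 10 short lines
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.writelines(f"line {i}\n" for i in range(10))

with open(path, "r", encoding="utf-8") as filehandle:
    # Stop reading once roughly 10 characters' worth of lines are collected
    some_lines = filehandle.readlines(10)
    # Without a hint, the remaining lines up to EOF are returned
    remaining_lines = filehandle.readlines()

print(len(some_lines), len(remaining_lines))
```

Note that the lines returned with a hint are always complete, so the limit may be slightly exceeded.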

Conclusion

As usual, there is more than one way to read the contents of a file. In terms of speed, all of them are more or less in the same category. Which solution works best for you depends on your specific use case. We think it is quite helpful to see what is possible and then to choose the solution that suits best.

While Python greatly simplifies the process of reading files, it can still become tricky at times, in which case we recommend you take a look at the official Python documentation for more info.

Last Updated: November 17th, 2023

Author: Frank Hofmann

IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).

© 2013-2024 Stack Abuse. All rights reserved.