How to Unzip a .gz File Using Python

Introduction

Working with compressed files is a common occurrence in programming, and of the various types of compressed files, .gz files are pretty popular in the Unix and Linux world.

In this Byte, we'll explore a few ways to unzip a .gz file using Python. We'll get into Python's gzip module, read the contents of a .gz file, and more.

What are .gz files?

.gz is the file extension for Gzip files (GNU zip), which is a file compression and decompression tool created by the GNU project. It's often used in Unix and Linux systems for compressing things like log files, data backups, etc.

In Python, we can use the built-in gzip module to read and write .gz files. This module provides a straightforward and easy-to-use interface for handling .gz files.

Reading Contents of a .gz File

To read the contents of a .gz file in Python, we can use the gzip.open() function. This opens the .gz file in binary or text mode, just like the built-in Python open() function. Here's a simple example:

import gzip

with gzip.open('file.txt.gz', 'rt') as f:
    file_content = f.read()

print(file_content)

In the above code, file.txt.gz is the .gz file we want to read, and 'rt' is the mode in which we want to open the file. The 'r' stands for read mode and 't' stands for text mode. The output will be the content of the .gz file.

Using Python's gzip Module

The gzip module in Python provides the GzipFile class which is actually modeled after Python’s File object. This class has methods and properties similar to the ones available in the File object, which makes it easier to work with .gz files.

Here's an example of how to use the gzip module to read a .gz file:

import gzip

with gzip.GzipFile('file.txt.gz', 'r') as f:
    file_content = f.read()

print(file_content)

In the above code, we're using the GzipFile class to open the .gz file. The 'r' mode indicates that we want to open the file for reading. The output will be the content of the .gz file.

Note: While both gzip.open() and gzip.GzipFile() can be used to read .gz files, gzip.open() is more convenient when you want to read the file as text, while gzip.GzipFile() is better for reading binary files.

Using the sh Module for Unzipping

The sh module is a full-fledged subprocess interface for Python that allows you to call any program as if it were a function. To unzip a .gz file, we can use the gunzip command through the sh module. Let's see how to do it.

Get free courses, guided projects, and more

No spam ever. Unsubscribe anytime. Read our Privacy Policy.

First, you need to install the sh module if you haven't done it yet. You can install it using pip:

$ pip install sh

Once installed, you can use it to unzip a .gz file:

from sh import gunzip

gunzip("/path/to/yourfile.gz")

This script will unzip the yourfile.gz file in the same directory. If you want to specify a different output directory, you can do so by passing it as a second argument to the gunzip function:

from sh import gunzip

gunzip("/path/to/yourfile.gz", "-c > /path/to/output/yourfile")

Using the tarfile Module

The tarfile module makes it possible to read and write tar archives, including those using gzip compression. Here is an example:

import tarfile

with tarfile.open("/path/to/yourfile.tar.gz", "r:gz") as tar:
    tar.extractall(path="/path/to/output")

Note: The tarfile module is only applicable for .tar.gz files. If you're dealing with a simple .gz file, you should use the gzip module or the sh module with the gunzip command.

Conclusion

In this Byte, we have seen how to unzip .gz files in Python using the sh module, the gzip module, and the tarfile module. Each method has its own advantages and use cases, so choose the one that best suits your needs.

Last Updated: August 14th, 2023
Was this helpful?

© 2013-2024 Stack Abuse. All rights reserved.

AboutDisclosurePrivacyTerms