How to Unzip a .gz File Using Python
Introduction
Working with compressed files is a common occurrence in programming, and of the various types of compressed files, .gz files are pretty popular in the Unix and Linux world.
In this Byte, we'll explore a few ways to unzip a .gz file using Python. We'll get into Python's gzip
module, read the contents of a .gz file, and more.
What are .gz files?
.gz is the file extension for Gzip files (GNU zip), which is a file compression and decompression tool created by the GNU project. It's often used in Unix and Linux systems for compressing things like log files, data backups, etc.
In Python, we can use the built-in gzip
module to read and write .gz files. This module provides a straightforward and easy-to-use interface for handling .gz files.
Reading Contents of a .gz File
To read the contents of a .gz file in Python, we can use the gzip.open()
function. This opens the .gz file in binary or text mode, just like the built-in Python open()
function. Here's a simple example:
import gzip
with gzip.open('file.txt.gz', 'rt') as f:
file_content = f.read()
print(file_content)
In the above code, file.txt.gz
is the .gz file we want to read, and 'rt'
is the mode in which we want to open the file. The 'r' stands for read mode and 't' stands for text mode. The output will be the content of the .gz file.
Using Python's gzip Module
The gzip
module in Python provides the GzipFile
class which is actually modeled after Python’s File
object. This class has methods and properties similar to the ones available in the File
object, which makes it easier to work with .gz files.
Here's an example of how to use the gzip
module to read a .gz file:
import gzip
with gzip.GzipFile('file.txt.gz', 'r') as f:
file_content = f.read()
print(file_content)
In the above code, we're using the GzipFile
class to open the .gz file. The 'r' mode indicates that we want to open the file for reading. The output will be the content of the .gz file.
Note: While both gzip.open()
and gzip.GzipFile()
can be used to read .gz files, gzip.open()
is more convenient when you want to read the file as text, while gzip.GzipFile()
is better for reading binary files.
Using the sh Module for Unzipping
The sh
module is a full-fledged subprocess interface for Python that allows you to call any program as if it were a function. To unzip a .gz file, we can use the gunzip
command through the sh
module. Let's see how to do it.
First, you need to install the sh
module if you haven't done it yet. You can install it using pip:
$ pip install sh
Once installed, you can use it to unzip a .gz file:
from sh import gunzip
gunzip("/path/to/yourfile.gz")
This script will unzip the yourfile.gz
file in the same directory. If you want to specify a different output directory, you can do so by passing it as a second argument to the gunzip
function:
from sh import gunzip
gunzip("/path/to/yourfile.gz", "-c > /path/to/output/yourfile")
Using the tarfile Module
The tarfile
module makes it possible to read and write tar archives, including those using gzip compression. Here is an example:
import tarfile
with tarfile.open("/path/to/yourfile.tar.gz", "r:gz") as tar:
tar.extractall(path="/path/to/output")
Note: The tarfile
module is only applicable for .tar.gz files. If you're dealing with a simple .gz file, you should use the gzip
module or the sh
module with the gunzip
command.
Conclusion
In this Byte, we have seen how to unzip .gz files in Python using the sh
module, the gzip
module, and the tarfile
module. Each method has its own advantages and use cases, so choose the one that best suits your needs.