When dealing with large amounts of data or files, you might find yourself needing to compress files into a more manageable format. One of the best ways to do this is by creating a zip archive.
In this article, we'll be exploring how you can create a zip archive of a directory using Python. Whether you're looking to save space, simplify file sharing, or just keep things organized, Python's
zipfile module provides a to do this.
Creating a Zip Archive with Python
Python's standard library comes with a module named
zipfile that provides methods for creating, reading, writing, appending, and listing contents of a ZIP file. This module is useful for creating a zip archive of a directory. We'll start by importing the
import zipfile import os
Now, let's create a function that will zip a directory:
def zip_directory(directory_path, zip_path): with zipfile.ZipFile(zip_path, 'w') as zipf: for root, dirs, files in os.walk(directory_path): for file in files: zipf.write(os.path.join(root, file), os.path.relpath(os.path.join(root, file), os.path.join(directory_path, '..')))
In this function, we first open a new zip file in write mode. Then, we walk through the directory we want to zip. For each file in the directory, we use the
write() method to add it to the zip file. The
os.path.relpath() function is used so that we store the relative path of the file in the zip file, instead of the absolute path.
Let's test our function:
After running this code, you should see a new file named
archive.zip in your current directory. This zip file contains all the files from
Note: Be careful when specifying the paths. If the
zip_path file already exists, it will be overwritten.
zipfile module makes it easy to create a zip archive of a directory. With just a few lines of code, you can compress and organize your files.
In the following sections, we'll dive deeper into handling nested directories, large directories, and error handling. This may seem a bit backwards, but the above function is likely what most people came here for, so I wanted to show it first.
Using the zipfile Module
In Python, the
zipfile module is the best tool for working with zip archives. It provides functions to read, write, append, and extract data from zip files. The module is part of Python's standard library, so there's no need to install anything extra.
Here's a simple example of how you can create a new zip file and add a file to it:
import zipfile # Create a new zip file zip_file = zipfile.ZipFile('example.zip', 'w') # Add a file to the zip file zip_file.write('test.txt') # Close the zip file zip_file.close()
In this code, we first import the
zipfile module. Then, we create a new zip file named 'example.zip' in write mode ('w'). We add a file named 'test.txt' to the zip file using the
write() method. Finally, we close the zip file using the
Creating a Zip Archive of a Directory
Creating a zip archive of a directory involves a bit more work, but it's still fairly easy with the
zipfile module. You need to walk through the directory structure, adding each file to the zip archive.
import os import zipfile def zip_directory(folder_path, zip_file): for folder_name, subfolders, filenames in os.walk(folder_path): for filename in filenames: # Create complete filepath of file in directory file_path = os.path.join(folder_name, filename) # Add file to zip zip_file.write(file_path) # Create a new zip file zip_file = zipfile.ZipFile('example_directory.zip', 'w') # Zip the directory zip_directory('/path/to/directory', zip_file) # Close the zip file zip_file.close()
We first define a function
zip_directory() that takes a folder path and a
ZipFile object. It uses the
os.walk() function to iterate over all files in the directory and its subdirectories. For each file, it constructs the full file path and adds the file to the zip archive.
os.walk() function is a convenient way to traverse directories. It generates the file names in a directory tree by walking the tree either top-down or bottom-up.
Note: Be careful with the file paths when adding files to the zip archive. The
write() method adds files to the archive with the exact path you provide. If you provide an absolute path, the file will be added with the full absolute path in the zip archive. This is usually not what you want. Instead, you typically want to add files with a relative path to the directory you're zipping.
In the main part of the script, we create a new zip file, call the
zip_directory() function to add the directory to the zip file, and finally close the zip file.
Working with Nested Directories
When working with nested directories, the process of creating a zip archive is a bit more complicated. The first function we showed in this article actually handles this case as well, which we'll show again here:
import os import zipfile def zipdir(path, ziph): for root, dirs, files in os.walk(path): for file in files: ziph.write(os.path.join(root, file), os.path.relpath(os.path.join(root, file), os.path.join(path, '..'))) zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED) zipdir('/path/to/directory', zipf) zipf.close()
The main difference is that we're actually creating the zip directory outside of the function and pass it as a parameter. Whether you do it within the function itself or not is up to personal preference.
Handling Large Directories
So what if we're dealing with a large directory? Zipping a large directory can consume a lot of memory and even crash your program if you don't take the right precautions.
zipfile module allows us to create a zip archive without loading all files into memory at once. By using the
with statement, we can ensure that each file is closed and its memory freed after it's added to the archive.
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
import os import zipfile def zipdir(path, ziph): for root, dirs, files in os.walk(path): for file in files: with open(os.path.join(root, file), 'r') as fp: ziph.write(fp.read()) with zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED) as zipf: zipdir('/path/to/directory', zipf)
In this version, we're using the
with statement when opening each file and when creating the zip archive. This will guarantee that each file is closed after it's read, freeing up the memory it was using. This way, we can safely zip large directories without running into memory issues.
Error Handling in zipfile
When working with
zipfile in Python, we need to remember to handle exceptions so our program doesn't crash unexpectedly. The most common exceptions you might encounter are
Let's take a look at how we can handle these exceptions while creating a zip file:
import zipfile try: with zipfile.ZipFile('example.zip', 'w') as myzip: myzip.write('non_existent_file.txt') except FileNotFoundError: print('The file you are trying to zip does not exist.') except RuntimeError as e: print('An unexpected error occurred:', str(e)) except zipfile.LargeZipFile: print('The file is too large to be compressed.')
FileNotFoundError is raised when the file we're trying to zip doesn't exist.
RuntimeError is a general exception that might be raised for a number of reasons, so we print the exception message to understand what went wrong.
zipfile.LargeZipFile is raised when the file we're trying to compress is too big.
zipfile module raises a
LargeZipFile error when the file you're trying to compress is larger than 2 GB. If you're working with large files, you can prevent this error by calling
ZipFile with the
Common Errors and Solutions
While working with the
zipfile module, you might encounter several common errors. Let's explore some of these errors and their solutions:
This error happens when the file or directory you're trying to zip does not exist. So always check if the file or directory exists before attempting to compress it.
This error is raised when you're trying to write a directory to a zip file using
ZipFile.write(). To avoid this, use
os.walk() to traverse the directory and write the individual files instead.
As you probably guessed, this error happens when you don't have the necessary permissions to read the file or write to the directory. Make sure you have the correct permissions before trying to manipulate files or directories.
As mentioned earlier, this error is raised when the file you're trying to compress is larger than 2 GB. To prevent this error, call
ZipFile with the
try: with zipfile.ZipFile('large_file.zip', 'w', allowZip64=True) as myzip: myzip.write('large_file.txt') except zipfile.LargeZipFile: print('The file is too large to be compressed.')
In this snippet, we're using the
allowZip64=True argument to allow zipping files larger than 2 GB.
Compressing Individual Files
zipfile, not only can it compress directories, but it can also compress individual files. Let's say you have a file called
document.txt that you want to compress. Here's how you'd do that:
import zipfile with zipfile.ZipFile('compressed_file.zip', 'w') as myzip: myzip.write('document.txt')
In this code, we're creating a new zip archive named
compressed_file.zip and just adding
document.txt to it. The 'w' parameter means that we're opening the zip file in write mode.
Now, if you check your directory, you should see a new zip file named
Extracting Zip Files
And finally, let's see how to reverse this zipping by extracting the files. Let's say we want to extract the
document.txt file we just compressed. Here's how to do it:
import zipfile with zipfile.ZipFile('compressed_file.zip', 'r') as myzip: myzip.extractall()
In this code snippet, we're opening the zip file in read mode ('r') and then calling the
extractall() method. This method extracts all the files in the zip archive to the current directory.
Note: If you want to extract the files to a specific directory, you can pass the directory path as an argument to the
extractall() method like so:
Now, if you check your directory, you should see the
document.txt file. That's all there is to it!
In this guide, we focused on creating and managing zip archives in Python. We explored the
zipfile module, learned how to create a zip archive of a directory, and even dove into handling nested directories and large directories. We've also covered error handling within zipfile, common errors and their solutions, compressing individual files, and extracting zip files.