The Pathlib module in Python simplifies working with files and folders. It is available in Python 3.4 and later, and it combines the best of Python's file system modules, namely os, os.path, glob, etc.
Most Python scripts interact with the file system, so handling file names and paths matters. For this, Python includes the Pathlib module, which contains useful functions for file-related tasks. Pathlib offers a more readable and easier way to build paths by representing filesystem paths as proper objects, and it lets us write code that is portable across platforms.
In this article, we will study the Pathlib module in detail with the help of various examples.
The Concept of Path and Directory
Before moving further into the details of the Pathlib module, it's important to understand two different concepts: path and directory.
The path is used to identify a file. It consists of an optional sequence of directory names terminated by the final file name, including the filename extension. The filename extension provides some information about the file's format or contents. The Pathlib module can deal with absolute as well as relative paths. An absolute path begins from the root directory and specifies the complete directory tree, whereas a relative path, as the name suggests, is the path of a file relative to another file or directory (usually the current directory).
The directory represents the filesystem entry of the path and includes the file name, creation time, size, owner, etc.
The Pathlib module in Python deals with path related tasks, such as constructing new paths from names of files and from other paths, checking for various properties of paths and creating files and folders at specific paths.
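The difference between absolute and relative paths can be checked directly on path objects. The following is a minimal sketch; the directory and file names are made up for illustration, and PurePosixPath is used so that the output is the same on every operating system.

```python
from pathlib import PurePosixPath

# Pure paths only manipulate the path text; they never touch the filesystem.
absolute = PurePosixPath("/home/user/report.txt")   # hypothetical path
relative = PurePosixPath("docs/report.txt")         # hypothetical path

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False

# Joining a relative path onto an absolute base yields an absolute path.
base = PurePosixPath("/home/user")
combined = base / relative
print(combined)                # /home/user/docs/report.txt
print(combined.is_absolute())  # True
```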
How to use the Pathlib Module?
To use the pathlib module conveniently within our scripts, we import all the classes in it using:
from pathlib import *
As a first task, let's retrieve the current working directory and home directory objects, respectively, using the code below:
current_dir = Path.cwd()
home_dir = Path.home()
print(current_dir)
print(home_dir)
We can choose to import pathlib instead of importing all the classes. In that case, all subsequent uses of classes within the module should be prefixed with pathlib:
import pathlib

current_dir = pathlib.Path.cwd()
home_dir = pathlib.Path.home()
print(current_dir)
print(home_dir)
Why use the Pathlib Module?
If you've been working with Python for a while, you may be wondering why the Pathlib module is necessary when the os, os.path, glob, etc. modules are already available. This is a fully justified concern. Let's try to address it with an example.
Let's say we want to make a file called "output/output.xlsx" within the current working directory. The following code tries to achieve this using the os.path module. For this, the os.getcwd and os.path.join functions are used.
import os

outpath = os.path.join(os.getcwd(), 'output')
outpath_file = os.path.join(outpath, 'output.xlsx')
Alternatively, as a single nested call:
outpath_file = os.path.join(os.path.join(os.getcwd(), 'output'), 'output.xlsx')
Though the code works, it looks clunky and is neither readable nor easy to maintain. Imagine how this code would look if we wanted to create a new file inside multiple nested directories.
The same code can be rewritten using the Pathlib module, as follows:
from pathlib import Path

outpath = Path.cwd() / 'output' / 'output.xlsx'
This format is easier to parse mentally. In Pathlib, the Path.cwd() method returns the current working directory, and the / operator is used in place of os.path.join to combine parts of the path into a compound path object. The function nesting pattern of the os.path module is replaced by the Path class of the Pathlib module, which represents the path by chaining methods and attributes. The clever overloading of the / operator makes the code readable and easy to maintain.
Another benefit of the Pathlib approach is that a Path object is created rather than a string representation of the path. This object has several handy methods that make life easier than working with raw strings that represent paths.
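To make the nested-directories case concrete, here is a small sketch of chaining the / operator. The directory names are invented for illustration, PurePosixPath keeps the output identical on every platform, and with_suffix() is shown only as one example of the methods that a path object offers and a plain string does not.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/data")  # hypothetical base directory

# Each / appends one more component, so deep nesting stays flat and readable.
report = base / "2024" / "reports" / "summary.xlsx"

print(type(report).__name__)        # PurePosixPath
print(str(report))                  # /data/2024/reports/summary.xlsx

# The result is an object, so path-aware methods are available on it.
print(report.with_suffix(".csv"))   # /data/2024/reports/summary.csv
```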
Performing Operations on Paths
The os.path module is used only for manipulating path strings. To do something with the path, for example creating a directory, we need the os module. Between them, the os and os.path modules provide a set of functions for working with files and directories, like:
os.mkdir to create a directory,
os.rename to rename a file or directory,
os.path.getsize to get the size of a file, and so on.
Let's write some of these operations using the os module and then rewrite the same code using the Pathlib module.
Sample code written using the os module:
if os.path.isdir(path):
    os.rmdir(path)
If we use the Pathlib module's path objects to achieve the same functionality, the resulting code is much more readable and easier to maintain, as shown below:
if path.is_dir():
    path.rmdir()
It is cumbersome to find path-related utilities in the os module. The Pathlib module solves the problem by replacing the utilities of the os module with methods on path objects. Let us understand it better with some code:
import os

outpath = os.path.join(os.getcwd(), 'output')
outpath_tmp = os.path.join(os.getcwd(), 'output.tmp')
generate_data(outpath_tmp)
if os.path.getsize(outpath_tmp):
    os.rename(outpath_tmp, outpath)
else:
    # Nothing produced
    os.remove(outpath_tmp)
Here, the function generate_data() takes a file path as a parameter and writes data to it. However, if nothing has changed since the last time generate_data() was executed, an empty file is generated. In that case, the empty file is discarded and the previous version of the file is kept.
The variable outpath holds the path formed by joining the current working directory with the file name "output". We also create a path for a temp version, outpath_tmp. If the size of the temp version is not zero, which implies that it is not an empty file, the temp version is renamed to outpath; otherwise the temp version is removed and the old version is retained.
Using the os module, manipulating filesystem paths as string objects becomes clumsy, as there are multiple calls to os.path.join(), os.getcwd(), etc. To avoid this problem, the Pathlib module offers a set of classes for frequently used operations on paths, in a more readable, simple, object-oriented way.
Let's rewrite the above code using the Pathlib module.
from pathlib import Path

outpath = Path.cwd() / 'output'
outpath_tmp = Path.cwd() / 'output.tmp'
generate_data(outpath_tmp)
if outpath_tmp.stat().st_size:
    outpath_tmp.rename(outpath)
else:
    # Nothing produced
    outpath_tmp.unlink()
With Pathlib, os.getcwd() becomes Path.cwd(), and the / operator joins paths in place of os.path.join. Things can be done in a simpler way using operators and method calls.
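The article does not define generate_data(), so the temp-file pattern cannot be run as written. The sketch below fills that gap under stated assumptions: generate_data() here is a hypothetical stand-in that writes the rows it is given, refresh() is an invented wrapper name, and a temporary directory replaces the current working directory so nothing real is touched. It also uses Path.replace() rather than Path.rename(), because rename() raises an error on Windows when the target already exists.

```python
import tempfile
from pathlib import Path


def generate_data(target, rows):
    """Hypothetical stand-in: write the given rows (possibly none) to target."""
    target.write_text("\n".join(rows))


def refresh(workdir, rows):
    """Regenerate workdir/output, keeping the old file if nothing is produced."""
    outpath = workdir / "output"
    outpath_tmp = workdir / "output.tmp"
    generate_data(outpath_tmp, rows)
    if outpath_tmp.stat().st_size:
        # Data was produced: promote the temp file to the real output.
        outpath_tmp.replace(outpath)
    else:
        # Nothing produced: discard the temp file and keep the old output.
        outpath_tmp.unlink()
    return outpath


with tempfile.TemporaryDirectory() as scratch:
    workdir = Path(scratch)
    out = refresh(workdir, ["a", "b"])
    first = out.read_text()
    refresh(workdir, [])  # an empty run leaves the previous output intact
    second = out.read_text()
    leftover_tmp = (workdir / "output.tmp").exists()

print(first == second, leftover_tmp)  # True False
```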
The following are commonly used methods and their usage:
Path.cwd(): return a path object representing the current working directory
Path.home(): return a path object representing the home directory
Path.stat(): return information about the path, such as size and modification time
Path.chmod(): change the file mode and permissions
Path.glob(pattern): glob the given pattern in the directory represented by the path, yielding matching files of any kind
Path.mkdir(): create a new directory at the given path
Path.open(): open the file the path points to
Path.rename(): rename a file or directory to the given target
Path.rmdir(): remove the directory, which must be empty
Path.unlink(): remove the file or symbolic link
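Several of the methods listed above can be seen working together in one short lifecycle. This is only a sketch: the directory and file names are invented, and everything happens inside a temporary directory that is deleted afterwards.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)

    # mkdir(): parents=True also creates missing intermediate directories.
    logs = root / "app" / "logs"
    logs.mkdir(parents=True)

    # open(): behaves like the built-in open(), bound to this path.
    draft = logs / "draft.log"
    with draft.open("w") as handle:
        handle.write("started")

    # stat(): st_size is the file size in bytes.
    size = draft.stat().st_size

    # rename(): move the file to a new name in the same directory.
    final = logs / "final.log"
    draft.rename(final)

    # glob(): yields matches lazily, so collect them into a list.
    names = sorted(p.name for p in logs.glob("*.log"))

    # unlink() removes the file; rmdir() then removes the now-empty directory.
    final.unlink()
    logs.rmdir()
    logs_gone = not logs.exists()

print(size, names, logs_gone)  # 7 ['final.log'] True
```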
Generating Cross-Platform Paths
Paths use different conventions in different operating systems. Windows uses a backslash between folder names, whereas all other popular operating systems use a forward slash. If you want your Python code to work irrespective of the underlying OS, you need to handle the conventions specific to each platform. The Pathlib module makes this easier: you can pass a path or file name to the Path() object using forward slashes, irrespective of the OS, and Pathlib handles the rest.
pathlib.Path.home() / 'python' / 'samples' / 'test_me.py'
The Path() object converts the / to the appropriate kind of slash for the underlying operating system. A pathlib.Path may represent either a Windows or a POSIX path. Thus, Pathlib prevents many cross-platform bugs by handling paths consistently.
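The separator handling can be observed on any machine by using the pure path classes, which model one platform's conventions without touching the filesystem. In this sketch the base directories are invented; only the path components come from the example above.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("python", "samples", "test_me.py")

# The same components, joined under each platform's rules.
posix_path = PurePosixPath("/home/user").joinpath(*parts)         # hypothetical base
windows_path = PureWindowsPath("C:/Users/user").joinpath(*parts)  # hypothetical base

print(posix_path)               # /home/user/python/samples/test_me.py
print(windows_path)             # C:\Users\user\python\samples\test_me.py

# as_posix() renders a Windows path with forward slashes when needed.
print(windows_path.as_posix())  # C:/Users/user/python/samples/test_me.py
```

On a real system, Path() picks the matching concrete class automatically, which is why the same source code works on both.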
Getting Path Information
When dealing with paths, we are often interested in finding the parent directory of a file or folder, or in following symbolic links. The Path class has several convenient ways of doing this, as different parts of a path are available as properties and methods, including the following:
drive: a string that represents the drive name. For example, PureWindowsPath('c:/Program Files/CSV').drive returns 'c:'
parts: a tuple that provides access to the path's components
name: the final path component, without any directory
parent: the logical parent of the path (the related parents property is a sequence providing access to all logical ancestors)
stem: the final path component without its suffix
suffix: the file extension of the final component
anchor: the part of a path before the directories, that is, the drive and root combined
/: creates child paths and mimics the behavior of os.path.join
joinpath: combines the path with the arguments provided
match(pattern): returns True or False, based on matching the path against the glob-style pattern provided
For the path "/home/projects/stackabuse/python/sample.md":
path: returns PosixPath('/home/projects/stackabuse/python/sample.md')
path.parts: returns ('/', 'home', 'projects', 'stackabuse', 'python', 'sample.md')
path.name: returns 'sample.md'
path.stem: returns 'sample'
path.suffix: returns '.md'
path.parent: returns PosixPath('/home/projects/stackabuse/python')
path.parent.parent: returns PosixPath('/home/projects/stackabuse')
path.match('*.md'): returns True
PurePosixPath('/python').joinpath('edited_version'): returns PurePosixPath('/python/edited_version')
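The POSIX example does not exercise drive, anchor, or the full chain of ancestors, so here is a complementary sketch on a Windows-style path. The file name data.csv is made up for illustration, and PureWindowsPath is used so that the code runs on any operating system.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("c:/Program Files/CSV/data.csv")  # data.csv is hypothetical

drive = path.drive    # 'c:'
anchor = path.anchor  # drive and root combined: 'c:' followed by a backslash
parts = path.parts    # the anchor, then 'Program Files', 'CSV', 'data.csv'

# parents lists every logical ancestor, nearest first.
ancestors = [p.as_posix() for p in path.parents]

print(drive)                  # c:
print(ancestors)              # ['c:/Program Files/CSV', 'c:/Program Files', 'c:/']
print(path.match("*.csv"))    # True
```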
Alternative of the Glob Module
The glob module, also available in Python, provides file path related utilities. The glob.glob function of the glob module is used to find files matching a pattern.
from glob import glob

top_xlsx_files = glob('*.xlsx')
all_xlsx_files = glob('**/*.xlsx', recursive=True)
Pathlib provides glob utilities as well:
from pathlib import Path

top_xlsx_files = Path.cwd().glob('*.xlsx')
all_xlsx_files = Path.cwd().rglob('*.xlsx')
The glob functionality is available directly on Path objects. Thus, the pathlib module makes complex tasks simpler.
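One practical difference from glob.glob is that Path.glob() and Path.rglob() return lazy iterators of path objects rather than lists of strings. The sketch below shows both on a small throwaway tree; the file names are invented, and touch() is used only to create empty files for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    (root / "nested").mkdir()
    for relative in ("a.xlsx", "notes.txt", "nested/b.xlsx"):
        (root / relative).touch()  # create an empty file

    # glob() with this pattern only looks at the directory itself ...
    top_level = sorted(p.name for p in root.glob("*.xlsx"))

    # ... while rglob() also descends into subdirectories.
    recursive = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.xlsx")
    )

print(top_level)  # ['a.xlsx']
print(recursive)  # ['a.xlsx', 'nested/b.xlsx']
```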
Reading and Writing Files using Pathlib
The following methods are used to perform basic operations like reading and writing files:
read_text: opens the file in text mode, returns its contents as a string, and closes it
read_bytes: opens the file in binary mode, returns its contents as bytes, and closes it
write_text: opens the file in text mode, writes the given text, and closes it
write_bytes: opens the file in binary mode, writes the given bytes, and closes it
Let's explore the usage of the Pathlib module for common file operations. The following example reads the contents of a file:
import pathlib

path = pathlib.Path.cwd() / 'Pathlib.md'
path.read_text()
Here, the read_text method on the Path object is used to read the contents of the file.
The example below writes data to a file in text mode:
from pathlib import Path

p = Path('sample_text_file')
p.write_text('Sample to write data to a file')
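A round trip makes the four methods easier to compare. This is a sketch under a few assumptions: the file names are invented, a temporary directory stands in for the working directory, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import tempfile
from pathlib import Path

message = "Sample to write data to a file"

with tempfile.TemporaryDirectory() as scratch:
    text_file = Path(scratch) / "sample_text_file.txt"
    # write_text() returns the number of characters written.
    written = text_file.write_text(message, encoding="utf-8")
    text_back = text_file.read_text(encoding="utf-8")

    binary_file = Path(scratch) / "sample.bin"
    binary_file.write_bytes(b"\x00\x01\x02")
    bytes_back = binary_file.read_bytes()

print(written == len(message))  # True
print(text_back)                # Sample to write data to a file
print(bytes_back)               # b'\x00\x01\x02'
```

Each call opens and closes the file on its own, so these methods suit small files; for streaming or appending, Path.open() is the better fit.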
Thus, by treating the path as an object, the Pathlib module lets us perform useful actions directly on it in file system code that involves a lot of path manipulation, like creating or removing directories, looking for specific files, moving files, etc.
To conclude, the Pathlib module provides a large number of rich and useful features for a variety of path-related operations. As an added advantage, the library behaves consistently across operating systems.