The Pathlib module in Python simplifies working with files and folders. It is available in Python 3.4 and later, and it combines the best of Python's file system modules, namely os, os.path, glob, etc.
Most Python scripts interact with the file system in some way, so it is important to be able to handle file names and paths. For this purpose, Python includes the Pathlib module, which contains useful functions for file-related tasks. Pathlib provides a more readable and easier way to build up paths by representing file system paths as proper objects, and it enables us to write code that is portable across platforms.
In this article, we will study the Pathlib module in detail with the help of various examples.
The Concept of Path and Directory
Before moving further into the details of the Pathlib module, it's important to understand two different concepts: path and directory.
The path is used to identify a file. The path provides an optional sequence of directory names terminated by the final file name including the filename extension. The filename extension provides some information about the file format/ contents. The Pathlib module can deal with absolute as well as relative paths. An absolute path begins from the root directory and specifies the complete directory tree, whereas a relative path, as the name suggests, is the path of a file relative to another file or directory (usually the current directory).
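The difference between absolute and relative paths can be sketched as follows. This is a minimal illustration, not part of the original examples: the file name report.txt is made up, and PurePosixPath is used only so that the output is the same on every operating system.

```python
from pathlib import PurePosixPath

# Hypothetical paths; PurePosixPath never touches the real file system.
absolute = PurePosixPath("/home/projects/report.txt")
relative = PurePosixPath("projects/report.txt")

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False

# Joining a relative path onto a base directory gives an absolute path.
base = PurePosixPath("/home")
combined = base / relative
print(combined)         # /home/projects/report.txt
print(combined.suffix)  # .txt
```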
The directory represents the filesystem entry of the path, and it includes the file name, creation time, size, owner, etc.
The Pathlib module in Python deals with path related tasks, such as constructing new paths from names of files and from other paths, checking for various properties of paths and creating files and folders at specific paths.
How to use the Pathlib Module?
To use the pathlib module conveniently within our scripts, we import all the classes in it using:
from pathlib import *
As a first task, let's retrieve the current working directory and home directory objects, respectively, using the code below:
current_dir = Path.cwd()
home_dir = Path.home()
print(current_dir)
print(home_dir)
We can choose to import pathlib instead of importing all the classes. In that case, all subsequent uses of classes within the module should be prefixed with pathlib.
import pathlib
current_dir = pathlib.Path.cwd()
home_dir = pathlib.Path.home()
print(current_dir)
print(home_dir)
Why use the pathlib Module?
If you've been working with Python for a while, you may be wondering why the pathlib module is necessary when modules like os, os.path, glob, etc. are already available. This is a fully justified concern. Let's try to address it via an example.
Let's say we want to build the path to a file called output/output.xlsx within the current working directory. The following code does this using the os.path module, with the os.getcwd and os.path.join functions.
import os
outpath = os.path.join(os.getcwd(), 'output')
outpath_file = os.path.join(outpath, 'output.xlsx')
Alternatively,
outpath_file = os.path.join(os.path.join(os.getcwd(), 'output'), 'output.xlsx')
Though the code works, it looks clunky and is neither readable nor easy to maintain. Imagine how this code would look if we wanted to create a new file inside multiple nested directories.
The same code can be rewritten using the pathlib module, as follows:
from pathlib import Path
outpath = Path.cwd() / 'output' / 'output.xlsx'
This format is easier to parse mentally. In Pathlib, the Path.cwd() method returns the current working directory, and the / operator is used in place of os.path.join to combine parts of the path into a compound path object. The nested function calls of the os.path module are replaced by the Path class of the Pathlib module, which represents the path through chained methods and attributes. The clever overloading of the / operator makes the code readable and easy to maintain.
Another benefit of the Pathlib approach is that it creates a Path object rather than a string representation of the path. This object has several handy methods that make life easier than working with raw strings that represent paths.
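To make the comparison concrete, the sketch below builds the same nested path with both approaches and then uses a few attributes of the resulting Path object. The intermediate directory names reports and 2024 are made up for illustration, and nothing is created on disk.

```python
import os
from pathlib import Path

# Build the same nested path both ways ('reports' and '2024' are made-up names).
joined_str = os.path.join(os.getcwd(), "output", "reports", "2024", "output.xlsx")
joined_path = Path.cwd() / "output" / "reports" / "2024" / "output.xlsx"

# Both approaches describe the same location.
print(Path(joined_str) == joined_path)  # True

# The Path object exposes its pieces directly, with no string slicing.
print(joined_path.name)                      # output.xlsx
print(joined_path.parent.name)               # 2024
print(joined_path.with_suffix(".csv").name)  # output.csv
```

With the string version, extracting the file name or swapping the extension would need additional calls such as os.path.basename and os.path.splitext.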
Performing Operations on Paths
The classic os.path module is used only for manipulating path strings. To do something with the path, for example, creating a directory, we need the os module. The os module provides a set of functions for working with files and directories, like mkdir for creating a directory and rename for renaming a file or directory, while os.path.getsize returns the size of a file, and so on.
Let's write some of these operations using the os module and then rewrite the same code using the Pathlib module.
Sample code written using the os module:
if os.path.isdir(path):
    os.rmdir(path)
If we use the Pathlib module's Path objects to achieve the same functionality, the resulting code is much more readable and easier to maintain, as shown below:
if path.is_dir():
    path.rmdir()
It is cumbersome to find path related utilities in the os module. The Pathlib module solves this problem by replacing the utilities of the os module with methods on path objects. Let us understand this better with some code:
import os
outpath = os.path.join(os.getcwd(), 'output')
outpath_tmp = os.path.join(os.getcwd(), 'output.tmp')
generate_data(outpath_tmp)
if os.path.getsize(outpath_tmp):
    os.rename(outpath_tmp, outpath)
else:  # Nothing produced
    os.remove(outpath_tmp)
Here, the function generate_data() takes a file path as a parameter and writes its data to that path. However, if nothing has changed since the last time generate_data() was executed, an empty file is generated. In that case, the empty file is discarded and the previous version of the file is kept.
The variable outpath stores the path formed by joining the current working directory with the filename "output". We also create the path of a temporary version, named outpath_tmp. If the size of the temporary file is not zero, which implies that it is not an empty file, then it is renamed to outpath; otherwise, the temporary file is removed and the old version is retained.
Using the os module, manipulating file system paths as string objects becomes clumsy, as there are multiple calls to os.path.join(), os.getcwd(), etc. To avoid this problem, the Pathlib module offers a set of classes for frequently used path operations in a more readable, simple, object-oriented way.
Let's try to rewrite the above code using the pathlib module.
from pathlib import Path
outpath = Path.cwd() / 'output'
outpath_tmp = Path.cwd() / 'output.tmp'
generate_data(outpath_tmp)
if outpath_tmp.stat().st_size:
    outpath_tmp.rename(outpath)
else:  # Nothing produced
    outpath_tmp.unlink()
Using Pathlib, os.getcwd() becomes Path.cwd(), and the / operator replaces os.path.join for joining paths. With the pathlib module, things can be done in a simpler way using operators and method calls.
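The snippet above cannot run on its own because generate_data() is never defined. The following sketch fills that gap under stated assumptions: generate_data() here is a hypothetical stand-in that simply writes a short string, the work happens in a temporary directory rather than the current working directory, and Path.replace() is used instead of Path.rename() because it overwrites an existing target on every platform.

```python
import tempfile
from pathlib import Path


def generate_data(target: Path) -> None:
    """Hypothetical stand-in: writes a small payload to the given path."""
    target.write_text("fresh data\n")


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    outpath = workdir / "output"
    outpath_tmp = workdir / "output.tmp"

    generate_data(outpath_tmp)
    if outpath_tmp.stat().st_size:
        # Non-empty result: promote the temporary file to the final name.
        outpath_tmp.replace(outpath)
    else:
        # Nothing produced: discard the empty temporary file.
        outpath_tmp.unlink()

    result = outpath.read_text()
    tmp_still_exists = outpath_tmp.exists()

print(repr(result))      # 'fresh data\n'
print(tmp_still_exists)  # False
```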
The following are commonly used methods and their usage:
Path.cwd(): returns a path object representing the current working directory
Path.home(): returns a path object representing the home directory
Path.stat(): returns information about the path
Path.chmod(): changes the file mode and permissions
Path.glob(pattern): globs the given pattern in the directory represented by the path, yielding matching files of any kind
Path.mkdir(): creates a new directory at the given path
Path.open(): opens the file that the path points to
Path.rename(): renames a file or directory to the given target
Path.rmdir(): removes the empty directory
Path.unlink(): removes the file or symbolic link
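Several of the methods listed above can be tried out together. The sketch below is only an illustration: it works inside a temporary directory so that nothing is left behind, and the names reports, draft.txt and final.txt are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # mkdir(): create a new directory
    reports = base / "reports"
    reports.mkdir()

    # open(): open the file the path points to
    draft = reports / "draft.txt"
    with draft.open("w") as handle:
        handle.write("hello")

    # stat(): query information such as the size in bytes
    size = draft.stat().st_size

    # rename(): move the file to a new name
    final = reports / "final.txt"
    draft.rename(final)

    # glob(): list matching entries in the directory
    names = sorted(p.name for p in reports.glob("*.txt"))

    # unlink() removes the file, rmdir() removes the now empty directory
    final.unlink()
    reports.rmdir()
    removed = not reports.exists()

print(size)     # 5
print(names)    # ['final.txt']
print(removed)  # True
```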
Generating Cross-Platform Paths
Paths use different conventions in different operating systems. Windows uses a backslash between folder names, whereas all other popular operating systems use a forward slash. If you want your Python code to work irrespective of the underlying OS, you'll need to handle the conventions specific to each platform. The Pathlib module makes this easier: you can pass a path or filename to Path() using forward slashes, irrespective of the OS, and Pathlib handles the rest.
pathlib.Path.home() / 'python' / 'samples' / 'test_me.py'
The Path() object will convert the / to the appropriate kind of slash for the underlying operating system. A pathlib.Path may represent either a Windows or a POSIX path. Thus, Pathlib prevents a lot of cross-platform bugs by handling paths consistently.
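One way to see the separator handling without switching machines is to use the pure path classes, which never access the file system. This is a small sketch, assuming the same folder names as the example above; PurePosixPath and PureWindowsPath are chosen here only so that both flavours can be printed on any OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("python", "samples", "test_me.py")

# The same components, rendered with each platform's separator.
posix_path = PurePosixPath("home", *parts)
windows_path = PureWindowsPath("home", *parts)

print(posix_path)    # home/python/samples/test_me.py
print(windows_path)  # home\python\samples\test_me.py

# A Windows path written with forward slashes is parsed the same way.
print(PureWindowsPath("home/python/samples/test_me.py") == windows_path)  # True

# as_posix() gives a forward-slash string regardless of flavour.
print(windows_path.as_posix())  # home/python/samples/test_me.py
```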
Getting Path Information
While dealing with paths, we are often interested in finding the parent directory of a file or folder, or in following symbolic links. The Path class has several convenient attributes and methods for this, since the different parts of a path are available as properties, including the following:
drive: a string that represents the drive name. For example, PureWindowsPath('c:/Program Files/CSV').drive returns 'c:'
parts: returns a tuple that provides access to the path's components
name: the final path component, without any directory
parent: the logical parent of the path (the related parents property is a sequence providing access to all logical ancestors of the path)
stem: the final path component without its suffix
suffix: the file extension of the final component
anchor: the concatenation of the drive and root, that is, the part of a path before the directories
/: is used to create child paths and mimics the behavior of os.path.join
joinpath: combines the path with the arguments provided
match(pattern): returns True/False, based on matching the path with the glob-style pattern provided
For the path /home/projects/stackabuse/python/sample.md:
path: returns PosixPath('/home/projects/stackabuse/python/sample.md')
path.parts: returns ('/', 'home', 'projects', 'stackabuse', 'python', 'sample.md')
path.name: returns 'sample.md'
path.stem: returns 'sample'
path.suffix: returns '.md'
path.parent: returns PosixPath('/home/projects/stackabuse/python')
path.parent.parent: returns PosixPath('/home/projects/stackabuse')
path.match('*.md'): returns True
PurePosixPath('/python').joinpath('edited_version'): returns PurePosixPath('/python/edited_version')
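The values listed above can be checked with a short script. The sketch below uses PurePosixPath rather than Path so that it prints the same results on Windows as on Linux or macOS; the last two lines go slightly beyond the list by joining onto path.parent and by counting the entries of path.parents.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/projects/stackabuse/python/sample.md")

print(path.parts)   # ('/', 'home', 'projects', 'stackabuse', 'python', 'sample.md')
print(path.name)    # sample.md
print(path.stem)    # sample
print(path.suffix)  # .md
print(path.anchor)  # /
print(path.parent)         # /home/projects/stackabuse/python
print(path.parent.parent)  # /home/projects/stackabuse
print(path.match("*.md"))  # True

# joinpath() behaves like the / operator.
edited = path.parent.joinpath("edited_version")
print(edited)  # /home/projects/stackabuse/python/edited_version

# parents lists every logical ancestor, from the closest up to the root.
print(len(path.parents))  # 5
```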
An Alternative to the Glob Module
Apart from the os and os.path modules, Python also provides the glob module for file path related utilities. The glob.glob function of the glob module is used to find files matching a pattern.
from glob import glob
top_xlsx_files = glob('*.xlsx')
all_xlsx_files = glob('**/*.xlsx', recursive=True)
The pathlib module provides glob utilities as well:
from pathlib import Path
top_xlsx_files = Path.cwd().glob('*.xlsx')
all_xlsx_files = Path.cwd().rglob('*.xlsx')
The glob functionality is available directly on Path objects. Thus, the pathlib module makes complex tasks simpler.
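The difference between glob() and rglob() is easiest to see on a known directory layout. The sketch below creates a few empty files in a temporary directory (all file and folder names are made up) and compares the two calls. Note that both methods return generators, so the results are collected into sorted lists here.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Made-up layout: two files at the top level, one in a subfolder.
    (root / "nested").mkdir()
    (root / "top.xlsx").touch()
    (root / "notes.txt").touch()
    (root / "nested" / "deep.xlsx").touch()

    # glob() only looks in the directory itself ...
    top_xlsx_files = sorted(p.name for p in root.glob("*.xlsx"))
    # ... while rglob() also descends into subdirectories.
    all_xlsx_files = sorted(p.name for p in root.rglob("*.xlsx"))

print(top_xlsx_files)  # ['top.xlsx']
print(all_xlsx_files)  # ['deep.xlsx', 'top.xlsx']
```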
Reading and Writing Files using pathlib
The following methods are used to perform basic operations like reading and writing files:
read_text: opens the file in text mode, returns its contents as a string and closes the file
read_bytes: opens the file in binary mode, returns its contents as bytes and closes the file
write_text: opens the file in text mode, writes the given text to it and closes the file
write_bytes: opens the file in binary mode, writes the given binary data to it and closes the file
Let's explore the usage of the Pathlib module for common file operations. The following example is used to read the contents of a file:
import pathlib
path = pathlib.Path.cwd() / 'Pathlib.md'
path.read_text()
Here, the read_text method on the Path object is used to read the contents of the file.
The example below writes data to a file in text mode:
from pathlib import Path
p = Path('sample_text_file')
p.write_text('Sample to write data to a file')
Thus, in the pathlib module, having the path as an object enables us to perform useful file system actions that involve a lot of path manipulation, like creating or removing directories, looking for specific files, moving files, etc.
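The four read and write methods can be combined into a small round trip. This is a sketch under a few assumptions: the files live in a temporary directory so that nothing is left on disk, the file names are made up, and the encoding is passed explicitly so that the result does not depend on the platform default.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Text round trip: write_text() then read_text()
    text_file = Path(tmp) / "sample_text_file.txt"
    text_file.write_text("Sample to write data to a file", encoding="utf-8")
    text_back = text_file.read_text(encoding="utf-8")

    # Binary round trip: write_bytes() returns the number of bytes written
    binary_file = Path(tmp) / "sample.bin"
    written = binary_file.write_bytes(b"\x00\x01\x02")
    bytes_back = binary_file.read_bytes()

print(text_back)   # Sample to write data to a file
print(written)     # 3
print(bytes_back)  # b'\x00\x01\x02'
```

Each of these methods opens and closes the file itself, so no with block is needed around the individual calls.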
Conclusion
To conclude, the Pathlib module provides a large number of rich and useful features that can be used to perform a variety of path related operations. As an added advantage, the library behaves consistently across operating systems.