Introduction
File manipulation is one of the most important skills to master in any programming language, and doing it correctly is of utmost importance. Making a mistake could cause an issue in your program, other programs running on the same system, and even the system itself.
Errors can occur because the parent directory does not exist, or because other programs change files in the file system at the same time, creating what is called a race condition.
A race condition occurs, in this case, when two or more programs try to create a file or directory of the same name in the same place at the same time. This type of bug is very hard to find and fix since it is nondeterministic, or put simply, different things can happen depending on the exact timing of the programs racing each other.
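To make the race concrete, here is a minimal sketch that contrasts a check-then-create pattern with a single mkdir() call from the pathlib module covered in the rest of this article. The function names and the temporary directory are illustrative assumptions, not part of the original example.

```python
import tempfile
from pathlib import Path


def make_dir_racy(path: Path) -> None:
    # Check-then-act: another program could create `path` between the
    # exists() check and the mkdir() call, so mkdir() may still raise
    # FileExistsError.
    if not path.exists():
        path.mkdir()


def make_dir_safely(path: Path) -> None:
    # A single call: the "already exists" case is handled by mkdir()
    # itself, so there is no separate check that can go stale.
    path.mkdir(exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        target = Path(base) / "InnerDirectory"
        make_dir_safely(target)
        make_dir_safely(target)  # Second call is a no-op, not an error
        print(target.is_dir())
```

The racy version usually works when only one program is running, which is exactly why this kind of bug tends to go unnoticed until two programs happen to collide.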
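In this article, we'll see how to create a subdirectory in Python the safe way, step by step. The pathlib code shown here works on Mac, Linux, and Windows, although the permission bits discussed near the end only have their full effect on Unix-like systems.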
Safely Creating a Nested Directory with pathlib
There are plenty of ways to create a subdirectory, but perhaps the simplest is using the pathlib module. The pathlib module is primarily made to help abstract different operating system file systems and provide a uniform interface to work with most of them.
Thanks to it, your code should be platform-independent. Note that pathlib was added in Python 3.4, and the exist_ok parameter used later in this article requires Python 3.5 or newer.
Let's say we have an absolute path of a directory given to us as a string, and we wish to create a subdirectory with a given name. Let's create a directory called OuterDirectory, and place InnerDirectory inside it.
We'll import Path from the pathlib module, create a Path object with the desired path for our new directory, and use the mkdir() method, which has the following signature:
Path.mkdir(mode=0o777, parents=False, exist_ok=False)
The following code snippet does what we described above:
from pathlib import Path  # Import the Path class
path = Path("/home/kristina/OuterDirectory/InnerDirectory")  # Create Path object
path.mkdir()  # Make the directory
If mkdir() doesn't succeed, no directory will be made and an error will be raised.
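Since the scenario above starts from a directory path given as a string plus a subdirectory name, the following sketch shows one way to combine the two with the / operator before calling mkdir(). The helper name make_subdirectory is hypothetical, and a temporary directory stands in for OuterDirectory so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def make_subdirectory(base_dir: str, name: str) -> Path:
    """Create `name` inside the existing directory `base_dir`."""
    # The / operator joins path components using the right separator
    # for the current operating system.
    path = Path(base_dir) / name
    path.mkdir()  # Raises an error if base_dir is missing or path exists
    return path


if __name__ == "__main__":
    # A temporary directory stands in for OuterDirectory here.
    with tempfile.TemporaryDirectory() as outer:
        inner = make_subdirectory(outer, "InnerDirectory")
        print(inner.name)  # InnerDirectory
```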
mkdir() Options and Errors
If you run the code without first creating OuterDirectory, you'll see the following error:
Traceback (most recent call last):
  File "makesubdir.py", line 3, in <module>
    path.mkdir()
  File "/home/kristina/anaconda3/lib/python3.7/pathlib.py", line 1230, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/kristina/OuterDirectory/InnerDirectory'
Or, if InnerDirectory already exists:
Traceback (most recent call last):
  File "/home/kristina/Desktop/UNM/makesubdir.py", line 3, in <module>
    path.mkdir()
  File "/home/kristina/anaconda3/lib/python3.7/pathlib.py", line 1230, in mkdir
    self._accessor.mkdir(self, mode)
FileExistsError: [Errno 17] File exists: '/home/kristina/OuterDirectory/InnerDirectory'
If the directory already exists, the raised error will be FileExistsError, and if the parent doesn't exist, FileNotFoundError will be raised.
Since we don't want our program breaking whenever it encounters an error like this, we'll place this code in a try block:
from pathlib import Path
path = Path("/home/kristina/OuterDirectory/InnerDir")
try:
    path.mkdir()
except OSError:
    # FileNotFoundError and FileExistsError are both subclasses of OSError
    print("Failed to make nested directory")
else:
    print("Nested directory made")
When run, if the directory is successfully made, the output will be:
Nested directory made
If we run into errors, the following will be printed:
Failed to make nested directory
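Catching OSError treats every failure the same way. If the program should react differently to a missing parent and to an existing directory, one option is to catch the more specific exceptions first. The sketch below assumes a hypothetical helper, try_make_directory, that reports the outcome as a string.

```python
import tempfile
from pathlib import Path


def try_make_directory(path: Path) -> str:
    """Try to create `path` and report what happened as a short string."""
    try:
        path.mkdir()
    except FileNotFoundError:
        return "parent directory is missing"
    except FileExistsError:
        return "path already exists"
    except OSError as error:
        # Any other OS-level failure, for example missing permissions.
        return f"failed: {error}"
    else:
        return "nested directory made"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(try_make_directory(Path(base) / "Missing" / "Inner"))
        print(try_make_directory(Path(base) / "Inner"))
        print(try_make_directory(Path(base) / "Inner"))
```

The order of the except clauses matters: because both specific exceptions are subclasses of OSError, the general OSError handler has to come last.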
The mkdir() method takes three parameters: mode, parents, and exist_ok.
- The mode parameter, if given, is combined with umask to determine which users have reading, writing, and executing privileges. By default (0o777), all privileges are requested for all users, which might not be what we want if security is an issue. We'll touch more on this later.
- parents indicates what the method should do when the parent directory is missing:
  1. Create the missing parent directories itself (True)
  2. Raise a FileNotFoundError, like in our first traceback (False)
- exist_ok specifies whether the FileExistsError should be suppressed (True) or raised (False) if a directory of the same name already exists. Note that this error will still be raised if the existing path of the same name is not a directory.
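The behavior of parents and exist_ok described above can be checked with a short experiment. This is only a sketch: the function name, the dictionary keys, and the notes.txt file are made up for illustration, and everything happens inside a temporary directory.

```python
import tempfile
from pathlib import Path


def demonstrate_mkdir_options(base: Path) -> dict:
    """Exercise parents and exist_ok inside the existing directory `base`."""
    outcome = {}

    nested = base / "OuterDirectory" / "InnerDirectory"
    # parents=True creates the missing OuterDirectory along the way.
    nested.mkdir(parents=True)
    outcome["nested_created"] = nested.is_dir()

    # exist_ok=True turns a repeated call into a no-op instead of an error.
    nested.mkdir(parents=True, exist_ok=True)
    outcome["repeat_call_ok"] = True

    # exist_ok does not help when the existing path is a regular file.
    regular_file = base / "notes.txt"
    regular_file.touch()
    try:
        regular_file.mkdir(exist_ok=True)
        outcome["file_conflict_raised"] = False
    except FileExistsError:
        outcome["file_conflict_raised"] = True

    return outcome


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base_dir:
        print(demonstrate_mkdir_options(Path(base_dir)))
```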
Assigning Access Privileges
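Let's make a directory called SecondInnerDirectory with a restricted set of privileges, inside the non-existent SecondOuterDirectory. Each octal digit of mode covers one class of users, in the order owner, group, others, so the 0o007 used below requests reading, writing and executing privileges only for others, while 0o700 would request them only for the owner: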
from pathlib import Path
path = Path("/home/kristina/SecondOuterDirectory/SecondInnerDirectory")
path.mkdir(mode=0o007, parents=True, exist_ok=True)
This should execute without error. If we navigate to SecondOuterDirectory and check its contents from the console like so:
$ ls -al
We should get the output:
total 12
drwxrwxr-x 3 kristina kristina 4096 dec 10 01:26 .
drwxr-xr-x 77 kristina kristina 4096 dec 10 01:26 ..
d------r-x 2 kristina kristina 4096 dec 10 01:26 SecondInnerDirectory
Okay, so we can see that the parent directory was successfully made, but the privileges aren't as expected: we asked for rwx on SecondInnerDirectory but got r-x, so the write privilege is missing.
The issue here is that umask masks out some of the bits we requested. To get around this, we'll save umask's original value, temporarily change it, and finally return it to its original value using the umask() function from the os module. umask() returns the old value of umask.
Let's rewrite our code to test this out:
from pathlib import Path
import os
old_mask = os.umask(0)  # Save the old umask value and set umask to 0
try:
    path = Path("/home/kristina/SecondOuterDirectory/SecondInnerDirectory")
    path.mkdir(mode=0o007, parents=True, exist_ok=True)
finally:
    os.umask(old_mask)  # Restore the original umask even if mkdir() fails
Executing this code and using the ls -al command again will result in the following output:
total 12
drwxrwxrwx 3 kristina kristina 4096 dec 10 01:45 .
drwxr-xr-x 77 kristina kristina 4096 dec 10 01:45 ..
d------rwx 2 kristina kristina 4096 dec 10 01:45 SecondInnerDirectory
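The save, change, and restore steps can also be wrapped in a small context manager so the original umask is restored automatically. The sketch below is one possible approach, not the only one: temporary_umask is a hypothetical helper, the directory is created under a temporary location, and 0o700 (owner-only) is used as an illustrative mode.

```python
import os
import stat
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_umask(mask: int):
    """Set the process umask to `mask`, restoring the old value on exit."""
    old_mask = os.umask(mask)
    try:
        yield old_mask
    finally:
        os.umask(old_mask)  # Runs even if the body raises


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        path = Path(base) / "SecondOuterDirectory" / "SecondInnerDirectory"
        with temporary_umask(0):
            path.mkdir(mode=0o700, parents=True, exist_ok=True)
        # On Unix-like systems this is expected to show the requested mode.
        print(oct(stat.S_IMODE(path.stat().st_mode)))
```

Keep in mind that umask is a process-wide setting, so in a multi-threaded program other threads that create files while the mask is changed are affected as well.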
Conclusion
In order to safely manipulate files on many different systems, we need a robust way to handle errors such as race conditions. Python offers great support for this through the pathlib module.
Errors can always occur when working with file systems, and the best way to deal with this is by carefully setting up systems to catch all errors that can potentially crash our program or cause other issues. Writing clean code makes for durable programs.