In this tutorial, you will learn how to work with Python's os module.
Introduction
Python is one of the most widely used languages for tasks such as data processing, data analysis, and website building. Many of these tasks depend on the operating system. Python exposes this OS-dependent functionality through its os module. The module abstracts the underlying platform and provides functions to navigate, create, delete, and modify files and folders. In this tutorial you will learn how to import the module, use its basic functions, and apply it in a small sample project that merges data.
Some Basic Functions
Let's explore the module with some example code.
Import the library:
import os
Let's list the names (functions, constants, and submodules) that this module exposes.
print(dir(os))
Output:
['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']
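The raw listing mixes constants, private helpers, and functions, and its contents vary by platform (the output above comes from a Windows machine). As a rough sketch, the list can be narrowed down to the public callables (functions and classes); the variable name public_callables is only illustrative.

```python
import os

# Keep only public names (no leading underscore) whose value is callable.
public_callables = sorted(
    name for name in dir(os)
    if not name.startswith('_') and callable(getattr(os, name))
)

print(len(public_callables))
print(public_callables[:5])

# os.name identifies the platform family, for example 'nt' or 'posix'
print(os.name)
```

Because the result depends on the operating system and Python version, the counts and names you see will probably differ from the listing above.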
Now, using the getcwd() function, we can retrieve the path of the current working directory.
print(os.getcwd())
Output:
C:\Users\hpandya\OneDrive\work\StackAbuse\os_python\os_python\Project
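The returned path can be taken apart with the os.path submodule. The following is a minimal sketch of that idea; the variable names parent and last are my own.

```python
import os

cwd = os.getcwd()

# The working directory is reported as an absolute path
print(os.path.isabs(cwd))

# Split the path into its parent directory and its final component
parent, last = os.path.split(cwd)
print(parent)
print(last)
```

Using os.path.split() rather than splitting the string on a hard-coded separator keeps the code portable between Windows and Unix-like systems.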
List Folders and Files
Let's list the folders and files in the current directory using listdir():
print(os.listdir())
Output:
['Data', 'Population_Data', 'README.md', 'tutorial.py', 'tutorial_v2.py']
As you can see, I have two folders, Data and Population_Data, and three files: a Markdown file, README.md, and two Python files, tutorial.py and tutorial_v2.py.
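listdir() returns plain names without saying which are folders and which are files, and the order of the list is not guaranteed. One possible way to separate the two is os.scandir(), sketched below; split_entries is a hypothetical helper name, not part of the os module.

```python
import os

def split_entries(path='.'):
    """Return (folders, files) found directly inside path, each sorted by name."""
    folders, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                folders.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(folders), sorted(files)

folders, files = split_entries()
print('folders:', folders)
print('files:', files)
```

Run from the project folder shown above, this should report Data and Population_Data as folders and the remaining three entries as files.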
To get the entire tree structure of my project folder, let's write a function that uses os.walk() to iterate over the files in each folder under the current directory.
# function to list files in each folder of the current working directory
def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        # prune .git in place so that os.walk() does not descend into it
        dirs[:] = [d for d in dirs if d != '.git']
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * level
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
Call this function using the current working directory path, which we saw how to do earlier:
startpath = os.getcwd()
list_files(startpath)
Output:
Project/
    README.md
    tutorial.py
    tutorial_v2.py
    Data/
        uscitiesv1.4.csv
    Population_Data/
        Alabama/
            Alabama_population.csv
        Alaska/
            Alaska_population.csv
        Arizona/
            Arizona_population.csv
        Arkansas/
            Arkansas_population.csv
        California/
            California_population.csv
        Colorado/
            Colorado_population.csv
        Connecticut/
            Connecticut_population.csv
        Delaware/
            Delaware_population.csv
...
Note: The output has been truncated for brevity.
As the output shows, folder names end with a / and the files within each folder are indented four spaces to the right. The Data folder has one CSV file named uscitiesv1.4.csv, which holds population data for each city in the United States. The Population_Data folder has one folder per state, each containing a separate CSV file with that state's population data, extracted from uscitiesv1.4.csv.
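The depth calculation in list_files() strips startpath from root as a string, which can miscount when startpath ends with a path separator. As a hedged alternative, the depth can be derived with os.path.relpath(). The sketch below returns the lines instead of printing them; tree_lines and the throwaway folder names are assumptions made for illustration.

```python
import os
import tempfile

def tree_lines(startpath):
    """Return the indented tree under startpath as a list of strings."""
    lines = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # in-place sort makes the traversal order deterministic
        rel = os.path.relpath(root, startpath)
        # relpath() gives '.' for the starting folder itself
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        name = os.path.basename(os.path.abspath(root))
        lines.append('{}{}/'.format(' ' * 4 * level, name))
        for f in sorted(files):
            lines.append('{}{}'.format(' ' * 4 * (level + 1), f))
    return lines

# Build a tiny throwaway tree to try it on (names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, 'Project')
    os.makedirs(os.path.join(project, 'Data'))
    open(os.path.join(project, 'README.md'), 'w').close()
    open(os.path.join(project, 'Data', 'cities.csv'), 'w').close()
    demo_lines = tree_lines(project + os.sep)  # trailing separator on purpose

print('\n'.join(demo_lines))
```

Returning a list rather than printing directly also makes the function easier to test or to write to a file.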
Change Working Directory
Let's change the working directory to the folder that holds the data for the state of New York.
os.chdir('Population_Data/New York')
Now let's run the list_files() function again, this time in this directory.
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
As you can see, we are now in the New York folder under the Population_Data folder.
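Because chdir() changes the working directory for the whole process, every relative path used afterwards is affected. One common pattern, sketched here under the assumption that you want to return to the original folder afterwards, is a small context manager; working_directory is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, then switch back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # always restore the original directory, even if an error occurred
        os.chdir(previous)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()  # we are inside the temporary folder here
after = os.getcwd()

print(before == after)
```

Python 3.11 and later ship a similar helper as contextlib.chdir(), so the custom version is mainly useful on older interpreters.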
Create Single and Nested Directory Structure
Now, let's create a new directory called testdir in this directory.
os.mkdir('testdir')
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    testdir/
As you can see, it creates the new directory in the current working directory.
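Note that mkdir() raises a FileExistsError if the target already exists, so running the same line twice fails. The sketch below shows that behavior inside a temporary folder so that it does not touch the project data; the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'testdir')
    os.mkdir(target)          # first call creates the folder
    try:
        os.mkdir(target)      # second call fails because it already exists
        second_call = 'created'
    except FileExistsError:
        second_call = 'already exists'
    created = os.path.isdir(target)

print(second_call)
```

Catching the exception is one option; checking os.path.isdir() beforehand is another, although the check and the creation are then two separate steps.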
Now let's try to create a nested directory two levels deep.
os.mkdir('level1dir/level2dir')
Output:
Traceback (most recent call last):
  File "<ipython-input-12-ac5055572301>", line 1, in <module>
    os.mkdir('level1dir/level2dir')
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'level1dir/level2dir'
We get an error. To be specific, we get a FileNotFoundError. You might wonder why we get a FileNotFoundError when we are trying to create a directory.
The reason: mkdir() needs the directory level1dir to exist before it can create level2dir inside it. Since level1dir does not exist in the first place, it raises a FileNotFoundError.
For cases like this, the makedirs() function is used instead, which creates any missing intermediate directories along the way.
os.makedirs('level1dir/level2dir')
Check the current directory tree:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    level1dir/
        level2dir/
    testdir/
As we can see, we now have two subdirectories under the New York folder: testdir and level1dir. level1dir contains a directory called level2dir.
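The difference between the two functions can be checked side by side. This is a minimal sketch run in a temporary folder; it also uses the exist_ok parameter of makedirs(), which suppresses the error when the directory is already there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, 'level1dir', 'level2dir')

    try:
        os.mkdir(nested)                  # parent level1dir is missing
        mkdir_error = None
    except FileNotFoundError as exc:
        mkdir_error = type(exc).__name__

    os.makedirs(nested)                   # creates level1dir, then level2dir
    os.makedirs(nested, exist_ok=True)    # no error on a repeat call
    nested_exists = os.path.isdir(nested)

print(mkdir_error, nested_exists)
```

Without exist_ok=True, the second makedirs() call would raise a FileExistsError, just as mkdir() does for an existing folder.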
Remove Single and Multiple Directories Recursively
The os module also has functions to modify or remove directories, which I'll show here.
Let's start removing the directories we just created, beginning with testdir, using rmdir():
os.rmdir('testdir')
Check the current directory tree to verify that the directory no longer exists:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    level1dir/
        level2dir/
As you can see, testdir has been deleted.
Let's try to delete the nested directory structure of level1dir and level2dir.
os.rmdir('level1dir')
Output:
OSError
Traceback (most recent call last)
<ipython-input-14-690e535bcf2c> in <module>()
----> 1 os.rmdir('level1dir')
OSError: [WinError 145] The directory is not empty: 'level1dir'
As seen, this raises an OSError, and rightly so: it says that the level1dir directory is not empty, which is correct because level2dir is still underneath it.
With rmdir() it is not possible to remove a non-empty directory, just as with the Unix rmdir command.
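If the goal is to delete a directory together with everything inside it, the standard library's shutil.rmtree() is a separate tool for that job; it is not part of the os module. The sketch below, run in a temporary folder, contrasts the two. Treat rmtree() with care, since it deletes files as well as folders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    parent = os.path.join(tmp, 'level1dir')
    os.makedirs(os.path.join(parent, 'level2dir'))

    try:
        os.rmdir(parent)           # fails: level2dir is still inside
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    shutil.rmtree(parent)          # deletes level1dir and everything in it
    parent_gone = not os.path.exists(parent)

print(rmdir_failed, parent_gone)
```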
Just as mkdir() has makedirs(), rmdir() has a counterpart, removedirs(). It removes the leaf directory and then works upward, removing each parent directory in the given path as long as it is empty.
os.removedirs('level1dir/level2dir')
Let's see the directory tree structure again:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
This brings the directory back to its previous state.
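removedirs() only removes empty directories: it starts at the leaf and stops at the first parent that still has content. The sketch below reproduces that in a temporary folder; the folder and file names mirror the ones above but are created empty purely for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, 'New York')
    os.makedirs(os.path.join(state, 'level1dir', 'level2dir'))
    open(os.path.join(state, 'New York_population.csv'), 'w').close()

    # removes level2dir, then level1dir, then stops at the non-empty folder
    os.removedirs(os.path.join(state, 'level1dir', 'level2dir'))

    level1_gone = not os.path.exists(os.path.join(state, 'level1dir'))
    state_kept = os.path.isdir(state)   # still holds the CSV, so it is left alone

print(level1_gone, state_kept)
```

This is why the New York folder and its CSV file survive the call.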
Example with Data Processing
So far we have explored how to view, create, and remove a nested directory structure. Now let's see an example of how the os module helps in data processing.
For that, let's go one level up in the directory structure.
os.chdir('../')
With that, let's again view the directory tree structure.
list_files(os.getcwd())
Output:
Population_Data/
    Alabama/
        Alabama_population.csv
    Alaska/
        Alaska_population.csv
    Arizona/
        Arizona_population.csv
    Arkansas/
        Arkansas_population.csv
    California/
        California_population.csv
    Colorado/
        Colorado_population.csv
    Connecticut/
        Connecticut_population.csv
    Delaware/
        Delaware_population.csv
...
Note: The output has been truncated for brevity.
Let's merge the data from all of the states by iterating over each state's directory and combining the CSV files.
import os
import pandas as pd
# create a list to hold the data from each state
list_states = []
# walk every folder and load the first file found in each one
for root, dirs, files in os.walk(os.getcwd()):
    if files:
        list_states.append(pd.read_csv(os.path.join(root, files[0]), index_col=None))
# merge every state's data frame into a single dataframe using the Pandas library
merge_data = pd.concat(list_states, sort=False)
Thanks in part to the os module, we were able to create merge_data, a DataFrame containing population data from every state.
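The same walk-and-collect pattern also works without Pandas. As a rough sketch, the version below uses only the standard library's csv module and reads every .csv file it finds rather than only the first one in each folder. The helper name merge_csv_rows, the column names, and the sample rows are made-up placeholders, not the real dataset.

```python
import csv
import os
import tempfile

def merge_csv_rows(startpath):
    """Collect the rows of every .csv file found under startpath."""
    rows = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # visit the folders in a predictable order
        for name in sorted(files):
            if name.lower().endswith('.csv'):
                with open(os.path.join(root, name), newline='') as handle:
                    rows.extend(csv.DictReader(handle))
    return rows

# Tiny placeholder dataset: one folder and one CSV file per "state"
samples = [('Alabama', 'A-town', '10'), ('Alaska', 'B-town', '20')]
with tempfile.TemporaryDirectory() as tmp:
    for state, city, population in samples:
        folder = os.path.join(tmp, state)
        os.mkdir(folder)
        csv_path = os.path.join(folder, state + '_population.csv')
        with open(csv_path, 'w', newline='') as handle:
            writer = csv.writer(handle)
            writer.writerow(['city', 'population'])
            writer.writerow([city, population])
    merged = merge_csv_rows(tmp)

print(len(merged))
```

Each row comes back as a dictionary keyed by the header line, with all values as strings, so numeric columns would still need converting before any analysis.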
Conclusion
In this article, we briefly explored different capabilities of Python's built-in os module. We also saw a brief example of how the module can be used in data science and analytics. The os module has a lot more to offer, and much more complex logic can be built on it as a developer's needs require.