Introduction to Python OS Module

In this tutorial, you will learn how to work with Python's os module.

Table of Contents

  1. Introduction
  2. Basic Functions
  3. List Files / Folders in Current Working Directory
  4. Change Working Directory
  5. Create Single and Nested Directory Structure
  6. Remove Single and Nested Directory Structure Recursively
  7. Example with Data Processing
  8. Conclusion

Introduction

Python is one of the most frequently used languages today for tasks such as data processing, data analysis, and website building. Many of these tasks involve operations that depend on the operating system. Python's os module gives developers access to this OS-dependent functionality: it abstracts away platform differences and provides Python functions to navigate, create, delete, and modify files and folders. In this tutorial, you will learn how to import the module, use its basic functions, and apply it in a sample Python project that merges data.
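To make that abstraction concrete, here is a small sketch using three portable pieces of the module: os.name, os.sep, and os.path.join(). The file path is only an illustration borrowed from the sample project used later in this tutorial, and the printed values will differ between Windows and Unix-like systems.

```python
import os

# os.name identifies the OS family: 'nt' on Windows, 'posix' on Linux/macOS
print("OS family:", os.name)

# os.sep is this platform's path separator: '\\' on Windows, '/' elsewhere
print("Path separator:", repr(os.sep))

# os.path.join builds a path with the correct separator for the platform,
# so the same code works unchanged on Windows and on Unix-like systems
data_file = os.path.join("Population_Data", "New York", "New York_population.csv")
print(data_file)
```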

Some Basic Functions

Let's explore the module with some example code.

Import the library:

import os

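Let's list the names the module provides, including its functions, classes, and constants. The exact list depends on your operating system and Python version; the output below comes from a Windows machine.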

print(dir(os))

Output:

['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']
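The raw dir() output mixes functions with constants and private helpers (names starting with an underscore). As a minimal sketch, you could narrow it down to public callables like this; note that callable() is also true for classes such as os.DirEntry, so the result is not strictly limited to functions.

```python
import os

# keep only public names (no leading underscore) that are callable;
# this drops constants such as os.sep and os.name, but keeps classes
public_callables = [
    name for name in dir(os)
    if not name.startswith("_") and callable(getattr(os, name))
]

print(len(public_callables))
print(public_callables[:10])
```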

Now, using the getcwd() function, we can retrieve the path of the current working directory.

print(os.getcwd())

Output:

C:\Users\hpandya\OneDrive\work\StackAbuse\os_python\os_python\Project
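Since os.getcwd() returns an absolute path as a string, it combines naturally with the os.path helpers. A small sketch that splits it into the parent folder and the current folder name (with the path shown above, folder would be Project):

```python
import os

cwd = os.getcwd()

# os.getcwd() always returns an absolute path
print(os.path.isabs(cwd))

# os.path.split separates the parent folder from the last path component
parent, folder = os.path.split(cwd)
print("Parent:", parent)
print("Folder:", folder)
```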

List Folders and Files

Let's list the folders/files in the current directory using listdir:

print(os.listdir())

Output:

['Data', 'Population_Data', 'README.md', 'tutorial.py', 'tutorial_v2.py']

As you can see, there are two folders, Data and Population_Data, and three files: a README.md Markdown file and two Python files, tutorial.py and tutorial_v2.py.
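os.listdir() returns bare names, so it does not tell folders and files apart. A minimal sketch that separates them with os.scandir(), whose entries expose is_dir() and is_file(); the helper name split_entries is hypothetical:

```python
import os

def split_entries(path="."):
    """Return two sorted lists: the folder names and the file names in path."""
    folders, files = [], []
    # os.scandir yields os.DirEntry objects that know their own type
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                folders.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(folders), sorted(files)

folders, files = split_entries()
print("Folders:", folders)
print("Files:", files)
```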

In order to get the entire tree structure of my project folder, let's write a function and then use os.walk() to iterate over all the files in each folder of the current directory.

import os

# function to list files in each folder of the current working directory,
# skipping any .git folder
def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        # prune .git in place so os.walk does not descend into it
        dirs[:] = [d for d in dirs if d != '.git']
        # depth of this folder relative to startpath
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * level
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))

Call this function with the path of the current working directory, which we retrieved earlier with os.getcwd():

startpath = os.getcwd()
list_files(startpath)

Output:

Project/
    README.md
    tutorial.py
    tutorial_v2.py
    Data/
        uscitiesv1.4.csv
    Population_Data/
        Alabama/
            Alabama_population.csv
        Alaska/
            Alaska_population.csv
        Arizona/
            Arizona_population.csv
        Arkansas/
            Arkansas_population.csv
        California/
            California_population.csv
        Colorado/
            Colorado_population.csv
        Connecticut/
            Connecticut_population.csv
        Delaware/
            Delaware_population.csv
        ...

Note: The output has been truncated for brevity.

As seen from the output, folder names end with a /, and each folder's contents are indented four spaces further to the right. The Data folder has one CSV file, uscitiesv1.4.csv, which contains population data for each city in the United States. The Population_Data folder has one subfolder per state, each containing a separate CSV file with that state's population data, extracted from uscitiesv1.4.csv.
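To see what os.walk() hands to list_files on each iteration, here is a sketch of a hypothetical walk_summary helper that collects the (root, dirs, files) tuples instead of printing a tree, with paths made relative to the starting folder. Sorting is an assumption added for reproducible output: os.walk() returns entries in arbitrary order, and sorting dirs in place also controls the order in which it visits subfolders.

```python
import os

def walk_summary(startpath):
    """Collect (relative folder, subfolders, files) for every folder under startpath."""
    summary = []
    # os.walk yields one (root, dirs, files) tuple per folder, top-down
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # in-place sort also fixes the order os.walk visits subfolders
        rel = os.path.relpath(root, startpath)
        summary.append((rel, list(dirs), sorted(files)))
    return summary
```

For example, calling walk_summary(os.getcwd()) from the Project folder would return one tuple for Project itself (with a relative path of '.') followed by one tuple per subfolder.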

Change Working Directory

Let's change the working directory to the folder that holds the data for the state of New York.

os.chdir('Population_Data/New York')

Now let's run the list_files function again, this time in the new directory.

list_files(os.getcwd())

Output:

New York/
    New York_population.csv

As you can see, we are now in the New York folder under the Population_Data folder.
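Changing the working directory affects the rest of the program, so it is easy to lose track of where you are. A minimal sketch of a context manager (the name working_directory is hypothetical) that switches into a folder and always switches back, even if an exception is raised:

```python
import os
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # restore the original directory even if the body raised an exception
        os.chdir(previous)
```

Used as with working_directory('Population_Data/New York'): list_files(os.getcwd()), the working directory is restored when the with block ends. On Python 3.11 and later, the standard library offers contextlib.chdir() for the same purpose.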

Create Single and Nested Directory Structure

Now, let's create a new directory called testdir in this directory.

os.mkdir('testdir')
list_files(os.getcwd())

Output:

New York/
    New York_population.csv
    testdir/

As you can see, it creates the new directory in the current working directory.
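Note that os.mkdir() refuses to create a directory that already exists: running os.mkdir('testdir') a second time raises FileExistsError. A small sketch that handles that case (make_dir_once is a hypothetical helper name):

```python
import os

def make_dir_once(path):
    """Create path with os.mkdir; return False instead of failing if it already exists."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        # os.mkdir raises FileExistsError when the directory is already there
        return False
```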

Next, let's try to create a nested directory, two levels deep, with mkdir.

os.mkdir('level1dir/level2dir')

Output:

Traceback (most recent call last):

  File "<ipython-input-12-ac5055572301>", line 1, in <module>
    os.mkdir('level1dir/level2dir')

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'level1dir/level2dir'

This raises an error, specifically a FileNotFoundError. You might wonder why we get a "file not found" error when we are trying to create a directory.

The reason is that os.mkdir() creates only the last directory in the path, so it expects the parent directory, level1dir, to exist already. Since level1dir does not exist, mkdir() cannot create level2dir inside it and raises a FileNotFoundError.

For cases like this, use the makedirs() function instead, which also creates any missing intermediate directories along the way.

os.makedirs('level1dir/level2dir')

Check the current directory tree:

list_files(os.getcwd())

Output:

New York/
    New York_population.csv
    level1dir/
        level2dir/
    testdir/

As we can see, the New York folder now has two subdirectories, testdir and level1dir, and level1dir contains a subdirectory called level2dir.
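Similarly, makedirs() raises FileExistsError by default if the final directory already exists. Passing exist_ok=True makes the call safe to repeat, which is handy in scripts that run more than once. A sketch with a hypothetical ensure_nested_dir helper:

```python
import os

def ensure_nested_dir(base, *parts):
    """Create base/parts..., including any missing parents; safe to call repeatedly."""
    path = os.path.join(base, *parts)
    # exist_ok=True: do not raise FileExistsError if the path already exists
    os.makedirs(path, exist_ok=True)
    return path
```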

Remove Single and Multiple Directories Recursively

The os module also has functions to modify and remove directories; here we'll focus on removing them.

Let's start by removing testdir, one of the directories we just created, using rmdir:

os.rmdir('testdir')

Check the current directory tree to verify that the directory no longer exists:

list_files(os.getcwd())

Output:

New York/
    New York_population.csv
    level1dir/
        level2dir/

As you can see, testdir has been deleted.

Let's try and delete the nested directory structure of level1dir and level2dir.

os.rmdir('level1dir')

Output:

OSError                                   Traceback (most recent call last)
<ipython-input-14-690e535bcf2c> in <module>()
----> 1 os.rmdir('level1dir')

OSError: [WinError 145] The directory is not empty: 'level1dir'

This raises an OSError, and rightly so: it says the level1dir directory is not empty, which is correct because level2dir is inside it.

Like the rmdir command on Unix, os.rmdir() cannot remove a non-empty directory.
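If you are not sure whether a directory is empty, you can check before calling rmdir() instead of letting it fail. A minimal sketch (remove_if_empty is a hypothetical helper, and it assumes the path exists) that removes the directory only when it is empty and otherwise returns what is still inside:

```python
import os

def remove_if_empty(path):
    """Remove path if it is an empty directory; otherwise return its sorted contents."""
    contents = os.listdir(path)
    if contents:
        # os.rmdir would raise OSError here, so report what blocks the removal
        return sorted(contents)
    os.rmdir(path)
    return []
```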

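The counterpart to makedirs() is removedirs(). It removes the leaf directory (level2dir) and then works back up the path, removing each parent directory named in it (here, level1dir) as long as that parent is empty.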

os.removedirs('level1dir/level2dir')

Let's see the directory tree structure again:

list_files(os.getcwd())

Output:

New York/
    New York_population.csv

This brings the directory back to its original state.
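One detail worth knowing: removedirs() keeps climbing through every parent named in the path until a removal fails, which normally happens at the first non-empty directory. With an absolute path, that can include folders you did not create. The sketch below builds a throwaway tree in a temporary directory and keeps a file in the top folder so that removedirs() stops there:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # build a/b/c, plus a file in a so that a is NOT empty
    os.makedirs(os.path.join(tmp, "a", "b", "c"))
    open(os.path.join(tmp, "a", "keep.txt"), "w").close()

    # removes c, then b (now empty), then stops at a because it still holds keep.txt
    os.removedirs(os.path.join(tmp, "a", "b", "c"))

    b_exists = os.path.exists(os.path.join(tmp, "a", "b"))
    a_exists = os.path.exists(os.path.join(tmp, "a"))

print(b_exists, a_exists)  # prints: False True
```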

Example with Data Processing

So far we have explored how to view, create, and remove a nested directory structure. Now let's see an example of how the os module helps in data processing.

For that, let's go one level up in the directory structure.

os.chdir('../')

With that, let's again view the directory tree structure.

list_files(os.getcwd())

Output:

Population_Data/
    Alabama/
        Alabama_population.csv
    Alaska/
        Alaska_population.csv
    Arizona/
        Arizona_population.csv
    Arkansas/
        Arkansas_population.csv
    California/
        California_population.csv
    Colorado/
        Colorado_population.csv
    Connecticut/
        Connecticut_population.csv
    Delaware/
        Delaware_population.csv
...

Note: The output has been truncated for brevity.

Let's merge the data from all of the states by iterating over each state's folder and combining the CSV files it contains.

import os
import pandas as pd

# create a list to hold the data from each state
list_states = []

# walk every folder under the current directory (Population_Data)
# and read each CSV file found there into a DataFrame
for root, dirs, files in os.walk(os.getcwd()):
    for f in files:
        if f.endswith('.csv'):
            list_states.append(pd.read_csv(os.path.join(root, f), index_col=None))

# merge all the DataFrames into a single DataFrame using pandas;
# ignore_index=True renumbers the rows instead of repeating each file's index
merge_data = pd.concat(list_states, ignore_index=True, sort=False)

Thanks in part to the os module, we were able to create merge_data, a pandas DataFrame containing the population data from every state.
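If you want to see which files the loop will read before loading anything into pandas, the os part can be separated out. A sketch of a hypothetical find_csv_files helper that returns the paths of all CSV files under a folder, in alphabetical order:

```python
import os

def find_csv_files(startpath):
    """Return the full paths of all .csv files under startpath, in a stable order."""
    paths = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # sort in place so os.walk visits subfolders alphabetically
        for name in sorted(files):
            # compare the extension case-insensitively (.csv or .CSV)
            if os.path.splitext(name)[1].lower() == ".csv":
                paths.append(os.path.join(root, name))
    return paths
```

Assuming pandas is installed, the merge could then be written as pd.concat([pd.read_csv(p) for p in find_csv_files(os.getcwd())], ignore_index=True).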

Conclusion

In this article, we briefly explored several capabilities of Python's built-in os module. We also saw a short example of how the module can be used in data science and analytics. Keep in mind that os has much more to offer, and you can build considerably more complex logic with it as your needs require.