Introduction
Modules are the highest level organizational unit in Python. If you're at least a little familiar with Python, you've probably not only used ready modules, but also created a few yourself. So what exactly is a module? Modules are units that store code and data, provide code-reuse to Python projects, and are also useful in partitioning the system's namespaces in self-contained packages. They're self-contained because you can only access a module's attributes after importing it. You can also understand them as packages of names, which when imported become attributes of the imported module object. In fact, any Python file with a .py extension represents a module.
In this article we start from the core basics of module creation and importing, to more advanced module usage cases, to packaging and submitting your modules to an "official" Python software repository, structured respectively into three parts: Creating a Module, Using a Module, and Submitting a Package to PyPI.
Creating a Module
The Basics
There's really not much philosophy in creating a Python module, since any file with a .py suffix represents a module. However, not every Python file is designed to be imported as a module. Python files meant to run as stand-alone apps (top-level files) are usually designed as scripts, and importing them would actually run the commands in the script.
Modules designed to be imported by other code typically perform no visible actions when imported; they only define their top-level names, which become attributes of the imported module object. It is also possible to design dual-mode Python modules which can be used both ways: imported as a module or run as a top-level script.
While module creation rules are pretty relaxed, there is one rule on module naming. Since module filenames become variable names in Python when imported, modules must not be named after Python reserved words. For example, a for.py module can be created, but it cannot be imported with a regular import statement because "for" is a reserved word. Let's illustrate what we've mentioned so far in a "Hello world!" example.
# Module file: my_module.py
def hello_printer():
    print("Hello world!")
name = "John"
# Script file: my_script.py
import my_module
my_module.hello_printer()
print("Creator:", my_module.name)
The 'my_module.py' is designed as a module whose code can be imported and reused in other Python files. You can see that by its content: it doesn't call for any action, just defines functions and variables. In contrast, the 'my_script.py' is designed as a top-level script which runs the Python program - it explicitly calls a function hello_printer and prints a variable's value to the screen.
Let's run the 'my_script.py' file in the terminal:
$ python my_script.py
Hello world!
Creator: John
As noted before, an important takeaway from this first basic example is that module filenames are important. Once imported they become variables/objects in the importer module. All top-level code definitions within a module become attributes of that variable.
By 'top-level' I mean any function or variable which is not nested inside another function or class. These attributes can then be accessed using the standard <object>.<attribute> syntax in Python.
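To make these two points concrete, here is a minimal, self-contained sketch. It writes a throwaway module to a temporary directory, imports it, and checks which names show up as attributes. It also checks candidate module names against Python's reserved words with the standard keyword module. The names demo_module, greet and is_valid_module_name are made up for this illustration and do not appear elsewhere in the article.

```python
import importlib
import keyword
import sys
import tempfile
from pathlib import Path


def is_valid_module_name(name):
    """Return True if `name` can be used in a regular `import name` statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical module source, written to disk only for this demonstration.
SOURCE = '''
name = "John"

def greet():
    helper = "local to greet(), not a module attribute"
    return "Hello world!"
'''

module_dir = tempfile.mkdtemp()
Path(module_dir, "demo_module.py").write_text(SOURCE)
sys.path.insert(0, module_dir)     # make the temporary directory importable
importlib.invalidate_caches()      # make sure the freshly written file is seen

import demo_module

print(is_valid_module_name("my_module"))   # an ordinary identifier
print(is_valid_module_name("for"))         # a reserved word
print(hasattr(demo_module, "name"))        # top-level variable
print(hasattr(demo_module, "greet"))       # top-level function
print(hasattr(demo_module, "helper"))      # nested inside greet()
```

Only name and greet are reachable as demo_module.<attribute>; helper exists only while greet() is running.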
In the following sections we first look at the "big picture" of multi-file Python programs, and then at "dual mode" Python files.
Program Architecture
Any non-trivial Python program is organized in multiple files, connected with each other using imports. Python, like most other programming languages, uses this modular program structure, where functionality is grouped into reusable units. In general, we can distinguish three types of files in a multi-file Python application:
- top-level file: A Python file, or script, which is the main entry point of the program. This file is run to launch your application.
- user-defined modules: Python files which are imported into the top-level file, or among each other, and provide separate functionalities. These files are usually not launched directly from your command prompt, and are custom-made for the purpose of the project.
- standard library modules: Pre-coded modules which are built into the Python installation package, such as platform-independent tools for system interfaces, Internet scripting, GUI construction, and others. These modules are not part of the Python executable itself, but part of the standard Python library.
Figure 1 shows an example program structure with the three file types:
Figure 1: An example program structure including a top-level script, custom modules, and standard library modules.
In this figure, the module 'top_module.py' is a top-level Python file which imports tools defined in module 'module1', but also has access to tools in 'module2' through 'module1'. The two custom modules use each other's resources, as well as other modules from the standard Python library. The importing chain can go as deep as you want: there's no limit on the number of imported files, and they can import each other, although you need to be careful with circular importing.
Let's illustrate this through a code example:
# top_module.py
import module1
module1.print_parameters()
print(module1.combinations(5, 2))
# module1.py
from module2 import k, print_parameters
from math import factorial
n = 5.0
def combinations(n, k):
    return factorial(n) / factorial(k) / factorial(n-k)
# module2.py
import module1
k = 2.0
def print_parameters():
    print('k = %.f n = %.f' % (k, module1.n))
In the above example, 'top_module.py' is a top-level module that will be run by the user, and it imports tools from other modules through 'module1.py'. module1 and module2 are user-defined modules, while the 'math' module is imported from the standard Python library. When running the top-level script, we get:
$ python top_module.py
k = 2 n = 5
10.0
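The earlier caution about circular imports deserves a concrete illustration. The example above works because module2 only touches module1.n when print_parameters() is called, by which time module1 has finished loading. The following sketch uses two hypothetical modules, circ_a and circ_b, created in a temporary directory, to show the failing variant where each module needs a name from the other at import time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
# circ_a needs `y` from circ_b, while circ_b needs `x` from circ_a.
Path(tmp, "circ_a.py").write_text("from circ_b import y\nx = 1\n")
Path(tmp, "circ_b.py").write_text("from circ_a import x\ny = 2\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

try:
    import circ_a
    outcome = "imported"
except ImportError as exc:
    # circ_b asked for circ_a.x before circ_a had a chance to define it.
    outcome = "ImportError"
    print(type(exc).__name__, "-", exc)

print(outcome)
```

Deferring the attribute access until call time, as module2 does with module1.n, is one common way to keep such a cycle harmless.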
When a top-level Python file is run, its source code statements, and the statements within imported modules, are compiled into an intermediate, platform-independent format known as byte code. Byte code files of imported modules are stored with a .pyc extension in the same directory as the .py file for Python versions before 3.2, and in a __pycache__ subdirectory next to the source files in Python 3.2+.
$ ls __pycache__/
module1.cpython-36.pyc module2.cpython-36.pyc
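If you want to see where the byte code for a given source file ends up without hunting through directories, the standard library can tell you. The sketch below is an illustration, not part of the original example: it creates a stand-in 'module1.py' in a temporary directory, asks importlib.util.cache_from_source for the cache location, and compiles the file explicitly with py_compile.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
source = Path(tmp, "module1.py")      # stand-in file for this demonstration
source.write_text("n = 5.0\n")

# Where the import system would cache the byte code for this source file.
expected = importlib.util.cache_from_source(str(source))

# Compile explicitly instead of waiting for an import to do it.
compiled = py_compile.compile(str(source))

print(Path(expected).parent.name)     # name of the cache directory
print(Path(compiled).name)            # file name includes the interpreter tag
```

The interpreter tag in the file name (cpython-36 in the listing above) depends on the Python version that did the compiling.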
Dual-Mode Code
As mentioned earlier, Python files can also be designed as both importable modules and top-level scripts. That is, when run, the Python module will run as a stand-alone program, and when imported, it will act as an importable module containing code definitions.
This is easily done using the attribute __name__, which is automatically built into every module. If the module is run as a top-level script, the __name__ attribute will equal the string "__main__"; otherwise, if imported, it will contain the name of the actual module.
Here's an example of dual-mode code:
# hiprinter.py
# Name definitions part
multiply = 3
def print_hi():
    print("Hi!" * multiply)
# Stand-alone script part
if __name__ == '__main__':
    print_hi()
The above 'hiprinter.py' file defines a function, which will be exposed to the client when it's imported. If the file is run as a stand-alone program, the same function is called automatically. The difference here, compared with the 'my_script.py' example in Section The Basics, is that when 'hiprinter.py' is imported it won't run the code nested under the if __name__ == '__main__' statement.
# Terminal window
$ python hiprinter.py
Hi!Hi!Hi!
# Python interpreter
>>> import hiprinter
>>> hiprinter.print_hi()
Hi!Hi!Hi!
Dual-mode code is very common in practice, and especially useful for unit-testing: while variables and functions are defined as top-level names in the file, the part inside the if statement can serve as a testing area for the names defined above it.
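To watch the value of __name__ change between the two modes without switching between a terminal and an interpreter, both can be driven from one script. This is a small sketch under stated assumptions: the file whoami_demo.py is invented for the demonstration, runpy.run_path stands in for running "python whoami_demo.py", and importlib.import_module stands in for an import statement.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
path = Path(tmp, "whoami_demo.py")
# The module records how it was loaded by inspecting __name__.
path.write_text('mode = "script" if __name__ == "__main__" else "module"\n')

# 1) Execute the file the way a top-level script would be executed.
as_script = runpy.run_path(str(path), run_name="__main__")

# 2) Import the very same file as a regular module.
sys.path.insert(0, tmp)
importlib.invalidate_caches()
as_module = importlib.import_module("whoami_demo")

print(as_script["mode"])
print(as_module.mode, as_module.__name__)
```

run_path returns the globals of the executed file as a dictionary, which is why the first result is read with square brackets and the second as an attribute.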
Using a Module
Import Statements
The example in Section Program Architecture was useful to look at the difference between two importing statements: import and from. The main difference is that import binds the entire module as a single object, while from binds only specific names (variables, functions, classes) from the module. Names imported with the from statement can then be used directly in the importer module, without going through the imported module's name.
A plain from statement can appear anywhere an import statement can, including inside a function. Only the wildcard form, from module import * (described next), is restricted: Python 3.x allows it only at the top level of a module file, while Python 2.x allowed it inside a function but issued a warning. Performance-wise, the from statement does slightly more work than import, because it does everything import does (running all of the imported module's code on first import) and then takes an extra step to copy the selected names into the importer. In practice the difference is usually negligible.
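The difference between the two statements is easy to check with a standard library module. The following sketch uses math purely as a convenient example; the area function is made up for the illustration.

```python
import sys
import math                    # binds the module object to the name `math`
from math import factorial     # binds only `factorial` in this namespace


def area(radius):
    # A plain `from` import is also legal inside a function body.
    from math import pi
    return pi * radius ** 2


# Both forms load the whole module once; `from` merely copies references.
print("math" in sys.modules)
print(factorial is math.factorial)

# Rebinding the copied name does not touch the module's own attribute.
factorial = None
print(math.factorial(5))
print(round(area(1.0), 5))
```

Because from only copies a reference, later changes to the name in one place are not reflected in the other.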
There's also a third import statement, from *, which is used to import all public top-level names (by default, those not starting with an underscore) from the imported module and use them directly in the importer module. For example we could have used:
from module2 import *
This would import all names (variables and functions) from the module2.py file. This approach is not recommended because of possible name duplication - the imported names could overwrite already existing names in the importer module.
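The overwriting risk can be demonstrated in a few lines. In this sketch, wild_demo is a hypothetical stand-in for module2, and a plain dictionary plays the role of the importer module's namespace so that the effect of the wildcard import can be inspected.

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
Path(tmp, "wild_demo.py").write_text(
    "k = 2.0\n"
    "_private = 'skipped by the wildcard'\n"
    "def print_parameters():\n"
    "    return 'k = %.f' % k\n"
)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Simulate an importer module that already defines its own `k`.
importer_namespace = {"k": "defined by the importer"}
exec("from wild_demo import *", importer_namespace)

print(importer_namespace["k"])                    # silently replaced
print("print_parameters" in importer_namespace)   # copied by the wildcard
print("_private" in importer_namespace)           # underscore names are skipped
```

Listing the wanted names explicitly, as module1.py does with "from module2 import k, print_parameters", keeps such replacements visible in the source.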
Module Search Path
One important aspect when writing modular Python apps is locating the modules that need to be imported. While modules of the standard Python library are configured to be globally accessible, importing user-defined modules across directory boundaries can get more complicated.
Python uses a list of directories in which it looks for modules, known as the search path. The search path is composed of directories found in the following:
- Program's home directory. The location of the top-level script. Note that the home directory may not be the same as the current working directory.
- PYTHONPATH directories. If set, the PYTHONPATH environment variable defines a concatenation of user-defined directories where the Python interpreter should look for modules.
- Standard library directories. These directories are automatically set with the installation of Python, and are always searched.
- Directories listed in .pth files. This option is an alternative to PYTHONPATH, and it works by adding your directories, one per line, in a text file with suffix .pth, which should be placed in the Python install directory, which usually is /usr/local/lib/python3.6/ on a Unix machine or C:\Python36\ on a Windows machine.
- The site-packages directory. This directory is where all the third-party extensions are automatically added.
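The components listed above can be inspected from within Python. The sketch below is one way to do it, assuming a standard CPython installation; it reads the standard library and site-packages locations from sysconfig rather than guessing them from a version number.

```python
import os
import sys
import sysconfig

paths = sysconfig.get_paths()

# sys.path[0] is usually the home directory of the running script
# (or an empty string in an interactive session).
print("home directory   :", repr(sys.path[0]))
print("PYTHONPATH       :", os.environ.get("PYTHONPATH", "<not set>"))
print("standard library :", paths["stdlib"])
print("site-packages    :", paths["purelib"])
```

The exact values differ between machines, installations and virtual environments.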
PYTHONPATH is probably the most suitable way for developers to include their custom modules in the search path. You can easily check if the variable is set on your computer, which in my case results in:
$ echo $PYTHONPATH
/Users/Code/Projects/:
To create the variable on a Windows machine you should use the instructions in "Control Panel -> System -> Advanced", while on macOS and other Unix systems it's easiest to append the following line to either the ~/.bashrc or ~/.bash_profile file, where your directories are concatenated with a colon (":") sign.
export PYTHONPATH="<Directory1:Directory2:...:DirectoryN>:$PYTHONPATH"
This method is very similar to adding directories to your Unix $PATH.
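To confirm that the variable has the intended effect without editing any shell profile, a child interpreter can be started with PYTHONPATH set just for that one process. This is a sketch for experimentation; the two directories are temporary placeholders, not real project folders.

```python
import json
import os
import subprocess
import sys
import tempfile

# Two placeholder directories, created only for this demonstration.
dir1 = tempfile.mkdtemp(prefix="projects_")
dir2 = tempfile.mkdtemp(prefix="blogs_")

# os.pathsep is ":" on Unix-like systems and ";" on Windows.
env = dict(os.environ, PYTHONPATH=os.pathsep.join([dir1, dir2]))

# Ask a fresh interpreter, started with that environment, for its sys.path.
result = subprocess.run(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
    env=env, capture_output=True, text=True, check=True,
)
child_path = [os.path.realpath(p) for p in json.loads(result.stdout)]

first, second = os.path.realpath(dir1), os.path.realpath(dir2)
print(first in child_path and second in child_path)
print(child_path.index(first) < child_path.index(second))   # order is kept
```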
Once all directories are found in the search path during the program startup, they are stored in a list which can be explored with sys.path in Python. Of course, you could also append a directory to sys.path, and then import your modules, which will only modify the search path during the execution of the program.
In contrast, the PYTHONPATH and .pth options allow more permanent modification of the search path. It is important to know that Python scans the search path from left to right, thus modules within the left-most listed directories take precedence over (shadow) ones with the same name further to the right. Note that the module search paths are needed only for importing modules across different directories.
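The left-to-right rule can be verified with two directories that both contain a module of the same name. In this sketch the module name shadow_demo and both directories are invented for the demonstration, and the search path is modified at runtime through sys.path, so the change lasts only for the current process.

```python
import importlib
import sys
import tempfile
from pathlib import Path

first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
# The same (hypothetical) module name exists in both directories.
Path(first, "shadow_demo.py").write_text("origin = 'first directory'\n")
Path(second, "shadow_demo.py").write_text("origin = 'second directory'\n")

# Put both directories at the front of the search path, `first` before `second`.
sys.path[:0] = [first, second]
importlib.invalidate_caches()

import shadow_demo

print(shadow_demo.origin)      # the left-most match wins
print(shadow_demo.__file__)    # path of the file that was actually loaded
```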
As shown in the following example, the empty string at the front of the list is for the current directory:
>>> import sys
>>> sys.path
['',
'/Users/Code/Projects',
'/Users/Code/Projects/Blogs',
'/Users/Code/anaconda3/lib/python36.zip',
'/Users/Code/anaconda3/lib/python3.6',
'/Users/Code/anaconda3/lib/python3.6/site-packages',
'/Users/Code/anaconda3/lib/python3.6/site-packages/IPython/extensions',
'/Users/Code/.ipython']
As a bottom line, organizing your Python program in multiple interconnected modules is fairly straightforward if your program is well-structured: in self-contained, naturally grouped code portions. In more complex, or not-so-well structured programs, importing can become a burden and you'll need to tackle more advanced importing topics.
Module Reloads
Thanks to caching, a module's code is executed only once per process. Since Python is an interpreted language, it runs the imported module's code once it reaches an import or from statement. Later imports within the same process (for example, the same Python interpreter session) won't run the imported module's code again. They just retrieve the module from the cache.
Here's an example. Let's reuse the above code in 'my_module.py', import it in a Python interpreter, then modify the file, and import it again.
>>> import my_module
>>> print(my_module.name)
John
# Now modify the 'name' variable in 'my_module.py' into name = 'Jack' and reimport the module
>>> import my_module
>>> print(my_module.name)
John
To bypass the cache and re-execute a module's code, Python provides a reload function. Let's try it in the same Python window as earlier:
>>> from importlib import reload # Python 3.4+ (the imp module was removed in 3.12)
>>> reload(my_module)
<module 'my_module' from '/Users/Code/Projects/small_example/my_module.py'>
>>> print(my_module.name)
Jack
The reload function modifies the module in place: the existing module object is kept and its code is re-executed, so every reference to the module sees the updated definitions. Names previously copied out of the module with a from statement are not rebound, however. You may notice that the function also returns the module itself, giving its name and file path. This feature is especially useful in the development phase, but also in larger projects.
For example, for programs which need always-on connectivity to a server, restarting the whole application is much more costly than doing a dynamic reload. Hot-reloading is also handy during development.
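The whole import, edit, re-import and reload cycle can also be reproduced in a single script. The sketch below makes a few assumptions for the sake of a fast, repeatable demo: the module reload_demo lives in a temporary directory, and byte code caching is switched off so that an edit made within the same second is never masked by a stale .pyc file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Precaution for this fast-running demo: never write .pyc files.
sys.dont_write_bytecode = True

tmp = tempfile.mkdtemp()
source = Path(tmp, "reload_demo.py")
source.write_text("name = 'John'\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

import reload_demo
from reload_demo import name as copied_name

# Edit the file on disk, then import again: the cached module is returned.
source.write_text("name = 'Jackson'\n")
import reload_demo
before = reload_demo.name

# reload() re-executes the source inside the existing module object.
same_object = importlib.reload(reload_demo) is reload_demo
after = reload_demo.name

print(before, after, same_object)
print(copied_name)    # names copied earlier with `from` are not rebound
```

The last line is the usual source of surprises with reload: code that did "from module import name" keeps the old object until it runs the from statement again.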
Module Packages
When importing module names, you actually load Python files stored somewhere in your file system. As mentioned earlier, the imported modules must reside in a directory, which is listed in your module search path (sys.path). In Python there's more than these "name imports" - you can actually import a whole directory containing Python files as a module package. These imports are known as package imports.
So how do you import module packages? Let's create a directory named 'mydir' which includes a 'mod0.py' module and two subdirectories 'subdir1' and 'subdir2', containing the 'mod1.py' and 'mod2.py' modules respectively. The directory structure looks like this:
$ ls -R mydir/
mod0.py subdir1 subdir2
mydir//subdir1:
mod1.py
mydir//subdir2:
mod2.py
The usual approach explained so far was to add the 'mydir', 'subdir1', and 'subdir2' paths to the module search path (sys.path), in order to be able to import 'mod0.py', 'mod1.py' and 'mod2.py'. This could become a big overhead if your modules are spread across many different subdirectories, which is usually the case. Package imports are here to help: they work by importing the name of the folder itself.
This command, for example, is not permitted, and will result in a SyntaxError:
>>> import /Users/Code/Projects/mydir/
File "<stdin>", line 1
import /Users/Code/Projects/mydir/
^
SyntaxError: invalid syntax
The right way to do it is to set only the container directory '/Users/Code/Projects/' in your module search path (adding it to the PYTHONPATH environment variable or listing it in a .pth file) and then import your modules using the dotted syntax. These are some valid imports:
>>> import mydir.mod0
>>> import mydir.subdir1.mod1 as mod1
>>> from mydir.subdir2.mod2 import print_name # print_name is a name defined within mod2.py
You've probably noticed previously that some Python directories include an __init__.py file. This was actually a requirement in Python 2.x in order to tell Python that your directory is a module package. The __init__.py file is also a normal Python file which runs the first time that directory is imported, and is suitable for initializing values, e.g. for making the connection to a database.
In most cases, though, these files are just left empty. In Python 3.x (since version 3.3) these files are optional, and you can use them if needed. The next few lines show how names defined in __init__.py become attributes of the imported object (the name of the directory containing it).
# __init__.py file in mydir/subdir1/ with code:
param = "init subdir1"
print(param)
# Import it from a Python interpreter
>>> import mydir.subdir1.mod1
init subdir1
# param is also accessible as an attribute to mydir.subdir1 object
>>> print(mydir.subdir1.param)
init subdir1
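The package layout and the role of __init__.py can be reproduced end to end in a temporary directory. In the sketch below, the temporary root plays the role of the container directory '/Users/Code/Projects/', and the contents of mod1.py (a print_name function) are a placeholder chosen for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())      # stands in for the container directory
subdir1 = root / "mydir" / "subdir1"
subdir1.mkdir(parents=True)

(root / "mydir" / "__init__.py").write_text("")
(subdir1 / "__init__.py").write_text("param = 'init subdir1'\n")
(subdir1 / "mod1.py").write_text("def print_name():\n    return 'mod1'\n")

sys.path.insert(0, str(root))        # only the container goes on the path
importlib.invalidate_caches()

import mydir.subdir1.mod1 as mod1
import mydir

print(mod1.print_name())
print(mydir.subdir1.param)           # set by subdir1/__init__.py
print(mydir.subdir1.mod1 is mod1)    # submodules hang off their package
```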
Another important topic when talking about module packages is relative imports. Relative imports are useful when importing modules within the package itself. In this case Python will look for the imported module within the scope of the package and not in the module search path.
We'll demonstrate one useful case with an example:
# mydir/subdir1/mod1.py
import mod2
# In Python interpreter:
>>> import mydir.subdir1.mod1
ModuleNotFoundError: No module named 'mod2'
The import mod2 line tells Python to search for module 'mod2' in the module search path, and therefore it's unsuccessful. Instead, a relative import will work just fine. The following relative import statement uses a double dot ("..") which denotes the parent of the current package ('mydir/'). The following subdir2 must be included to create a full relative path to the mod2 module.
# mydir/subdir1/mod1.py
from ..subdir2 import mod2
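Both variants can be tried side by side. The following sketch builds a small package in a temporary directory; the package name relpkg is a hypothetical stand-in for 'mydir', and broken.py is an extra file added here only to hold the failing absolute import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "relpkg"                # hypothetical stand-in for 'mydir'
for sub in ("subdir1", "subdir2"):
    (pkg / sub).mkdir(parents=True)
    (pkg / sub / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")

(pkg / "subdir2" / "mod2.py").write_text("name = 'mod2'\n")
# Absolute form: looks for a top-level module called mod2 on sys.path.
(pkg / "subdir1" / "broken.py").write_text("import mod2\n")
# Relative form: '..' climbs from relpkg.subdir1 to relpkg, then into subdir2.
(pkg / "subdir1" / "mod1.py").write_text("from ..subdir2 import mod2\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("relpkg.subdir1.broken")
    absolute_result = "imported"
except ModuleNotFoundError as exc:
    absolute_result = str(exc)

mod1 = importlib.import_module("relpkg.subdir1.mod1")

print(absolute_result)
print(mod1.mod2.name)
```

Note that relative imports only work when the file is loaded as part of its package; running mod1.py directly as a script would fail because there is no parent package to resolve ".." against.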
Relative imports are a huge topic and could take up an entire book chapter. They also differ considerably between Python 2.x and 3.x versions. For now, we've only shown one useful case, but there should be more to follow in separate blog posts.
And speaking of Python 2.x, support for this version ended in 2020, so in cases where there is a big difference between Python versions, like in relative imports, it's better to focus on the 3.x version.
Submitting a Package to PyPI
So far you've learned how to write Python modules, distinguish between importable modules and top-level ones, use user-defined modules across directory boundaries, amend the module search path, and create/import module packages, among other things. Once you've created a useful piece of software, packed in a module package, you might want to share it with the larger Python community. After all, Python is built and maintained by the community.
The Python Package Index (PyPI) is a software repository for Python, currently holding over 120K packages (as of the time of this writing). You might have installed modules before from this repository using the pip command.
For example, the following line will download and install the NumPy library for scientific computing:
$ pip install numpy
There's more information on installing packages with pip here. But how do you contribute your own package? Here are a few steps to help you with it.
- First, satisfy the requirements for packaging and distributing. There are two steps needed here:
$ pip install twine
- The next step is to configure your project. In general this means adding a few Python files to your project that will hold the configuration information, guides for usage, etc. PyPI provides an example sample project which you can use as a guide. Here are the most important files you need to add:
- setup.py: This file needs to be added to the root of your project, and serves as an installation command line interface. It must contain a call to the setup() function, which accepts as arguments information such as: project name, version, description, license, project dependencies, etc.
- README.rst: A text file describing your package.
- licence.txt: A text file containing your software licence. More information on choosing a license, via GitHub.
- Package your project. The most used package type is 'wheel', although you could also provide the minimum requirement as 'source distribution/package'. Here you need to use the 'setup.py' file from the previous step. Running one of the following commands will create a 'dist/' directory in the root of your project, which contains the files to upload to PyPI.
# Package as source distribution
$ python setup.py sdist
# Package as wheel supporting a single Python version
$ python setup.py bdist_wheel
- The final step is uploading your distribution to PyPI. Basically there are two steps here:
- Create a PyPI account.
- Upload the contents of the 'dist/' directory created in the previous step. Here you might want to upload a test first using the PyPI Test Site.
$ twine upload dist/*
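For orientation, the setup.py file from the configuration step could look roughly like the sketch below. Every value in it is a placeholder: the project name, version, description and licence are invented for the illustration, and a real project would normally add more metadata (author, URL, classifiers, and so on).

```python
# setup.py: a minimal sketch; every value below is a placeholder.
from setuptools import find_packages, setup

setup(
    name="example-package",          # hypothetical project name
    version="0.1.0",
    description="A short summary of what the package does",
    license="MIT",                   # should match the licence file
    packages=find_packages(),        # discovers directories with __init__.py
    install_requires=[],             # runtime dependencies, if any
)
```

With such a file in the project root, the sdist and bdist_wheel commands shown above have the metadata they need to build the files in 'dist/'.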
That's pretty much it. For more information, the PyPI website has all the detailed instructions if you get stuck.
Conclusion
This post was intended to guide you from the core basics of Python modules (creating and importing your first importable modules), to a bit more advanced topics (amending the search path, module packages, reloads and some basic relative imports), to submitting your Python package to the Python software repository PyPI.
There is a lot of information on this topic and we weren't able to cover everything in this one post, so you may not be able to tackle all these steps and submit an official package within the reading time of this post. However, each step should be a brief introduction to guide you on your learning path.