In this article we go over the Python file types .pyc, .pyo and .pyd, and how they're used to store compiled code that will be imported by other Python programs.
You have probably written Python code in .py files, but you may want to know what these other file types do and where they come into use. To understand them, we will look at how Python transforms the code you write into instructions that can actually be executed.
Bytecode and the Python Virtual Machine
Python ships with an interpreter that can be used interactively on the command line as a REPL (read-eval-print loop). Alternatively, you can invoke Python with scripts of Python code. In both cases, the interpreter parses your input and then compiles it into bytecode (lower-level instructions aimed at a virtual machine rather than at the physical processor), which is then executed by a "Pythonic representation" of the computer. This Pythonic representation is called the Python virtual machine.
The Python virtual machine differs enough from other virtual machines, like the Java virtual machine or the Erlang virtual machine, that it deserves its own study. The virtual machine, in turn, interfaces with the operating system and actual hardware to execute native machine instructions.
The critical thing to keep in mind when you see .pyc and .pyo files is that they are created by the Python interpreter when it transforms source code into compiled bytecode. A .pyd file is the exception: as we'll see, it holds compiled native code rather than bytecode. Compilation of Python source into bytecode is a necessary intermediate step in the process of translating instructions from human-readable source code into machine instructions that your computer can execute.
With this background on the Python virtual machine and Python bytecode in place, we'll take a look at each file type in isolation.
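To make the idea of bytecode more concrete, here is a minimal sketch that uses the standard library's dis module to show the instructions the interpreter compiles a small function into. The function double is a hypothetical example, not part of the modules used later in this article, and the exact instruction names printed vary between Python versions.

```python
import dis


def double(x):
    """Return twice the argument."""
    return x * 2


# Print a human-readable listing of the bytecode compiled for double().
dis.dis(double)

# The same instructions are also available programmatically.
opnames = [instr.opname for instr in dis.get_instructions(double)]

# The raw bytecode itself is just a bytes object attached to the function.
raw_bytecode = double.__code__.co_code
```

It is this compiled form, rather than the source text, that gets saved in the bytecode files discussed in the rest of the article.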
The .pyc File Type
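We consider first the .pyc file type. Files of type .pyc are automatically generated by the interpreter when you import a module, which speeds up future importing of that module. These files are therefore only created from a .py file if it is imported by another .py file or module (or compiled explicitly, for example with the compileall module). In Python 2 the .pyc file is written next to the source file. Since Python 3.2 it is written to a __pycache__ subdirectory, with the interpreter version in its name, for example math_helpers.cpython-311.pyc.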
Here is an example Python module which we want to import. This module calculates factorials.
    # math_helpers.py

    # a function that computes the nth factorial, e.g. factorial(2)
    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)

    # a main function that uses our factorial function defined above
    def main():
        print("I am the factorial helper")
        print("you can call factorial(number) where number is any integer")
        print("for example, calling factorial(5) gives the result:")
        print(factorial(5))

    # this runs when the script is called from the command line
    if __name__ == '__main__':
        main()

Now, when you just run this module from the command line, using python math_helpers.py, no .pyc files get created.
Let's now import this in another module, as shown below. We are importing the factorial function from the math_helpers.py file and using it to compute the factorial of 6.
    # computations.py

    # import from the math_helpers module
    from math_helpers import factorial

    # a function that makes use of the imported function
    def main():
        print("Python can compute things easily from the REPL")
        print("for example, just write : 4 * 5")
        print("and you get: 20.")
        print("Computing things is easier when you use helpers")
        print("Here we use the factorial helper to find the factorial of 6")
        print(factorial(6))

    # this runs when the script is called from the command line
    if __name__ == '__main__':
        main()

We can run this script by invoking python computations.py at the terminal. Not only do we get the result of 6 factorial, i.e. 720, but we also notice that the interpreter automatically creates a math_helpers.pyc file (under Python 3, a version-tagged .pyc file inside the __pycache__ directory). This happens because the computations module imports the math_helpers module. To speed up the loading of the imported module in the future, the interpreter creates a bytecode file of the module.
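If you want to check where the bytecode file ends up without hunting through directories, the following sketch may help. It assumes Python 3, writes a throwaway copy of a factorial module into a temporary directory, and uses py_compile to produce the same kind of file an import would create. The shortened module body is an illustration only.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Write a small stand-in for math_helpers.py.
    source_path = os.path.join(workdir, "math_helpers.py")
    with open(source_path, "w") as f:
        f.write("def factorial(n):\n    return 1 if n == 0 else n * factorial(n - 1)\n")

    # Ask the import machinery where it would cache this module's bytecode.
    expected_pyc = importlib.util.cache_from_source(source_path)

    # Compile explicitly; importing the module would write a similar file.
    written_pyc = py_compile.compile(source_path, cfile=expected_pyc)

    pyc_exists = os.path.exists(written_pyc)
    print(written_pyc)
```

The printed path normally ends in something like __pycache__/math_helpers.cpython-XY.pyc, where XY depends on the interpreter version you are running.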
When the source code file is updated, the .pyc file is regenerated the next time the module is imported. This happens whenever the modification time recorded in the bytecode file differs from that of the source code file, and it ensures that the bytecode is up to date.
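The freshness check described above can be sketched by reading the header of a bytecode file directly. This example assumes Python 3.7 or later, where a timestamp-based .pyc file starts with a 16-byte header: a 4-byte magic number, 4 bytes of flags, the source modification time, and the source size. The helper read_pyc_header is a hypothetical name introduced for this illustration, and the real import system performs this comparison internally.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, size) from a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    flags, mtime, size = struct.unpack("<III", header[4:16])
    return magic, flags, mtime, size


with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "math_helpers.py")
    with open(source_path, "w") as f:
        f.write("def factorial(n):\n    return 1 if n == 0 else n * factorial(n - 1)\n")

    # Force the classic timestamp-based format so the header layout is known.
    pyc_path = py_compile.compile(
        source_path,
        cfile=os.path.join(workdir, "math_helpers.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )

    magic, flags, recorded_mtime, recorded_size = read_pyc_header(pyc_path)
    source_stat = os.stat(source_path)

    # The cached bytecode is considered fresh only if all of these match.
    is_fresh = (
        magic == importlib.util.MAGIC_NUMBER
        and recorded_mtime == int(source_stat.st_mtime) & 0xFFFFFFFF
        and recorded_size == source_stat.st_size & 0xFFFFFFFF
    )
    print("bytecode is fresh:", is_fresh)
```

If the source file were edited after compilation, its modification time or size would no longer match the recorded values, and the interpreter would recompile the module on the next import.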
Note that using .pyc files only speeds up the loading of your program, not the actual execution of it. What this means is that you can improve startup time by writing your main program in a module that gets imported by another, smaller module. To get performance improvements more generally, however, you'll need to look into techniques like algorithm optimization and algorithmic analysis.
Because .pyc files are platform independent, they can be shared across machines of different architectures. However, if developers have different clock times on their systems, checking .pyc files into source control can create timestamps that are effectively in the future for others' time readings. As such, updates to source code no longer trigger changes in the bytecode. This can be a nasty bug to discover. The best way to avoid it is to add .pyc files to the ignore list in your version control system.
The .pyo File Type
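The .pyo file type is also created by the interpreter when a module is imported. However, the .pyo file results from running the interpreter with optimization settings enabled. Note that .pyo files exist only up to Python 3.4. From Python 3.5 onward, optimized bytecode is stored in .pyc files whose names carry an optimization tag, such as lambdas.cpython-311.opt-1.pyc.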
The optimizer is enabled by adding the "-O" flag when we invoke the Python interpreter. Here is a code example to illustrate the use of optimization. First, we have a module that defines a lambda. In Python, a lambda is just like a function, but is defined more succinctly.
    # lambdas.py

    # a lambda that returns double whatever number we pass it
    g = lambda x: x * 2
If you remember from the previous example, we will need to import this module to make use of it. In the following code listing, we import lambdas.py and make use of the g lambda.
    # using_lambdas.py

    # import the lambdas module
    import lambdas

    # a main function in which we compute the double of 7
    def main():
        print(lambdas.g(7))

    # this executes when the module is invoked as a script at the command line
    if __name__ == '__main__':
        main()

Now we come to the critical part of this example. Instead of invoking Python normally as in the last example, we will make use of optimization here. Having the optimizer enabled can create smaller bytecode files than when not using the optimizer, because some statements are left out of the compiled output.
To run this example using the optimizer, invoke the command:
$ python -O using_lambdas.py
Not only do we get the correct result of doubling 7, i.e. 14, as output at the command line, but we also see that a new bytecode file is automatically created for us. This file is based on the importation of lambdas.py in the invocation of using_lambdas.py. Because we had the optimizer enabled, a .pyo bytecode file is created. In this case, it is named lambdas.pyo.
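On recent interpreters the file name will look different from lambdas.pyo. As a rough sketch, assuming Python 3.5 or later, you can ask importlib which cache file name corresponds to each optimization level. The optimization argument used here mirrors the -O (level 1) and -OO (level 2) command-line flags.

```python
import importlib.util

# Cache file name used when no optimization flag is given.
plain = importlib.util.cache_from_source("lambdas.py", optimization="")

# Cache file names used under -O and -OO respectively.
opt1 = importlib.util.cache_from_source("lambdas.py", optimization=1)
opt2 = importlib.util.cache_from_source("lambdas.py", optimization=2)

for name in (plain, opt1, opt2):
    print(name)
```

The optimized names end in .opt-1.pyc and .opt-2.pyc, which is how newer versions keep optimized and unoptimized bytecode side by side.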
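The optimizer, which doesn't do a whole lot, removes assert statements from your bytecode and sets the built-in __debug__ constant to False. The stronger "-OO" flag additionally strips docstrings. The result won't be noticeable in most cases, but there may be times when you need it.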
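The effect on assert statements can be observed without restarting the interpreter, because the built-in compile() function accepts an optimize argument that corresponds to the -O levels. The following is a minimal sketch; the function check and the helper load_check are hypothetical names used only for this illustration.

```python
SOURCE = (
    "def check(x):\n"
    "    assert x > 0, 'x must be positive'\n"
    "    return x\n"
)


def load_check(optimize):
    """Compile SOURCE at the given optimization level and return check()."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["check"]


# Level 0 keeps the assert, so a bad argument raises AssertionError.
try:
    load_check(0)(-1)
    unoptimized_raised = False
except AssertionError:
    unoptimized_raised = True

# Level 1 (the equivalent of -O) drops the assert, so the call just returns.
optimized_result = load_check(1)(-1)

print(unoptimized_raised, optimized_result)
```

This is also why assert should not be relied on for checks that must always run, such as validating user input.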
Also note that, since a .pyo bytecode file is created, it substitutes for the .pyc file that would have been created without optimization. When the source code file is updated, the .pyo file is updated whenever the update time for the source code differs from that of the bytecode file.
The .pyd File Type
The .pyd file type, in contrast to the preceding two, is platform-specific to the Windows class of operating systems. It may thus be commonly encountered on personal and enterprise editions of Windows 10, 8, 7 and others.
In the Windows ecosystem, a .pyd file is a library file containing compiled native code (typically written in C or C++) that exposes a Python module, which can be imported and used by other Python applications. In order to make this library available to other Python programs, it is packaged as a dynamic link library.
Dynamic link libraries (DLLs) are Windows code libraries that are linked to calling programs at run time. The main advantage of linking to libraries at run time like the DLLs is that it facilitates code reuse, modular architectures and faster program startup. As a result, DLLs provide a lot of functionality around the Windows operating systems.
A .pyd file is a dynamic link library that contains a Python module, or set of modules, to be called by other Python code. To create a .pyd file, you need to create a module named, for example, example.pyd. In this module, you will need to create a function named PyInit_example(). When programs use this library, they need to invoke import example, and the PyInit_example() function will run.
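Writing the extension itself requires C or C++, but you can inspect from Python which file suffixes your interpreter will accept for compiled extension modules. The sketch below uses importlib.machinery.EXTENSION_SUFFIXES. On Windows builds of CPython the list is expected to include .pyd, while on Linux or macOS you will typically see .so suffixes instead.

```python
import importlib.machinery
import sys

# File suffixes this interpreter recognizes for compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(sys.platform, suffixes)

on_windows = sys.platform == "win32"
accepts_pyd = any(suffix.endswith(".pyd") for suffix in suffixes)
print("accepts .pyd files:", accepts_pyd)
```

When you write import example, the import system searches for a file named example plus one of these suffixes, alongside the usual .py and .pyc candidates.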
For more information on creating your own Python .pyd files, check out this article.
Differences Between These File Types
While some similarities exist between these file types, there are also some big differences. For example, while the .pyc and .pyo files are similar in that they contain Python bytecode, they differ in that the .pyo files can be more compact, because the optimizer leaves assert statements (and optionally docstrings) out of them.
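As a rough illustration of the size difference, the sketch below compiles the same small source at each optimization level and compares the size of the serialized code object, which is the bulk of what a bytecode file stores. The source snippet is a hypothetical example, and the exact byte counts depend on your Python version.

```python
import marshal

SOURCE = '''
def check(x):
    """Return x after verifying that it is positive."""
    assert x > 0, "x must be positive"
    return x
'''

# Serialized size of the compiled module at each optimization level:
# 0 = no optimization, 1 = asserts removed, 2 = asserts and docstrings removed.
sizes = {
    level: len(marshal.dumps(compile(SOURCE, "<demo>", "exec", optimize=level)))
    for level in (0, 1, 2)
}
print(sizes)
```

For code without assert statements or docstrings, the optimized and unoptimized output would be essentially the same size.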
The third file type, the .pyd, differs from the previous two by being a dynamically-linked library to be used on the Windows operating system. The other two file types can be used on any operating system, not just Windows.
Each of these file types, however, involves code that is called and used by other Python programs.
In this article we described how each special file type, .pyc, .pyo, and .pyd, is used by Python for re-using code. Each file, as we saw, has its own special purposes and use-cases, whether it be to speed up module loading, to strip debugging checks such as assertions, or to facilitate code re-use on certain operating systems.