Introduction
Python is not necessarily known for its speed, but there are certain things that can help you squeeze out a bit more performance from your code. Surprisingly, one of these practices is running code in a function rather than in the global scope. In this article, we'll see why Python code runs faster in a function and how Python code execution works.
Python Code Execution
To understand why Python code runs faster in a function, we first need to understand how Python executes code. Python is an interpreted language, but it doesn't interpret your source text directly: when Python runs a script, it first compiles the code to bytecode, an intermediate representation that's closer to machine code, and then the Python interpreter executes that bytecode.
def hello_world():
    print("Hello, World!")

import dis
dis.dis(hello_world)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello, World!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
The dis module in Python disassembles the function hello_world into bytecode, as seen above.
Note: The Python interpreter is a virtual machine that executes the bytecode. The default Python interpreter is CPython, which is written in C. There are other Python interpreters like Jython (written in Java), IronPython (for .NET), and PyPy (written in RPython, a restricted subset of Python), but CPython is the most commonly used.
Why Python Code Runs Faster in a Function
Consider a simplified example with a loop that iterates over a range of numbers:
def my_function():
    for i in range(100000000):
        pass
When this function is compiled, the bytecode might look something like this (the exact opcodes vary between Python versions):
  SETUP_LOOP              20 (to 23)
  LOAD_GLOBAL              0 (range)
  LOAD_CONST               3 (100000000)
  CALL_FUNCTION            1
  GET_ITER
  FOR_ITER                 6 (to 22)
  STORE_FAST               0 (i)
  JUMP_ABSOLUTE           13
  POP_BLOCK
  LOAD_CONST               0 (None)
  RETURN_VALUE
The key instruction here is STORE_FAST, which is used to store the loop variable i.
Now let's consider the bytecode if the loop is at the top level of a Python script:
  SETUP_LOOP              20 (to 23)
  LOAD_NAME                0 (range)
  LOAD_CONST               3 (100000000)
  CALL_FUNCTION            1
  GET_ITER
  FOR_ITER                 6 (to 22)
  STORE_NAME               1 (i)
  JUMP_ABSOLUTE           13
  POP_BLOCK
  LOAD_CONST               2 (None)
  RETURN_VALUE
Notice that the STORE_NAME instruction is used here, rather than STORE_FAST.
The STORE_FAST instruction is faster than STORE_NAME because, inside a function, local variables are stored in a fixed-size array rather than a dictionary. Each local is directly accessible via its index in that array, making variable access very quick. Essentially, it's just a pointer lookup into the array and an increment of the PyObject's reference count, both of which are highly efficient operations.
On the other hand, global variables are stored in a dictionary. When you access a global variable, Python has to perform a hash table lookup, which involves calculating a hash and then retrieving the value associated with it. Though this is optimized, it's still inherently slower than an index-based lookup.
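If you'd like to see this difference for yourself, here's a minimal sketch (the loop_in_function name and the loop bounds are just for illustration) that disassembles the same loop compiled at module level and inside a function:

import dis

# The same loop compiled as module-level code: the loop variable i is stored with STORE_NAME
module_code = compile("for i in range(1000): pass", "<string>", "exec")
dis.dis(module_code)

def loop_in_function():
    for i in range(1000):
        pass

# The same loop inside a function: the loop variable i is stored with STORE_FAST
dis.dis(loop_in_function)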
Benchmarking and Profiling Python Code
Want to test this for yourself? Try benchmarking and profiling your code.
Benchmarking and profiling are important practices in performance optimization. They help you understand how your code behaves and where the bottlenecks are.
Benchmarking is where you time your code to see how long it takes to run. You can use Python's built-in time module, as we'll show later, or more sophisticated tools like timeit.
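For instance, here's a quick sketch of timing a small snippet with timeit (the snippet and the repetition count here are arbitrary choices, just for illustration):

import timeit

# Run the snippet 10,000 times and report the total elapsed time in seconds
elapsed = timeit.timeit("sum(range(1000))", number=10000)
print(f"10,000 runs took {elapsed:.4f} seconds in total")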
Profiling, on the other hand, provides a more detailed view of your code's execution. It shows you where your code spends most of its time, which functions are called, and how often. Python's built-in profile or cProfile modules can be used for this.
Here's one way you can profile your Python code:
import cProfile

def loop():
    for i in range(10000000):
        pass

cProfile.run('loop()')
This will output a detailed report of all the function calls made during the execution of the loop function.
Note: Profiling can add quite a bit of overhead to your code execution, so the execution time shown by the profiler will likely be longer than the actual execution time.
Benchmarking Code in a Function vs. Global Scope
In Python, the speed of code execution can vary depending on where the code is executed - in a function or in the global scope. Let's compare the two using a simple example.
Consider the following code snippet that calculates the factorial of a number:
def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
Now let's run the same code but in the global scope:
n = 20
result = 1
for i in range(1, n + 1):
    result *= i
To benchmark these two pieces of code, we can use the timeit module in Python, which provides a simple way to time small bits of Python code.
import timeit

# Factorial function here...

def benchmark():
    start = timeit.default_timer()
    factorial(20)
    end = timeit.default_timer()
    print(end - start)

#
# Run benchmark on function code
#
benchmark()
# Prints: 3.541994374245405e-06

#
# Run benchmark on global scope code
#
start = timeit.default_timer()
n = 20
result = 1
for i in range(1, n + 1):
    result *= i
end = timeit.default_timer()
print(end - start)
# Prints: 5.375011824071407e-06
You'll find that the function code executes faster than the global scope code, thanks to the faster local variable access we discussed earlier.
Note: If you run the benchmark() function and the global scope code in the same script, the global scope code will appear to run faster. This is because the benchmark() function adds some call overhead of its own, and the global code benefits from some internal optimizations. However, if you run them separately, you'll find that the function code does run faster.
Profiling Code in a Function vs. Global Scope
Python provides a built-in module called cProfile for this purpose. Let's use it to profile a new function, which computes the sum of squares, in both a local and global scope.
import cProfile

def sum_of_squares():
    total = 0
    for i in range(1, 10000000):
        total += i * i

i = None
total = 0

def sum_of_squares_g():
    global i
    global total
    for i in range(1, 10000000):
        total += i * i

def profile(func):
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()
    pr.print_stats()

#
# Profile function code
#
print("Function scope:")
profile(sum_of_squares)

#
# Profile global scope code
#
print("Global scope:")
profile(sum_of_squares_g)
From the profiling results, you'll see that the function code is more efficient in terms of execution time.
Function scope:
         2 function calls in 0.903 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.903    0.903    0.903    0.903 profiler.py:3(sum_of_squares)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Global scope:
         2 function calls in 1.358 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.358    1.358    1.358    1.358 profiler.py:10(sum_of_squares_g)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
We consider the sum_of_squares_g() function to be "global" since it uses two global variables, i and total. As we saw earlier, it's the global variable lookups that slow down code execution, which is why we made those variables global in this version.
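If you're curious, you can also confirm this at the bytecode level with the dis module; the sketch below assumes the two functions defined above are still in scope:

import dis

# In sum_of_squares, total and i are locals: expect LOAD_FAST / STORE_FAST
dis.dis(sum_of_squares)

# In sum_of_squares_g, total and i are declared global: expect LOAD_GLOBAL / STORE_GLOBAL
dis.dis(sum_of_squares_g)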
Optimizing Python Function Performance
Given that Python functions tend to run faster than equivalent code in the global scope, it's worth looking into how we can further optimize our function performance.
Of course, because of what we saw earlier, one strategy is to use local variables instead of global variables. Here's an example:
import time

# Global variable
x = 5

def calculate_power_global():
    for i in range(10000000):
        y = x ** 2  # Accessing global variable

def calculate_power_local(x):
    for i in range(10000000):
        y = x ** 2  # Accessing local variable

start = time.time()
calculate_power_global()
end = time.time()
print(f"Execution time with global variable: {end - start} seconds")

start = time.time()
calculate_power_local(x)
end = time.time()
print(f"Execution time with local variable: {end - start} seconds")
In this example, calculate_power_local will typically run faster than calculate_power_global, because it's using a local variable instead of a global one.
Execution time with global variable: 1.9901456832885742 seconds
Execution time with local variable: 1.9626312255859375 seconds
Another optimization strategy is to use built-in functions and libraries whenever possible. Python's built-in functions are implemented in C, which is much faster than Python. Similarly, many Python libraries, such as NumPy and Pandas, are also implemented in C or C++, making them faster than equivalent Python code.
For example, consider the task of summing a list of numbers. You could write a function to do this:
def sum_numbers(numbers):
    total = 0
    for number in numbers:
        total += number
    return total
However, Python's built-in sum function will do the same thing, but faster:
numbers = [1, 2, 3, 4, 5]
total = sum(numbers)
Try timing these two code snippets yourself and figure out which one is faster!
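As a starting point, here's one possible way to time them with timeit (this sketch assumes the sum_numbers function from above is defined; the list size and repetition count are arbitrary):

import timeit

numbers = list(range(1000000))

# Time the hand-written loop against the built-in sum() over the same list
loop_time = timeit.timeit(lambda: sum_numbers(numbers), number=100)
builtin_time = timeit.timeit(lambda: sum(numbers), number=100)

print(f"sum_numbers: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")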
Conclusion
In this article, we've explored the interesting world of Python code execution, specifically focusing on why Python code tends to run faster when encapsulated in a function. We briefly looked into the concepts of benchmarking and profiling, providing practical examples of how these processes can be carried out in both a function and the global scope.
We also discussed a few ways to optimize your Python function performance. While these tips can certainly make your code run faster, apply them judiciously: it's important to balance readability and maintainability with performance.