Understanding Python's "yield" Keyword

The yield keyword in Python is used to create generators. A generator is an iterable that produces its items on-the-fly and can be iterated over only once. Because generators do not build the whole sequence in memory, they can reduce your application's memory footprint, and often its running time, compared to ordinary collections.

In this article we'll explain how to use the yield keyword in Python and what exactly it does. But first, let's study the difference between a simple list and a generator, and then we will see how yield can be used to create more complex generators.

Differences Between a List and Generator

In the following script we will create both a list and a generator and will try to see where they differ. First we'll create a simple list and check its type:

# Creating a list using list comprehension
squared_list = [x**2 for x in range(5)]

# Check the type
type(squared_list)  

When running this code you should see that the type displayed will be "list".

Now let's iterate over all the items in the squared_list.

# Iterate over items and print them
for number in squared_list:  
    print(number)

The above script will produce the following results:

$ python squared_list.py 
0  
1  
4  
9  
16  

Now let's create a generator and perform the same exact task:

# Creating a generator
squared_gen = (x**2 for x in range(5))

# Check the type
type(squared_gen)  

To create a generator, you start exactly as you would with a list comprehension, but you use parentheses instead of square brackets. The above script will display "generator" as the type of the squared_gen variable. Now let's iterate over the generator using a for-loop.

for number in squared_gen:  
    print(number)

The output will be:

$ python squared_gen.py 
0  
1  
4  
9  
16  

The output is the same as that of the list. So what is the difference? One of the main differences lies in how lists and generators store elements in memory. A list stores all of its elements in memory at once, whereas a generator "creates" each item on-the-fly, hands it to the consumer, and then moves on to the next element, discarding the previous one from memory.

One way to verify this is to check the length of both the list and the generator we just created. len(squared_list) will return 5, while len(squared_gen) will raise a TypeError because a generator has no length. Also, you can iterate over a list as many times as you want, but you can iterate over a generator only once. To iterate again, you must create the generator again.
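Both of these claims are easy to check for yourself. A minimal sketch:

```python
squared_list = [x**2 for x in range(5)]
squared_gen = (x**2 for x in range(5))

# A list knows its length; a generator does not
print(len(squared_list))    # 5
try:
    len(squared_gen)
except TypeError as e:
    print(e)                # object of type 'generator' has no len()

# A generator can only be consumed once
print(list(squared_gen))    # [0, 1, 4, 9, 16]
print(list(squared_gen))    # [] -- already exhausted
```

Note that the second call to list() returns an empty list: the generator was fully consumed by the first call and cannot be rewound.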

Using the Yield Keyword

Now that we know the difference between simple collections and generators, let us see how yield can help us define a generator.

In the previous examples, we created a generator implicitly, using the comprehension style. In more complex scenarios we can instead write functions that return a generator. The yield keyword, unlike the return statement, turns a regular Python function into a generator function, and is used as an alternative to returning an entire list at once. This is best explained with the help of some simple examples.

Again, let's first see what our function returns if we do not use the yield keyword. Execute the following script:

def cube_numbers(nums):  
    cube_list = []
    for i in nums:
        cube_list.append(i**3)
    return cube_list

cubes = cube_numbers([1, 2, 3, 4, 5])

print(cubes)  

In this script the function cube_numbers accepts a list of numbers, cubes each of them, and returns the entire list to the caller. When the function is called, the list of cubes is returned and stored in the cubes variable. You can see from the output that the returned data is in fact a full list:

$ python cubes_list.py 
[1, 8, 27, 64, 125]

Now, instead of returning a list, let's modify the above script so that it returns a generator.

def cube_numbers(nums):  
    for i in nums:
        yield i**3

cubes = cube_numbers([1, 2, 3, 4, 5])

print(cubes)  

In the above script, the cube_numbers function returns a generator instead of a list of cubed numbers. Creating a generator with the yield keyword is very simple. Here we no longer need the temporary cube_list variable to store the cubed numbers, so the cube_numbers function itself is simpler. Also, no return statement is needed; instead, the yield keyword hands back each cubed number from inside the for-loop.

Now, when the cube_numbers function is called, a generator is returned, which we can verify by running the code:

$ python cubes_gen.py 
<generator object cube_numbers at 0x1087f1230>  

Even though we called the cube_numbers function, its body doesn't actually execute at this point in time, and no items are stored in memory yet.
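You can observe this laziness directly by adding a print statement inside the generator function (a small illustrative sketch, using the built-in next() function covered just below):

```python
def cube_numbers(nums):
    for i in nums:
        print('computing cube of', i)
        yield i**3

cubes = cube_numbers([1, 2, 3])
print('generator created, but the function has not printed anything yet')

# Only now does the body run, printing "computing cube of 1" first
print(next(cubes))  # 1
```

Nothing is printed by the function when the generator is created; the first "computing cube of" line appears only when a value is requested.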

To make the function execute, and therefore produce the next item from the generator, we use the built-in next function. When you call next on the generator for the first time, the function body runs until the yield keyword is encountered. At that point the yielded value is returned to the caller and the generator function is paused in its current state.

Here is how you get a value from your generator:

next(cubes)  

The above call will return 1. Now when you call next again on the generator, the cube_numbers function resumes executing from where it stopped previously, at yield. The function continues to execute until it finds yield again, and next keeps returning the cubed values one by one until all the values in the list have been iterated.

Once all the values have been iterated, the next function raises a StopIteration exception. It is important to mention that the cubes generator doesn't store any of these items in memory; rather, the cubed values are computed at runtime, returned, and forgotten. The only extra memory used is the state data for the generator itself, which is usually much less than a large list. This makes generators ideal for memory-intensive tasks.
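The full lifecycle of the cubes generator, including the StopIteration at the end, can be sketched like this:

```python
def cube_numbers(nums):
    for i in nums:
        yield i**3

cubes = cube_numbers([1, 2, 3])

print(next(cubes))  # 1
print(next(cubes))  # 8
print(next(cubes))  # 27

# A fourth call finds no more values and raises StopIteration
try:
    next(cubes)
except StopIteration:
    print('generator exhausted')
```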

Instead of always calling next by hand, you can use a "for" loop to iterate over a generator's values. Behind the scenes, the for loop calls next until all the items in the generator have been iterated over.

Optimized Performance

As mentioned earlier, generators are very handy for memory-intensive tasks since they do not need to store all of the collection's items in memory; instead they generate items on the fly and discard each one as soon as the iterator moves to the next.

In the previous examples the performance difference between a simple list and a generator was not visible since the list sizes were so small. In this section we'll look at some examples where the difference becomes clear.

In the code below we will write a function that returns a list containing 1 million dummy car objects. We will measure the memory occupied by the process before and after calling the function (which creates the list).

Take a look at the following code:

import time  
import random  
import os  
import psutil

car_names = ['Audi', 'Toyota', 'Renault', 'Nissan', 'Honda', 'Suzuki']  
colors = ['Black', 'Blue', 'Red', 'White', 'Yellow']

def car_list(cars):  
    all_cars = []
    for i in range(cars):
        car = {
            'id': i,
            'name': random.choice(car_names),
            'color': random.choice(colors)
        }
        all_cars.append(car)
    return all_cars

# Get used memory
process = psutil.Process(os.getpid())  
print('Memory before list is created: ' + str(process.memory_info().rss/1000000))

# Call the car_list function and time how long it takes
t1 = time.perf_counter()  
cars = car_list(1000000)  
t2 = time.perf_counter()

# Get used memory
process = psutil.Process(os.getpid())  
print('Memory after list is created: ' + str(process.memory_info().rss/1000000))

print('Took {} seconds'.format(t2-t1))  

Note: You may have to pip install psutil to get this code to work on your machine.

On the machine where the code was run, the following results were obtained (yours may look slightly different):

$ python perf_list.py 
Memory before list is created: 8  
Memory after list is created: 334  
Took 1.584018 seconds  

Before the list was created the process used 8 MB of memory, and after the list with 1 million items was created, the occupied memory jumped to 334 MB. Creating the list also took 1.58 seconds.

Now, let's repeat the above process, but replace the list with a generator. Execute the following script:

import time  
import random  
import os  
import psutil

car_names = ['Audi', 'Toyota', 'Renault', 'Nissan', 'Honda', 'Suzuki']  
colors = ['Black', 'Blue', 'Red', 'White', 'Yellow']

def car_list_gen(cars):  
    for i in range(cars):
        car = {
            'id': i,
            'name': random.choice(car_names),
            'color': random.choice(colors)
        }
        yield car

# Get used memory
process = psutil.Process(os.getpid())  
print('Memory before list is created: ' + str(process.memory_info().rss/1000000))

# Call the car_list_gen function and time how long it takes
t1 = time.perf_counter()  
cars = car_list_gen(1000000)  
t2 = time.perf_counter()

# Get used memory
process = psutil.Process(os.getpid())  
print('Memory after list is created: ' + str(process.memory_info().rss/1000000))

print('Took {} seconds'.format(t2-t1))  

The following results were obtained by executing the above script:

$ python perf_gen.py 
Memory before list is created: 8  
Memory after list is created: 8  
Took 3e-06 seconds  

From the output you can see that with the generator the memory difference is negligible (usage stays at 8 MB), since generators do not store the items in memory. Furthermore, calling the generator function took only about 3 microseconds, far less than the time taken for list creation. This is because calling the function merely creates the generator object; none of the items have been computed yet.
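Keep in mind that the generator has only deferred the work: the cars are built one at a time as you actually iterate. A small sketch, reusing the names from the script above, that consumes the whole generator:

```python
import random

car_names = ['Audi', 'Toyota', 'Renault', 'Nissan', 'Honda', 'Suzuki']
colors = ['Black', 'Blue', 'Red', 'White', 'Yellow']

def car_list_gen(cars):
    for i in range(cars):
        yield {
            'id': i,
            'name': random.choice(car_names),
            'color': random.choice(colors)
        }

# Memory stays flat because only one car dict exists at a time,
# but the per-item work now happens during iteration.
count = sum(1 for car in car_list_gen(1000000))
print(count)  # 1000000
```

If you need every item at once (for example, to sort them), a list is still the right tool; the generator wins when you only need to stream through the items.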

Conclusion

Hopefully from this article you have a better understanding of the yield keyword, including how it's used, what it's used for, and why you'd want to use it. Python generators are a great way to improve the performance of your programs and they're very simple to use, but understanding when to use them is the challenge for many novice programmers.