The yield keyword in Python is used to create generators. A generator is a kind of iterable that produces items on-the-fly and can only be iterated over once. By using generators you can improve your application's performance and consume less memory compared to normal collections, since the items are never all stored in memory at once.
In this article we'll explain how to use the yield keyword in Python and what it does exactly. But first, let's study the difference between a simple list collection and a generator, and then we will see how yield can be used to create more complex generators.
Differences Between a List and Generator
In the following script we will create both a list and a generator and will try to see where they differ. First we'll create a simple list and check its type:
# Creating a list using list comprehension
squared_list = [x**2 for x in range(5)]
# Check the type
type(squared_list)
When running this code in a Python shell you should see that the type displayed is "list".
Now let's iterate over all the items in squared_list.
# Iterate over items and print them
for number in squared_list:
    print(number)
The above script will produce the following results:
$ python squared_list.py
0
1
4
9
16
Now let's create a generator and perform the same exact task:
# Creating a generator
squared_gen = (x**2 for x in range(5))
# Check the type
type(squared_gen)
To create a generator, you start exactly as you would with a list comprehension, but you use parentheses instead of square brackets. The above script will display "generator" as the type of the squared_gen variable. Now let's iterate over the generator using a for-loop.
for number in squared_gen:
    print(number)
The output will be:
$ python squared_gen.py
0
1
4
9
16
The output is the same as that of the list. So what is the difference? One of the main differences lies in the way lists and generators store elements in memory. A list stores all of its elements in memory at once, whereas a generator "creates" each item on-the-fly, yields it, and then moves on to the next element, discarding the previous one from memory.
One way to verify this is to check the length of both the list and the generator that we just created. len(squared_list) will return 5, while len(squared_gen) will raise a TypeError saying that a generator has no length. Also, you can iterate over a list as many times as you want, but you can iterate over a generator only once. To iterate again, you must create the generator again.
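Both of these differences are easy to verify in a few lines. The sketch below recreates the squared_gen generator from above and shows the TypeError as well as the single-pass behavior:

```python
# A generator has no length and can only be consumed once
squared_gen = (x**2 for x in range(5))

try:
    len(squared_gen)
except TypeError as e:
    print(e)  # object of type 'generator' has no len()

print(list(squared_gen))  # [0, 1, 4, 9, 16]
print(list(squared_gen))  # [] -- the generator is already exhausted
```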
Using the Yield Keyword
Now that we know the difference between simple collections and generators, let us see how yield can help us define a generator.
In the previous examples, we created a generator implicitly using the list comprehension style. However, in more complex scenarios we can instead create functions that return a generator. The yield keyword, unlike the return statement, turns a regular Python function into a generator. It is used as an alternative to returning an entire list at once. This will again be explained with the help of some simple examples.
Again, let's first see what our function returns if we do not use the yield keyword. Execute the following script:
def cube_numbers(nums):
    cube_list = []
    for i in nums:
        cube_list.append(i**3)
    return cube_list
cubes = cube_numbers([1, 2, 3, 4, 5])
print(cubes)
In this script a function cube_numbers is created that accepts a list of numbers, takes their cubes, and returns the entire list to the caller. When this function is called, a list of cubes is returned and stored in the cubes variable. You can see from the output that the returned data is in fact a full list:
$ python cubes_list.py
[1, 8, 27, 64, 125]
Now, instead of returning a list, let's modify the above script so that it returns a generator.
def cube_numbers(nums):
    for i in nums:
        yield i**3
cubes = cube_numbers([1, 2, 3, 4, 5])
print(cubes)
In the above script, the cube_numbers function returns a generator instead of a list of cubed numbers. It's very simple to create a generator using the yield keyword. Here we do not need the temporary cube_list variable to store the cubed numbers, so even our cube_numbers function is simpler. Also, no return statement is needed; instead, the yield keyword is used to return the cubed number inside of the for-loop.
Now, when the cube_numbers function is called, a generator is returned, which we can verify by running the code:
$ python cubes_gen.py
<generator object cube_numbers at 0x1087f1230>
Even though we called the cube_numbers function, it doesn't actually execute at this point in time, and there are not yet any items stored in memory.
To get the function to execute, and therefore retrieve the next item from the generator, we use the built-in next() function. When you call next() on the generator for the first time, the function is executed until the yield keyword is encountered. Once yield is reached, the value passed to it is returned to the caller and the generator function is paused in its current state.
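You can watch this pause-and-resume behavior directly by adding print statements to a small generator function (a minimal illustration, separate from the cubes example):

```python
def counter():
    print('started')   # runs only when the first value is requested
    yield 1
    print('resumed')   # runs when the second value is requested
    yield 2

gen = counter()   # nothing is printed yet -- the body has not run
print(next(gen))  # prints 'started', then 1
print(next(gen))  # prints 'resumed', then 2
```

Calling counter() merely builds the generator object; the body only runs, one yield at a time, as values are requested.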
Here is how you get a value from your generator:
next(cubes)
The above call will return 1. Now when you call next() again on the generator, the cube_numbers function will resume executing from where it stopped previously at yield. The function will continue to execute until it finds yield again. The next() function will keep returning the cubed values one by one until all the values in the list have been iterated.
Once all the values have been iterated, the next() function throws a StopIteration exception. It is important to mention that the cubes generator doesn't store any of these items in memory; rather, the cubed values are computed at runtime, returned, and forgotten. The only extra memory used is the state data for the generator itself, which is usually much less than a large list. This makes generators ideal for memory-intensive tasks.
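The full lifecycle, from the first next() call through the StopIteration exception, can be sketched end-to-end (using a shorter three-item list for brevity):

```python
def cube_numbers(nums):
    for i in nums:
        yield i**3

cubes = cube_numbers([1, 2, 3])

print(next(cubes))  # 1
print(next(cubes))  # 8
print(next(cubes))  # 27

try:
    next(cubes)  # no values left
except StopIteration:
    print('All values consumed')
```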
Instead of always having to call next() yourself, you can instead use a for-loop to iterate over a generator's values. When using a for-loop, behind the scenes next() is called until all the items in the generator have been iterated over.
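Roughly speaking, a for-loop over a generator is equivalent to repeated next() calls wrapped in a try/except, as this sketch shows:

```python
cubes = (i**3 for i in [1, 2, 3])

# Roughly what "for value in cubes: print(value)" does behind the scenes:
iterator = iter(cubes)  # a generator is already its own iterator
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # the for-loop ends here, silently
    print(value)  # 1, then 8, then 27
```

This is also why the StopIteration exception never surfaces in normal for-loops: the loop machinery catches it for you.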
Optimized Performance
As mentioned earlier, generators are very handy when it comes to memory-intensive tasks since they do not need to store all of the collection's items in memory; rather, they generate items on the fly and discard each one as soon as the iterator moves on to the next item.
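A quick way to see this is sys.getsizeof, which reports the size of the object itself (the exact byte counts vary by Python version and platform, so treat the comments as rough magnitudes):

```python
import sys

squared_list = [x**2 for x in range(1000000)]
squared_gen = (x**2 for x in range(1000000))

print(sys.getsizeof(squared_list))  # several megabytes
print(sys.getsizeof(squared_gen))   # a couple hundred bytes at most
```

The generator stays tiny no matter how many items it will eventually produce, because it only holds its execution state, not the items.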
In the previous examples the performance difference of a simple list and generator was not visible since the list sizes were so small. In this section we'll check out some examples where we can distinguish between the performance of lists and generators.
In the code below we will write a function that returns a list containing 1 million dummy car objects. We will calculate the memory occupied by the process before and after calling the function (which creates the list).
Take a look at the following code:
import time
import random
import os
import psutil

car_names = ['Audi', 'Toyota', 'Renault', 'Nissan', 'Honda', 'Suzuki']
colors = ['Black', 'Blue', 'Red', 'White', 'Yellow']

def car_list(cars):
    all_cars = []
    for i in range(cars):
        car = {
            'id': i,
            'name': random.choice(car_names),
            'color': random.choice(colors)
        }
        all_cars.append(car)
    return all_cars

# Get used memory
process = psutil.Process(os.getpid())
print('Memory before list is created: ' + str(process.memory_info().rss/1000000))

# Call the car_list function and time how long it takes
t1 = time.perf_counter()
cars = car_list(1000000)
t2 = time.perf_counter()

# Get used memory
process = psutil.Process(os.getpid())
print('Memory after list is created: ' + str(process.memory_info().rss/1000000))
print('Took {} seconds'.format(t2 - t1))
Note: You may have to pip install psutil to get this code to work on your machine.
On the machine on which the code was run, the following results were obtained (yours may look slightly different):
$ python perf_list.py
Memory before list is created: 8
Memory after list is created: 334
Took 1.584018 seconds
Before the list was created the process memory was 8 MB, and after the creation of the list with 1 million items, the occupied memory jumped to 334 MB. Also, the time it took to create the list was 1.58 seconds.
Now, let's repeat the above process but replace the list with a generator. Execute the following script:
import time
import random
import os
import psutil

car_names = ['Audi', 'Toyota', 'Renault', 'Nissan', 'Honda', 'Suzuki']
colors = ['Black', 'Blue', 'Red', 'White', 'Yellow']

def car_list_gen(cars):
    for i in range(cars):
        car = {
            'id': i,
            'name': random.choice(car_names),
            'color': random.choice(colors)
        }
        yield car

# Get used memory
process = psutil.Process(os.getpid())
print('Memory before list is created: ' + str(process.memory_info().rss/1000000))

# Call the car_list_gen function and time how long it takes
t1 = time.perf_counter()
for car in car_list_gen(1000000):
    pass
t2 = time.perf_counter()

# Get used memory
process = psutil.Process(os.getpid())
print('Memory after list is created: ' + str(process.memory_info().rss/1000000))
print('Took {} seconds'.format(t2 - t1))
Here we have to use the for car in car_list_gen(1000000) loop to ensure that all 1,000,000 cars are actually generated.
The following results were obtained by executing the above script:
$ python perf_gen.py
Memory before list is created: 8
Memory after list is created: 40
Took 1.365244 seconds
From the output, you can see that by using a generator the memory difference is much smaller than before (the process grows from 8 MB to only 40 MB) since the generator does not store the items in memory. Furthermore, the generator version ran a bit faster as well at 1.37 seconds, which is about 14% faster than the list creation.
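The same idea applies whenever you only need an aggregate result rather than the items themselves. Passing a generator expression directly to a function like sum processes one item at a time and never builds the full list, a small sketch of the pattern:

```python
# Sum of 1 million squares without ever materializing the full list:
total = sum(x**2 for x in range(1000000))
print(total)
```

Functions such as sum, max, min, and any all accept a generator this way, which keeps the peak memory footprint tiny regardless of how many items are processed.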
Conclusion
Hopefully from this article you have a better understanding of the yield keyword, including how it's used, what it's used for, and why you'd want to use it. Python generators are a great way to improve the performance of your programs, and they're very simple to use, but understanding when to use them is the challenge for many novice programmers.