Introduction
Python has a lot of built-in tools that allow us to iterate over and transform data. A great example is the itertools module, which offers several convenient iteration functions. Each of these iterator-building functions can be used on its own or combined with the others.
The module was inspired by constructs from languages such as APL, Haskell, and SML, and together the elements within itertools form Python's iterator algebra.
Iterable vs Iterator
Before we dive into the iteration tools, let's first define the distinction between two important terms: iterable and iterator.
An iterable is an object which can be iterated over. Passing an iterable to the iter() function generates an iterator for it. Most built-in collections are iterable, such as lists, tuples, and strings.
An iterator is also an object, which is used to traverse an iterable one element at a time. An iterator is itself iterable: calling iter() on it returns the same iterator. Traversal is done with the next() function, passing in the iterator that we're trying to traverse.
The next() function returns the next element of an iterator. An iterator can be generated from an iterable (using iter()):
# Avoid naming the variable "list", which would shadow the built-in list type
numbers = [1, 2, 3, 4, 5]
iterator = iter(numbers)
print(iterator)
This results in:
<list_iterator object at 0x0000018E393A0F28>
Now, let's access the next element (starting at the first one) by calling next() on our iterator:
print(next(iterator))
This results in:
1
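To make the distinction a bit more concrete, here is a small sketch of how an iterator behaves once it runs out of elements. The letters list is just made-up illustration data:

```python
letters = ["a", "b"]          # a list is an iterable, not an iterator
iterator = iter(letters)

# An iterator is its own iterator: iter() hands back the same object
print(iter(iterator) is iterator)   # True

print(next(iterator))   # a
print(next(iterator))   # b

# Once exhausted, next() raises StopIteration...
try:
    next(iterator)
except StopIteration:
    print("exhausted")

# ...unless a default value is supplied as the second argument
print(next(iterator, None))   # None
```

Note that the list itself is untouched. Calling iter(letters) again would give us a fresh iterator that starts from the first element.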
This is practically what happens under the hood of the for loop - it calls iter() on the collection you're iterating over, and then calls next() on the resulting iterator repeatedly, until the iterator signals that it's exhausted by raising StopIteration.
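As a rough sketch of that mechanism (not the interpreter's actual implementation), a for loop can be spelled out by hand with iter(), next(), and a StopIteration check. The colors list and the collected helper list are made up for illustration:

```python
colors = ["red", "green", "blue"]

# What we normally write
for color in colors:
    print(color)

# Roughly what the loop does for us
iterator = iter(colors)
collected = []
while True:
    try:
        color = next(iterator)
    except StopIteration:
        break          # the iterator is exhausted, so the loop ends
    collected.append(color)
    print(color)
```

Both loops print the same three values, in the same order.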
In this tutorial, we'll be taking a look at a few Python iteration tools: count(), cycle(), and chain().
The count() Function
The count(start, step) function creates an iterator that generates evenly spaced values, where the space between them is defined by the step argument. The start argument defines the starting value of the iterator. By default, start=0 and step=1.
Without a breaking condition, the count() function will continue counting indefinitely:
from itertools import count

iterator_count = count(start=0, step=5)

for i in iterator_count:
    # Stop once we reach 25, otherwise the loop would never end
    if i == 25:
        break
    print(i)
Note: Using count() like this is unusual. You'd typically combine it with other functions, such as zip() or map(). (The older imap() function only exists in Python 2; in Python 3, map() itself is lazy.)
Here, the loop pulls values from the iterator in steps of 5 and prints them until the break condition is met:
0
5
10
15
20
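The same bounded output can also be produced without a manual break. As a minimal sketch, the islice() and takewhile() functions (both also from itertools) can cut an infinite counter short. The variable names here are our own:

```python
from itertools import count, islice, takewhile

# Take exactly five values from the infinite counter
first_five = list(islice(count(start=0, step=5), 5))
print(first_five)   # [0, 5, 10, 15, 20]

# Or keep taking values while a condition holds
below_25 = list(takewhile(lambda value: value < 25, count(start=0, step=5)))
print(below_25)     # [0, 5, 10, 15, 20]
```

Which one fits better depends on whether we know how many values we want, or only the condition under which to stop.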
Because it generates values on demand, count() is most commonly paired with other functions that consume or produce sequences.
For example, when iterating over the items of a list, you might want to annotate them with a positional index. We can zip() the list together with count(), which generates the values for these indices:
from itertools import count

names = ['John', 'Marie', 'Jack', 'Anna']

# zip() stops when the shorter input (the list) runs out
for i in zip(count(), names):
    print(i)
This results in:
(0, 'John')
(1, 'Marie')
(2, 'Jack')
(3, 'Anna')
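For a plain running index, the built-in enumerate() function gives the same pairs. The sketch below compares the two, and shows where count() goes further by accepting a custom step. The ticket_numbers name and its values are made up for illustration:

```python
from itertools import count

names = ['John', 'Marie', 'Jack', 'Anna']

# enumerate() covers the common case of a running index
with_enumerate = list(enumerate(names, start=1))

# count() produces the same pairs when zipped with the list
with_count = list(zip(count(start=1), names))
print(with_enumerate == with_count)   # True

# Unlike enumerate(), count() also accepts a custom step
ticket_numbers = list(zip(count(start=100, step=10), names))
print(ticket_numbers)
# [(100, 'John'), (110, 'Marie'), (120, 'Jack'), (130, 'Anna')]
```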
Advice: If you'd like to read more about the zip() function, as well as some other commonly used functions alongside it, read our guide on Python Iteration Tools - filter(), islice(), map() and zip().
The cycle() Function
The cycle() function accepts an iterable and generates an iterator that returns the iterable's elements in order, saving a copy of each element as it goes.
Once we iterate through to the end of the elements, the iterator starts returning elements from the saved copies. No further copies are made: whenever the saved elements run out, the iterator simply starts over from the first one.
This process is repeated indefinitely.
Note: Since cycle() keeps a saved copy of every element, it needs extra memory proportional to the length of the iterable, which can be significant for longer sequences.
The more immediate concern is that the resulting iterator never ends, so a loop over it without an exit condition runs forever:
from itertools import cycle

numbers = [1, 2, 3, 4]
iterator = cycle(numbers)

# Warning: this loop never terminates on its own
for i in iterator:
    print(i)
This results in:
1
2
3
4
1
2
3
4
...
The output continues until we terminate the program. That being said - you should always have an exit/termination condition when looping over the result of the cycle() function.
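One way to add such a termination condition is sketched below, using either islice() to cap the number of elements or an explicit break. The limits of 10 and 6 elements are arbitrary values chosen for illustration:

```python
from itertools import cycle, islice

numbers = [1, 2, 3, 4]

# Option 1: cap the number of elements with islice()
first_ten = list(islice(cycle(numbers), 10))
print(first_ten)   # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]

# Option 2: break out of the loop explicitly
seen = []
for position, number in enumerate(cycle(numbers)):
    if position == 6:
        break
    seen.append(number)
print(seen)        # [1, 2, 3, 4, 1, 2]
```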
Given the fact that cycle() can cycle through any iterable, we can easily apply it to strings and tuples as well:
from itertools import cycle

string = "This is a random string"
iterator = cycle(string)

# Warning: this loop never terminates on its own
for i in iterator:
    print(i)
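This results in an endless sequence of characters (the spaces between the words are printed too, and show up as blank-looking lines, omitted here):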
T
h
i
s
i
s
a
r
a
n
d
o
...
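In practice, cycle() is often paired with a finite iterable, so that the finite one decides when iteration stops. The sketch below hands out tasks to workers in round-robin fashion. The tasks and workers lists are made-up example data:

```python
from itertools import cycle

tasks = ["write", "review", "test", "deploy", "monitor"]   # made-up data
workers = ["Ann", "Bob"]                                   # made-up data

# zip() stops at the shortest input, so the finite task list ends the cycle
assignments = list(zip(tasks, cycle(workers)))
print(assignments)
# [('write', 'Ann'), ('review', 'Bob'), ('test', 'Ann'), ('deploy', 'Bob'), ('monitor', 'Ann')]
```

Since zip() ends as soon as tasks is exhausted, no explicit break is needed here.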
The chain() Function
The chain() function is used to chain multiple iterables together, by generating an iterator that traverses them sequentially, one after the other:
from itertools import chain

result = list(chain([1, 2, 3],
                    ["one", "two", "three"],
                    "String",
                    ("this", "is", "a", "tuple")))
print(result)
The output will be:
[1, 2, 3, 'one', 'two', 'three', 'S', 't', 'r', 'i', 'n', 'g', 'this', 'is', 'a', 'tuple']
Here, we've got four iterables of three different types (two lists, a string, and a tuple), chained together one after the other.
Even though ["one", "two", "three"] is a list of strings, chain() treats it as a single list and simply yields its elements as they are, without breaking each of the strings down any further. On the other hand, "String" was passed as an argument of its own, so it is broken down into its constituent characters.
To break down every string inside a list as well, we can use another method, derived from the chain() function - chain.from_iterable():
from itertools import chain

result2 = list(chain(["one", "two", "three"]))
result3 = list(chain.from_iterable(["one", "two", "three"]))
print(result2)
print(result3)
The chain() function behaves the same as we've previously observed - it chains the elements as they are. On the other hand, the chain.from_iterable() method takes a single iterable, treats each of its elements as an iterable in turn, and returns their constituent elements one after the other:
['one', 'two', 'three']
['o', 'n', 'e', 't', 'w', 'o', 't', 'h', 'r', 'e', 'e']
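Another way to look at the relationship between the two is sketched below: unpacking the list into separate arguments for chain() yields the same elements as chain.from_iterable(). The nested list is a made-up example, showing that only one level of nesting is flattened:

```python
from itertools import chain

words = ["one", "two", "three"]

# Unpacking passes each string to chain() as a separate argument...
unpacked = list(chain(*words))

# ...which yields the same elements as chain.from_iterable()
flattened = list(chain.from_iterable(words))
print(unpacked == flattened)   # True

# from_iterable() only flattens one level of nesting
nested = [[1, [2, 3]], [4]]
print(list(chain.from_iterable(nested)))   # [1, [2, 3], 4]
```

One practical difference is that chain(*words) has to unpack the whole outer iterable up front, while chain.from_iterable() reads it lazily, so only the latter works with an infinite outer iterable.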
A common use of chain.from_iterable() is flattening several collections into one. For example, to calculate the sum of the numbers contained within several lists, we first chain the lists together, and then calculate the sum() of the result:
from itertools import chain

number_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = list(chain.from_iterable(number_list))
print(sum(result))
Each element of the number_list collection is another list. Since lists are iterable, the chain.from_iterable() call breaks these down into a single sequence of the elements 1 through 9, which list() collects into a list. After that, we calculate their sum() and print the result:
45
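The intermediate list isn't strictly required for this. As a small sketch, sum() can consume the chained iterator directly, which avoids building a second list in memory. The total name is our own:

```python
from itertools import chain

number_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# sum() consumes the chained iterator directly, one element at a time
total = sum(chain.from_iterable(number_list))
print(total)   # 45
```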
Conclusion
The itertools module introduces us to several useful convenience functions for working with iterables and iteration.
Many of these can be used as standalone convenience functions, but they're most commonly chained with other functions to transform data.