Python has many built-in tools for iterating over and transforming data. A great example is the
itertools module, which offers several convenient iteration functions. Each of these functions builds an iterator, and they can be used on their own or combined.
The module was inspired by functional languages such as APL, Haskell and SML, and the elements within
itertools form Python's iterator algebra.
Iterable vs Iterator
Before we dive into the iteration, let's first define the distinction between two important terms: iterable and iterator.
An iterable is an object which can be iterated over. Passing it to the
iter() function generates an iterator. Generally speaking, most sequences are iterable, such as lists, tuples, strings, etc.
An iterator is also an object, which is used to iterate over an iterable. An iterator is itself iterable, so it can also be iterated over directly. Elements are retrieved by calling the built-in
next() function, passing in the iterator that we're trying to traverse.
Each call to next() returns the next element of the iterator. An iterator can be generated from an iterable using iter():
# Named "numbers" rather than "list" so the built-in list() is not shadowed
numbers = [1, 2, 3, 4, 5]
iterator = iter(numbers)
print(iterator)
This results in:
<list_iterator object at 0x0000018E393A0F28>
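Now, let's access the elements one at a time by calling
next() on our iterator. The first call returns the first element, 1, and each further call returns the element that follows. Once the elements run out, next() raises a StopIteration exception.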
This is practically what happens under the hood of the
for loop: it calls
iter() on the collection you're iterating over, and after that, the
next() element is accessed repeatedly until the iterator is exhausted.
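The mechanism described above can be sketched by hand. This is a minimal illustration, not how the interpreter is literally implemented, and the helper name manual_for is a hypothetical one chosen for this example:

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)          # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # step 2: request the next element
        except StopIteration:          # step 3: the iterator is exhausted
            break
        collected.append(item)         # stands in for the loop body
    return collected


print(manual_for([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5]
```

The StopIteration exception is the signal a for loop relies on to know when to stop, which is why it never surfaces in ordinary loops.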
In this tutorial, we'll be taking a look at a few Python iteration tools: count(), cycle() and chain().
The count() Function
The count(start, step) function creates an iterator that generates evenly spaced values, where the space between them is defined by the
step argument. The
start argument defines the starting value of the iterator. The defaults are start=0 and
step=1.
Without a breaking condition, the
count() function will continue counting indefinitely. Values are produced lazily, one at a time, so the sequence is never held in memory as a whole:
from itertools import count

iterator_count = count(start=0, step=5)

for i in iterator_count:
    # Stop manually, since count() never ends on its own
    if i == 25:
        break
    print(i)
Note: Using count() on its own like this is unusual. You'd typically combine it with other functions, such as zip().
The loop pulls values from the iterator in steps of 5 until the break condition is met, printing:
0
5
10
15
20
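Rather than breaking out of the loop by hand, an infinite iterator can also be bounded up front. A minimal sketch, assuming we only want the first five values and using islice() from the same module:

```python
from itertools import count, islice

# islice() stops pulling from count() after five items,
# so no explicit break condition is needed
first_five = list(islice(count(start=0, step=5), 5))

print(first_five)  # [0, 5, 10, 15, 20]
```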
Given its generative nature, this function is most commonly used with other functions that expect new or generated sequences.
For example, when using
zip() to zip together multiple items of a list, you might want to annotate them via a positional index. While zipping, we'd use
count() to generate values for these indices:
from itertools import count

# Named "names" rather than "list" so the built-in list() is not shadowed
names = ['John', 'Marie', 'Jack', 'Anna']

for i in zip(count(), names):
    print(i)
Which results in:
(0, 'John')
(1, 'Marie')
(2, 'Jack')
(3, 'Anna')
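The start and step arguments carry over to this pattern as well. As a small sketch, assuming the same four names, the indices can begin at 1 or advance in larger steps:

```python
from itertools import count

names = ['John', 'Marie', 'Jack', 'Anna']

# Start numbering at 1 instead of 0
numbered = list(zip(count(start=1), names))
print(numbered)  # [(1, 'John'), (2, 'Marie'), (3, 'Jack'), (4, 'Anna')]

# Start at 100 and advance in steps of 10
by_tens = list(zip(count(start=100, step=10), names))
print(by_tens)   # [(100, 'John'), (110, 'Marie'), (120, 'Jack'), (130, 'Anna')]
```

For the plain "index plus item" case, the built-in enumerate(names, start=1) produces the same pairs as the first call. count() becomes the more natural choice when a custom step is needed.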
If you'd like to read more about the
zip() function, as well as some other commonly used functions alongside it, read our guide on Python Iteration Tools - filter(), islice(), map() and zip().
The cycle() Function
The cycle() function accepts an iterable and generates an iterator that returns each of the iterable's elements in turn, saving a copy of each element as it goes.
Once we iterate through to the end of the elements, the iterator starts returning elements from the saved copy. When it reaches the end of the copy, it starts over from the beginning of that same copy. No further copies are made.
This process is repeated indefinitely.
Note: Because it keeps a copy of every element,
cycle() can require significant extra memory for long iterables. That storage stays proportional to the length of the iterable rather than growing with each pass, but the iteration itself never stops on its own:
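from itertools import cycle

# Named "numbers" rather than "list" so the built-in list() is not shadowed
numbers = [1, 2, 3, 4]
iterator = cycle(numbers)

# Warning: this loop never terminates on its own
for i in iterator:
    print(i)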
This results in:
1
2
3
4
1
2
3
4
...
This continues until we terminate the program. That being said, you should always have an exit/termination condition when looping over cycle().
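Two common ways to give cycle() a natural stopping point are to slice it with islice() or to zip it with a finite iterable. The sketch below shows both. The color, task and person names are made up for the example:

```python
from itertools import cycle, islice

colors = ['red', 'green', 'blue']

# Option 1: take a fixed number of items from the endless cycle
first_seven = list(islice(cycle(colors), 7))
print(first_seven)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

# Option 2: zip() stops as soon as its shortest input (tasks) runs out
tasks = ['write', 'review', 'test', 'deploy', 'monitor']
assignments = list(zip(tasks, cycle(['Ann', 'Bob'])))
print(assignments)
# [('write', 'Ann'), ('review', 'Bob'), ('test', 'Ann'), ('deploy', 'Bob'), ('monitor', 'Ann')]
```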
Given the fact that
cycle() can cycle through any iterable, we can easily apply it to strings and tuples as well:
from itertools import cycle

string = "This is a random string"
iterator = cycle(string)

# Warning: this loop never terminates on its own
for i in iterator:
    print(i)
This results in an endless sequence of:
T h i s i s a r a n d o ...
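To make the copying behavior of cycle() more concrete, here is a rough pure-Python sketch modeled on the approximate equivalent given in the itertools documentation. The name cycle_sketch is hypothetical, and the real function is implemented in C, so treat this as an illustration only:

```python
from itertools import islice


def cycle_sketch(iterable):
    """Approximate cycle(): yield each element once, then replay the saved copy."""
    saved = []
    for element in iterable:
        yield element
        saved.append(element)      # one copy per element, made on the first pass
    while saved:                   # an empty input ends immediately
        for element in saved:      # later passes reuse the same saved list
            yield element


# A one-shot iterator as input shows why the saved copy is needed
letters = iter("ab")
print(list(islice(cycle_sketch(letters), 5)))  # ['a', 'b', 'a', 'b', 'a']
```

Since saved is filled once and then only read, memory use is bounded by the length of the input, no matter how long the cycle runs.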
The chain() Function
The chain() function is used to chain multiple iterables together by generating an iterator that traverses them sequentially, one after the other:
from itertools import chain

# Chain two lists, a string and a tuple into one flat list
result = list(chain([1, 2, 3], ["one", "two", "three"], "String", ("this", "is", "a", "tuple")))
print(result)
The output will be:
[1, 2, 3, 'one', 'two', 'three', 'S', 't', 'r', 'i', 'n', 'g', 'this', 'is', 'a', 'tuple']
Here, we've got four iterables of three different types (two lists, a string and a tuple), each one being chained to the next.
["one", "two", "three"] is a list of strings, so
chain() treats it as a list and simply chains its elements, without calling a subsequent
chain() for each of the strings. On the other hand,
"String" is passed in directly as an iterable, so it is broken down into its constituent characters.
To break down the strings inside a list as well, we can use another method, derived from the
chain() function, called chain.from_iterable():
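from itertools import chain

# chain() receives one iterable (the list) and yields its elements as they are
result2 = list(chain(["one", "two", "three"]))
# chain.from_iterable() goes one level deeper and iterates over each string
result3 = list(chain.from_iterable(["one", "two", "three"]))

print(result2)
print(result3)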
The chain() function behaves the same as we've previously observed: it chains the elements as they are. On the other hand, the
chain.from_iterable() method takes a single iterable, treats each of its elements as an iterable, and returns their constituent elements one after another:
['one', 'two', 'three']
['o', 'n', 'e', 't', 'w', 'o', 't', 'h', 'r', 'e', 'e']
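One way to think about the relationship between the two is that chain.from_iterable(x) yields the same items as chain(*x), without having to unpack x up front. A short sketch, where the generator word_source is a hypothetical stand-in for any lazily produced input:

```python
from itertools import chain

words = ["one", "two", "three"]

# Unpacking the list into separate arguments gives the same items
unpacked = list(chain(*words))
flattened = list(chain.from_iterable(words))
print(unpacked == flattened)  # True


def word_source():
    """Yield the words one at a time instead of holding them in a list."""
    for word in words:
        yield word


# from_iterable() consumes the generator lazily, one word at a time
lazy = chain.from_iterable(word_source())
print(next(lazy))  # o
```

The lazy form matters when the outer iterable is very large or infinite, since chain(*x) has to consume all of x before it can start.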
A common use of
chain.from_iterable() is to flatten several collections of numbers into a single sequence before calculating their sum():
from itertools import chain

number_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten the nested lists into one list, then add up its elements
result = list(chain.from_iterable(number_list))
print(sum(result))
Each element of the
number_list collection is another list. Since lists are iterable, the
chain.from_iterable() call breaks these down into a single list containing elements from
[1..9], after which we calculate their
sum() and print the result, which is 45.
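Since sum() accepts any iterable, the intermediate list is optional. A minimal variation on the example above, assuming the flattened list is not needed for anything else:

```python
from itertools import chain

number_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# sum() pulls items straight from the chained iterator,
# so no temporary flattened list is built
total = sum(chain.from_iterable(number_list))

print(total)  # 45
```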
The itertools module introduces us to several useful convenience functions for working with iterables and iteration.
Many of these can be used as standalone convenience functions, but they're most commonly chained with other functions to transform data.