Introduction to Python's Collections Module

Introduction

Collections in Python are containers that are used to store collections of data, for example, list, dict, set, tuple etc. These are built-in collections. Several modules have been developed that provide additional data structures to store collections of data. One such module is the Python collections module.

Python collections module was introduced to improve the functionalities of the built-in collection containers. The Python collections module was first introduced in its 2.4 release. This tutorial is based on its latest stable release (3.7 version).

collections Module

In this tutorial we will discuss 6 of the most commonly used data structures from the Python collections module. They are as follows:

The Counter

Counter is a subclass of the dictionary object. The Counter() function in the collections module takes an iterable or a mapping as the argument and returns a Dictionary. In this dictionary, a key is an element in the iterable or the mapping and value is the number of times that element exists in the iterable or the mapping.

You have to import the Counter class before you can create a counter instance.

from collections import Counter

Create Counter Objects

There are multiple ways to create counter objects. The simplest way is to use the Counter() function without any arguments.

cnt = Counter()

You can pass an iterable (list) to the Counter() function to create a counter object.

list = [1,2,3,4,1,2,6,7,3,8,1]
Counter(list)

Finally, the Counter() function can take a dictionary as an argument. In this dictionary, the value of a key should be the 'count' of that key.

Counter({1:3,2:4})

You can access any counter item with its key as shown below:

list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt[1])

when you print cnt[1], you will get the count of 1.

Output:

3

In the above examples, cnt is an object of the Counter class which is a subclass of dict. So it has all the methods of the dict class.

Apart from that, Counter has three additional functions:

  1. Elements
  2. Most_common([n])
  3. Subtract([interable-or-mapping])

The element() Function

You can get the items of a Counter object with elements() function. It returns a list containing all the elements in the Counter object.

Look at the following example:

cnt = Counter({1:3,2:4})
print(list(cnt.elements()))

Output:

[1, 1, 1, 2, 2, 2, 2]

Here, we create a Counter object with a dictionary as an argument. In this Counter object, count of 1 is 3 and count of 2 is 4. The elements() function is called using cnt object which returns an iterator which is passed as an argument to the list.

The iterator repeats 3 times over 1 returning three '1's, and repeats four times over 2 returning four '2's to the list. Finally, the list is printed using the print function.

The most_common() Function

The Counter() function returns a dictionary which is unordered. You can sort it according to the number of counts in each element using the most_common() function of the Counter object.

list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt.most_common())

Output:

[(1, 3), (2, 2), (3, 2), (4, 1), (6, 1), (7, 1), (8, 1)]

You can see that the most_common function returns a list, which is sorted based on the count of the elements. 1 has a count of three, therefore it is the first element of the list.

The subtract() Function

The subtract() takes iterable (list) or a mapping (dictionary) as an argument and deducts elements count using that argument. Check the following example:

cnt = Counter({1:3,2:4})
deduct = {1:1, 2:2}
cnt.subtract(deduct)
print(cnt)

Output:

Counter({1: 2, 2: 2})

You can notice that the cnt object we first created has a count of 3 for '1' and count of 4 for '2'. The deduct dictionary has the value '1' for key '1' and value '2' for key '2'. The subtract() function deducted 1 count from key '1' and 2 counts from key '2'.

The defaultdict

The defaultdict works exactly like a python dictionary, except it does not throw KeyError when you try to access a non-existent key.

Instead, it initializes the key with the element of the data type that you pass as an argument at the creation of defaultdict. The data type is called default_factory.

Import defaultdict

First, you have to import defaultdict from collections module before using it:

from collections import defaultdict

Create a defaultdict

You can create a defaultdict with the defaultdict() constructor. You have to specify a data type as an argument. Check the following code:

nums = defaultdict(int)
nums['one'] = 1
nums['two'] = 2
print(nums['three'])

Output:

0

In this example, int is passed as the default_factory. Notice that you only pass int, not int(). Next, the values are defined for the two keys, namely, one and two, but in the next line we try to access a key that has not been defined yet.

In a normal dictionary, this will force a KeyError. But defaultdict initializes the new key with default_factory's default value which is 0 for int. Hence, when the program is executed, 0 will be printed. This particular feature of initializing non-existent keys can be exploited in various situations.

For example, let's say you want the count of each name in a list of names given as "Mike, John, Mike, Anna, Mike, John, John, Mike, Mike, Britney, Smith, Anna, Smith".

from collections import defaultdict

count = defaultdict(int)
names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()
for names in names_list:
    count[names] +=1
print(count)

Output:

defaultdict(<class 'int'>, {'Mike': 5, 'Britney': 1, 'John': 3, 'Smith': 2, 'Anna': 2})

First, we create a defaultdict with int as default_factory. The names_list includes a set of names which repeat several times. The split() function returns a list from the given string. It breaks the string whenever a white-space is encountered and returns words as elements of the list. In the loop, each item in the list is added to the defaultdict named as count and initialized to 0 based on default_factory. If the same element is encountered again, as the loop continues, the count of that element will be incremented.

The OrderedDict

OrderedDict is a dictionary where keys maintain the order in which they are inserted, which means if you change the value of a key later, it will not change the position of the key.

Import OrderedDict

To use OrderedDict you have to import it from the collections module.

from collections import OrderedDict

Create a OrderedDict

You can create an OrderedDict object with the OrderedDict() constructor. In the following code, You create an OrderedDict without any arguments. After that some items are inserted into it.

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(od)

Output:

OrderedDict([('a', 1), ('b', 2), ('c', 3)])

You can access each element using a loop as well. Take a look at the following code:

for key, value in od.items():
    print(key, value)

Output:

a 1
b 2
c 3

Following example is an interesting use case of OrderedDict with Counter. Here, we create a Counter from a list and insert elements to an OrderedDict based on their count.

Most frequently occurring letter will be inserted as the first key and the least frequently occurring letter will be inserted as the last key.

list = ["a","c","c","a","b","a","a","b","c"]
cnt = Counter(list)
od = OrderedDict(cnt.most_common())
for key, value in od.items():
    print(key, value)

Output:

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

a 4
c 3
b 2

The deque

The deque is a list optimized for inserting and removing items.

Import the deque

You have to import the deque class from the collections module before using it.

from collections import deque

Creating a deque

You can create a deque with deque() constructor. You have to pass a list as an argument.

list = ["a","b","c"]
deq = deque(list)
print(deq)

Output:

deque(['a', 'b', 'c'])

Inserting Elements to deque

You can easily insert an element to the deq we created at either of the ends. To add an element to the right of the deq, you have to use the append() method.

If you want to add an element to the start of the deq, you have to use the appendleft() method.

deq.append("d")
deq.appendleft("e")
print(deq)deque

Output:

deque(['e', 'a', 'b', 'c', 'd'])

You can notice that d is added at the end of deq and e is added to the start of the deq.

Removing Elements from the deque

Removing elements is similar to inserting elements. You can remove an element the similar way you insert elements. To remove an element from the right end, you can use the pop() function and to remove an element from the left, you can use popleft().

deq.pop()
deq.popleft()
print(deq)

Output:

deque(['a', 'b', 'c'])

You can notice that both the first and last elements are removed from the deq.

Clearing a deque

If you want to remove all elements from a deque, you can use the clear() function.

list = ["a","b","c"]
deq = deque(list)
print(deq)
print(deq.clear())

Output:

deque(['a', 'b', 'c'])
None

You can see in the output, at first there is a queue with three elements. Once we apply the clear() function, the deque is cleared and you see none in the output.

Counting Elements in a deque

If you want to find the count of a specific element, use the count(x) function. You have to specify the element for which you need to find the count, as the argument.

list = ["a","b","c"]
deq = deque(list)
print(deq.count("a"))

Output:

1

In the above example, the count of 'a' is 1. Hence '1' printed.

The ChainMap

ChainMap is used to combine several dictionaries or mappings. It returns a list of dictionaries.

Import chainmap

You have to import ChainMap from the collections module before using it.

from collections import ChainMap

Create a ChainMap

To create a chain map we can use the ChainMap() constructor. We have to pass the dictionaries we are going to combine as an argument set.

dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(chain_map.maps)

Output:

[{'b': 2, 'a': 1}, {'c': 3, 'b': 4}]

You can see a list of dictionaries as the output. You can access chain map values by key name.

print(chain_map['a'])

Output:

1

'1' is printed as the value of key 'a' is 1. Another important point is that ChainMap updates its values when its associated dictionaries are updated. For example, if you change the value of 'c' in dict2 to '5', you will notice the change in ChainMap as well.

dict2['c'] = 5
print(chain_map.maps)

Output:

[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]

Getting Keys and Values from ChainMap

You can access the keys of a ChainMap with keys() function. Similarly, you can access the values of elements with values() function, as shown below:

dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print (list(chain_map.keys()))
print (list(chain_map.values()))

Output:

['b', 'a', 'c']
[2, 1, 3]

Notice that the value of the key 'b' in the output is the value of key 'b' in dict1. As a rule of thumb, when one key appears in more than one associated dictionary, ChainMap takes the value for that key from the first dictionary.

Adding a New Dictionary to ChainMap

If you want to add a new dictionary to an existing ChainMap, use the new_child() function. It creates a new ChainMap with the newly added dictionary.

dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)

Output:

ChainMap({'f': 6, 'e': 5}, {'a': 1, 'b': 2}, {'b': 4, 'c': 3})

Notice that a new dictionary is added to the beginning of the ChainMap list.

The namedtuple()

The namedtuple() returns a tuple with names for each position in the tuple. One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object. This is obviously difficult. The namedtuple was introduced to solve this problem.

Import namedtuple

Before using namedtuple, you have to import it from the collections module.

from collections import namedtuple

Create a namedtuple

from collections import namedtuple

Student = namedtuple('Student', 'fname, lname, age')
s1 = Student('John', 'Clarke', '13')
print(s1.fname)

Output:

Student(fname='John', lname='Clarke', age='13')

In this example, a namedtuple object Student has been declared. You can access the fields of any instance of a Student class by the defined field name.

Creating a namedtuple Using a list

The namedtuple() function requires each value to be passed to it separately. Instead, you can use _make() to create a namedtuple instance with a list. Check the following code:

s2 = Student._make(['Adam','joe','18'])
print(s2)

Output:

Student(fname='Adam', lname='joe', age='18')

Create a New Instance Using Existing Instance

The _asdict() function can be used to create an OrderedDict instance from an existing instance.

s2 = s1._asdict()
print(s2)

Output:

OrderedDict([('fname', 'John'), ('lname', 'Clarke'), ('age', '13')])

Changing Field Values with the _replace() Function

To change the value of a field of an instance, the _replace() function is used. Remember that, _replace() function creates a new instance. It does not change the value of the existing instance.

s2 = s1._replace(age='14')
print(s1)
print(s2)

Output:

Student(fname='John', lname='Clarke', age='13')
Student(fname='John', lname='Clarke', age='14')

Conclusion

With that, we conclude our tutorial on the Collections module. We have discussed all the important topics in the collection module. The Python collection module still needs improvements if we compare it with Java's Collection library. Therefore, we can expect a lot of changes in upcoming versions.

References

Last Updated: August 2nd, 2023
Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Project

Building Your First Convolutional Neural Network With Keras

# python# artificial intelligence# machine learning# tensorflow

Most resources start with pristine datasets, start at importing and finish at validation. There's much more to know. Why was a class predicted? Where was...

David Landup
David Landup
Details
Course

Data Visualization in Python with Matplotlib and Pandas

# python# pandas# matplotlib

Data Visualization in Python with Matplotlib and Pandas is a course designed to take absolute beginners to Pandas and Matplotlib, with basic Python knowledge, and...

David Landup
David Landup
Details

© 2013-2024 Stack Abuse. All rights reserved.

AboutDisclosurePrivacyTerms