Introduction
Collections in Python are containers that are used to store collections of data, for example, list, dict, set, tuple etc. These are built-in collections. Several modules have been developed that provide additional data structures to store collections of data. One such module is the Python collections module.
Python collections module was introduced to improve the functionalities of the built-in collection containers. The Python collections
module was first introduced in its 2.4 release. This tutorial is based on its latest stable release (3.7 version).
collections
Module
In this tutorial we will discuss 6 of the most commonly used data structures from the Python collections
module. They are as follows:
The Counter
Counter is a subclass of the dictionary object. The Counter()
function in the collections
module takes an iterable or a mapping as the argument and returns a Dictionary. In this dictionary, a key is an element in the iterable or the mapping and value is the number of times that element exists in the iterable or the mapping.
You have to import the Counter
class before you can create a counter
instance.
from collections import Counter
Create Counter Objects
There are multiple ways to create counter
objects. The simplest way is to use the Counter()
function without any arguments.
cnt = Counter()
You can pass an iterable (list) to the Counter()
function to create a counter
object.
list = [1,2,3,4,1,2,6,7,3,8,1]
Counter(list)
Finally, the Counter()
function can take a dictionary as an argument. In this dictionary, the value of a key should be the 'count' of that key.
Counter({1:3,2:4})
You can access any counter item with its key as shown below:
list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt[1])
when you print cnt[1]
, you will get the count of 1.
Output:
3
In the above examples, cnt
is an object of the Counter
class which is a subclass of dict
. So it has all the methods of the dict
class.
Apart from that, Counter
has three additional functions:
- Elements
- Most_common([n])
- Subtract([interable-or-mapping])
The element() Function
You can get the items of a Counter
object with elements()
function. It returns a list containing all the elements in the Counter
object.
Look at the following example:
cnt = Counter({1:3,2:4})
print(list(cnt.elements()))
Output:
[1, 1, 1, 2, 2, 2, 2]
Here, we create a Counter
object with a dictionary as an argument. In this Counter object, count of 1 is 3 and count of 2 is 4. The elements()
function is called using cnt
object which returns an iterator which is passed as an argument to the list.
The iterator repeats 3 times over 1 returning three '1's, and repeats four times over 2 returning four '2's to the list. Finally, the list is printed using the print
function.
The most_common() Function
The Counter()
function returns a dictionary which is unordered. You can sort it according to the number of counts in each element using the most_common()
function of the Counter
object.
list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt.most_common())
Output:
[(1, 3), (2, 2), (3, 2), (4, 1), (6, 1), (7, 1), (8, 1)]
You can see that the most_common
function returns a list, which is sorted based on the count of the elements. 1 has a count of three, therefore it is the first element of the list.
The subtract()
Function
The subtract()
takes iterable (list) or a mapping (dictionary) as an argument and deducts elements count using that argument. Check the following example:
cnt = Counter({1:3,2:4})
deduct = {1:1, 2:2}
cnt.subtract(deduct)
print(cnt)
Output:
Counter({1: 2, 2: 2})
You can notice that the cnt
object we first created has a count of 3 for '1' and count of 4 for '2'. The deduct
dictionary has the value '1' for key '1' and value '2' for key '2'. The subtract()
function deducted 1 count from key '1' and 2 counts from key '2'.
The defaultdict
The defaultdict
works exactly like a python dictionary, except it does not throw KeyError
when you try to access a non-existent key.
Instead, it initializes the key with the element of the data type that you pass as an argument at the creation of defaultdict
. The data type is called default_factory
.
Import defaultdict
First, you have to import defaultdict
from collections
module before using it:
from collections import defaultdict
Create a defaultdict
You can create a defaultdict
with the defaultdict()
constructor. You have to specify a data type as an argument. Check the following code:
nums = defaultdict(int)
nums['one'] = 1
nums['two'] = 2
print(nums['three'])
Output:
0
In this example, int
is passed as the default_factory
. Notice that you only pass int
, not int()
. Next, the values are defined for the two keys, namely, one
and two
, but in the next line we try to access a key that has not been defined yet.
In a normal dictionary, this will force a KeyError
. But defaultdict
initializes the new key with default_factory
's default value which is 0 for int
. Hence, when the program is executed, 0 will be printed. This particular feature of initializing non-existent keys can be exploited in various situations.
For example, let's say you want the count of each name in a list of names given as "Mike, John, Mike, Anna, Mike, John, John, Mike, Mike, Britney, Smith, Anna, Smith".
from collections import defaultdict
count = defaultdict(int)
names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()
for names in names_list:
count[names] +=1
print(count)
Output:
defaultdict(<class 'int'>, {'Mike': 5, 'Britney': 1, 'John': 3, 'Smith': 2, 'Anna': 2})
First, we create a defaultdict
with int
as default_factory
. The names_list
includes a set of names which repeat several times. The split()
function returns a list from the given string. It breaks the string whenever a white-space is encountered and returns words as elements of the list. In the loop, each item in the list is added to the defaultdict
named as count
and initialized to 0 based on default_factory
. If the same element is encountered again, as the loop continues, the count of that element will be incremented.
The OrderedDict
OrderedDict
is a dictionary where keys maintain the order in which they are inserted, which means if you change the value of a key later, it will not change the position of the key.
Import OrderedDict
To use OrderedDict
you have to import it from the collections
module.
from collections import OrderedDict
Create a OrderedDict
You can create an OrderedDict object with the OrderedDict()
constructor. In the following code, You create an OrderedDict
without any arguments. After that some items are inserted into it.
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(od)
Output:
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
You can access each element using a loop as well. Take a look at the following code:
for key, value in od.items():
print(key, value)
Output:
a 1
b 2
c 3
Following example is an interesting use case of OrderedDict
with Counter
. Here, we create a Counter
from a list and insert elements to an OrderedDict
based on their count.
Most frequently occurring letter will be inserted as the first key and the least frequently occurring letter will be inserted as the last key.
list = ["a","c","c","a","b","a","a","b","c"]
cnt = Counter(list)
od = OrderedDict(cnt.most_common())
for key, value in od.items():
print(key, value)
Output:
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
a 4
c 3
b 2
The deque
The deque
is a list optimized for inserting and removing items.
Import the deque
You have to import the deque
class from the collections
module before using it.
from collections import deque
Creating a deque
You can create a deque
with deque()
constructor. You have to pass a list as an argument.
list = ["a","b","c"]
deq = deque(list)
print(deq)
Output:
deque(['a', 'b', 'c'])
Inserting Elements to deque
You can easily insert an element to the deq
we created at either of the ends. To add an element to the right of the deq
, you have to use the append()
method.
If you want to add an element to the start of the deq
, you have to use the appendleft()
method.
deq.append("d")
deq.appendleft("e")
print(deq)deque
Output:
deque(['e', 'a', 'b', 'c', 'd'])
You can notice that d
is added at the end of deq
and e
is added to the start of the deq
.
Removing Elements from the deque
Removing elements is similar to inserting elements. You can remove an element the similar way you insert elements. To remove an element from the right end, you can use the pop()
function and to remove an element from the left, you can use popleft()
.
deq.pop()
deq.popleft()
print(deq)
Output:
deque(['a', 'b', 'c'])
You can notice that both the first and last elements are removed from the deq
.
Clearing a deque
If you want to remove all elements from a deque
, you can use the clear()
function.
list = ["a","b","c"]
deq = deque(list)
print(deq)
print(deq.clear())
Output:
deque(['a', 'b', 'c'])
None
You can see in the output, at first there is a queue with three elements. Once we apply the clear()
function, the deque
is cleared and you see none
in the output.
Counting Elements in a deque
If you want to find the count of a specific element, use the count(x)
function. You have to specify the element for which you need to find the count, as the argument.
list = ["a","b","c"]
deq = deque(list)
print(deq.count("a"))
Output:
1
In the above example, the count of 'a' is 1. Hence '1' printed.
The ChainMap
ChainMap
is used to combine several dictionaries or mappings. It returns a list of dictionaries.
Import chainmap
You have to import ChainMap
from the collections
module before using it.
from collections import ChainMap
Create a ChainMap
To create a chain map we can use the ChainMap()
constructor. We have to pass the dictionaries we are going to combine as an argument set.
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(chain_map.maps)
Output:
[{'b': 2, 'a': 1}, {'c': 3, 'b': 4}]
You can see a list of dictionaries as the output. You can access chain map values by key name.
print(chain_map['a'])
Output:
1
'1' is printed as the value of key 'a' is 1. Another important point is that ChainMap
updates its values when its associated dictionaries are updated. For example, if you change the value of 'c' in dict2
to '5', you will notice the change in ChainMap
as well.
dict2['c'] = 5
print(chain_map.maps)
Output:
[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]
Getting Keys and Values from ChainMap
You can access the keys of a ChainMap
with keys()
function. Similarly, you can access the values of elements with values()
function, as shown below:
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print (list(chain_map.keys()))
print (list(chain_map.values()))
Output:
['b', 'a', 'c']
[2, 1, 3]
Notice that the value of the key 'b' in the output is the value of key 'b' in dict1
. As a rule of thumb, when one key appears in more than one associated dictionary, ChainMap
takes the value for that key from the first dictionary.
Adding a New Dictionary to ChainMap
If you want to add a new dictionary to an existing ChainMap
, use the new_child()
function. It creates a new ChainMap
with the newly added dictionary.
dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)
Output:
ChainMap({'f': 6, 'e': 5}, {'a': 1, 'b': 2}, {'b': 4, 'c': 3})
Notice that a new dictionary is added to the beginning of the ChainMap
list.
The namedtuple()
The namedtuple()
returns a tuple with names for each position in the tuple. One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object. This is obviously difficult. The namedtuple
was introduced to solve this problem.
Import namedtuple
Before using namedtuple
, you have to import it from the collections
module.
from collections import namedtuple
Create a namedtuple
from collections import namedtuple
Student = namedtuple('Student', 'fname, lname, age')
s1 = Student('John', 'Clarke', '13')
print(s1.fname)
Output:
Student(fname='John', lname='Clarke', age='13')
In this example, a namedtuple
object Student
has been declared. You can access the fields of any instance of a Student
class by the defined field name.
Creating a namedtuple
Using a list
The namedtuple()
function requires each value to be passed to it separately. Instead, you can use _make()
to create a namedtuple
instance with a list. Check the following code:
s2 = Student._make(['Adam','joe','18'])
print(s2)
Output:
Student(fname='Adam', lname='joe', age='18')
Create a New Instance Using Existing Instance
The _asdict()
function can be used to create an OrderedDict
instance from an existing instance.
s2 = s1._asdict()
print(s2)
Output:
OrderedDict([('fname', 'John'), ('lname', 'Clarke'), ('age', '13')])
Changing Field Values with the _replace()
Function
To change the value of a field of an instance, the _replace()
function is used. Remember that, _replace()
function creates a new instance. It does not change the value of the existing instance.
s2 = s1._replace(age='14')
print(s1)
print(s2)
Output:
Student(fname='John', lname='Clarke', age='13')
Student(fname='John', lname='Clarke', age='14')
Conclusion
With that, we conclude our tutorial on the Collections module. We have discussed all the important topics in the collection
module. The Python collection
module still needs improvements if we compare it with Java's Collection library. Therefore, we can expect a lot of changes in upcoming versions.