Introduction
In this guide, we'll walk you through the differences between two of Python's most popular data structures - Dictionaries and Arrays. Each of these provides a specific way of arranging your data, with pros and cons for certain tasks, and knowing when to use which will let you make the most of their built-in functionality.
Note: This guide assumes Python 3.x, and most of it applies to any recent 3.x release. We will, however, also note some key differences for Python 2.x.
Guide to Python Arrays
An Array is one of the fundamental data structures in computer science - a sequence of 0..n elements, where each element has an index.
Most arrays have a fixed size, so they take a chunk of memory every time a new one is created:
Here, we've got a simple array consisting of 7 elements. Indexing typically starts at 0, and each element has a positional index that we can use to access it. This makes the array's access time complexity O(1).
Most of Python's arrays are dynamically typed, which means that the objects of an array have a type, but the array itself is not restricted to only one type - you can have an array consisting of an integer, a string, an object, or even of another array that's heterogeneously mixed as well.
There are 6 important types of arrays in Python: list, tuple, str, bytes, bytearray, and array.array.
When talking about each of them, there are a few key properties we'll take into account:
- Whether they're dynamic or non-dynamic (fixed size)
- Whether they're statically or dynamically typed
- Whether they're mutable or immutable
Python Lists
A list in Python is dynamic (non-fixed size), dynamically typed (elements not restricted to a single type), and mutable (elements can be changed in-place).
In Python, a list is defined by declaring its elements within square brackets []. Let's go ahead and define a list:
my_list = [1, 2, 3, "Mark", "John", "Emma"]
print(my_list)
It contains a few integers and a few strings, denoting names. Since lists are dynamically typed, this is allowed:
[1, 2, 3, 'Mark', 'John', 'Emma']
Since lists are dynamic, we can change the number of elements by adding a new one, for example:
my_list.append(4)
my_list.append("Peter")
print(my_list)
This results in our list having 8 elements, instead of the 6 we defined in the beginning:
[1, 2, 3, 'Mark', 'John', 'Emma', 4, 'Peter']
Now, let's try replacing an element and adding a new one. We'll check the ID of the list (reference in memory) to confirm that it's not switched out under the hood with a new copy that contains either added elements or replaced ones:
my_list = [1, 2, 3, "Mark", "John", "Emma", 4, "Peter"]
# Print original list and its ID
print('Original list: ', my_list)
print('ID of object in memory: ', id(my_list))
# Modify existing element and add a new one
my_list[4] = "Anna"
my_list.append("Dan")
# Print changed list and its ID
print('Changed list: ', my_list)
print('ID of object in memory: ', id(my_list))
Running this code results in:
Original list: [1, 2, 3, 'Mark', 'John', 'Emma', 4, 'Peter']
ID of object in memory: 140024176315840
Changed list: [1, 2, 3, 'Mark', 'Anna', 'Emma', 4, 'Peter', 'Dan']
ID of object in memory: 140024176315840
The fact that my_list points to the same object in memory (140024176315840) before and after the changes shows that lists are mutable.
Python's lists can even store functions in a sequence:
def f1():
    return "Function one"

def f2():
    return "Function two"

def f3():
    return "Function three"

list_of_functions = [f1, f2, f3]
print(list_of_functions)
Which will result in:
[<function f1 at 0x0000016531807488>, <function f2 at 0x00000165318072F0>, <function f3 at 0x0000016531807400>]
Our output consists of functions at the given addresses. Now let's try and access a function and run it:
print(list_of_functions[0]())
Since the first element of this list is f1, calling it returns its string, which print() then displays:
Function one
Lists are the most commonly used type of arrays in Python. They are easy to use and intuitive. Additionally, their time complexity for accessing elements is O(1).
Python Tuples
A tuple in Python is non-dynamic (fixed size), dynamically typed (elements not restricted to a single type), and immutable (elements cannot be changed in-place).
In addition to that, we use parentheses () when defining them:
my_tuple = (1, 2, 3, "Mark", "John", "Emma")
print(my_tuple)
Since tuples are dynamically typed, we can have elements of different types present within them:
(1, 2, 3, 'Mark', 'John', 'Emma')
Since tuples are non-dynamic, they have a fixed size, and we can't append() elements to them in-place, since this changes their size. Thus, tuples don't have an append() method.
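As a small sketch of the point above, the snippet below checks that a tuple exposes no append() method and that item assignment is rejected. The variable names has_append and assignment_failed are only illustrative:

```python
my_tuple = (1, 2, 3, "Mark", "John", "Emma")

# Tuples expose no append() method at all
has_append = hasattr(my_tuple, "append")
print(has_append)  # False

# Item assignment is rejected with a TypeError
try:
    my_tuple[0] = 100
    assignment_failed = False
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)
```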
We can, however, create a new tuple consisting of smaller tuples, which again is of fixed size:
my_tuple = (1, 2, 3)
another_tuple = ("Mark", "John", "Emma")
print('Original tuple: ', my_tuple)
print('ID of object in memory: ', id(my_tuple))
my_tuple = my_tuple + another_tuple
print('New tuple: ', my_tuple)
print('ID of object in memory: ', id(my_tuple))
We've assigned the same variable reference to a new object created to contain both of these tuples together - even though the reference variable is the same, it points to a totally different object in memory:
Original tuple: (1, 2, 3)
ID of object in memory: 139960147395136
New tuple: (1, 2, 3, 'Mark', 'John', 'Emma')
ID of object in memory: 139960147855776
The time complexity for accessing items in a tuple is also O(1).
Python Strings
In Python 3, the str type (short for String) was overhauled from Python 2. In Python 2, it represented both text and bytes, but in Python 3 these are two entirely different data types.
A string in Python is non-dynamic (fixed size), statically typed (elements restricted to a single type), and immutable (elements cannot be changed in-place).
A sequence of Unicode characters enclosed within quotation marks "" (or '') is used to define a string:
my_str = "qwerty"
print(my_str)
This will result in:
qwerty
We can access elements via standard array indexing, but can't change them:
print(my_str[0])
my_str[0] = "p"
This will result in:
q
TypeError: 'str' object does not support item assignment
In fact, strings are made up of strings. Python has no separate character type, so indexing a string returns another string of length 1.
In the example below, my_str has a length of 5, and each of its five elements is itself a string of length 1:
my_str = "abcde"
print(len(my_str)) # Check the length of our str
print(type(my_str)) # Check the type of our str
print(my_str[0]) # Letter 'a'
print(len(my_str[0])) # Check the length of our letter
print(type(my_str[0])) # Check the type of our letter 'a'
This results in:
5
<class 'str'>
a
1
<class 'str'>
Both our 'character' and string are of the same class - str.
Similar to tuples, we can concatenate strings - which results in a new string consisting of the two smaller ones:
my_str = "qwerty"
my_str2 = "123"
result = my_str + my_str2
print(result)
And the result is:
qwerty123
Again, strings only support characters and we cannot mix in other types:
my_str = "qwerty"
my_str2 = 123
result = my_str + my_str2
print(result)
Which will result in:
TypeError: can only concatenate str (not "int") to str
However, int, as well as every other type, can be cast (converted) into a string representation:
my_str = "qwerty"
my_str2 = str(123) # int 123 is now cast to str
result = my_str + my_str2
print(result)
This will result in:
qwerty123
With this method, you can get away with printing, for example, ints and strings in the same line:
my_str = "qwerty"
# print("my_str's length is: " + len(my_str)) # Would raise a TypeError if uncommented
print("my_str's length is: " + str(len(my_str))) # String concatenation resulting in 'my_str's length is: 6'
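Explicit str() calls are not the only option. As a hedged alternative sketch, string formatting converts embedded values for us. The names message and message_format are just illustrative:

```python
my_str = "qwerty"

# f-strings (Python 3.6+) convert embedded values to text automatically
message = f"my_str's length is: {len(my_str)}"
print(message)

# str.format() achieves the same on older Python 3 versions
message_format = "my_str's length is: {}".format(len(my_str))
print(message_format)
```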
Python Bytes
Bytes in Python are non-dynamic (fixed size), statically typed (elements restricted to a single type), and immutable (elements cannot be changed in-place).
A bytes object consists of a sequence of single bytes, each an integer ranging from 0 to 255 (8-bit).
Defining a bytes object is slightly different from other arrays. One way is to explicitly convert a tuple of integers into bytes:
my_bytes = bytes((0, 1, 2))
print(my_bytes)
This will result in:
b'\x00\x01\x02'
If the tuple contains elements of different types, a TypeError is raised:
my_bytes = bytes((0, 1, 2, 'string'))
TypeError: 'str' object cannot be interpreted as an integer
When converting a str object, it must be encoded with a charset, otherwise it would be ambiguous which bytes represent its characters:
my_str = "This is a string"
# my_bytes = bytes(my_str) # Would raise TypeError: string argument without an encoding
my_bytes = bytes(my_str, 'utf-8')
print(my_bytes) # Prints b'This is a string'
If you're unfamiliar with how encoding bytes works - read our guide on How to Convert Bytes to String in Python.
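To make the encoding step a little more concrete, here is a minimal round-trip sketch, assuming UTF-8 throughout. The names first_byte and round_trip are illustrative only:

```python
my_str = "This is a string"

# str.encode() is equivalent to bytes(my_str, 'utf-8')
my_bytes = my_str.encode("utf-8")
print(my_bytes)  # b'This is a string'

# Indexing a bytes object yields an integer in the range 0-255
first_byte = my_bytes[0]
print(first_byte)  # 84, the code for 'T'

# Decoding with the same charset restores the original text
round_trip = my_bytes.decode("utf-8")
print(round_trip == my_str)  # True
```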
Furthermore, the contents of a bytes object can be made mutable by converting it to another array type called the bytearray.
Python Bytearray
A bytearray in Python is dynamic (non-fixed size), statically typed (elements restricted to a single type), and mutable (elements can be changed in-place).
my_byte_array = bytearray((0, 1, 2))
Now, we can try to add elements to this array, as well as change an element:
my_byte_array = bytearray((0, 1, 2))
print(my_byte_array)
print("ByteArray ID: ", id(my_byte_array))
my_byte_array.append(3)
print(my_byte_array)
print("ByteArray ID: ", id(my_byte_array))
my_byte_array[3] = 50
print(my_byte_array)
print("ByteArray ID: ", id(my_byte_array))
This results in:
bytearray(b'\x00\x01\x02')
ByteArray ID: 140235112668272
bytearray(b'\x00\x01\x02\x03')
ByteArray ID: 140235112668272
bytearray(b'\x00\x01\x022')
ByteArray ID: 140235112668272
These all have the same object ID - pointing to the same object in memory being changed.
A bytearray can be cast back to a bytes object; though, keep in mind that it's an expensive operation that takes O(n) time, since the contents are copied.
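A short sketch of that conversion, assuming we want an immutable snapshot of the current contents. The name frozen is hypothetical:

```python
my_byte_array = bytearray((0, 1, 2))

# bytes() copies the current contents into a new, immutable object
frozen = bytes(my_byte_array)

# Later changes to the bytearray do not affect the copy
my_byte_array.append(3)

print(my_byte_array)  # bytearray(b'\x00\x01\x02\x03')
print(frozen)         # b'\x00\x01\x02'
```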
Python array.array
So far, we've been working with built-in types. However, another type of array exists, in the array module.
This array is dynamic (non-fixed size), statically typed (elements restricted to a single type), and mutable (can be changed in-place). We need to explicitly note the type we'll be using in an array, and these types are C-style types: 32-bit integers, floating point numbers, doubles, etc.
Each of these has a marker - i for integers, f for floats, and d for doubles. Let's make an integer array via the array module:
import array
my_array = array.array("i", (1, 2, 3, 4))
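Some of the more commonly used C-like types, with the minimum sizes given in the array module documentation:
- b - signed char (at least 1 byte)
- i - signed int (at least 2 bytes)
- l - signed long (at least 4 bytes)
- f - float (at least 4 bytes)
- d - double (at least 8 bytes)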
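As a rough illustration of the static typing described above, the sketch below appends to an integer array and then tries to append a string. The flag name rejected is illustrative, and the reported item size depends on the platform:

```python
import array

my_array = array.array("i", (1, 2, 3, 4))

# Elements of the declared type can be appended in place
my_array.append(5)
print(my_array)  # array('i', [1, 2, 3, 4, 5])

# A value of another type is rejected
try:
    my_array.append("six")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# itemsize reports how many bytes each element occupies on this platform
print(my_array.typecode, my_array.itemsize)
```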
Guide to Python Dictionaries
The Dictionary is a central data structure in Python. It stores data in key-value pairs.
Due to this, it can also be called a map, hash map, or a lookup table.
There are a few different variants of a dictionary:
- dict
- collections.defaultdict
- collections.OrderedDict
- collections.ChainMap
Dictionaries rely on hash values that identify keys for the lookup operation. A hash table stores many hash values, and the hash of a key must never change while that key is in the table.
To learn more about dictionaries in Python, read our "Guide to Dictionaries in Python".
Hashable Type and Hash Values
Every hashable object has a hash value, and the built-in hash() function can be used to retrieve it. This value is calculated at runtime and, for strings, typically differs between interpreter runs, though within a single run, if a == b, hash(a) will always be equal to hash(b):
random_string = "This is a random string"
a = 23
b = 23.5
print(hash(random_string))
print(hash(a))
print(hash(b))
This code will result in something along the lines of:
4400833007061176223
23
1152921504606846999
Numeric values that are equal have the same hash value, regardless of their type:
a = 23
b = 23.0
print(hash(a))
print(hash(b))
Results in:
23
23
This mechanism is what makes dictionaries blazingly fast in Python - a key's hash leads almost directly to where its value is stored, giving lookups an average time complexity of O(1).
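Not every object can be hashed. The following sketch, with illustrative names such as point and lookup, shows a hashable tuple, an unhashable list, and how two equal numeric keys address the same dictionary entry:

```python
# Immutable built-ins such as tuples are hashable and can serve as keys
point = (3, 4)
print(hash(point) == hash((3, 4)))  # True: equal objects share a hash

# Mutable containers such as lists are not hashable
try:
    hash([3, 4])
    list_is_hashable = True
except TypeError as error:
    list_is_hashable = False
    print("TypeError:", error)

# Because 23 == 23.0 and their hashes match, they address the same entry
lookup = {23: "int key"}
lookup[23.0] = "float key"
print(lookup)  # {23: 'float key'}
```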
Python Dictionary
The contents of a dictionary (dict type) are defined within curly braces {}. The syntax resembles JSON, given the key-value pairs:
my_dict = {
    "name": "Mike James",
    "age": 32,
    "country": "United Kingdom"
}
A dictionary can have an arbitrary number of pairs. Keys must be hashable and unique (duplicate keys result in the same hash). If a key is repeated, no error is raised - the later value simply overwrites the earlier one, and the dictionary contains that key only once.
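A minimal sketch of what happens with a repeated key in a dictionary literal (the second "name" entry is added here purely for illustration):

```python
my_dict = {
    "name": "Mike James",
    "age": 32,
    "name": "James Mike",  # Same key again: this value wins
}

# The key appears once, in its original position, with the later value
print(my_dict)  # {'name': 'James Mike', 'age': 32}
```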
Since dictionaries are mutable, we can add a new key-value pair just by 'accessing' a non-existent key and setting its value:
my_dict["countries_visited"] = ["Spain", "Portugal", "Russia"]
print(my_dict)
This will result in:
{'name': 'Mike James', 'age': 32, 'country': 'United Kingdom', 'countries_visited': ['Spain', 'Portugal', 'Russia']}
Python's core dict will probably solve most of your problems, but if not, there are a few dictionary types that can be imported from a standard library module called collections.
Python DefaultDict
A problem that you can encounter when using a dict is trying to access the value of a key that doesn't exist.
For example, in our previous demonstration, if we ran print(my_dict["zip_code"]), we would get a KeyError: 'zip_code' as zip_code doesn't exist.
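With a plain dict, a missing key can also be handled without any extra types. A small sketch, where the fallback value "unknown" and the variable names are assumptions for illustration:

```python
my_dict = {"name": "Mike James", "age": 32}

# Subscripting a missing key raises KeyError
try:
    my_dict["zip_code"]
    raised = False
except KeyError as error:
    raised = True
    print("KeyError:", error)  # KeyError: 'zip_code'

# get() returns None, or a fallback of our choosing, instead of raising
zip_code = my_dict.get("zip_code", "unknown")
print(zip_code)  # unknown

# The in operator checks for a key without touching its value
has_zip = "zip_code" in my_dict
print(has_zip)  # False
```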
This is when defaultdict comes into play, as it accepts a default_factory - a function that returns the default value if a key is not present. This way, looking up a missing key in a defaultdict doesn't raise a KeyError:
from collections import defaultdict

# default_factory
def safe_function():
    return "Value not defined"

my_dict = defaultdict(safe_function)
my_dict["name"] = "Mark James"
my_dict["age"] = 32
print(my_dict["country"]) # This will output Value not defined and not raise a KeyError
This, as expected, results in:
Value not defined
A defaultdict doesn't have a literal syntax of its own like the core dict class does. We can either add key-value pairs one by one, as above, or pass an existing dictionary as the second argument, after default_factory.
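The sketch below shows two common patterns under those assumptions: pre-populating a defaultdict from an existing dictionary, and using the built-in list as the factory to group values. The names missing and names_by_initial are illustrative:

```python
from collections import defaultdict

# A mapping can be passed after default_factory to pre-populate entries
my_dict = defaultdict(lambda: "Value not defined", {"name": "Mark James", "age": 32})

# Looking up a missing key returns the default and also stores it
missing = my_dict["country"]
print(missing)               # Value not defined
print("country" in my_dict)  # True

# Built-in types work as factories too; list is handy for grouping
names_by_initial = defaultdict(list)
for name in ["Mark", "John", "Emma", "Mike"]:
    names_by_initial[name[0]].append(name)
print(dict(names_by_initial))  # {'M': ['Mark', 'Mike'], 'J': ['John'], 'E': ['Emma']}
```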
Python ChainMap
This type of dictionary allows us to connect multiple dictionaries into one - to chain them. When accessing data, it searches the dictionaries one by one, in the order given, until it finds the key:
from collections import ChainMap

my_dict1 = {
    "name": "Mike James",
    "age": 32
}
my_dict2 = {
    "name": "James Mike",
    "country": "United Kingdom",
    "countries_visited": ["Spain", "Portugal", "Russia"]
}
my_dict_result = ChainMap(my_dict1, my_dict2)
print(my_dict_result)
This results in a ChainMap:
ChainMap({'name': 'Mike James', 'age': 32}, {'name': 'James Mike', 'country': 'United Kingdom', 'countries_visited': ['Spain', 'Portugal', 'Russia']})
We can also define duplicate keys. 'name' is present in both dictionaries. However, when we try to access the 'name' key:
print(my_dict_result['name'])
It finds the first matching key:
Mike James
Also, keep in mind that a ChainMap can still raise a KeyError when a key is missing from all of its dictionaries, since the underlying mappings are core dict objects.
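A brief sketch of that behavior, which also shows that writes through a ChainMap land in the first dictionary only. The name chain and the value "Ireland" are hypothetical:

```python
from collections import ChainMap

my_dict1 = {"name": "Mike James", "age": 32}
my_dict2 = {"name": "James Mike", "country": "United Kingdom"}
chain = ChainMap(my_dict1, my_dict2)

# A key missing from every mapping raises KeyError
try:
    chain["zip_code"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# get() works as it does on a regular dict
zip_code = chain.get("zip_code", "unknown")

# Writes go to the first mapping only
chain["country"] = "Ireland"
print(my_dict1["country"])  # Ireland
print(my_dict2["country"])  # United Kingdom
print(chain["country"])     # Ireland
```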
Python OrderedDict
Note: As of Python 3.7, dictionaries are guaranteed to be insertion-ordered (in CPython 3.6 this was already the case, as an implementation detail).
The OrderedDict is used when you'd like to maintain the order of insertion of key-value pairs in a dictionary. In earlier versions, dict doesn't guarantee this, and you may end up with an order different from the order of insertion.
If this isn't an important thing - you can comfortably use a dictionary. If this is important, though, such as when dealing with dates, you'll want to use an OrderedDict instead:
from collections import OrderedDict
ordered_dict = OrderedDict()
ordered_dict['a'] = 1
ordered_dict['b'] = 2
ordered_dict['c'] = 3
ordered_dict['d'] = 4
print(ordered_dict)
This results in:
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
Note: Even though dict objects preserve the insertion order as of Python 3.6 - use OrderedDict if insertion order is required and your code has to run on older versions. A regular dict won't guarantee insertion order on those versions.
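Even on recent Python versions, OrderedDict behaves differently from dict in a few ways. A minimal sketch of two of them, using illustrative names first and second:

```python
from collections import OrderedDict

first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])

# OrderedDict comparisons are order-sensitive; dict comparisons are not
print(first == second)              # False
print(dict(first) == dict(second))  # True

# move_to_end() reorders an existing key
first.move_to_end("a")
print(list(first))  # ['b', 'a']
```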
Dictionary Methods vs Array Methods
Now that we've got the hang of things, let's cover the most common methods that these two types implement. There are four basic operations that can be performed on data: access (get), update, add, and delete.
Let's define an array and dictionary that we'll be experimenting with:
example_dict = {
    "id": 101,
    "name": "Marc Evans",
    "date_of_birth": "13.02.1993.",
    "city": "Chicago",
    "height": 185,
}
example_array = [1, 2, 3, "red", "green", "yellow", "blue", 4]
Getting Data
Dictionary
There are multiple ways to access data in a dictionary:
- Referring to a key name - my_dict["key_name"]:
print(example_dict["name"]) # Output: Marc Evans
- Calling the get() method - my_dict.get("key_name"):
print(example_dict.get("city")) # Output: Chicago
- Accessing all keys in a dictionary - my_dict.keys() - returns a view of the keys:
print(example_dict.keys()) # Output: dict_keys(['id', 'name', 'date_of_birth', 'city', 'height'])
- Accessing all values in a dictionary - my_dict.values() - returns a view of the values:
print(example_dict.values()) # Output: dict_values([101, 'Marc Evans', '13.02.1993.', 'Chicago', 185])
- Accessing all key-value pairs - my_dict.items() - returns a view of (key, value) tuples:
print(example_dict.items()) # Output: dict_items([('id', 101), ('name', 'Marc Evans'), ('date_of_birth', '13.02.1993.'), ('city', 'Chicago'), ('height', 185)])
Array
The basic way to get data from an array is by position:
- By referring to an element's index - my_array[index_number]:
print(example_array[3]) # Output: red
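Index-based access also covers negative indices and slices. A short sketch on the same example list, where the names last, colors, and every_other are illustrative:

```python
example_array = [1, 2, 3, "red", "green", "yellow", "blue", 4]

# Negative indices count from the end
last = example_array[-1]
print(last)  # 4

# A slice returns a new list covering the range [start, stop)
colors = example_array[3:7]
print(colors)  # ['red', 'green', 'yellow', 'blue']

# A third value sets the step
every_other = example_array[::2]
print(every_other)  # [1, 3, 'green', 'blue']
```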
Updating Data
Dictionary
There are 2 ways to update data in a dictionary:
- Directly setting a new value to a certain key - my_dict["key"] = new_value:
example_dict["height"] = 190
print(example_dict["height"]) # Output: 190
- Calling the update() method - my_dict.update({"key": new_value}) - the method's argument must be a dictionary:
example_dict.update({"height": 190})
print(example_dict["height"]) # Output: 190
Array
If an array is mutable, it can be changed in a similar fashion as getting data:
- By referring to an element's index and setting a different value - my_array[index_number] = new_value:
example_array[3] = "purple"
print(example_array) # Output: [1, 2, 3, 'purple', 'green', 'yellow', 'blue', 4]
Adding Data
Dictionary
There are 2 ways to add data to a dictionary:
- Setting a value to a new key, which will automatically create a key-value pair and add it - my_dict["new_key"] = value:
example_dict["age"] = 45
print(example_dict) # Output: {'id': 101, 'name': 'Marc Evans', 'date_of_birth': '13.02.1993.', 'city': 'Chicago', 'height': 185, 'age': 45}
- Calling the update() method - my_dict.update({"new_key": value}):
example_dict.update({"age": 45})
Array
There are a couple of ways to add data to an array (though, an array must be mutable):
- Calling the append() method - my_array.append(new_element) - it adds new_element to the end of my_array:
example_array.append("gray")
print(example_array) # Output: [1, 2, 3, 'purple', 'green', 'yellow', 'blue', 4, 'gray']
- Calling the insert() method - my_array.insert(index_number, new_element) - inserts a new_element at the position index_number:
example_array.insert(0, 0)
print(example_array) # Output: [0, 1, 2, 3, 'purple', 'green', 'yellow', 'blue', 4, 'gray']
- Calling the extend() method - my_array.extend(my_array2) - adds the elements of my_array2 to the end of my_array:
example_array2 = [5, 6]
example_array.extend(example_array2)
print(example_array) # Output: [0, 1, 2, 3, 'purple', 'green', 'yellow', 'blue', 4, 'gray', 5, 6]
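The difference between append() and extend() is easy to miss when the argument is itself a list. A small sketch with illustrative names appended and extended:

```python
appended = [1, 2]
extended = [1, 2]

# append() adds its argument as a single element, even if it is a list
appended.append([3, 4])
print(appended)  # [1, 2, [3, 4]]

# extend() adds each element of its argument individually
extended.extend([3, 4])
print(extended)  # [1, 2, 3, 4]
```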
Deleting Data
Dictionary
There are multiple ways to delete data from a dictionary:
- Calling the pop() method - my_dict.pop("key_name") - takes the name of the key to be deleted:
example_dict.pop("name")
print(example_dict) # {'id': 101, 'date_of_birth': '13.02.1993.', 'city': 'Chicago', 'height': 185}
- Calling the popitem() method - my_dict.popitem() - in Python 3.7+, it deletes the last added key-value pair, and in Python versions below 3.7 it deletes an arbitrary key-value pair:
example_dict.popitem()
print(example_dict) # {'id': 101, 'name': 'Marc Evans', 'date_of_birth': '13.02.1993.', 'city': 'Chicago'}
- Using the del keyword - del my_dict["key_name"]:
del example_dict['name']
print(example_dict) # {'id': 101, 'date_of_birth': '13.02.1993.', 'city': 'Chicago', 'height': 185}
# del dict deletes the entire dictionary
del example_dict
print(example_dict) # NameError: name 'example_dict' is not defined
- Calling the clear() method - my_dict.clear() - it empties the dictionary, but it will still exist as an empty one {}:
example_dict.clear()
print(example_dict) # {}
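One detail worth sketching is that pop() returns the removed value and accepts an optional default for missing keys. The shortened dictionary and the names removed_name and removed_age below are illustrative:

```python
example_dict = {"id": 101, "name": "Marc Evans", "city": "Chicago"}

# pop() returns the value it removes
removed_name = example_dict.pop("name")
print(removed_name)  # Marc Evans

# Without a default, popping a missing key raises KeyError
try:
    example_dict.pop("age")
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# With a default, the default is returned instead
removed_age = example_dict.pop("age", None)
print(removed_age)   # None
print(example_dict)  # {'id': 101, 'city': 'Chicago'}
```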
Array
There are a few ways to delete data from an array:
- Calling the pop() method - my_array.pop(index_number) - deletes an element at the specified index_number:
example_array.pop(2)
print(example_array) # [1, 2, 'red', 'green', 'yellow', 'blue', 4]
- Calling the remove() method - my_array.remove(value) - deletes the first item with the specified value:
example_array.remove(2)
print(example_array) # [1, 3, 'red', 'green', 'yellow', 'blue', 4]
- Calling the clear() method - my_array.clear() - just like in a dictionary, it removes all the elements from an array, leaving an empty one []:
example_array.clear()
print(example_array) # []
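The del keyword shown for dictionaries also works on lists, by index or by slice. A minimal sketch on a fresh copy of the example list:

```python
example_array = [1, 2, 3, "red", "green", "yellow", "blue", 4]

# Delete a single element by index
del example_array[0]
print(example_array)  # [2, 3, 'red', 'green', 'yellow', 'blue', 4]

# Delete a range of elements with a slice
del example_array[2:6]
print(example_array)  # [2, 3, 4]
```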
Conclusion
In this comprehensive guide, we embarked on a deep dive into Python's diverse data structures, specifically focusing on arrays and dictionaries. Our exploration led us through the world of Python's list, tuple, str, bytes, bytearray, and array.array, each showcasing its unique strengths and use cases. On the dictionary side, we navigated the intricacies of hashable types, Python's native dictionary, and other dictionary-like structures such as defaultdict, ChainMap, and OrderedDict.
The choice between dictionaries and arrays isn't purely binary; rather, it's contingent upon the problem at hand. Dictionaries are exceptionally efficient when it comes to associating keys with values and quickly retrieving data using those keys. They are versatile, allowing for the use of various data types as keys, provided they are hashable. On the other hand, arrays (and array-like structures) are sequential and indexed, making them ideal for ordered data, numerical operations, and when the order of elements is paramount.
Furthermore, the methods associated with both structures serve distinct purposes, from data retrieval and updating to addition and deletion. As developers, understanding the subtleties of these methods empowers us to write cleaner, more efficient code.
In summary, both arrays and dictionaries are foundational to Python and have their specific niches. As with any tool, understanding when and how to use them is key. Whether you're performing iterative operations on a list or mapping keys to values in a dictionary, Python's rich standard library has you covered.