Graphs in Python - Theory and Implementation - Representing Graphs in Code

Representing Graphs in Code

Graphs in Python can be represented in several different ways. The most notable ones are adjacency matrices, adjacency lists, and lists of edges. In this guide, we'll cover all of them. When implementing graphs, you can switch between these types of representations at your leisure.

First of all, we'll quickly recap graph theory, then explain data structures you can use to represent a graph, and, finally, give you a practical implementation for each representation.

Note: The implementations found in this lesson can be found in the following GitHub repo.

What Is a Graph - In Short

Let's quickly skim over basic definitions regarding graphs once again. A graph is a data structure you can use to model hierarchy and relationships between objects. It consists of a set of nodes and a set of edges. Nodes represent individual objects, while edges illustrate relationships between those objects.

Note: Again, the terms 'node' and 'vertex' ('nodes' and 'vertices') are often used interchangeably. In this course, we've opted to use the term node, but it has the same meaning as the term vertex.

If every edge in a graph illustrates a two-way connection, we call that graph undirected. On the other hand, if you can traverse each edge in only one direction, the graph is directed.

Not all nodes of a graph need to be connected with others. If you can access each node from any other node in a graph - we call that graph connected. But sometimes there are some nodes you can't access from any other node in a graph - that's what disconnected graphs are all about. The common misconception is that every graph has to be connected, but the reality of the matter is that it doesn't - in fact, a graph can contain no edges, just nodes:

From the implementation standpoint, the one last thing we need to cover is the weight of an edge. It is a numeric value assigned to the edge describing how much it costs to traverse that edge. The smaller the weight of an edge, the cheaper it is to traverse it. Based on that, graphs with weights assigned to their edges are called weighted graphs:

Armed with the fundamental knowledge, you can dive deeper into ways to implement a graph!

Three Most Common Ways to Represent a Graph

In this section, we'll go over the most common ways you can represent a graph. We'll explain the intuition behind each of them and give you some illustrative examples. Afterward, you can use that knowledge to implement a graph in Python.

Generally speaking, nodes of any given graph are labeled by numbers (starting from zero) for the sake of the simpler implementation. That's why we'll be using that notation in the following examples and implementations.

We'll use the following weighted directed graph as the example in the following sections:

Note: We've chosen a weighted directed graph as the example because it illustrates most of the implementation nuances. Generally speaking, switching between weighted and unweighted graphs is pretty straight-forward. Switching between directed and undirected graphs is also a pretty easy thing to do. We'll cover each of those topics in the following sections when needed.

List of Edges

A list of edges is probably the simplest way to represent a graph, but since it lacks a proper structure, it is often used just for illustrative purposes. We'll use it to explain some graph algorithms because it provides little to no overhead and allows us to focus on the algorithm implementation, rather than the implementation of the graph itself.

As you already know, each edge connects two nodes and may have a weight assigned to it. Therefore, each edge is represented by a list in the following way: [node1, node2, weight], where weight is an optional property (not required if you have an unweighted graph). As its name suggests, a list of edges stores a graph as a list of edges represented in the described way.

Let's take a look at one directed weighted graph and its list of edges:

As you can see, a list of edges is effectively a table. Each row of that table represents one edge - two of its nodes and the weight of that edge. Since this is a weighted graph, the order of nodes in the edge representation illustrates the direction of the edge. You can traverse the edge only from node1 to node2.

On the other hand, you have two approaches for dealing with undirected graphs. The first approach is to add two rows for each node - one for each edge direction. That approach is space-inefficient, but you must use it if you are using both directed and undirected graphs interchangeably. The second approach can be used only if you are certain that you will deal only with undirected graphs. In that case, you can consider each edge to be undirected and keep the representation the same - one row for each edge.

PROS:

• Simple and easy to understand
• Great for illustrative purposes
• Represents a graph per definition (a set of nodes and set of edges)

CONS:

• Not structured in any shape or form
• Not eligible for any real-world application
• Not efficient
• Not versatile enough to represent both directed and undirected graphs interchangeably

An adjacency matrix is one of the most popular ways to represent a graph because it's the easiest one to understand and implement and works reasonably well for many applications. It uses an nxn matrix to represent a graph (n is the number of nodes in a graph). In other words, the number of rows and columns is equal to the number of nodes in a graph.

Initially, every field of the matrix is set to a special value you choose- inf, 0, -1, False, etc., suggesting that there are no nodes present in the graph. After the initial stage, you can add every edge of the graph by filling up the appropriate field by 1 (for unweighted graphs) or the edge weight (for weighted graphs).

The mentioned matrix is essentially a table with each row and column representing one node of the graph. For example, row 3 refers to node 3 - that's why you can use an adjacency matrix to represent only graphs with number labeled nodes.

Note: In fact, you can alter the implementation of an adjacency matrix so it can handle differently named nodes, but the additional work needed usually annuls the potential gains. That's why we've opted to use the simpler implementation - only number labeled nodes.

The field which is the intersection of row i and column j refers to a presence or a weight of the edge between nodes i and j. For example, if you fill-up the field which is the intersections of row 1 and column 4, it indicates that there is the edge connecting nodes 1 and 4 (in that specific order). If a graph is weighted, you fill up that field with the weight of the edge or 1 in a case of an unweighted graph.

In the case of undirected graphs, you must add two entries for each edge - one for each direction.

If the previous explanation wasn't clear enough, let's try to simplify it by showing how to create an adjacency matrix on the example of the following graph:

There are n nodes in this graph. Therefore, we've created a table with n rows and columns and initialized all cells with 0 - the special value indicating that there are no edges between any two nodes. Since the example graph is weighted and directed, you need to:

• Scan every edge in a graph
• Determine the start and the end node of that edge
• Determine the weight of that edge
• Fill up the appropriate field of the matrix with the weight value

Let's use the edge 3-4 as an example. The start node is 3 and the end node is 4, therefore you know that you need to fill up the field that is the intersection of row 3 and column 4. From the image, you can read that the weight of the edge is 11, therefore you fill-up the appropriate field with 11. Now you've marked the presence of the edge 3-4. That process is repeated until you've marked every edge in a graph.

PROS:

• Low lookup time - you can determine whether an edge exists in the time of O(1)
• Adding/removing edges takes O(1) time
• Easy to implement and understand

CONS:

• Takes more space O(num_of_nodes²)
• Adding node takes O(num_of_nodes²) time
• Costly to find adjacent nodes of the selected node - O(num_of_nodes)
• Costly to traverse a graph - O(num_of_nodes²)
• Costly to label/enumerate edges - O(num_of_nodes²)

An adjacency list is the most efficient way to store a graph. It allows you to store only edges that are present in a graph, which is the opposite of an adjacency matrix, which explicitly stores all possible edges - both existent and non-existent. An adjacency matrix is initially developed to represent only unweighted graphs, but in the most effective way possible - using only one array.

As you can see in the illustration below, we can represent our example graph using just an array of 12 integer values. Compare that to the adjacency matrix - it consists of n² elements (n is the number of nodes in a graph), where the adjacency list takes only n+e elements (e is the number of edges). Much more space-efficient if a graph is not dense (has a small number of edges).

The problem is that an adjacency list is harder to understand compared to the adjacency matrix, so if you haven't interpreted them before, follow along carefully:

The first thing you need to know to construct an adjacency list is the number of nodes in a graph. In our example graph, there are 5 nodes, so the first 5 places in a list represent those nodes - e.g. element with the index 1 represents a node 1. After reserving the first 5 places in a list, you can start filling up the list. The value on the index i refers to the index in the list where you can find indices of adjacent nodes to the node i.

For example, the value on the index 0 is 5, which means that you should look at the index 5 in the adjacency list to find which nodes are connected to node 0 - those are nodes 0, 1, and 2. But how did we know when to stop looking for adjacent nodes? It's pretty simple! Take a look at the value on the index next to 0 in the list. The next index is 1, it represents node 1, and its value is 8, meaning that you can find nodes adjacent to node 1 starting from index 8 in the adjacency list. Therefore, you can find nodes adjacent to node 0 as values of the list between indices 5 and 8.

To understand this structure more readily, you can reorder elements of the adjacency list in a more structured way. If you do that, you can see that the resulting structure looks a lot like a linked list:

Furthermore, the structure of the linked list resembles a lot on dictionaries (or maps). It has a set of keys - nodes, and a set of values for each key - a set of nodes adjacent to the key node. If you want to represent a weighted graph, you must find a way to store weight besides the adjacent node (as you can see in the following illustration). But we'll cover implementation details in later sections.

The following illustration shows you the adjacency list of the example graph - both with and without weights:

When you take a look at the weighted adjacent list, you can easily construct the set of edges of the example graph. Node 0 has three adjacent nodes - 0, 1, 2, meaning that graph has edges 0-0, 0-1, and 0-2. The weight of those edges can also be read from the adjacency list. The weight of edge 0-0 is 25, the weight of edge 0-1 is 5, and so on, for every edge in the graph.

PROS:

• Cheap to find adjacent nodes of the selected node - O(1)
• Efficient for less dense graphs (low number of edges compared to the number of nodes)
• You can use it for both letter and number labeled nodes
• Low cost of traversing a graph - O(length_of_list)
• Low cost of labeling/enumerating edges - O(length_of_list)

CONS:

• High lookup time - O(length_of_list)
• High cost of removing an edge - O(length_of_list) (logical extension of high lookup time)

Note: The length_of_list is equal to the sum of the number of nodes and number of edges in a graph.

How to Implement a Graph in Python

Now you know how to represent a graph with the most common data structures! The next thing to do is to implement those data structures in Python. The goal of this guide is to give you as universal of a graph implementation as possible, but still make it lightweight. That's what allows you to focus on implementing graph algorithms instead of a graph as a data structure. Ideally, you would have a wrapper class representing a graph data structure that you can use to wrap any graph algorithm method you want to implement later.

Note: Simple implementations that we'll create in this guide should cover you in all non-highly-specific use cases. For example, we'll assume that all nodes are labeled by numbers starting from zero. But if you need more comprehensive implementations, we got you covered! You can find complete implementations in the following GitHub repo.

The Graph class will store the graph representation, as well as all other methods you might need to manipulate a graph. Let's take a look at its general structure and dive into implementation afterward:

class Graph:
# Constructor
# Number of edges

# Methods for removing edges

# Methods for searching a graph
# BFS, DFS, Dijkstra, A*...

# Methods for finding a minimum spanning tree
# Prim's algorithm, Kruskal's algorithm, Borůvka's algorithm...

# Other interesting methods


As you can see, there are several, say, "sections" you should implement in a Graph class. The most important section and the one we'll be focusing on in this section is the Constructor section. That's where you should put your graph implementation (data structure representation). After that, you can implement any graph-related algorithm as a method in this class. Alternatively, you can implement any graph-traversal/graph-search algorithm as a standalone method, and pass in the graph itself. The difference is, quite literally, just in whether the algorithm references self (parent graph) or the passed in graph.

Let's take a look at how to implement each of the mentioned graph representations in the Graph class. In this section, we'll use the previously shown example graph to test each implementation:

How to Implement a List of Edges in Python

As we've stated before, a list of edges doesn't have many real-world applications, but it is often used for illustrative purposes. You can use it when you need a simple implementation of a graph that doesn't unnecessarily complicate the implementation of an algorithm.

Let's take a look at the implementation of the Graph class that uses a list of edges to represent a graph:

class Graph:
# Constructor
def __init__(self, num_of_nodes, directed=True):
self.m_num_of_nodes = num_of_nodes
self.m_directed = directed

# Different representations of a graph
self.m_list_of_edges = []

# Add edge to a graph
# Add the edge from node1 to node2
self.m_list_of_edges.append([node1, node2, weight])

# If a graph is undirected, add the same edge,
# but also in the opposite direction
if not self.m_directed:
self.m_list_of_edges.append([node1, node2, weight])

# Print a graph representation
def print_edge_list(self):
num_of_edges = len(self.m_list_of_edges)
for i in range(num_of_edges):
print("edge ", i+1, ": ", self.m_list_of_edges[i])


As you can see, this implementation is pretty simple. A graph is represented as a list of edges, where each edge is represented by a list in the following fashion: [node1, node2, weight]. Therefore, a graph is effectively a matrix, where each row represents one edge.

Let's create our example graph and see how does the list of edges stores it.

graph = Graph(5)

graph.print_edge_list()


First of all, the example graph has 5 nodes, therefore you create a graph with 5 nodes using the constructor. Then you add all edges to the created graph and print the graph. That will output the following:

edge  1 :  [0, 0, 25]
edge  2 :  [0, 1, 5]
edge  3 :  [0, 2, 3]
edge  4 :  [1, 3, 1]
edge  5 :  [1, 4, 15]
edge  6 :  [4, 2, 7]
edge  7 :  [4, 3, 11]


As you can see, this output is in line with the example list of edges we've shown in previous sections:

Note: If you wanted to make this graph undirected, you should the constructor in the following way: graph = Graph(5, directed=False).

How to Implement an Adjacency Matrix in Python

An adjacency matrix is essentially a simple nxn matrix, where n is the number of nodes in a graph. Therefore, we'll implement it as the matrix with num_of_nodes rows and columns. We'll use a list comprehension to construct it and initialize all fields to 0.

In this case, 0 is a special value referring to the fact that there are no edges in a graph initially. Adding edges is pretty simple, we just need to mark the appropriate field of the matrix with 1 or weight depending on whether a graph is weighted or not:

class Graph:
def __init__(self, num_of_nodes, directed=True):
self.m_num_of_nodes = num_of_nodes
self.m_directed = directed

# Create a matrix with num_of_nodes rows and columns
self.m_adj_matrix = [[0 for column in range(num_of_nodes)]
for row in range(num_of_nodes)]

if not self.m_directed:



Now let's test this implementation in the previously described way:

graph = Graph(5)

graph.print_edge_list()


Running the code above will yield the following output:

[25, 5, 3, 0, 0]
[0, 0, 0, 1, 15]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 7, 11, 0]


As expected, the output is the same matrix like the one we've shown in the previous sections:

Note: If you wanted to make this graph undirected, you should the constructor in the following way: graph = Graph(5, directed=False). In that case, the adjacency matrix will symmetric.

How to Implement an Adjacency List in Python

As we've explained in the previous sections, the best way to represent an adjacency list in Python is by using a dictionary - it has a set of keys and corresponding values.

We'll create one key for each node, and a set of adjacent nodes for each key. That way, we'll effectively create a set of adjacent nodes for each node in a graph. In essence, an adjacent node represents the edge between the key node and the adjacent node, therefore we need to assign a weight to each edge. That's why we'll represent each adjacent node as a tuple - a pair of the name of the adjacent node and the weight of that edge:

class Graph:
def __init__(self, num_of_nodes, directed=True):
self.m_num_of_nodes = num_of_nodes
self.m_nodes = range(self.m_num_of_nodes)

# Define the type of a graph
self.m_directed = directed

self.m_adj_list = {node: set() for node in self.m_nodes}

if not self.m_directed:



Again, let's test the implementation in the previously described way:

graph = Graph(5)

graph.print_edge_list()


The output is identically the same as the linked list described in the previous sections:

node 0 :  {(2, 3), (0, 25), (1, 5)}
node 1 :  {(3, 1), (4, 15)}
node 2 :  set()
node 3 :  set()
node 4 :  {(2, 7), (3, 11)}


Note: If you wanted to make this graph undirected, you should the constructor in the following way: graph = Graph(5, directed=False).

Conclusion

After reading this guide, you might ask yourself - What graph representation should I ultimately use? But the answer, as usual, isn't a straightforward one - it depends.

Each graph representation has its pros and cons - it shines in one use-case, but has low performance in others. That's why we've given you a comprehensive overview of graph representations. After reading this guide, you should have an intuitive understanding of graph representations, know how to implement each of them, and when to use each based on a handy list of pros and cons. Therefore, this guide should be a good starting point if you want to dive deeper into implementing graph-related algorithms.

On the other hand, if you want to dive deeper into the implementation of the graph itself, we advise you to take a look at the following GitHub repo. There you can find more in-depth and comprehensive implementations of graph data structures.

Lessson 2/10
You must first start the course before tracking progress.
Mark completed