Breadth-First Search (BFS) Algorithm
In this lesson, we will go over the theory behind the algorithm and the Python implementation of Breadth-First Search and Traversal. First, we'll be focusing on node search, before delving into graph traversal using the BFS algorithm, as the two main tasks you can employ it for.
Note: We're assuming an adjacency-list implemented graph in the lesson.
Breadth-First Search - Theory
Breadth-First Search (BFS) traverses the graph systematically, level by level, forming a BFS tree along the way.
If we start our search from node v
(the root node of our graph or tree data structure), the BFS algorithm will first visit all the neighbors of node v
(it's child nodes, on level one), in the order that is given in the adjacency list. Next, it takes the child nodes of those neighbors (level two) into consideration, and so on.
This algorithm can be used for both graph traversal and search. When searching for a node that satisfies a certain condition (target node), the path with the shortest distance from the starting node to the target node. The distance is defined as the number of branches traversed.
Breadth-First Search can be used to solve many problems such as finding the shortest path between two nodes, determining the levels of each node, and even solving puzzle games and mazes.
While it's not the most efficient algorithm for solving large mazes and puzzles - and it's outshined by algorithms such as Dijkstra's Algorithm and A* - it still plays an important role in the bunch and depending on the problem at hand - DFS and BFS can outperform their heuristic cousins.
Breadth-First Search - Algorithm
When implementing BFS, we usually use a FIFO structure like a Queue
to store nodes that will be visited next.
Note: To use a Queue
in Python, we need to import the corresponding Queue
class from the queue
module.
We need to pay attention to not fall into infinity loops by revisiting the same nodes over and over, which can easily happen with graphs that have cycles. Having that in mind, we'll be keeping track of the nodes that have been visited. That information doesn't have to be explicitly saved, we can simply keep track of the parent nodes, so we don't accidentally go back to one after it's been visited.
To sum up the logic, the BFS Algorithm steps look like this:
- Add the root/start node to the
Queue
. - For every node, set that they don't have a defined parent node.
- Until the
Queue
is empty:- Extract the node from the beginning of the
Queue
. - Perform output processing.
- For every neighbor of the current node that doesn't have a defined parent (is not visited), add it to the
Queue
, and set the current node as their parent.
- Extract the node from the beginning of the
Output processing is performed depending on the purpose behind the graph search. When searching for a target node, output processing is usually testing if the current node is equal to the target node. This is the step on which you can get creative!
Breadth-First Search Implementation - Target Node Search
Let's first start out with search - and search for a target node. Besides the target node, we'll need a start node as well. The expected output is a path that leads us from the start node to the target node.
With those in mind, and taking the steps of the algorithm into account, we can implement it. We'll define a Graph
class to "wrap" the implementation of the BFS search.
Graph
contains a graph representation - in this case an adjacency matrix, and all methods you might need when working with graphs. We'll implement both BFS search and BFS traversal as methods of that class:
from queue import Queue
class Graph:
# Constructor
def __init__(self, num_of_nodes, directed=True):
self.m_num_of_nodes = num_of_nodes
self.m_nodes = range(self.m_num_of_nodes)
# Directed or Undirected
self.m_directed = directed
# Graph representation - Adjacency list
# We use a dictionary to implement an adjacency list
self.m_adj_list = {node: set() for node in self.m_nodes}
# Add edge to the graph
def add_edge(self, node1, node2, weight=1):
self.m_adj_list[node1].add((node2, weight))
if not self.m_directed:
self.m_adj_list[node2].add((node1, weight))
# Print the graph representation
def print_adj_list(self):
for key in self.m_adj_list.keys():
print("node", key, ": ", self.m_adj_list[key])
After implementing a wrapper class, we can implement BFS search as one of its methods:
def bfs(self, start_node, target_node):
# Set of visited nodes to prevent loops
visited = set()
queue = Queue()
# Add the start_node to the queue and visited list
queue.put(start_node)
visited.add(start_node)
# start_node has not parents
parent = dict()
parent[start_node] = None
# Perform step 3
path_found = False
while not queue.empty():
current_node = queue.get()
if current_node == target_node:
path_found = True
break
for (next_node, weight) in self.m_adj_list[current_node]:
if next_node not in visited:
queue.put(next_node)
parent[next_node] = current_node
visited.add(next_node)
# Path reconstruction
path = []
if path_found:
path.append(target_node)
while parent[target_node] is not None:
path.append(parent[target_node])
target_node = parent[target_node]
path.reverse()
return path
When we're reconstructing the path (if it is found), we're going backward from the target node, through its parents, re-tracing all the way to the start node. Additionally, we reverse the path for our own intuition of going from the start_node
towards the target_node
, though, this step is optional.
On the other hand, if there is no path, the algorithm will return an empty list.
With the previously explained implementation in mind, we can test it by running the BFS search on the example graph:
Let's recreate this graph using our Graph
class. It is an undirected graph with 6 nodes, so we'll instantiate it as:
graph = Graph(6, directed=False)
Next, we need to add all edges of the graph to the instance of the Graph
class we've created:
graph.add_edge(0, 1)
graph.add_edge(0, 2)
graph.add_edge(0, 3)
graph.add_edge(0, 4)
graph.add_edge(1, 2)
graph.add_edge(2, 3)
graph.add_edge(2, 5)
graph.add_edge(3, 4)
graph.add_edge(3, 5)
graph.add_edge(4, 5)
Now, let's see how the Graph
class internally represents our example graph.
graph.print_adj_list()
This will print the adjacency list used to represent a graph we've created:
node 0 : {(3, 1), (1, 1), (4, 1), (2, 1)}
node 1 : {(0, 1), (2, 1)}
node 2 : {(0, 1), (1, 1), (5, 1), (3, 1)}
node 3 : {(0, 1), (5, 1), (4, 1), (2, 1)}
node 4 : {(0, 1), (5, 1), (3, 1)}
node 5 : {(3, 1), (4, 1), (2, 1)}
At this moment, we have created a graph and understand how it's stored as an adjacency matrix. With all that in mind, we can perform a search itself. Say we'd like to search for node 5
starting from node 0
:
path = []
path = graph.bfs(0, 5)
print(path)
Running this code results in:
[0, 3, 5]
After taking a quick look at the example graph, we can see that the shortest path between 0
and 5
is indeed [0, 3, 5]
. Though, you could also traverse [0, 2, 5]
and [0, 4, 5]
. These alternative paths are, fundamentally, the same distance as [0, 3, 5]
- however, consider how BFS compares nodes. It "scans" from left to right and 3
is the first node on the left-hand side of the adjacency list that leads to 5
, so this path is taken instead of the others.
This is a characteristic of BFS you'll want to anticipate. It'll search from left to right - and won't find an equally valid path if it's found "after" the first one.
Note: There are cases in which a path between two nodes cannot be found. This scenario is typical for disconnected graphs, where there are at least two nodes that are not connected by a path.
Here's what a disconnected graph looks like:
If we were to try and perform a search for a path between nodes 0
and 3
in this graph, that search would be unsuccessful, and an empty path would be returned.
Breadth-First Implementation - Graph Traversal
Breadth-First Traversal is a special case of Breadth-First Search that traverses the whole graph, instead of searching for a target node. The algorithm stays the same as we've defined it before, the difference being that we don't check for a target node and we don't need to find a path that leads to it.
This simplifies the implementation significantly - let's just print out each node being traversed to gain an intuition of how it passes through the nodes:
def bfs_traversal(self, start_node):
visited = set()
queue = Queue()
queue.put(start_node)
visited.add(start_node)
while not queue.empty():
current_node = queue.get()
print(current_node, end = " ")
for (next_node, weight) in self.m_adj_list[current_node]:
if next_node not in visited:
queue.put(next_node)
visited.add(next_node)
Note: This method should be implemented as part of the Graph
class implemented before.
Now, let's define the following example graph in the previously shown way:
# Create an instance of the `Graph` class
# This graph is undirected and has 5 nodes
graph = Graph(5, directed=False)
# Add edges to the graph
graph.add_edge(0, 1)
graph.add_edge(0, 2)
graph.add_edge(1, 2)
graph.add_edge(1, 4)
graph.add_edge(2, 3)
Finally, let's run the code:
graph.bfs_traversal(0)
Running this code will print nodes in the order that BFS scanned them:
0 1 2 4 3
Step by step
Let's dive into this example a bit deeper and see how the algorithm works step by step. As we start the traversal from the start node 0
, it is put into the visited
set and into the queue
as well. While we still have nodes in the queue
, we extract the first one, print it, and check all of its neighbors.
When going through the neighbors, we check if each of them is visited, and if not we add them to the queue
and mark them as visited:
Steps | Queue | Visited |
---|---|---|
Add start node 0 | [0] | {0} |
Visit 0, add 1 & 2 to Queue | [1, 2] | {0} |
Visit 1, add 4 to Queue | [2, 4] | {0, 2} |
Visit 2, add 3 to Queue | [4, 3] | {0, 1, 2} |
Visit 4, no unvisited neighbours | [3] | {0, 1, 1, 4} |
Visit 3, no unvisited neighbours | [ ] | {0, 1, 2, 4, 3} |
Time complexity
During Breadth-First Traversal, every node is visited exactly once, and every branch is also viewed once in case of a directed graph, that is, twice if the graph is undirected. Therefore, the time complexity of the BFS algorithm is O(|V| + |E|), where V is a set of the graph's nodes, and E is a set consisting of all of its branches (edges).
Conclusion
In this lesson, we've explained the theory behind the Breadth-First Search algorithm and defined its steps.
We've depicted the Python implementation of both Breadth-First Search and Breadth-First Traversal, and tested them on example graphs to see how they work step by step. Finally, we've explained the time complexity of this algorithm.