Introduction
Graphs are a convenient way to store certain types of data. The concept was ported from mathematics and appropriated for the needs of computer science.
Due to the fact that many things can be represented as graphs, graph traversal has become a common task, especially used in data science and machine learning.
How does Prim's Algorithm Work?
Prim's Algorithm was designed to find a Minimum Spanning Tree (MST) for a connected, weighted undirected graph. This means that the algorithm finds a "tree" (a structure that has no cycles) that connects all of the vertices via a subset of all available edges that have the smallest weight.
Alike Dijkstra's algorithm, Prim's is a greedy algorithm, but Prim's allows negative-weighted edges.
At the end of the algorithm, we'll loop through our array that contains the lowest-cost edges and add them up, getting the value of the MST within our graph.
We'll be discussing how every step of this algorithm works, but a rough sketch of the algorithm can be laid out. Assuming we have a weighted graph G
with a set of vertices (nodes) V
and a set of edges E
:
- We choose one of the nodes
s
as the starting node, and set the distance froms
tos
as0
. - We'll assign a number from node
s
to every other node, marking it as infinity at the beginning. This number will change and update as we progress along the algorithm. - Every node
s
will also have a number representing the "parent" node, from which we connect it in the MST. This number is initialized as-1
, and every other node except the starting node will have a number different from-1
associated with it by the end of Prim's algorithm. - For every node
s
we'll find the minimum edge connecting a node that is not already included in the MST. Since Prim's is a greedy algorithm, once we enter the node we're sure that we've chosen the shortest path connecting it to its parent. We repeat this step until all of the nodes are added to the MST. - Finally, we loop through our MST array and add up the edges, getting the value of the MST.
Visualizing Prim's Algorithm
Let's quickly visualize a simple example - and manually use Prim's Algorithm to find a Minimum Spanning Tree on the following graph:
We'll have 5 nodes, numbered 0 through 4, and on each of the edges the number represents the weight of that edge. Let's describe the INF/-1
pair: -1
at the beginning represents the parent from which there is an edge connecting to the current node that is of weight INF
. Of course, as the algorithm progresses, these values will also get updated.
Let's say that 0
will be our starting node. We mentioned earlier that when we choose our starting node, we need to set the distance from itself as 0
. Since 0
is the node with the minimal edge to itself, we can safely assume that 0
belongs in the MST and we'll add it. After that little change the graph looks as follows:
White nodes represent the ones we added to the MST.
The next step is the one that makes Prim's algorithm what it is. We loop through all of the neighbors of the node 0
, checking for a few things along the way:
- If the edge exists at all
- If the neighbor node is already added to the MST
- If the cost of the edge leading to the neighbor is lower than the current smallest-cost edge leading to that neighbor
The first neighbor of 0
is 1
. The edge connecting them has a weight of 1
. The edge exists, and the current node 1
is not in the MST, so the only thing left is to check if the edge from 0
to 1
is the smallest weighted edge leading to node 1
. Obviously, 1
is less than INF
, so we update the distance/parent pair of node 1
to 1/0
.
We follow exactly the same steps for every other neighbor of node 0
, after which we choose the node with the minimal edge weight to be added to the MST, and mark it blue. That node here is 1
.
Now we have the following graph:
The node we're considering now is 1
. As we've done with node 0
, we check all of the neighbors of node 1
.
Node 0
is already added to the MST, so we skip that one.
Node 2
is the next neighbor, and the weight of the edge leading to it from node 1
is 2
. This edge has a smaller weight than the one that previously led to that node, which had weight of 5
and came from node 0
.
The same is with the other neighbor node 4
: the weight of the edge leading to it from node 1
is 1
, and previously the smallest weighted edge leading to node 4
from node 0
was 4
.
We choose the next node that isn't added to the MST and has the smallest weighted edge from node 1
. That node here is node 4
.
After the update we have the following graph:
As we consider node 4
, we see that we can't update any of the current edges. Namely, both neighbors of node 4
already belong to the MST, so there is nothing to update there, and we just move along in the algorithm without doing anything in this step.
We continue looking for a node that is connected to a node belonging to the MST and has the smallest weighted edge possible. That node is currently 2
, and it connects to node 1
via the edge that has the weight of 2
. The graph looks as follows:
Both nodes 0
and 1
already belong in the MST, so the only possible node we can go to is 3
. The weight of the edge leading to node 3
from node 2
is 4
, which is obviously less than the previous 10
leading from node 0
. We update that, getting the following graph:
With this, we've visited and added all of the existing nodes to the MST, and because Prim's is a greedy algorithm, this means we've found our MST.
Let's recollect; the edges that were added to the array that keeps track of our MST are the following:
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
- Edge
0-1
of weight1
- Edge
1-2
of weight2
- Edge
1-4
of weight1
- Edge
2-3
of weight4
All that's left is to add up all of the edges making up the MST, after which we get that the value of the MST for the graph in our example is 8
, and we wrap up the execution of the algorithm here.
The time complexity of Prim's algorithm is O((|E| + |V|)log|V|)
, where |E|
is the number of edges in the graph, and |V|
is the number of vertices (nodes) in the graph.
Implementing Prim's Algorithm in Java
With the general idea, and the visualization out of the way - let's implement Prim's algorithm in Java.
As usual, we'll be using the weighted graph implementation from our previous piece: Representing Graphs in Code. However, we'll need to slightly modify it to fit our needs in implementing Prim's algorithm.
In this guide we'll be using the adjacency matrix approach. Note that we can implement Prim's algorithm just as well using adjacency lists, but the matrix approach is just slightly easier, and the code becomes shorter and more readable.
An important thing to note for later on is that, when we've initialized our adjacency matrix, all of the places that don't have a weight assigned to them will automatically be initialized as 0
.
Implementation of the Graph Class
First off, we'll start by adding three new arrays to our Graph
class:
public class Graph {
private int numOfNodes;
private boolean directed;
private boolean weighted;
private double[][] matrix;
private double[] edges;
private double[] parents;
private boolean[] includedInMST;
private boolean[][] isSetMatrix;
// ...
}
Let's briefly go over what each of these arrays represent:
edges
represents an array that holds the values of edges belonging to the MST that connect a node to their parent.parents
gives us information about the parent of every single node.includedInMST
tells us if a node we're checking for already belongs in the MST.
Then, we'll add these to the constructor along with the previously declared variables:
public Graph(int numOfNodes, boolean directed, boolean weighted) {
this.directed = directed;
this.weighted = weighted;
this.numOfNodes = numOfNodes;
// Simply initializes our adjacency matrix to the appropriate size
matrix = new double[numOfNodes][numOfNodes];
isSetMatrix = new boolean[numOfNodes][numOfNodes];
edges = new double[numOfNodes];
parents = new double[numOfNodes];
includedInMST = new boolean[numOfNodes];
for(int i = 0; i < numOfNodes; i++){
edges[i] = Double.POSITIVE_INFINITY;
parents[i] = -1;
includedInMST[i] = false;
}
}
We've allocated numOfNodes
space for each of our individual arrays. An important step here is the initialization:
- The distance to every single node at the beginning is set to
Double.POSITIVE_INFINITY
. This essentially means that we haven't yet reached the node from any other node, hence the distance to it isInfinity
. This number also representsInfinity
as a data type in Java. - Since none of the nodes are reached when the algorithm begins, the parent of every single node is set to
-1
, indicating that the specific node has no parent it's reached from. The reason we can set the value of parents to-1
is that we label the nodes from0
ton-1
wheren
is the number of nodes, so it logically makes no sense to have a node-1
. - At the beginning of the algorithm, none of the nodes belong to the MST, so it's only logical to include none of them, i.e. set the value of every single member in
includedInMST
tofalse
.
The addEdge()
and printMatrix()
methods stay the same, as they are both self-explanatory to what they do; we won't go into that deeper.
However, we do require additional getters and setters that will allow us to change the aforementioned arrays. Those are the following:
public int getNumOfNodes() {
return numOfNodes;
}
public double getEdges(int i) {
return edges[i];
}
public void setEdges(double edge, int node) {
this.edges[node] = edge;
}
public boolean getIncludedInMST(int i) {
return includedInMST[i];
}
public void setIncludedInMST(int node) {
this.includedInMST[node] = true;
}
public double[][] getMatrix() {
return matrix;
}
public void setParents(double parent, int node) {
this.parents[node] = parent;
}
public double getParents(int i) {
return parents[i];
}
If any of these getters/setters isn't intuitive - each of the getters and setters will additionally be explained as we use them when implementing Prim's algorithm.
With this, we've completed the adaptation of the implementation of a weighted Graph
, and we can move on to the algorithm itself.
Implementation of Prim's Algorithm
With a Graph
ready, we can go ahead and implement the algorithm that'll run on top of it. Let's initialize a Graph
with a set of nodes and their edges. We'll use the same set of nodes and edges as in the visualization from an earlier section:
public class Prim {
public static void main(String[] args){
Graph graph = new Graph(5, false, true);
graph.addEdge(0, 1, 1);
graph.addEdge(0, 2, 5);
graph.addEdge(0, 3, 10);
graph.addEdge(0, 4, 4);
graph.addEdge(1, 2, 2);
graph.addEdge(1, 4, 1);
graph.addEdge(2, 3, 4);
// ...
}
}
Printing out this matrix using graph.printMatrix()
outputs the following:
/ 1.0 5.0 10.0 4.0
1.0 / 2.0 / 1.0
5.0 2.0 / 4.0 /
10.0 / 4.0 / /
4.0 1.0 / / /
We also need a method named minEdgeNotIncluded()
that finds the minimum weighted edge leading to a neighbor that isn't already included in the MST:
public static int minEdgeNotIncluded(Graph graph){
double min = Double.POSITIVE_INFINITY;
int minIndex = -1;
int numOfNodes = graph.getNumOfNodes();
for(int i = 0; i < numOfNodes; i++){
if(!graph.getIncludedInMST(i) && graph.getEdges(i) < min){
minIndex = i;
min = graph.getEdges(i);
}
}
return minIndex;
}
At the beginning, we set min
to Infinity
indicating that we haven't found the minimum edge yet. Variable minIndex
represents the node that the minimum edge we're looking for connects to, and we initialize it to -1
at the beginning. Afterwards, we loop through all the nodes, looking for a node that isn't already included in the MST, after which we check if the edge connecting to that node is smaller than our current min
edge.
Finally, we're ready to implement Prim's algorithm:
public class Prim {
public static void main(String[] args){
// Initialized and added the graph earlier
int startNode = 0;
// Distance from the start node to itself is 0
graph.setEdges(0, startNode);
for(int i = 0; i < graph.getNumOfNodes()-1; i++){
int node = minEdgeNotIncluded(graph);
graph.setIncludedInMST(node);
double[][] matrix = graph.getMatrix();
for(int v = 0; v < graph.getNumOfNodes(); v++){
if(matrix[node][v] != 0 &&
!graph.getIncludedInMST(v) &&
matrix[node][v] < graph.getEdges(v)){
graph.setEdges(matrix[node][v], v);
graph.setParents(node, v);
}
}
}
double cost = 0;
for(int i = 0; i < graph.getNumOfNodes(); i++){
if(i != startNode){
cost += graph.getEdges(i);
}
}
System.out.println(cost);
}
}
The code itself might be a little confusing, so let's dive into it and explain what every section of it does.
First off, we choose our startNode
to be 0
. Remember, we need a node to start from, and that node could be any node from the set, but for this example it'll be 0
. We set the distance from node 0
to itself to be 0
.
In the for
loop, for every single i
from 0
to n-1
we look for a node s
so that the edge i-s
is the smallest edge from i
. After we've found the corresponding node, since Prim's is a greedy algorithm we're sure that there is no smaller edge from node i
to any other node besides s
, so we add s
to the MST.
The next thing is going through all of the neighbors of node s
. Let's recall how non-initialized weights are treated in an adjacency matrix:
All of the places in our adjacency matrix that haven't been assigned a weight will automatically be initialized as
0
.
This is important because any (negative or positive) number at the position matrix[i][j]
indicates that an edge exists between nodes i
and j
, while 0
indicates the absence of it.
So, the conditions that need to be met for an edge (and a node) to be added to the MST are the following three:
- We check if the value
matrix[i][j]
is different from0
, and if it is we know the edge exists, and that value represents the weight between nodesi
andj
. - We check if the neighbor has already been added to the MST. If so, we skip that node and go on to the next neighbor.
- If the value of the edge from node
i
to nodej
is smaller than the already existing value from a different node to nodej
, we update the pair of distance/parent to reflect the situation, i.e. distance becomes the value of the edgei-j
and the parent from which we arrive at nodej
is nodei
.
That about sums up how Prim's algorithm works. All that is left to do is to go through the edges
array and add up all of the edges that make up the MST, finding its value. That is exactly what the last part of our code does, and stores the result in the cost
variable.
Let's wrap up the algorithm with the output of the MST:
System.out.println("MST consists of the following edges:");
for(int i = 1; i < graph.getNumOfNodes(); i++){
System.out.println("edge: (" + (int)graph.getParents(i) + ", " + i + "), weight: " + graph.getEdges(i));
}
Let's run it and see the output:
MST consists of the following edges:
edge: (0, 1), weight: 1.0
edge: (1, 2), weight: 2.0
edge: (2, 3), weight: 4.0
edge: (1, 4), weight: 1.0
Conclusion
In this guide, we've covered and explained how to use Prim's Algorithm to find a Minimum-Spanning Tree(MST) in Java.
Prim's, alongside Kruskal's algorithm is one of the two most commonly used to solve this problem, which finds its use in fields such as designing computer networks, telecommunication networks and networks in general.