Graph Theory and Graph-Related Algorithm's Theory and Implementation

Graph Theory and Graph-Related Algorithm's Theory and Implementation

Graphs are an extremely versatile data structure. More so than most people realize! Graphs can be used to model practically anything, given their nature of modeling relationships and hierarchies.

Nature and human creators are extremely hierarchical.

Words in a sentence, and sentences in a book can be graphs - represented as a grid. Pixels in an image can be a graph as well. A graph can represent a sequence of steps you want to take, or a sequence of operations an algorithm runs. A graph can represent a molecule, their arangements, up to humans and their relationships, up to streets and intersections, up to cities, countries, planets!

From an atomic scale to a universe scale - graphs can be used to model any sort of relationship or hierarchy.

Many common problems in the world can be represented in graphs. Where to open up restaurants to cover an area with guaranteed delivery times, with the minimum investment? Where to route a driver to avoid congestion and get to a target place as soon as possible? How to organize the flow of electrical energy in a city's grids? There isn't a shortage of problems that are inherently graph-based.

Graphs have been an important computational model in the past, and still are today. It's not that you university just wants you to learn them - they're genuinely one of the most useful data structures around. Recently, Graphs have been explored in newer contexts as well. For instance, Graph Attention Networks outlies a new type of artificial neural networks that operate on graphs, written by Petar Veličković et al. in 2018 has been cited over 7300 times. It's found major application in the fields of computational chemistry and drug discovery, and the idea is implemented in practically all frameworks dealing with modeling molecules as graphs.

Getting started with graphs can be daunting. Literature can be scattered and searching online typically involves various different implementations and a non-continuity between the material you're reading.

For that reason, we're compiling an introductory course to Graphs in Python, in an attempt to standardize the fundamentals to help you form a solid basis on graph theory. The course is intended to expand through time, with new algorithms being added in.

Graphs Simplified

Graphs can be complicated more than warranted. Fundamentally - they're fairly simple, but given the interdisciplinary usage of them, terminology can get confusing, since it overlaps and certain should-be-synonyms suddenly don't refer to the same thing anymore. You'll generally find people disagreeing in terms of terminology in the field, and especially online. If English is not your native language, and you've studied graph theory in a different language - translator freedom and loss of context between translation can make it even more confusing.

What's true?

Like with most things - it's hard to tell. You can't refer to the "original source" since scientific fields change through time, and the standard of "truth" changes with them. The best you could do is accept that there are various (slightly different) definitions and terminologies and abstain from semantic arguments. The point is fairly straightforward.

Each graph consists of nodes (also called vertices) and edges also sometimes known as arcs. This is where the first semantic argument can be raised! Computer Scientists more commonly call these nodes and edges. Mathematicians more commonly use the terms vertex and edge but also arc, and make a distinction between an edge (undirected) and an arc (directed). Additionally - not all combinatorialists and geometrists are in agreement here either. Official literature from universities also differs here. Some don't ascribe a difference between edges and arcs, while some do.

Note: Throughout the course, we'll be using the Computer Science notation of node and edge for the sake of consistency.

The point is: we can look at nodes/vertices as "chunks of data" and edges/arcs as "relations between those chunks of data".

You'll oftentimes also see this notation:

$$
G = (V, E)
$$

Or:

$$
G = (N, E)
$$

A graph (G), consists of vertices/nodes (V/N) and eges (E). Can you guess which formula was written by a mathematician and which by a computer scientist?

If we want to point out that two chunks of data are related to each other in some way - we'd connect them with an edge. Nodes are usually represented as circles with an identifying mark on them, so that we can distinguish between nodes, and edges are represented via lines (or arrows, as we'll see later) that connect two nodes. This is just a human-interpretable abstraction.

A form of the graph data structure is often used when making maps for a city's bus or metro lines. For instance, if you've ever been to a metro/bus station - you'd see a simplified map of the city, with a bunch of stations. Each station is a node in a graph, and the lines that a metro/bus follows are edges of that graph:

This graph shows what different bus stops we have and how they are connected. For example, we can see that, using this bus line, to get from Bus Stop A to Bus Stop C we'd have to go through Bus Stop B - we can't get there any other way.

Weighted Graphs

Let's say we wanted to add information regarding how long it took on average (in minutes) to get from one bus stop to another. We'd add "weights" to our edges:

Weights can represent any arbitrary measurable. In the case of a map represented as a graph - it could be the time it takes to traverse that edge. Or - it could be the current congestion. In the case of a molecule, the weight could be the strength of the connection between atoms. In the case of a power grid, it could be the resistance of the cable or the loss of energy when transferring electricity. This depends highly on the specific problem you're modeling with a graph, and the domain of that problem.

Let's get back to the bus stops! Now our graph shows even more information than before, we see how long it takes to get from one stop to another. We can deduce that getting from Bus Stop A to Bus Stop C would take 23 minutes since we'd need to "traverse" the edge connecting Bus Stop A and Bus Stop B that has a weight of 15 and the (Bus Stop B -> Bus Stop C) edge, which has a weight of 8.

Directed and Undirected Graphs

Now we'll look at another graph concept - directed and undirected graphs. Let's say that our bus line didn't have the same stops in both directions.

More precisely, when our bus starts from Bus Stop A it goes through B, C, and ends at D; however, in the other direction it doesn't go through C but an altogether different bus stop E, before then going through B. We might make something like this:

As the spoiler in the picture suggests - this approach isn't correct. Just looking at this graph we can't understand which bus stops the bus goes through in which direction, since nothing's stopping us from getting from B to E even though we should only be able to get from E to B.

This is where directed graphs come into play. The edges in directed graphs are represented via arrows instead of lines, an arrow pointing from A to B means "we can get from A to B but not vice versa". If we wanted to say we can get both from A to B and from B to A, we'd use a double-sided arrow.

Now, the cycle is no longer ambiguous:

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

Now it's clear that we can't go from B to E or from D to C, etc. It's clear that we can make graphs as simple or as complex as we want to. We could go one step further in our example, if for some reason it took a different amount of time to travel between some bus stops depending on the direction we're going.

Connected and Disconnected Graphs

A graph can also be connected and disconnected. A graph is connected if there is a path (which consists of one or more edges) between each pair of nodes. On the other hand, a graphs is disconnected if there is a pair of nodes which can't aren't connected by a path of edges.

Note: A frequent misconception is that graphs need to be connected, or at least that every node needs to be connected to another in some way. This isn't true. A graph can just as easily be consisted of any number of nodes without a single edge between them. However, many traversal algorithms require graphs to be connected to work properly.

Applications of Graphs

Here are a few examples of data that can be conveniently stored/represented using graphs, keeping in mind that this is just a grain of sand on the beach:

  • Modeling nature: As stated earlier - many things in nature is naturally modeled as a graph, from molecules to galaxies.
  • A set of people and their affiliations: This can be done using an undirected graph, since we're assuming that if person A knows person B, the same is true vice versa (hopefully). The different people would be represented by nodes, and if person A knows person B there will be an edge connecting the two nodes.
  • The structure of a website: Different pages on a website are nodes, and there is a directed edge between page A and page B if a link to page B exists on page A. For example the home page often has links to an About Us and Contact page.
  • Mutually dependent tasks: We can represent what tasks need to be completed before task A and what tasks cannot be completed before A has been completed in the following way - if there is a directed edge from B to A, then A cannot be completed before B is completed; if there is a directed edge from A to C, then C cannot be completed before A is completed:

Here we can see that we can put on glasses, socks, pants and a t-shirt regardless of other clothes. But we can't put on shoes before putting on pants and socks. And we can't go-out before doing everything else, except putting on glasses, since they might not be necessary. These types of graphs are easily sorted and we can see a feasible order of task execution using Topological Sorting.

Graphs Compared to Trees

If you are familiar with trees, graphs are easy to understand - trees are a special kind of graph, i.e. graphs with certain restrictions. Every tree is a graph, but not every graph is a tree.

By definition, trees are undirected graphs that:

  • Are acyclic, i.e. we can't traverse back to any starting node we choose without back-tracking
  • A cycle is created by adding any edge into the existing tree
  • The path between any two nodes is unique
  • There is a distinct root node
  • The graph is connected (there's a path between every two nodes)

Or in very simplified terms - trees are connected graphs without cycles. For graphs, we just get rid of all these restrictions and keep the nodes and edges concept. We'll get back to trees later, when we start covering algorithms that are meant to operate on trees specifically.

Graphs as a Mathematical Term

Graphs originated from mathematics so it's worth dedicating at least a couple of paragraphs to defining a more formal definition. A graph is an ordered pair G = (N, E), where N is a set of nodes and E is the set of edges.

Undirected Graphs: E ⊆ {{x, y} | (x, y) ∈ N^2}. Sometimes an extra condition is added if we want to avoid edges that have the same start and end point (i.e. x ≠ y).

Directed Graphs: E ⊆ {(x, y) | (x, y) ∈ N^2}, here as well we can add the condition that x ≠ y if we want to avoid edges that start and end in the same node.

As we can see in the previous definitions of the set of edges E, the only difference is that in the case of directed graphs (x,y) is an ordered pair, while in undirected graphs the pair {x,y} is unordered. This means that in an undirected graph, what matters is that x and y are connected, not in "what order". So we could have written {y,x} instead of {x,y} while we could not have written (y,x) instead of (x,y) for a directed graph.

Representing Graphs in Code

Knowing what graphs are, we need a way to represent them in code. This is usually done via Adjacency Lists and Adjacency Matrices. In the following article we present an in-depth look at these implementations in Java:

Depth-First Search (DFS)

Depth-Frist Search (DFS) searches as far as possible along a branch and then backtracks to search as far as possible in the next branch. This means that in the proceeding Graph, it starts off with the first neighbor, and continues down the line as far as possible.

Breadth-First Search (BFS)

Breadth First Search (BFS) is a graph-traversal algorithm, often used for finding the shortest path from a starting node to a goal node.

BFS visits "layer-by-layer". This means that it first visits all the children of the starting (root) node. These children are considered as the "second layer", after which their children are visited in the same manner.

Dijkstra's Algorithm

Dijkstra's Algorithm is a very well-known graph traversal algorithm for finding the shortest path from a given node/vertex to another.

There are several implementations of this algorithm and some even use different data structures and have different applications.

Dijkstra works in a different fashion to the previous algorithms. Instead of starting at the root node, it starts at the goal node and backtracks its way to the starting node.

A* Algorithm

Minimum-Spanning Trees (MST)

DFS vs BFS

A question that most likely came to mind when you were reading about DFS and BFS was:

"What's the practical difference?"

The theory is clear; BFS first visits the nodes closest to our starting node while DFS goes as deep into the graph (as far away from our starting node) as possible before continuing with other nodes close to our starting node.

Let's go through some examples:

  • If all you want to do is check whether two nodes are connected at all in a graph that doesn't give you a clear reason to prefer one or the other of the graph traversal algorithms - either one
  • If you "know" that your two nodes are close to each other if they're connected at all - for example looking for family members in an acquaintance graph - use BFS
  • If you want to count all the possible ways to get from one node to another - DFS
  • If you want the shortest path (the path that goes through the fewest number of nodes) from one node to another - BFS
  • DFS can also be used for some slightly advanced things, like detecting a cycle in a graph, or for checking for strongly connected components (a mathematical term that means "that every node is reachable from every other node") which can later be used for solving 2-satisfiability problems (2-SAT).

Conclusion

Graphs are a convenient way to store certain types of data. Each graph is consisted of nodes (also called vertices) and edges. We can look at nodes as "chunks of data" and edges as "relations between those chunks of data".

Graph traversal algorithms such as BFS, DFS, Dijkstra's Algorithm and A* are very commonly used in many branches of data science and machine learning.

Last Updated: April 5th, 2022
Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Olivera PopovićAuthor

LinkedIn: https://rs.linkedin.com/in/227503161
If you need any help - post it in the comments :) That way someone else can reply if I'm busy.

© 2013-2022 Stack Abuse. All rights reserved.