Graph Theory and Graph-Related Algorithms

Graphs are an extremely versatile data structure. More so than most people realize! Graphs can be used to model practically anything, given their nature of modeling relationships and hierarchies.

Nature and human creations are extremely hierarchical.

Words in a sentence, and sentences in a book can be graphs - represented as a grid. Pixels in an image can be a graph as well. A graph can represent a sequence of steps you want to take, or a sequence of operations an algorithm runs. A graph can represent a molecule, their arangements, up to humans and their relationships, up to streets and intersections, up to cities, countries, planets!

From an atomic scale to a universe scale - graphs can be used to model any sort of relationship or hierarchy.

Many common problems in the world can be represented in graphs. Where to open up restaurants to cover an area with guaranteed delivery times, with the minimum investment? Where to route a driver to avoid congestion and get to a target place as soon as possible? How to organize the flow of electrical energy in a city's grids? There isn't a shortage of problems that are inherently graph-based.

Graphs have been an important computational model in the past, and still are today. It's not that you university just wants you to learn them - they're genuinely one of the most useful data structures around. Recently, Graphs have been explored in newer contexts as well. For instance, Graph Attention Networks outlies a new type of artificial neural networks that operate on graphs, written by Petar Veličković et al. in 2018 has been cited over 7300 times. It's found major application in the fields of computational chemistry and drug discovery, and the idea is implemented in practically all frameworks dealing with modeling molecules as graphs.

Getting started with graphs can be daunting. Literature can be scattered and searching online typically involves various different implementations and a non-continuity between the material you're reading.

For that reason, we're compiling an introductory course to Graphs in Python, in an attempt to standardize the fundamentals to help you form a solid basis on graph theory. The course is intended to expand through time, with new algorithms being added in.

Graphs Simplified

Graphs can be complicated more than warranted. Fundamentally - they're fairly simple, but given the interdisciplinary usage of them, terminology can get confusing, since it overlaps and certain should-be-synonyms suddenly don't refer to the same thing anymore. You'll generally find people disagreeing in terms of terminology in the field, and especially online. If English is not your native language, and you've studied graph theory in a different language - translator freedom and loss of context between translation can make it even more confusing.

What's true?

Like with most things - it's hard to tell. You can't refer to the "original source" since scientific fields change through time, and the standard of "truth" changes with them. The best you could do is accept that there are various (slightly different) definitions and terminologies and abstain from semantic arguments. The point is fairly straightforward.

Each graph consists of nodes (also called vertices) and edges also sometimes known as arcs. This is where the first semantic argument can be raised! Computer Scientists more commonly call these nodes and edges. Mathematicians more commonly use the terms vertex and edge but also arc, and make a distinction between an edge (undirected) and an arc (directed). Additionally - not all combinatorialists and geometrists are in agreement here either. Official literature from universities also differs here. Some don't ascribe a difference between edges and arcs, while some do.

Note: Throughout the course, we'll be using the Computer Science notation of node and edge for the sake of consistency.

The point is: we can look at nodes/vertices as "chunks of data" and edges/arcs as "relations between those chunks of data".

You'll oftentimes also see this notation:

$$
G = (V, E)
$$

Or:

$$
G = (N, E)
$$

A graph (G), consists of vertices/nodes (V/N) and eges (E). Can you guess which formula was written by a mathematician and which by a computer scientist?

If we want to point out that two chunks of data are related to each other in some way - we'd connect them with an edge. Nodes are usually represented as circles with an identifying mark on them, so that we can distinguish between nodes, and edges are represented via lines (or arrows, as we'll see later) that connect two nodes. This is just a human-interpretable abstraction.

A form of the graph data structure is often used when making maps for a city's bus or metro lines. For instance, if you've ever been to a metro/bus station - you'd see a simplified map of the city, with a bunch of stations. Each station is a node in a graph, and the lines that a metro/bus follows are edges of that graph:

This graph shows what different bus stops we have and how they are connected. For example, we can see that, using this bus line, to get from Bus Stop A to Bus Stop C we'd have to go through Bus Stop B - we can't get there any other way.

Weighted Graphs

Let's say we wanted to add information regarding how long it took on average (in minutes) to get from one bus stop to another. We'd add "weights" to our edges:

Weights can represent any arbitrary measurable. In the case of a map represented as a graph - it could be the time it takes to traverse that edge. Or - it could be the current congestion. In the case of a molecule, the weight could be the strength of the connection between atoms. In the case of a power grid, it could be the resistance of the cable or the loss of energy when transferring electricity. This depends highly on the specific problem you're modeling with a graph, and the domain of that problem.

Let's get back to the bus stops! Now our graph shows even more information than before, we see how long it takes to get from one stop to another. We can deduce that getting from Bus Stop A to Bus Stop C would take 23 minutes since we'd need to "traverse" the edge connecting Bus Stop A and Bus Stop B that has a weight of 15 and the (Bus Stop B -> Bus Stop C) edge, which has a weight of 8.

Directed and Undirected Graphs

Now we'll look at another graph concept - directed and undirected graphs. Let's say that our bus line didn't have the same stops in both directions.

More precisely, when our bus starts from Bus Stop A it goes through B, C, and ends at D; however, in the other direction it doesn't go through C but an altogether different bus stop E, before then going through B. We might make something like this:

As the spoiler in the picture suggests - this approach isn't correct. Just looking at this graph we can't understand which bus stops the bus goes through in which direction, since nothing's stopping us from getting from B to E even though we should only be able to get from E to B.

This is where directed graphs come into play. The edges in directed graphs are represented via arrows instead of lines, an arrow pointing from A to B means "we can get from A to B but not vice versa". If we wanted to say we can get both from A to B and from B to A, we'd use a double-sided arrow.

Now, the cycle is no longer ambiguous:

Now it's clear that we can't go from B to E or from D to C, etc. It's clear that we can make graphs as simple or as complex as we want to. We could go one step further in our example, if for some reason it took a different amount of time to travel between some bus stops depending on the direction we're going.

Connected and Disconnected Graphs

A graph can also be connected and disconnected. A graph is connected if there is a path (which consists of one or more edges) between each pair of nodes. On the other hand, a graphs is disconnected if there is a pair of nodes which can't aren't connected by a path of edges.

Note: A frequent misconception is that graphs need to be connected, or at least that every node needs to be connected to another in some way. This isn't true. A graph can just as easily be consisted of any number of nodes without a single edge between them. However, many traversal algorithms require graphs to be connected to work properly.

Applications of Graphs

Here are a few examples of data that can be conveniently stored/represented using graphs, keeping in mind that this is just a grain of sand on the beach:

Modeling nature: As stated earlier - many things in nature is naturally modeled as a graph, from molecules to galaxies.
A set of people and their affiliations: This can be done using an undirected graph, since we're assuming that if person A knows person B, the same is true vice versa (hopefully). The different people would be represented by nodes, and if person A knows person B there will be an edge connecting the two nodes.
The structure of a website: Different pages on a website are nodes, and there is a directed edge between page A and page B if a link to page B exists on page A. For example the home page often has links to an About Us and Contact page.
Mutually dependent tasks: We can represent what tasks need to be completed before task A and what tasks cannot be completed before A has been completed in the following way - if there is a directed edge from B to A, then A cannot be completed before B is completed; if there is a directed edge from A to C, then C cannot be completed before A is completed:

Here we can see that we can put on glasses, socks, pants and a t-shirt regardless of other clothes. But we can't put on shoes before putting on pants and socks. And we can't go-out before doing everything else, except putting on glasses, since they might not be necessary. These types of graphs are easily sorted and we can see a feasible order of task execution using Topological Sorting.

Graphs Compared to Trees

If you are familiar with trees, graphs are easy to understand - trees are a special kind of graph, i.e. graphs with certain restrictions. Every tree is a graph, but not every graph is a tree.

By definition, trees are undirected graphs that:

Are acyclic, i.e. we can't traverse back to any starting node we choose without back-tracking
A cycle is created by adding any edge into the existing tree
The path between any two nodes is unique
There is a distinct root node
The graph is connected (there's a path between every two nodes)

Or in very simplified terms - trees are connected graphs without cycles. For graphs, we just get rid of all these restrictions and keep the nodes and edges concept. We'll get back to trees later, when we start covering algorithms that are meant to operate on trees specifically.

Graphs as a Mathematical Term

Graphs originated from mathematics so it's worth dedicating at least a couple of paragraphs to defining a more formal definition. A graph is an ordered pair G = (N, E), where N is a set of nodes and E is the set of edges.

Undirected Graphs: E ⊆ {{x, y} | (x, y) ∈ N^2}. Sometimes an extra condition is added if we want to avoid edges that have the same start and end point (i.e. x ≠ y).

Directed Graphs: E ⊆ {(x, y) | (x, y) ∈ N^2}, here as well we can add the condition that x ≠ y if we want to avoid edges that start and end in the same node.

As we can see in the previous definitions of the set of edges E, the only difference is that in the case of directed graphs (x,y) is an ordered pair, while in undirected graphs the pair {x,y} is unordered. This means that in an undirected graph, what matters is that x and y are connected, not in "what order". So we could have written {y,x} instead of {x,y} while we could not have written (y,x) instead of (x,y) for a directed graph.

In the following lessons - we'll dive into graph representations in code (lists of edges, adjacency matrices and adjacency lists, as well as their implementations). Then, we'll cover some of the most important and widely used algorithms for graph traversal and graph search, starting with the simpler, intuitive ones, and working our way from there:

Breadth-First Search and Depth-First Search
Dijkstra's Algorithm
A*
Prim's Algorithm
Boruvka's Algorithm
Kruskal's Algorithm

Lessson 1/10

You must first start the course before tracking progress.

Mark completed

Previous Next