Graphs are an extremely versatile data structure. More so than most people realize! Graphs can be used to model practically anything, given their nature of modeling relationships and hierarchies.
Nature and human creators are extremely hierarchical.
Words in a sentence, and sentences in a book can be graphs  represented as a grid. Pixels in an image can be a graph as well. A graph can represent a sequence of steps you want to take, or a sequence of operations an algorithm runs. A graph can represent a molecule, their arangements, up to humans and their relationships, up to streets and intersections, up to cities, countries, planets!
From an atomic scale to a universe scale  graphs can be used to model any sort of relationship or hierarchy.
Many common problems in the world can be represented in graphs. Where to open up restaurants to cover an area with guaranteed delivery times, with the minimum investment? Where to route a driver to avoid congestion and get to a target place as soon as possible? How to organize the flow of electrical energy in a city's grids? There isn't a shortage of problems that are inherently graphbased.
Graphs have been an important computational model in the past, and still are today. It's not that you university just wants you to learn them  they're genuinely one of the most useful data structures around. Recently, Graphs have been explored in newer contexts as well. For instance, Graph Attention Networks outlies a new type of artificial neural networks that operate on graphs, written by Petar Veličković et al. in 2018 has been cited over 7300 times. It's found major application in the fields of computational chemistry and drug discovery, and the idea is implemented in practically all frameworks dealing with modeling molecules as graphs.
Getting started with graphs can be daunting. Literature can be scattered and searching online typically involves various different implementations and a noncontinuity between the material you're reading.
For that reason, we're compiling an introductory course to Graphs in Python, in an attempt to standardize the fundamentals to help you form a solid basis on graph theory. The course is intended to expand through time, with new algorithms being added in.
Graphs Simplified
Graphs can be complicated more than warranted. Fundamentally  they're fairly simple, but given the interdisciplinary usage of them, terminology can get confusing, since it overlaps and certain shouldbesynonyms suddenly don't refer to the same thing anymore. You'll generally find people disagreeing in terms of terminology in the field, and especially online. If English is not your native language, and you've studied graph theory in a different language  translator freedom and loss of context between translation can make it even more confusing.
What's true?
Like with most things  it's hard to tell. You can't refer to the "original source" since scientific fields change through time, and the standard of "truth" changes with them. The best you could do is accept that there are various (slightly different) definitions and terminologies and abstain from semantic arguments. The point is fairly straightforward.
Each graph consists of nodes (also called vertices) and edges also sometimes known as arcs. This is where the first semantic argument can be raised! Computer Scientists more commonly call these nodes and edges. Mathematicians more commonly use the terms vertex and edge but also arc, and make a distinction between an edge (undirected) and an arc (directed). Additionally  not all combinatorialists and geometrists are in agreement here either. Official literature from universities also differs here. Some don't ascribe a difference between edges and arcs, while some do.
Note: Throughout the course, we'll be using the Computer Science notation of node and edge for the sake of consistency.
The point is: we can look at nodes/vertices as "chunks of data" and edges/arcs as "relations between those chunks of data".
You'll oftentimes also see this notation:
$$
G = (V, E)
$$
Or:
$$
G = (N, E)
$$
A graph (G
), consists of vertices/nodes (V
/N
) and eges (E
). Can you guess which formula was written by a mathematician and which by a computer scientist?
If we want to point out that two chunks of data are related to each other in some way  we'd connect them with an edge. Nodes are usually represented as circles with an identifying mark on them, so that we can distinguish between nodes, and edges are represented via lines (or arrows, as we'll see later) that connect two nodes. This is just a humaninterpretable abstraction.
A form of the graph data structure is often used when making maps for a city's bus or metro lines. For instance, if you've ever been to a metro/bus station  you'd see a simplified map of the city, with a bunch of stations. Each station is a node in a graph, and the lines that a metro/bus follows are edges of that graph:
This graph shows what different bus stops we have and how they are connected. For example, we can see that, using this bus line, to get from Bus Stop A to Bus Stop C we'd have to go through Bus Stop B  we can't get there any other way.
Weighted Graphs
Let's say we wanted to add information regarding how long it took on average (in minutes) to get from one bus stop to another. We'd add "weights" to our edges:
Weights can represent any arbitrary measurable. In the case of a map represented as a graph  it could be the time it takes to traverse that edge. Or  it could be the current congestion. In the case of a molecule, the weight could be the strength of the connection between atoms. In the case of a power grid, it could be the resistance of the cable or the loss of energy when transferring electricity. This depends highly on the specific problem you're modeling with a graph, and the domain of that problem.
Let's get back to the bus stops! Now our graph shows even more information than before, we see how long it takes to get from one stop to another. We can deduce that getting from Bus Stop A to Bus Stop C would take 23 minutes since we'd need to "traverse" the edge connecting Bus Stop A and Bus Stop B that has a weight of 15 and the (Bus Stop B > Bus Stop C) edge, which has a weight of 8.
Directed and Undirected Graphs
Now we'll look at another graph concept  directed and undirected graphs. Let's say that our bus line didn't have the same stops in both directions.
More precisely, when our bus starts from Bus Stop A it goes through B, C, and ends at D; however, in the other direction it doesn't go through C but an altogether different bus stop E, before then going through B. We might make something like this:
As the spoiler in the picture suggests  this approach isn't correct. Just looking at this graph we can't understand which bus stops the bus goes through in which direction, since nothing's stopping us from getting from B to E even though we should only be able to get from E to B.
This is where directed graphs come into play. The edges in directed graphs are represented via arrows instead of lines, an arrow pointing from A to B means "we can get from A to B but not vice versa". If we wanted to say we can get both from A to B and from B to A, we'd use a doublesided arrow.
Now, the cycle is no longer ambiguous:
Free eBook: Git Essentials
Check out our handson, practical guide to learning Git, with bestpractices, industryaccepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
Now it's clear that we can't go from B to E or from D to C, etc. It's clear that we can make graphs as simple or as complex as we want to. We could go one step further in our example, if for some reason it took a different amount of time to travel between some bus stops depending on the direction we're going.
Connected and Disconnected Graphs
A graph can also be connected and disconnected. A graph is connected if there is a path (which consists of one or more edges) between each pair of nodes. On the other hand, a graphs is disconnected if there is a pair of nodes which can't aren't connected by a path of edges.
Note: A frequent misconception is that graphs need to be connected, or at least that every node needs to be connected to another in some way. This isn't true. A graph can just as easily be consisted of any number of nodes without a single edge between them. However, many traversal algorithms require graphs to be connected to work properly.
Applications of Graphs
Here are a few examples of data that can be conveniently stored/represented using graphs, keeping in mind that this is just a grain of sand on the beach:
 Modeling nature: As stated earlier  many things in nature is naturally modeled as a graph, from molecules to galaxies.
 A set of people and their affiliations: This can be done using an undirected graph, since we're assuming that if person A knows person B, the same is true vice versa (hopefully). The different people would be represented by nodes, and if person A knows person B there will be an edge connecting the two nodes.
 The structure of a website: Different pages on a website are nodes, and there is a directed edge between page A and page B if a link to page B exists on page A. For example the home page often has links to an About Us and Contact page.
 Mutually dependent tasks: We can represent what tasks need to be completed before task A and what tasks cannot be completed before A has been completed in the following way  if there is a directed edge from B to A, then A cannot be completed before B is completed; if there is a directed edge from A to C, then C cannot be completed before A is completed:
Here we can see that we can put on glasses, socks, pants and a tshirt regardless of other clothes. But we can't put on shoes before putting on pants and socks. And we can't goout before doing everything else, except putting on glasses, since they might not be necessary. These types of graphs are easily sorted and we can see a feasible order of task execution using Topological Sorting.
Graphs Compared to Trees
If you are familiar with trees, graphs are easy to understand  trees are a special kind of graph, i.e. graphs with certain restrictions. Every tree is a graph, but not every graph is a tree.
By definition, trees are undirected graphs that:
 Are acyclic, i.e. we can't traverse back to any starting node we choose without backtracking
 A cycle is created by adding any edge into the existing tree
 The path between any two nodes is unique
 There is a distinct root node
 The graph is connected (there's a path between every two nodes)
Or in very simplified terms  trees are connected graphs without cycles. For graphs, we just get rid of all these restrictions and keep the nodes and edges concept. We'll get back to trees later, when we start covering algorithms that are meant to operate on trees specifically.
Graphs as a Mathematical Term
Graphs originated from mathematics so it's worth dedicating at least a couple of paragraphs to defining a more formal definition. A graph is an ordered pair G = (N, E)
, where N
is a set of nodes and E
is the set of edges.
Undirected Graphs: E ⊆ {{x, y}  (x, y) ∈ N^2}
. Sometimes an extra condition is added if we want to avoid edges that have the same start and end point (i.e. x ≠ y
).
Directed Graphs: E ⊆ {(x, y)  (x, y) ∈ N^2}
, here as well we can add the condition that x ≠ y
if we want to avoid edges that start and end in the same node.
As we can see in the previous definitions of the set of edges E
, the only difference is that in the case of directed graphs (x,y)
is an ordered pair, while in undirected graphs the pair {x,y}
is unordered. This means that in an undirected graph, what matters is that x
and y
are connected, not in "what order". So we could have written {y,x}
instead of {x,y}
while we could not have written (y,x)
instead of (x,y)
for a directed graph.
Representing Graphs in Code
Knowing what graphs are, we need a way to represent them in code. This is usually done via Adjacency Lists and Adjacency Matrices. In the following article we present an indepth look at these implementations in Java:
DepthFirst Search (DFS)
DepthFrist Search (DFS) searches as far as possible along a branch and then backtracks to search as far as possible in the next branch. This means that in the proceeding Graph, it starts off with the first neighbor, and continues down the line as far as possible.
BreadthFirst Search (BFS)
Breadth First Search (BFS) is a graphtraversal algorithm, often used for finding the shortest path from a starting node to a goal node.
BFS visits "layerbylayer". This means that it first visits all the children of the starting (root) node. These children are considered as the "second layer", after which their children are visited in the same manner.
Dijkstra's Algorithm
Dijkstra's Algorithm is a very wellknown graph traversal algorithm for finding the shortest path from a given node/vertex to another.
There are several implementations of this algorithm and some even use different data structures and have different applications.
Dijkstra works in a different fashion to the previous algorithms. Instead of starting at the root node, it starts at the goal node and backtracks its way to the starting node.
A* Algorithm
MinimumSpanning Trees (MST)

Graphs in Python: Minimum Spanning Trees  Borůvka's Algorithm

Graphs in Python: Minimum Spanning Trees  Kruskal's Algorithm

Graphs in Java: Minimum Spanning Trees  Borůvka's Algorithm (coming soon!)

Graphs in Java: Minimum Spanning Trees  Kruskal's Algorithm (coming soon!)
DFS vs BFS
A question that most likely came to mind when you were reading about DFS and BFS was:
"What's the practical difference?"
The theory is clear; BFS first visits the nodes closest to our starting node while DFS goes as deep into the graph (as far away from our starting node) as possible before continuing with other nodes close to our starting node.
Let's go through some examples:
 If all you want to do is check whether two nodes are connected at all in a graph that doesn't give you a clear reason to prefer one or the other of the graph traversal algorithms  either one
 If you "know" that your two nodes are close to each other if they're connected at all  for example looking for family members in an acquaintance graph  use BFS
 If you want to count all the possible ways to get from one node to another  DFS
 If you want the shortest path (the path that goes through the fewest number of nodes) from one node to another  BFS
 DFS can also be used for some slightly advanced things, like detecting a cycle in a graph, or for checking for strongly connected components (a mathematical term that means "that every node is reachable from every other node") which can later be used for solving 2satisfiability problems (2SAT).
Conclusion
Graphs are a convenient way to store certain types of data. Each graph is consisted of nodes (also called vertices) and edges. We can look at nodes as "chunks of data" and edges as "relations between those chunks of data".
Graph traversal algorithms such as BFS, DFS, Dijkstra's Algorithm and A* are very commonly used in many branches of data science and machine learning.