In this guide, we'll take a look at how to calculate the Euclidean distance between two points in Python, using NumPy.
What is Euclidean Distance?
Euclidean distance is a fundamental distance metric for points in Euclidean space.
Euclidean space is the classical geometric space you get familiar with in math class, usually pictured in 2 or 3 dimensions, though it can be defined for any non-negative integer number of dimensions.
The Euclidean distance between two points is the length of the straight line segment connecting them, which is the shortest path between two points in Euclidean space.
The name comes from Euclid, who is widely recognized as "the father of geometry", as this was the only kind of space people at the time would typically conceive of. Since then, other kinds of spaces have been studied in Physics and Mathematics, such as affine spaces and non-Euclidean geometries, many of which are far less intuitive for us.
In Euclidean space of any dimension, the shortest path between two points is always the straight line segment between them. However, as the number of dimensions grows, the Euclidean distances between data points tend to become more and more alike, which makes them less informative.
Given this, Euclidean distance isn't always the most useful metric to keep track of when dealing with many dimensions, and we'll focus on 2D and 3D Euclidean space to calculate the Euclidean distance.
For high-dimensional data, other distance metrics, such as Manhattan distance, are sometimes preferred.
Generally speaking, Euclidean distance is heavily used in the development of 3D worlds, as well as in Machine Learning algorithms that rely on distance metrics, such as K-Nearest Neighbors. There, the Euclidean distance typically represents how similar two data points are: the smaller the distance between them, the more similar they are considered to be.
Mathematical Formula
The mathematical formula for calculating the Euclidean distance between 2 points in 2D space:
$$
d(p,q) = \sqrt[2]{(q_1-p_1)^2 + (q_2-p_2)^2 }
$$
The formula is easily adapted to 3D space, as well as any dimension:
$$
d(p,q) = \sqrt[2]{(q_1-p_1)^2 + (q_2-p_2)^2 + (q_3-p_3)^2 }
$$
For points with n coordinates, the formula generalizes to:
$$
d(p,q) = \sqrt[2]{(q_1-p_1)^2 + ... + (q_n-p_n)^2 }
$$
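As a minimal sketch of the general formula in plain Python, the function below sums the squared coordinate differences and takes the square root. The name `euclidean_distance` is a hypothetical helper chosen for illustration, not something taken from a library:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("both points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root
    return math.sqrt(sum((q_i - p_i) ** 2 for p_i, q_i in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))              # 2D: 5.0
print(euclidean_distance((0, 0, 0), (3, 3, 3)))        # 3D: 5.196152422706632
print(euclidean_distance((1, 2, 3, 4), (1, 2, 3, 4)))  # 4D, identical points: 0.0
```

The same function covers 2D, 3D and higher dimensions, since the only thing that changes is the number of terms in the sum.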
A sharp eye may notice the similarity between Euclidean distance and Pythagoras' Theorem:
$$
C^2 = A^2 + B^2
$$
$$
d(p,q)^2 = (q_1-p_1)^2 + (q_2-p_2)^2
$$
There is, in fact, a relationship between the two: Euclidean distance is calculated via Pythagoras' Theorem from the Cartesian coordinates of two points. The coordinate differences are the legs of a right triangle, and the distance is its hypotenuse.
Because of this, Euclidean distance is sometimes known as Pythagorean distance, though the former name is much more well-known.
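To make the link to Pythagoras' Theorem concrete, here is a small sketch with two example points chosen for illustration, so that the coordinate differences form a 3-4-5 right triangle. It also uses `math.hypot()` from the standard library, which computes the hypotenuse directly:

```python
import math

# Two example points in 2D (values chosen purely for illustration)
p = (1, 2)
q = (4, 6)

# The coordinate differences are the legs of a right triangle
a = q[0] - p[0]  # horizontal leg: 3
b = q[1] - p[1]  # vertical leg: 4

# Pythagoras: c^2 = a^2 + b^2, so c is the Euclidean distance between p and q
c = math.sqrt(a ** 2 + b ** 2)
print(c)                 # 5.0

# math.hypot() returns the same hypotenuse
print(math.hypot(a, b))  # 5.0
```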
Note: The two points are vectors, but the output should be a scalar (which is the distance).
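We'll be using NumPy to calculate this distance for two points, and the same approach works for both 2D and 3D spaces. To visualize the two points we'll be working with, (0, 0, 0) and (3, 3, 3), we can plot them with Matplotlib: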
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')
ax.scatter(0, 0, 0)
ax.scatter(3, 3, 3)
plt.show()
Calculating Euclidean Distance in Python with NumPy
First, we'll need to install the NumPy library:
$ pip install numpy
Now, let's import it and set up our two points, with the Cartesian coordinates as (0, 0, 0) and (3, 3, 3):
import numpy as np
# Initializing the points
point_1 = np.array((0, 0, 0))
point_2 = np.array((3, 3, 3))
Now, instead of performing the calculation manually, let's utilize the helper methods of NumPy to make this even easier!
np.sqrt() and np.sum()
The operations and mathematical functions required to calculate Euclidean Distance are pretty simple: subtraction, squaring, addition, and the square root function. Multiple additions can be replaced with a sum, as well:
$$
d(p,q) = \sqrt[2]{(q_1-p_1)^2 + (q_2-p_2)^2 + (q_3-p_3)^2 }
$$
NumPy provides us with a np.sqrt() function, representing the square root function, as well as a np.sum() function, which represents a sum. With these, calculating the Euclidean Distance in Python is simple and intuitive:
# Get the square of the difference of the 2 vectors
square = np.square(point_1 - point_2)
# Get the sum of the square
sum_square = np.sum(square)
This gives us a pretty simple result:
(0-3)^2 + (0-3)^2 + (0-3)^2
Which is equal to 27. All that's left is to get the square root of that number:
# The last step is to get the square root and print the Euclidean distance
distance = np.sqrt(sum_square)
print(distance)
This results in:
5.196152422706632
In true Pythonic spirit, this can be shortened to just a single line:
distance = np.sqrt(np.sum(np.square(point_1 - point_2)))
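Since nothing in the one-liner depends on the number of coordinates, it can be wrapped in a small reusable function. This is only a sketch: the name `euclidean` is hypothetical, and `np.asarray()` is used so that plain lists and tuples are accepted as well:

```python
import numpy as np

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    # Convert the inputs to arrays so element-wise subtraction works
    p = np.asarray(p)
    q = np.asarray(q)
    return np.sqrt(np.sum(np.square(p - q)))

distance_2d = euclidean([0, 0], [3, 4])        # 2D points
distance_3d = euclidean((0, 0, 0), (3, 3, 3))  # 3D points
print(distance_2d)
print(distance_3d)
```

The 2D call measures the hypotenuse of a 3-4-5 triangle, and the 3D call reproduces the distance calculated above.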
And you can even use Python's built-in pow() and sum() functions together with sqrt() from the math module instead. They require you to hack around a bit with the input, which NumPy conveniently abstracts away, as the pow() function only works with scalars (so it has to be applied to each element individually) and accepts a second argument: the power you're raising the number to.
This approach, though, intuitively looks more like the formula we've used before:
from math import sqrt
# Square each coordinate difference, sum the squares, and take the square root
distance = sqrt(sum(pow(a - b, 2) for a, b in zip(point_1, point_2)))
print(distance)
This also results in:
5.196152422706632
np.linalg.norm()
The np.linalg.norm() function represents a mathematical norm. In essence, the norm of a vector is its length. This length doesn't necessarily have to be the Euclidean one, and other norms can be used as well. The Euclidean distance between two points is the L2 norm (sometimes known as the Euclidean norm) of the difference between them, and for a vector, the norm() function computes the L2 norm by default, when the ord parameter is left at its default value of None.
If you were to set the ord parameter to some other value p, you'd calculate other p-norms. For instance, the L1 norm of the difference between two points is the Manhattan distance!
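As a short illustration of the ord parameter, the sketch below computes a few norms of the same difference vector. The points are the ones used throughout this guide, and the variable names are chosen for illustration:

```python
import numpy as np

point_1 = np.array((0, 0, 0))
point_2 = np.array((3, 3, 3))
diff = point_1 - point_2

# L1 norm: sum of absolute differences (Manhattan distance)
manhattan = np.linalg.norm(diff, ord=1)

# L2 norm: square root of the sum of squared differences (Euclidean distance)
euclidean = np.linalg.norm(diff, ord=2)

# Infinity norm: the largest absolute difference along any axis
largest = np.linalg.norm(diff, ord=np.inf)

print(manhattan)  # 9.0
print(euclidean)  # 5.196152422706632
print(largest)    # 3.0
```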
With that in mind, we can use the np.linalg.norm() function to calculate the Euclidean distance easily, and much more cleanly than using other functions:
distance = np.linalg.norm(point_1-point_2)
print(distance)
This results in the L2/Euclidean distance being printed:
5.196152422706632
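The norm() function also accepts an axis argument, which makes it possible to measure the distance from one point to many points in a single call. The sketch below uses a tiny made-up dataset to find the point closest to a query point, which is the core step of algorithms such as K-Nearest Neighbors. The data and variable names are assumptions made purely for illustration:

```python
import numpy as np

# A made-up dataset of 2D points, one point per row
data = np.array([[1.0, 1.0],
                 [4.0, 5.0],
                 [0.5, 1.5]])
query = np.array([1.0, 2.0])

# Broadcasting subtracts the query from every row;
# axis=1 computes one L2 norm per row
distances = np.linalg.norm(data - query, axis=1)

# Index of the row with the smallest distance to the query
nearest_index = int(np.argmin(distances))

print(distances)
print(nearest_index)  # 2
```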
L2 normalization and L1 normalization are heavily used in Machine Learning to normalize input data.
If you'd like to learn more about feature scaling - read our Guide to Feature Scaling Data with Scikit-Learn!
np.dot()
We can also use a Dot Product to calculate the Euclidean distance. In Mathematics, the Dot Product multiplies two equal-length vectors component by component and sums the results, producing a single number, a scalar value. Because of the return type, it's sometimes also known as a "scalar product". For vectors in Euclidean space, this operation is also often called the inner product of the two vectors.
To calculate the dot product between 2 vectors you can use the following formula:
$$
\vec{p} \cdot \vec{q} = p_1 q_1 + p_2 q_2 + p_3 q_3
$$
With NumPy, we can use the np.dot() function, passing in two vectors.
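As a quick sanity check, the sketch below compares a dot product computed by hand, as the sum of the products of matching components, with the result of np.dot(). The two vectors are arbitrary example values:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

# By hand: multiply matching components and add them up
# 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
manual = sum(p_i * q_i for p_i, q_i in zip(p, q))

# The same calculation with NumPy
with_numpy = np.dot(p, q)

print(manual, with_numpy)  # 32 32
```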
If we calculate the Dot Product of the difference between both points with that same difference, we get the sum of the squared coordinate differences, which is the squared Euclidean Distance between the two points. Taking the square root of that number nets us the distance we're searching for:
# Take the difference between the 2 points
diff = point_1 - point_2
# Take the dot product of the difference with itself to get the sum of the squares
sum_square = np.dot(diff, diff)
# Get the square root of the result
distance = np.sqrt(sum_square)
print(distance)
Of course, you can shorten this to a one-liner as well:
distance = np.sqrt(np.dot(point_1-point_2, point_1-point_2))
print(distance)
5.196152422706632
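The dot product of the difference with itself can also be expanded algebraically, since (p - q) · (p - q) = p · p - 2 (p · q) + q · q. The sketch below checks this identity numerically on two arbitrary example points. Both forms agree here, though in floating-point arithmetic the expanded form can be slightly less accurate when the two points are very close together:

```python
import numpy as np

# Arbitrary example points (floats, chosen so the distance is a round number)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Direct form: dot product of the difference with itself
direct = np.sqrt(np.dot(p - q, p - q))

# Expanded form: p.p - 2 p.q + q.q = 14 - 50 + 61 = 25
expanded = np.sqrt(np.dot(p, p) - 2 * np.dot(p, q) + np.dot(q, q))

print(direct)    # 5.0
print(expanded)  # 5.0
```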
Using the Built-In math.dist()
Python has a built-in function in the math module that calculates the Euclidean distance between two points with any number of dimensions. However, it's only available in Python 3.8 or later.
math.dist() takes in two parameters, which are the two points, and returns the Euclidean distance between those points.
Note: The two points must have the same number of dimensions (i.e. both in 2D or both in 3D space).
Now, to calculate the Euclidean Distance between these two points, we just chuck them into the dist() function:
import math
distance = math.dist(point_1, point_2)
print(distance)
5.196152422706632
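math.dist() doesn't need NumPy at all. The sketch below calls it with plain tuples in 2D and 3D, and shows what happens when the two points have a different number of dimensions. The variable names are chosen for illustration:

```python
import math

# 2D: the legs of a 3-4-5 right triangle, so the distance is 5
distance_2d = math.dist((0, 0), (3, 4))

# 3D: the same two points used throughout this guide
distance_3d = math.dist((0, 0, 0), (3, 3, 3))

print(distance_2d)
print(distance_3d)

# Points with a different number of dimensions raise a ValueError
mismatch_error = None
try:
    math.dist((0, 0), (1, 2, 3))
except ValueError as error:
    mismatch_error = error
    print("ValueError:", error)
```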
Conclusion
Euclidean distance is a fundamental distance metric for points in Euclidean space: the length of the straight line segment between two points.
In this guide, we've calculated it in Python using np.sqrt() with np.sum(), np.linalg.norm(), np.dot(), and the built-in math.dist() function.
The metric is used in many contexts within data mining, machine learning, and several other fields, and is one of the fundamental distance metrics.