Guide to Multidimensional Scaling in Python with Scikit-Learn

Guide to Multidimensional Scaling in Python with Scikit-Learn


In this guide, we'll dive into a dimensionality reduction, data embedding and data visualization technique known as Multidimensional Scaling (MDS).

We'll be utilizing Scikit-Learn to perform Multidimensional Scaling, as it has a wonderfully simple and powerful API. Throughout the guide, we'll be using the Olivetti faces dataset from AT&T to illustrate the embedding of data in a lower-dimensional space.

By the end of the guide, you'll have a firm grasp on Multidimensional Scaling, as well as its hyperparameters and how they impact the technique.

What is Multidimensional Scaling?

MDS is a non-linear technique for embedding data in a lower-dimensional space.

It maps points residing in a higher-dimensional space to a lower-dimensional space while preserving the distances between those points as much as possible. Because of this, the pairwise distances between points in the lower-dimensional space are matched closely to their actual distances.

The following figure is an example of a possible mapping of points from 3D to 2D and 1D space. The pairwise distances of the three points in 3D space are exactly preserved in the 2D space but not in the 1D space. If we run MDS, it would ensure a minimal difference between the actual pairwise distances and the pairwise distances of the mapped points:

MDS can be used as a preprocessing step for dimensionality reduction in classification and regression problems.

Other than Multidimensional Scaling, you can also use other Dimensionality Reduction techniques, such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD). If you'd like to read about both of them, as well has how to use them to your advantage, read our Guide to Dimensionality Reduction in Python with Scikit-Learn!

MDS is not only an effective technique for dimensionality reduction but also for data visualization. It maintains the same clusters and patterns of high-dimensional data in the lower-dimensional space so you can boil down, say, a 5-dimensional dataset to a 3-dimensional dataset which you can interpret much more easily and naturally.

Normally the distance measure used in MDS is the Euclidean distance, however, any other suitable dissimilarity metric can be used when applying MDS.

There are two main ways to implement MDS:

  • Metric MDS / Classical MDS: This version of MDS aims to preserve the pairwise distance/dissimilarity measure as much as possible.
  • Non-Metric MDS: This method is applicable when only the ranks of a dissimilarity metric are known. MDS then maps the objects so that the ranks are preserved as much as possible.

Performing Multidimensional Scaling in Python with Scikit-Learn

The Scikit-Learn library's sklearn.manifold module implements manifold learning and data embedding techniques. We'll be using the MDS class of this module. The embeddings are determined using the stress minimization using majorization (SMACOF) algorithm. Some of the important parameters for setting up the MDS object are (this is not an exhaustive list):

  • n_components: Number of dimensions to map the points to. The default value is 2.
  • metric: A Boolean variable with a default value of True for metric MDS and False for its non-metric version.
  • dissimilarity: The default value is euclidean, which specifies Euclidean pairwise distances. The other possible value is precomputed. Using precomputed requires the computation of the pairwise distance matrix and using this matrix as an input to the fit() or fit_transform() function.

The four attributes associated with an MDS object are:

  • embedding_: Location of points in the new space.
  • stress_: Goodness-of-fit statistic used in MDS.
  • dissimilarity_matrix_: The matrix of pairwise distances/dissimilarity.
  • n_iter_: Number of iterations pertaining to the best goodness-of-fit measure.

Like all other classes for dimensionality reduction in scikit-learn, the MDS class also implements the fit() and fit_transform() methods.

A Simple Illustration

In this section, we show how to apply MDS using a very simple example. We'll add the import section first:

from sklearn.manifold import MDS
from matplotlib import pyplot as plt
import sklearn.datasets as dt
import seaborn as sns         
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances, euclidean_distances
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

The code below sets up an MDS object and calls its method fit_transform(). This method returns the embedded points in 2D space. Let's print the resulting mapping:

X = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0], [0, 1, 1]])
mds = MDS(random_state=0)
X_transform = mds.fit_transform(X)
[[ 0.72521687  0.52943352]
 [ 0.61640884 -0.48411805]
 [-0.9113603  -0.47905115]
 [-0.2190564   0.71505714]
 [-0.21120901 -0.28132146]]

Since the embeddings are created based on the stress minimization algorithm, we can also take a look at the stress variable:

stress = mds.stress_

This results in:


Another method of applying MDS is by constructing a distance matrix and applying MDS directly to this matrix as shown in the code below. This method is useful when a distance measure other than Euclidean distance is required. The code below computes the pairwise Manhattan distances (also called the city block distance or L1 distance) and transforms the data via MDS.

Note the dissimilarity argument has been set to precomputed:

dist_manhattan = manhattan_distances(X)
mds = MDS(dissimilarity='precomputed', random_state=0)
# Get the embeddings
X_transform_L1 = mds.fit_transform(dist_manhattan)

This results in:

[[ 0.9847767   0.84738596]
 [ 0.81047787 -0.37601578]
 [-1.104849   -1.06040621]
 [-0.29311254  0.87364759]
 [-0.39729303 -0.28461157]]

Though, this doesn't help us gain a good intuition as to what just happened. Humans aren't that good at crunching numbers. To gain a better understanding of the entire process, let's plot the original points and their embeddings created by preserving Euclidean distances. An original point and its corresponding embedded point are both shown in the same color:

colors = ['r', 'g', 'b', 'c', 'm']
size = [64, 64, 64, 64, 64]
fig = plt.figure(2, (10,4))
ax = fig.add_subplot(121, projection='3d')
plt.scatter(X[:,0], X[:,1], zs=X[:,2], s=size, c=colors)
plt.title('Original Points')

ax = fig.add_subplot(122)
plt.scatter(X_transform[:,0], X_transform[:,1], s=size, c=colors)
plt.title('Embedding in 2D')
fig.subplots_adjust(wspace=.4, hspace=0.5)

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

The plot on the right keeps the relative distances generally intact - purple, green and blue are close together, and their relative position to each other is approximately the same when compared to cyan and red.

Practical Multidimensional Scaling On Olivetti Faces Dataset From AT&T

As a practical illustration of MDS, we'll use the Olivetti faces dataset from AT&T to show the embeddings in a space with dimensions as low as 2D. The dataset has 10 64x64 bitmap images per person, each image acquired with varying facial expressions or lighting conditions.

MDS preserves the patterns in data so that different face images of the same person are close to each other in the 2D space and far away from another person's mapped face.

To avoid clutter, we'll take only the faces of 4 distinct people and apply MDS to them.

Before fetching the dataset and applying MDS, let's write a small function, mapData(), that takes the input arguments, i.e., the pairwise distance matrix dist_matrix, raw data matrix X, the class variable y, the Boolean variable metric and title for the graph.

The function applies MDS to the distance matrix and displays the transformed points in 2D space, with the same colored points indicating the mapped image of the same person. In a second figure, it also displays the image of each face on the graph where it is mapped in the lower-dimensional space.

We'll demonstrate MDS with different distance measures along with non-metric MDS:

def mapData(dist_matrix, X, y, metric, title):
    mds = MDS(metric=metric, dissimilarity='precomputed', random_state=0)
    # Get the embeddings
    pts = mds.fit_transform(dist_matrix)
    # Plot the embedding, colored according to the class of the points
    fig = plt.figure(2, (15,6))
    ax = fig.add_subplot(1,2,1)    
    ax = sns.scatterplot(x=pts[:, 0], y=pts[:, 1],
                         hue=y, palette=['r', 'g', 'b', 'c'])

    # Add the second plot
    ax = fig.add_subplot(1,2,2)
    # Plot the points again
    plt.scatter(pts[:, 0], pts[:, 1])
    # Annotate each point by its corresponding face image
    for x, ind in zip(X, range(pts.shape[0])):
        im = x.reshape(64,64)
        imagebox = OffsetImage(im, zoom=0.3,
        i = pts[ind, 0]
        j = pts[ind, 1]
        ab = AnnotationBbox(imagebox, (i, j), frameon=False)

The code below fetches the Olivetti faces dataset and extracts examples with labels < 4:

faces = dt.fetch_olivetti_faces()
X_faces =
y_faces =
ind = y_faces < 4
X_faces = X_faces[ind,:]
y_faces = y_faces[ind]

And without further ado, let's load the data in and run our mapData() function on it!

Using the Euclidean Pairwise Distances

The mapping of the Olivetti faces dataset using Euclidean distances is shown below. Euclidean distance is the default distance for MDS because of how versatile and commonly used it is:

dist_euclid = euclidean_distances(X_faces)
mapData(dist_euclid, X_faces, y_faces, True, 
        'Metric MDS with Euclidean')

We can see a nice mapping of 64x64 images to a two-dimensional space, where the class of each image is well separated from the rest in most cases. It's worth taking a moment to appreciate the fact that images residing in a 64x64 dimension space can be reduced to a two dimensional space, and still retain their informational value.

Using the Manhattan Pairwise Distances

For comparison, we can perform MDS on the same data using the Manhatten pairwise distances. The code below uses the Manhatten distance matrix as an input to mapData():

dist_L1 = manhattan_distances(X_faces)
mapData(dist_L1, X_faces, y_faces, True, 
        'Metric MDS with Manhattan')

We can see the mapping is quite similar to the one obtained via Euclidean distances. Each class is nicely separated in the lower-dimensional space, though they're offset a bit differently on the plot.

Performing Non-Metric Multidimensional Scaling

As a final example, we'll show non-metric MDS on the same dataset using Euclidean distances and see how it compares with the corresponding metric version:

mapData(dist_euclid, X_faces, y_faces, False, 
        'Non-metric MDS with Euclidean')

There are quite a lot of hiccups here. We can see that this version of MDS does not perform so well on the Olivetti faces dataset.

This is mainly because of the quantitative nature of data.

Non-metric MDS maintains the ranked distances between objects rather than the actual distances.

The n_components Parameter in MDS

One of the important hyper-parameters involved in MDS is the size of the lower-dimensional space in which the points are embedded.

This would be very relevant when MDS is used as a preprocessing step for dimensionality reduction.

The question arises:

Just how many dimensions do you pick, so that you reduce the dimensionality the most you can, without losing important information?

A simple method to choose a value of this parameter is to run MDS on different values of n_components and plot the stress_ value for each embedding. Given that the stress_ value decreases with higher dimensions - you pick a point that has a fair tradeoff between stress_ and n_components.

The code below runs MDS by varying the dimensions from 1-20 and plots the corresponding stress_ attribute for each embedding:

stress = []
# Max value for n_components
max_range = 21
for dim in range(1, max_range):
    # Set up the MDS object
    mds = MDS(n_components=dim, dissimilarity='precomputed', random_state=0)
    # Apply MDS
    pts = mds.fit_transform(dist_euclid)
    # Retrieve the stress value
# Plot stress vs. n_components    
plt.plot(range(1, max_range), stress)
plt.xticks(range(1, max_range, 2))

We can see that increasing the value of n_components decreases the stress value at the beginning and then the curve levels off. There's almost no difference between 18 and 19 dimensions, but there's a huge difference between 1 and 2 dimensions.

The elbow of the curve is a good choice for the optimal value of n_components. In this case the value can be taken at 4, which is an amazing 0.09% reduction of features/attributes.


This guide was an introduction to Multidimensional Scaling in Python, using Scikit-Learn. We've taken a look at how Multidimensional Scaling works, its hyperparameters, which vartiations exist and then applied it on a practical dataset.

We've used the Olivetti Faces dataset, from AT&T and illustrated that images residing in a 64x64 dimensional space can be mapped to a two-dimensional space, and still retain the individual patterns or clusters across images.

Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Mehreen SaeedAuthor

I am an educator and I love mathematics and data science!

Want a remote job?

    Prepping for an interview?

    • Improve your skills by solving one coding problem every day
    • Get the solutions the next morning via email
    • Practice on actual problems asked by top companies, like:

    Better understand your data with visualizations

    With over 330+ pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

    © 2013-2021 Stack Abuse. All rights reserved.