In this guide, we'll dive into a dimensionality reduction, data embedding and data visualization technique known as Multidimensional Scaling (MDS).
We'll be utilizing Scikit-Learn to perform Multidimensional Scaling, as it has a wonderfully simple and powerful API. Throughout the guide, we'll be using the Olivetti faces dataset from AT&T to illustrate the embedding of data in a lower-dimensional space.
By the end of the guide, you'll have a firm grasp on Multidimensional Scaling, as well as its hyperparameters and how they impact the technique.
What is Multidimensional Scaling?
MDS is a non-linear technique for embedding data in a lower-dimensional space.
It maps points residing in a higher-dimensional space to a lower-dimensional space while preserving the distances between those points as much as possible. Because of this, the pairwise distances between points in the lower-dimensional space are matched closely to their actual distances.
The following figure is an example of a possible mapping of points from 3D to 2D and 1D space. The pairwise distances of the three points in 3D space are exactly preserved in the 2D space but not in the 1D space. If we run MDS, it would ensure a minimal difference between the actual pairwise distances and the pairwise distances of the mapped points:
MDS can be used as a preprocessing step for dimensionality reduction in classification and regression problems.
Other than Multidimensional Scaling, you can also use other Dimensionality Reduction techniques, such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD). If you'd like to read about both of them, as well has how to use them to your advantage, read our Guide to Dimensionality Reduction in Python with Scikit-Learn!
MDS is not only an effective technique for dimensionality reduction but also for data visualization. It maintains the same clusters and patterns of high-dimensional data in the lower-dimensional space so you can boil down, say, a 5-dimensional dataset to a 3-dimensional dataset which you can interpret much more easily and naturally.
Normally the distance measure used in MDS is the Euclidean distance, however, any other suitable dissimilarity metric can be used when applying MDS.
There are two main ways to implement MDS:
- Metric MDS / Classical MDS: This version of MDS aims to preserve the pairwise distance/dissimilarity measure as much as possible.
- Non-Metric MDS: This method is applicable when only the ranks of a dissimilarity metric are known. MDS then maps the objects so that the ranks are preserved as much as possible.
Performing Multidimensional Scaling in Python with Scikit-Learn
The Scikit-Learn library's
sklearn.manifold module implements manifold learning and data embedding techniques. We'll be using the
MDS class of this module. The embeddings are determined using the stress minimization using majorization (SMACOF) algorithm. Some of the important parameters for setting up the
MDS object are (this is not an exhaustive list):
n_components: Number of dimensions to map the points to. The default value is 2.
metric: A Boolean variable with a default value of
Truefor metric MDS and
Falsefor its non-metric version.
dissimilarity: The default value is
euclidean, which specifies Euclidean pairwise distances. The other possible value is
precomputedrequires the computation of the pairwise distance matrix and using this matrix as an input to the
The four attributes associated with an
MDS object are:
embedding_: Location of points in the new space.
stress_: Goodness-of-fit statistic used in MDS.
dissimilarity_matrix_: The matrix of pairwise distances/dissimilarity.
n_iter_: Number of iterations pertaining to the best goodness-of-fit measure.
Like all other classes for dimensionality reduction in
MDS class also implements the
A Simple Illustration
In this section, we show how to apply MDS using a very simple example. We'll add the import section first:
from sklearn.manifold import MDS from matplotlib import pyplot as plt import sklearn.datasets as dt import seaborn as sns import numpy as np from sklearn.metrics.pairwise import manhattan_distances, euclidean_distances from matplotlib.offsetbox import OffsetImage, AnnotationBbox
The code below sets up an
MDS object and calls its method
fit_transform(). This method returns the embedded points in 2D space. Let's print the resulting mapping:
X = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0], [0, 1, 1]]) mds = MDS(random_state=0) X_transform = mds.fit_transform(X) print(X_transform)
[[ 0.72521687 0.52943352] [ 0.61640884 -0.48411805] [-0.9113603 -0.47905115] [-0.2190564 0.71505714] [-0.21120901 -0.28132146]]
Since the embeddings are created based on the stress minimization algorithm, we can also take a look at the
stress = mds.stress_ print(stress)
This results in:
Another method of applying MDS is by constructing a distance matrix and applying MDS directly to this matrix as shown in the code below. This method is useful when a distance measure other than Euclidean distance is required. The code below computes the pairwise Manhattan distances (also called the city block distance or L1 distance) and transforms the data via MDS.
dissimilarity argument has been set to
dist_manhattan = manhattan_distances(X) mds = MDS(dissimilarity='precomputed', random_state=0) # Get the embeddings X_transform_L1 = mds.fit_transform(dist_manhattan)
This results in:
[[ 0.9847767 0.84738596] [ 0.81047787 -0.37601578] [-1.104849 -1.06040621] [-0.29311254 0.87364759] [-0.39729303 -0.28461157]]
Though, this doesn't help us gain a good intuition as to what just happened. Humans aren't that good at crunching numbers. To gain a better understanding of the entire process, let's plot the original points and their embeddings created by preserving Euclidean distances. An original point and its corresponding embedded point are both shown in the same color:
colors = ['r', 'g', 'b', 'c', 'm'] size = [64, 64, 64, 64, 64] fig = plt.figure(2, (10,4)) ax = fig.add_subplot(121, projection='3d') plt.scatter(X[:,0], X[:,1], zs=X[:,2], s=size, c=colors) plt.title('Original Points') ax = fig.add_subplot(122) plt.scatter(X_transform[:,0], X_transform[:,1], s=size, c=colors) plt.title('Embedding in 2D') fig.subplots_adjust(wspace=.4, hspace=0.5) plt.show()
Free eBook: Git Essentials
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
The plot on the right keeps the relative distances generally intact - purple, green and blue are close together, and their relative position to each other is approximately the same when compared to cyan and red.
Practical Multidimensional Scaling On Olivetti Faces Dataset From AT&T
As a practical illustration of MDS, we'll use the Olivetti faces dataset from AT&T to show the embeddings in a space with dimensions as low as 2D. The dataset has 10 64x64 bitmap images per person, each image acquired with varying facial expressions or lighting conditions.
MDS preserves the patterns in data so that different face images of the same person are close to each other in the 2D space and far away from another person's mapped face.
To avoid clutter, we'll take only the faces of 4 distinct people and apply MDS to them.
Before fetching the dataset and applying MDS, let's write a small function,
mapData(), that takes the input arguments, i.e., the pairwise distance matrix
dist_matrix, raw data matrix
X, the class variable
y, the Boolean variable
title for the graph.
The function applies MDS to the distance matrix and displays the transformed points in 2D space, with the same colored points indicating the mapped image of the same person. In a second figure, it also displays the image of each face on the graph where it is mapped in the lower-dimensional space.
We'll demonstrate MDS with different distance measures along with non-metric MDS:
def mapData(dist_matrix, X, y, metric, title): mds = MDS(metric=metric, dissimilarity='precomputed', random_state=0) # Get the embeddings pts = mds.fit_transform(dist_matrix) # Plot the embedding, colored according to the class of the points fig = plt.figure(2, (15,6)) ax = fig.add_subplot(1,2,1) ax = sns.scatterplot(x=pts[:, 0], y=pts[:, 1], hue=y, palette=['r', 'g', 'b', 'c']) # Add the second plot ax = fig.add_subplot(1,2,2) # Plot the points again plt.scatter(pts[:, 0], pts[:, 1]) # Annotate each point by its corresponding face image for x, ind in zip(X, range(pts.shape)): im = x.reshape(64,64) imagebox = OffsetImage(im, zoom=0.3, cmap=plt.cm.gray) i = pts[ind, 0] j = pts[ind, 1] ab = AnnotationBbox(imagebox, (i, j), frameon=False) ax.add_artist(ab) plt.title(title) plt.show()
The code below fetches the Olivetti faces dataset and extracts examples with labels < 4:
faces = dt.fetch_olivetti_faces() X_faces = faces.data y_faces = faces.target ind = y_faces < 4 X_faces = X_faces[ind,:] y_faces = y_faces[ind]
And without further ado, let's load the data in and run our
mapData() function on it!
Using the Euclidean Pairwise Distances
The mapping of the Olivetti faces dataset using Euclidean distances is shown below. Euclidean distance is the default distance for MDS because of how versatile and commonly used it is:
dist_euclid = euclidean_distances(X_faces) mapData(dist_euclid, X_faces, y_faces, True, 'Metric MDS with Euclidean')
We can see a nice mapping of 64x64 images to a two-dimensional space, where the class of each image is well separated from the rest in most cases. It's worth taking a moment to appreciate the fact that images residing in a 64x64 dimension space can be reduced to a two dimensional space, and still retain their informational value.
Using the Manhattan Pairwise Distances
For comparison, we can perform MDS on the same data using the Manhatten pairwise distances. The code below uses the Manhatten distance matrix as an input to
dist_L1 = manhattan_distances(X_faces) mapData(dist_L1, X_faces, y_faces, True, 'Metric MDS with Manhattan')
We can see the mapping is quite similar to the one obtained via Euclidean distances. Each class is nicely separated in the lower-dimensional space, though they're offset a bit differently on the plot.
Performing Non-Metric Multidimensional Scaling
As a final example, we'll show non-metric MDS on the same dataset using Euclidean distances and see how it compares with the corresponding metric version:
mapData(dist_euclid, X_faces, y_faces, False, 'Non-metric MDS with Euclidean')
There are quite a lot of hiccups here. We can see that this version of MDS does not perform so well on the Olivetti faces dataset.
This is mainly because of the quantitative nature of data.
Non-metric MDS maintains the ranked distances between objects rather than the actual distances.
The n_components Parameter in MDS
One of the important hyper-parameters involved in MDS is the size of the lower-dimensional space in which the points are embedded.
This would be very relevant when MDS is used as a preprocessing step for dimensionality reduction.
The question arises:
Just how many dimensions do you pick, so that you reduce the dimensionality the most you can, without losing important information?
A simple method to choose a value of this parameter is to run MDS on different values of
n_components and plot the
stress_ value for each embedding. Given that the
stress_ value decreases with higher dimensions - you pick a point that has a fair tradeoff between
The code below runs MDS by varying the dimensions from 1-20 and plots the corresponding
stress_ attribute for each embedding:
stress =  # Max value for n_components max_range = 21 for dim in range(1, max_range): # Set up the MDS object mds = MDS(n_components=dim, dissimilarity='precomputed', random_state=0) # Apply MDS pts = mds.fit_transform(dist_euclid) # Retrieve the stress value stress.append(mds.stress_) # Plot stress vs. n_components plt.plot(range(1, max_range), stress) plt.xticks(range(1, max_range, 2)) plt.xlabel('n_components') plt.ylabel('stress') plt.show()
We can see that increasing the value of
n_components decreases the stress value at the beginning and then the curve levels off. There's almost no difference between 18 and 19 dimensions, but there's a huge difference between 1 and 2 dimensions.
The elbow of the curve is a good choice for the optimal value of
n_components. In this case the value can be taken at 4, which is an amazing 0.09% reduction of features/attributes.
Going Further - Hand-Held End-to-End Project
Your inquisitive nature makes you want to go further? We recommend checking out our Guided Project: "Hands-On House Price Prediction - Machine Learning in Python".
In this guided project - you'll learn how to build powerful traditional machine learning models as well as deep learning models, utilize Ensemble Learning and traing meta-learners to predict house prices from a bag of Scikit-Learn and Keras models.
Using Keras, the deep learning API built on top of Tensorflow, we'll experiment with architectures, build an ensemble of stacked models and train a meta-learner neural network (level-1 model) to figure out the pricing of a house.
Deep learning is amazing - but before resorting to it, it's advised to also attempt solving the problem with simpler techniques, such as with shallow learning algorithms. Our baseline performance will be based on a Random Forest Regression algorithm. Additionally - we'll explore creating ensembles of models through Scikit-Learn via techniques such as bagging and voting.
This is an end-to-end project, and like all Machine Learning projects, we'll start out with - with Exploratory Data Analysis, followed by Data Preprocessing and finally Building Shallow and Deep Learning Models to fit the data we've explored and cleaned previously.
This guide was an introduction to Multidimensional Scaling in Python, using Scikit-Learn. We've taken a look at how Multidimensional Scaling works, its hyperparameters, which vartiations exist and then applied it on a practical dataset.
We've used the Olivetti Faces dataset, from AT&T and illustrated that images residing in a 64x64 dimensional space can be mapped to a two-dimensional space, and still retain the individual patterns or clusters across images.