Numpy Matrix & Vector Multiplication
NumPy (often also written as Numpy) is a linear algebra library that forms the basis for most scientific computing in Python. Most libraries for data science, analysis and machine learning are built on top of NumPy as their linear algebra foundation, and offer tight coupling and conversion between their own data types and NumPy arrays.
Code written with NumPy is actually executed by optimized C code, which gives a significant speedup in execution.
NumPy's basic data structure is the N-dimensional array, aptly named ndarray. Objects in an ndarray must all have the same type, and the arrays are of fixed size (due to the memory allocation of the underlying C code). When you change the size of an ndarray, NumPy actually deletes the original array and creates a new one.
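For instance, appending to an ndarray with np.append does not grow the array in place - it allocates and returns a brand new array (a minimal sketch to illustrate the point):
import numpy as np
a = np.array([1, 2, 3])   # fixed size and fixed dtype
b = np.append(a, 4)       # NumPy allocates and returns a new array
print(a)                  # [1 2 3] - the original is unchanged
print(b)                  # [1 2 3 4]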
Note: Both matrices and vectors can be represented by an ndarray - just recall that a vector is a one-dimensional matrix. Matrix and vector multiplication is a simple but immensely useful computation, applied in mathematics, statistics, physics, economics and engineering. In essence, matrix and vector multiplication is one of the most important operations in computational applications of linear algebra - deep learning included.
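To make this concrete, here is a quick sketch showing that a vector and a matrix are both just ndarray objects, differing only in shape:
import numpy as np
v = np.array([1.0, 2.0, 3.0])            # a vector: shape (3,)
m = np.array([[1.0, 2.0], [3.0, 4.0]])   # a matrix: shape (2, 2)
print(type(v), v.shape)                  # <class 'numpy.ndarray'> (3,)
print(type(m), m.shape)                  # <class 'numpy.ndarray'> (2, 2)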
Matrix and Vector Multiplication in NumPy
In order to fully exploit NumPy's capabilities, our code should be written in vectorized form - that is, whenever possible, replacing loops with NumPy operations. One of the basic building blocks for doing this is matrix multiplication.
Recalling linear algebra, given a matrix a with shape (i, j) and a matrix b with shape (j, k), a.dot(b) will output a matrix with shape (i, k).
Vector multiplication reduces to the case when i == 1 and k == 1. Here's an example of matrix multiplication:
import numpy as np
a = np.ones((3,4))
b = np.ones((4,5))
c = a.dot(b)
print(c.shape)
print(c)
(3, 5)
[[4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4.]]
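As a side note, the same product can also be written with the @ operator or np.matmul, both of which are equivalent to dot for 2-D arrays (a small sketch, reusing a and b from above):
c = a @ b              # same result as a.dot(b) for 2-D arrays
c = np.matmul(a, b)    # also equivalent here
print(c.shape)         # (3, 5)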
And here's an example of vector multiplication:
a = np.ones((1,4))
b = np.ones((4,1))
c = a.dot(b)
print(c.shape)
print(c)
(1, 1)
[[4.]]
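Note that the example above uses explicit row and column vectors of shape (1, 4) and (4, 1), so the result is a 1x1 matrix. If you pass plain 1-D arrays instead, dot computes the inner product and returns a scalar rather than an array (a quick sketch):
a = np.ones(4)
b = np.ones(4)
print(a.dot(b))        # 4.0 - a scalar, not a (1, 1) array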
Vectorized NumPy Matrix Multiplication
One common anti-pattern when working with NumPy is to iterate over the rows of an ndarray using Python loops. This slows down your code, because it prevents NumPy from executing the whole computation inside its optimized C code.
For example, suppose you have a basket of three products with profit rates of 10%, 20% and 30%, and you also have an array with the daily revenues for each product. You want to calculate the total profit. One (bad) way to do this would be:
profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3,4,5,6],[9,7,5,3],[1,2,3,4]])
total_profit = 0
for i, revenue in enumerate(daily_revenues):
    total_profit += np.sum(revenue * profit_rate[i])
print(total_profit)
9.600000000000001
It is better if we vectorize our code and write everything using vector/matrix multiplication and NumPy functions:
profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3,4,5,6],[9,7,5,3],[1,2,3,4]])
print(np.sum(profit_rate.dot(daily_revenues)))
9.600000000000001
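To see why this works, note that profit_rate has shape (3,) and daily_revenues has shape (3, 4), so profit_rate.dot(daily_revenues) is a length-4 array holding the total profit for each day; summing it gives the overall profit. A quick sketch of the intermediate step:
daily_profits = profit_rate.dot(daily_revenues)
print(daily_profits)          # [2.4 2.4 2.4 2.4] (up to floating-point rounding)
print(np.sum(daily_profits))  # 9.600000000000001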
How much of a difference does this make?
profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3,4,5,6],[9,7,5,3],[1,2,3,4]])
def get_profit(profit_rate, daily_revenues):
    total_profit = 0
    for i, revenue in enumerate(daily_revenues):
        total_profit += np.sum(revenue * profit_rate[i])
    return total_profit

def get_profit_vectorized(profit_rate, daily_revenues):
    return np.sum(profit_rate.dot(daily_revenues))
%timeit get_profit(profit_rate, daily_revenues)
20 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit get_profit_vectorized(profit_rate, daily_revenues)
4.37 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Around 5 times faster when using vectorized operations!
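The inputs here are tiny, so much of the measured time is Python call overhead; with larger arrays the advantage of the vectorized version typically grows far beyond this. A quick sketch you can try yourself (the random data below is made up purely for benchmarking):
profit_rate = np.random.rand(1000)
daily_revenues = np.random.rand(1000, 365)
%timeit get_profit(profit_rate, daily_revenues)
%timeit get_profit_vectorized(profit_rate, daily_revenues)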