NumPy (oftentimes also written as Numpy) is a linear algebra library that forms the basis for most of the scientific computing done in Python. Most libraries for data science, analysis and machine learning build on top of NumPy for their linear algebra, and also offer tight coupling and conversion between their own data types and NumPy arrays.
Code written with NumPy is actually executed by optimized C code, which gives a significant speed-up in execution.
NumPy's basic data structure is the N-dimensional array, aptly named ndarray. All objects in an ndarray must have the same type, and the array is of fixed size (due to the memory allocation of the underlying C code). When you change the size of an ndarray, NumPy actually deletes the original array and creates a new one.
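A minimal sketch of this behavior: "growing" an ndarray with np.append does not modify it in place, but copies the data into a freshly allocated array.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # copies the data into a new, larger array

print(b)                        # [1 2 3 4]
print(a is b)                   # False -- `a` is untouched; `b` is a fresh allocation
print(np.shares_memory(a, b))   # False -- they don't share any underlying buffer
```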
Note: Both matrices and vectors can be represented by an ndarray - just recall that a vector is a one-dimensional matrix. Matrix and vector multiplication is a simple but immensely useful computation, applied in mathematics, statistics, physics, economics and engineering. In essence - matrix and vector multiplication is one of the most important operations in computational applications of linear algebra - deep learning included.
Matrix and Vector Multiplication in NumPy
In order to fully exploit NumPy's capabilities, our code should be written in vectorized form - that is, whenever possible, substituting loops with NumPy operations. One of the basic building blocks for doing this is matrix multiplication.
Recalling linear algebra, given a matrix a with shape (i, j) and a matrix b with shape (j, k), a.dot(b) will output a matrix with shape (i, k). Vector multiplication reduces to the case when k == 1. Here's an example of matrix multiplication:
import numpy as np

a = np.ones((3, 4))
b = np.ones((4, 5))
c = a.dot(b)
print(c.shape)
print(c)

(3, 5)
[[4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4.]]
And here's an example of vector multiplication:
a = np.ones((1, 4))
b = np.ones((4, 1))
c = a.dot(b)
print(c.shape)
print(c)

(1, 1)
[[4.]]
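Note that the order of the operands matters. As a sketch of the shape rule above, reversing the two vectors matches j == 1, so the result is a (4, 4) matrix (the outer product) rather than a single number:

```python
import numpy as np

a = np.ones((4, 1))
b = np.ones((1, 4))

# (4, 1).dot((1, 4)) -> shape (4, 4): every pairwise product of the entries
c = a.dot(b)
print(c.shape)  # (4, 4)
```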
Vectorized NumPy Matrix Multiplication
One common anti-pattern when working with NumPy is to iterate over the rows of an ndarray using Python loops. This slows down your code, because it prevents NumPy from executing the whole logic inside its optimized C code.
For example, suppose you have a basket of three products with profit rates 10%, 20% and 30%, and you also have an array with the daily revenue for each product. You want to calculate the total profit. One (bad) way to do this would be:

profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3, 4, 5, 6], [9, 7, 5, 3], [1, 2, 3, 4]])

total_profit = 0
for i, revenue in enumerate(daily_revenues):
    total_profit += np.sum(revenue * profit_rate[i])

print(total_profit)
It is better if we vectorize our code and write everything using vector/matrix multiplication and NumPy functions:

profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3, 4, 5, 6], [9, 7, 5, 3], [1, 2, 3, 4]])

print(np.sum(profit_rate.dot(daily_revenues)))
How much of a difference does this make?
profit_rate = np.array([0.1, 0.2, 0.3])
daily_revenues = np.array([[3, 4, 5, 6], [9, 7, 5, 3], [1, 2, 3, 4]])

def get_profit(profit_rate, daily_revenues):
    total_profit = 0
    for i, revenue in enumerate(daily_revenues):
        total_profit += np.sum(revenue * profit_rate[i])
    return total_profit

def get_profit_vectorized(profit_rate, daily_revenues):
    return np.sum(profit_rate.dot(daily_revenues))

timeit get_profit(profit_rate, daily_revenues)
20 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit get_profit_vectorized(profit_rate, daily_revenues)
4.37 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Around 5 times faster when using vectorized operations!
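As a side note, since Python 3.5 the @ operator (backed by np.matmul) is the idiomatic spelling of matrix multiplication, and for 2-D arrays it produces the same result as dot. A quick sketch using the matrices from the earlier example:

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((4, 5))

# For 2-D arrays, a @ b and a.dot(b) compute the same product
c = a @ b
print(np.array_equal(c, a.dot(b)))  # True
print(c.shape)                       # (3, 5)
```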