Imagine you have a playlist of your favorite songs on your phone. This playlist is a list where each song is placed in a specific order. You can play the first song, skip to the second, jump to the fifth, and so on. This playlist is a lot like an array in computer programming.
Arrays stand as one of the most fundamental and widely used data structures.
In essence, an array is a structured way to store multiple items (like numbers, characters, or even other arrays) in a specific order, and you can quickly access, modify, or remove any item if you know its position (index).
In this guide, we'll give you a comprehensive overview of the array data structure. First of all, we'll take a look at what arrays are and what are their main characteristics. We'll then transition into the world of Python, exploring how arrays are implemented, manipulated, and applied in real-world scenarios.
Understanding the Array Data Structure
Arrays are among the oldest and most fundamental data structures used in computer science and programming. Their simplicity, combined with their efficiency in certain operations, makes them a staple topic for anyone delving into the realm of data management and manipulation.
An array is a collection of items, typically of the same type, stored in contiguous memory locations.
This contiguous storage allows arrays to provide constant-time access to any element, given its index. Each item in an array is called an element, and the position of an element in the array is defined by its index, which usually starts from zero.
For instance, consider an array of integers:
[10, 20, 30, 40, 50]. Here, the element
20 has an index of
There are multiple advantages of using arrays to store our data. For example, due to their memory layout, arrays allow for O(1) (constant) time complexity when accessing an element by its index. This is particularly beneficial when we need random access to elements. Additionally, arrays are stored in contiguous memory locations, which can lead to better cache locality and overall performance improvements in certain operations. Another notable advantage of using arrays is that, since arrays have a fixed size once declared, it's easier to manage memory and avoid unexpected overflows or out-of-memory errors.
Note: Arrays are especially useful in scenarios where the size of the collection is known in advance and remains constant, or where random access is more frequent than insertions and deletions.
On the other side, arrays come with their own set of limitations. One of the primary limitations of traditional arrays is their fixed size. Once an array is created, its size cannot be changed. This can lead to issues like wasted memory (if the array is too large) or the need for resizing (if the array is too small). Besides that, inserting or deleting an element in the middle of an array requires shifting of elements, leading to O(n) time complexity for these operations.
To sum this all up, let's illustrate the main characteristics of arrays using the song playlist example from the beginning of this guide. An array is a data structure that:
Is Indexed: Just like each song on your playlist has a number (1, 2, 3, ...), each element in an array has an index. But, in most programming languages, the index starts at 0. So, the first item is at index 0, the second at index 1, and so on.
Has Fixed Size: When you create a playlist for, say, 10 songs, you can't add an 11th song without removing one first. Similarly, arrays have a fixed size. Once you create an array of a certain size, you can't add more items than its capacity.
Is Homogeneous: All songs in your playlist are music tracks. Similarly, all elements in an array are of the same type. If you have an array of integers, you can't suddenly store a text string in it.
Has Direct Access: If you want to listen to the 7th song in your playlist, you can jump directly to it. Similarly, with arrays, you can instantly access any element if you know its index.
Contiguous Memory: This is a bit more technical. When an array is created in a computer's memory, it occupies a continuous block of memory. Think of it like a row of adjacent lockers in school. Each locker is next to the other, with no gaps in between.
Python and Arrays
Python, known for its flexibility and ease of use, offers multiple ways to work with arrays. While Python does not have a native array data structure like some other languages, it provides powerful alternatives that can function similarly and even offer extended capabilities.
At first glance, Python's list might seem synonymous with an array, but there are subtle differences and nuances to consider:
|A built-in Python data structure||Not native in Python - they come from the `array` module|
|Dynamic size||Fixed (predefined) size|
|Can hold items of different data types||Hold items of the same type|
|Provide a range of built-in methods for manipulation||Need to import external modules|
|O(1) time complexity for access operations||O(1) time complexity for access operations|
|Consume more memory||More memory efficient|
Looking at this table, it comes naturally to ask - "When to use which?". Well, if you need a collection that can grow or shrink dynamically and can hold mixed data types, Python's list is the way to go. However, for scenarios requiring a more memory-efficient collection with elements of the same type, you might consider using Python's
array module or external libraries like NumPy.
The array Module in Python
When most developers think of arrays in Python, they often default to thinking about lists. However, Python offers a more specialized array structure through its built-in
array module. This module provides a space-efficient storage of basic C-style data types in Python.
While Python lists are incredibly versatile and can store any type of object, they can sometimes be overkill, especially when you only need to store a collection of basic data types, like integers or floats. The
array module provides a way to create arrays that are more memory efficient than lists for specific data types.
Creating an Array
To use the
array module, you first need to import it:
from array import array
Once imported, you can create an array using the
arr = array('i', [1, 2, 3, 4, 5]) print(arr)
'i' argument indicates that the array will store signed integers. There are several other type codes available, such as
'f' for floats and
'd' for doubles.
Accessing and Modifying Elements
You can access and modify elements in an array just like you would with a list:
print(arr) # Outputs: 3
And now, let's modify the element by changing it's value to
arr = 6 print(arr) # Outputs: array('i', [1, 2, 6, 4, 5])
array module provides several methods to manipulate arrays:
append()- Adds an element to the end of the array:
arr.append(7) print(arr) # Outputs: array('i', [1, 2, 6, 4, 5, 7])
extend()- Appends iterable elements to the end:
arr.extend([8, 9]) print(arr) # Outputs: array('i', [1, 2, 6, 4, 5, 7, 8, 9])
pop()- Removes and returns the element at the given position:
arr.pop(2) print(arr) # Outputs: array('i', [1, 2, 4, 5, 7, 8, 9])
remove(): Removes the first occurrence of the specified value:
arr.remove(2) print(arr) # Outputs: array('i', [1, 4, 5, 7, 8, 9])
reverse(): Reverses the order of the array:
arr.reverse() print(arr) # Outputs: array('i', [9, 8, 7, 5, 4, 1])
Note: There are more methods than we listed here. Refer to the official Python documentation to see a list of all available methods in the
array module offers a more memory-efficient way to store basic data types, it's essential to remember its limitations. Unlike lists, arrays are homogeneous. This means all elements in the array must be of the same type. Also, you can only store basic C-style data types in arrays. If you need to store custom objects or other Python types, you'll need to use a list or another data structure.
NumPy, short for Numerical Python, is a foundational package for numerical computations in Python. One of its primary features is its powerful N-dimensional array object, which offers fast operations on arrays, including mathematical, logical, shape manipulation, and more.
NumPy arrays are more versatile than Python's built-in
arraymodule and are a staple in data science and machine learning projects.
Why Use NumPy Arrays?
The first thing that comes to mind is performance. NumPy arrays are implemented in C and allow for efficient memory storage and faster operations due to optimized algorithms and the benefits of contiguous memory storage.
While Python's built-in arrays are one-dimensional, NumPy arrays can be multi-dimensional, making them ideal for representing matrices or tensors.
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
Finally, NumPy provides a vast array of functions to operate on these arrays, from basic arithmetic to advanced mathematical operations, reshaping, splitting, and more.
Note: When you know the size of the data in advance, pre-allocating memory for arrays (especially in NumPy) can lead to performance improvements.
Creating a NumPy Array
To use NumPy, you first need to install it (
pip install numpy) and then import it:
import numpy as np
Once imported, you can create a NumPy array using the
arr = np.array([1, 2, 3, 4, 5]) print(arr) # Outputs: [1 2 3 4 5]
You can also create multi-dimensional arrays:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) print(matrix)
This will give us:
[[1 2 3] [4 5 6] [7 8 9]]
Besides these basic ways we can create arrays, NumPy provides us with other clever ways we can create arrays. One of which is the
arange() method. It creates arrays with regularly incrementing values:
arr = np.arange(10) print(arr) # Outputs: [0 1 2 3 4 5 6 7 8 9]
Another one is the
linspace() method, which creates arrays with a specified number of elements, spaced equally between specified beginning and end values:
even_space = np.linspace(0, 1, 5) print(even_space) # Outputs: [0. 0.25 0.5 0.75 1. ]
Accessing and Modifying Elements
Accessing and modifying elements in a NumPy array is intuitive:
print(arr) # Outputs: 3 arr = 6 print(arr) # Outputs: [1 2 6 4 5]
Doing pretty much the same for multi-dimensional arrays:
print(matrix[1, 2]) # Outputs: 6 matrix[1, 2] = 10 print(matrix)
Will change the value of the element in the second row (index
1) and the third column (index
[[1 2 3] [4 5 20] [7 8 9]]
Changing the Shape of an Array
NumPy offers many functions and methods to manipulate and operate on arrays. For example, you can use the
reshape() method to change the shape of an array. Say we have a simple array:
import numpy as np # Create a 1D array arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) print("Original Array:") print(arr) # Output: # Original Array: #[ 1 2 3 4 5 6 7 8 9 10 11 12]
And we want to reshape it to a 3x4 matrix. All you need to do is use the
reshape() method with desired dimensions passed as arguments:
# Reshape it to a 3x4 matrix reshaped_arr = arr.reshape(3, 4) print("Reshaped Array (3x4):") print(reshaped_arr)
This will result in:
Reshaped Array (3x4): [[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12]]
numpy.dot() method is used for matrix multiplication. It returns the dot product of two arrays. For one-dimensional arrays, it is the inner product of the arrays. For 2-dimensional arrays, it is equivalent to matrix multiplication, and for N-D, it is a sum product over the last axis of the first array and the second-to-last of the second array.
Let's see how it works. First, let's compute the dot product of two 1-D arrays (the inner product of the vectors):
import numpy as np # 1-D array dot product vec1 = np.array([1, 2, 3]) vec2 = np.array([4, 5, 6]) dot_product_1d = np.dot(vec1, vec2) print("Dot product of two 1-D arrays:") print(dot_product_1d) # Outputs: 32 (1*4 + 2*5 + 3*6)
This will result in:
Dot product of two 1-D arrays: 32
32 is, in fact, the inner product of the two arrays - (14 + 25 + 3*6). Next, we can perform matrix multiplication of two 2-D arrays:
# 2-D matrix multiplication mat1 = np.array([[1, 2], [3, 4]]) mat2 = np.array([[2, 0], [1, 3]]) matrix_product = np.dot(mat1, mat2) print("Matrix multiplication of two 2-D arrays:") print(matrix_product)
Which will give us:
Matrix multiplication of two 2-D arrays: [[ 4 6] [10 12]]
NumPy arrays are a significant step up from Python's built-in lists and the
array module, especially for scientific and mathematical computations. Their efficiency, combined with the rich functionality provided by the NumPy library, makes them an indispensable tool for anyone looking to do numerical operations in Python.
Advice: This is just a quick overview of what you can do with the NumPy library. For more information about the library, you can read our "NumPy Tutorial: A Simple Example-Based Guide"
Arrays, a cornerstone of computer science and programming, have proven their worth time and again across various applications and domains. In Python, this fundamental data structure, through its various incarnations like lists, the
array module, and the powerful NumPy arrays, offers developers a blend of efficiency, versatility, and simplicity.
Throughout this guide, we've journeyed from the foundational concepts of arrays to their practical applications in Python. We've seen how arrays, with their memory-contiguous nature, provide rapid access times, and how Python's dynamic lists bring an added layer of flexibility. We've also delved into the specialized world of NumPy, where arrays transform into powerful tools for numerical computation.