Sorting, although a basic operation, is one of the most important operations a computer should perform. It is a building block in many other algorithms and procedures, such as searching and merging. Knowing different sorting algorithms could help you better understand the ideas behind the different algorithms, as well as help you come up with better algorithms.
The Selection Sort algorithm sorts an array by finding the minimum value of the unsorted part and then swapping it with the first unsorted element. It is an in-place algorithm, meaning you won't need to allocate additional lists. While slow, it is still used as the main sorting algorithm in systems where memory is limited.
In this article, we will explain how the Selection Sort works and implement it in Python. We will then break down the actions of the algorithm to learn its time complexity.
So how does the selection sort work? Selection sort breaks the input list in two parts, the sorted part which initially is empty, and the unsorted part, which initially contains the list of all elements. The algorithm then selects the minimum value of all the unsorted file and swaps it with the first unsorted value, and then increases the sorted part by one.
A high level implementation of this sort would look something like this:
def selection_sort(L): for i in range(len(L) - 1): min_index = argmin(L[i:]) L[i], L[min_index] = L[min_index], L[i]
In the above pseudocode,
argmin() is a function that returns the index of the minimum value. The algorithm uses a variable
i to keep track of where the sorted list ends and where the unsorted one begins. Since we start with no sorted items and take the minimum value, it will always be the case that every member of the unsorted part is greater than any member of the sorted part.
The first line increments the value of
i, the second line finds the minimum value's index, and the third line swaps those values. Swapping works because Python calculated the right-hand side before assigning anything to the left-hand side, so we don't need any temporary variables.
Let's see how it works in action with a list that contains the following elements:
[3, 5, 1, 2, 4].
We begin with the unsorted list:
- 3 5 1 2 4
The unsorted section has all the elements. We look through each item and determine that
1 is the smallest element. So, we swap
- 1 5 3 2 4
Of the remaining unsorted elements,
[5, 3, 2, 4],
2 is the lowest number. We now swap
- 1 2 3 5 4
This process continues until the list is sorted:
- 1 2 3 5 4
- 1 2 3 4 5
- 1 2 3 4 5
Let's see how we can implement this in Python!
The trick to implementing this algorithm is keeping track of the minimum value and swapping two elements of the list. Open a file named
sort.py in your favorite editor and enter the following code in it:
def selection_sort(L): # i indicates how many items were sorted for i in range(len(L)-1): # To find the minimum value of the unsorted segment # We first assume that the first element is the lowest min_index = i # We then use j to loop through the remaining elements for j in range(i+1, len(L)-1): # Update the min_index if the element at j is lower than it if L[j] < L[min_index]: min_index = j # After finding the lowest item of the unsorted regions, swap with the first unsorted item L[i], L[min_index] = L[min_index], L[i]
Now let's add some code to the file to test the algorithm:
L = [3, 1, 41, 59, 26, 53, 59] print(L) selection_sort(L) # Let's see the list after we run the Selection Sort print(L)
You can then open a terminal and run to see the results:
$ python sort.py [3, 1, 41, 59, 26, 53, 59] [1, 3, 26, 41, 53, 59, 59]
The list was correctly sorted! We know how it works and we can implement the Selection Sort in Python. Let's get into some theory and look at its performance with regards to time.
Time Complexity Calculation
So how long does it take for selection sort to sort our list? We are going to take an approach and calculate exactly how much time the selection sort algorithm takes, given an array of size
n. The first line of the code is:
This line shouldn't take that much since it's only setting the function stack. We say that this is a constant - the size of our input does not change how long it takes for this code to run. Let's say it takes
c1 operations to perform this line of code. Next, we have:
for i in range(len(L)-1):
This one is a little trickier. First of all, we have two function invocations,
range(), which are performed before the
for loop begins. The cost of
len() is also independent of size in CPython, which is the default Python implementation on Windows, Linux, and Mac. This is also true for the initialization of
range(). Let's call these two together
Next, we have the
for, which is running
n - 1 times. This is not a constant, the size of the input does make an impact on how long this is executed. So we have to multiply whatever time it takes for one loop to complete by
n - 1.
There is a constant cost for evaluating the
in operator, let's say
c3. That covers the outer for-loop.
The variable assignment is also done in constant time. We'll call this one
min_index = i
We now encounter the inner for-loop. It has two constant function invocations. Let's say they take
c5is different from
rangehere has two arguments, and there is an addition operation being performed here.
So far we have
c1 + c2 + (n - 1) * (c3 + c4 + c5) operations, and then our inner loop begins, multiplying everything by...? Well, it's tricky, but if you look closely, it takes
n - 2 times in the first loop,
n - 3 in the second one, and 1 in the last time.
We need to multiply everything by the sum of all numbers between 1 and
n - 2. Mathematicians have told us that the sum would be
(n - 2) * (n - 1) / 2. Feel free to read more about the sum of integers between 1 and any positive number
The contents of the inner loop are completed in constant time as well. Let's assign the time it takes Python to do the
if, assignment statement and the variable swap take up an arbitrary constant time of
for j in range(i+1, len(L)-1): if L[j] < L[min_index]: min_index = j L[i], L[min_index] = L[min_index], L[i]
All-together we get
c1 + c2 + (n - 1) * (c3 + c4 + c5) + (n - 2) * (n - 3) * c6 / 2.
We can simplify this to:
a * n * n + b * n + c, where
c representing the values of the evaluated constants.
This is known as O(n2). What does that mean? In summary, our algorithm's performance is based on the squared size of our input list. Therefore, if we double the size of our list, the time it takes to sort it would be multiplied by 4! If we divide the size of our input by 3, the time would shrink by 9!
In this article, we looked at how Selection Sort works and implemented it in Python. We then broke the code down line by line to analyze the algorithm's time complexity.
Learning sorting algorithms will help you get a better understanding of algorithms in general. So, in case you haven't already, you can check out our sorting algorithms overview!