Merge Sort in Python

Introduction

Merge Sort is one of the most famous sorting algorithms. If you're studying Computer Science, Merge Sort, alongside Quick Sort is likely the first efficient, general-purpose sorting algorithm you have heard of. It is also a classic example of a divide-and-conquer category of algorithms.

Merge Sort

The way Merge Sort works is:

An initial array is divided into two roughly equal parts. If the array has an odd number of elements, one of those "halves" is by one element larger than the other.

The subarrays are divided over and over again into halves until you end up with arrays that have only one element each.

Then you combine the pairs of one-element arrays into two-element arrays, sorting them in the process. Then these sorted pairs are merged into four-element arrays, and so on until you end up with the initial array sorted.

Here's a visualization of Merge Sort:

As you can see, the fact that the array couldn't be divided into equal halves isn't a problem, the 3 just "waits" until the sorting begins.

There are two main ways we can implement the Merge Sort algorithm, one is using a top-down approach like in the example above, which is how Merge Sort is most often introduced.

The other approach, i.e. bottom-up, works in the opposite direction, without recursion (works iteratively) - if our array has N elements we divide it into N subarrays of one element and sort pairs of adjacent one-element arrays, then sort the adjacent pairs of two-element arrays and so on.

Note: The bottom-up approach provides an interesting optimization which we'll discuss later. We'll be implementing the top-down approach as it's simpler and more intuitive coupled with the fact that there's no real difference between the time complexity between them without specific optimizations.

The main part of both these approaches is how we combine (merge) the two smaller arrays into a larger array. This is done fairly intuitively, let's say we examine the last step in our previous example. We have the arrays:

  • A: 2 4 7 8

  • B: 1 3 11

  • sorted: empty

The first thing we do is look at the first element of both arrays. We find the one that's smaller, in our case that's 1, so that's the first element of our sorted array, and we move forward in the B array:

  • A: 2 4 7 8

  • B: 1 3 11

  • sorted: 1

Then we look at the next pair of elements 2 and 3; 2 is smaller so we put it in our sorted array and move forward in array A. Of course, we don't move forward in array B and we keep our pointer at 3 for future comparisons:

  • A: 2 4 7 8

  • B: 1 3 11

  • sorted: 1 2

Using the same logic we move through the rest and end up with with an array of {1, 2, 3, 4, 7, 8, 11}.

The two special cases that can occur are:

  • Both subarrays have the same element. We can move forward in either one and add the element to the sorted array. We can technically move forward in both arrays and add both elements to the sorted array but this would require special behavior when we encountered the same elements in both arrays.
  • We "run" out of elements in one subarray. For example, we have an array with {1, 2, 3} and an array with {9, 10, 11}. Clearly we'll go through all the elements in the first array without moving forward even once in the second. Whenever we run out of elements in a subarray we simply add the elements of the second one after the other.

Keep in mind that we can sort however we want - this example sorts integers in ascending order but we can just as easily sort in descending order, or sort custom objects.

Implementation

We'll be implementing Merge Sort on two types of collections - on arrays of integers (typically used to introduce sorting) and on custom objects (a more practical and realistic scenario).

We'll implement the Merge Sort algorithm using the top-down approach. The algorithm doesn't look very "pretty" and can be confusing, so we'll go through each step in detail.

Sorting Arrays

Let's start with the easy part. The base idea of the algorithm is to divide (sub)arrays into halves and sort them recursively. We want to keep doing this as much as possible, i.e. until we end up with subarrays that have only one element:

def merge_sort(array, left_index, right_index):
    if left_index > right_index:
        return

    middle = (left_index + right_index)//2
    merge_sort(array, left_index, middle)
    merge_sort(array, middle + 1, right_index)
    merge(array, left_index, right_index, middle)

By calling the merge method last, we make sure that all the divisions will happen before we start the sorting. We use the // operator to be explicit about the fact that we want integer values for our indices.

The next step is the actual merging part through a few steps and scenarios:

  • Create copies of our arrays. The first array being the subarray from [left_index,..,middle] and the second from [middle+1,...,right_index]
  • We go through both copies (keeping track of pointers in both arrays), picking the smaller of the two elements we're currently looking at, and add them to our sorted array. We move forward in whichever array we've chosen the element from, and forward in the sorted array regardless.
  • If we run out of elements in one of our copies - simply add the remaining elements in the other copy to the sorted array.

With our requirements laid out, let's go ahead and define a merge() function:

def merge(array, left_index, right_index, middle):
    # Make copies of both arrays we're trying to merge

    # The second parameter is non-inclusive, so we have to increase by 1
    left_copy = array[left_index:middle + 1]
    right_copy = array[middle+1:right_index+1]

    # Initial values for variables that we use to keep
    # track of where we are in each array
    left_copy_index = 0
    right_copy_index = 0
    sorted_index = left_index

    # Go through both copies until we run out of elements in one
    while left_copy_index < len(left_copy) and right_copy_index < len(right_copy):

        # If our left_copy has the smaller element, put it in the sorted
        # part and then move forward in left_copy (by increasing the pointer)
        if left_copy[left_copy_index] <= right_copy[right_copy_index]:
            array[sorted_index] = left_copy[left_copy_index]
            left_copy_index = left_copy_index + 1
        # Opposite from above
        else:
            array[sorted_index] = right_copy[right_copy_index]
            right_copy_index = right_copy_index + 1

        # Regardless of where we got our element from
        # move forward in the sorted part
        sorted_index = sorted_index + 1

    # We ran out of elements either in left_copy or right_copy
    # so we will go through the remaining elements and add them
    while left_copy_index < len(left_copy):
        array[sorted_index] = left_copy[left_copy_index]
        left_copy_index = left_copy_index + 1
        sorted_index = sorted_index + 1

    while right_copy_index < len(right_copy):
        array[sorted_index] = right_copy[right_copy_index]
        right_copy_index = right_copy_index + 1
        sorted_index = sorted_index + 1

Now let's test our program out:

array = [33, 42, 9, 37, 8, 47, 5, 29, 49, 31, 4, 48, 16, 22, 26]
merge_sort(array, 0, len(array) -1)
print(array)

And the output is:

[4, 5, 8, 9, 16, 22, 26, 29, 31, 33, 37, 42, 47, 48, 49]

Sorting Custom Objects

Now that we have the basic algorithm down we can take a look at how to sort custom classes. We can override the __eq__, __le__, __ge__ and other operators as needed for this.

This lets us use the same algorithm as above but limits us to only one way of sorting our custom objects, which in most cases isn't what we want. A better idea is to make the algorithm itself more versatile, and pass a comparison function to it instead.

First we'll implement a custom class, Car and add a few fields to it:

class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    def __str__(self):
        return str.format("Make: {}, Model: {}, Year: {}", self.make, self.model, self.year)

Then we'll make a few changes to our Merge Sort methods. The easiest way to achieve what we want is by using lambda functions. You can see that we only added an extra parameter and changed the method calls accordingly, and only one other line of code to make this algorithm a lot more versatile:

def merge(array, left_index, right_index, middle, comparison_function):
    left_copy = array[left_index:middle + 1]
    right_copy = array[middle+1:right_index+1]

    left_copy_index = 0
    right_copy_index = 0
    sorted_index = left_index

    while left_copy_index < len(left_copy) and right_copy_index < len(right_copy):

        # We use the comparison_function instead of a simple comparison operator
        if comparison_function(left_copy[left_copy_index], right_copy[right_copy_index]):
            array[sorted_index] = left_copy[left_copy_index]
            left_copy_index = left_copy_index + 1
        else:
            array[sorted_index] = right_copy[right_copy_index]
            right_copy_index = right_copy_index + 1

        sorted_index = sorted_index + 1

    while left_copy_index < len(left_copy):
        array[sorted_index] = left_copy[left_copy_index]
        left_copy_index = left_copy_index + 1
        sorted_index = sorted_index + 1

    while right_copy_index < len(right_copy):
        array[sorted_index] = right_copy[right_copy_index]
        right_copy_index = right_copy_index + 1
        sorted_index = sorted_index + 1


def merge_sort(array, left_index, right_index, comparison_function):
    if left_index >= right_index:
        return

    middle = (left_index + right_index)//2
    merge_sort(array, left_index, middle, comparison_function)
    merge_sort(array, middle + 1, right_index, comparison_function)
    merge(array, left_index, right_index, middle, comparison_function)

Let's test out or modified algorithm on a few Car instances:

car1 = Car("Alfa Romeo", "33 SportWagon", 1988)
car2 = Car("Chevrolet", "Cruze Hatchback", 2011)
car3 = Car("Corvette", "C6 Couple", 2004)
car4 = Car("Cadillac", "Seville Sedan", 1995)

array = [car1, car2, car3, car4]

merge_sort(array, 0, len(array) -1, lambda carA, carB: carA.year < carB.year)

print("Cars sorted by year:")
for car in array:
    print(car)

print()
merge_sort(array, 0, len(array) -1, lambda carA, carB: carA.make < carB.make)
print("Cars sorted by make:")
for car in array:
    print(car)

We get the output:

Cars sorted by year:
Make: Alfa Romeo, Model: 33 SportWagon, Year: 1988
Make: Cadillac, Model: Seville Sedan, Year: 1995
Make: Corvette, Model: C6 Couple, Year: 2004
Make: Chevrolet, Model: Cruze Hatchback, Year: 2011

Cars sorted by make:
Make: Alfa Romeo, Model: 33 SportWagon, Year: 1988
Make: Cadillac, Model: Seville Sedan, Year: 1995
Make: Chevrolet, Model: Cruze Hatchback, Year: 2011
Make: Corvette, Model: C6 Couple, Year: 2004

Optimization

Let's elaborate the difference between top-down and bottom-up Merge Sort now. Bottom-up works like the second half of the top-down approach where instead of recursively calling the sort on halved subarrays, we iteratively sort adjacent subarrays.

One thing we can do to improve this algorithm is to consider sorted chunks instead of single elements before breaking the array down.

What this means is that, given an array such as {4, 8, 7, 2, 11, 1, 3}, instead of breaking it down into {4}, {8}, {7}, {2}, {11}, {1} ,{3} - it's divided into subarrays which may already be sorted: {4,8}, {7}, {2,11}, {1,3}, and then sorting them.

With real life data we often have a lot of these already sorted subarrays that can noticeably shorten the execution time of Merge Sort.

Another thing to consider with Merge Sort, particularly the top-down version is multi-threading. Merge Sort is convenient for this since each half can be sorted independently of its pair. The only thing that we need to make sure of is that we're done sorting each half before we merge them.

Merge Sort is however relatively inefficient (both time and space) when it comes to smaller arrays, and is often optimized by stopping when we reach an array of ~7 elements, instead of going down to arrays with one element, and calling Insertion Sort to sort them instead, before merging into a larger array.

This is because Insertion Sort works really well with small and/or nearly sorted arrays.

Conclusion

Merge Sort is an efficient, general-purpose sorting algorithm. It's main advantage is the reliable runtime of the algorithm and it's efficiency when sorting large arrays. Unlike Quick Sort, it doesn't depend on any unfortunate decisions that lead to bad runtimes.

One of the main drawbacks is the additional memory that Merge Sort uses to store the temporary copies of arrays before merging them. However, Merge Sort is an excellent, intuitive example to introduce future Software Engineers to the divide-and-conquer approach to creating algorithms.

We've implemented Merge Sort both on simple integer arrays and on custom objects via a lambda function used for comparison. In the end, possible optimizations for both approaches were briefly discussed.

Author image
Hey guys, I want to point out that I don't have any social media - since there have been some messages to colleague of mine with the same name. If you need any help - post it in the comments :)