Run-Length Encoding

Run-Length Encoding

In this article we'll go over how the run-length encoding algorithm works, what it's used for, and how to implement its encode and decode functions in Python.

Run-length encoding (RLE) is a very simple form of data compression in which a stream of data is given as the input (i.e. "AAABBCCCC") and the output is a sequence of counts of consecutive data values in a row (i.e. "3A2B4C"). This type of data compression is lossless, meaning that when decompressed, all of the original data will be recovered when decoded. Its simplicity in both the encoding (compression) and decoding (decompression) is one of the most attractive features of the algorithm.

Here you can see a simple example of a stream ("run") of data in its original form and encoded form:

Input data:

AAAAAAFDDCCCCCCCAEEEEEEEEEEEEEEEEE

Output data:

6A1F2D7C1A17E

In this example we were able to compress the data from 34 characters down to 13.

As you may have noticed, the more consecutive values in a row, the more space we save in the resulting compression. On the other hand, if you have a sequence of data that frequently changes between values (i.e. "BEFEFADED") then we won't save much space at all. In fact, we could even increase the size of our data since a single instance of a character results in 2 characters (i.e. "A" becomes "1A") in the output of the encoding.

Because of this, RLE is only good for certain types of data and applications. For example, the Pixy camera, which is a robotics camera that helps you easily track objects, uses RLE to compress labeled video data before transferring it from the embedded camera device to an external application. Each pixel is given a label of "no object", "object 1", "object 2", etc. This is the perfect encoding for this application because of its simplicity, speed, and ability to compress the low-entropy label data.

Encoding

In order to encode a string of data, your code will need to loop through each character of the data and count the occurrences. Once you see a character that is different from the previous character, you will append the number of occurrences and the character to your encoding.

Below you'll find a simple implementation in Python:

# rle-encode.py

def rle_encode(data):
    encoding = ''
    prev_char = ''
    count = 1

    if not data: return ''

    for char in data:
        # If the prev and current characters
        # don't match...
        if char != prev_char:
            # ...then add the count and character
            # to our encoding
            if prev_char:
                encoding += str(count) + prev_char
            count = 1
            prev_char = char
        else:
            # Or increment our counter
            # if the characters do match
            count += 1
    else:
        # Finish off the encoding
        encoding += str(count) + prev_char
        return encoding

From the comments you should be able to tell what is going on throughout the code. If not, it would be a good exercise to run through the code with a debugger and see it in action.

Continuing with the same file as above, here is an example of the code being executed:

encoded_val = rle_encode('AAAAAAFDDCCCCCCCAEEEEEEEEEEEEEEEEE')
print(encoded_val)

And the output:

$ python rle-encode.py
6A1F2D7C1A17E

Decoding

Decoding an RLE-encoded stream of data is actually even easier than encoding it. Like before, you iterate through the data stream one character at a time. If you see a numeric character then you add it to your count, and if you see a non-numeric character then you add count of those characters to your decoding, which is returned to the caller once you iterate through all of the inputdata.

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

Here is the algorithm implemented in Python:

# rle-decode.py

def rle_decode(data):
    decode = ''
    count = ''
    for char in data:
        # If the character is numerical...
        if char.isdigit():
            # ...append it to our count
            count += char
        else:
            # Otherwise we've seen a non-numerical
            # character and need to expand it for
            # the decoding
            decode += char * int(count)
            count = ''
    return decode

We can run this code on the same output that we got from our encoding:

decoded_val = rle_decode('6A1F2D7C1A17E')
print(decoded_val)

And the output is the same as our original input to the encoding function:

$ python rle-decode.py
AAAAAAFDDCCCCCCCAEEEEEEEEEEEEEEEEE

Note that this implementation doesn't do any error checking to ensure that we have a valid RLE data stream. If any of the input data isn't formatted properly then you'll likely encounter an error.

Last Updated: June 14th, 2019
Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Want a remote job?

    Prepping for an interview?

    • Improve your skills by solving one coding problem every day
    • Get the solutions the next morning via email
    • Practice on actual problems asked by top companies, like:
     
     
     

    Better understand your data with visualizations

    With over 330+ pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

    © 2013-2021 Stack Abuse. All rights reserved.