Replace Occurrences of a Substring in a String with Python

Introduction

Replacing all, or just the first n, occurrences of a substring in a given string is a common task in string manipulation and text processing in general. Luckily, Python's built-in string methods and standard library make most of these tasks easy, including this one.

Let's say we have a string that contains the following sentence:

The brown-eyed man drives a brown car.

Our goal is to replace the word "brown" with the word "blue":

The blue-eyed man drives a blue car.

In this article, we'll use the replace() method, as well as the sub() and subn() functions with patterns, to replace occurrences of a substring in a string.

replace()

The simplest way to do this is with the built-in string method replace():

string.replace(oldStr, newStr, count)

The first two parameters are required, while the third one is optional. oldStr is the substring we want to replace, and newStr is what replaces it. Note that the method returns a new string with the replacement applied and leaves the original one unchanged.

Let's give it a try:

string_a = "The brown-eyed man drives a brown car."
string_b = string_a.replace("brown", "blue")
print(string_a)
print(string_b)

We've called replace() on string_a, stored the result in string_b, and printed them both.

This code results in:

The brown-eyed man drives a brown car.
The blue-eyed man drives a blue car.

Again, the string in memory that string_a is pointing to remains unchanged. Strings in Python are immutable, which simply means you can't change a string. However, you can re-assign the reference variable to a new value.

To seemingly perform this operation in-place, we can simply assign the result back to string_a:

string_a = string_a.replace("brown", "blue")
print(string_a)

Here, the new string generated by the replace() method is assigned to the string_a variable.
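As a quick check of the immutability point above, here is a minimal sketch. It only uses the string from the earlier example; the index 4 and the letter "B" are arbitrary choices for illustration.

```python
string_a = "The brown-eyed man drives a brown car."

# replace() builds a new string object; string_a itself is not modified.
string_b = string_a.replace("brown", "blue")
print(string_a)              # The brown-eyed man drives a brown car.
print(string_b is string_a)  # False

# Trying to change a character in place is not allowed for strings.
try:
    string_a[4] = "B"
except TypeError as error:
    print("TypeError:", error)
```

This is why the result of replace() always has to be assigned to a variable if we want to keep it.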

Replace n Occurrences of a Substring

Now, what if we don't wish to change all occurrences of a substring? What if we want to replace the first n?

That's where the third parameter of the replace() method comes in. It sets the maximum number of occurrences to replace, counting from the start of the string. The following code only replaces the first occurrence of the word "brown" with the word "blue":

string_a = "The brown-eyed man drives a brown car."
string_a = string_a.replace("brown", "blue", 1)
print(string_a)

And this prints:

The blue-eyed man drives a brown car.

If the third parameter is omitted, all occurrences are replaced.
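To see how the count parameter behaves at the edges, here is a small sketch. The sample string and variable names are made up for illustration and are not part of the example above.

```python
text = "brown, brown, brown"

# count limits how many occurrences are replaced, from left to right.
first_two = text.replace("brown", "blue", 2)
print(first_two)    # blue, blue, brown

# A count larger than the number of occurrences simply replaces them all.
all_of_them = text.replace("brown", "blue", 10)
print(all_of_them)  # blue, blue, blue

# A count of 0 replaces nothing.
none_at_all = text.replace("brown", "blue", 0)
print(none_at_all)  # brown, brown, brown

# replace() is case-sensitive, so "Brown" does not match "brown".
no_match = text.replace("Brown", "blue")
print(no_match)     # brown, brown, brown
```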

Substring Occurrences with Regular Expressions

To take the problem a step further, let's say we want to replace not just all occurrences of a fixed substring, but all substrings that fit a certain pattern. Even this can be done with a one-liner, using regular expressions and the standard library's re module.

Regular expressions are a complex topic with a wide range of uses in computer science, so we won't go into much depth in this article. If you need a quick start, you can check out our guide on Regular Expressions in Python.

In essence, a regular expression defines a pattern. For example, let's say we have a text about people who own cats and dogs, and we want to replace both terms with the word "pet". First, we need to define a pattern that matches both terms, such as (cat|dog).
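Before replacing anything, it can help to look at what the pattern actually matches. The sketch below uses re.findall(), which this article does not otherwise cover, and a made-up sentence. It also shows one possible refinement, the \b word-boundary anchor, for cases where "cat" should not match inside a longer word.

```python
import re

text = "Mark owns a dog and Mary owns a cat in a different category."

# The plain pattern also matches the "cat" at the start of "category".
plain_matches = re.findall(r'(cat|dog)', text)
print(plain_matches)  # ['dog', 'cat', 'cat']

# With \b on both sides, only whole words are matched.
whole_word_matches = re.findall(r'\b(cat|dog)\b', text)
print(whole_word_matches)  # ['dog', 'cat']
```

Whether whole-word matching is wanted depends on the text being processed; the rest of this article sticks with the plain pattern.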

Using the sub() Function

With the pattern sorted out, we're going to use the re.sub() function which has the following syntax:

re.sub(pattern, repl, string, count, flags)

The first argument is the pattern we're searching for (a string or a Pattern object), repl is what we're going to insert (can be a string or a function; if it is a string, any backslash escapes in it are processed) and string is the string we're searching in.

The optional arguments are count and flags. count sets the maximum number of occurrences to replace (the default, 0, replaces all of them), and flags controls how the regular expression is processed.

If the pattern doesn't match any substring, the original string is returned unchanged. In the following example, the pattern matches twice:

import re
string_a = re.sub(r'(cat|dog)', 'pet', "Mark owns a dog and Mary owns a cat.")
print(string_a)

This code prints:

Mark owns a pet and Mary owns a pet.
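The other options described above can be sketched in the same way. The helper name shout is hypothetical and chosen only for this illustration; count is passed as a keyword argument to keep the call unambiguous.

```python
import re

sentence = "Mark owns a dog and Mary owns a cat."

def shout(match):
    # repl can be a function: it receives a match object and
    # returns the replacement text for that particular match.
    return match.group(0).upper()

upper_cased = re.sub(r'(cat|dog)', shout, sentence)
print(upper_cased)  # Mark owns a DOG and Mary owns a CAT.

# count=1 replaces only the first match.
first_only = re.sub(r'(cat|dog)', 'pet', sentence, count=1)
print(first_only)   # Mark owns a pet and Mary owns a cat.

# With no match at all, the string comes back unchanged.
untouched = re.sub(r'(bird|fish)', 'pet', sentence)
print(untouched)    # Mark owns a dog and Mary owns a cat.
```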

Case-Insensitive Pattern Matching

To perform case-insensitive pattern matching, we'll set the flags parameter to re.IGNORECASE:

import re
string_a = re.sub(r'(cats|dogs)', "Pets", "DoGs are a man's best friend", flags=re.IGNORECASE)
print(string_a)

This prints:

Pets are a man's best friend

Now any case combination of "cats" or "dogs" is matched.

Using Pattern Objects

When matching the same pattern against multiple strings, we can compile it once into a Pattern object instead of repeating it in multiple places. Pattern objects have their own sub() method, with the syntax:

Pattern.sub(repl, string, count)

Let's define a Pattern for cats and dogs and check a couple of sentences:

import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.sub("Pets", "Dogs are a man's best friend.")
string_b = pattern.sub("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)

Which gives us the output:

Pets are a man's best friend.
Animals enjoy sleeping.
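A compiled pattern can also carry its flags, so case-insensitive matching only has to be requested once. The following is a minimal sketch with made-up sentences, not part of the example above.

```python
import re

# The flag is given at compile time and applies to every later call.
pattern = re.compile(r'(cats|dogs)', re.IGNORECASE)

sentences = ["DOGS bark.", "cats purr.", "Dogs and cats can be friends."]
results = [pattern.sub("Pets", sentence) for sentence in sentences]
print(results)
# ['Pets bark.', 'Pets purr.', 'Pets and Pets can be friends.']

# count works the same way as with re.sub().
first_only = pattern.sub("Pets", "Dogs and cats can be friends.", count=1)
print(first_only)  # Pets and cats can be friends.
```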

The subn() Function

There's also a subn() function with the syntax:

re.subn(pattern, repl, string, count, flags)

The subn() function returns a tuple containing the new string and the number of replacements that were made:

import re
string_a = re.subn(r'(cats|dogs)', 'Pets', "DoGs are a mans best friend", flags=re.IGNORECASE)
print(string_a)

The tuple looks like:

('Pets are a mans best friend', 1)

A Pattern object has a similar subn() method:

Pattern.subn(repl, string, count)

And it's used in a very similar way:

import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.subn("Pets", "Dogs are a man's best friend.")
string_b = pattern.subn("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)

This results in:

("Pets are a man's best friend.", 1)
('Animals enjoy sleeping.', 1)
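In practice, the tuple is usually unpacked into two variables so the count can be checked directly. Here is a short sketch with made-up sentences and variable names:

```python
import re

pattern = re.compile(r'(cats|dogs)', re.IGNORECASE)

# Unpack the result into the new string and the number of replacements.
new_text, replacements = pattern.subn("pets", "Dogs chase cats, and cats chase mice.")
print(new_text)      # pets chase pets, and pets chase mice.
print(replacements)  # 3

# A count of 0 tells us that nothing matched and the text is unchanged.
unchanged_text, no_replacements = pattern.subn("pets", "Birds sing.")
if no_replacements == 0:
    print("No match, text left as:", unchanged_text)
```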

Conclusion

Python offers simple tools for string handling. The easiest way to replace all occurrences of a given substring in a string is to use the replace() method.

If needed, the standard library's re module provides a more flexible toolset for problems such as pattern-based replacement and case-insensitive matching.
