Introduction
Replacing all occurrences, or only the first n occurrences, of a substring in a given string is a common task in string manipulation and text processing. Luckily, Python makes most of these tasks easy with its built-in string methods and standard library.
Let's say we have a string that contains the following sentence:
The brown-eyed man drives a brown car.
Our goal is to replace the word "brown" with the word "blue":
The blue-eyed man drives a blue car.
In this article, we'll use the replace() method, as well as the sub() and subn() functions with patterns, to replace occurrences of a substring in a string.
replace()
The simplest way to do this is with the built-in string method replace():
string.replace(oldStr, newStr, count)
The first two parameters are required, while the third is optional. oldStr is the substring we want to replace, and newStr is what replaces it. Note that the method returns a new string with the transformation applied, without affecting the original one.
Let's give it a try:
string_a = "The brown-eyed man drives a brown car."
string_b = string_a.replace("brown", "blue")
print(string_a)
print(string_b)
We've performed the operation on string_a, stored the result in string_b, and printed them both.
This code results in:
The brown-eyed man drives a brown car.
The blue-eyed man drives a blue car.
Again, the string in memory that string_a points to remains unchanged. Strings in Python are immutable, which means you can't change a string once it has been created. However, you can re-assign the variable to a new value.
To seemingly perform this operation in-place, we can simply assign the result back to string_a:
string_a = string_a.replace("brown", "blue")
print(string_a)
Here, the new string generated by the replace() method is assigned to the string_a variable.
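As a small sketch of the immutability point above (the names original and updated are illustrative and not part of the article's examples), the snippet below checks that replace() leaves its input untouched and that trying to edit a string in place fails:

```python
original = "The brown-eyed man drives a brown car."
updated = original.replace("brown", "blue")

# replace() builds a new string; the original is left as it was
print(original)  # The brown-eyed man drives a brown car.
print(updated)   # The blue-eyed man drives a blue car.

# Strings do not support item assignment, so an in-place edit raises TypeError
try:
    original[4] = "B"
    modified_in_place = True
except TypeError as error:
    modified_in_place = False
    print("Cannot modify a string in place:", error)
```

This is why re-assigning the variable is the usual way to "update" a string.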
Replace n Occurrences of a Substring
Now, what if we don't wish to change all occurrences of a substring? What if we want to replace the first n?
That's where the third parameter of the replace() method comes in. It sets the maximum number of occurrences to replace, counting from the start of the string. The following code replaces only the first occurrence of the word "brown" with the word "blue":
string_a = "The brown-eyed man drives a brown car."
string_a = string_a.replace("brown", "blue", 1)
print(string_a)
And this prints:
The blue-eyed man drives a brown car.
By default, when the third parameter is omitted, all occurrences are replaced.
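To make the effect of the third parameter more concrete, here is a short sketch with a few different values. The sample sentence and variable names are made up for illustration:

```python
sentence = "brown fox, brown dog, brown cat"

# Replace at most the first two occurrences
first_two = sentence.replace("brown", "blue", 2)

# A count of 0 replaces nothing
none_replaced = sentence.replace("brown", "blue", 0)

# A count larger than the number of occurrences simply replaces them all
more_than_present = sentence.replace("brown", "blue", 10)

print(first_two)          # blue fox, blue dog, brown cat
print(none_replaced)      # brown fox, brown dog, brown cat
print(more_than_present)  # blue fox, blue dog, blue cat
```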
Substring Occurrences with Regular Expressions
To take the problem a step further, let's say we want to replace not just all occurrences of a fixed substring, but all substrings that fit a certain pattern. Even this can be done in one line, using regular expressions and the standard library's re module.
Regular expressions are a complex topic with a wide range of uses in computer science, so we won't go into much depth in this article. If you need a quick start, you can check out our guide on Regular Expressions in Python.
In essence, a regular expression defines a pattern. For example, let's say we have a text about people who own cats and dogs, and we want to replace both terms with the word "pet". First, we need to define a pattern that matches both terms, such as (cat|dog).
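Before replacing anything, it can help to check what a pattern actually matches. The sketch below uses re.findall() for that. The second sample sentence and the word-boundary variant with \b are additions for illustration, not something the article relies on:

```python
import re

text = "Mark owns a dog and Mary owns a cat."

# findall() returns the text captured by the single group for each match
matches = re.findall(r'(cat|dog)', text)
print(matches)  # ['dog', 'cat']

# The pattern also matches inside longer words
inside_words = re.findall(r'(cat|dog)', "The category lists hotdogs")
print(inside_words)  # ['cat', 'dog']

# Assumption: if only whole words should match, \b word boundaries are one option
whole_words = re.findall(r'\b(cat|dog)\b', "The category lists hotdogs")
print(whole_words)  # []
```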
Using the sub() Function
With the pattern sorted out, we're going to use the re.sub() function, which has the following syntax:
re.sub(pattern, repl, string, count, flags)
The first argument is the pattern we're searching for (a string or a Pattern object), repl is what we're going to insert (a string or a function; if it is a string, any backslash escapes in it are processed), and string is the string we're searching in.
The optional arguments count and flags indicate how many occurrences should be replaced and which flags are used to process the regular expression, respectively. Leaving count at its default of 0 replaces every match.
If the pattern doesn't match any substring, the original string is returned unchanged. In the following example, both terms match:
import re
string_a = re.sub(r'(cat|dog)', 'pet', "Mark owns a dog and Mary owns a cat.")
print(string_a)
This code prints:
Mark owns a pet and Mary owns a pet.
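The remaining options described above can be sketched in the same way: limiting replacements with count, passing a function as repl, and the no-match case. The helper name shout and the "parrot" sentence are hypothetical and only serve the illustration:

```python
import re

text = "Mark owns a dog and Mary owns a cat."

# count=1 limits the substitution to the first match only
first_only = re.sub(r'(cat|dog)', 'pet', text, count=1)
print(first_only)  # Mark owns a pet and Mary owns a cat.

# repl can be a function; it receives each match object and returns the replacement
def shout(match):
    return match.group(0).upper()

shouted = re.sub(r'(cat|dog)', shout, text)
print(shouted)  # Mark owns a DOG and Mary owns a CAT.

# When nothing matches, the original string comes back unchanged
unchanged = re.sub(r'(cat|dog)', 'pet', "Mark owns a parrot.")
print(unchanged)  # Mark owns a parrot.
```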
Case-Insensitive Pattern Matching
To perform case-insensitive pattern matching, we'll set the flags parameter to re.IGNORECASE:
import re
string_a = re.sub(r'(cats|dogs)', "Pets", "DoGs are a man's best friend", flags=re.IGNORECASE)
print(string_a)
Now any case combination of "cats" or "dogs" will be matched as well. When matching the same pattern against multiple strings, we can define a Pattern object to avoid repeating the pattern in multiple places. Pattern objects also have a sub() method, with the syntax:
Pattern.sub(repl, string, count)
Using Pattern Objects
Let's define a Pattern for cats and dogs and check a couple of sentences:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.sub("Pets", "Dogs are a man's best friend.")
string_b = pattern.sub("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
Which gives us the output:
Pets are a man's best friend.
Animals enjoy sleeping.
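Flags can also be supplied once, when the pattern is compiled, so that every later call to sub() is case-insensitive. The following is a minimal sketch of that idea; the sample sentences are invented for illustration:

```python
import re

# The flag is attached to the compiled pattern, not to each sub() call
pattern = re.compile(r'(cats|dogs)', flags=re.IGNORECASE)

sentences = [
    "DoGs are a man's best friend.",
    "CATS enjoy sleeping.",
    "Birds sing.",
]

# Reuse the same compiled pattern for every sentence
results = [pattern.sub("Pets", sentence) for sentence in sentences]
for result in results:
    print(result)

# Pattern.sub() also accepts count to limit the number of replacements
limited = pattern.sub("Pets", "cats and dogs and cats", count=2)
print(limited)  # Pets and Pets and cats
```

The third sentence contains no match, so it is returned as-is.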
The subn() Function
There's also a subn() function with the syntax:
re.subn(pattern, repl, string, count, flags)
The subn() function returns a tuple containing the new string and the number of substitutions made:
import re
string_a = re.subn(r'(cats|dogs)', 'Pets', "DoGs are a mans best friend", flags=re.IGNORECASE)
print(string_a)
The tuple looks like:
('Pets are a mans best friend', 1)
A Pattern object has a similar subn() method:
Pattern.subn(repl, string, count)
And it's used in a very similar way:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.subn("Pets", "Dogs are a man's best friend.")
string_b = pattern.subn("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
This results in:
("Pets are a man's best friend.", 1)
('Animals enjoy sleeping.', 1)
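Since subn() returns a tuple, it is often convenient to unpack it into two variables. Here is a short sketch of that usage, with made-up sample text and variable names:

```python
import re

text = "Cats chase dogs, and dogs chase cats."

# Unpack the tuple into the new string and the number of substitutions
new_text, replacements = re.subn(r'(cats|dogs)', 'pets', text, flags=re.IGNORECASE)
print(new_text)      # pets chase pets, and pets chase pets.
print(replacements)  # 4

# With no match, the string is unchanged and the count is 0
untouched, zero_count = re.subn(r'(cats|dogs)', 'pets', "Birds sing.")
print(untouched, zero_count)  # Birds sing. 0
```

The count makes it easy to check whether anything was replaced without comparing the two strings.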
Conclusion
Python offers simple functions for string handling. The easiest way to replace all occurrences of a given substring in a string is to use the replace() method.
If needed, the standard library's re module provides a more versatile toolset for more specialized problems, such as pattern-based replacement and case-insensitive matching.