# Common String Manipulation in Python

### Introduction

Python is a high-level, dynamically-typed, multi-paradigm programming language - and it notably comes with a plethora of built-in tools for various tasks, lowering the amount of effort required to quickly prototype and test ideas out. Strings are one of the most commonly used data structures in computer science, and naturally, manipulating strings is a common procedure.

In this guide, you'll learn how to perform string manipulation in Python.

### Strings and String Manipulation

Strings are sequences (or rather... strings) of characters. In most programming languages, they're typically implemented as an array of characters that together act as a single object. That being said - string manipulation boils down to changing the characters in the array, in any form.

Note: In many languages, Python included, strings are immutable - once created, a string cannot be changed. When you "change" a string, a new string is created under the hood, combining the original with the change you wish to make. One benefit of immutability is that identical strings can be "pooled" and reused instead of being created anew (which happens fairly commonly). In most cases, this lowers the memory overhead of object initialization and improves the performance of the language. This technique is known as string interning.
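
To make the immutability point concrete, here's a minimal sketch (the variable names are purely illustrative). It shows that assigning to an index fails, while string methods hand back a new string and leave the original alone:

```python
greeting = "Hello"

# Item assignment is not allowed on a string.
try:
    greeting[0] = "J"
except TypeError as error:
    print("Cannot modify in place:", error)

# Methods return a new string and leave the original untouched.
shouted = greeting.upper()
print(greeting)  # Hello
print(shouted)   # HELLO
```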

In Python - to declare a string, you enclose a sequence of characters in single, double or triple quotes (with or without the str() constructor):

# Single quote
welcome = 'Good morning, Mark!'
# Double quote
note = "You have 7 new notifications."
# Triple quotes allow for multi-line strings
more_text = """
Would
you
like
to
them?
"""


You could also explicitly initialize a string object using the str() constructor:

welcome1 = 'Good morning Mark!'
welcome2 = str('Good morning Mark!')


Depending on the Python implementation and version you're using, as well as on how the code is run, the two names may or may not end up referring to the same object in memory. The built-in id() function can be used to check this - it returns an integer that identifies the object (in CPython, its memory address):

print(id(welcome1)) # e.g. 1941232459688
print(id(welcome2)) # e.g. 1941232459328


In all practical terms - you don't really need to worry about string interning or its effect on your application's performance.

Note: Another implementation note is that Python doesn't have a separate character type, unlike languages that build their string type out of arrays of a character type. In Python, a character is simply a string of length 1.
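
As a quick illustration of that note, here's a small sketch (the names word and first are just placeholders). Indexing a string gives back another string of length 1, and the built-in ord() and chr() functions convert between such a string and its code point:

```python
word = "Python"
first = word[0]

print(first)                 # P
print(type(first).__name__)  # str
print(len(first))            # 1

# ord() and chr() convert between a one-character string and its code point.
print(ord("P"))  # 80
print(chr(80))   # P
```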

If you check the type of any of the objects we've created - you'll be greeted with str:

print(type(welcome1)) # <class 'str'>


The string class provides a fairly long list of methods that can be used to manipulate/alter strings (all of which return a changed copy, since strings are immutable). In addition, standard operators have been overloaded for string-specific usage, so you can "add" strings together using operators such as +!

### Operators for String Manipulation

Operators are a cornerstone of all languages - and they're typically grouped into arithmetic operators (+, -, *, /), relational operators (<, >, <=, >=, ==, !=) and logical operators (AND, OR, NOT; spelled and, or, not in Python), etc. To make working with strings intuitive, several Python operators have been overloaded to allow direct string usage!

Besides adding numbers, the + operator can be used to combine/concatenate two strings:

string_1 = "Hello"
string_2 = " World!"
print(string_1 + string_2) # Hello World!
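
The relational operators mentioned above work on strings too. Here's a short sketch with made-up sample values, showing that comparisons are lexicographic and case-sensitive, alongside the in operator for substring checks:

```python
# Relational operators compare strings lexicographically, by code point.
print("apple" < "banana")  # True
print("apple" == "Apple")  # False: comparison is case-sensitive
print("Zebra" < "apple")   # True: uppercase letters sort before lowercase

# The in operator checks whether one string occurs inside another.
print("World" in "Hello World!")  # True
```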


#### String Multiplication

An often underappreciated operator is the multiplication operator - *. It can be used to repeat a string or sequence multiple times, producing a single string:

string = 'Recursion...' * 5
print(string) # Recursion...Recursion...Recursion...Recursion...Recursion...


Since the * operator has higher precedence than +, the multiplication is evaluated first, so you can multiply a string and then add it to another string:

string = "I think I'm stuck in a " + "loop... " * 5
print(string) # I think I'm stuck in a loop... loop... loop... loop... loop...
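
If you'd rather have the concatenation happen first, parentheses change the order. A minimal sketch with placeholder strings:

```python
# * binds more tightly than +, so the repetition happens first.
without_parens = "ab" + "c" * 3
# Parentheses force the concatenation to happen first.
with_parens = ("ab" + "c") * 3

print(without_parens)  # abccc
print(with_parens)     # abcabcabc
```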


The += operator, an augmented assignment operator, is a shorthand. It shortens the addition of two operands by using the variable being assigned to as the first operand in the addition:

s = 'Hello'
# Equivalent to:
# s = s + 'World'
s += 'World'
print(s) # HelloWorld
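
Because strings are immutable, += doesn't alter the existing string - it builds a new one and rebinds the name. The sketch below (alias is a hypothetical second name for the same string) shows that the original object is left untouched:

```python
s = "Hello"
alias = s      # both names refer to the same string object

s += " World"  # builds a new string and rebinds s to it

print(s)       # Hello World
print(alias)   # Hello
```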


### Functions for String Manipulation

#### len()

The len() function is built into Python, so it can be called from anywhere without an import. It's used to get the length of a sequence or collection - a list, tuple, etc. Since strings are sequences, their length can also be obtained with the len() function!

print(len("It's been 84 years...")) # 21


It takes any sequence or collection as input and returns its length as an integer.
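
Here's a small sketch with a few illustrative inputs. Note that for strings, len() counts characters (code points) rather than bytes, which only matters once non-ASCII text is involved:

```python
accented = "h\u00e9llo"  # "héllo", written with an escape for clarity

print(len(""))                        # 0
print(len(accented))                  # 5 characters
print(len(accented.encode("utf-8")))  # 6 bytes, é takes two bytes in UTF-8
print(len(["a", "b", "c"]))           # 3
print(len({"x": 1, "y": 2}))          # 2
```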

#### find()

The find() method searches for the first occurrence of a substring in a string and returns its starting position (the index at which it starts), or -1 if the substring isn't found:

text = "Writing Python is quite fun."

print(text.find("quite")) # 18
print(text.find("at"))  # -1


The find() method takes two additional optional arguments - beg and end (Python's documentation calls them start and end). The first argument, str, is the substring to search for, beg is the index at which the search begins (0 by default), and end is the index at which the search stops, which is set to the length of the string by default. By altering these, you can change the search space for the pattern:

text = "I haven't been this choked up since I got a hunk of moussaka caught in my throat! - Hades."
text2 = "I"

print(text.find(text2))     # 0
print(text.find(text2, 10)) # 36
print(text.find(text2, 40)) # -1


Note: The rfind() method finds the last occurrence.
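
A brief sketch of find() next to its relatives, using a made-up sample sentence. It compares find() and rfind(), shows that index() raises an exception instead of returning -1, and uses the in operator for a simple yes/no check:

```python
text = "one fish, two fish"

first = text.find("fish")   # index of the first occurrence
last = text.rfind("fish")   # index of the last occurrence
print(first, last)          # 4 14

# index() behaves like find(), but raises ValueError instead of returning -1.
try:
    text.index("shark")
except ValueError:
    print("'shark' not found")

# When only a yes/no answer is needed, the in operator is enough.
print("fish" in text)       # True
```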

#### count()

The count() method looks for the provided substring in the given text (case-sensitive) and returns an integer denoting the number of occurrences of that pattern in the string:

text = "The flower that blooms in adversity is the most rare and beautiful of all – Mulan."
text_count = text.count('i')
print("The count of 'i' is", text_count) # The count of 'i' is 4


By default, counting starts at index 0 and continues to the end of the string, but a beginning and ending index can be supplied:

text = "The flower that blooms in adversity is the most rare and beautiful of all – Mulan."
# str, beg, end
text_count = text.count('i', 0, 5)
print("The count of 'i' is", text_count) # The count of 'i' is 0
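
Two details are worth a quick sketch (the sample strings are illustrative): occurrences are counted without overlapping, and since count() is case-sensitive, lowercasing the text first is one way to get a case-insensitive count:

```python
# Occurrences are counted without overlap.
print("aaaa".count("aa"))  # 2

# count() is case-sensitive; lowercasing first gives a case-insensitive count.
text = "The quick brown fox jumps over the lazy dog"
print(text.count("the"))          # 1
print(text.lower().count("the"))  # 2
```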


#### Slicing

Slicing is a powerful and versatile notation that can be used to, well, slice sequences! By using the bracket notation, as when accessing elements from an iterable sequence, you can also access a slice of elements, between a starting and ending index:

text = "Hello, World!"
print(text[7:12]) # World


The slice notation accepts three inputs - iterable[start:stop:step]. start is the starting index (inclusive), stop is the ending index (exclusive), and step is the increment (which can also be a negative number). Let's try slicing the string between index 2 (inclusive) and index 7 (exclusive) with a step of 2:

text = 'The code runs fast'
print(text[2:7:2]) # ecd
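
Any of the three slice inputs can be left out, and negative values count from the end. Here's a short sketch of a few common variations on a sample string:

```python
text = "Hello, World!"

print(text[:5])    # Hello
print(text[7:])    # World!
print(text[-6:])   # World!
print(text[::-1])  # !dlroW ,olleH
print(text[::2])   # Hlo ol!
```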


#### startswith() and endswith()

The startswith() method in Python determines if a string starts with a supplied substring while the endswith() method checks if a string ends with a substring, and both return a boolean value:

text = "hello world"

print(text.startswith("H")) # False
print(text.endswith("d")) # True


Note: Both startswith() and endswith() are case-sensitive.
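
Both methods also accept a tuple of candidates, and normalizing the case first is one way around the case-sensitivity. A minimal sketch, where filename is a hypothetical value:

```python
filename = "Report.PDF"

# A tuple of candidates matches if any one of them matches.
print(filename.endswith((".pdf", ".PDF")))    # True

# Normalizing the case first makes the check case-insensitive.
print(filename.lower().endswith(".pdf"))      # True
print(filename.lower().startswith("report"))  # True
```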

### Formatting Strings

The strip() method removes whitespace from the beginning and end of a string, making it an easy way to get rid of leading and trailing whitespace. To remove whitespace only from the right or left, use rstrip() or lstrip():

text = '         a short break         '
text.strip() # 'a short break'

text.rstrip() # '         a short break'
text.lstrip() # 'a short break         '
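
The strip() family also accepts an optional argument naming the characters to remove. The sketch below uses illustrative inputs; note that the argument is treated as a set of characters, not as a prefix or suffix:

```python
# With an argument, strip() removes any of the given characters from both ends.
print("***hello***".strip("*"))   # hello
print("xxyhelloyxx".strip("xy"))  # hello

# The argument is a set of characters, not a prefix or suffix.
print("www.example.com".strip("w.com"))  # example
```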


For a dedicated guide to removing whitespaces from strings - read our Guide to Python's strip() method!

#### Changing a String's Case - upper(), lower(), capitalize(), title(), swapcase()

Changing the case of a string is pretty straightforward! The upper(), lower(), capitalize(), title(), and swapcase() methods can all be used to change the case of a string:

text = "When life gets you down you know what you've gotta do? Just keep swimming! – Finding Nemo"

print(text.upper())      # Uppercases all characters
print(text.lower())      # Lowercases all characters
print(text.title())      # Title-case
print(text.capitalize()) # Capitalizes the first character, lowercases the rest
print(text.swapcase())   # Swaps the case of each character


This results in:

WHEN LIFE GETS YOU DOWN YOU KNOW WHAT YOU'VE GOTTA DO? JUST KEEP SWIMMING! – FINDING NEMO
when life gets you down you know what you've gotta do? just keep swimming! – finding nemo
When Life Gets You Down You Know What You'Ve Gotta Do? Just Keep Swimming! – Finding Nemo
When life gets you down you know what you've gotta do? just keep swimming! – finding nemo
wHEN LIFE GETS YOU DOWN YOU KNOW WHAT YOU'VE GOTTA DO? jUST KEEP SWIMMING! – fINDING nEMO
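
Notice that title() turned "you've" into "You'Ve", since it treats the apostrophe as a word boundary. If that's not what you want, one option is string.capwords() from the standard library, which only splits on whitespace. The sketch below also shows casefold(), which is intended for caseless comparisons (the sample strings are illustrative):

```python
import string

text = "you've gotta keep swimming"

print(text.title())           # You'Ve Gotta Keep Swimming
print(string.capwords(text))  # You've Gotta Keep Swimming

# casefold() is a more aggressive lower(), intended for caseless comparisons.
print("Straße".casefold() == "STRASSE".casefold())  # True
```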


#### String Splitting and Partitioning with split() and partition()

To find a substring and then split the string based on its location, you'll need the partition() and split() methods. partition() returns a tuple of three strings, while split() returns a list of strings. Both are case-sensitive.

The partition() method returns the substring before the first occurrence of the split-point, the split-point itself, and the substring after it:

text = "To be or not to be, that is the question"

print(text.partition('to be')) # ('To be or not ', 'to be', ', that is the question')


Meanwhile, split() splits the string on runs of whitespace by default, yielding a list of the separate words in a string:

text = "To be or not to be, that is the question"
print(text.split()) # ['To', 'be', 'or', 'not', 'to', 'be,', 'that', 'is', 'the', 'question']


Naturally, you can also split by any other character supplied in the split() call:

text = "To be or not to be, that is the question"
print(text.split(',')) # ['To be or not to be', ' that is the question']
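
A few more variations, sketched on a hypothetical record string: limiting the number of splits with maxsplit, what partition() returns when the separator is missing, and rpartition(), which splits at the last occurrence:

```python
record = "name=Ada=Lovelace"

# maxsplit limits the number of splits that are made.
print(record.split("=", 1))    # ['name', 'Ada=Lovelace']

# partition() always returns a 3-tuple, even when the separator is missing.
print(record.partition("="))   # ('name', '=', 'Ada=Lovelace')
print(record.partition(";"))   # ('name=Ada=Lovelace', '', '')

# rpartition() splits at the last occurrence instead of the first.
print(record.rpartition("="))  # ('name=Ada', '=', 'Lovelace')
```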


#### Joining Strings with join()

The join() method works on iterables containing exclusively string instances, joining all of the elements together into a string. It's worth noting that the method is called on the string used as the delimiter, not on the iterable being joined:

text = ['One', 'Two', 'Three', 'Four']
print(', '.join(text)) # One, Two, Three, Four
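
Since join() only accepts strings, other types need converting first. Here's a minimal sketch with an illustrative list of integers:

```python
numbers = [1, 2, 3]

# join() raises TypeError for non-string elements.
try:
    ", ".join(numbers)
except TypeError as error:
    print("join failed:", error)

# Converting each element with str() first avoids the error.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3
```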


For a more detailed guide on joining lists into strings, including different data types, read our Python: Convert List to String with join() guide.

#### Replacing Substrings

Replacing a substring without knowing where it's located is pretty easy! Using the replace() method, you can supply the substring to be replaced and the new substring to insert in its place:

text = "Because of what you have done, the heavens are now part of man's world"
print(text.replace("man's", "human's")) # Because of what you have done, the heavens are now part of human's world
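
By default, replace() swaps out every occurrence. An optional third argument limits how many replacements are made, as this short sketch with a sample string shows:

```python
text = "spam, spam, spam"

# By default every occurrence is replaced.
print(text.replace("spam", "eggs"))     # eggs, eggs, eggs

# The optional third argument caps the number of replacements.
print(text.replace("spam", "eggs", 1))  # eggs, spam, spam

# The original string is unchanged.
print(text)                             # spam, spam, spam
```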


### Conclusion

In this article, we've gone over some of the common string manipulation techniques, operators and methods/functions, with links to more detailed guides.

Last Updated: May 19th, 2022

Author: Shittu Olumide

Software developer and Technical writer.
