Comparing Strings using Python

In Python, strings are sequences of characters that are stored in memory as objects. Each object can be identified using the built-in id() function, as you can see below. As an optimization, Python (CPython in particular) may re-use an existing object for strings that have the same value, which saves memory and can make comparisons fast. This is why a and b report the same ID in the following session:

$ python
Python 3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "abc"
>>> b = "abc"
>>> c = "def"
>>> print(id(a), id(b), id(c))
139949123041320 139949123041320 139949122390576
>>> quit()

Python offers a few different operators for comparing strings. First, we will explain them in more detail below. Second, we'll go over string methods and the re module, which help with case-insensitive and inexact matches. Third, the difflib module is quite handy for dealing with multi-line strings. A number of examples will help you understand how to use them.

Compare Strings with the == and != Operators

The basic comparison operators are == and !=. They work in exactly the same way as with integer and float values. The == operator returns True if there is an exact match, otherwise it returns False. In contrast, the != operator returns True if there is no match and otherwise returns False. Listing 1 demonstrates this.

In a for loop, a string containing the name of the Swiss city "Lausanne" is compared with each entry from a list of places, and the comparison result is printed to stdout.

Listing 1:

# Define strings
listOfPlaces = ["Berlin", "Paris", "Lausanne"]
currentCity = "Lausanne"

for place in listOfPlaces:
    print(f"comparing {place} with {currentCity}: {place == currentCity}")

Running the script above produces the following output:

$ python3 comparing-strings.py
comparing Berlin with Lausanne: False
comparing Paris with Lausanne: False
comparing Lausanne with Lausanne: True
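
Listing 1 only exercises the == operator. As a complementary sketch, the following snippet uses != to collect the places that differ from the current city. The variable name otherPlaces is introduced here purely for illustration:

```python
# Define strings (same data as in Listing 1)
listOfPlaces = ["Berlin", "Paris", "Lausanne"]
currentCity = "Lausanne"

# != is True whenever the two strings differ in at least one character
otherPlaces = [place for place in listOfPlaces if place != currentCity]
print(otherPlaces)               # ['Berlin', 'Paris']

# The comparison is exact, so a difference in case counts as a mismatch
print("Lausanne" != "lausanne")  # True
```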

The == and is Operators

Python has two operators that look similar, == and is. At first sight they seem to do the same thing, but they do not.

== compares two variables based on the value they represent. In contrast, the is operator checks whether two variables refer to the very same object in memory, that is, whether their IDs are equal.

As an analogy, John Doe and John Moe are both called John. If we reduce them to just their first names, they are equal in value, but they are still two different people.

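The next example demonstrates this with three string variables. The two variables a and b have the same value, and in this session Python refers to the same object for both in order to minimize memory usage.

CPython does this automatically for some immutable values, such as small integers and short string literals. It is an implementation detail rather than a guarantee, so use == whenever you care about the value: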

>>> a = 'hello'
>>> b = 'hello'
>>> c = 'world'
>>> a is b
True
>>> a is c
False
>>> id(a)
140666888153840
>>> id(b)
140666888153840

Strings are immutable, so assigning a new value does not change the existing object. Instead, the variable is bound to a different object. In the next code snippet b gets the value 'world', and subsequently b and c refer to the same object:

>>> b = 'world'
>>> id(b)
140666888154416
>>> id(c)
140666888154416
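
To see why is should not be used to compare string values, consider a string that is built at runtime. The sketch below assumes nothing about whether two equal strings happen to share one object, since that is an implementation detail. It relies only on sys.intern(), which returns one shared object for equal strings:

```python
import sys

# Build "hello" at runtime so that it does not come from a literal
a = "".join(["hel", "lo"])
b = "hello"

# Value comparison: True, because both contain the same characters
print(a == b)   # True

# After explicit interning, both names refer to the single interned object,
# so the identity comparison is True as well
a = sys.intern(a)
b = sys.intern(b)
print(a is b)   # True
```

Without the calls to sys.intern() the result of a is b may differ between Python implementations and versions, which is exactly why == is the right operator for comparing values.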

More Comparison Operators

To compare strings lexicographically you can use the comparison operators <, >, <=, and >=. The comparison is done character by character. The order is determined by the Unicode code point of each character (the value returned by ord()), so it does not depend on your machine or its locale settings.

Keep in mind that the order is case-sensitive. In the Latin alphabet, uppercase letters have smaller code points than lowercase letters, so "Bus" comes before "bus". Listing 2 shows how these comparison operators work in practice.

Listing 2:

# Define the strings
listOfPlaces = ["Berlin", "Paris", "Lausanne"]
currentCity = "Lausanne"

for place in listOfPlaces:
    if place < currentCity:
        print(f"{place} comes before {currentCity}")
    elif place > currentCity:
        print(f"{place} comes after {currentCity}")
    else:
        print(f"{place} is equal to {currentCity}")

Running the script above produces the following output:

$ python3 comparing-strings-order.py
Berlin comes before Lausanne
Paris comes after Lausanne
Lausanne is equal to Lausanne
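
The ordering follows from the numeric code points of the characters, which you can inspect with the built-in ord() function. The following sketch shows why "Bus" sorts before "bus", and how passing key=str.lower to sorted() yields a case-insensitive order. The sample list words is made up for illustration:

```python
words = ["bus", "Bus", "apple"]

# Uppercase Latin letters have smaller code points than lowercase ones
print(ord("B"), ord("b"))             # 66 98
print("Bus" < "bus")                  # True

# Default sort: plain code point order, so "Bus" comes first
print(sorted(words))                  # ['Bus', 'apple', 'bus']

# Case-insensitive sort: compare the lowercased versions instead
print(sorted(words, key=str.lower))   # ['apple', 'bus', 'Bus']
```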

Case-Insensitive String Comparisons

The previous examples focused on exact matches between strings. To allow case-insensitive comparisons Python offers string methods such as upper() and lower(). Both are available directly as methods of every string object.

upper() converts the entire string into uppercase letters, and lower() converts it into lowercase letters. Based on Listing 1, the next listing shows how to use the lower() method.

Listing 3:

# using the == operator
listOfPlaces = ["Berlin", "Paris", "Lausanne"]
currentCity = "lausANne"

for place in listOfPlaces:
    print(f"comparing {place} with {currentCity}: {place.lower() == currentCity.lower()}")

The output is as follows:

$ python3 comparing-strings-case-insensitive.py
comparing Berlin with lausANne: False
comparing Paris with lausANne: False
comparing Lausanne with lausANne: True
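
lower() works well for plain ASCII text. For text in other languages, the str.casefold() method is a more aggressive alternative that is intended for caseless matching. The sketch below is an aside that goes slightly beyond Listing 3, and the helper name equals_ignore_case is hypothetical:

```python
def equals_ignore_case(left: str, right: str) -> bool:
    """Return True if both strings are equal when case is ignored."""
    return left.casefold() == right.casefold()


print(equals_ignore_case("Lausanne", "lausANne"))   # True
print(equals_ignore_case("Berlin", "lausANne"))     # False

# casefold() maps the German "ß" to "ss", which lower() does not do
print("Straße".lower() == "STRASSE".lower())        # False
print(equals_ignore_case("Straße", "STRASSE"))      # True
```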

Compare Strings Using Regular Expressions (RegEx)

A Regular Expression - or "regex" for short - defines a specific pattern of characters.

Advice: If you'd like to read more about Regular Expressions, read our "Introduction to Regular Expressions in Python"!

To use regular expressions in Python, import the re module first and then define a pattern. The following example is again modeled on Listing 1, but uses a different list of places. The pattern [Bb]ay matches the letters "ay" preceded by either an uppercase or a lowercase "b". Because the code uses the search() method, it finds every string in which the pattern occurs, no matter whether at the beginning, in the middle, or at the end.

Listing 4:

# import the additional module
import re

# define list of places
listOfPlaces = ["Bayswater", "Table Bay", "Beijing", "Bombay"]

# define search string
pattern = re.compile("[Bb]ay")

for place in listOfPlaces:
    if pattern.search(place):
        print (f"{place} matches the search pattern")

The output is as follows, and matches "Bayswater", "Table Bay", and "Bombay" from the list of places:

$ python3 comparing-strings-re.py
Bayswater matches the search pattern
Table Bay matches the search pattern
Bombay matches the search pattern
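
Two common variations of Listing 4 are sketched below: the re.IGNORECASE flag, which replaces the explicit character class [Bb], and the match() method, which only succeeds if the pattern is found at the beginning of the string. The extra entry "BAY AREA" is made up to show the effect of the flag:

```python
import re

listOfPlaces = ["Bayswater", "Table Bay", "Beijing", "Bombay", "BAY AREA"]

# re.IGNORECASE makes "bay" match "Bay", "BAY", "bAy", and so on
pattern = re.compile("bay", re.IGNORECASE)

# search() looks for the pattern anywhere in the string
found_anywhere = [place for place in listOfPlaces if pattern.search(place)]

# match() only checks the beginning of the string
found_at_start = [place for place in listOfPlaces if pattern.match(place)]

print(found_anywhere)   # ['Bayswater', 'Table Bay', 'Bombay', 'BAY AREA']
print(found_at_start)   # ['Bayswater', 'BAY AREA']
```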

Multi-Line and List Comparisons

So far our comparisons have only involved a few words. With the difflib module Python also offers a way to compare multi-line strings and entire lists of words. The module can produce its output in several formats known from diff tools.

The next example (Listing 5) compares two multi-line strings line by line, and shows deletions as well as additions. After the initialization of the Differ object in line 12 the comparison is made using the compare() method in line 15. The result is printed on the standard output:

# Import the additional module
import difflib
 
# Define original text, 
# taken from: https://en.wikipedia.org/wiki/Internet_Information_Services
original = ["About the IIS", "", "IIS 8.5 has several improvements related", "to performance in large-scale scenarios, such", "as those used by commercial hosting providers and Microsoft's", "own cloud offerings."]

# Define modified text
edited = ["About the IIS", "", "It has several improvements related", "to performance in large-scale scenarios."]

# Initiate the Differ object
d = difflib.Differ()
 
# Calculate the difference between the two texts
diff = d.compare(original, edited)
 
# Output the result
print ('\n'.join(diff))

Running the script creates the output seen below. Lines that appear only in the original text start with a - sign, whereas lines that appear only in the edited text start with a + sign. Lines starting with a question mark are not part of either text. They are hints in which ^ signs mark the positions that changed in the line above. Lines that are the same in both texts start with two spaces:

$ python comparing-strings-difflib.py
  About the IIS
  
- IIS 8.5 has several improvements related
?  ^^^^^^

+ It has several improvements related
?  ^

- to performance in large-scale scenarios, such
?                                        ^^^^^^

+ to performance in large-scale scenarios.
?                                        ^

- as those used by commercial hosting providers and Microsoft's
- own cloud offerings.
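
difflib can also produce the unified format known from diff -u and from version control tools. The following sketch reuses the two texts from above and passes them to difflib.unified_diff(). The labels "original" and "edited" are arbitrary names chosen for this example:

```python
import difflib

original = ["About the IIS", "", "IIS 8.5 has several improvements related",
            "to performance in large-scale scenarios, such",
            "as those used by commercial hosting providers and Microsoft's",
            "own cloud offerings."]
edited = ["About the IIS", "", "It has several improvements related",
          "to performance in large-scale scenarios."]

# lineterm="" because the input lines carry no trailing newlines
diff_lines = list(difflib.unified_diff(
    original, edited, fromfile="original", tofile="edited", lineterm=""))

# Removed lines start with "-", added lines with "+", unchanged context with " "
print("\n".join(diff_lines))
```

Unlike Differ, the unified format does not include the ? hint lines, which makes it more compact and easier to process with other tools.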

Conclusion

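In this article you have learned various ways to compare strings in Python: the == and != operators, the difference between == and is, lexicographical ordering, case-insensitive comparisons, regular expressions, and the difflib module. We hope this overview helps you in your day-to-day programming.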

Acknowledgements

The author would like to thank Mandy Neumeyer for her support while preparing the article.

Last Updated: June 21st, 2023

Author: Frank Hofmann

IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).

© 2013-2024 Stack Abuse. All rights reserved.
