Introduction
When working with Python, we often deal with data in the form of numbers or words. Sometimes words and numbers are stored together, and we need to separate the numbers from the words.
In this article, we'll explain how to define words and numbers in Python. Then, we'll see how to separate numbers from words, in case they're stored together, using different methods and in various situations.
Strings, Integers, and Floats in Python
In Python, strings, integers, and floats are fundamental data types used to represent different kinds of values in a program.
Integers represent whole numbers without any decimal part and can be positive or negative. Here's how to define an integer in Python:
# Create a variable expressing an integer number
age = 15
We've created a variable called age and assigned it the value of 15. To verify the type of a variable in Python, we can use the built-in function type(). This is how it works:
# Show the type
type(age)
And we get:
int
So, we pass the variable age to the built-in function type() and it tells us that this is an integer, as we expected.
To express numbers, we can also use the float type. A float, short for floating-point number, represents a real number with a decimal point. Floats are common in scientific calculations, but they're useful anywhere we need fractional values, such as prices. Here's how we define a float in Python:
# Create a variable expressing a float number
pi = 3.14
And again, to check the type of our variable:
# Show the type
type(pi)
And we get:
float
Strings are sequences of characters enclosed in double or single quotes. They are used to store text or, more generally, any kind of characters. This means that a string can even contain numbers, which is the focus of this article. So, for example, a string can be:
# Create a string with my name
my_name = "Federico"
# Show the type
type(my_name)
And we get:
str
But it can also be:
# Create a variable with text and number
federico_car = 'Federico has 1 car'
Finally, pay attention: any value enclosed in quotes is a string, even if it looks like a number. This means that the type of the following:
# Create a string variable expressing a number
age = '15'
is str.
Now, with this overview in mind, let's see some methods to intercept numbers inside strings.
Methods to Find Numbers in Strings
Let's see an overview of some methods we can use to check if a string contains numbers.
The int() and float() Functions
The easiest way to transform a string into a number is through the built-in functions int() and float(). Let's see how we can use them.
Suppose we expressed our age as a string, but we want it as an integer. We can do it like so:
# Create a string variable expressing an integer number
age = '30'
# Transform string into integer type
age_int = int(age)
# Show type
type(age_int)
And we have:
int
So, we've defined the variable age as a string. Then, we passed it as the argument of int(), which returned the corresponding integer.
Now, suppose we have expressed a price as a string, but we want to convert it to a float. We can do it like so:
# Create a string variable expressing a decimal number
price = "34.99"
# Transform string to float type
price_float = float(price)
# Show type
type(price_float)
And we have:
float
As before, we pass a string to float() and it returns the corresponding float.
These functions are easy to use but have a significant limitation: they can convert a string only if its entire content is a valid number. Consider the following example:
# Create a string with text and numbers
apples = "21 apples"
# Transform string to integer
apples_converted = int(apples)
# Show type
type(apples_converted)
And we get:
ValueError: invalid literal for int() with base 10: '21 apples'
This error means that we're trying to convert a string to an integer, but the string cannot be parsed as a valid integer. Of course, the same issue occurs if we pass a string containing only text, or if we use float() instead.
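One way to avoid the crash is to catch the ValueError. The following is a minimal sketch of that idea; the helper name to_number() is hypothetical and not part of the examples above:

```python
from typing import Optional


def to_number(text: str) -> Optional[float]:
    """Return text converted to a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        # float() raises ValueError when the whole string is not a number
        return None


print(to_number("21"))         # 21.0
print(to_number("21 apples"))  # None
```

Returning None is just one possible design choice; depending on the situation, we might prefer to re-raise the error or return a default value.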
Now, a question may arise: what if the text in the string expresses an integer and we want to convert it to a float? And how about vice-versa? Let's examine both scenarios:
# Create a string expressing a decimal
price = "30.5"
# Transform string into an integer
price_int = int(price)
# Show type
type(price_int)
And we get:
ValueError: invalid literal for int() with base 10: '30.5'
So, we have expressed the price of an object as a decimal number (although the type is a string!) and tried to convert it into an integer. This is not possible, as the error indicates.
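If dropping the decimal part is acceptable (an assumption about what we want, since it loses information), one possible workaround is to convert the string to a float first and then to an integer:

```python
price = "30.5"

# float() parses the decimal string; int() then truncates toward zero
price_int = int(float(price))

print(price_int)        # 30
print(type(price_int))  # <class 'int'>
```

Note that int() truncates rather than rounds; if rounding is needed, the built-in round() function could be applied to the float instead.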
Now, let's see the other case:
# Create a string expressing an integer
price = "30"
# Transform string into a float
price_float = float(price)
# Show type
type(price_float)
And we get:
float
So, we can convert a string that expresses a whole number into a float. In fact, if we want to see how Python expresses this number as a float, we can print it:
# Print transformed variable
print(price_float)
And, as we might expect, we get:
30.0
Now, these two methods are very basic and have some limitations, as we've seen. This is why we need to learn other methods to solve our problem in more general situations.
The isdigit() Method
The isdigit() method checks if all the characters in a string are digits (0-9). It returns True if the string contains only digits, and False otherwise. So, let's see a couple of examples:
# Create string with text and numbers
letters_and_numbers = "Hello123"
# Create string with only text
letters = "Hello World"
# Create string with only numbers
numbers = "123456"
Now, let's use the isdigit() method:
# Apply isdigit() method and print
print(letters_and_numbers.isdigit())
print(letters.isdigit())
print(numbers.isdigit())
And we get:
False
False
True
So, to use this method, we write the name of the variable we're verifying and we add .isdigit(). As we expected, the results show that only the variable numbers contains characters that are all digits.
This method is good, but we can do better in Python. Listing all the printed values, in fact, may become difficult to read. So, we can improve the above code like so:
# Apply isdigit() method and print, showing variables and results
print(f"Is {letters_and_numbers} an only digit string? {letters_and_numbers.isdigit()}")
print(f"Is {letters} an only digit string? {letters.isdigit()}")
print(f"Is {numbers} an only digit string? {numbers.isdigit()}")
And we get:
Is Hello123 an only digit string? False
Is Hello World an only digit string? False
Is 123456 an only digit string? True
So, inside print(), we can put f before the opening quote to create an f-string, which lets us insert variables and expressions inside curly brackets {}. Python replaces each pair of brackets with the value of what's inside. Since the isdigit() method returns a boolean (True or False), the above code provides more readable results.
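It's worth keeping in mind a few cases where isdigit() may be surprising. The short sketch below, with sample strings chosen here purely for illustration, checks a negative number, a decimal number, and an empty string:

```python
samples = ["42", "-42", "3.14", ""]

# isdigit() requires every character to be a digit and the string to be non-empty
results = {text: text.isdigit() for text in samples}

for text, is_digit in results.items():
    print(f"{text!r}: {is_digit}")
```

Only "42" gives True: the minus sign and the decimal point are not digits, and an empty string always returns False.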
Now, it would be beneficial if this method could detect numbers in strings containing both digits and other characters. To achieve this, we can create a function like so:
# Create a function that detects digits in strings
def contains_number(string: str) -> bool:
    return any(char.isdigit() for char in string)
Next, we pass the letters_and_numbers variable to the contains_number() function and observe the result:
# Invoke the function, passing 'letters_and_numbers' as an argument
contains_number(letters_and_numbers)
And we get:
True
Exactly as we wanted: We were able to intercept digits in a variable that contains different kinds of characters. Now, let's explain this function step by step:
- def contains_number(string: str) -> bool means that we are defining a function called contains_number() where we expect the argument, which we generically called string, to be a string (: str). Then, we declare that the function returns a boolean (-> bool). Note that this notation, called "type hints", was standardized in Python 3.5 and is not mandatory. We could have written def contains_number(string): and the function would work the same way. Type hints are just a useful way to tell the reader which types to expect when dealing with functions (and classes), so they act as a kind of facilitator.
- any(char.isdigit() for char in string) is what the function returns. First, char.isdigit() for char in string is a generator expression (a generator is a special type of object in Python that produces a sequence of values on demand, without storing them all in memory at once). It yields one boolean per character, indicating whether that character is a digit: for char in string iterates over each character char in the argument string, and char.isdigit() checks if that character is a digit. Finally, any() is a built-in Python function that takes an iterable and returns True if at least one element is True, and False if all elements are False. So, any(char.isdigit() for char in string) evaluates to True if at least one character in the string is a digit, and False otherwise.
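To make the behavior of any() with a generator expression more concrete, here is a sketch of an equivalent function written with an explicit loop. The name contains_number_loop() is hypothetical and used only for comparison:

```python
def contains_number_loop(string: str) -> bool:
    # Hypothetical name: a loop-based equivalent of contains_number()
    for char in string:
        if char.isdigit():
            # Stop at the first digit, just as any() short-circuits
            return True
    return False


print(contains_number_loop("Hello123"))     # True
print(contains_number_loop("Hello World"))  # False
```

Both versions stop scanning as soon as a digit is found; the any() form is simply more compact.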
Now, let's see other methods.
Using Regular Expressions
Another way to find out if a string contains a number is through regular expressions (also called "regex"). A regular expression is a sequence of characters that defines a pattern to match or find in text. Python supports them through the built-in re module. Here's how we can use it for our purposes:
import re
# Create strings
letters_and_numbers = "Hello123 0.3"
letters = "Hello World"
numbers = "1 2 3 4 5 6 0.5"
# Use regex to intercept numbers and print results
print(bool(re.search(r'\d', letters_and_numbers)))
print(bool(re.search(r'\d', letters)))
print(bool(re.search(r'\d', numbers)))
And we get:
True
False
True
So, first we need to import the re module to use regular expressions. Then, we call re.search() on each variable to look for digits. In regular expressions, \d represents any digit character. re.search() returns a match object if the pattern is found and None otherwise, so wrapping the call in bool() gives True when the string contains at least one digit and False when it doesn't.
Note that, compared to the previous method, this one identifies numbers without defining a custom function. Keep in mind that \d matches a single digit, just like the isdigit() check on each character: for the string letters_and_numbers we get True because it contains digits, not because the pattern recognizes 0.3 as a decimal number.
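The pattern \d matches only one digit at a time. To capture whole numbers, including decimals, a slightly longer pattern could be used. The following is a sketch under the assumption that numbers are written with a dot as the decimal separator and without signs or thousands separators; the name NUMBER_PATTERN is introduced here for illustration:

```python
import re

# One or more digits, optionally followed by a dot and more digits
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

letters_and_numbers = "Hello123 0.3"
numbers = "1 2 3 4 5 6 0.5"

print(NUMBER_PATTERN.findall(letters_and_numbers))  # ['123', '0.3']
print(NUMBER_PATTERN.findall(numbers))              # ['1', '2', '3', '4', '5', '6', '0.5']
```

The group is non-capturing, written (?:...), so that findall() returns the full match instead of only the decimal part.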
Now, let's examine the last method.
The isnumeric() Method
The isnumeric() method works like the isdigit() method: it returns True if all characters in the string are numeric. The difference between the two is that isnumeric() accepts a wider range of Unicode characters, such as vulgar fractions (½) and other numeric symbols, in addition to digits, superscripts, and subscripts. Note that neither method accepts a decimal point or a minus sign, so strings like "0.5" or "-3" return False. So, if we're aware we're searching for numbers that can appear in different Unicode forms, then isnumeric() should be preferred.
On the coding side, we use it similarly to the isdigit() method. Let's see a simple example:
# Create strings
letters_and_numbers = "Hello123"
letters = "Hello World"
numbers = "123456"
# Apply isnumeric() method and print results
print(letters_and_numbers.isnumeric())
print(letters.isnumeric())
print(numbers.isnumeric())
And we get:
False
False
True
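To see where isnumeric() and isdigit() actually differ, we can compare them on a few Unicode characters. The sketch below also includes isdecimal(), a related built-in string method not covered above, and uses sample strings chosen here for illustration:

```python
# "\u00b2" is the superscript two, "\u00bd" is the fraction one half
samples = ["123", "\u00b2", "\u00bd", "0.5"]

# For each string, store the results of (isdecimal, isdigit, isnumeric)
checks = {
    text: (text.isdecimal(), text.isdigit(), text.isnumeric())
    for text in samples
}

for text, (is_decimal, is_digit, is_numeric) in checks.items():
    print(f"{text!r}: isdecimal={is_decimal}, isdigit={is_digit}, isnumeric={is_numeric}")
```

The superscript is accepted by isdigit() and isnumeric() but not by isdecimal(), the fraction is accepted only by isnumeric(), and "0.5" is rejected by all three because of the dot.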
Now, let's see some more advanced examples we may encounter while programming in Python.
Advanced Manipulation Examples
Now we want to show how we can use the methods we've seen above to intercept numbers in strings in more practical situations, for example when analyzing lists, dictionaries, and data frames.
Finding Numbers in Strings in Lists
Suppose we have a list in which we have stored some strings containing both numbers and text. We could extract the numbers with regular expressions like so:
import re
# Create a list with strings expressing text and numbers
data = ['I have 10 apples', 'There are 5 bananas', 'I will buy one apple']
# Create a function to retrieve numbers in strings
def extract_numbers(string):
    return re.findall(r'\d+', string)
# Iterate over the list, extract numbers, and print results
for string in data:
    numbers = extract_numbers(string)
    print(f"Numbers in string: {string} - {numbers}")
And we get:
Numbers in string: I have 10 apples - ['10']
Numbers in string: There are 5 bananas - ['5']
Numbers in string: I will buy one apple - []
So, here we report the numbers found in each string, if present. Compared to the previous example on regular expressions, there are two differences:
- We used re.findall(). This method takes two arguments: pattern and string. It searches for all occurrences of the pattern within the string and returns a list of all matched substrings.
- In this case, the pattern is \d+, which matches one or more consecutive digits in the string.
So, we have stored numbers and text in some strings in a list called data. We have created a function called extract_numbers() that intercepts all the consecutive digits in the string through the method re.findall(), thanks to regex. We then iterate through the list with a for loop and invoke the function extract_numbers() so that all the strings in the list are checked. Then, the code prints the strings themselves and the numbers intercepted (if any).
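Since re.findall() returns the numbers as strings, a natural follow-up is to filter the list and convert the matches to integers. The following sketch reuses the same data; the variable names with_numbers and all_numbers are introduced here for illustration:

```python
import re

data = ['I have 10 apples', 'There are 5 bananas', 'I will buy one apple']

# Keep only the strings that contain at least one digit
with_numbers = [s for s in data if re.search(r'\d', s)]

# Convert every extracted digit sequence to an int
all_numbers = [int(n) for s in data for n in re.findall(r'\d+', s)]

print(with_numbers)  # ['I have 10 apples', 'There are 5 bananas']
print(all_numbers)   # [10, 5]
```

The conversion with int() is safe here because \d+ only matches sequences of digits.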
Finding Numbers in Strings in Dictionaries
Now, suppose we have a shopping list stored in a dictionary where we report some fruit and their respective prices. We want to see if the price is expressed as a number. We can do it like so:
# Create function to intercept numbers in strings
def contains_number(string: str) -> bool:
    return any(char.isnumeric() for char in string)
# Create a dictionary
shopping_list = {
    'Banana': '1',
    'Apple': 'Five',
    'Strawberry': '3.5',
    'Pear': '3',
}
# Iterate over dictionary and print results
for key, value in shopping_list.items():
    if contains_number(value):
        print(f"Value for '{key}' contains a number: {value}")
    else:
        print(f"Value for '{key}' does not contain a number: {value}")
And we get:
Value for 'Banana' contains a number: 1
Value for 'Apple' does not contain a number: Five
Value for 'Strawberry' contains a number: 3.5
Value for 'Pear' contains a number: 3
So, we have created a function contains_number() as we have seen before, but here we've used the isnumeric() method on each character. Since the check is performed character by character, the value '3.5' is reported as containing a number because of its digits, even though the decimal point itself is not numeric. We then store the prices of some fruits in a dictionary called shopping_list. With shopping_list.items(), we access the keys and values of the dictionary, and check if the values contain numeric characters by invoking the function contains_number(). Finally, thanks to an if statement, we can separate the strings containing numbers from those containing only text, and print the results.
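If we also want the prices as actual numbers, one option is to attempt the conversion and fall back to None when it fails. This is a minimal sketch; the helper parse_price() and the prices dictionary are hypothetical names introduced for the example:

```python
shopping_list = {
    'Banana': '1',
    'Apple': 'Five',
    'Strawberry': '3.5',
    'Pear': '3',
}


def parse_price(value: str):
    """Hypothetical helper: return the price as a float, or None if it can't be parsed."""
    try:
        return float(value)
    except ValueError:
        return None


# Build a new dictionary with converted prices
prices = {fruit: parse_price(value) for fruit, value in shopping_list.items()}

print(prices)
# {'Banana': 1.0, 'Apple': None, 'Strawberry': 3.5, 'Pear': 3.0}
```

Unlike the character-by-character check, this approach tells us whether the whole value is a valid number.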
Finding Numbers in Strings in Pandas Data Frames
In Python, we can store data in data frames, which are collections of columns and rows (similar to Excel sheets, for simplicity). Data frames can be manipulated with a library called pandas.
Suppose we want to create a column (a single column of a data frame is called a Pandas "Series") where we store the price of an object from different suppliers on Amazon:
import pandas as pd
# Create dictionary
data = {'Amazon_prices': ['10', '8', '9.2', 'eleven', 'seven']}
# Transform dictionary into data frame
df = pd.DataFrame(data)
# Print data frame
print(df)
And we have:
   Amazon_prices
0             10
1              8
2            9.2
3         eleven
4          seven
So, we have stored some data in a dictionary called data. Then, with the pd.DataFrame() method, we've converted the dictionary into a Pandas data frame called df.
At this point, we can use the str.contains() method from the Pandas library, which is useful for checking patterns in strings. We can use regex to define the pattern like so:
# Intercept numbers in column
numeric_values = df['Amazon_prices'].str.contains(r'\d+', regex=True)
With the above code, we check whether each value in the column Amazon_prices contains a number. With df['Amazon_prices'], we select the column of the data frame. Then, the .str.contains() method checks if there is at least one digit in each string, using the regex pattern seen before. Finally, regex=True tells pandas to treat the pattern as a regular expression; this is the default, so we pass it here only to be explicit.
We can then create a new column to add to our data frame with the resulting booleans like so:
# Create a new column for data frame
df['IsNumeric'] = numeric_values
So, IsNumeric is the new column of our data frame, containing the booleans. We can now print the modified data frame with print(df) and we get:
   Amazon_prices  IsNumeric
0             10       True
1              8       True
2            9.2       True
3         eleven      False
4          seven      False
And then, we have a complete overview of the data frame.
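The check above tells us whether each string contains a digit, not whether it is a valid number as a whole. If we need the latter, one possible approach is pd.to_numeric() with errors='coerce'. In this sketch, the column names Price and IsNumber are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Amazon_prices': ['10', '8', '9.2', 'eleven', 'seven']})

# errors='coerce' turns values that can't be parsed into NaN
df['Price'] = pd.to_numeric(df['Amazon_prices'], errors='coerce')

# notna() is True only where the conversion succeeded
df['IsNumber'] = df['Price'].notna()

print(df)
```

This also gives us a numeric column we can use directly for calculations, with missing values where the text could not be converted.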
Conclusions
In this article, we've seen various methods to determine if there are any numbers in strings, in different cases and situations. We've also demonstrated that there is no one-size-fits-all method; depending on the situation, we have to choose the one that best suits the specific problem we're facing.