Python Regular Expressions - Validate Phone Numbers

Introduction

Handling user-submitted phone numbers can be a challenging task for developers, especially considering the various formats and notations used around the world. Ensuring that these phone numbers are valid and properly formatted is crucial for any application that relies on accurate contact information. That's where Python and its powerful regular expressions module come into play.

In this article, we'll explore the world of regular expressions and learn how to use Python's re module to validate phone numbers in your applications. We'll break down the process step-by-step, so you'll walk away with a solid understanding of how to tackle phone number validation effectively and efficiently.

Basics of Python's re Module

Python's re module is a built-in library designed specifically for working with regular expressions - a powerful tool for searching, matching, and manipulating text based on patterns. In this section, we'll cover the basics of the re module that you need to understand before you start validating phone numbers (which we'll demonstrate later in this article).

Advice: If you want to have more comprehensive insight into regular expressions in Python, you should definitely read our "Introduction to Regular Expressions in Python" article.

The re module in Python provides a robust set of functions for working with regular expressions. To start using it, you only need to import the module in your code:

import re

The re module provides several essential functions for working with regular expressions. Some of the most commonly used ones are re.search(), re.match(), re.findall(), and re.compile().

re.search() searches the entire input string for a match to the given pattern. It returns a match object for the first match it finds, and None otherwise. re.match() is similar to re.search(), but only checks whether the pattern matches at the beginning of the input string. re.findall() returns all non-overlapping matches of the pattern in the input string as a list of strings. Finally, re.compile() compiles a regular expression pattern into a pattern object that you can reuse, which avoids recompiling the pattern when it is applied many times.
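The differences between these functions are easiest to see side by side. The following is a minimal sketch; the sample string and the variable names are made up for illustration:

```python
import re

text = "Call 555-1234 or 555-9876"  # hypothetical sample input

# re.search() scans the whole string and returns the first match
first = re.search(r"\d{3}-\d{4}", text)
print(first.group())  # 555-1234

# re.match() only succeeds if the pattern matches at position 0
at_start = re.match(r"\d{3}-\d{4}", text)
print(at_start)  # None, because the string starts with "Call"

# re.findall() returns every non-overlapping match as a list of strings
all_numbers = re.findall(r"\d{3}-\d{4}", text)
print(all_numbers)  # ['555-1234', '555-9876']

# re.compile() builds a reusable pattern object with the same methods
number_pattern = re.compile(r"\d{3}-\d{4}")
print(number_pattern.findall(text))  # ['555-1234', '555-9876']
```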

Special Characters and Patterns in Regular Expressions

Regular expressions use special characters and constructs to define search patterns. The following table lists the most important ones we'll use in this article:

Special Character    What it Matches
.                    Any single character except a newline
*                    Zero or more repetitions of the preceding character or pattern
+                    One or more repetitions of the preceding character or pattern
?                    Zero or one repetition of the preceding character or pattern
{m,n}                The preceding character/pattern at least m times and at most n times
[abc]                Any single character in the set (a, b, or c)
\d                   Any digit (0-9)
\s                   Any whitespace character
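To get a feel for these constructs, it can help to try each one on a tiny input. The sketch below uses re.fullmatch(), a standard re function that succeeds only when the pattern covers the entire string; the sample strings and the checks dictionary are illustrative assumptions:

```python
import re

# Each entry pairs one construct from the table with a tiny sample string.
# re.fullmatch() succeeds only if the pattern covers the entire string.
checks = {
    ".": re.fullmatch(r"a.c", "abc") is not None,          # any character between a and c
    "*": re.fullmatch(r"ab*", "a") is not None,            # zero "b" characters is fine
    "+": re.fullmatch(r"ab+", "a") is not None,            # at least one "b" is required
    "?": re.fullmatch(r"colou?r", "color") is not None,    # the "u" is optional
    "{m,n}": re.fullmatch(r"\d{2,4}", "12345") is not None,  # five digits is too many
    "[abc]": re.fullmatch(r"[abc]", "d") is not None,      # "d" is not in the set
    r"\d": re.fullmatch(r"\d", "7") is not None,           # a single digit
    r"\s": re.fullmatch(r"\s", "\t") is not None,          # a tab counts as whitespace
}

for construct, matched in checks.items():
    print(f"{construct}: {matched}")
```

Only the +, {m,n}, and [abc] checks report False here, since each of those sample inputs was chosen to violate the construct.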

With these basic concepts of the Python re module and regular expressions in mind, we can now move on to validating phone numbers using this powerful tool.

How are phone numbers usually formatted?

Phone numbers can come in various formats depending on the country, regional conventions, and individual preferences. To effectively validate phone numbers using regular expressions, you should have at least a decent understanding of the common components and variations in phone number formats.

First of all, we'll mention international and local phone number formats. The international format includes the country code (preceded by a + symbol), area code, and local number - for example, +1 (555) 123-4567. On the other hand, the local format omits the country code and typically includes just the area code and local number - (555) 123-4567.

Note: Each country has a unique country code that is used to identify its phone numbers internationally. For example, the United States has the country code +1, while the United Kingdom has the country code +44.

Within a country, different regions or cities are assigned specific area codes to help identify phone numbers geographically. Area codes can vary in length and format depending on the country and region.

Regardless of the phone number format you choose, the components we mentioned earlier can be written with a variety of separators between them. Some of the most common separators are:

  1. Spaces - +1 555 123 4567
  2. Dashes - +1-555-123-4567
  3. Periods - +1.555.123.4567
  4. No separators - +15551234567
  5. Parentheses around the area code - +1 (555) 1234567

Note that there is more variance in the ways phone numbers are written internationally. Still, the examples we've shown here are a great starting point for understanding how to create and adapt regular expressions to match your specific phone number format.
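One way to convince yourself that the five notations above describe the same number is to strip everything that isn't a digit and compare the results. This is only an illustrative sketch; the samples list and digits_only set are names chosen for this example:

```python
import re

samples = [
    "+1 555 123 4567",
    "+1-555-123-4567",
    "+1.555.123.4567",
    "+15551234567",
    "+1 (555) 1234567",
]

# \D matches any character that is not a digit, so re.sub() removes
# separators, parentheses, and the leading "+" in one pass
digits_only = {re.sub(r"\D", "", sample) for sample in samples}

print(digits_only)  # {'15551234567'}
```

Because all five strings collapse to the same digits, the set ends up with a single element.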

How to Build a Regular Expression for Phone Numbers

To create an effective regular expression pattern for phone numbers, we'll break down the components and account for the variations discussed earlier. We'll use special characters and constructs to ensure our pattern can handle different phone number formats.

First of all, let's reiterate the main components of a phone number that we should consider when building a regular expression:

  1. Country code
    • an optional component
    • typically preceded by a '+' symbol
    • consists of one or more digits
  2. Area code
    • enclosed in optional parentheses
    • consists of a sequence of digits
    • the length may vary depending on the country and region
  3. Local number
    • a sequence of digits
    • separated into groups by optional separators such as spaces, dashes, or periods

To make our pattern flexible, we'll use special characters and constructs such as \d (for matching digits), ? (for making components optional), [\s.-] (to match common phone number separators), and so on.

Note: Now is a great time to make sure you understand all the special characters and patterns you can use in the regular expressions we discussed above. Also, make sure you understand how escape characters (especially the backslash \ ) in regular expressions work.
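One detail worth checking in a character class is where the hyphen sits. Between two characters it defines a range, while at the end of the class it is a literal hyphen. The sketch below illustrates this, along with what a raw string does to a backslash; the variable names are chosen for this example only:

```python
import re

# Inside [...], a hyphen between two characters defines a range.
# " -." therefore means "every character from space to period",
# which includes characters such as "#" and "+".
range_class = re.compile(r"[ -.]")
print(range_class.fullmatch("#") is not None)  # True

# Placing the hyphen last makes it a literal hyphen
separator_class = re.compile(r"[\s.-]")
print(separator_class.fullmatch("#") is not None)  # False
print(separator_class.fullmatch("-") is not None)  # True

# A raw string hands the backslash to the regex engine unchanged
print(len(r"\d"))  # 2: a backslash followed by "d"
```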

With these concepts in mind, let's finally start building a regular expression pattern for phone numbers. First of all, we'll create a pattern that matches the country code:

country_code_regex = r"(\+\d{1,3})?"

Here, a country code is an optional component consisting of a + sign followed by 1 to 3 digits. We write the pattern as a raw string (the r prefix) so that Python passes the backslashes to the regex engine unchanged. Now, let's accommodate the area code, which may be wrapped in parentheses:

area_code_regex = r"\(?\d{1,4}\)?"

We've decided that area codes consist of 1 to 4 digits and can be surrounded by parentheses. Each parenthesis is optional on its own, so this simple pattern doesn't check that they're balanced. After we've accommodated the area codes, let's finally focus on the local numbers. Say that local numbers consist of a sequence of 7 digits, where one of the mentioned separators can be placed between the third and fourth digit in the number:

local_number_regex = r"\d{3}[\s.-]?\d{4}"

And that's pretty much it! We just need to combine the regular expressions we created for each section of a phone number, allowing one of the mentioned separators between them:

phone_number_regex = r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}"
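The same combination can be assembled in code from the three component patterns. This is a small sketch: the variable names mirror the ones used in this section, the patterns are written as raw strings (r"...") so that Python leaves the backslashes untouched, and the glue pieces between components are the optional separators described earlier:

```python
country_code_regex = r"(\+\d{1,3})?"
area_code_regex = r"\(?\d{1,4}\)?"
local_number_regex = r"\d{3}[\s.-]?\d{4}"

# Join the components, allowing an optional space after the country code
# and an optional separator after the area code
phone_number_regex = (
    country_code_regex
    + r"\s?"
    + area_code_regex
    + r"[\s.-]?"
    + local_number_regex
)

print(phone_number_regex)
```

Building the pattern this way keeps each component editable on its own, which can help when you later adjust one part for a specific country.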

Optionally, we can surround this regular expression with the start-of-string (^) and end-of-string ($) anchors so that it only matches when the entire string is a phone number. With that, we have a regular expression that matches phone numbers:

phone_number_regex = r"^(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"

Note: Keep in mind that this pattern is just an example and may need to be adjusted depending on the specific phone number formats you want to validate!
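To see what the anchors change in practice, you can run both versions of the pattern against a string that merely contains a phone number. The following is a minimal sketch; noisy_input is a made-up example string:

```python
import re

unanchored = re.compile(r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
anchored = re.compile(r"^(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")

noisy_input = "call me at 555-123-4567 tomorrow"  # hypothetical input

# Without anchors, a phone number anywhere in the string is enough
print(unanchored.search(noisy_input) is not None)  # True

# With anchors, the surrounding words make the match fail
print(anchored.search(noisy_input) is not None)  # False

# A string that is nothing but the phone number still matches
print(anchored.search("555-123-4567") is not None)  # True
```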

Writing a Python Function to Validate Phone Numbers

With our regular expression pattern for phone numbers in hand, we can now write a Python function to validate phone numbers using the re module. The function will take a phone number as input, check if it matches our pattern, and return the validation result.

To begin, import the re module in your Python script:

import re

After that, let's define our phone number validation function. First of all, we need to compile the regular expression pattern using the re.compile() function:

pattern = re.compile(r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

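Now, we can use re.search() or re.match() to validate actual phone numbers. re.search() looks for a match anywhere in the input string, whereas re.match() checks only at the beginning. Keep in mind that, with the unanchored pattern above, re.search() also accepts longer strings that merely contain a phone number; add the ^ and $ anchors if the whole input has to be a phone number. Here, we'll use re.search():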

match = re.search(pattern, phone_number)

Note: Alternatively, you can use re.match() to ensure that the phone number pattern starts at the beginning of the input string, or re.fullmatch() to require that the pattern covers the entire string.

Now, we can wrap our logic into a separate function that returns True if a match is found, and False otherwise:

def validate_phone_number(phone_number):
    # Look for the compiled pattern anywhere in the input string
    match = pattern.search(phone_number)
    if match:
        return True
    return False
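If the input must be a phone number and nothing else, one possible variation is to use the pattern object's fullmatch() method instead of search(). The sketch below is an assumption about how you might want to tighten the check, not part of the function above; validate_phone_number_strict is a hypothetical name:

```python
import re

pattern = re.compile(r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def validate_phone_number_strict(phone_number):
    # Hypothetical stricter variant: the pattern has to cover the
    # entire input, not just a part of it
    return pattern.fullmatch(phone_number) is not None

print(validate_phone_number_strict("+1 (555) 123-4567"))     # True
print(validate_phone_number_strict("+44 (0) 20 1234 5678"))  # False
print(validate_phone_number_strict("ext. 555-123-4567"))     # False
```

The second input is rejected because its grouping doesn't fit the area code plus 3-4 digit layout the pattern describes, which is a reminder that the pattern would need adjusting for such formats.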

Testing with Example Numbers

To test our function, we can use a list of example phone numbers and print the validation results:

import re

def validate_phone_number(regex, phone_number):
    match = re.search(regex, phone_number)
    if match:
        return True
    return False

pattern = re.compile(r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

test_phone_numbers = [
    "+1 (555) 123-4567",
    "555-123-4567",
    "555 123 4567",
    "+44 (0) 20 1234 5678",
    "02012345678",
    "invalid phone number"
]

for number in test_phone_numbers:
    print(f"{number}: {validate_phone_number(pattern, number)}")

This will give us the following:

+1 (555) 123-4567: True
555-123-4567: True
555 123 4567: True
+44 (0) 20 1234 5678: True
02012345678: True
invalid phone number: False

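These results are consistent with how re.search() works, but note that "+44 (0) 20 1234 5678" passes only because the unanchored pattern matches a fragment near the end of the string, not the whole number. This function should work for many common phone number formats. However, depending on the specific formats you want to validate, you may need to adjust the regular expression pattern and the validation function accordingly.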

Advanced Techniques for Phone Number Validation

While our basic phone number validation function should work for many use cases, you can enhance its functionality and readability using some advanced techniques. Here are a few ideas to take your phone number validation to the next level:

Using Named Groups for Better Readability

Named groups in regular expressions allow you to assign a name to a specific part of the pattern, making it easier to understand and maintain. To create a named group, use the syntax (?P<name>pattern):

pattern = re.compile(r"(?P<country_code>\+\d{1,3})?\s?\(?(?P<area_code>\d{1,4})\)?[\s.-]?(?P<local_number>\d{3}[\s.-]?\d{4})")

Here, we grouped all of our phone number sections into separate named groups - country_code, area_code, and local_number.
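Once the groups are named, the individual parts of a matched number can be read back by name. The following sketch assumes the named-group pattern shown above and uses fullmatch() so that the groups line up with the whole input; the sample numbers are illustrative:

```python
import re

pattern = re.compile(
    r"(?P<country_code>\+\d{1,3})?\s?\(?(?P<area_code>\d{1,4})\)?[\s.-]?"
    r"(?P<local_number>\d{3}[\s.-]?\d{4})"
)

match = pattern.fullmatch("+1 (555) 123-4567")
print(match.group("country_code"))  # +1
print(match.group("area_code"))     # 555
print(match.group("local_number"))  # 123-4567

# groupdict() returns all named groups at once
parts = match.groupdict()
print(parts)

# An optional group that did not take part in the match is None
no_country = pattern.fullmatch("555-123-4567")
print(no_country.group("country_code"))  # None
```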

Validating Specific Country and Area Codes

To validate phone numbers with specific country codes and area codes, you can modify the pattern accordingly. For example, to validate US phone numbers with area codes between 200 and 999, you can use the following pattern:

pattern = re.compile(r"(\+1)?\s?\(?(2\d{2}|[3-9]\d{2})\)?[\s.-]?\d{3}[\s.-]?\d{4}")
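A quick way to check a country-specific pattern like this is to run it over a few accepted and rejected inputs. The sketch below uses fullmatch() so that stray leading digits can't slip through; the candidates list and the other names are assumptions made for this example:

```python
import re

us_pattern = re.compile(
    r"(\+1)?\s?\(?(2\d{2}|[3-9]\d{2})\)?[\s.-]?\d{3}[\s.-]?\d{4}"
)

candidates = [
    "+1 (212) 555-0123",   # valid country code and area code
    "(999) 123-4567",      # highest area code in the allowed range
    "(155) 123-4567",      # area code below 200
    "+44 (212) 555-0123",  # wrong country code
]

results = {number: us_pattern.fullmatch(number) is not None for number in candidates}
for number, is_valid in results.items():
    print(f"{number}: {is_valid}")

# The alternation 2\d{2}|[3-9]\d{2} accepts the same three-digit strings
# as the shorter class-based form [2-9]\d{2}
area_alternation = re.compile(r"2\d{2}|[3-9]\d{2}")
area_range = re.compile(r"[2-9]\d{2}")
same = all(
    (area_alternation.fullmatch(f"{n:03d}") is None)
    == (area_range.fullmatch(f"{n:03d}") is None)
    for n in range(1000)
)
print(same)  # True
```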

Handling Common User Input Errors

Users may inadvertently input incorrect phone numbers or formats. You can improve your validation function to handle common errors, such as extra spaces or incorrect separators, by preprocessing the input string before matching it against the pattern:

def preprocess_phone_number(phone_number):
    # Remove extra spaces
    phone_number = " ".join(phone_number.split())

    # Replace common incorrect separators
    phone_number = phone_number.replace(",", ".").replace(";", ".")

    return phone_number

def validate_phone_number(phone_number):
    phone_number = preprocess_phone_number(phone_number)
    match = pattern.search(phone_number)
    if match:
        return True
    return False
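To see what the preprocessing step buys you, you can compare the match result before and after cleaning a few messy inputs. This sketch repeats the pattern and a compact version of the preprocessing function so that it runs on its own; the messy_inputs list and the outcomes list are made up for illustration:

```python
import re

pattern = re.compile(r"(\+\d{1,3})?\s?\(?\d{1,4}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def preprocess_phone_number(phone_number):
    # Collapse runs of whitespace, then map "," and ";" to "."
    phone_number = " ".join(phone_number.split())
    return phone_number.replace(",", ".").replace(";", ".")

messy_inputs = ["555   123   4567", "555,123,4567", "555;123;4567"]

outcomes = []
for raw in messy_inputs:
    cleaned = preprocess_phone_number(raw)
    before = pattern.search(raw) is not None
    after = pattern.search(cleaned) is not None
    outcomes.append((cleaned, before, after))
    print(f"{raw!r} -> {cleaned!r}: {before} -> {after}")
```

Each of these inputs fails against the raw string and passes once it has been cleaned.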

These advanced techniques can help you create a more robust and flexible phone number validation function that better handles various formats and user input errors.

Conclusion

Phone number validation is an important task for many applications that rely on accurate contact information. By leveraging Python's powerful re module and regular expressions, you can create a flexible and efficient validation function to handle various phone number formats and variations.

In this article, we explored the basics of Python's re module, common phone number components and formats, and the process of building a regular expression pattern for phone numbers. We also demonstrated how to write a phone number validation function using the compiled pattern and shared advanced techniques to enhance the function's flexibility and robustness.

Last Updated: May 20th, 2023