Guide to Regular Expressions and Matching Strings in JavaScript

Introduction

Matching strings or string patterns can be a real struggle. Most commonly, you'll need it to validate e-mail addresses, user inputs, file names, and most other kinds of input strings. While there are many pattern-matching libraries and approaches, a time-tested one is using Regular Expressions to define a set of rules a string has to follow in order to match a pattern.

In JavaScript, the RegExp class is used to represent Regular Expressions, and it can be coupled with a few methods which make matching patterns easier.

Obviously, the prerequisite to working with these is knowledge of Regular Expressions. If you are not comfortable with writing them, you can always use RegEx testing websites such as regex101.com or regexr.com - which visually display the effects of your expressions on given strings.

In this guide, we'll be looking at Regular Expressions in JavaScript, the usage of the RegExp() class, as well as the exec() and test() methods.

Afterwards, we'll take a look at some of the methods implemented on the String object - match(), search() and replace() - which work with Regular Expressions as a shorter alternative to using the RegExp class.

What are Regular Expressions?

Before we dive into JavaScript's API for working with RegEx, let's first take a look at Regular Expressions themselves. If you're already familiar with them - this can serve as a refresher, or you can skip the section fully.

A Regular Expression (abbr. RegEx) is a pattern of characters used to match different combinations of strings or characters. There are certain rules you need to follow in order to form a proper Regular Expression. We'll go over these quickly and follow up with an example:

  • [abc] - matches a single character: a, b or c
  • [^abc] - matches any single character except a, b or c
  • [a-z] - matches any single character in the range a-z
  • \s - matches any whitespace character
  • \w - matches any word character
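
To see the character classes above in action, here is a minimal sketch. The sample string is made up for illustration, and the match() method with the g flag (both covered later in this guide) is used only to list every matching character:

```javascript
// Made-up sample string: letters, a space, digits and an underscore
const sample = "cab 42_z";

const oneOfAbc = sample.match(/[abc]/g);    // characters a, b or c
const notAbc = sample.match(/[^abc]/g);     // everything except a, b or c
const lowercase = sample.match(/[a-z]/g);   // characters in the range a-z
const whitespace = sample.match(/\s/g);     // whitespace characters
const wordChars = sample.match(/\w/g);      // letters, digits and underscores

console.log(oneOfAbc);   // [ 'c', 'a', 'b' ]
console.log(notAbc);     // [ ' ', '4', '2', '_', 'z' ]
console.log(lowercase);  // [ 'c', 'a', 'b', 'z' ]
console.log(whitespace); // [ ' ' ]
console.log(wordChars);  // [ 'c', 'a', 'b', '4', '2', '_', 'z' ]
```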

These are some of the basic patterns but they can get you far. Regular expressions also support operators:

  • a? - operator ? matches zero or one character a
  • a* - operator * matches zero or more characters a
  • a+ - operator + matches one or more characters a
  • a{n} - operator {n} matches character a exactly n times in a row
  • a{n,m} - operator {n,m} matches character a between n and m times in a row (written without a space after the comma)
  • \. - operator \ escapes the character ., which means . won't have its usual meaning - matching any single character other than a line break - but will be matched as a literal .
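
The operators above can be checked one by one with a short sketch. It assumes two things not introduced yet: the test() method (covered later in this guide) and the ^ and $ anchors, which are added here only so that each pattern has to cover the whole string:

```javascript
// ^ and $ (not part of the list above) anchor each pattern to the whole string
const quantifierChecks = {
  optionalEmpty: /^a?$/.test(""),          // true: zero a's is allowed
  optionalTwice: /^a?$/.test("aa"),        // false: more than one a
  starMany: /^a*$/.test("aaaa"),           // true: any number of a's
  plusEmpty: /^a+$/.test(""),              // false: at least one a is required
  exactlyThree: /^a{3}$/.test("aaa"),      // true
  twoToThree: /^a{2,3}$/.test("aaaa"),     // false: four is too many
  anyChar: /^a.c$/.test("abc"),            // true: . matches any character
  escapedDotLetter: /^a\.c$/.test("abc"),  // false: \. matches only a literal dot
  escapedDotLiteral: /^a\.c$/.test("a.c"), // true
};

console.log(quantifierChecks);
```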

To put this into practice - let's write a Regular Expression that checks if a string contains @gmail.com at the end of the string and contains three characters a before the @ symbol:

"\w+a{3}@gmail\.com"

Let's break this down quickly:

  • \w+ - matches one or more word characters (letters, digits or underscores)
  • a{3} - matches three characters a in a row
  • @gmail\.com - matches a literal string "@gmail.com", while escaping the . with a \ operator

With this RegEx, we can match strings such as:

[email protected]
[email protected]

But not:

[email protected]
[email protected]
[email protected]

You can go ahead and test these out in a visual RegEx tester as well to see which parts match and why.
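
If you'd rather check the pattern in code, the following sketch runs it against a few addresses. All of the addresses are hypothetical and invented for this example, and test() is covered later in this guide:

```javascript
const gmailPattern = /\w+a{3}@gmail\.com/;

// Hypothetical addresses, invented for this sketch
const candidates = [
  "somemailaaa@gmail.com", // match: word characters, then "aaa", then the domain
  "baaa@gmail.com",        // match: "b" satisfies \w+, followed by "aaa"
  "aaa@gmail.com",         // no match: \w+ needs at least one character before the "aaa"
  "somemailaa@gmail.com",  // no match: only two a's before the @
  "somemailaaa@yahoo.com", // no match: different domain
];

const outcomes = candidates.map((candidate) => gmailPattern.test(candidate));
console.log(outcomes); // [ true, true, false, false, false ]

// The pattern is not anchored, so trailing text is still accepted...
console.log(gmailPattern.test("somemailaaa@gmail.com.example")); // true
// ...unless a $ anchor (an addition to the guide's pattern) pins it to the end
console.log(/\w+a{3}@gmail\.com$/.test("somemailaaa@gmail.com.example")); // false
```

If the address really has to end with @gmail.com, the anchored variant is the safer choice.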

The RegExp Class

In JavaScript, there are two ways of creating a Regular Expression:

  1. Using a RegEx literal, which is a pattern put between the / characters:
let regex = /[abc]+/;

You should use this approach if your RegEx will remain constant throughout the script, because a literal is compiled automatically when the script is loaded.

  2. Using the RegExp() constructor:
let regex = new RegExp("[abc]+");

This approach is preferred when your RegEx is dynamic and can change throughout the lifecycle of the script. It's compiled at runtime, not load time.

Note: You can also pass a RegEx literal as an argument of the constructor (since ES6, optionally followed by a flags argument, as in new RegExp(/[abc]+/, "i")):

let regex = new RegExp(/[abc]+/);
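
As a rough illustration of the "dynamic" case, here is a sketch that builds a pattern from a value only known at runtime. The escapeRegExp() and domainPattern() helpers are hypothetical names made up for this example, and replace() is covered later in this guide:

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a RegEx
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a pattern for whichever domain is passed in at runtime
function domainPattern(domain) {
  // In a string, each backslash has to be doubled: "\\w" becomes \w in the pattern
  return new RegExp("\\w+@" + escapeRegExp(domain));
}

const gmail = domainPattern("gmail.com");

console.log(gmail.source);                 // \w+@gmail\.com
console.log(gmail.test("user@gmail.com")); // true
console.log(gmail.test("user@gmailxcom")); // false, because the dot was escaped
```

Note the doubled backslashes: when the pattern is written as a string, a single backslash would be consumed by the string literal before the RegExp() constructor ever sees it.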

When working with RegExp, you can also pass flags - characters with a meaning - which alter the way a pattern is matched. Some of these flags are:

  • i - denoting case-insensitive, so A and a are the same when matching
// Matches both ABC and abc one or more times
let regex = new RegExp("[abc]+", "i"); 
  • g - denoting that all the possible cases will be matched, not just the first one encountered

  • m - denoting the multi-line mode, in which ^ and $ match at the start and end of every line of the string, not just at the start and end of the whole string

let string = `
This string can also be matched with
Even though it's written in multiple lines
`
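
The effect of each flag is easiest to see side by side. The following is a minimal sketch with a made-up two-line string; it relies on test() and match(), which are covered later in this guide, and on the ^ anchor, which matches the start of the input:

```javascript
// Made-up two-line input
const text = "First line\nsecond LINE";

// i: upper and lower case are treated the same
console.log(/line/.test("LINE"));   // false
console.log(/line/i.test("LINE"));  // true

// g: collect every match instead of stopping at the first one
console.log(text.match(/line/gi));  // [ 'line', 'LINE' ]

// m: ^ and $ also match at the start and end of each line
console.log(/^second/.test(text));  // false
console.log(/^second/m.test(text)); // true
```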

The RegExp() constructor is used solely for creating a pattern to be tested. The resulting RegExp instance, however, has two methods that can test the pattern against a string and match it if it fits: exec() and test().

The exec() Method

The exec() method, without much surprise, executes a search in a string. If there is a match, it returns an array containing information about the match, otherwise, it returns null.

To handle potential null values, you can use the nullish coalescing operator (??), added in ECMAScript 2020.

Let's test it out on the e-mail example - we're checking whether an e-mail is ending with @gmail.com and contains three consecutive a characters right before the @gmail domain.

Also, we'll use the case-insensitive flag:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");

let result1 = regex.exec("[email protected]");
let result2 = regex.exec("[email protected]");

console.log(result1);
console.log(result2);

Or you can apply the nullish coalescing operator for null-safety:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");

let result1 = regex.exec("[email protected]") ?? 'No matched results';
let result2 = regex.exec("[email protected]") ?? 'No matched results';

Let's take a look at the output:

[ '[email protected]',
  index: 0,
  input: '[email protected]',
  groups: undefined ]
  
[ '[email protected]',
  index: 0,
  input: '[email protected]',
  groups: undefined ]

This array contains multiple things:

  1. The matched string
  2. The index value from which the matched string starts
  3. The input string
  4. The groups property which holds an object of all named capturing groups - in most cases, this will be undefined
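
To show a case where groups is not undefined, here is a small sketch with named capturing groups. The pattern and the input string are assumptions made up for this example; the (?<name>...) syntax has been available since ES2018:

```javascript
// Named capturing groups use the (?<name>...) syntax, available since ES2018
const addressPattern = /(?<user>\w+)@(?<domain>\w+\.com)/;

// Hypothetical input string
const found = addressPattern.exec("Contact: someone@example.com");

console.log(found[0]);            // someone@example.com
console.log(found.index);         // 9
console.log(found.input);         // Contact: someone@example.com
console.log(found.groups.user);   // someone
console.log(found.groups.domain); // example.com
```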

If you wish to isolate only the matched string without the extra information, you can print out the first element of the array:

console.log(result1[0]);

An interesting feature of the exec() method is that, when the regex has the g flag, it remembers the index of the character where it stopped, so you can call the method again and again until it returns null.

This position is stored in the regex's lastIndex property. exec() itself always works on a single string: if you pass it an array of strings, the array is first converted to one comma-separated string, and that is what the following example relies on.

Let's pass an array of three strings; two of which will be matched and one that won't. In order to get multiple results, we can loop through the array and call exec() until we get a null. Also, let's create an empty array matchedStrings and push the matched strings to it.

Note: You must pass the g flag to the RegExp() constructor in order to get all the results, not just the first one. This way, you'll avoid going into an infinite loop, and nobody likes infinite loops.

let regex = new RegExp(/\w+a{3}@gmail\.com/, "g");

let strings = ["[email protected]", "[email protected]", "[email protected]"];
let matchedStrings = [];

let result;
// exec() converts the array to one comma-separated string
// and resumes from regex.lastIndex on every call
while ((result = regex.exec(strings)) !== null) {
    matchedStrings.push(result[0]);
}

console.log(matchedStrings);

This results in:

 ["[email protected]", "[email protected]"]

You can see that we never kept track of a position ourselves, but exec() knew where to continue its search, because the regex stored it in lastIndex after every match. Pretty neat!
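
To make the bookkeeping visible, the sketch below records lastIndex after every call. The input values are made up, and String() is used to show the comma-separated string that an array turns into when a string is expected:

```javascript
// An array passed where a string is expected is converted like this:
const joined = String(["one", "two"]);
console.log(joined); // one,two

const wordPattern = /\w+/g;
const trace = [];
let match;

while ((match = wordPattern.exec(joined)) !== null) {
  // lastIndex now points just past the match that was found
  trace.push({ text: match[0], index: match.index, lastIndex: wordPattern.lastIndex });
}

console.log(trace);
// [ { text: 'one', index: 0, lastIndex: 3 }, { text: 'two', index: 4, lastIndex: 7 } ]

console.log(wordPattern.lastIndex); // 0 (reset after the search that returned null)
```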

The test() Method

The test() method is similar to exec() except that it doesn't return an array containing information, but a simple true or false. It performs the same search as exec() and if a pattern is matched with a string, it returns true. Otherwise, it returns false:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");

let results = regex.test("[email protected]");
console.log(results); // Output: true

results = regex.test("[email protected]");
console.log(results); // Output: false

This method can't return a null, and you can use the results to dictate further conditional logic.

The test() method also remembers the lastIndex of the execution when the g flag is set, so it can walk through several matches in the same way as exec(). The flip side is that if you test the same string twice, you'll get different results:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // Remember the 'g' flag when working with multiple results

let results = regex.test("[email protected]");
console.log(results); // Output: true

results = regex.test("[email protected]");
console.log(results); // Output: false

The reason we get false the second time is that lastIndex has moved to the end of the string, so the second search starts there and finds nothing to match. Thus, it returns false, and lastIndex is reset to 0.

To get consistent results from test(), either leave out the g flag when you only need a yes/no answer, or set regex.lastIndex back to 0 before testing another string.
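
Both ways of avoiding the surprise can be sketched as follows. The address is hypothetical and made up for this example:

```javascript
const address = "somemailaaa@gmail.com"; // hypothetical address

// With the g flag, the second call starts where the first one stopped
const globalRegex = /\w+a{3}@gmail\.com/g;
const first = globalRegex.test(address);
const second = globalRegex.test(address);
console.log(first, second); // true false

// Option 1: reset lastIndex before reusing the regex
globalRegex.lastIndex = 0;
const afterReset = globalRegex.test(address);
console.log(afterReset); // true

// Option 2: leave out the g flag, so lastIndex is never used
const plainRegex = /\w+a{3}@gmail\.com/;
const repeated = [plainRegex.test(address), plainRegex.test(address)];
console.log(repeated); // [ true, true ]
```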

Usage of test() with an array of strings is the same as exec(), except that you'll be printing out true/false. In practice, this is not commonly used, unless you're keeping track of the number of matched strings.

The match() Method

The match() method is the first of the String methods we'll be looking at - and it works well with Regular Expressions.
It takes a RegEx as an argument and returns an array of matches, or null if there are none. With the g flag, the array holds every matched string; without it, you get the same result as from the exec() method of a RegEx instance:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // Note the 'g' flag

let string = "[email protected]";
let resultArray = string.match(regex);

console.log(resultArray); // Output: [ '[email protected]' ]

Note: You can alternatively use a RegEx literal here instead to shorten the code, as it's compiled to a RegEx instance anyway:

let string = "[email protected]";
let resultArray = string.match(/\w+a{3}@gmail\.com/g);

console.log(resultArray); // Output: [ '[email protected]' ]

To get a better feel of the method, let's change the RegEx to /[a-z]/ - to match only lower case characters:

let regex = new RegExp(/[a-z]/, "g"); // Note the 'g' flag

let string = "[email protected]";
let resultArray = string.match(regex);

console.log(resultArray);

This results in an array of all the lowercase characters in the string:

["s","o","m","e","m","a","i","l","a","a","a","g","m","a","i","l","c","o","m"]
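
The difference the g flag makes to match() can be sketched with a made-up sentence. The ?? fallback at the end is one possible way to avoid handling null separately:

```javascript
const sentence = "cat bat rat"; // made-up input

// Without g: the first match plus extra information, like exec()
const firstOnly = sentence.match(/[a-z]at/);
console.log(firstOnly[0], firstOnly.index); // cat 0

// With g: every matched string, but no index, input or groups
const all = sentence.match(/[a-z]at/g);
console.log(all); // [ 'cat', 'bat', 'rat' ]

// No match returns null, which ?? can turn into an empty array
const none = sentence.match(/dog/g);
const safe = sentence.match(/dog/g) ?? [];
console.log(none, safe.length); // null 0
```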

The search() Method

The search() method searches for a match between the passed pattern and the string. If a match is found, its index is returned. Otherwise, the method returns -1:

let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // search() ignores the 'g' flag and always starts from the beginning

let string = "some string that isn't matched [email protected]";
let result = string.search(regex);

console.log(result); // Output: 31

string = "It should return -1 with this string";
result = string.search(regex);

console.log(result); // Output: -1

This method should be used when you want to find out whether a match is found and its index. If you only want to know whether a match is found, you should use test().

You can also extract this info from the exec() method, through the index property of the array it returns, but search() gives you the number directly, which is easier to work with.
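
Here is a short sketch comparing the two, using a made-up input string. It also shows that search() does not continue from lastIndex, while exec() with the g flag does:

```javascript
const haystack = "id: 42, id: 7"; // made-up input
const digits = /\d+/g;

console.log(haystack.search(digits)); // 4
console.log(haystack.search(digits)); // 4 again: search() always starts from the beginning
console.log(haystack.search(/x/));    // -1

// exec() exposes the same position through the index property of its result
console.log(digits.exec(haystack).index); // 4
console.log(digits.exec(haystack).index); // 12, because exec() continues from lastIndex
```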

The replace() Method

The replace(to_replace, replace_with) method returns a new string in which the part matching to_replace is replaced with replace_with.

It doesn't change the original string as strings are immutable.

The to_replace argument can either be a string or a RegExp instance. If it's a string, or a RegExp without the g flag, only the first occurrence will be replaced, whilst if it's a RegExp with the g flag, every single one will be replaced.

For the purpose of this method, let's replace gmail.com with yahoo.com.

let regex = new RegExp(/gmail\.com/, "g"); // Note the 'g' flag

let string = "[email protected]";
let result = string.replace(regex, "yahoo.com");

console.log(result); // Output: [email protected]

string = "[email protected] [email protected]"
result = string.replace(regex, "yahoo.com");

console.log(result); // Output: [email protected] [email protected]

console.log(string); // Output: [email protected] [email protected]

As you can see in the second example, all occurrences matching the regex are replaced with yahoo.com. Also, the original string is left unchanged.
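
The three kinds of to_replace argument can be compared in one sketch. The addresses are hypothetical and made up for this example:

```javascript
const addresses = "first@gmail.com second@gmail.com"; // hypothetical addresses

// A plain string replaces only the first occurrence
const byString = addresses.replace("gmail.com", "yahoo.com");
console.log(byString); // first@yahoo.com second@gmail.com

// So does a RegEx without the g flag
const byRegex = addresses.replace(/gmail\.com/, "yahoo.com");
console.log(byRegex); // first@yahoo.com second@gmail.com

// Only a RegEx with the g flag replaces every occurrence
const byGlobalRegex = addresses.replace(/gmail\.com/g, "yahoo.com");
console.log(byGlobalRegex); // first@yahoo.com second@yahoo.com

// The original string is left untouched in all three cases
console.log(addresses); // first@gmail.com second@gmail.com
```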

Conclusion

Even though Regular Expressions can be difficult to read and hard to fathom at first, working with them and constructing them can be quite fun once you understand them.

JavaScript makes testing and matching as easy as possible; all you need to do is learn Regular Expressions themselves.

However, with tools available today and with sites similar to the ones listed at the beginning of the guide, you can quite easily get around actually learning all the rules to Regular Expressions.

In this guide, we've covered:

  • The RegExp Class - a class whose object is used to represent a regular expression
  • The exec() Method - which searches for a regex in a string and returns an array describing the match (with additional information).
  • The test() Method - which only tests if there's a match in a string and returns true/false.
  • The match() Method - defined in the String class, returns an array of matches (without additional information).
  • The search() Method - defined in the String class, returns an index of a match found.
  • The replace() Method - defined in the String class, replaces the part of a string matching a RegExp (or a string) with another string.

Probably the best way to practice Regular Expressions is to write and test ones for validating e-mails and passwords.

Last Updated: August 10th, 2021

    © 2013-2021 Stack Abuse. All rights reserved.