Introduction
Matching strings or string patterns can be a real struggle. In the most common cases, you'll need these to validate emails, user inputs, file names, or most kinds of input strings. While there are many pattern-matching libraries and approaches - a time-tested approach is using Regular Expressions to define a set of rules a certain string has to follow in order to match that pattern.
In JavaScript, the RegExp class is used to represent Regular Expressions and can be coupled with a few methods that make matching patterns easier.
Obviously, the prerequisite to working with these is knowledge of Regular Expressions. If you are not comfortable with writing them, you can always use RegEx testing websites such as regex101.com or regexr.com - which visually display the effects of your expressions on given strings.
In this guide, we'll be looking at Regular Expressions in JavaScript, the usage of the RegExp class, as well as the exec() and test() methods.
Afterwards, we'll take a look at some of the methods implemented on the String object - match(), search() and replace() - which work with Regular Expressions as a shorter alternative to using the RegExp class.
What are Regular Expressions?
Before we dive into JavaScript's API for working with RegEx, let's first take a look at Regular Expressions themselves. If you're already familiar with them - this can serve as a refresher, or you can skip the section fully.
A Regular Expression (abbr. RegEx) is a pattern of characters used to match different combinations of strings or characters. There are certain rules you need to follow in order to form a proper Regular Expression. We'll go over these quickly and follow up with an example:
- [abc] - matches a single character: a, b or c
- [^abc] - matches every character except a, b or c
- [a-z] - matches any character in the range a-z
- \s - matches any whitespace character
- \w - matches any word character
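As a quick illustration of the character classes above, here is a minimal sketch that checks each one against a sample string. The sample strings and the classChecks name are made up for this example.

```javascript
// Each entry tests one character class against an illustrative input.
const classChecks = {
  set: /[abc]/.test("cat"),            // "c" and "a" are in the set
  negatedSet: /[^abc]/.test("abc"),    // nothing outside a, b, c to match
  range: /[a-z]/.test("HELLO"),        // no lowercase letter present
  whitespace: /\s/.test("two words"),  // the space counts as whitespace
  word: /\w/.test("!?"),               // no letter, digit or underscore
};

console.log(classChecks);
// { set: true, negatedSet: false, range: false, whitespace: true, word: false }
```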
These are some of the basic patterns but they can get you far. Regular expressions also support operators:
- a? - the ? operator matches zero or one a character
- a* - the * operator matches zero or more a characters
- a+ - the + operator matches one or more a characters
- a{n} - the {n} operator matches the character a exactly n times in a row
- a{n,m} - the {n,m} operator matches the character a between n and m times in a row (in JavaScript, write it without a space after the comma)
- \. - the \ operator escapes the . character, which means . won't have its usual meaning - matching any single character - but will be matched as a literal dot.
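To see the operators in action, the following sketch anchors each pattern with ^ and $ so that the whole string has to match, which makes the effect of every quantifier visible. The inputs and the quantifierChecks name are illustrative only.

```javascript
// ^ and $ force a full-string match, so the quantifier decides the outcome.
const quantifierChecks = [
  /^a?$/.test(""),        // true:  zero "a" is allowed
  /^a?$/.test("aa"),      // false: two is too many for ?
  /^a*$/.test("aaaa"),    // true:  any number of "a"
  /^a+$/.test(""),        // false: + needs at least one
  /^a{3}$/.test("aaa"),   // true:  exactly three
  /^a{2,3}$/.test("aaaa"),// false: four is outside 2..3
  /^a.c$/.test("abc"),    // true:  unescaped . matches any character
  /^a\.c$/.test("abc"),   // false: escaped . matches only a literal dot
];

console.log(quantifierChecks);
```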
To put this into practice - let's write a Regular Expression that checks if a string contains @gmail.com at the end of the string and contains three a characters right before the @ symbol:
\w+a{3}@gmail\.com
Let's break this down quickly:
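- \w+ - matches one or more word characters
- a{3} - matches three a characters in a row
- @gmail\.com - matches the literal string "@gmail.com", while escaping the . with the \ operator
Note that the pattern isn't anchored, so it also matches when the address is only part of a longer string; append $ to require that the string actually ends with @gmail.com.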
With this RegEx, we can match strings such as:
[email protected]
[email protected]
But not:
[email protected]
[email protected]
[email protected]
You can go ahead and test these out in a visual RegEx tester as well to see which parts match and why.
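If you'd rather check this from code, here is a minimal sketch. The addresses below are made up for illustration, and anchoredPattern is an assumed variant that adds ^ and $ to show the difference anchoring makes.

```javascript
const gmailPattern = /\w+a{3}@gmail\.com/;
// Assumed variant: ^ and $ require the whole string to be the address.
const anchoredPattern = /^\w+a{3}@gmail\.com$/;

// Hypothetical addresses, not taken from the guide.
const candidates = [
  "johnaaa@gmail.com",             // word characters, then "aaa"
  "aaa@gmail.com",                 // nothing left for \w+ before "aaa"
  "johnaa@gmail.com",              // only two "a" before the @
  "johnaaa@gmailxcom",             // \. demands a literal dot
  "johnaaa@gmail.com.example.org", // the address is only a prefix
];

const unanchoredResults = candidates.map((c) => gmailPattern.test(c));
const anchoredResults = candidates.map((c) => anchoredPattern.test(c));

console.log(unanchoredResults); // [ true, false, false, false, true ]
console.log(anchoredResults);   // [ true, false, false, false, false ]
```

The last candidate shows why the anchor matters if the domain really has to sit at the end of the string.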
The RegExp Class
In JavaScript, there are two ways of creating a Regular Expression:
- Using a RegEx literal, which is a pattern put between the / characters (without surrounding quotes, which would turn it into a plain string):
let regex = /[abc]+/;
You should use this approach if your RegEx will remain constant throughout the script, because this RegEx is compiled when the script is loaded automatically.
- Using the RegExp() constructor:
let regex = new RegExp("[abc]+");
This approach is preferred when your RegEx is dynamic and can change throughout the lifecycle of the script. It's compiled at runtime, not load time.
Note: You can also pass a RegEx literal as an argument of the constructor (and, starting with ES6, combine it with a separate flags argument):
let regex = new RegExp(/[abc]+/);
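A dynamic pattern is usually assembled from strings, which has two pitfalls: backslashes must be doubled inside string literals, and any user-supplied text should be escaped first. The sketch below assumes a hypothetical helper named buildDomainPattern; it is one possible approach, not a built-in API.

```javascript
// Hypothetical helper: builds a pattern for an arbitrary domain at runtime.
function buildDomainPattern(domain) {
  // Escape regex metacharacters so "." in the domain is matched literally.
  const escaped = domain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "\\w" in a string literal becomes \w in the resulting pattern.
  return new RegExp("\\w+@" + escaped + "$");
}

const yahooPattern = buildDomainPattern("yahoo.com");

console.log(yahooPattern.source);                    // \w+@yahoo\.com$
console.log(yahooPattern.test("someone@yahoo.com")); // true
console.log(yahooPattern.test("someone@yahooXcom")); // false
```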
When working with RegExp, you can also pass flags - characters with a meaning - which alter the way a pattern is matched. Some of these flags are:
- i - denoting case-insensitive, so A and a are the same when matching
// Matches one or more of a, b or c in either case, such as "ABC" or "abc"
let regex = new RegExp("[abc]+", "i");
- g - denoting that all the possible cases will be matched, not just the first one encountered
- m - denoting the multi-line mode, in which ^ and $ match at the start and end of every line rather than only at the start and end of the whole string, which is useful for strings written in multiple lines
let string = `
This string can also be matched with
Even though it's written in multiple lines
`
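To make the effect of the m flag concrete, here is a small sketch with a made-up two-line string. Without m, ^ only matches at the very beginning of the input; with m, it also matches right after each line break.

```javascript
// Illustrative input with a single line break.
const multiLineText = "first line\nsecond line";

const withoutM = /^second/.test(multiLineText);   // ^ only at index 0
const withM = /^second/m.test(multiLineText);     // ^ also after "\n"
const lineStarts = multiLineText.match(/^\w+/gm); // first word of each line

console.log(withoutM);   // false
console.log(withM);      // true
console.log(lineStarts); // [ 'first', 'second' ]
```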
The RegExp() constructor is used solely for creating a pattern to be tested. However, the resulting RegExp instance has two methods that can test out the pattern and match it if it fits: exec() and test().
The exec() Method
The exec() method, without much surprise, executes a search in a string. If there is a match, it returns an array containing information about the match; otherwise, it returns null.
To handle potential null values, you can use the nullish coalescing operator (??), added in ECMAScript 2020.
Let's test it out on the email example - we're checking whether an email ends with @gmail.com and contains three consecutive a characters right before the @gmail domain.
Also, we'll use the case-insensitive flag:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");
let result1 = regex.exec("[email protected]");
let result2 = regex.exec("[email protected]");
console.log(result1);
console.log(result2);
Or you can apply the nullish coalescing operator for null-safety:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");
let result1 = regex.exec("[email protected]") ?? 'No matched results';
let result2 = regex.exec("[email protected]") ?? 'No matched results';
Let's take a look at the output:
[ '[email protected]',
index: 0,
input: '[email protected]',
groups: undefined ]
[ '[email protected]',
index: 0,
input: '[email protected]',
groups: undefined ]
This array contains multiple things:
- The matched string
- The index value from which the matched string starts
- The input string
- The groups property, which holds an object of all named capturing groups - when the pattern has no named groups, this will be undefined
If you wish to isolate only the matched string without the extra information, you can print out the first element of the result:
console.log(result1[0]);
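The groups property becomes useful once the pattern contains named capturing groups. The following is a minimal sketch; the group names user and domain and the sample input are assumptions made for this example.

```javascript
// Named groups (?<name>...) are an ES2018 feature.
const groupedPattern = /(?<user>\w+a{3})@(?<domain>gmail\.com)/;
// Hypothetical input with some text before the address.
const groupedResult = groupedPattern.exec("contact: johnaaa@gmail.com");

console.log(groupedResult[0]);            // johnaaa@gmail.com
console.log(groupedResult.index);         // 9
console.log(groupedResult.groups.user);   // johnaaa
console.log(groupedResult.groups.domain); // gmail.com
```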
An interesting feature of the exec() method is that, when the pattern has the g flag, it remembers the index of the character where it stopped executing, so you can call this method again and again on the same string until you get a null in return.
This index is stored in the lastIndex property of the RegExp instance. Keep in mind that exec() works on a single string: anything else you pass is converted to one first, so an array of strings is turned into one comma-separated string, and exec() then walks through that string match by match.
Let's pass an array of three strings; two of which will be matched and one that won't. In order to get multiple results, we can call exec() in a loop until we get a null. Also, let's create an empty array matchedStrings and push the matched strings to it.
Note: You must pass the g flag to the RegExp() constructor in order to get all the results, not just the first one. Without it, lastIndex never advances, every call returns the first match again, and the loop never ends - and nobody likes infinite loops.
let regex = new RegExp(/\w+a{3}@gmail\.com/, "g");
let strings = ["[email protected]", "[email protected]", "[email protected]"];
let matchedStrings = [];
let result;
// exec() converts the array into one comma-separated string; with the g flag,
// every call resumes from regex.lastIndex until no match is left (null)
while ((result = regex.exec(strings)) !== null) {
    matchedStrings.push(result[0]);
}
console.log(matchedStrings);
This results in:
["[email protected]", "[email protected]"]
You can see that we never kept track of the position ourselves, but exec() knew where to continue its search, because the regex instance stores that position in lastIndex. Pretty neat!
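To watch lastIndex move, the sketch below records it after every call. The text and variable names are made up for this example; the addresses are joined into one string because exec() always searches a single string.

```javascript
const globalPattern = /\w+a{3}@gmail\.com/g;
// Hypothetical input: the middle address lacks the three "a" characters.
const text = "johnaaa@gmail.com, bob@gmail.com, annaaa@gmail.com";

const found = [];
let match;
while ((match = globalPattern.exec(text)) !== null) {
  found.push({
    value: match[0],
    start: match.index,
    resumeAt: globalPattern.lastIndex, // where the next call will begin
  });
}

console.log(found);
// [ { value: 'johnaaa@gmail.com', start: 0, resumeAt: 17 },
//   { value: 'annaaa@gmail.com', start: 34, resumeAt: 50 } ]
console.log(globalPattern.lastIndex); // 0, reset after the failed last call
```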
The test() Method
The test() method is similar to exec() except that it doesn't return an array containing information, but a simple true or false. It performs the same search as exec() and if a pattern is matched with a string, it returns true. Otherwise, it returns false:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "i");
let results = regex.test("[email protected]");
console.log(results); // Output: true
results = regex.test("[email protected]");
console.log(results); // Output: false
This method can't return null, and you can use the results to dictate further conditional logic.
The test() method also updates lastIndex when the g flag is set, just like exec(). As a consequence, if you test the same string twice with the same regex, you'll get different results:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // Remember the 'g' flag when working with multiple results
let results = regex.test("[email protected]");
console.log(results); // Output: true
results = regex.test("[email protected]");
console.log(results); // Output: false
The reason we get false the second time is that lastIndex has moved to the end of the string, so when the search starts the second time - it starts at the end of the string - and there is nothing to match with. Thus, it returns false and resets lastIndex to 0.
For predictable results with test(), either leave out the g flag or reset regex.lastIndex to 0 before testing another string.
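A short sketch of that behavior, using a made-up address and illustrative variable names, compares a regex with the g flag against one without it:

```javascript
const address = "johnaaa@gmail.com"; // hypothetical input

// With the g flag, the regex carries state between calls.
const statefulPattern = /a{3}@gmail\.com/g;
const firstCall = statefulPattern.test(address);     // match found
const indexAfterFirst = statefulPattern.lastIndex;   // end of the match
const secondCall = statefulPattern.test(address);    // search starts at 17

// Without the g flag, lastIndex is ignored and every call starts at 0.
const statelessPattern = /a{3}@gmail\.com/;
const repeated = [statelessPattern.test(address), statelessPattern.test(address)];

console.log(firstCall, indexAfterFirst, secondCall); // true 17 false
console.log(repeated);                               // [ true, true ]
```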
Usage of test() in a loop with the g flag works the same way as with exec(), except that you'll be getting true/false instead of the matches. In practice, this is not commonly used, unless you're keeping track of the number of matched strings.
The match() Method
The match() method is the first of the String methods we'll be looking at - and it works well with Regular Expressions.
It takes a RegEx as an argument and returns an array of matches or null if there are none. Without the g flag, the result is the same as that of the exec() method of a RegExp instance; with the g flag, it's a plain array of all the matched strings:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // Note the 'g' flag
let string = "[email protected]";
let resultArray = string.match(regex);
console.log(resultArray); // Output: [ '[email protected]' ]
Note: You can alternatively use a RegEx literal here instead to shorten the code, as it's compiled to a RegExp instance anyway:
let string = "[email protected]";
let resultArray = string.match(/\w+a{3}@gmail\.com/);
console.log(resultArray); // Output: [ '[email protected]' ]
To get a better feel of the method, let's change the RegEx to /[a-z]/ - to match only lowercase letters:
let regex = new RegExp(/[a-z]/, "g"); // Note the 'g' flag
let string = "[email protected]";
let resultArray = string.match(regex);
console.log(resultArray);
This results in an array of all the lowercase characters in the string:
["s","o","m","e","m","a","i","l","a","a","a","g","m","a","i","l","c","o","m"]
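The difference the g flag makes to the return value of match() can be sketched as follows. The sentence is made up, and matchAll() (ES2020) is shown as one option for when you need capture groups for every match.

```javascript
const sentence = "aaa@gmail.com and bbb@gmail.com"; // hypothetical input

// Without g: one match with details, like exec().
const single = sentence.match(/(\w+)@gmail\.com/);
console.log(single[0], single[1], single.index); // aaa@gmail.com aaa 0

// With g: every matched string, but no groups or indexes.
const all = sentence.match(/(\w+)@gmail\.com/g);
console.log(all); // [ 'aaa@gmail.com', 'bbb@gmail.com' ]

// matchAll() requires the g flag and keeps the groups for every match.
const users = [...sentence.matchAll(/(\w+)@gmail\.com/g)].map((m) => m[1]);
console.log(users); // [ 'aaa', 'bbb' ]
```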
The search() Method
The search() method searches for a match between the passed pattern and the string. If a match is found, its index is returned. Otherwise, the method returns -1:
let regex = new RegExp(/\w+a{3}@gmail\.com/, "g"); // search() ignores the 'g' flag and always reports the first match
let string = "some string that isn't matched [email protected]";
let result = string.search(regex);
console.log(result); // Output: 31
string = "It should return -1 with this string";
result = string.search(regex);
console.log(result); // Output: -1
This method should be used when you want to find out whether a match is found and at which index. If you only want to know whether a match is found, you should use test().
You can also extract this info from the exec() method by reading the index property of the returned array, but search() returns a more easily parsable result.
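For comparison, this sketch reads the same index through search() and through exec(). The haystack string and variable names are made up for the example.

```javascript
const haystack = "id: 42, id: 77"; // hypothetical input

const viaSearch = haystack.search(/\d+/);
const viaSearchGlobal = haystack.search(/\d+/g); // g changes nothing here
const viaExec = /\d+/.exec(haystack).index;      // would throw if exec() returned null
const noMatch = haystack.search(/x/);

console.log(viaSearch, viaSearchGlobal, viaExec, noMatch); // 4 4 4 -1
```

Note that search() reports a miss as -1, while exec() returns null, so the exec() variant needs a null check in real code.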
The replace() Method
The replace(to_replace, replace_with) method returns a new string in which the pattern matching to_replace is replaced with replace_with.
It doesn't change the original string as strings are immutable.
The to_replace argument can either be a string or a RegExp instance. If it's a string or a RegExp without the g flag, only the first occurrence will be replaced, whilst a RegExp with the g flag replaces every single one.
For the purpose of this method, let's replace gmail.com with yahoo.com.
let regex = new RegExp(/gmail\.com/, "g"); // Note the 'g' flag
let string = "[email protected]";
let result = string.replace(regex, "yahoo.com");
console.log(result); // Output: [email protected]
string = "[email protected] [email protected]"
result = string.replace(regex, "yahoo.com");
console.log(result); // Output: [email protected] [email protected]
console.log(string); // Output: [email protected] [email protected]
As you can see in the second example, all occurrences matching the regex are replaced with yahoo.com. Also, the original string is left unchanged.
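The sketch below puts the variants side by side: a string pattern, a regex without and with the g flag, a $1 back-reference, and a replacer function. The contacts string and variable names are assumptions made for this example.

```javascript
const contacts = "a@gmail.com b@gmail.com"; // hypothetical input

// A string pattern or a regex without g replaces only the first occurrence.
const byString = contacts.replace("gmail.com", "yahoo.com");
const byRegex = contacts.replace(/gmail\.com/, "yahoo.com");

// The g flag replaces every occurrence.
const byGlobalRegex = contacts.replace(/gmail\.com/g, "yahoo.com");

// $1 inserts the first capturing group into the replacement.
const withGroup = contacts.replace(/(\w+)@gmail\.com/g, "$1@yahoo.com");

// A function receives the match and its groups and returns the replacement.
const withFunction = contacts.replace(/(\w+)@/g, (whole, user) => user.toUpperCase() + "@");

console.log(byString);      // a@yahoo.com b@gmail.com
console.log(byRegex);       // a@yahoo.com b@gmail.com
console.log(byGlobalRegex); // a@yahoo.com b@yahoo.com
console.log(withGroup);     // a@yahoo.com b@yahoo.com
console.log(withFunction);  // A@gmail.com B@gmail.com
```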
Conclusion
Even though Regular Expressions can be difficult to read and hard to fathom at first, working with them and constructing them can be quite fun once you understand them.
JavaScript makes testing and matching as easy as possible; all you need to do is learn the regular expressions.
However, with the tools available today and with sites similar to the ones listed at the beginning of the guide, you can quite easily get around actually learning all the rules of Regular Expressions.
In this guide, we've covered:
- The RegExp Class - a class whose object is used to represent a regular expression
- The exec() Method - which searches for a regex in a string and returns an array describing the match (with additional information).
- The test() Method - which only tests if there's a match in a string and returns true/false.
- The match() Method - defined in the String class, returns an array of matches (without additional information when the g flag is used).
- The search() Method - defined in the String class, returns the index of the first match found.
- The replace() Method - defined in the String class, replaces matches of a RegExp with a string.
Probably the best way to practice regular expressions is to write and test out patterns for email and password validation.