Validate Email Addresses with Regular Expressions in JavaScript

Introduction

For web developers, validating user input in forms is crucially important. Since forms are where the data sent between the client and the server originates, you want to make sure everything starts off on the right foot, rather than leaving all of the validation to the server end, which is often a bigger hassle than catching problems on the front-end.

Additionally, input can be malicious, in which case you have to take security into consideration as well. Validating input on the front-end catches many problems early, but because it can be bypassed, it should complement server-side checks rather than replace them.

In this article, we'll take a look at how to validate email addresses in JavaScript using Regular Expressions.

Regular Expressions in JavaScript

For anyone unfamiliar with regular expressions, or anyone who needs a quick reminder, here's a brief overview!

Regular expressions are sequences of characters, including special metacharacters, that denote a pattern. These patterns can be of various kinds: a mix of letters and digits, special characters, and even characters from different languages. Regular expressions are commonly abbreviated as RegEx or RegExp.

With metacharacters, quantifiers, groups and escape characters, you can express just about any pattern. For instance, this expression denotes a sequence of one or more characters, each of which is a letter from A-Z (lowercase or uppercase) or a digit, in any combination:

^([A-Za-z]|[0-9])+$

This is also known as checking whether a sequence is alphanumeric.
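In JavaScript, the same pattern can be written as a regular expression literal and checked with RegExp.prototype.test(). A minimal sketch; isAlphanumeric is a hypothetical helper name used only for illustration:

```javascript
// The pattern from above: one or more ASCII letters or digits, anchored at both ends
const alphanumeric = /^([A-Za-z]|[0-9])+$/;

// Hypothetical helper: true only if the whole string is alphanumeric
function isAlphanumeric(text) {
  return alphanumeric.test(text);
}

console.log(isAlphanumeric("abc123"));  // true
console.log(isAlphanumeric("abc 123")); // false (contains a space)
console.log(isAlphanumeric(""));        // false (+ requires at least one character)
```

Because of the ^ and $ anchors, the entire string has to match. Without them, test() would return true for any string that merely contains a letter or digit somewhere.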

For the rest of the guide, we will assume that you are somewhat familiar with Regular Expressions.

If you aren't familiar with Regular Expressions and would like to learn more - read our Guide to Regular Expressions and Matching Strings in JavaScript!

Matching Email Formats in JavaScript with Regular Expressions

First and foremost, no regular expression matches every possible valid email address. However, there are expressions that match 99.9% of them. When validating emails, or really any input, a good practice that makes it much more likely the input will match your RegEx is to limit the user input upfront.

For example, you might require gmail.com or yahoo.com addresses and reject unsupported providers outright (though this approach runs into issues of scalability and of keeping the list up to date).
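Restricting providers doesn't necessarily need a regular expression. A minimal sketch, assuming a hypothetical ALLOWED_DOMAINS set and hasAllowedDomain helper: it takes everything after the last @ and looks it up in the allow-list. The addresses are made up for illustration.

```javascript
// Hypothetical allow-list of supported providers
const ALLOWED_DOMAINS = new Set(["gmail.com", "yahoo.com"]);

// Hypothetical helper: true if the part after the last "@" is an allowed domain.
// Domain names are case-insensitive, so the comparison is done in lowercase.
function hasAllowedDomain(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0) {
    return false; // no "@", or nothing in front of it
  }
  const domain = address.slice(at + 1).toLowerCase();
  return ALLOWED_DOMAINS.has(domain);
}

console.log(hasAllowedDomain("jane@gmail.com"));   // true
console.log(hasAllowedDomain("jane@Yahoo.com"));   // true
console.log(hasAllowedDomain("jane@example.org")); // false
console.log(hasAllowedDomain("gmail.com"));        // false
```

This only checks the domain, so it would complement a format check rather than replace it, and the set has to be maintained by hand.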

This raises another question:

What is the format of an email?

The definition is surprisingly loose, as we'll shortly see, and you can go simple or robust with it. In this guide, we'll cover the most general regular expressions for validating email, as well as more specific ones.

Before we get into the code, let's preview the email formats that we will be looking into:

  • General format - (something)@(some_domain).(some_toplevel_domain)
  • Specific hosts or domains - referring to a specific type of domain, or top-level domain
  • RFC 5322 - Internet Message Format, covering 99.9% of email addresses

General Email Format Regular Expression

After many attempts at validating with robust Regular Expressions, many engineers fall back to the good old "general" format that works most of the time. Whether this is a good thing or not is debatable.

What does an email address entail? It has to have an @ symbol, as well as some string preceding it and some string following it. Additionally, the second string needs to contain a dot, followed by 2-3 more characters. (This is a simplification: many top-level domains, such as .info or .museum, are longer.)

Put together, this is the rough sketch:

(randomString)@(randomString2).(2-3 characters)

This follows the general intuition of these emails being valid:

[email protected]
[email protected]
[email protected]

With that in mind, to generally validate an email address in JavaScript via Regular Expressions, we translate the rough sketch into a RegExp:

// '\\.' in the string becomes \. in the pattern, i.e. a literal dot
let regex = new RegExp('[a-z0-9]+@[a-z]+\\.[a-z]{2,3}');

let testEmails = ["notanemail.com", "[email protected]", "[email protected]", "[email protected]"];

testEmails.forEach((address) => {
    console.log(regex.test(address))
});

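The part before the @ is one or more lowercase letters or digits, such as workingemail or john1. Dots aren't in that character class, but since the pattern has no ^ and $ anchors, test() only needs a match somewhere in the string: for john.doe.1@gmail.com, the substring 1@gmail.com is enough.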

This results in:

false
true
true
false
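Two details of this pattern are easy to trip over in JavaScript. Inside an ordinary string literal, \. collapses to a plain ., so a pattern passed to new RegExp needs a doubled backslash (\\.) or has to be written as a regex literal; otherwise the dot matches any character. And without ^ and $ anchors, test() succeeds as soon as any substring matches. A minimal sketch comparing the variants; the addresses are made up for illustration:

```javascript
// In a string literal, "\." is just ".", so this dot matches ANY character.
const looseDot = new RegExp("[a-z0-9]+@[a-z]+\.[a-z]{2,3}");
// Doubling the backslash passes \. to the RegExp: a literal dot.
const unanchored = new RegExp("[a-z0-9]+@[a-z]+\\.[a-z]{2,3}");
// A regex literal avoids the double escaping; ^ and $ force a full match.
const anchored = /^[a-z0-9]+@[a-z]+\.[a-z]{2,3}$/;

console.log(looseDot.test("john@example-com"));       // true: the dot matched "-"
console.log(unanchored.test("john@example-com"));     // false
console.log(unanchored.test("!!john@example.com!!")); // true: a substring matches
console.log(anchored.test("!!john@example.com!!"));   // false
console.log(anchored.test("john@example.com"));       // true
```

For validation, the anchored form is usually what you want, since the user's entire input should be an address.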

Will this always work? No. Some malformed emails will pass through. You also can't perform spam detection with this Regular Expression, so an email address that intuitively looks like spam passes just fine:

console.log(regex.test("[email protected]")); // true

That said, even the most robust and complex email validation expressions fail here: they validate the form of an address, not whether it exists.

Technically speaking, [email protected] could exist, so who are we to say it doesn't based on just a Regular Expression?

Specific Email Addresses

Lowering the amount of uncertainty helps. The less uncertainty there is, the fewer restrictions you need to impose with an expression. This makes validating specific email addresses more accurate with the same general formats we've just seen, since you don't have to cover as many edge cases.

Let's take a look at some general cases referring to the domain and top-level domain.

Validating an Email Address Domain with JavaScript

Say you work at a company called Stack Abuse. All staff members have an email address ending in @stackabuse.com, and only the user string changes. The rough sketch would look like this:

(randomString)@stackabuse.com

This makes our task a lot easier, as some of the variables such as the domain name and organization type are now fixed. These two are the typical problem-causing variables, as domain names can vary wildly.

Validating an email address on a specific domain thus becomes an easy task with the RegExp class:

let regex = new RegExp('[a-z0-9]+@stackabuse\\.com');

let testEmails = ["notanemail.com", "[email protected]", "[email protected]"];

testEmails.forEach((address) => {
    console.log(regex.test(address))
});

This results in:

false
true
false

With this approach, you can swap in any literal string to suit your needs. As always, the first part of the regular expression can be extended to also match uppercase letters and special characters such as + or _.
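Because the domain is just a literal string, the expression can also be built from a domain name at runtime, as long as characters with a special meaning in regular expressions, such as the dot, are escaped. A minimal sketch: escapeRegExp and domainRegex are hypothetical helper names, the local part is widened to letters, digits, ., +, _ and -, and the i flag makes matching case-insensitive. The addresses are made up for illustration.

```javascript
// Hypothetical helper: escape characters that are special in a RegExp pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match addresses on exactly one domain.
// Local part: letters, digits, ".", "_", "+" and "-" (any case, via the "i" flag).
function domainRegex(domain) {
  return new RegExp(`^[a-z0-9._+-]+@${escapeRegExp(domain)}$`, "i");
}

const stackAbuse = domainRegex("stackabuse.com");

console.log(stackAbuse.source); // ^[a-z0-9._+-]+@stackabuse\.com$
console.log(stackAbuse.test("John.Doe+news@stackabuse.com")); // true
console.log(stackAbuse.test("john@stackabuse.com.evil.org")); // false
console.log(stackAbuse.test("john@stackabusexcom"));          // false
```

Escaping matters here: an unescaped dot in stackabuse.com would also accept stackabusexcom.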

Validating Email Address Top-Level Domains in JavaScript

This case is similar to the previous one, except that we restrict the end of the domain rather than the whole domain.
Top-level domains can be any of .com, .org, .edu, .eu, .us, etc. Let's match only emails whose domain ends in .edu followed by another two- or three-letter label, such as [email protected]. This form (for example, .edu.au) is common outside the US; US institutions use .edu on its own, which this particular expression doesn't match.

let regex = new RegExp('[a-z0-9]+@[a-z]+\\.edu\\.[a-z]{2,3}');

let testEmails = ["notanemail.com", "[email protected]", "[email protected]"];

testEmails.forEach((address) => {
    console.log(regex.test(address))
});

The invalid email fails, and so does the valid one, because it doesn't contain .edu in its domain; the made-up Yale address, however, passes:

false
false
true
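This expression only accepts .edu when another two- or three-letter label follows it, so an address ending in plain .edu is rejected. If both forms should pass, a minimal sketch (the addresses are made up for illustration) makes the trailing label optional and adds ^ and $ anchors:

```javascript
// ".edu", optionally followed by a 2-3 letter label (e.g. a country code).
// ^ and $ make the whole address match, not just a substring.
const eduRegex = /^[a-z0-9]+@[a-z]+\.edu(\.[a-z]{2,3})?$/;

const addresses = [
  "student@university.edu",    // .edu on its own
  "student@university.edu.au", // .edu plus a country code
  "student@university.com",    // not an .edu address
];

addresses.forEach((address) => {
  console.log(address, eduRegex.test(address));
});
```

The first two addresses print true and the last prints false.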

RFC 5322 Format

RFC 5322 defines the Internet Message Format, the classic format of an email message, including the syntax of email addresses. The RFC only dictates what should be allowed; it isn't an expression itself.

There are multiple expressions that implement its rules, and these can get pretty complex.

If implemented correctly, an RFC 5322-compliant regular expression should validate 99.99% of valid email addresses.

A short-hand version is:

// Regex literal (no string escaping needed); ^ and $ require a full match
let regex = /^([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|"([\]!#-[^-~ \t]|(\\[\t -~]))+")@([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|\[[\t -Z^-~]*\])$/;

An extended version that covers additional edge cases is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

These expressions aren't particularly easy to comprehend unless you break them down into groups and spend some time reading through them. An easier way is to visualize them.

*Image and claim of accuracy are courtesy of EmailRegex.com.
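To use the extended expression in JavaScript, it can be written as a regex literal (so no string escaping is needed), wrapped in ^ and $ so the whole input must match, and given the i flag, since its character classes only list lowercase letters. A sketch under those assumptions; isRfc5322Address is a hypothetical helper name and the addresses are made up for illustration:

```javascript
// The extended expression as a regex literal, anchored with ^...$.
// The "i" flag is needed because its classes only list lowercase a-z.
const RFC5322_EXTENDED =
  /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$/i;

// Hypothetical helper
function isRfc5322Address(address) {
  return RFC5322_EXTENDED.test(address);
}

console.log(isRfc5322Address("John.Doe@Example.com"));  // true
console.log(isRfc5322Address("user+tag@example.com"));  // true
console.log(isRfc5322Address("user@[192.168.0.1]"));    // true (IP address literal)
console.log(isRfc5322Address("john..doe@example.com")); // false (consecutive dots)
console.log(isRfc5322Address("notanemail.com"));        // false
```

The i flag matters: without it, an address like John.Doe@Example.com would be rejected.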

With that in mind, let's use the short-hand expression to validate a couple of addresses:

// Regex literal (no string escaping needed); ^ and $ require a full match
let regex = /^([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|"([\]!#-[^-~ \t]|(\\[\t -~]))+")@([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|\[[\t -Z^-~]*\])$/;

let testEmails = ["notanemail.com", "[email protected]", "[email protected]"];

testEmails.forEach((address) => {
    console.log(regex.test(address))
});

This results in:

false
true
true

You can go ahead and test this expression interactively through a beautiful interface at regex101.

Conclusion

In conclusion, there is no single "proper" way to validate email addresses using regular expressions. There is, however, a wrong way: failing to reject the cases that shouldn't pass.

If you want to make sure that almost everything is covered, use an expression based on the RFC 5322 format.

Last Updated: May 12th, 2023

© 2013-2024 Stack Abuse. All rights reserved.
