Introduction
Regular Expressions (RegEx) are a powerful tool that helps us match patterns in a flexible, dynamic and efficient way, and perform operations based on the results.
In this short guide, we'll take a look at how to validate email addresses in Java with Regular Expressions.
If you'd like to read more about Regular Expressions and the regex package, read our Guide to Regular Expressions in Java!
Validating Email Addresses in Java
Validating email addresses isn't hard. There's not much diversity in the email world, but there are still a few ways you can go about it.
Regular Expressions are expressive so you can add more and more constraints based on how you want to validate the emails, just by adding more matching rules.
Typically, you can boil things down to a pretty simple RegEx that will fit most email address patterns.
You can disregard the organization type (.com, .org, .edu), host (gmail, yahoo, outlook), or other parts of an email address, or even enforce them.
In the following sections, we'll take a look at a few different Regular Expressions, and which email formats they support or reject.
General-Purpose Email Regular Expression
A general purpose email format is:
name@host.organizationtype
The organizationtype is, by convention, usually 3 characters - edu, org, com, etc. There are quite a few hosts, even custom ones, so really, this could be any sequence of characters - even aaa.
That being said, for pretty loose validation (but still a fully valid one) we can check whether the String contains 4 groups:
- Any sequence of characters - name
- The @ symbol
- Any sequence of characters - host
- Any 2-3-character letter sequence - organization type (io, com, etc).
This nets us a Regular Expression that looks like:
(.*)(@)(.*)(.[a-z]{2,3})
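To get a feel for how loose this expression is, here is a minimal sketch that runs it against a few inputs. The class name, the helper method and the sample addresses are made up for illustration and are not part of the original guide.

```java
import java.util.regex.Pattern;

class LooseEmailDemo {
    // The loose four-group expression from the text above, unchanged.
    static final Pattern LOOSE = Pattern.compile("(.*)(@)(.*)(.[a-z]{2,3})");

    // Hypothetical helper: true when the whole input matches the expression.
    static boolean looksLikeEmail(String input) {
        return LOOSE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        String[] samples = {
            "someone@gmail.com",  // accepted
            "someone.gmail.com",  // rejected: no @ symbol
            "someone@gmail",      // accepted: the unescaped . matches any character
            "someone@host.c0"     // rejected: does not end in 2-3 lowercase letters
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeEmail(sample));
        }
    }
}
```

Note that someone@gmail passes: the . in the last group is not escaped, so it stands for any character rather than a literal dot.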
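To additionally make sure the name and host don't start or end with whitespace, we can add a few \S checks (the .* in the middle still allows whitespace inside each part; replacing it with \S* would forbid whitespace entirely):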
(\S.*\S)(@)(\S.*\S)(.\S[a-z]{2,3})
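If whitespace should be rejected everywhere and the dot before the organization type should be a literal dot, one possible tightening is sketched below. The stricter expression is an assumption of this example, not the expression the guide uses, and the class and sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

class WhitespaceCheckDemo {
    // The expression from the text above, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("(\\S.*\\S)(@)(\\S.*\\S)(.\\S[a-z]{2,3})");

    // Assumed stricter variant: \S+ forbids whitespace anywhere in the name
    // and host, and \. requires a literal dot before the 2-3 letters.
    static final Pattern STRICTER =
            Pattern.compile("(\\S+)(@)(\\S+)(\\.[a-z]{2,3})");

    public static void main(String[] args) {
        String[] samples = {
            "someone@gmail.com",   // accepted by both
            "some one@gmail.com",  // original accepts the inner space, stricter rejects it
            "someone@gmailxcom",   // original accepts (no literal dot needed), stricter rejects
            "a@gmail.com"          // original rejects (\S.*\S needs two characters), stricter accepts
        };
        for (String sample : samples) {
            System.out.println(sample
                    + " -> original: " + ORIGINAL.matcher(sample).matches()
                    + ", stricter: " + STRICTER.matcher(sample).matches());
        }
    }
}
```

Which behavior is preferable depends on the project; the rest of this guide keeps using the original expression.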
That being said, to validate an email address in Java, we can simply use the Pattern and Matcher classes:
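import java.util.regex.Matcher;
import java.util.regex.Pattern;

String email = "someone@gmail.com";
// Four capturing groups: name, the @ symbol, host and organization type
Pattern pattern = Pattern.compile("(\\S.*\\S)(@)(\\S.*\\S)(.\\S[a-z]{2,3})");
Matcher matcher = pattern.matcher(email);
// matches() succeeds only if the entire String fits the pattern
if (matcher.matches()) {
    System.out.println("Full email: " + matcher.group(0));
    System.out.println("Username: " + matcher.group(1));
    System.out.println("Hosting Service: " + matcher.group(3));
    // The last group starts with the character matched by the leading ., here the dot itself
    System.out.println("TLD: " + matcher.group(4));
}
This results in:
Full email: someone@gmail.com
Username: someone
Hosting Service: gmail
TLD: .com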
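Numbered groups are easy to mix up once an expression grows. As a small sketch, the same expression can be written with Java's named capturing groups; the group names user, host and tld, as well as the class name, are chosen here purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Same loose expression as above, with named groups instead of numbered
    // ones. The @ symbol no longer needs its own group.
    static final Pattern EMAIL = Pattern.compile(
            "(?<user>\\S.*\\S)@(?<host>\\S.*\\S)(?<tld>.\\S[a-z]{2,3})");

    public static void main(String[] args) {
        Matcher matcher = EMAIL.matcher("someone@gmail.com");
        if (matcher.matches()) {
            System.out.println("Username: " + matcher.group("user"));
            System.out.println("Hosting Service: " + matcher.group("host"));
            // Includes the leading dot, since the group starts with .
            System.out.println("TLD: " + matcher.group("tld"));
        }
    }
}
```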
Alternatively, you can use the built-in matches() method of the String class (which just uses a Pattern and Matcher anyway):
String email = "someone@gmail.com";
// String.matches() compiles the expression and matches the whole String against it
if (email.matches("(\\S.*\\S)(@)(\\S.*\\S)(.\\S[a-z]{2,3})")) {
    System.out.println(String.format("Email '%s' is valid!", email));
}
Which results in:
Email 'someone@gmail.com' is valid!
Awesome! This general-purpose RegEx will take care of most generic input and will check whether an email follows the general form described above.
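Because String.matches() compiles the expression on every call, code that validates many addresses may prefer to compile the Pattern once and reuse it. A minimal sketch, assuming a hypothetical EmailValidator helper class that is not part of the original guide:

```java
import java.util.regex.Pattern;

class EmailValidator {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern EMAIL =
            Pattern.compile("(\\S.*\\S)(@)(\\S.*\\S)(.\\S[a-z]{2,3})");

    // Hypothetical helper: treats null as invalid rather than throwing.
    static boolean isValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("someone@gmail.com")); // true
        System.out.println(isValid("not-an-email"));      // false
        System.out.println(isValid(null));                // false
    }
}
```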
For the most part - this will work quite well, and you won't need much more than this. You won't be able to detect spam emails with this, such as:
[email protected]
However, you will enforce a certain form.
Note: To enforce certain hosts or domains, simply replace the .* and/or .[a-z]{2,3} with actual values, such as gmail, io and .edu.
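As a rough sketch of that note, the helper below builds an expression that only accepts one host and one organization type. The class, the forHost method and its parameters are hypothetical; Pattern.quote() is used so the supplied values are treated as literal text rather than as RegEx syntax.

```java
import java.util.regex.Pattern;

class HostRestrictedEmail {
    // Hypothetical helper: accepts only addresses of the form <name>@<host>.<tld>.
    static Pattern forHost(String host, String tld) {
        return Pattern.compile(
                "(\\S.*\\S)(@)(" + Pattern.quote(host) + ")(\\." + Pattern.quote(tld) + ")");
    }

    public static void main(String[] args) {
        Pattern gmailOnly = forHost("gmail", "com");
        System.out.println(gmailOnly.matcher("someone@gmail.com").matches()); // true
        System.out.println(gmailOnly.matcher("someone@yahoo.com").matches()); // false
        System.out.println(gmailOnly.matcher("someone@gmail.org").matches()); // false
    }
}
```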
Robust Email Validation Regex
What does a robust email RegEx look like? Chances are - you won't like it, unless you enjoy looking at Regular Expressions, which isn't a particularly common hobby.
Long story short, this is what it looks like:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@
(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
|\[(?:(?:(2(5[0-5]|[0-4][0-9])
|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])
|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
- Adapted RegEx by bortzmeyer
This is the RFC5322-compliant Regular Expression that covers 99.99% of input email addresses.*
Explaining it with words is typically off the table, but visualizing it helps a lot:
[Image: visualization of the regular expression]
*Image and claim are courtesy of EmailRegex.com.
That being said, to create a truly robust email verification Regular Expression checker in Java, let's substitute the loose one with this:
String email = "someone@gmail.com";
// The RFC5322-style expression from above, with backslashes and quotes escaped for a Java String
Pattern pattern = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");
Matcher matcher = pattern.matcher(email);
if (matcher.matches()) {
    System.out.println(String.format("Email '%s' is valid!", matcher.group(0)));
}
Needless to say - this works:
Email 'someone@gmail.com' is valid!
This doesn't check whether the address actually exists (you can't verify that without trying to send an email to it), so that possibility always remains. And of course, even this regex will report that odd email addresses such as:
[email protected]
... are fully valid.
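One practical caveat: the character classes in this expression only list lowercase letters, so an address containing uppercase letters is rejected as written. A possible workaround, sketched here as an assumption rather than as part of the original expression, is to compile it with the Pattern.CASE_INSENSITIVE flag. The class name, the constants and the sample addresses are illustrative.

```java
import java.util.regex.Pattern;

class CaseInsensitiveEmailDemo {
    // The robust expression from above, split at the @ for readability only.
    static final String LOCAL_PART =
            "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
            + "|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]"
            + "|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")";
    static final String DOMAIN_PART =
            "(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
            + "|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}"
            + "(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"
            + "|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]"
            + "|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

    // As written in the guide: lowercase letters only.
    static final Pattern STRICT = Pattern.compile(LOCAL_PART + "@" + DOMAIN_PART);

    // Assumed variant: same expression, but letter case is ignored.
    static final Pattern RELAXED =
            Pattern.compile(LOCAL_PART + "@" + DOMAIN_PART, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String mixedCase = "Someone@Gmail.com";
        System.out.println(STRICT.matcher(mixedCase).matches());  // false
        System.out.println(RELAXED.matcher(mixedCase).matches()); // true
    }
}
```

Lowercasing the input before matching would be another option, at the cost of changing the value that gets stored or displayed.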
Conclusion
In this short guide, we've taken a look at how to perform email validation in Java with Regular Expressions.
Validation requirements typically depend on your specific project, but there are some loose, general-purpose forms you can enforce and match for.
We've built a simple general-purpose form which will work most of the time, followed by a much more robust Regular Expression based on RFC5322.