Introduction
Checking whether a String contains a substring is a common task in programming. Sometimes we want to split a String at a delimiter; other times we want to change the program flow when a String contains (or lacks) a certain substring, such as a command.
There are a few ways to do this in Java, and most of them have direct counterparts in other programming languages. A more flexible option is the Pattern class, which matches regular expressions rather than plain substrings and which we'll cover later in the article.
Alternatively, you can use Apache Commons and its helper class StringUtils, which builds many convenience methods on top of the core ones.
Core Java
String.contains()
The first and foremost way to check for the presence of a substring is the .contains() method. It's provided by the String class itself and needs no extra libraries.
The method accepts a CharSequence and returns true if the sequence is present in the String we call the method on:
String string = "Java";
String substring = "va";
System.out.println(string.contains(substring));
Running this would yield:
true
Note: The .contains() method is case-sensitive. If we tried looking for "Va" in our string, the result would be false.
When case doesn't matter, a common workaround is to convert both Strings to the same case before checking:
System.out.println(string.toLowerCase().contains(substring.toLowerCase()));
// OR
System.out.println(string.toUpperCase().contains(substring.toUpperCase()));
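Converting both Strings works, but it allocates two temporary copies, and the no-argument toLowerCase() and toUpperCase() use the default locale. As a minimal sketch of an alternative, assuming a hypothetical helper named containsIgnoreCase (it is not part of the JDK), String.regionMatches() can compare the two Strings position by position with case ignored:

```java
final class CaseInsensitiveContains {

    // Hypothetical helper, not a JDK method: case-insensitive substring check.
    static boolean containsIgnoreCase(String text, String part) {
        if (text == null || part == null) {
            return false;
        }
        int lastStart = text.length() - part.length();
        for (int i = 0; i <= lastStart; i++) {
            // regionMatches(ignoreCase, thisOffset, other, otherOffset, length)
            if (text.regionMatches(true, i, part, 0, part.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Java", "Va")); // true
        System.out.println(containsIgnoreCase("Java", "vb")); // false
    }
}
```

Returning false for null input is a design choice of this sketch, not something the core String methods do.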
String.indexOf()
The .indexOf() method is a bit lower-level than the .contains() method, and it's the underlying mechanism that enables the .contains() method to work.
It returns the index of the first occurrence of a character or substring within a String, and offers a few overloads to choose from:
indexOf(int ch)
indexOf(int ch, int fromIndex)
indexOf(String str)
indexOf(String str, int fromIndex)
We can either search for a single character with or without an offset or search for a String with or without an offset.
The method will return the index of the first occurrence if present, and -1 if not:
String string = "Lorem ipsum dolor sit amet.";
// The character argument is an int, so a Unicode code point works as well as a char literal
System.out.println(string.indexOf('i'));
System.out.println(string.indexOf('i', 8));
System.out.println(string.indexOf("dolor"));
System.out.println(string.indexOf("Lorem", 10));
Running this code will yield:
6
19
12
-1
- The first occurrence of i is in the word ipsum, 6 places from the start of the character sequence.
- The first occurrence of i with an offset of 8 (i.e. the search starts at the s of ipsum) is in the word sit, 19 places from the start.
- The first occurrence of the String dolor is 12 places from the start.
- And finally, there is no occurrence of Lorem with an offset of 10.
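The offset overloads are what make it possible to find every occurrence, not just the first. The sketch below assumes a hypothetical helper named indexesOf that calls indexOf(String, int) repeatedly, restarting one position past each hit:

```java
import java.util.ArrayList;
import java.util.List;

final class AllOccurrences {

    // Hypothetical helper: collects the start index of every occurrence of part in text.
    static List<Integer> indexesOf(String text, String part) {
        List<Integer> result = new ArrayList<>();
        if (part.isEmpty()) {
            return result; // an empty String would otherwise match at every position
        }
        int from = 0;
        int hit;
        while ((hit = text.indexOf(part, from)) != -1) {
            result.add(hit);
            from = hit + 1; // advance by one so overlapping matches are found too
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "Lorem ipsum dolor sit amet.";
        System.out.println(indexesOf(string, "i"));  // [6, 19]
        System.out.println(indexesOf(string, "or")); // [1, 15]
    }
}
```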
Ultimately, the .contains() method calls upon the .indexOf() method and checks that the result isn't -1. Calling .indexOf() directly therefore saves a single method call at most, which is negligible in practice. The real difference is the use case: .contains() answers a yes/no question, while .indexOf() also tells you where the match is.
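Knowing the position is what you need when splitting a String at a delimiter, as mentioned in the introduction. Here is a small sketch, assuming a hypothetical helper named splitOnce that cuts the input at the first delimiter only:

```java
final class SplitAtDelimiter {

    // Hypothetical helper: splits text at the first delimiter,
    // or returns the whole text as a single element if there is none.
    static String[] splitOnce(String text, String delimiter) {
        int at = text.indexOf(delimiter);
        if (at < 0) {
            return new String[] { text };
        }
        return new String[] {
            text.substring(0, at),
            text.substring(at + delimiter.length())
        };
    }

    public static void main(String[] args) {
        String[] parts = splitOnce("name=Java=17", "=");
        System.out.println(parts[0]); // name
        System.out.println(parts[1]); // Java=17
        System.out.println(splitOnce("no delimiter", "=").length); // 1
    }
}
```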
String.lastIndexOf()
As opposed to the .indexOf() method, which returns the first occurrence, the .lastIndexOf() method returns the index of the last occurrence of a character or String, with or without an offset:
String string = "Lorem ipsum dolor sit amet.";
// The character argument is an int, so a Unicode code point works as well as a char literal
System.out.println(string.lastIndexOf('i'));
System.out.println(string.lastIndexOf('i', 8));
System.out.println(string.lastIndexOf("dolor"));
System.out.println(string.lastIndexOf("Lorem", 10));
Running this code will yield:
19
6
12
0
Some may be a bit surprised by the results and say:
lastIndexOf('i', 8) should've returned 19, as that's the last occurrence of the character after the 8th character in the String.
What's worth noting is that the .lastIndexOf() method searches backward. The offset is still counted from the start of the String, but the search begins at that index and moves toward index 0.
So the expected output is indeed 6: starting at index 8 (the s of ipsum) and moving backward, the first i encountered is the one at index 6. The same reasoning explains why lastIndexOf("Lorem", 10) returns 0 rather than -1.
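To make the backward search concrete, the sketch below re-implements the character overload by hand. The helper name lastIndexFrom is hypothetical and exists only for illustration; in real code you would call lastIndexOf(int, int) directly:

```java
final class LastIndexOfDemo {

    // Hypothetical re-implementation for illustration: scan backward from fromIndex to 0.
    static int lastIndexFrom(String text, char ch, int fromIndex) {
        int start = Math.min(fromIndex, text.length() - 1);
        for (int i = start; i >= 0; i--) {
            if (text.charAt(i) == ch) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String string = "Lorem ipsum dolor sit amet.";
        System.out.println(string.lastIndexOf('i', 8));      // 6
        System.out.println(lastIndexFrom(string, 'i', 8));   // 6
        // Nothing at or before index 5 is an 'i', so the search fails.
        System.out.println(string.lastIndexOf('i', 5));      // -1
        System.out.println(lastIndexFrom(string, 'i', 5));   // -1
    }
}
```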
Pattern with Regex and Matcher
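The Pattern class is essentially a compiled representation of a regular expression. It's used alongside the Matcher class to match character sequences.
We first compile a regular expression into a Pattern. We then pass the text we want to search to the .matcher() method, which returns a Matcher instance bound to that input.
The .find() method scans the input for the next subsequence that matches the pattern. It returns true if one is found and false otherwise. Because .find() looks for a match anywhere in the input, the leading and trailing .* in the example are optional; they would only be required with .matches(), which has to match the entire input.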
Pattern pattern = Pattern.compile(".*" + "some" + ".*");
Matcher matcher = pattern.matcher("Here is some pattern!");
System.out.println(matcher.find());
This would yield:
true
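When the substring comes from user input, regex metacharacters such as . or * in it would be interpreted as pattern syntax. A minimal sketch, assuming a hypothetical helper named containsLiteral, wraps the search term in Pattern.quote() and optionally adds the CASE_INSENSITIVE flag:

```java
import java.util.regex.Pattern;

final class RegexContains {

    // Hypothetical helper: treats part as literal text, not as a regular expression.
    static boolean containsLiteral(String text, String part, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile(Pattern.quote(part), flags);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Here is some pattern!";
        System.out.println(containsLiteral(text, "some", false)); // true
        System.out.println(containsLiteral(text, "SOME", true));  // true
        System.out.println(containsLiteral(text, "s.me", false)); // false, the dot is literal here

        // matches() must cover the whole input, so it needs the surrounding ".*"
        System.out.println(Pattern.compile("some").matcher(text).matches());     // false
        System.out.println(Pattern.compile(".*some.*").matcher(text).matches()); // true
    }
}
```

For a plain, case-sensitive substring check, .contains() remains the simpler choice; a Pattern pays off when the same expression is reused many times or when real regex features are needed.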
Apache Commons
Due to its usefulness and prevalence in Java, many projects have Apache Commons included in the classpath. It's a great library with many useful features often used in production - and checking for substrings is no exception.
Apache Commons offers the StringUtils class with many helper methods for String manipulation, null-checking, etc. For this task, we can utilize any of the .contains(), .indexOf(), .lastIndexOf(), or .containsIgnoreCase() methods.
If the library isn't on your classpath yet, including it is as easy as adding a dependency to your pom.xml file if you're using Maven:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>{version}</version>
</dependency>
Or by adding it through Gradle:
implementation group: 'org.apache.commons', name: 'commons-lang3', version: '{version}'
StringUtils.contains()
The .contains() method is pretty straightforward and very similar to the core Java approach.
The only difference is that it's a static method: instead of calling it on the String we're checking, we pass the String we're searching in alongside the String we're searching for. It also returns false rather than throwing an exception when either argument is null:
String string = "Checking for substrings within a String is a fairly common task in programming.";
System.out.println(StringUtils.contains(string, "common task"));
Running this code will yield:
true
Note: This method is case-sensitive.
StringUtils.indexOf()
Naturally, the .indexOf() method also works very similarly to the core Java approach:
String string = "Checking for substrings within a String is a fairly common task in programming.";
// Search for first occurrence of 'f'
System.out.println(StringUtils.indexOf(string, 'f'));
// Search for first occurrence of 'f', skipping the first 12 elements
System.out.println(StringUtils.indexOf(string, 'f', 12));
// Search for the first occurrence of the "String" string
System.out.println(StringUtils.indexOf(string, "String"));
Running this code will yield:
9
45
33
StringUtils.indexOfAny()
The .indexOfAny() method accepts a vararg of characters or Strings, instead of a single one, allowing us to search for the first occurrence of any of the passed values:
String string = "Checking for substrings within a String is a fairly common task in programming.";
// Search for first occurrence of 'f' or 'n', whichever comes first
System.out.println(StringUtils.indexOfAny(string, 'f', 'n'));
// Search for the first occurrence of "String" or "for", whichever comes first
System.out.println(StringUtils.indexOfAny(string, "String", "for"));
Running this code will yield:
6
9
StringUtils.indexOfAnyBut()
The .indexOfAnyBut() method searches for the first occurrence of any character that's not in the provided set:
String string = "Checking for substrings within a String is a fairly common task in programming.";
// Search for first character outside of the provided set 'C' and 'h'
System.out.println(StringUtils.indexOfAnyBut(string, 'C', 'h'));
// Search for first character that doesn't appear in "Checking for" (the space counts as part of the set)
System.out.println(StringUtils.indexOfAnyBut(string, "Checking for"));
Running this code will yield:
2
13
StringUtils.indexOfDifference()
The .indexOfDifference() method compares two character sequences, and returns the index of the first differing character:
String s1 = "Hello World!";
String s2 = "Hello world!";
System.out.println(StringUtils.indexOfDifference(s1, s2));
Running this code will yield:
6
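To see what such a comparison involves, here is a rough plain-Java approximation. It is not the library's code; the class name is hypothetical, null handling is left out, and it assumes the same convention of returning -1 for equal inputs:

```java
final class FirstDifference {

    // Hypothetical plain-Java approximation; null handling is left out for brevity.
    static int indexOfDifference(CharSequence a, CharSequence b) {
        int shared = Math.min(a.length(), b.length());
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Either both are equal, or one is a prefix of the other.
        return a.length() == b.length() ? -1 : shared;
    }

    public static void main(String[] args) {
        System.out.println(indexOfDifference("Hello World!", "Hello world!")); // 6
        System.out.println(indexOfDifference("Hello", "Hello World!"));        // 5
        System.out.println(indexOfDifference("Hello", "Hello"));               // -1
    }
}
```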
StringUtils.indexOfIgnoreCase()
The .indexOfIgnoreCase() method will return the index of the first occurrence of a substring in a character sequence, ignoring its case. Unlike .indexOf(), it takes the search term as a character sequence rather than a single character:
String string = "Checking for substrings within a String is a fairly common task in programming.";
System.out.println(StringUtils.indexOf(string, 'c'));
System.out.println(StringUtils.indexOfIgnoreCase(string, "c"));
Running this code will yield:
3
0
StringUtils.lastIndexOf()
The .lastIndexOf() method works pretty much the same as the regular core Java method:
String string = "Lorem ipsum dolor sit amet.";
// The character argument is an int, so a Unicode code point works as well as a char literal
System.out.println(StringUtils.lastIndexOf(string, 'i'));
System.out.println(StringUtils.lastIndexOf(string, 'i', 8));
System.out.println(StringUtils.lastIndexOf(string, "dolor"));
System.out.println(StringUtils.lastIndexOf(string, "Lorem", 10));
Running this code will yield:
19
6
12
0
StringUtils.containsIgnoreCase()
The .containsIgnoreCase() method checks if a String contains a substring, ignoring the case:
String string = "Checking for substrings within a String is a fairly common task in programming.";
System.out.println(StringUtils.containsIgnoreCase(string, "cOmMOn tAsK"));
Running this code will yield:
true
StringUtils.containsOnly()
The .containsOnly() method checks if a character sequence contains only the specified values.
This can be a bit misleading, so another way to put it is - it checks if the character sequence is made up of only the specified characters. The allowed characters can be passed either as a String or as a vararg of characters:
String string = "Hello World!";
// The allowed set ends with a space, because the String contains one
System.out.println(StringUtils.containsOnly(string, "HleWord! "));
System.out.println(StringUtils.containsOnly(string, "wrld"));
Running this will yield:
true
false
The "Hello World!" String indeed is constructed of only the characters in the "HleWord! " sequence, including the space.
Note: Not all of the characters from the sequence need to be used in the string for the method to return true. What matters is that string doesn't contain a character that's not in the character sequence.
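The rule in the note can be sketched in plain Java as a loop that fails on the first character outside the allowed set. This is only an approximation with a hypothetical class name, not the library's implementation, and it skips null handling:

```java
final class OnlyAllowedChars {

    // Hypothetical plain-Java approximation; null handling is left out for brevity.
    static boolean containsOnly(CharSequence text, String allowed) {
        for (int i = 0; i < text.length(); i++) {
            if (allowed.indexOf(text.charAt(i)) < 0) {
                return false; // found a character outside the allowed set
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnly("Hello World!", "HleWord! ")); // true
        System.out.println(containsOnly("Hello World!", "HleWord!"));  // false, the space is not allowed
        System.out.println(containsOnly("Hello World!", "wrld"));      // false
    }
}
```

The second call shows how easy it is to forget whitespace in the allowed set.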
StringUtils.containsNone()
The .containsNone() method checks if the String contains any of the "forbidden" characters from a set. If it does, false is returned; otherwise the result is true:
String string = "Hello World!";
System.out.println(StringUtils.containsNone(string, "xmt"));
System.out.println(StringUtils.containsNone(string, "wrld"));
Running this code yields:
true
false
StringUtils.containsAny()
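And finally, the .containsAny() method returns true if a character sequence contains any of the passed characters, given either as a vararg of characters or as a String:
String string = "Hello World!";
System.out.println(StringUtils.containsAny(string, 'h', 'm'));
System.out.println(StringUtils.containsAny(string, "hell"));
This code would yield:
false
true
The first call returns false because the check is case-sensitive and the String contains neither a lowercase h nor an m. The second returns true because "hell" is treated as a set of characters, and both e and l occur in the String.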
Conclusion
In conclusion, there are many ways to check for a substring in a String. The core Java approach will be enough in most cases, though if you need to check against more than a single condition, Apache Commons is a real time-saver.
In many cases, writing logic of your own for a method such as .indexOfAnyBut() would be tedious and simply redundant. Since many projects already have Apache Commons on the classpath, it's likely that you can simply use the methods provided by the StringUtils class.