Introduction
We often need to split a string at a specific character or substring to extract useful information from it.
For example, we might want to separate the country code from a phone number, or break up a line of data imported from a CSV file.
In this article, we'll cover how to split a String in Java.
The split() Method (Without a Limit)
This method takes one String parameter, in regular expression (regex) format, and splits the string around the matches of that regular expression.
The syntax for this method is:
String[] split(String regex)
Where the regex parameter represents the delimiter, i.e. the pattern on which we'll split our string. Keep in mind that this parameter doesn't need to be anything complicated; Java simply provides the option of using regular expressions.
For example, let's see how we can split this String into two separate names:
String myString = "Jane-Doe";
String[] splitString = myString.split("-");
We can simply use a character/substring instead of an actual regular expression. Of course, certain characters have a special meaning in regex, and we need to escape them if we want their literal value.
Once the string is split, the result is returned as an array of Strings, which appear in the same order as in the original string. To retrieve the separate names, we can access each element:
System.out.println(splitString[0]);
System.out.println(splitString[1]);
This results in:
Jane
Doe
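Indexing into the result directly assumes the delimiter was actually present. If it isn't, split() returns a one-element array holding the whole input, and reading splitString[1] would throw an ArrayIndexOutOfBoundsException. A minimal sketch of a guarded lookup could look like this; the class name SafeSplitDemo and the helper secondPartOrEmpty are hypothetical names made up for illustration:

```java
class SafeSplitDemo {
    // Hypothetical helper: returns the second piece of the split,
    // or "" if the split produced only one piece.
    static String secondPartOrEmpty(String fullName) {
        String[] parts = fullName.split("-");
        // With no delimiter in the input, parts has exactly one element.
        return parts.length > 1 ? parts[1] : "";
    }

    public static void main(String[] args) {
        System.out.println(secondPartOrEmpty("Jane-Doe")); // Doe
        System.out.println(secondPartOrEmpty("Jane"));     // prints an empty line
    }
}
```

Checking the array length first is one option among several; depending on the use case, throwing a descriptive exception for malformed input may be more appropriate.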
Keep in mind, this method will split the string on all occurrences of the delimiter. For example, we can have a CSV formatted input:
String myString = "Jane,21,Employed,Software Engineer";
String[] splitString = myString.split(",");
for (String s : splitString) {
    System.out.println(s);
}
This results in:
Jane
21
Employed
Software Engineer
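Because the delimiter is a regular expression, it can also absorb formatting noise around the separator. The sketch below assumes an input where the commas may be surrounded by spaces (an invented variation of the example above) and uses the pattern \\s*,\\s* to split on a comma together with any adjacent whitespace. The class and method names are illustrative only:

```java
import java.util.Arrays;

class RegexDelimiterDemo {
    // Splits on a comma plus any whitespace directly before or after it.
    static String[] splitLoosely(String line) {
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] fields = splitLoosely("Jane , 21,Employed ,  Software Engineer");
        System.out.println(Arrays.toString(fields));
        // [Jane, 21, Employed, Software Engineer]
    }
}
```

Note that this only removes whitespace next to a comma; whitespace at the very start or end of the line would still need a trim() call.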
The split() Method (With a Limit)
Here, the method takes two parameters: the previously discussed regex, and an integer value denoting the limit. The limit parameter controls how many times the string is split.
The limit parameter can take one of three forms, i.e. it can be greater than, less than, or equal to zero. Let's take a look at what each of these situations represents:
- A positive limit - The String will be split up to a maximum of limit - 1 times. Beyond this, the rest of the string will be returned as the last element of the array, as it is, without splitting. The length of the returned array will always be less than or equal to limit.
- A negative limit - The String is split at the delimiter as many times as possible, ignoring the particular negative value set. The returned array includes any trailing empty strings.
- A limit of 0 - The String is again split as many times as possible, and there is no limit on the length of the resulting array. This works the same as calling the split() method with regex as the only argument, as seen earlier. In this case, trailing empty strings are not returned.
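The three cases above can be compared side by side by counting the pieces each limit produces. This is a small sketch with an invented input string and a hypothetical helper named pieces:

```java
class LimitFormsDemo {
    // Returns how many pieces split() produces for the given limit.
    static int pieces(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        System.out.println(pieces(input, 2));  // 2 -> "a" and "b,c,,"
        System.out.println(pieces(input, -1)); // 5 -> "a", "b", "c", "", ""
        System.out.println(pieces(input, 0));  // 3 -> "a", "b", "c"
    }
}
```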
Positive Limit Value
Let's take a look at some examples of using different limits. Firstly, a positive limit value:
String myString = "there,,are,more,than,three,commas,,,";
String[] splitStrings = myString.split(",", 4);
for (String string : splitStrings) {
    // Wrap each element in quotes so that empty strings are visible
    System.out.println(String.format("\"%s\"", string));
}
With a limit of 4, the String will be split at most three (limit - 1) times. This gives us an array with four elements (0..3), the last element being everything after the third split:
"there"
""
"are"
"more,than,three,commas,,,"
If we used a negative limit on this same String:
String myString = "there,,are,more,than,three,commas,,,";
String[] splitStrings = myString.split(",", -1);
for (String string : splitStrings) {
    // Wrap each element in quotes so that empty strings are visible
    System.out.println(String.format("\"%s\"", string));
}
The String will be split as many times as possible, and the trailing empty strings will be included in the array:
"there"
""
"are"
"more"
"than"
"three"
"commas"
""
""
""
The actual negative value we used isn't taken into consideration; we would get the same result if we used -150.
If we set the limit to 0, the String would again be split as many times as possible, but the resulting array wouldn't contain the trailing empty strings:
String myString = "there,,are,more,than,three,commas,,,";
// Equivalent to calling the split() method with only the regex parameter
String[] splitStrings = myString.split(",", 0);
for (String string : splitStrings) {
    // Wrap each element in quotes so that empty strings are visible
    System.out.println(String.format("\"%s\"", string));
}
This would give us:
"there"
""
"are"
"more"
"than"
"three"
"commas"
Note on Special Characters
As we mentioned earlier, the regex parameter passed as the delimiter in the split() method is a regular expression. We have to make sure to escape special characters if we want to use their literal value as a delimiter. For example, the * character means "zero or more instances of the preceding element".
There are 12 such characters in regex. These are: \, ^, $, ., |, ?, *, +, (, ), [, {. Each of them has a special meaning in regex syntax.
If we want to split a String at one of these characters, we have to escape it in the method argument. One way to do this is to use a backslash \. For example:
string.split("\\|");
This splits the string variable at the | character. We use two backslashes here since we first need to escape the Java meaning of the backslash, so that a single backslash reaches the regex engine and is applied to the | character.
Instead of this, we can use a regex character set. This refers to putting the special characters to be escaped inside square brackets. This way, most special characters are treated as normal characters (a few, such as \ and ^, keep a special meaning inside brackets). For example, we could use a | as a delimiter by saying:
string.split("[|]");
Yet another way to escape special characters is to use Pattern.quote(), from the java.util.regex.Pattern class:
string.split(Pattern.quote("|"));
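To see that the three escaping approaches behave the same, they can be run against one input. The sketch below uses an invented input string and hypothetical method names; it also runs the unescaped form, which is treated as a regex alternation and not as a literal pipe, so it does not produce the two expected names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapeDemo {
    static String[] withBackslash(String s) { return s.split("\\|"); }
    static String[] withCharClass(String s) { return s.split("[|]"); }
    static String[] withQuote(String s)     { return s.split(Pattern.quote("|")); }

    public static void main(String[] args) {
        String input = "Jane|Doe";
        System.out.println(Arrays.toString(withBackslash(input))); // [Jane, Doe]
        System.out.println(Arrays.toString(withCharClass(input))); // [Jane, Doe]
        System.out.println(Arrays.toString(withQuote(input)));     // [Jane, Doe]

        // Unescaped, "|" is an alternation between two empty patterns,
        // so it matches between characters and not at the pipe itself.
        System.out.println(Arrays.toString(input.split("|")));
    }
}
```

Pattern.quote() is arguably the safest choice when the delimiter comes from user input or configuration, since it escapes the whole string regardless of which special characters it contains.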
Conclusion
The split() method of the Java String class is a very useful and often used tool. Most data, especially data obtained from reading files, requires some amount of preprocessing, such as splitting the string, to obtain meaningful information from it.
In this article, we've gone over how to split strings in Java.