How to Split a String in Java

Introduction

Oftentimes, we are faced with a situation where we need to split a string at some specific character or substring, to derive some useful information from it.

For example, we might want to split a phone number at the country code, or split a line of data imported from a CSV file into its individual fields.

In this article, we'll cover how to split a String in Java.

The split() Method (Without a Limit)

This version of the method takes a single String parameter, a regular expression (regex), and splits the string around the matches of that expression.

The syntax for this method is:

String[] split(String regex)

Here, the regex parameter represents the delimiter, i.e. the pattern at which the string is split. Keep in mind that this parameter doesn't need to be anything complicated; Java simply provides the option of using regular expressions.

For example, let's see how we can split this String into two separate names:

String myString = "Jane-Doe";
String[] splitString = myString.split("-");

We can simply use a character or substring instead of an elaborate regular expression. However, certain characters have a special meaning in regex, and we need to escape them if we want their literal value.

Once the string is split, the result is returned as an array of Strings. Strings in the returned array appear in the same order as in the original string.

To retrieve the separate names, we can access each element of the array:

System.out.println(splitString[0]);
System.out.println(splitString[1]);

This results in:

Jane
Doe

Keep in mind that this method splits the string at every occurrence of the delimiter. For example, we can have a CSV-formatted input:

String myString = "Jane,21,Employed,Software Engineer";
String[] splitString = myString.split(",");

for (String s : splitString) {
    System.out.println(s);
}

This results in:

Jane
21
Employed
Software Engineer
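Because the delimiter is a regular expression, it can also describe a pattern rather than a single fixed character. As a minimal sketch, assuming an input in which the commas are surrounded by inconsistent whitespace (the class name RegexDelimiterExample and the helper splitFields are made up for this illustration), we could split on a comma together with any whitespace around it:

```java
import java.util.Arrays;

class RegexDelimiterExample {

    // Hypothetical helper: splits on a comma plus any whitespace around it
    static String[] splitFields(String line) {
        // \\s* matches zero or more whitespace characters
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String line = "Jane , 21,Employed ,  Software Engineer";
        // Prints: [Jane, 21, Employed, Software Engineer]
        System.out.println(Arrays.toString(splitFields(line)));
    }
}
```

Note that this only removes whitespace next to the commas. Whitespace at the very start or end of the input would still need to be handled separately, for example with trim().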

The split() Method (With a Limit)

Here, the method takes two parameters: the previously discussed regex, and an integer value denoting the limit. The limit parameter controls how many times the string is split.

The limit parameter can take one of three forms, i.e. it can be greater than, less than, or equal to zero. Let's take a look at what each of these situations represents:

  • A positive limit - The String will be split up to a maximum of limit - 1 times. Beyond this, the rest of the string will be returned as the last element of the array, as it is, without splitting. The length of the returned array will always be less than or equal to limit.
  • A negative limit - The String is split at the delimiter as many times as possible, regardless of the particular negative value used. The resulting array includes any trailing empty strings, i.e. the empty substrings produced by delimiters at the end of the original string.
  • A limit of 0 - The String is again split as many times as possible, and there is no limit on the length of the resulting array. This works the same as calling the split() method with regex as the only argument, as seen earlier. In this case, trailing empty strings are not returned.
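The equivalence between a limit of 0 and the single-argument call can be checked directly. The following is only a small sketch; the class name ZeroLimitExample and the helper sameResult are hypothetical names chosen for this illustration:

```java
import java.util.Arrays;

class ZeroLimitExample {

    // Hypothetical helper: checks that both calls produce the same array
    static boolean sameResult(String input, String regex) {
        return Arrays.equals(input.split(regex), input.split(regex, 0));
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        System.out.println(sameResult(input, ","));  // true
        System.out.println(input.split(",").length); // 3
    }
}
```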

Positive Limit Value

Let's take a look at some examples of using different limits. Firstly, a positive limit value:

String myString = "there,,are,more,than,three,commas,,,";
String[] splitStrings = myString.split(",", 4);

for (String string : splitStrings) {
    // Wrap each element in quotes so that empty strings are visible
    System.out.println(String.format("\"%s\"", string));
}

With a limit of 4, the String will be split at most three (limit - 1) times. This gives us an array with four elements (0..3), the last element being everything after the third split:

"there"
""
"are"
"more,than,three,commas,,,"

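A positive limit is handy when only the first few delimiters are meaningful. As a minimal sketch, assuming entries of the form key=value in which the value may itself contain = characters (the class name KeyValueExample and the helper splitKeyValue are hypothetical), a limit of 2 splits only at the first delimiter:

```java
class KeyValueExample {

    // Hypothetical helper: splits "key=value" at the first '=' only
    static String[] splitKeyValue(String entry) {
        // A limit of 2 allows at most one split, so any later '=' stays in the value
        return entry.split("=", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitKeyValue("query=a=b&c=d");
        System.out.println(parts[0]); // query
        System.out.println(parts[1]); // a=b&c=d
    }
}
```
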
If we used a negative limit on this same String:

String myString = "there,,are,more,than,three,commas,,,";
String[] splitStrings = myString.split(",", -1);

for (String string : splitStrings) {
    // Wrap each element in quotes so that empty strings are visible
    System.out.println(String.format("\"%s\"", string));
}

The String will be split as many times as possible, and the trailing empty strings are included in the array:

"there"
""
"are"
"more"
"than"
"three"
"commas"
""
""
""

The actual negative value we used isn't taken into consideration; we would get the same result if we used -150.
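A negative limit matters when the number of fields is significant, for example when a row must always have the same number of columns even if the last ones are empty. The following is a small sketch under that assumption; the class name TrailingFieldsExample and the helper countFields are hypothetical:

```java
class TrailingFieldsExample {

    // Hypothetical helper: counts the fields in a comma-separated row,
    // including empty fields at the end
    static int countFields(String row) {
        return row.split(",", -1).length;
    }

    public static void main(String[] args) {
        String row = "Jane,21,,";
        System.out.println(row.split(",").length); // 2, trailing empty fields are dropped
        System.out.println(countFields(row));      // 4, trailing empty fields are kept
    }
}
```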

If we set the limit to 0, the String would again be split as many times as possible, but the resulting array wouldn't contain the trailing empty strings:

String myString = "there,,are,more,than,three,commas,,,";

// Equivalent to calling the split() method with only the regex parameter
String[] splitStrings = myString.split(",", 0);

for (String string : splitStrings) {
    System.out.println(String.format("\"%s\"", string));
}

This would give us:

"there"
""
"are"
"more"
"than"
"three"
"commas"

Note on Special Characters

As we mentioned earlier, the regex parameter passed as the delimiter in the split() method is a regular expression. We have to make sure to escape special characters if we want to use their literal value as a delimiter. For example, the * character means "zero or more instances of the preceding element".

There are 12 such characters in regex. These are: \, ^, $, ., |, ?, *, +, (, ), [, {. Their meanings are described in the documentation of the java.util.regex.Pattern class.
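To see why this matters, consider what happens when one of these characters is used as a delimiter without escaping. This is only an illustrative sketch; the class name UnescapedDelimiterExample and its two helpers are hypothetical:

```java
import java.util.Arrays;

class UnescapedDelimiterExample {

    // Hypothetical helper: "." is interpreted as "any character"
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // Hypothetical helper: "\\." is interpreted as a literal dot
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String version = "1.2.3";

        // Every character counts as a delimiter, so only empty strings
        // remain, and these are discarded as trailing empty strings
        System.out.println(splitUnescaped(version).length);          // 0

        System.out.println(Arrays.toString(splitEscaped(version)));  // [1, 2, 3]
    }
}
```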

If we want to split a String at one of these characters, we have to take special care to escape it in the method parameter. One way to do this is to use a backslash \. For example:

string.split("\\|");

This splits the string variable at the | character. We use two backslashes here because the backslash is also an escape character in Java string literals: the first backslash escapes the second one, so that a single backslash reaches the regex engine and is applied to the | character.

Instead of this, we can use a regex character set. This means putting the special character inside square brackets, where most special characters are treated as normal characters. For example, we could use a | as a delimiter by saying:

string.split("[|]");

Yet another way to escape special characters is to use Pattern.quote():

// Requires: import java.util.regex.Pattern;
string.split(Pattern.quote("|"));
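The three approaches above should produce the same result. As a quick sketch to confirm this (the class name EscapingExample, the helper splitAllWays, and the sample input are assumptions made for this illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapingExample {

    // Hypothetical helper: splits the row at '|' using all three escaping styles
    static String[][] splitAllWays(String row) {
        return new String[][] {
            row.split("\\|"),               // backslash escape
            row.split("[|]"),               // character set
            row.split(Pattern.quote("|"))   // Pattern.quote()
        };
    }

    public static void main(String[] args) {
        String[][] results = splitAllWays("Jane|21|Employed");

        System.out.println(Arrays.toString(results[0]));          // [Jane, 21, Employed]
        System.out.println(Arrays.equals(results[0], results[1])); // true
        System.out.println(Arrays.equals(results[0], results[2])); // true
    }
}
```

Pattern.quote() is the most convenient option when the delimiter is not known in advance, since it treats the whole string as a literal regardless of which characters it contains.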

Conclusion

The split() method of the Java String class is a very useful and frequently used tool. Most data, especially data read from files, requires some amount of pre-processing, such as splitting strings, before meaningful information can be obtained from it.

In this article, we've gone over how to split strings in Java.