Common String Operations in Java

Introduction

Simply put, a String is used to store text, i.e. a sequence of characters. String is one of the most heavily used classes in Java, so it's essential for Java developers to be thoroughly acquainted with the class and its common operations.

String

There's a lot to say about Strings, from the ways you can initialize them to the String Literal Pool. However, in this article we'll focus on common operations rather than the class itself.

If you'd like to read more about the various ways of creating strings in Java, you should check out String vs StringBuilder vs StringBuffer.

Here, we're assuming that you're familiar with the fact that Strings are immutable, as it's a very important thing to know before handling them. If not, refer to the previously linked article where it's explained in detail.

The String class comes with many helper methods that help us process our textual data:

String Concatenation

Before we begin using any of these methods on strings, we should take a look at String concatenation, as it's a fairly common thing to do. Let's start with the + operator. Java gives + a special meaning when one of its operands is a String, and it is used to concatenate two strings:

String aplusb = "a" + "b";

// The operands can be String object reference variables as well
String a = "a";
String b = "b";
aplusb = a + b;

The + operator can be slow when misused. Because String objects are immutable, every + creates a new String and copies the characters of both operands into it. If we build a string by repeatedly appending to it, e.g. in a loop, the ever-growing result is copied on every step, which gives quadratic (O(n^2)) complexity in the length of the final string.

This isn't a problem with small strings, or when we're concatenating several strings in a single expression (String abcd = "a" + "b" + "c" + "d";). The compiler optimizes such expressions on its own (historically by generating StringBuilder calls, and since Java 9 through invokedynamic; an all-literal expression like this one is even folded into a single constant at compile time), so the real source of the performance loss is concatenating in loops. For that, we'd usually use the aforementioned StringBuilder class.

It works like a mutable String: appending modifies its internal buffer instead of creating a new object each time. This avoids the repeated copying and gives us linear (O(n)) complexity overall.

int n = 1000;

// Not a good idea! Gives the right result, but performs poorly.
String result = "";
for (int i = 0; i < n; i++) {
    result += i; // each += copies everything built so far into a new String
}

// Better, performance-friendly version.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < n; i++) {
    sb.append(i);
}
String builtResult = sb.toString(); // convert back to a String once, at the end
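If we want to confirm that both loops above build the same text, here is a minimal sketch that wraps each approach in a method. The class and method names (ConcatDemo, withPlus, withBuilder) are made up for this illustration; the only library call that matters is StringBuilder.toString(), which turns the builder back into an ordinary String once at the end.

```java
// Hypothetical demo class comparing the two concatenation approaches.
class ConcatDemo {

    // Repeated += creates a brand-new String on every iteration (quadratic copying).
    static String withPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // One StringBuilder is appended to in place; toString() copies the text once.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));    // 01234
        System.out.println(withBuilder(5)); // 01234
        System.out.println(withPlus(1000).equals(withBuilder(1000))); // true
    }
}
```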

We can also concatenate using the concat() method:

String str1 = "Hello";
System.out.println(str1.concat(" World")); // note the leading space in " World"

Output:

Hello World

Note: When using String concatenation with other data types, they implicitly get converted to their string representation:

System.out.println("2 = " + 2); 

This gives the expected output "2 = 2".

System.out.println("2 = " + 1 + 1);

You might expect 1 + 1 to be evaluated first. However, the output is "2 = 11". The + operator is left-associative, so Java evaluates the expression from left to right: "2 = " + 1 is evaluated first, producing the String "2 = 1", and the second + 1 then appends another "1" to it.

In other words, when two or more + operators appear in a row (with no other operators or parentheses), Java starts with the leftmost + and continues from there, and once one operand is a String, every following + is a concatenation. If we wanted the output to be "2 = 2" again, we'd need to add parentheses in the appropriate place:

System.out.println("2 = " + (1 + 1));
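The left-to-right rule also explains the mirror case. In the sketch below (the class PlusOrderDemo and its method names are made up for illustration), putting the numbers first makes 1 + 1 an ordinary int addition, because no String is involved until the third operand:

```java
// Hypothetical demo of left-to-right evaluation of the + operator.
class PlusOrderDemo {
    // Evaluated as ("2 = " + 1) + 1, so both 1s are appended as text
    static String stringFirst() {
        return "2 = " + 1 + 1;
    }

    // Parentheses force the integer addition before concatenation
    static String parenthesized() {
        return "2 = " + (1 + 1);
    }

    // Evaluated as (1 + 1) + " = 2": int addition happens before any String appears
    static String numbersFirst() {
        return 1 + 1 + " = 2";
    }

    public static void main(String[] args) {
        System.out.println(stringFirst());   // 2 = 11
        System.out.println(parenthesized()); // 2 = 2
        System.out.println(numbersFirst());  // 2 = 2
    }
}
```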

On the other hand, if we try to use the concat() method with a different data type:

String str1 = "Hello";
System.out.println(str1.concat(53));

We'd be greeted with a compilation error:

incompatible types: int cannot be converted to String

When using the + operator, Java automatically converts the other operand into a String, whereas concat() only accepts a String argument and performs no conversion.
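If we really want to use concat() with a number, we have to convert the number ourselves. A minimal sketch, assuming String.valueOf() as the conversion (Integer.toString() would work just as well); ConcatConversionDemo and appendNumber are hypothetical names:

```java
// Hypothetical demo: concat() needs an explicit conversion to String.
class ConcatConversionDemo {
    // String.valueOf(int) turns the number into its decimal text first
    static String appendNumber(String base, int number) {
        return base.concat(String.valueOf(number));
    }

    public static void main(String[] args) {
        System.out.println(appendNumber("Hello", 53)); // Hello53
        // The + operator performs the same conversion implicitly
        System.out.println("Hello" + 53);              // Hello53
    }
}
```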

By the way, none of the methods we'll explore in this article require a reference variable. For brevity, it's sometimes easier to simply call them on a literal:

// Instead of this...
String ourString = "this is just some string";
System.out.println(ourString.substring(5,10));

// ...we can do this:
System.out.println("this is just some string".substring(5,10));

Really, either way is fine, but the second way yields less code.

Determine String Length

length() returns the total number of characters in our String.

isEmpty() returns true or false depending on whether our String is empty or not. In other words, isEmpty() returns true exactly when length() returns 0.

For example:

String s = ""; // example input

if (s.length() == 0) { // or: if (s.isEmpty())
    System.out.println("s is empty");
} else {
    System.out.println("s isn't empty, it's: " + s);
}

Here we show how you can use these methods to check for an empty string. The conditional check could also be replaced with s.isEmpty() and would work just the same.

Finding Characters and Substrings

Since a String is a sequence of characters, we can ask which character is at a given position, or find the position of a character. Indexing of a String starts at 0, like we're used to with arrays.

charAt(int index) returns the character value at a given index.

indexOf() is overloaded, and therefore has multiple uses:

  • indexOf(int ch) returns the first index position that matches the given character value
  • indexOf(int ch, int fromIndex) returns the first index that matches the given character value, searching from fromIndex onward (fromIndex itself included)
  • indexOf(String substring) returns the (first) starting position of substring in the String object it was called on
  • indexOf(String substring, int fromIndex) same as the previous method, but the search begins at fromIndex instead of 0

All of the overloaded indexOf() methods return -1 if the index was not found.

lastIndexOf() is also overloaded, has equivalent method signatures to indexOf(), and also returns -1 if an appropriate index wasn't found. It always searches the String backward: from the end of the string by default, or starting at fromIndex if one is specified.

The index passed to charAt() has to be within the range [0, example.length() - 1] to be valid. Otherwise, a StringIndexOutOfBoundsException is thrown. (indexOf() and lastIndexOf() don't throw for an out-of-range fromIndex; they simply search within the bounds of the string.)

String example = "This should be complicated enough to show some things we should show";

// Find the characters at the indexes given
System.out.println(example.charAt(0));
System.out.println(example.charAt(5));

// A StringIndexOutOfBoundsException is thrown in both these cases:
// System.out.println(example.charAt(-1));
// System.out.println(example.charAt(200));

// Find the index of characters or substrings
System.out.println(example.indexOf('s')); // returns the first occurrence of 's'
System.out.println(example.indexOf('s', 4)); // the first 's' at or after index 4
System.out.println(example.indexOf("should")); // the index of the first "should" in our string
System.out.println(example.indexOf("should", 15)); // the index of the first "should" in our
                                                   // string at or after index 15

// Find the last index of characters or substrings
System.out.println(example.lastIndexOf('s')); // returns the last occurrence of 's' (searching backwards from the end of the string)
System.out.println(example.lastIndexOf('s', 45)); // searches for 's' backwards, starting at index 45
System.out.println(example.lastIndexOf("should")); // returns the position at which the last "should" begins, searching backwards from the end
System.out.println(example.lastIndexOf("should", 20)); // searches for "should" backwards from index 20 and returns the position at which it begins

This will output the following:

T
s
3
5
5
57
64
42
57
5

Note: indexOf(int ch, int fromIndex) is often used in loops, when we want to do something for every occurrence of a character in a String.

int foundAt = -1;
String example = "This should be complicated enough to show some things we should show";
while (true) {
    foundAt = example.indexOf('s', foundAt + 1);
    if (foundAt == -1)
        break;
    else {
        // do something with that information
    }
}
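The same loop can be packaged into a small helper that collects every position of a character. This is only a sketch; IndexOfLoopDemo and indicesOf are names made up for the example, and the loop is restructured slightly so that it doesn't need while (true) and break:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on the indexOf(ch, fromIndex) loop pattern.
class IndexOfLoopDemo {
    // Collects every index at which ch occurs in text
    static List<Integer> indicesOf(String text, char ch) {
        List<Integer> positions = new ArrayList<>();
        int foundAt = text.indexOf(ch);
        while (foundAt != -1) {
            positions.add(foundAt);
            // Resume the search just past the previous match
            foundAt = text.indexOf(ch, foundAt + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        String example = "This should be complicated enough to show some things we should show";
        System.out.println(indicesOf(example, 's')); // [3, 5, 37, 42, 52, 57, 64]
    }
}
```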

Comparing Strings

The compareTo() method lexicographically compares our String with another. The actual comparison of the two strings is based on the Unicode value of each character in the string. The method returns either a positive number, a negative number, or 0.

If both strings consist entirely of lowercase (or entirely of uppercase) letters, the return value of compareTo() can be interpreted like this: if it's negative, our string would come before the other string in a dictionary; if it's positive, it would come after; and if it's 0, the strings are equal.

I emphasize that the letters need to be in the same case, since the method might produce unexpected output otherwise.

The compareTo() method doesn't necessarily go through all the characters in our strings. It stops as soon as it finds a non-matching character or reaches the end of either string. If it found a mismatch, it returns (Unicode value of the mismatched character in our string) - (Unicode value of the mismatched character in the given string). If one string ran out of characters first, it returns the difference in lengths: (length of our string) - (length of the given string).

For anyone that's curious: ASCII is a part of Unicode, which means that a-z and A-Z keep their ASCII order, i.e. they're all consecutive in their respective cases. Namely, a-z have the codes 97-122 and A-Z have the codes 65-90. So the value of 'a' is 97, the value of 'b' is 98, and so on. This way, when we subtract the value of 'b' from the value of 'a', we get -1, meaning that 'a' is one letter before 'b', which it is.

System.out.println("a".compareTo("a"));
System.out.println("a".compareTo("b"));
System.out.println("1".compareTo("12345678"));
System.out.println("2".compareTo("12345678"));
System.out.println("abcd".compareTo("abgggggggggg"));

Output:

0
-1
-7
1
-4

On the third line of the code above, compareTo() returns the difference in string lengths (1 - 8 = -7), since it didn't find a mismatched character before it "ran out" of characters in one string.

And in the last line we see -4 is printed because of 'c' - 'g', since that's the first mismatch it found, and it doesn't care about the rest.
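To make the rule concrete, here is a sketch that re-implements it by hand, for illustration only (in real code we'd just call compareTo()). The name compareLikeString is hypothetical; the method walks both strings char by char, returns the difference at the first mismatch, and otherwise returns the difference in lengths:

```java
// Hypothetical re-implementation of the comparison rule described above.
class CompareToSketch {
    static int compareLikeString(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int k = 0; k < common; k++) {
            char ca = a.charAt(k);
            char cb = b.charAt(k);
            if (ca != cb) {
                return ca - cb; // first mismatch decides
            }
        }
        // No mismatch within the shorter string: compare lengths
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        System.out.println(compareLikeString("abcd", "abgggggggggg")); // -4
        System.out.println(compareLikeString("1", "12345678"));        // -7
        System.out.println("abcd".compareTo("abgggggggggg"));          // -4
    }
}
```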

Note: The "unexpected" part when using compareTo() happens when we compare strings with different cases.

System.out.println("ORANGE".compareTo("apple")); 

We might expect the method to return a positive value, since "apple" should come before "ORANGE". However, the Unicode value for 'O' is less than the Unicode value for 'a'.

This might sometimes be preferred behavior, but in case it isn't - we use compareToIgnoreCase(). That method does essentially the same thing as compareTo(), it just pretends that everything is in the same case, and gives us a "proper" dictionary order.
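The difference shows up clearly when sorting. A minimal sketch, assuming a small list of words chosen for the example: Collections.sort() uses compareTo() (the natural order), while the built-in comparator String.CASE_INSENSITIVE_ORDER orders strings the same way compareToIgnoreCase() does. The class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of case-sensitive vs case-insensitive ordering.
class CaseInsensitiveDemo {
    // Natural order: uses compareTo(), so all uppercase letters sort before lowercase ones
    static List<String> sortNatural(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Same ordering as compareToIgnoreCase()
    static List<String> sortIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ORANGE", "apple", "Banana");
        System.out.println(sortNatural(words));      // [Banana, ORANGE, apple]
        System.out.println(sortIgnoringCase(words)); // [apple, Banana, ORANGE]
    }
}
```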

Note: compareTo() and compareToIgnoreCase() are often used when we make a Comparator for a custom class.

For example, let's say we have a Person object like the following:

class Person {
    String firstName;
    String lastName;
    // ...
}

Now let's say we have an ArrayList called "people" of many Person objects, in no particular order. We'd like to sort that ArrayList so that they are ordered in lexicographical order based on their last name, and if people have the same last name, we'd like to sort them based on their first name.

Comparator<Person> personComparator = new Comparator<Person>() {
    @Override
    public int compare(Person p1, Person p2) {
        // Compare last names first
        int byLastName = p1.lastName.compareTo(p2.lastName);
        if (byLastName != 0) {
            return byLastName;
        }
        // Same last name: fall back to first names
        return p1.firstName.compareTo(p2.firstName);
    }
};
Collections.sort(people, personComparator);
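Since Java 8, the same ordering can be written more compactly with Comparator.comparing() and thenComparing(). The sketch below assumes a trimmed-down Person with a constructor and a toString() added purely for the demo; PersonSortDemo, BY_LAST_THEN_FIRST and sorted are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo using Comparator combinators (Java 8+).
class PersonSortDemo {
    // Stand-in for the Person class above; constructor and toString() added for the demo
    static class Person {
        String firstName;
        String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    // Last name first, then first name as the tie-breaker
    static final Comparator<Person> BY_LAST_THEN_FIRST =
            Comparator.comparing((Person p) -> p.lastName)
                      .thenComparing(p -> p.firstName);

    // Returns a sorted copy, leaving the input list unchanged
    static List<Person> sorted(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(BY_LAST_THEN_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Zoe", "Adams"));
        people.add(new Person("Bob", "Smith"));
        people.add(new Person("Amy", "Adams"));
        System.out.println(sorted(people)); // [Amy Adams, Zoe Adams, Bob Smith]
    }
}
```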

Extracting Substrings

A "substring" is a subset of (or part of) another string. The substring() method returns a new string that is a substring of the string we use the method on.

In other words, if we wanted a new string containing the first three characters of our string, we'd use ourString.substring(0, 3).

The substring() method has two variations:

  • substring(int startIndex) returns a String containing all the characters from startIndex (inclusive) to the end of our String. It behaves the same as substring(startIndex, ourString.length()).
  • substring(int startIndex, int endIndex) returns a String containing all the characters from startIndex (inclusive) to endIndex (exclusive, i.e. the character at endIndex isn't returned)

Note: The given indices must satisfy 0 <= startIndex <= endIndex <= ourString.length(). Java, unlike some other languages, does NOT support negative indices in the substring() method! Java will throw a StringIndexOutOfBoundsException for any of the following reasons:

  • startIndex is negative
  • endIndex is larger than the length of our String object
  • startIndex is larger than endIndex

Although the documentation doesn't explicitly say that "no negative values are allowed at all" (one might have the habit of giving -1 as the endIndex from other programming languages), that rule can be derived from the fact that startIndex can't be negative and that endIndex can't be smaller than startIndex. (If the two are equal, the result is an empty string.)

However, Java just makes us take the extra step of writing ourString.length() - someNumber as endIndex instead of just -someNumber.
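If we miss negative indices, we can convert them ourselves before calling substring(). The slice() helper below is hypothetical (it is not part of the JDK) and only sketches the idea: a negative index is treated as counting from the end, and substring() still throws if the converted indices are out of range.

```java
// Hypothetical helper emulating negative indices on top of substring().
class SliceDemo {
    static String slice(String s, int start, int end) {
        // Negative values count back from the end of the string
        int from = start < 0 ? s.length() + start : start;
        int to = end < 0 ? s.length() + end : end;
        return s.substring(from, to); // still throws if from/to are invalid
    }

    public static void main(String[] args) {
        String ourString = "abcdef";
        System.out.println(slice(ourString, 1, -1));                   // bcde
        System.out.println(slice(ourString, -3, ourString.length()));  // def
        System.out.println(ourString.substring(2, 2).isEmpty());       // true
    }
}
```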

String ourString = "abcdef";
System.out.println(ourString.substring(0,3));
System.out.println(ourString.substring(2));
System.out.println(ourString.substring(1,3));

// If we want the last few characters
System.out.println(ourString.substring(ourString.length()-3));

Output:

abc
cdef
bc
def

Changing String Case

These two simple methods are used to change the case of characters within a string.

  • toLowerCase(): changes all upper case characters to lower case (ignores everything else)
  • toUpperCase(): changes all lower case characters to upper case (ignores everything else)

String ourString = "ThInK oF a ClEvEr StRiNg";

System.out.println(ourString.toLowerCase());
System.out.println(ourString.toUpperCase());
System.out.println(ourString);

This will output the following:

think of a clever string
THINK OF A CLEVER STRING
ThInK oF a ClEvEr StRiNg

Notice that the initial String object itself is unchanged.
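One caveat: toLowerCase() and toUpperCase() without arguments use the rules of the default locale, which can surprise us in some languages. The JDK documentation's own example is Turkish, where 'I' lowercases to a dotless 'ı'. A minimal sketch, assuming we want locale-neutral results for things like keys or identifiers (CaseLocaleDemo and lowerForKeys are made-up names):

```java
import java.util.Locale;

// Hypothetical demo of locale-sensitive case conversion.
class CaseLocaleDemo {
    // Locale.ROOT gives the same result regardless of the machine's default locale
    static String lowerForKeys(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String word = "TITLE";
        System.out.println(lowerForKeys(word)); // title
        // Turkish rules map 'I' to the dotless 'ı' (U+0131)
        System.out.println(word.toLowerCase(Locale.forLanguageTag("tr"))); // tıtle
    }
}
```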

Removing Whitespace

The trim() method returns a copy of the initial String object with all leading and trailing whitespace (spaces, tabs, newlines; more precisely, any character with a code of U+0020 or lower) removed.

String ourString = "      Any non-leading and non-trailing whitespace is  \n  preserved       ";
System.out.println(ourString.trim());

Output:

Any non-leading and non-trailing whitespace is  
  preserved

trim() is often used when processing user input, since it makes sure that we have no useless whitespace and doesn't change the string if we don't.

A very common use of trim() with user input is checking whether any non-whitespace characters were entered at all:

// Usually we check for empty inputs like this:
if (userinput.isEmpty()) { ... }
// ...or the equivalent
if (userinput.length() == 0) { ... }

// But a better way to check would be this, which
// handles cases where the user entered only
// whitespace (i.e. "    ")
if (userinput.trim().isEmpty()) { ... }
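On Java 11 or newer, the same check can be written with isBlank(), and strip() is a whitespace-aware alternative to trim(): trim() only removes characters up to U+0020, while strip() uses Character.isWhitespace() and therefore also removes Unicode spaces such as U+2003 (EM SPACE). A short sketch, assuming Java 11+; BlankInputDemo and isMissing are hypothetical names:

```java
// Hypothetical demo of isBlank() and strip() (both Java 11+).
class BlankInputDemo {
    // True for "" and for strings made only of whitespace
    static boolean isMissing(String userInput) {
        return userInput.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(isMissing("    "));  // true
        System.out.println(isMissing(" hi ")); // false

        String padded = "\u2003x\u2003";           // EM SPACE on both sides
        System.out.println(padded.trim().length()); // 3: trim() keeps the EM SPACEs
        System.out.println(padded.strip());         // x
    }
}
```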

Formatting Strings

The format() method returns a formatted string with a given format and arguments. It's used to make life simple when formatting complex strings in Java. It works similarly to printf in C:

public static String format(String form, Object... args)

This method declaration might seem complicated but let's take a closer look at it:

  • For our purposes, the static part means that this method is called through the String class, and not through an object of the String class. So when we want to use this method, we'd write String.format(...) and not ourString.format(...). We can call the method the second way, but ourString won't play a part in the method anyway.
  • The ... (three dots) after Object just says that a variable number of arguments can be passed here. One or two or fifty, it all depends on the String form.

Let's start with a simple example.

int a = 2;
int b = 3;
int c = 4;
int d = 1;

// %d indicates we want to print an integer
System.out.println(String.format("%d", a));

Output:

2

The format() method goes through the form string and looks for special characters and replaces them with arguments in args.

Special characters start with a %. In our example, we used %d, which Java understands as "I'll try and parse the provided argument in args as an integer".

A slightly more insightful example of when format() is useful:

// Very messy, hard to read, and hard to maintain
System.out.println("a = " + a + "\n" + "b = " + b + "\n" + "c = " + c + "\n" + "d = " + d + "\n");

// Much prettier
System.out.println(String.format("a = %d\nb = %d\nc = %d\nd = %d\n", a, b, c, d));

As we can see in this example, Java matches the format specifiers beginning with % with the arguments in order: the first %d is matched to a, the second %d to b, and so on.

There are a lot of special characters for format() and you can find the full list in the docs (including a whole bunch of date/time options), but the ones you'll most commonly see and use are:

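  • %d: integral types (byte, short, int, long and their wrapper classes, BigInteger)
  • %s: Strings (most other arguments are converted using their toString() method)
  • %f: floating-point values (float, double, BigDecimal) as a decimal number; %e formats them in computerized scientific notation, and %g prints either the same as %f or %e depending on the precision value after rounding.
  • %b: for Boolean values. If the argument is null, "false" is printed; for any other non-Boolean argument, "true" is printed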

Generally speaking, the format() method has a seemingly complicated syntax:

%[argument_index$][flags][width][.precision]conversion

argument_index, flags, width, and precision are all optional as indicated by [].

Precision means different things for different conversions. For %f and %e, it has the obvious meaning of "how many digits am I supposed to show after the decimal point" (for %g, it's the total number of significant digits). For general conversions such as %s, precision specifies the maximum number of characters to be written to the output. It can't be used with integral conversions such as %d.

double ourDouble = 1123.9303;
System.out.println(String.format("%f", ourDouble));
System.out.println(String.format("%.3f", ourDouble)); // specifies that we only want 3 digits after decimal point
System.out.println(String.format("%e", ourDouble));

String ourString  = "what does precision do with strings?";
System.out.println(String.format("%.8s", ourString)); // prints the first 8 characters of our string

int ourInt = 123456789;
// System.out.println(String.format("%.4d", ourInt)); // throws IllegalFormatPrecisionException: precision can't be used with %d

This will output:

1123.930300
1123.930
1.123930e+03
what doe

The optional width specifies the minimum width of the output.

// If our number has fewer than 6 digits, this will
// add extra 0s to the beginning until it does
System.out.println(String.format("%06d", 12)); 

// If our number has more than 6 digits, it will just print it out
System.out.println(String.format("%06d", 1234567));

// We can specify output width, with the output being aligned
// to the right if it's shorter than the given space. If it's
// longer, everything gets printed. The || are added for
// demonstration purposes only
System.out.println(String.format("|%20d|", 12));
// Or we can align the output to the left
System.out.println(String.format("|%-20d|", 12));

// We can also easily print an octal/hexadecimal value of an integer
System.out.println(String.format("Octal: %o, Hex: %x", 10, 10));

Running this code will produce the following:

000012
1234567
|                  12|
|12                  |
Octal: 12, Hex: a
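The remaining parts of the %[argument_index$][flags][width][.precision]conversion syntax can be combined as well. A short sketch (FormatSyntaxDemo and its method names are made up): argument_index lets us reorder or reuse arguments, and the ',' flag adds grouping separators. We pass Locale.US explicitly, as an assumption for the example, because the decimal and grouping separators otherwise depend on the default locale.

```java
import java.util.Locale;

// Hypothetical demo of argument_index, flags, width and precision.
class FormatSyntaxDemo {
    // %2$s picks the second argument, %1$s the first; an index can be reused
    static String reordered(String first, String second) {
        return String.format(Locale.US, "%2$s %1$s, %2$s!", first, second);
    }

    // Width 8 and precision 2; numbers are right-aligned within the width by default
    static String padded(double value) {
        return String.format(Locale.US, "|%8.2f|", value);
    }

    // The ',' flag inserts locale-specific grouping separators
    static String grouped(long value) {
        return String.format(Locale.US, "%,d", value);
    }

    public static void main(String[] args) {
        System.out.println(reordered("World", "Hello")); // Hello World, Hello!
        System.out.println(padded(3.14159));             // |    3.14|
        System.out.println(grouped(1234567));            // 1,234,567
    }
}
```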

Regex and Checking for Substrings

contains(CharSequence s) returns true if s is a part of our String object (s can be a String itself or StringBuilder object, or really anything that implements CharSequence), otherwise it returns false.

startsWith(String prefix) returns true if our String object literally starts with the given prefix, otherwise it returns false.

endsWith(String suffix) returns true if our String object literally ends with the given suffix, otherwise it returns false.

matches(String regex) returns true if our entire String matches the given regular expression.

All of these methods are rather straightforward, although matches() presumes some knowledge of regular expressions.

String ourString = "This string contains a contains.";

System.out.println(ourString.contains("contains"));
System.out.println(ourString.startsWith("T"));
System.out.println(ourString.endsWith(":)"));
System.out.println(ourString.matches(".*string.*"));

These operations output the following:

true
true
false
true
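Note that matches() only returns true if the regular expression matches the entire string, which is why the example uses .*string.* rather than just string. To check whether a pattern occurs anywhere, we can use Matcher.find() from java.util.regex instead. A minimal sketch with made-up class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical demo contrasting String.matches() with Matcher.find().
class MatchesVsFindDemo {
    // matches() must match the ENTIRE string
    static boolean wholeStringMatches(String text, String regex) {
        return text.matches(regex);
    }

    // find() succeeds if the pattern occurs anywhere in the string
    static boolean occursSomewhere(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String ourString = "This string contains a contains.";
        System.out.println(wholeStringMatches(ourString, "string"));     // false
        System.out.println(wholeStringMatches(ourString, ".*string.*")); // true
        System.out.println(occursSomewhere(ourString, "string"));        // true
    }
}
```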

Replacing Characters and Substrings

replace(char oldChar, char newChar) replaces all occurrences of oldChar with newChar.

replace(CharSequence target, CharSequence replacement) replaces all occurrences of target string with the replacement string (meaning that we can replace entire substrings instead of just characters).

replaceAll(String regex, String replacement) replaces all substrings that match the regex argument with the replacement string.

replaceFirst(String regex, String replacement) replaces only the first substring that matches the regex argument with the replacement string.

To avoid any confusion, replace() also replaces ALL occurrences of a character sequence, even though there's a method named replaceAll(). The difference is that replaceAll() and replaceFirst() use regex to find the character sequences that need to be replaced.

String ourString = "We really don't like the letter e here";

System.out.println(ourString.replace('e', 'a'));
System.out.println(ourString.replace("here", "there"));
System.out.println(ourString.replaceAll("e(r+)", "a"));
System.out.println(ourString.replaceFirst("e(r+)", "a")); // only the first match is replaced

Output:

Wa raally don't lika tha lattar a hara
We really don't like the letter e there
We really don't like the letta e hae
We really don't like the letta e here
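Because replaceAll() and replaceFirst() interpret their first argument as a regex, characters like . behave very differently than with replace(). In the replacement string, $1 refers to the first capturing group. A short sketch (ReplaceDemo and its method names are made up for illustration):

```java
// Hypothetical demo of literal vs regex replacement.
class ReplaceDemo {
    // replace() treats "." literally
    static String literalReplace(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() treats "." as the regex "any character"
    static String regexReplace(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaping the dot restores its literal meaning inside a regex
    static String escapedRegexReplace(String s) {
        return s.replaceAll("\\.", "-");
    }

    // $1 inserts whatever the first capturing group (r+) matched
    static String markGroups(String s) {
        return s.replaceAll("e(r+)", "<$1>");
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a.b.c"));      // a-b-c
        System.out.println(regexReplace("a.b.c"));        // -----
        System.out.println(escapedRegexReplace("a.b.c")); // a-b-c
        System.out.println(markGroups("We really don't like the letter e here"));
        // We really don't like the lett<r> e h<r>e
    }
}
```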

Splitting and Joining Strings

The methods split() and join() are two sides of the same coin.

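split(String regex) splits this string around matches of the given regular expression and returns an array of Strings (String[]).

split(String regex, int limit) is similar to the previous method, but with a positive limit the pattern is applied at most limit - 1 times, so the resulting array has at most limit elements.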

join(CharSequence delimiter, CharSequence... elements) on the other hand returns a String containing all of the elements we listed, joined by the delimiter.

join(CharSequence delimiter, Iterable<? extends CharSequence> elements) is a very complicated way of saying that we can use join() on things like lists, to combine all the elements into a String using the given delimiter.

String ourString = "apples, oranges, pears, pineapples";
String[] fruits = ourString.split(",");

System.out.println(Arrays.toString(fruits));

// This is a great place to use the aforementioned trim() method
// to remove the space at the beginning of some of the words
for(int i = 0; i < fruits.length; i++) {
    fruits[i] = fruits[i].trim();
}

System.out.println(Arrays.toString(fruits)); // Arrays.toString() formats the output array on its own

Output:

[apples,  oranges,  pears,  pineapples]
[apples, oranges, pears, pineapples]
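Since split() accepts a regular expression, we can also skip the trim() loop entirely by splitting on a comma followed by any amount of whitespace. A sketch, with SplitRegexDemo and splitFruits as hypothetical names:

```java
import java.util.Arrays;

// Hypothetical demo: let the regex absorb the whitespace after each comma.
class SplitRegexDemo {
    // ",\\s*" matches a comma plus zero or more whitespace characters
    static String[] splitFruits(String list) {
        return list.split(",\\s*");
    }

    public static void main(String[] args) {
        String ourString = "apples, oranges, pears, pineapples";
        System.out.println(Arrays.toString(splitFruits(ourString)));
        // [apples, oranges, pears, pineapples]
    }
}
```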

Keep in mind that split() takes a regular expression to decide where to split the string, so be careful when using characters that have a special meaning in regular expressions.

Since those characters are common (a particular problem is "." since that means "any character" in regex), a safe way of using split() is with Pattern.quote(".") which makes sure that nothing is understood as a special regex character.

String ourString = "apples.oranges.pears.pineapples";

// This prints an empty array: "." matches every character,
// so every piece between the separators is empty, and
// split() discards trailing empty strings
System.out.println(Arrays.toString(ourString.split(".")));

// The "regex safe" way of doing this would be
System.out.println(Arrays.toString(ourString.split(Pattern.quote("."))));

// Splits our string to two substrings at most,
// completely ignoring all other occurrences of "."
System.out.println(Arrays.toString(ourString.split(Pattern.quote("."), 2))); 

Output:

[]
[apples, oranges, pears, pineapples]
[apples, oranges.pears.pineapples]
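The empty array in the first case comes from a detail worth knowing: with no limit (or a limit of 0), split() removes trailing empty strings from the result, while a negative limit keeps them. A small sketch, assuming a CSV-like row with empty trailing fields (SplitLimitDemo and its method names are made up):

```java
import java.util.Arrays;

// Hypothetical demo of how the limit argument affects trailing empty strings.
class SplitLimitDemo {
    // Default split: trailing empty strings are removed
    static String[] splitDefault(String row) {
        return row.split(",");
    }

    // Negative limit: the pattern is applied as often as possible and empty strings are kept
    static String[] splitKeepEmpty(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "a,b,,";
        System.out.println(Arrays.toString(splitDefault(row)));   // [a, b]
        System.out.println(Arrays.toString(splitKeepEmpty(row))); // [a, b, , ]
    }
}
```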

join() does the exact opposite of split(). We use join() when we have an array/list/etc. of strings (or StringBuilders/StringBuffers) that we want to form into one new String using some (or no) delimiter.

// A common use is to avoid repetitive concatenation,
// i.e. "1" + "," + "2" + "," + "3" + "," + "4"
System.out.println(String.join(",", "1", "2", "3", "4"));

// We can pass an array or any class that implements
// Iterable (containing character sequences) as the
// second parameter as well
String[] arrayOfStrings = {"1","2","3","4","5"};

System.out.println(String.join("-", arrayOfStrings));
System.out.println(String.join("-", Arrays.asList(arrayOfStrings))); // Works just fine with lists as well

// Join them with an empty string to convert an array
// of Strings to one single String without any extra data
System.out.println(String.join("", arrayOfStrings));

Output:

1,2,3,4
1-2-3-4-5
1-2-3-4-5
12345
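When we need a prefix and suffix around the joined elements, String.join() isn't enough on its own. Two standard-library options, sketched below with made-up class and method names, are Collectors.joining(delimiter, prefix, suffix) for streams and java.util.StringJoiner without streams:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

// Hypothetical demo of joining with a prefix and suffix.
class JoiningDemo {
    // Collectors.joining(delimiter, prefix, suffix) wraps the joined text
    static String bracketed(List<String> items) {
        return items.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // StringJoiner offers the same options without streams
    static String angled(String... items) {
        StringJoiner joiner = new StringJoiner("-", "<", ">");
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketed(Arrays.asList("1", "2", "3"))); // [1, 2, 3]
        System.out.println(angled("1", "2", "3"));                  // <1-2-3>
    }
}
```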

Creating Character Arrays

The toCharArray() method converts the String it's called on into a character array. It returns a new character array containing all the characters (in order) that are in the String.

Its signature is straightforward: it takes no arguments and returns a char[].

String ourString = "These will all become separate characters";

System.out.println(Arrays.toString(ourString.toCharArray()));

This will print out the following:

[T, h, e, s, e,  , w, i, l, l,  , a, l, l,  , b, e, c, o, m, e,  , s, e, p, a, r, a, t, e,  , c, h, a, r, a, c, t, e, r, s]
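Because toCharArray() returns a fresh copy, we can freely modify the array and build a new String from it with the String(char[]) constructor, while the original String stays untouched. A minimal sketch; CharArrayDemo and replaceFirstChar are hypothetical names, and the helper assumes a non-empty input:

```java
// Hypothetical demo: edit a char array copy, then build a new String.
class CharArrayDemo {
    // Assumes s is non-empty; otherwise chars[0] would throw
    static String replaceFirstChar(String s, char replacement) {
        char[] chars = s.toCharArray(); // a copy; s itself is never modified
        chars[0] = replacement;
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "hello";
        System.out.println(replaceFirstChar(original, 'j')); // jello
        System.out.println(original);                       // hello
    }
}
```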

String Equality

equals(Object str) compares two strings, and returns true if the strings contain the same characters in the same order, and false otherwise. The comparison is case-sensitive (use equalsIgnoreCase() for case-insensitive comparison).

It is important to understand that equals() and == perform two different operations. equals() compares the characters inside the String objects, as previously mentioned, while == compares object references, to see whether they refer to the same instance. So while 1 == 1 is always true for primitive values, two String variables holding the same text are not guaranteed to be == to each other.

The tricky part here is that the output of == depends on how we've initialized the String objects we're comparing:

String s1 = "Just a String";
String s2 = "Just a String";

System.out.println(s1 == s2);
System.out.println(s1.equals(s2));

s2 = new String("Just a String");
System.out.println(s1 == s2);
System.out.println(s1.equals(s2));

Output:

true
true
false
true

equals() returns true in both cases. So you should always use equals() unless you actually want to see whether two reference variables reference the same instance, though this is pretty rare.
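For the rare cases where reference equality matters, intern() returns the canonical pooled instance of a String, and for comparisons where either side might be null, java.util.Objects.equals() avoids a NullPointerException. A short sketch (EqualityDemo and safeEquals are made-up names):

```java
import java.util.Objects;

// Hypothetical demo of intern() and null-safe equality.
class EqualityDemo {
    // True if both are null, or if a.equals(b); never throws NullPointerException
    static boolean safeEquals(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String s1 = "Just a String";
        String s2 = new String("Just a String");

        System.out.println(s1 == s2);               // false: two different objects
        System.out.println(s1 == s2.intern());      // true: intern() returns the pooled instance
        System.out.println(safeEquals(null, s1));   // false
        System.out.println(safeEquals(null, null)); // true
    }
}
```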

Conclusion

It is important to understand the nuances of Strings and String methods in Java. Subtle, hard-to-find bugs can occur with things like split() and regex-specific special characters, or by mistakenly using == when we meant to use equals().

It's best to always look at how a method works and test it out for yourself, so that you remember the things you need to look out for. Besides, knowing what methods you have at your disposal saves you the unnecessary work of implementing already available methods yourself.