# Common String Operations in Java

### Introduction

Simply put, a String is used to store text, i.e. a sequence of characters. String is one of the most frequently used classes in Java, so Java developers should be thoroughly acquainted with the class and its common operations.

### String

There's a lot to say about Strings, from the ways you can initialize them to the String Literal Pool. In this article, however, we'll focus on common operations rather than the class itself.

If you'd like to read more about the various ways of creating strings in Java, check out the article String vs StringBuilder vs StringBuffer.

Here, we're assuming that you're familiar with the fact that Strings are immutable, as it's a very important thing to know before handling them. If not, refer to the previously linked article where it's explained in detail.

The String class comes with many helper methods that help us process our textual data.

### String Concatenation

Before we begin using any of these methods on strings, we should take a look at String concatenation, as it's a fairly common thing to do. Let's start with the + operator. Java doesn't support user-defined operator overloading, but the language itself defines + for strings, where it concatenates its two operands:

String aplusb = "a" + "b";

// The operands can be String object reference variables as well
String a = "a";
String b = "b";
aplusb = a + b;


Used carelessly, the + operator can be very slow. String objects are immutable, so every concatenation copies the characters of both operands into a new String object. When we build a string out of n pieces by repeatedly appending to the previous result, the total amount of copying is quadratic (O(n^2)).

This isn't a problem with small strings, or when we're concatenating just several strings in a single expression (String abcd = "a" + "b" + "c" + "d";). The compiler optimizes a single concatenation expression on its own (older compilers translate it into StringBuilder calls), so the real source of the performance loss is concatenating in loops. For something like that, we'd usually use the aforementioned StringBuilder class.

It works like a mutable String object. It avoids the repeated copying in string concatenation and gives us linear (O(n)) complexity.

int n = 1000;

// Not a good idea! Gives the right result, but performs poorly.
String result = "";
for (int i = 0; i < n; i++) {
    result += i;
}

// Better, performance-friendly version.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < n; i++) {
    sb.append(i);
}
String built = sb.toString(); // convert back to a String once we're done


We can also concatenate using the concat() method:

String str1 = "Hello";
System.out.println(str1.concat("World"));


Output:

HelloWorld


Note: When using String concatenation with other data types, they implicitly get converted to their string representation:

System.out.println("2 = " + 2);


This gives the expected output "2 = 2".

System.out.println("2 = " + 1 + 1);


We might expect 1 + 1 to be evaluated first, but it isn't: the output is "2 = 11". The + operator is left-associative, so Java evaluates a chain of + operators from left to right. "2 = " + 1 is evaluated first and produces the String "2 = 1", and only then is the second 1 appended to it.

Essentially, when two or more "+" operators are encountered (with no other operators or parentheses present), Java will start with the leftmost "+" operator and continue from there. If we wanted the output to be "2 = 2" again, we'd need to add parentheses in the appropriate place:

System.out.println("2 = " + (1 + 1));
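To make the evaluation order easier to see, here's a small sketch that wraps the three possible arrangements in methods. The class and method names are made up for illustration:

```java
public class ConcatOrderDemo {

    // Evaluated as ("2 = " + x) + y, so both numbers are appended as text
    static String withoutParentheses(int x, int y) {
        return "2 = " + x + y;
    }

    // The parentheses make the addition happen before the concatenation
    static String withParentheses(int x, int y) {
        return "2 = " + (x + y);
    }

    // The leftmost + has two int operands, so it's a regular addition;
    // only the second + involves a String
    static String numbersFirst(int x, int y) {
        return x + y + " = 2";
    }

    public static void main(String[] args) {
        System.out.println(withoutParentheses(1, 1)); // 2 = 11
        System.out.println(withParentheses(1, 1));    // 2 = 2
        System.out.println(numbersFirst(1, 1));       // 2 = 2
    }
}
```

The last case shows that the rule really is "left to right": as long as no String has been encountered yet, + still means addition.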


On the other hand, if we try to use the concat() method with a different data type:

String str1 = "Hello";
System.out.println(str1.concat(53));


We'd be greeted with a compile-time error:

incompatible types: int cannot be converted to String


When using the + operator, Java automatically converts the other data type into a String, whereas the concat() method only accepts a String argument and performs no conversion.
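If we still want to use concat() with a number, one option is to convert it ourselves with String.valueOf(). A minimal sketch, where appendNumber is a hypothetical helper name:

```java
public class ConcatConversionDemo {

    // concat() only accepts a String, so we convert the int explicitly
    static String appendNumber(String text, int number) {
        return text.concat(String.valueOf(number));
    }

    public static void main(String[] args) {
        System.out.println(appendNumber("Hello", 53)); // Hello53
    }
}
```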

By the way, none of the methods we'll explore in this article require a reference variable. For brevity, it's sometimes easier to simply call them on a literal:

// Instead of this...
String ourString = "this is just some string";
System.out.println(ourString.substring(5,10));

// ...we can do this:
System.out.println("this is just some string".substring(5,10));


Really, either way is fine, but the second way yields less code.

### Determine String Length

length() returns the total number of characters in our String.

isEmpty() returns true or false depending on whether our String is empty or not. So this means that isEmpty() returns true for the same case that length() returns 0.

For example:

if (s.length() == 0) { // or s.isEmpty()
    System.out.println("s is empty");
} else {
    System.out.println("s isn't empty, it's: " + s + "\n");
}


Here we show how you can use these methods to check for an empty string. The conditional check could also be replaced with s.isEmpty() and would work just the same.
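Keep in mind that both methods are called on the String object, so they throw a NullPointerException if the reference is null. A small sketch of a null-safe check, assuming we want to treat null like an empty string (isNullOrEmpty is a hypothetical helper, not part of the String class):

```java
public class EmptyCheckDemo {

    // Hypothetical helper: treats null the same way as an empty string
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null));  // true
        System.out.println(isNullOrEmpty(""));    // true
        System.out.println(isNullOrEmpty("abc")); // false
    }
}
```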

### Finding Characters and Substrings

Since a String is an immutable sequence of characters, we can ask what character is in what position, or find the position of a character. Indexing of a String starts at 0, like we're used to with arrays.

charAt(int index) returns the character value at a given index.

indexOf() is overloaded, and therefore has multiple uses:

• indexOf(int ch) returns the first index position that matches the given character value
• indexOf(int ch, int fromIndex) returns the first index that matches the given character value, starting the search at fromIndex (inclusive)
• indexOf(String substring) returns the (first) starting position of substring in the String object it was called on
• indexOf(String substring, int fromIndex) same as the previous method, but the search begins at fromIndex instead of 0

All of the overloaded indexOf() methods return -1 if the character or substring was not found.

lastIndexOf() is also overloaded, has equivalent method signatures to indexOf(), and also returns -1 if no match was found. It searches the String object backward, starting from the end of the String, or from fromIndex if one is specified.

The index passed to charAt() has to be within the range [0, example.length() - 1] to be valid. Otherwise, a StringIndexOutOfBoundsException is thrown. indexOf() and lastIndexOf() don't throw for an out-of-range fromIndex; they simply search whatever part of the String is available.

String example = "This should be complicated enough to show some things we should show";

// Find the characters at the indexes given
System.out.println(example.charAt(0));
System.out.println(example.charAt(5));

// A StringIndexOutOfBoundsException is thrown in both these cases:
// System.out.println(example.charAt(-1));
// System.out.println(example.charAt(200));

// Find the index of characters or substrings
System.out.println(example.indexOf('s')); // returns the first occurrence of 's'
System.out.println(example.indexOf('s', 4)); // the first 's' at or after index 4
System.out.println(example.indexOf("should")); // the index of the first "should" in our string
System.out.println(example.indexOf("should", 15)); // the index of the first "should" in our
// string at or after index 15

// Find the last index of characters or substrings
System.out.println(example.lastIndexOf('s')); // returns the first occurrence of 's' when we look backwards from the end of the string
System.out.println(example.lastIndexOf('s', 45)); // searches for 's' backwards from the position 45
System.out.println(example.lastIndexOf("should")); // returns the position at which the substring 'should' appears, looking backwards from the end of the string
System.out.println(example.lastIndexOf("should", 20)); // finds substring 'should' from position 20 backwards, and returns the position at which it begins


This will output the following:

T
s
3
5
5
57
64
42
57
5


Note: indexOf(int ch, int fromIndex) is often used in loops, when we want to do something for every occurrence of a character in a String.

int foundAt = -1;
String example = "This should be complicated enough to show some things we should show";
while (true) {
    foundAt = example.indexOf('s', foundAt + 1);
    if (foundAt == -1) {
        break;
    }
    // do something with that information
}
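As a concrete version of that loop, here's a sketch that counts how many times a character appears in a String. The class and method names are our own:

```java
public class OccurrenceDemo {

    // Counts occurrences of target by repeatedly searching
    // from just after the previous match
    static int countOccurrences(String text, char target) {
        int count = 0;
        int foundAt = text.indexOf(target);
        while (foundAt != -1) {
            count++;
            foundAt = text.indexOf(target, foundAt + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        String example = "This should be complicated enough to show some things we should show";
        System.out.println(countOccurrences(example, 's')); // 7
    }
}
```

Passing foundAt + 1 is safe even when the last match is the final character, because indexOf() simply returns -1 when fromIndex is past the end of the String.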


### Comparing Strings

The compareTo() method lexicographically compares our String with another. The actual comparison of the two strings is based on the Unicode value of each character in the string. The method returns either a positive number, a negative number, or 0.

If all the characters in both strings are in the same case (all lower case or all upper case letters), the return value of the compareTo() method can be interpreted like this: a negative value means our string would come before the other string in a dictionary, a positive value means it would come after, and 0 means the strings are equal.

The letters need to be in the same case, since the method might produce unexpected output otherwise.

The compareTo() method doesn't necessarily go through all the characters in our strings. It returns as soon as it finds a mismatched character, in which case it returns (Unicode value of the mismatched character in our string) - (Unicode value of the mismatched character in the given string). If it reaches the end of either string without finding a mismatch, it returns the difference between the lengths of the two strings.

For anyone that's curious: ASCII is a part of Unicode, which means that a-z and A-Z are in the same order as in ASCII encoding, i.e. they're all one after the other in their respective cases. Namely, a-z are the codes 97-122 and A-Z are 65-90. So the value for 'a' is 97, the value for 'b' is 98, and so on. This way, when we subtract the Unicode value for 'b' from the one for 'a', we get -1, meaning that 'a' is one letter before 'b', which it is.

System.out.println("a".compareTo("a"));
System.out.println("a".compareTo("b"));
System.out.println("1".compareTo("12345678"));
System.out.println("2".compareTo("12345678"));
System.out.println("abcd".compareTo("abgggggggggg"));

Output:

0
-1
-7
1
-4


In the third line of the code above, compareTo() returns the difference in string lengths (1 - 8 = -7), since it didn't find a mismatched character before it "ran out" of characters in one string.

And in the last line, -4 is printed because 'c' - 'g' is -4: that's the first mismatch it found, and it doesn't care about the rest.

Note: The "unexpected" part when using compareTo() happens when we compare strings with different cases.

System.out.println("ORANGE".compareTo("apple"));


We might expect the method to return a positive value, since "apple" should come before "ORANGE". However, the Unicode value for 'O' is less than the Unicode value for 'a', so the method returns a negative value.

This might sometimes be preferred behavior, but in case it isn't - we use compareToIgnoreCase(). That method does essentially the same thing as compareTo(), it just pretends that everything is in the same case, and gives us a "proper" dictionary order.
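The difference is easiest to see when sorting. In the sketch below, Collections.sort() without a comparator relies on compareTo(), while String.CASE_INSENSITIVE_ORDER is a built-in comparator that orders strings the way compareToIgnoreCase() does. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CaseOrderDemo {

    // Natural ordering of Strings, i.e. compareTo()
    static List<String> sortCaseSensitive(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Same ordering as compareToIgnoreCase()
    static List<String> sortIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("banana", "ORANGE", "apple");
        System.out.println(sortCaseSensitive(fruits)); // [ORANGE, apple, banana]
        System.out.println(sortIgnoringCase(fruits));  // [apple, banana, ORANGE]
    }
}
```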

Note: compareTo() and compareToIgnoreCase() are often used when we make a Comparator for a custom class.

For example, let's say we have a Person object like the following:

class Person {
String firstName;
String lastName;
// ...
}


Now let's say we have an ArrayList called "people" of many Person objects, in no particular order. We'd like to sort that ArrayList so that they are ordered in lexicographical order based on their last name, and if people have the same last name, we'd like to sort them based on their first name.

// Requires: import java.util.Collections; import java.util.Comparator;
Comparator<Person> personComparator = new Comparator<Person>() {
    @Override
    public int compare(Person p1, Person p2) {
        // Compare last names first; fall back to first names on a tie
        int byLastName = p1.lastName.compareTo(p2.lastName);
        if (byLastName != 0) {
            return byLastName;
        }
        return p1.firstName.compareTo(p2.firstName);
    }
};
Collections.sort(people, personComparator);
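For a complete, runnable take on the ordering described above (last name first, then first name), here's a sketch that assumes Java 8 or newer and uses Comparator.comparing() and thenComparing() instead of an anonymous class. The constructor, toString() and the sortByName() helper are additions made for the sake of the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PersonSortDemo {

    static class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    // Returns a sorted copy: by last name, then by first name on a tie
    static List<Person> sortByName(List<Person> people) {
        List<Person> sorted = new ArrayList<>(people);
        sorted.sort(Comparator
                .comparing((Person p) -> p.lastName)
                .thenComparing(p -> p.firstName));
        return sorted;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("John", "Smith"),
                new Person("Anna", "Smith"),
                new Person("Zoe", "Adams"));
        System.out.println(sortByName(people)); // [Zoe Adams, Anna Smith, John Smith]
    }
}
```

Both key extractors return Strings, so the comparison still boils down to compareTo() under the hood.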


### Extracting Substrings

A "substring" is a contiguous part of another string. The substring() method returns a new string that is a substring of the string we use the method on.

In other words, if we wanted a new string containing the first three characters of our string, we'd use ourString.substring(0, 3).

The substring() method has two variations:

• substring(int startIndex) returns a String containing all the characters from startIndex (inclusive) to the end of our String. It behaves the same as substring(int startIndex, ourString.length()).
• substring(int startIndex, int endIndex) returns a String containing all the characters from startIndex (inclusive) to endIndex (exclusive, i.e. the character at endIndex isn't returned)

Note: The given indices must be in the interval [0, ourString.length()]. Java, unlike some other languages, does NOT support negative indices in the substring() method! Java will throw a StringIndexOutOfBoundsException for any of the following reasons:

• startIndex is negative
• endIndex is larger than the length of our String object
• startIndex is larger than endIndex

Although the documentation doesn't explicitly say that "no negative values are allowed at all" (one might have the habit of giving -1 as the endIndex from other programming languages), that rule can be derived from the fact that startIndex can't be negative, and that endIndex can't be smaller than startIndex.

Java just makes us take the extra step of writing ourString.length() - someNumber as endIndex instead of just -someNumber.

String ourString = "abcdef";
System.out.println(ourString.substring(0,3));
System.out.println(ourString.substring(2));
System.out.println(ourString.substring(1,3));

// If we want the last few characters
System.out.println(ourString.substring(ourString.length()-3));


abc
cdef
bc
def
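Since counting from the end always goes through length(), it can be handy to wrap that arithmetic in small helpers. The sketch below uses two hypothetical methods, lastChars() and dropLast(), and assumes n is never negative:

```java
public class SubstringDemo {

    // Returns the last n characters, or the whole string if it's shorter than n
    static String lastChars(String s, int n) {
        return s.substring(Math.max(0, s.length() - n));
    }

    // Returns the string without its last n characters
    // (an empty string if it has n characters or fewer)
    static String dropLast(String s, int n) {
        return s.substring(0, Math.max(0, s.length() - n));
    }

    public static void main(String[] args) {
        System.out.println(lastChars("abcdef", 3)); // def
        System.out.println(lastChars("ab", 5));     // ab
        System.out.println(dropLast("abcdef", 2));  // abcd
    }
}
```

Math.max(0, ...) keeps the computed index from going negative, which would otherwise cause a StringIndexOutOfBoundsException for short inputs.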


### Changing String Case

These two simple methods are used to change the case of characters within a string.

• toLowerCase(): changes all upper case characters to lower case (ignores everything else)
• toUpperCase(): changes all lower case characters to upper case (ignores everything else)

String ourString = "ThInK oF a ClEvEr StRiNg";

System.out.println(ourString.toLowerCase());
System.out.println(ourString.toUpperCase());
System.out.println(ourString);


This will output the following:

think of a clever string
THINK OF A CLEVER STRING
ThInK oF a ClEvEr StRiNg


Notice that the initial String object itself is unchanged.

### Removing Whitespace

The trim() method returns a copy of the initial String object in which any leading and trailing whitespace (spaces, tabs, newlines) is removed.

String ourString = "      Any non-leading and non-trailing whitespace is  \n  preserved       ";
System.out.println(ourString.trim());


Output:

Any non-leading and non-trailing whitespace is
preserved


trim() is often used when processing user input, since it strips any useless leading and trailing whitespace and leaves the string's contents unchanged if there is none.

A very common use of trim() with user input is checking whether any non-whitespace characters were entered at all:

// Usually we check for empty inputs like this:
if (userinput.isEmpty()) { ... }
// ...or the equivalent
if (userinput.length() == 0) { ... }

// But a better way to check would be this, which
// handles cases where the user entered only
// whitespace (i.e. "    ")
if (userinput.trim().isEmpty()) { ... }
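Wrapped in a method, that check could look something like the sketch below. hasContent() is a hypothetical helper name, and the null check is an extra assumption on our part, since input that was never provided is often represented as null:

```java
public class InputCheckDemo {

    // True only if the input contains at least one non-whitespace character
    static boolean hasContent(String userInput) {
        return userInput != null && !userInput.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasContent("    "));    // false
        System.out.println(hasContent("\t\n"));    // false
        System.out.println(hasContent("  hi  "));  // true
        System.out.println(hasContent(null));      // false
    }
}
```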


### Formatting Strings

The format() method returns a formatted string with a given format and arguments. It's used to make life simple when formatting complex strings in Java. It works similarly to printf in C:

public static String format(String form, Object... args)


This method declaration might seem complicated but let's take a closer look at it:

• For our purposes, the static part means that this method is called through the String class, and not through an object of the String class. Meaning that when we want to use this method we'd write String.format(...) and not ourString.format(...). We can call the method the second way, but ourString won't play a part in the method anyway.
• The ... (three dots) after Object just says that a variable number of arguments can be passed here. One or two or fifty, it all depends on the String form.

int a = 2;
int b = 3;
int c = 4;
int d = 1;

// %d indicates we want to print an integer
System.out.println(String.format("%d", a));

2


The format() method goes through the form string, looks for special character sequences (format specifiers), and replaces them with the arguments in args.

These special characters start with a %. In our example, we used %d, which Java understands as "format the corresponding argument in args as a decimal integer".

A slightly more insightful example of when format() is useful:

// Very messy, hard to read, and hard to maintain
System.out.println("a = " + a + "\n" + "b = " + b + "\n" + "c = " + c + "\n" + "d = " + d + "\n");

// Much prettier
System.out.println(String.format("a = %d \nb = %d \nc = %d \nd = %d", a, b, c, d));


As we can see in this example, Java matches the special characters beginning with % with the arguments in order. Meaning that when it sees the first %d it will match it to a, the second %d to b and so on.

There are a lot of special characters for format() and you can find the full list in the docs (including a whole bunch of date/time options), but the ones you'll most commonly see and use are:

• %d: integral types (byte, short, int, long, BigInteger)
• %s: Strings
• %f: floating-point types (float, double) as a decimal number. %e formats the number in computerized scientific notation, and %g prints the same as either %f or %e, depending on the precision and the value after rounding.
• %b: for Boolean values. If the value is null, "false" is printed
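Here's a short sketch that puts a few of these to use. The method names are made up, and two details go slightly beyond the list above: the .2 in %.2f is a precision (two digits after the decimal point), and the overload of format() that takes a Locale is used so the output doesn't depend on the machine's default locale:

```java
import java.util.Locale;

public class FormatDemo {

    // %s for the String, %d for the int
    static String describe(String name, int age) {
        return String.format(Locale.ROOT, "%s is %d years old", name, age);
    }

    // %.2f rounds to two digits after the decimal point
    static String twoDecimals(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // %b prints "false" for a null argument
    static String flag(Boolean value) {
        return String.format("%b", value);
    }

    public static void main(String[] args) {
        System.out.println(describe("Ana", 30));  // Ana is 30 years old
        System.out.println(twoDecimals(3.14159)); // 3.14
        System.out.println(flag(true));           // true
        System.out.println(flag(null));           // false
    }
}
```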

Generally speaking, the format() method has a seemingly complicated syntax:
