Throughout your programming career you'll find that there are quite a few times you need to extract a substring from another string. Strings are one of the most common data structures, so this comes up often. I bet you could tell me how to do it in your favorite programming language, but what if you had to do it in Bash? It's not as obvious in Bash as it is in other languages, so we'll explain how to do it a few different ways in this article.
If you have experience with Unix-based operating systems then you probably already know about the Bash shell. But if you don’t, here is a quick explanation. Essentially it’s a command shell that was initially written for the GNU project as a replacement for the Bourne shell. Many developers use the Bash shell to interact with their operating system's file system, run other commands, and write scripts. It’s therefore helpful to know how to perform common tasks like this one in Bash, whether you use it interactively or need to write a shell script.
Using the cut Command
Getting a substring from the original string using the terminal isn't too difficult thanks to a standard command meant for this purpose specifically. It works well either directly on the command line or within a shell script (a .sh file). The command I'm referring to is cut, which is a standard Unix utility rather than a part of Bash itself. It reads text from its standard input and, given a flag such as -cN-M, outputs the resulting substring. Here is the general format of the command:
$ echo "STRING" | cut -cN-M
When you plug in the values (both the string and the two positions), cut will print the characters of the string starting at position N and ending at position M, with the characters at N and M both included. Positions are counted from 1, so the first character of the string is at position 1.
Let’s try out an example. Here we use the string "abcdefghi" and extract a substring from it:
$ echo "abcdefghi" | cut -c2-6
bcdef
When you run the above command in a terminal, you will get "bcdef" as the result.
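The range given to -c does not need both ends. As a small sketch (the variable names here are only for illustration), leaving out the end position takes everything up to the end of the line, leaving out the start position begins at the first character, and a single number selects one character:

```bash
str="abcdefghi"

# From character 3 to the end of the line
from_third=$(echo "$str" | cut -c3-)
echo "$from_third"   # cdefghi

# From the first character through character 4
first_four=$(echo "$str" | cut -c-4)
echo "$first_four"   # abcd

# Only character 5
fifth=$(echo "$str" | cut -c5)
echo "$fifth"        # e
```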
Specifying the character index isn't the only way to extract a substring. You can also use the -d and -f flags to extract a string by specifying a character to split on. The -d flag lets you specify the delimiter to split on, while -f lets you choose which field of the split to return. Keep in mind that the cut command is not 0-indexed, so the first item in the list is number 1.
$ echo "STRING" | cut -d'C' -f I
In the example above, C is the character to split on and I is the index of the field to choose.
Given this, let’s try another example. Suppose you have to extract a series of digits from the name of a directory. The format of the directory name might be something like "birthday-091216-pics". In this example, there are a few characters before the digits we care about and a few characters after them, with dashes in between. We can easily tackle this problem with the cut command using the syntax we just introduced. Here’s how:
$ echo "birthday-091216-pics" | cut -d'-' -f 2
091216
Conceptually, this splits the string into a list of fields (["birthday", "091216", "pics"]), and then picks one item from that list to return (the 2nd item).
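The -f flag also accepts lists and ranges of fields, not just a single number. The following is a brief sketch using the same directory name (the variable names are made up for the example). When more than one field is selected, cut joins them with the same delimiter in its output:

```bash
dir="birthday-091216-pics"

# Fields 1 and 3, joined by the delimiter
first_and_last=$(echo "$dir" | cut -d'-' -f 1,3)
echo "$first_and_last"   # birthday-pics

# Field 2 through the last field
from_second=$(echo "$dir" | cut -d'-' -f 2-)
echo "$from_second"      # 091216-pics
```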
If instead you need to use this in a shell script your code may look something like this:
STR="birthday-091216-pics"
SUBSTR=$(echo "$STR" | cut -d'-' -f 2)
echo "$SUBSTR"
When you run the commands above, you get "091216" as the output, just as before.
Using the Bash Substring Syntax
Another way to extract substrings in a shell script is to use a Bash variable with the substring syntax. The syntax looks like this:
string=YOUR-STRING
echo ${string:P}
echo ${string:P:L}
Here P is a number that indicates the starting index of the substring, counted from 0, and L is the length of the substring. If you omit the L parameter then the rest of the string is returned, starting from position P.
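To make the template concrete, here is a short sketch with actual values filled in for P and L (the variable names are only for illustration):

```bash
string="abcdefghi"

# Everything from index 2 onward (index 0 is "a")
rest=${string:2}
echo "$rest"     # cdefghi

# Five characters starting at index 1
middle=${string:1:5}
echo "$middle"   # bcdef

# The first three characters
first=${string:0:3}
echo "$first"    # abc
```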
This is different from the cut command, where we gave the starting and ending positions. In this case we have to give the starting index and the length instead (or no length at all). The numbering differs as well: cut counts characters from 1, while the substring syntax counts from 0.
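Converting between the two conventions is simple arithmetic: a cut-style range from N to M corresponds to a starting index of N-1 and a length of M-N+1. The sketch below wraps that in a hypothetical helper function, substr_range, which is not a standard command and is named here only for illustration. It relies on Bash evaluating the offset and length as arithmetic expressions:

```bash
# Hypothetical helper: takes a string plus 1-based, inclusive
# start and end positions, like cut -cN-M does.
substr_range() {
  local s=$1 n=$2 m=$3
  # Offset is n-1 (0-based), length is m-n+1 (both ends included)
  echo "${s:n-1:m-n+1}"
}

substr_range "abcdefghi" 2 6   # bcdef
```

This gives the same result as echo "abcdefghi" | cut -c2-6, without starting a separate process.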
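This approach is usually preferred in shell scripts since it's more compact, easier to read, and handled by the shell itself without running another program. However, it only works on a value that is already stored in a variable, so it is less convenient for quick one-off use on the command line or for output piped from another command, in which case you'd probably prefer cut.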
Conclusion
There are quite a few ways to get a substring in Bash, a few of which we discussed here. You can use either the cut command or the Bash substring syntax to extract strings according to your needs. To learn more about the cut command specifically (which can also be used on files), check out its Wikipedia page or its manual page (man cut).