How to Use the grep Command on GNU/Linux

Introduction

Grep is a powerful yet very simple tool. By default, it searches its input and prints every line that contains text matching a pattern specified in the command call.

Before grep became such a widespread tool for the GNU/Linux system, it used to be a private utility written by Ken Thompson for searching through files. The interesting part of the story is that his manager approached him, asking for a tool that does exactly that.

He responded that he would think of something overnight - while he actually used that time to improve the code and fix some bugs. When he presented the tool the next day, it really did seem as if it had been written in no time.

In this article, we will learn the basics of grep and its usage by running through its options and some examples.

Grep Usage and Variants

The general form of the grep command is:

$ grep [OPTION...] PATTERNS [FILE...]

We can specify zero or more OPTION arguments, one or more PATTERNS and zero or more FILE arguments. If no FILE argument is specified, grep searches the working directory (.) when a recursive option is given; otherwise it reads standard input.
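
As a minimal sketch of the standard-input case, the commands below pipe some text into grep without naming any file. The sample text is made up purely for illustration:

```shell
# No FILE argument and no recursive option: grep reads standard input.
matches=$(printf 'hello\nworld\nhello again\n' | grep -E "hello")

# Print the two lines that contain "hello".
printf '%s\n' "$matches"
```

Only the first and third lines of the piped text are printed, since the second one does not contain the pattern.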

There are four major variants of grep. Depending on which suits our needs best, we pick the one we'll use by specifying the OPTION argument:

  • -G or --basic-regexp - When used, it interprets the pattern as a Basic Regular Expression (BRE). This variant is used by default (if no other options are specified).
  • -E or --extended-regexp - Interprets the pattern as an Extended Regular Expression (ERE).
  • -F or --fixed-strings - Interprets the pattern as fixed strings, not regular expressions.
  • -P or --perl-regexp - Interprets the pattern as Perl-Compatible Regular Expressions (PCREs). This variant still has some unimplemented features and might produce warnings. It should be considered rather experimental when used with certain options and is recommended only for advanced users.
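
To get a feel for how the first three variants differ, here is a small sketch that runs the same pattern, a.+, through each of them. The three sample lines are made up for illustration, and the behavior described in the comments assumes GNU grep:

```shell
# Three sample lines (hypothetical input).
sample=$(printf 'a.+\nab+\nabc\n')

# BRE: "." matches any character, while "+" is a literal plus sign.
bre=$(printf '%s\n' "$sample" | grep -G 'a.+')

# ERE: "." matches any character and "+" means "one or more".
ere=$(printf '%s\n' "$sample" | grep -E 'a.+')

# Fixed string: every character in the pattern is taken literally.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.+')

printf 'BRE:\n%s\nERE:\n%s\nFixed:\n%s\n' "$bre" "$ere" "$fixed"
```

With this input, the BRE variant selects the first two lines, the ERE variant selects all three, and the fixed-string variant selects only the first.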

It's worth noting that grep also has companion commands: egrep, fgrep and rgrep. They are the same as grep -E, grep -F and grep -r respectively, but are deprecated as standalone tools and provided only because some software still relies on them.

In our examples, we'll use the second variant, though the examples in the following sections should apply to the other variants as well.

Searching Through a File

Say we have a test.txt file, with the following contents:

hello 
hElLo
This line does not include the word we're looking for. 
helloHello
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Test line.
Another hello line.

We'd like to find all lines that contain the word hello. It doesn't matter where the word is located in the line, nor if it's part of a longer word like helloHello.

We'll use grep -E with the hello pattern, on the test.txt file:

$ grep -E hello test.txt

Running this command would yield:

hello 
helloHello
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.

Note: It's common practice to enclose the pattern in quotation marks. Quoting visually separates the pattern, stops the shell from interpreting special characters in it, and allows the search term to contain multiple words.

Searching for Multiple Words

Sometimes, we'd like to search for a couple of words instead of one. This is done by simply including the search terms within quotation marks:

$ grep -E "This is" test.txt

Running this would result in:

  This is the paragraph that has multiple sentences. We'll put one more hello here.

Keep in mind that grep is case-sensitive. If we had searched for "this is" instead, nothing would be returned.
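
The quotation marks matter here. As a rough sketch of what happens without them, the commands below search a throwaway file (demo.txt, a hypothetical name created in a temporary directory) with and without quotes:

```shell
# Create a small sample file in a temporary directory (hypothetical content).
tmpdir=$(mktemp -d)
printf 'This is a line.\nThis one is not.\n' > "$tmpdir/demo.txt"

# Quoted: the whole phrase "This is" is the pattern.
quoted=$(grep -E "This is" "$tmpdir/demo.txt")

# Unquoted: the pattern is just "This", and "is" is treated as a file name.
# grep reports an error for the missing file "is" (hidden here) and, because
# two files were named, prefixes each match with the file it came from.
unquoted=$(cd "$tmpdir" && grep -E This is demo.txt 2>/dev/null || true)

printf 'Quoted:\n%s\nUnquoted:\n%s\n' "$quoted" "$unquoted"
rm -rf "$tmpdir"
```

The quoted search returns only the first line, while the unquoted one returns both lines, since each of them contains "This".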

Using Regular Expressions

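Now, let's use a regular expression to single out the word hello and omit results like helloHello. The pattern requires hello to be preceded by the start of the line or a non-word character, and followed by a non-word character or the end of the line:

$ grep -E "(^|[^a-zA-Z0-9_])hello([^a-zA-Z0-9_]|$)" test.txt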

Running this command will result in:

hello 
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.

Specifying Matching Rules with Flags

There are a few options that help us specify the matching rules more easily:

  • -e pattern

This flag means that the string that follows it should be interpreted as a pattern. If you have a single pattern, there's no need to flag it with -e. If you have multiple patterns, each one needs its own -e flag.

For example, we can search for multiple patterns like so:

$ grep -E -e "hello" -e "the" test.txt

This command will result in:

hello 
This line does not include the word we're looking for. 
helloHello
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
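
Since we're using extended regular expressions, the same search can also be written as a single pattern with the alternation operator |. The sketch below compares the two forms on a few made-up lines:

```shell
# A few sample lines (hypothetical input).
sample=$(printf 'hello\nTest line.\nAnother line.\n')

# Two separate patterns, each flagged with -e.
with_e=$(printf '%s\n' "$sample" | grep -E -e "hello" -e "the")

# One ERE pattern that uses alternation instead.
with_alt=$(printf '%s\n' "$sample" | grep -E "hello|the")

printf '%s\n' "$with_e"
```

Both forms select the same lines. Note that "Another line." is included because the pattern the matches inside the word Another.
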
  • -f file

This flag allows us to obtain patterns from a file, one per line. When we're dealing with many patterns, it's easier to keep them in a file than to type them all on the command line. Each pattern present in the file is taken into account:

$ grep -E -f patterns.txt test.txt

This is what our patterns.txt file looks like. Each line is a regular expression, not a shell wildcard:

^helloHello$
line

Running the command will result in:

This line does not include the word we're looking for. 
helloHello
Test line.
Another hello line.

  • -i or --ignore-case

This flag overrides the default case-sensitive behavior and returns all matching lines, regardless of case:

$ grep -E -i "hello" test.txt

Running this command will result in:

hello 
hElLo
helloHello
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.

This time around, even though our search term was "hello", other relevant lines were returned, such as hElLo.

Note: -y is an obsolete version of -i that does the same thing, but it's kept only for backwards compatibility.

  • -v or --invert-match

Inverts the match - returns the lines that do not match our pattern:

$ grep -E -v "hello" test.txt

Running this command would result in:

hElLo
This line does not include the word we're looking for. 
Test line.
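
A common use of inverted matching is stripping comments and blank lines from a configuration file. Here is a minimal sketch of that idea, where the configuration text is invented for the example:

```shell
# Hypothetical configuration text with comments and an empty line.
config=$(printf '# a comment\nport=8080\n\n  # indented comment\nhost=example\n')

# Drop every line that is empty or whose first non-blank character is "#".
active=$(printf '%s\n' "$config" | grep -E -v '^[[:space:]]*(#|$)')

printf '%s\n' "$active"
```

Only the two lines with actual settings, port=8080 and host=example, remain.
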
  • -w or --word-regexp

Searches for "standalone" words. The match must start either at the beginning of the line or right after a non-word character, and must end either at the end of the line or right before a non-word character. Word characters are letters, digits and the underscore. It's a shorthand convenience flag to avoid writing the regex from before:

$ grep -E -w "hello" test.txt

The output will be the same as with the regular expression for standalone words that we wrote earlier:

hello 
  This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
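
To illustrate which characters count as word boundaries, the sketch below runs -w against a few made-up strings. Punctuation such as - or ( separates words, while digits and underscores are part of a word:

```shell
# Sample strings (hypothetical input).
words=$(printf 'hello-world\nhello_world\nhello2\n(hello)\n')

# Keep only lines where "hello" appears as a standalone word.
whole=$(printf '%s\n' "$words" | grep -E -w "hello")

printf '%s\n' "$whole"
```

This prints hello-world and (hello), but not hello_world or hello2.
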
  • -x or --line-regexp

Searches for lines that match the entire pattern. That is to say, if the pattern and the entire line are a match, the line is returned:

$ grep -E -x "helloHello" test.txt 

It's a convenience flag that allows us to skip writing the regex:

$ grep -E "^helloHello$" test.txt 

These will have the same output - in our case a single line that contains exactly what we specified:

helloHello

Controlling Output

Sometimes, the results can be cluttered. If we're working with big files, or if we don't want to see every result in full, we'll want to control the output. Fortunately, grep supports a number of flags and options to do just that:

  • -c or --count

Suppresses the normal output and prints the number of matching lines instead:

$ grep -E -c "hello" test.txt 

This command will result in a single number:

4
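
Keep in mind that -c counts matching lines, not individual matches. If we need the number of occurrences, one possible approach is to combine the -o (--only-matching) option, which prints each match on its own line and isn't otherwise covered in this article, with wc -l. A small sketch with made-up input:

```shell
# Sample text in which the first line contains the word twice (hypothetical input).
text=$(printf 'hello hello\nhello\nbye\n')

# Number of lines that contain at least one match.
line_count=$(printf '%s\n' "$text" | grep -E -c "hello")

# Number of individual matches: -o prints each match on its own line.
match_count=$(printf '%s\n' "$text" | grep -E -o "hello" | wc -l | tr -d ' ')

printf 'lines: %s, matches: %s\n' "$line_count" "$match_count"
```

Here, two lines match, but the word itself occurs three times.
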
  • -l

Suppresses the normal output and shows only the names of the files that contain matching lines. For the sake of this example, we'll add 4 more files to our directory. Two of them match the search term, and the other two don't:

$ grep -E -l "hello" *.txt

The last argument (*.txt) is expanded by the shell into the names of all files in the current directory that have a .txt extension, and grep searches each of them. We could also list them one by one, but this way is neater.

This command would result in:

01_contains_hello.txt
02_contains_hello.txt
test.txt

By contrast, the -L option shows the names of the files that do not contain any matches.

  • -m num

Stops the search after num matching lines. When searching through multiple files, this number does not carry over between files, and the count begins anew for each one:

$ grep -E -m 2 "hello" test.txt

This command would result in:

hello 
helloHello

  • -h

Suppresses the file names that normally prefix each match when we search multiple files. Let's search regularly first:

$ grep -E -m 1 "hello" *.txt

This results in:

01_contains_hello.txt:hello
02_contains_hello.txt:hello
test.txt:hello 

However, when we add the -h flag:

$ grep -E -h -m 1 "hello" *.txt

The output is:

hello
hello
hello

  • -n

Shows the line number for each matched line:

$ grep -E -n "hello" test.txt

This results in:

1:hello 
4:helloHello
5:  This is the paragraph that has multiple sentences. We'll put one more hello here.
7:Another hello line.

Practical Usage

Let's go over some examples of practical usage of the grep command, having the previous sections in mind.

Search by File Extension using Grep

Grep can search through any given input. That input can be standard input, a specified file, or the output of another program.

Even though there are many ways to list the files by an extension, grep can make that easier for us.

Here we'll use grep to filter the output of another tool that recursively lists all files in the current working directory. We'll find all the files with a .txt extension and sort them alphabetically, ignoring case differences. The pattern is a regular expression rather than a shell wildcard, so we escape the dot and anchor the extension to the end of the line with $:

$ find . | sort -f | grep -E "\.txt$"

Note that in this command we used |, or the so-called pipe. We used it to redirect the output of one program to the input of another, instead of printing it on standard output like we normally do.

Let's suppose that we have the following files in our current working directory:

./some_file1.txt
./05_contains_hello.txt
./subfolder
./subfolder/py_code.py
./subfolder/some_file.txt
./c_code.c
./markdown_file.md
./some_file2.txt

The output of our command will be:

./05_contains_hello.txt
./some_file1.txt
./some_file2.txt
./subfolder/some_file.txt
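
When filtering file names by extension, it helps to anchor the pattern to the end of the line so that names which merely contain txt somewhere are left out. The sketch below shows this on a made-up list of paths:

```shell
# Hypothetical list of paths, as a tool like find might print them.
paths=$(printf './notes.txt\n./txt_notes.md\n./sub/readme.txt\n./archive.txt.bak\n')

# "\." matches a literal dot and "$" anchors the match to the end of the line.
txt_files=$(printf '%s\n' "$paths" | grep -E '\.txt$')

printf '%s\n' "$txt_files"
```

Only ./notes.txt and ./sub/readme.txt are printed. The other two names contain txt, but do not end in .txt.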

Processing File Contents using Grep

We'll use this chance to showcase another neat grep option. Let's suppose that we have a project directory with a huge number of files with code in them and no access to a fancy IDE.

We want to refactor some function, so it'll be very useful to us to find all of its usages beforehand, lest we break the entire codebase.

In our solution, we'll use -r, a grep option for recursive search of all files contained in the current working directory, and all of its subdirectories:

$ grep -E -rn "osCreateMemoryBlock\([^)]*(\s*|\))"

Note that using a slightly more elaborate regular expression helps us be more precise about our search. Since grep matches line by line, the pattern also accepts lines where the argument list continues on the following lines.

Here, we've combined two options, -r and -n, into a single -rn. The line numbers tell us exactly where the changes need to be made.

Here's what the working directory looks like for this command:

./subfolder2
./subfolder2/c_code.c
./subfolder2/memory_admin.c
./log_client.c
./signals.c
./log_server.c
./fifo_server.c
./fifo_client.c
./skelet.c
./subfolder1
./subfolder1/memory_handler.c
./subfolder1/random_code1.c
./subfolder1/c_code.c
./subfolder1/memory_creater.c
./shared_memory_writer.c
./shared_memory_reader.c

Running the command will output the relative path of each file and the number of each line that contains osCreateMemoryBlock:

subfolder2/c_code.c:47:void *osCreateMemoryBlock(const char* filePath, unsigned size);
subfolder2/c_code.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder2/memory_admin.c:54:   osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
subfolder2/memory_admin.c:116:void *osCreateMemoryBlock(const char* filePath,
log_server.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder1/memory_handler.c:54: osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
subfolder1/memory_handler.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder1/memory_creater.c:47:void *osCreateMemoryBlock(const char* filePath, unsigned size);
subfolder1/memory_creater.c:54: osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
shared_memory_writer.c:33:      int *niz = osCreateMemoryBlock(argv[1], size);
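
If a project also contains documentation or build artifacts, we might want to limit the recursive search to source files. GNU grep offers the --include option for this. The following sketch builds a tiny throwaway directory tree (the file names and contents are invented for the example) and searches only its .c files:

```shell
# Build a small hypothetical project tree in a temporary directory.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'int x = osCreateMemoryBlock(a, b);\n' > "$tmpdir/sub/code.c"
printf 'See osCreateMemoryBlock(path, size) in the docs.\n' > "$tmpdir/notes.md"

# Search recursively from ".", but only in files whose names match *.c.
hits=$(cd "$tmpdir" && grep -E -rn --include='*.c' "osCreateMemoryBlock\(" .)

printf '%s\n' "$hits"
rm -rf "$tmpdir"
```

The match in notes.md is skipped, and only the line from ./sub/code.c is reported, together with its line number.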

Extracting Log File Data using Grep

Say we noticed that our machine hangs too long before shutting down, but we're unsure as to why. We might want to check the system log, though it may contain millions of lines. If we have an approximate idea of when this happened, grep can help us find the relevant entries easily:

$ grep -E -n "May 27 19:2[0-1]:[0-9]{2} *" /var/log/syslog

This will output everything written in the system's logs during the period we've specified. It'll also print out the numbers of the lines for our convenience if we'd like to explore the full file later on.

Naturally, this log file is different for everyone, but could look something along the lines of:

1005:May 27 19:21:59 machine kernel: [236041.840122] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 2918)
1006:May 27 19:21:59 machine kernel: [236041.840125] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 3753)
1014:May 27 19:21:59 machine kernel: [236041.841137] mce: CPU4: Core temperature/speed normal
1016:May 27 19:21:59 machine kernel: [236041.841139] mce: CPU4: Package temperature/speed normal

This method can be applied to pretty much any type of log file - just make sure to check its format beforehand, and then proceed to construct an accurate regular expression.
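
To see how the time window in such a pattern behaves, we can try it on a few lines that imitate the syslog timestamp format. The log entries below are invented for the example:

```shell
# Hypothetical log entries just before, inside and just after the time window.
log=$(printf '%s\n' \
  'May 27 19:19:58 machine kernel: example before the window' \
  'May 27 19:20:03 machine kernel: example inside the window' \
  'May 27 19:21:59 machine kernel: example inside the window' \
  'May 27 19:22:00 machine kernel: example after the window')

# Select entries from 19:20:00 to 19:21:59 and show their line numbers.
window=$(printf '%s\n' "$log" | grep -E -n "May 27 19:2[0-1]:[0-9]{2}")

printf '%s\n' "$window"
```

Only the second and third entries are printed, prefixed with 2: and 3: respectively.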

Conclusion

In this article we covered the basics of one of the most famous command-line tools available. It's a very efficient way to search through any text or extract data from it.

Through examples of its real-life usage, it is evident that grep can do wonders in many fields when combined with other command-line tools and a solid knowledge of regular expressions.
