Introduction
Grep is a powerful, yet very simple tool. By default, it searches through its input and prints every line that contains text matching a pattern specified in the command call.
Before grep became such a widespread tool for the GNU/Linux system, it used to be a private utility written by Ken Thompson for searching through files. The interesting part of the story is that his manager approached him, asking for a tool that does exactly that.
He responded that he'd think of something overnight, while he actually used that time to improve the code and fix some bugs. When he presented the tool the next day, it really did seem like it had been written in no time.
In this article, we will learn the basics of grep and its usage by running through its options and some examples.
Grep Usage and Variants
The general form of the grep command is:
$ grep [OPTION...] [PATTERNS] [FILE...]
We can specify zero or more OPTION arguments, one or more PATTERNS and zero or more FILE arguments. If no FILE argument is specified, grep searches the working directory (.) when the recursion option is given; otherwise, it reads standard input.
There are four major variants of grep. Depending on which suits our needs best, we pick the one we'll use by specifying the OPTION argument:
- -G or --basic-regexp - When used, it interprets the pattern as a Basic Regular Expression (BRE). This variant is used by default (if no other options are specified).
- -E or --extended-regexp - Interprets the pattern as an Extended Regular Expression (ERE).
- -F or --fixed-strings - Interprets the pattern as fixed strings, not regular expressions.
- -P or --perl-regexp - Interprets the pattern as Perl-Compatible Regular Expressions (PCREs). This variant still has some unimplemented features and might produce warnings. It should be considered rather experimental when used with certain options and is recommended only for advanced users.
What's worth noting is that nowadays, grep is a family of tools, which includes egrep, fgrep and rgrep. They are the same as grep -E, grep -F and grep -r respectively, but are deprecated as standalone tools and only provided because some software still relies on them.
In our examples, we'll use the second variant, though the examples in the following sections should apply to other variants as well.
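To make the difference between the variants more concrete, here is a minimal sketch that runs the same pattern, a+, through three of them. The two sample lines are made up for illustration and are not part of the article's test file.

```shell
# Sample input: one line of repeated letters, one line with a literal plus sign.
sample=$(printf 'aaa\na+b\n')

# BRE (default, -G): an unescaped + is an ordinary character.
bre=$(printf '%s\n' "$sample" | grep -G 'a+')

# ERE (-E): + means "one or more of the preceding item".
ere=$(printf '%s\n' "$sample" | grep -E 'a+')

# Fixed strings (-F): the pattern is matched literally, character by character.
fixed=$(printf '%s\n' "$sample" | grep -F 'a+')

printf 'BRE:\n%s\nERE:\n%s\nFixed:\n%s\n' "$bre" "$ere" "$fixed"
```

Only the -E run treats + as a repetition operator, so it is the only one that also prints the aaa line.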
Searching Through a File
Say we have a test.txt file, with the following contents:
hello
hElLo
This line does not include the word we're looking for.
helloHello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Test line.
Another hello line.
We'd like to find all lines that contain the word hello. It doesn't matter where the word is located in the line, nor if it's part of a longer word like helloHello.
We'll use grep -E with the hello pattern, on the test.txt file:
$ grep -E hello test.txt
Running this command would yield:
hello
helloHello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
Note: It's common practice to put the pattern in quotation marks. This separates it visually from the rest of the command, keeps the shell from interpreting special characters in it, and allows us to use several words as the search term instead of just one.
Searching for Multiple Words
Sometimes, we'd like to search for a couple of words instead of one. This is done by simply including the search terms within quotation marks:
$ grep -E "This is" test.txt
Running this would result in:
This is the paragraph that has multiple sentences. We'll put one more hello here.
Keep in mind that grep is case-sensitive. If we had searched for "this is" instead, nothing would have been returned.
Using Regular Expressions
Now, let's use a regular expression to single out the word hello and omit results like helloHello:
$ grep -E "(^|[^a-zA-Z0-9_])hello([^a-zA-Z0-9_]|$)" test.txt
Running this command will result in:
hello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
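As a hedged, self-contained sketch of the standalone-word search, the snippet below recreates the sample file in a temporary location and uses one possible pattern: the word must be preceded by the start of the line or a non-word character, and followed by a non-word character or the end of the line. The temporary file and the variable names are assumptions made for this example.

```shell
# Recreate the sample file from the article in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
hello
hElLo
This line does not include the word we're looking for.
helloHello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Test line.
Another hello line.
EOF

# (^|[^a-zA-Z0-9_]) : start of line, or a character that cannot be part of a word
# ([^a-zA-Z0-9_]|$) : a character that cannot be part of a word, or end of line
words=$(grep -E '(^|[^a-zA-Z0-9_])hello([^a-zA-Z0-9_]|$)' "$tmp")
printf '%s\n' "$words"
rm -f "$tmp"
```

Single quotes are used here so that the shell passes the pattern to grep untouched.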
Specifying Matching Rules with Flags
There are a few options that help us specify the matching rules more easily:
- -e pattern
This flag means that the following string should be interpreted as a pattern. By default, if you have one pattern, there's no need to flag it with -e. If you have multiple patterns, you'll have to flag them all.
For example, we can search for multiple patterns like so:
$ grep -E -e "hello" -e "the" test.txt
This command will result in:
hello
This line does not include the word we're looking for.
helloHello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
- -f file
This flag allows us to obtain patterns from a file, one per line. When we're dealing with many patterns, it's easier to situate them in a file rather than having them all within the command-line. Each pattern present in the file is taken into account:
$ grep -E -f patterns.txt test.txt
This is what our patterns.txt file looks like:
^helloHello$
line
Running the command will result in:
This line does not include the word we're looking for.
helloHello
Test line.
Another hello line.
- -i or --ignore-case
This flag overrides the default case-sensitive behavior and returns all matching lines, regardless of the case:
$ grep -E -i "hello" test.txt
Running this command will result in:
hello
hElLo
helloHello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
This time around, even though our search term was "hello", other relevant lines were returned, such as hElLo.
Note: -y is an obsolete version of -i that does the same thing, but it's kept only for backwards compatibility.
- -v or --invert-match
Inverts the match - returns the lines that do not match our pattern:
$ grep -E -v "hello" test.txt
Running this command would result in:
hElLo
This line does not include the word we're looking for.
Test line.
- -w or --word-regexp
Searches for "standalone" words. A match counts only if it starts at the beginning of the line or right after a non-word character, and ends at the end of the line or right before a non-word character. Word characters are letters, digits and underscores. It's a shorthand convenience flag to avoid writing the regex from before:
$ grep -E -w "hello" test.txt
The output will be the same as if we've simply input the regular expression for standalone words:
hello
This is the paragraph that has multiple sentences. We'll put one more hello here.
Another hello line.
- -x or --line-regexp
Selects only those lines where the pattern matches the entire line, from the first character to the last:
$ grep -E -x "helloHello" test.txt
It's a convenience flag that allows us to skip writing the regex:
$ grep -E "^helloHello$" test.txt
These will have the same output - in our case a single line that contains exactly what we specified:
helloHello
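The matching flags from this section can be compared side by side. The following is a small sketch on a made-up four-line file (not the article's test.txt), with each result stored in a variable so the differences are easy to inspect.

```shell
# A small throwaway file; its contents are invented for this comparison.
tmp=$(mktemp)
printf '%s\n' 'hello' 'hElLo' 'helloHello' 'say hello there' > "$tmp"

ignore_case=$(grep -i 'hello' "$tmp")   # all four lines
inverted=$(grep -v 'hello' "$tmp")      # only: hElLo
whole_word=$(grep -w 'hello' "$tmp")    # hello, say hello there
whole_line=$(grep -x 'hello' "$tmp")    # only: hello

printf -- '-i:\n%s\n-v:\n%s\n-w:\n%s\n-x:\n%s\n' \
  "$ignore_case" "$inverted" "$whole_word" "$whole_line"
rm -f "$tmp"
```

Note that -w rejects helloHello because the match is followed directly by a letter, while -x is stricter still and rejects any line with extra text around the word.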
Controlling Output
Sometimes the output contains more than we need. If we're working with big files, or if we don't want to see every matching line in full, we can control the output and change its form. Fortunately, grep supports a number of flags and options to do just that:
- -c or --count
Suppresses the output and returns the number of matched lines:
$ grep -E -c "hello" test.txt
This command will result in a single number:
4
- -l
Suppresses the basic output and only shows the names of the files that have matching lines. For the sake of this example, we'll add 4 more files in our directory. Two of them match, and the other two don't match the search term:
$ grep -E -l "hello" *.txt
The last argument (*.txt) is expanded by the shell into the names of all files in the current directory that have a .txt extension, and grep then searches each of them. We could also list them one by one, but this way is neater.
This command would result in:
01_contains_hello.txt
02_contains_hello.txt
test.txt
By contrast, the -L option shows the names of the files that have no matching lines.
- -m num
Stops the search after num matching lines. When searching through multiple files, this number does not carry over between files, and the count begins anew for each one:
$ grep -E -m 2 "hello" test.txt
This command would result in:
hello
helloHello
- -h
Suppresses the file name prefix on each matching line when we search multiple files. Let's search regularly first:
$ grep -E -m 1 "hello" *.txt
This results in:
01_contains_hello.txt:hello
02_contains_hello.txt:hello
test.txt:hello
However, when we add the -h flag:
$ grep -E -h -m 1 "hello" *.txt
The output is:
hello
hello
hello
- -n
Shows the line number for each matched line:
$ grep -E -n "hello" test.txt
This results in:
1:hello
4:helloHello
5:This is the paragraph that has multiple sentences. We'll put one more hello here.
7:Another hello line.
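The output flags can also be tried in one go. This is a minimal sketch that uses two hypothetical files, a.txt and b.txt, created in a temporary directory; neither the file names nor their contents come from the article.

```shell
# Two throwaway files: one with matches, one without.
dir=$(mktemp -d)
printf '%s\n' 'hello' 'other' 'hello again' > "$dir/a.txt"
printf '%s\n' 'nothing here' > "$dir/b.txt"

count=$(grep -c 'hello' "$dir/a.txt")       # number of matching lines: 2
first=$(grep -m 1 'hello' "$dir/a.txt")     # stops after the first match
numbered=$(grep -n 'hello' "$dir/a.txt")    # 1:hello and 3:hello again

# cd inside the command substitution so -l and -L print short file names.
with_match=$(cd "$dir" && grep -l 'hello' a.txt b.txt)      # a.txt
without_match=$(cd "$dir" && grep -L 'hello' a.txt b.txt)   # b.txt

printf '%s\n' "$count" "$first" "$numbered" "$with_match" "$without_match"
rm -rf "$dir"
```

Capturing each result separately makes it clear that -c and -l replace the normal output, while -m and -n only trim or annotate it.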
Practical Usage
Let's go over some examples of practical usage of the grep command, having the previous sections in mind.
Search by File Extension using Grep
Grep can search through any given input. That input can be the standard input pipe, a specified file, or even an output of another previously executed program.
Even though there are many ways to list the files by an extension, grep can make that easier for us.
Here we'll use grep to filter the output of another tool that recursively lists all files in the current working directory. We'll try to find all the files with a .txt extension and sort them alphanumerically while ignoring case differences. Because grep patterns are regular expressions rather than shell globs, we escape the dot and anchor the match to the end of the line:
$ find . | sort -f | grep -E "\.txt$"
Note that in this command we used |, the so-called pipe. It passes the output of one program to the input of another, instead of printing it on standard output like we normally do.
Let's suppose that we have the following files in our current working directory:
./some_file1.txt
./05_contains_hello.txt
./subfolder
./subfolder/py_code.py
./subfolder/some_file.txt
./c_code.c
./markdown_file.md
./some_file2.txt
The output of our command will be:
./05_contains_hello.txt
./some_file1.txt
./some_file2.txt
./subfolder/some_file.txt
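Here is a hedged, reproducible variant of the same idea. It builds a small directory tree with made-up file names, including txt_parser.py, which contains "txt" but does not end in .txt, and filters the listing with the anchored pattern \.txt$. Setting LC_ALL=C for sort is an assumption added so that the ordering does not depend on the local language settings.

```shell
# Build a throwaway directory tree; all names are invented for this example.
dir=$(mktemp -d)
mkdir "$dir/subfolder"
touch "$dir/notes.txt" "$dir/Readme.txt" "$dir/c_code.c" \
      "$dir/txt_parser.py" "$dir/subfolder/some_file.txt"

# \. matches a literal dot and $ anchors the match to the end of the path,
# so txt_parser.py and c_code.c are filtered out.
txt_files=$(cd "$dir" && find . -type f | LC_ALL=C sort -f | grep -E '\.txt$')
printf '%s\n' "$txt_files"
rm -rf "$dir"
```

Without the escaped dot and the end anchor, a name such as txt_parser.py could slip through, since an unescaped dot matches any character.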
Processing File Contents using Grep
We'll use this chance to showcase another neat grep option. Let's suppose that we have a project directory with a huge number of files with code in them and no access to a fancy IDE.
We want to refactor some function, so it'll be very useful to us to find all of its usages beforehand, lest we break the entire codebase.
In our solution, we'll use -r, a grep option for recursive search of all files contained in the current working directory, and all of its subdirectories:
$ grep -E -rn "osCreateMemoryBlock\([^)]*(\s*|\))"
Note that a slightly more elaborate regular expression makes the search more precise: it also catches calls and declarations whose argument list continues on the following lines.
Here, we've combined two options, -r and -n, into a single -rn. This is because we'll want to know on which lines the changes need to be made.
Here's what the working directory looks like for this command:
./subfolder2
./subfolder2/c_code.c
./subfolder2/memory_admin.c
./log_client.c
./signals.c
./log_server.c
./fifo_server.c
./fifo_client.c
./skelet.c
./subfolder1
./subfolder1/memory_handler.c
./subfolder1/random_code1.c
./subfolder1/c_code.c
./subfolder1/memory_creater.c
./shared_memory_writer.c
./shared_memory_reader.c
Running the command will output the relative path of each file and the number of each line that contains osCreateMemoryBlock:
subfolder2/c_code.c:47:void *osCreateMemoryBlock(const char* filePath, unsigned size);
subfolder2/c_code.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder2/memory_admin.c:54: osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
subfolder2/memory_admin.c:116:void *osCreateMemoryBlock(const char* filePath,
log_server.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder1/memory_handler.c:54: osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
subfolder1/memory_handler.c:116:void *osCreateMemoryBlock(const char* filePath,
subfolder1/memory_creater.c:47:void *osCreateMemoryBlock(const char* filePath, unsigned size);
subfolder1/memory_creater.c:54: osMemoryBlock* pMsgBuf = osCreateMemoryBlock(argv[1], sizeof(osMemoryBlock));
shared_memory_writer.c:33: int *niz = osCreateMemoryBlock(argv[1], size);
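The recursive search can be reproduced on a tiny scale. The sketch below assumes a hypothetical function called init_buffer and two made-up source files; the starting directory (.) is passed explicitly, which is why the reported paths begin with ./ here.

```shell
# A miniature project tree; init_buffer and both files are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/src"
printf '%s\n' 'int main(void) {' '    init_buffer(16);' '}' > "$dir/main.c"
printf '%s\n' 'void init_buffer(int size);' '/* unrelated helper */' \
  'void init_buffers_all(void);' > "$dir/src/buffer.h"

# -r descends into subdirectories, -n prefixes each hit with its line number.
# The escaped parenthesis keeps init_buffers_all from matching.
# The final sort only makes the order of the results predictable.
hits=$(cd "$dir" && grep -rn -E 'init_buffer\(' . | LC_ALL=C sort)
printf '%s\n' "$hits"
rm -rf "$dir"
```

Each result has the form path:line:text, which is enough to jump straight to every place that needs to change.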
Extracting Log File Data using Grep
Say we noticed that our machine hangs too long before shutting down, but we're unsure as to why. We might want to check the system log, though it can contain millions of lines. If we have an approximate idea of when this happened, grep can help us find this information easily:
$ grep -E -n "May 27 19:2[0-1]:[0-9]{2} *" /var/log/syslog
This will output everything written in the system's logs during the period we've specified. It'll also print out the numbers of the lines for our convenience if we'd like to explore the full file later on.
Naturally, this log file is different for everyone, but could look something along the lines of:
1005:May 27 19:21:59 machine kernel: [236041.840122] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 2918)
1006:May 27 19:21:59 machine kernel: [236041.840125] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 3753)
1014:May 27 19:21:59 machine kernel: [236041.841137] mce: CPU4: Core temperature/speed normal
1016:May 27 19:21:59 machine kernel: [236041.841139] mce: CPU4: Package temperature/speed normal
This method can be applied to pretty much any type of log file - just make sure to check its format beforehand, and then proceed to construct an accurate regular expression.
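As a rough illustration of the time-window technique, the following sketch runs a similar pattern against a four-line stand-in for a log file. The log lines are invented and only mimic the timestamp layout shown above; a real log may use a different format.

```shell
# A stand-in log file with made-up entries around the time of interest.
log=$(mktemp)
cat > "$log" <<'EOF'
May 27 19:19:58 machine kernel: example message A
May 27 19:20:03 machine kernel: example message B
May 27 19:21:59 machine kernel: example message C
May 27 19:22:10 machine kernel: example message D
EOF

# 19:2[0-1] limits the minute to 20 or 21; [0-9]{2} accepts any seconds value.
# The leading ^ ties the timestamp to the start of the line.
window=$(grep -E -n '^May 27 19:2[0-1]:[0-9]{2} ' "$log")
printf '%s\n' "$window"
rm -f "$log"
```

Only the entries from 19:20 and 19:21 are printed, each prefixed with its line number in the file.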
Conclusion
In this article we covered the basics of one of the most famous command-line tools available. It's a very efficient way to search through any text or extract data from it.
Through examples of its usage in real life, it is evident that grep can do wonders in many fields of interest when combined with some other command-line tools and advanced knowledge on constructing regular expressions.