GNU grep
Table of Contents
- Simple string search
- Case insensitive search
- Invert matching lines
- Line number, count and limiting output lines
- Multiple search strings
- File names in output
- Match whole word or line
- Colored output
- Get only matching portion
- Context matching
- Recursive search
- Search strings from file
- Options for scripting purposes
- Regular Expressions - BRE/ERE
- Multiline matching
- Perl Compatible Regular Expressions
- Gotchas and Tips
- Regular Expressions Reference (ERE)
- Further Reading
This chapter has also been converted to a book with additional examples, exercises and covers popular alternative ripgrep
$ grep -V | head -1
grep (GNU grep) 2.25
$ man grep
GREP(1) General Commands Manual GREP(1)
NAME
grep, egrep, fgrep, rgrep - print lines matching a pattern
SYNOPSIS
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN]... [-f FILE]... [FILE...]
DESCRIPTION
grep searches the named input FILEs for lines containing a match to the
given PATTERN. If no files are specified, or if the file “-” is given,
grep searches standard input. By default, grep prints the matching
lines.
In addition, the variant programs egrep, fgrep and rgrep are the same
as grep -E, grep -F, and grep -r, respectively. These variants are
deprecated, but are provided for backward compatibility.
...
Note For more detailed documentation and examples, use info grep
Simple string search
- First specify the search pattern (usually enclosed in single quotes) and then the file input
- More than one file can be specified or input given from stdin
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
$ grep 'are' poem.txt
Roses are red,
Violets are blue,
And so are you.
$ grep 'so are' poem.txt
And so are you.
- If search string contains any regular expression meta characters like
^$\.*[]
(covered later), use the-F
option orfgrep
if available
$ echo 'int a[5]' | grep 'a[5]'
$ echo 'int a[5]' | grep -F 'a[5]'
int a[5]
$ echo 'int a[5]' | fgrep 'a[5]'
int a[5]
- See Gotchas and Tips section if you get strange issues
Case insensitive search
$ grep -i 'rose' poem.txt
Roses are red,
$ grep -i 'and' poem.txt
And so are you.
Invert matching lines
- Use the
-v
option to get lines other than those matching the search string - Tip: Look out for other opposite pairs like
-l -L
,-h -H
, opposites in regular expression, etc
$ grep -v 'are' poem.txt
Sugar is sweet,
$ # example for input from stdin
$ seq 5 | grep -v '3'
1
2
4
5
Line number, count and limiting output lines
- Show line number of matching lines
$ grep -n 'sweet' poem.txt
3:Sugar is sweet,
- Count number of matching lines
$ grep -c 'are' poem.txt
3
- Limit number of matching lines
$ grep -m2 'are' poem.txt
Roses are red,
Violets are blue,
Multiple search strings
- Match any
$ # search blue or you
$ grep -e 'blue' -e 'you' poem.txt
Violets are blue,
And so are you.
If there are lot of search strings, use a file input
Note Be careful to avoid empty lines in the file, it would result in matching all the lines
$ printf 'rose\nsugar\n' > search_strings.txt
$ cat search_strings.txt
rose
sugar
$ # -f option accepts file input with search terms in separate lines
$ grep -if search_strings.txt poem.txt
Roses are red,
Sugar is sweet,
- Match all
$ # match line containing both are & And
$ grep 'are' poem.txt | grep 'And'
And so are you.
File names in output
-l
to get files matching the search-L
to get files not matching the searchgrep
skips the rest of file once a match is found
$ grep -l 'Rose' poem.txt
poem.txt
$ grep -L 'are' poem.txt search_strings.txt
search_strings.txt
- Prefix file name to search results
-h
is default for single file input, no file name prefix in output-H
is default for multiple file input, file name prefix in output
$ grep -h 'Rose' poem.txt
Roses are red,
$ grep -H 'Rose' poem.txt
poem.txt:Roses are red,
$ # -H is default for multiple file input
$ grep -i 'sugar' poem.txt search_strings.txt
poem.txt:Sugar is sweet,
search_strings.txt:sugar
$ grep -ih 'sugar' poem.txt search_strings.txt
Sugar is sweet,
sugar
Match whole word or line
- Word search using
-w
option- word constitutes of alphabets, numbers and underscore character
- This will ensure that given patterns are not surrounded by other word characters
- this is slightly different than using word boundaries in regular expressions
- For example, this helps to distinguish
par
fromspar
,part
, etc
$ printf 'par value\nheir apparent\n' | grep 'par'
par value
heir apparent
$ printf 'par value\nheir apparent\n' | grep -w 'par'
par value
$ printf 'scare\ncart\ncar\nmacaroni\n' | grep -w 'car'
car
- Another useful option is
-x
to match only complete line, not anywhere in the line
$ printf 'see my book list\nmy book\n' | grep 'my book'
see my book list
my book
$ printf 'see my book list\nmy book\n' | grep -x 'my book'
my book
$ printf 'scare\ncart\ncar\nmacaroni\n' | grep -x 'car'
car
Colored output
- Highlight search strings, line numbers, file name, etc in different colors
- Depends on color support in terminal being used
- options to
--color
areauto
when output is redirected (another command, file, etc) the color information won't be passedalways
when output is redirected (another command, file, etc) the color information will also be passednever
explicitly specify no highlighting
$ # can also use grep --color 'blue' as auto is default
$ grep --color=auto 'blue' poem.txt
Violets are blue,
- Sample screenshot
- Example to show difference between
auto
andalways
$ grep --color=auto 'blue' poem.txt > saved_output.txt
$ cat -v saved_output.txt
Violets are blue,
$ grep --color=always 'blue' poem.txt > saved_output.txt
$ cat -v saved_output.txt
Violets are ^[[01;31m^[[Kblue^[[m^[[K,
$ # some commands like 'less' are capable of using the color information
$ grep --color=always 'are' poem.txt | less -R
$ # highlight multiple matching patterns
$ grep --color=always 'are' poem.txt | grep --color 'd'
Roses are red,
And so are you.
Get only matching portion
- The
-o
option to get only matched portion is more useful with regular expressions - Comes in handy if overall number of matches is required, instead of only line wise
$ grep -o 'are' poem.txt
are
are
are
$ # -c only gives count of matching lines
$ grep -c 'e' poem.txt
4
$ grep -co 'e' poem.txt
4
$ # so need another command to get count of all matches
$ grep -o 'e' poem.txt | wc -l
9
Context matching
- The
-A
,-B
and-C
options are useful to get lines after/before/around matching line respectively
$ grep -A1 'blue' poem.txt
Violets are blue,
Sugar is sweet,
$ grep -B1 'blue' poem.txt
Roses are red,
Violets are blue,
$ grep -C1 'blue' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
- If there are multiple non-adjacent matching segments, by default
grep
adds a line--
to separate them- non-adjacent here implies that segments are separated by at least one line in input data
$ seq 29 | grep -A1 '3'
3
4
--
13
14
--
23
24
- Use
--no-group-separator
option if the separator line is a hindrance, for example feeding the output ofgrep
to another program
$ seq 29 | grep --no-group-separator -A1 '3'
3
4
13
14
23
24
- Use
--group-separator
to customize the separator
$ seq 29 | grep --group-separator='*****' -A1 '3'
3
4
*****
13
14
*****
23
24
Recursive search
First let's create some more test files
$ mkdir -p test_files/hidden_files
$ printf 'Red\nGreen\nBlue\nBlack\nWhite\n' > test_files/colors.txt
$ printf 'Violet\nIndigo\nBlue\nGreen\nYellow\nOrange\nRed\n' > test_files/vibgyor.txt
$ printf '#!/usr/bin/python3\n\nprint("Hello World")\n' > test_files/hello.py
$ printf 'I like yellow\nWhat about you\n' > test_files/hidden_files/.fav_color.info
From man grep
-r, --recursive
Read all files under each directory, recursively, following
symbolic links only if they are on the command line. Note that
if no file operand is given, grep searches the working
directory. This is equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
Basic recursive search
- Note that
-H
option automatically activates for multiple file input
$ # by default, current working directory is searched
$ grep -r 'red'
poem.txt:Roses are red,
$ grep -ri 'red'
poem.txt:Roses are red,
test_files/colors.txt:Red
test_files/vibgyor.txt:Red
$ grep -rin 'red'
poem.txt:1:Roses are red,
test_files/colors.txt:1:Red
test_files/vibgyor.txt:7:Red
$ grep -ril 'red'
poem.txt
test_files/colors.txt
test_files/vibgyor.txt
Exclude/Include specific files/directories
- By default, recursive search includes hidden files as well
- They can be excluded by file name or directory name
- glob patterns can be used
- for example:
*.[ch]
to specify all files ending with.c
or.h
- The exclusion options can be used multiple times
- for example:
--exclude='*.txt' --exclude='*.log'
or specified from a file using--exclude-from=FILE
- for example:
- To search only files with specific pattern in their names, use
--include=GLOB
- Note: exclusion/inclusion applies only to basename of file/directory, not the entire path
- To follow all symbolic links (not directly specificied as arguments, but found on recursive search), use
-R
instead of-r
$ grep -ri 'you'
poem.txt:And so are you.
test_files/hidden_files/.fav_color.info:What about you
$ # exclude file names starting with `.` i.e hidden files
$ grep -ri --exclude='.*' 'you'
poem.txt:And so are you.
$ # include only file names ending with `.info`
$ grep -ri --include='*.info' 'you'
test_files/hidden_files/.fav_color.info:What about you
$ # exclude a directory
$ grep -ri --exclude-dir='hidden_files' 'you'
poem.txt:And so are you.
$ # If you are using git(or similar), this would be handy
$ # grep --exclude-dir='.git' -rl 'search pattern'
Recursive search with bash options
- Using
bash
optionsglobstar
(for recursion)- Other options like
extglob
anddotglob
come in handy too - See glob for more info on these options
- Other options like
- The
-d skip
option tells grep to skip directories instead of trying to treat them as text file to be searched
$ grep -ril 'yellow'
test_files/hidden_files/.fav_color.info
test_files/vibgyor.txt
$ # recursive search
$ shopt -s globstar
$ grep -d skip -il 'yellow' **/*
test_files/vibgyor.txt
$ # include hidden files as well
$ shopt -s dotglob
$ grep -d skip -il 'yellow' **/*
test_files/hidden_files/.fav_color.info
test_files/vibgyor.txt
$ # use extended glob patterns
$ shopt -s extglob
$ # other than poem.txt
$ grep -d skip -il 'red' **/!(poem.txt)
test_files/colors.txt
test_files/vibgyor.txt
$ # other than poem.txt or colors.txt
$ grep -d skip -il 'red' **/!(poem|colors).txt
test_files/vibgyor.txt
Recursive search using find command
find
is obviously more versatile- See also this guide for more examples/tutorials on using
find
$ # all files, including hidden ones
$ find -type f -exec grep -il 'red' {} +
./poem.txt
./test_files/colors.txt
./test_files/vibgyor.txt
$ # all files ending with .txt
$ find -type f -name '*.txt' -exec grep -in 'you' {} +
./poem.txt:4:And so are you.
$ # all files not ending with .txt
$ find -type f -not -name '*.txt' -exec grep -in 'you' {} +
./test_files/hidden_files/.fav_color.info:2:What about you
Passing file names to other commands
- To pass files filtered to another command, see if the receiving command can differentiate file names by ASCII NUL character
- If so, use the
-Z
so thatgrep
output is terminated with NUL character and commands likexargs
have option-0
to understand it - This helps when file names can have characters like space, newline, etc
- Typical use case: Search and replace something in all files matching some pattern, for ex:
grep -rlZ 'PAT1' | xargs -0 sed -i 's/PAT2/REPLACE/g'
$ # prompt at end of line not shown for simplicity
$ # ^@ here indicates the NUL character
$ grep -rlZ 'you' | cat -A
poem.txt^@test_files/hidden_files/.fav_color.info^@
$ # print first column from all lines of all files
$ grep -rlZ 'you' | xargs -0 awk '{print $1}'
Roses
Violets
Sugar
And
I
What
- simple example to show filenames with space causing issue if
-Z
is not used
$ # 'abc xyz.txt' is a file with space in its name
$ grep -ri 'are'
abc xyz.txt:hi how are you
poem.txt:Roses are red,
poem.txt:Violets are blue,
poem.txt:And so are you.
saved_output.txt:Violets are blue,
$ # problem when -Z is not used
$ grep -ril 'are' | xargs grep 'you'
grep: abc: No such file or directory
grep: xyz.txt: No such file or directory
poem.txt:And so are you.
$ # no issues if -Z is used
$ grep -rilZ 'are' | xargs -0 grep 'you'
abc xyz.txt:hi how are you
poem.txt:And so are you.
- Example for matching more than one search string anywhere in file
$ # files containing 'you'
$ grep -rl 'you'
poem.txt
test_files/hidden_files/.fav_color.info
$ # files containing 'you' as well as 'are'
$ grep -rlZ 'you' | xargs -0 grep -l 'are'
poem.txt
$ # files containing 'you' but NOT 'are'
$ grep -rlZ 'you' | xargs -0 grep -L 'are'
test_files/hidden_files/.fav_color.info
- another example
$ grep -rilZ 'red' | xargs -0 grep -il 'blue'
poem.txt
test_files/colors.txt
test_files/vibgyor.txt
$ # note the use of `-Z` for middle command
$ grep -rilZ 'red' | xargs -0 grep -ilZ 'blue' | xargs -0 grep -il 'violet'
poem.txt
test_files/vibgyor.txt
Search strings from file
- using file input to specify search terms
-F
option will force matching strings literally(no regular expressions)- See also stackoverflow - Fastest way to find lines of a text file from another larger text file - read all answers
$ grep -if test_files/colors.txt poem.txt
Roses are red,
Violets are blue,
$ # get common lines between two files
$ grep -Fxf test_files/colors.txt test_files/vibgyor.txt
Blue
Green
Red
$ # get lines present in vibgyor.txt but not in colors.txt
$ grep -Fvxf test_files/colors.txt test_files/vibgyor.txt
Violet
Indigo
Yellow
Orange
Options for scripting purposes
- In scripts, often it is needed just to know if a pattern matches or not
- The
-q
option doesn't print anything on stdout and exit status is0
if match is found- Check out this practical script using the
-q
option
- Check out this practical script using the
$ grep -qi 'rose' poem.txt
$ echo $?
0
$ grep -qi 'lily' poem.txt
$ echo $?
1
$ if grep -qi 'rose' poem.txt; then echo 'match found!'; else echo 'match not found'; fi
match found!
$ if grep -qi 'lily' poem.txt; then echo 'match found!'; else echo 'match not found'; fi
match not found
- The
-s
option will suppress error messages as well
$ grep 'rose' file_xyz.txt
grep: file_xyz.txt: No such file or directory
$ grep -s 'rose' file_xyz.txt
$ echo $?
2
$ touch foo.txt
$ chmod -r foo.txt
$ grep 'rose' foo.txt
grep: foo.txt: Permission denied
$ grep -s 'rose' foo.txt
$ echo $?
2
Regular Expressions - BRE/ERE
Before diving into regular expressions, few examples to show default grep
behavior vs -F
$ # oops, why did it not match?
$ echo 'int a[5]' | grep 'a[5]'
$ # where did that error come from??
$ echo 'int a[5]' | grep 'a['
grep: Invalid regular expression
$ # what is going on???
$ echo 'int a[5]' | grep 'a[5'
grep: Unmatched [ or [^
$ # phew, -F is a life saver
$ echo 'int a[5]' | grep -F 'a[5]'
int a[5]
$ # [ and ] are meta characters, details in following sections
$ echo 'int a[5]' | grep 'a\[5]'
int a[5]
- By default,
grep
treats the search pattern as BRE (Basic Regular Expression)-G
option can be used to specify explicitly that BRE is used
- The
-E
option allows to use ERE (Extended Regular Expression) which in GNU grep's case only differs in how meta characters are used, no difference in regular expression functionalities - If
-F
option is used, the search string is treated literally - If available, one can also use
-P
which indicates PCRE (Perl Compatible Regular Expression)
Line Anchors
- Often, search must match from beginning of line or towards end of line
- For example, an integer variable declaration in
C
will start with optional white-space, the keywordint
, white-space and then variable(s)- This way one can avoid matching declarations inside single line comments as well.
- Similarly, one might want to match a variable at end of statement
- The meta characters for line anchoring are
^
for beginning of line and$
for end of line
$ echo 'Fantasy is my favorite genre' > fav.txt
$ echo 'My favorite genre is Fantasy' >> fav.txt
$ cat fav.txt
Fantasy is my favorite genre
My favorite genre is Fantasy
$ # start of line
$ grep '^Fantasy' fav.txt
Fantasy is my favorite genre
$ # end of line
$ grep 'Fantasy$' fav.txt
My favorite genre is Fantasy
$ # without anchors
$ grep 'Fantasy' fav.txt
Fantasy is my favorite genre
My favorite genre is Fantasy
- As the meta characters have special meaning (assuming
-F
option is not used), they have to be escaped using\
to match literally - The
\
itself is meta character, so to match it literally, use\\
- The line anchors
^
and$
have special meaning only when they are present at start/end of regular expression
$ echo '^foo bar$' | grep '^foo'
$ echo '^foo bar$' | grep '\^foo'
^foo bar$
$ echo '^foo bar$' | grep '^^foo'
^foo bar$
$ echo '^foo bar$' | grep 'bar$'
$ echo '^foo bar$' | grep 'bar\$'
^foo bar$
$ echo '^foo bar$' | grep 'bar$$'
^foo bar$
$ echo 'foo $ bar' | grep ' $ '
foo $ bar
$ printf 'foo\cbar' | grep -o '\c'
c
$ printf 'foo\cbar' | grep -o '\\c'
\c
Word Anchors
- The
-w
option works well to match whole words. But what about matching only start or end of words? - Anchors
\<
and\>
will match start/end positions of a word \b
can also be used instead of\<
and\>
which matches both edges of a word
$ printf 'spar\npar\npart\napparent\n'
spar
par
part
apparent
$ # words ending with par
$ printf 'spar\npar\npart\napparent\n' | grep 'par\>'
spar
par
$ # words starting with par
$ printf 'spar\npar\npart\napparent\n' | grep '\<par'
par
part
-w
option is same as specifying both start and end word boundaries
$ printf 'spar\npar\npart\napparent\n' | grep '\<par\>'
par
$ printf 'spar\npar\npart\napparent\n' | grep '\bpar\b'
par
$ printf 'spar\npar\npart\napparent\n' | grep -w 'par'
par
\b
has an opposite\B
which is quite useful too
$ # string not surrounded by word boundary either side
$ printf 'spar\npar\npart\napparent\n' | grep '\Bpar\B'
apparent
$ # word containing par but not as start of word
$ printf 'spar\npar\npart\napparent\n' | grep '\Bpar'
spar
apparent
$ # word containing par but not as end of word
$ printf 'spar\npar\npart\napparent\n' | grep 'par\B'
part
apparent
- the word boundary escape sequences differ slightly from
-w
option
$ # this fails because there is no word boundary between space and +
$ echo '2 +3 = 5' | grep '\b+3\b'
$ # this works as -w only ensures that there are no surrounding word characters
$ echo '2 +3 = 5' | grep -w '+3'
2 +3 = 5
$ # doesn't work as , isn't at start of word boundary
$ echo 'hi, 2 one' | grep '\<, 2\>'
$ # won't match as there are word characters before ,
$ echo 'hi, 2 one' | grep -w ', 2'
$ # works as \b matches both edges and , is at end of word after i
$ echo 'hi, 2 one' | grep '\b, 2\b'
hi, 2 one
Alternation
- The
|
meta character is similar to using multiple-e
option - Each side of
|
is complete regular expression with their own start/end anchors - How each part of alternation is handled and order of evaluation/output is beyond the scope of this tutorial
- See this for more info on this topic.
|
is one of meta characters that requires different syntax between BRE/ERE
$ grep 'blue\|you' poem.txt
Violets are blue,
And so are you.
$ grep -E 'blue|you' poem.txt
Violets are blue,
And so are you.
$ # extract case-insensitive e or f from anywhere in line
$ echo 'Fantasy is my favorite genre' | grep -Eio 'e|f'
F
f
e
e
e
$ # extract case-insensitive e at end of line, f at start of line
$ echo 'Fantasy is my favorite genre' | grep -Eio 'e$|^f'
F
e
- A cool usecase of alternation is using
^
or$
anchors to highlight searched term as well as display rest of unmatched lines- the line anchors will match every input line, even empty lines as they are position markers
$ grep --color=auto -E '^|are' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
$ grep --color=auto -E 'is|$' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
Screenshot for above example:
See also
The dot meta character
The .
meta character matches is used to match any character
$ # any two characters surrounded by word boundaries
$ echo 'I have 12, he has 132!' | grep -ow '..'
12
he
$ # match three characters from start of line
$ # \t (TAB) is single character here
$ printf 'a\tbcd\n' | grep -o '^...'
a b
$ # all three character word starting with c
$ echo 'car bat cod cope scat dot abacus' | grep -ow 'c..'
car
cod
$ echo '1 & 2' | grep -o '.'
1
&
2
Greedy Quantifiers
Defines how many times a character (simplified for now) should be matched
?
will try to match 0 or 1 time- For BRE, use
\?
$ printf 'late\npale\nfactor\nrare\nact\n'
late
pale
factor
rare
act
$ # match a followed by t, with or without c in between
$ printf 'late\npale\nfactor\nrare\nact\n' | grep -E 'ac?t'
late
factor
act
$ # same as using this alternation
$ printf 'late\npale\nfactor\nrare\nact\n' | grep -E 'at|act'
late
factor
act
*
will try to match 0 or more times- There is no upper limit and
*
will try to match as many times as possible- if matching maximum times results in overall regex failing, then next best count is chosen until overall regex passes
- if there are multiple quantifiers, left-most quantifier gets precedence
$ echo 'abbbc' | grep -o 'b*'
bbb
$ # matches 0 or more b only if surrounded by a and c
$ echo 'abc ac adc abbc bbb bc' | grep -o 'ab*c'
abc
ac
abbc
$ # see how it matched everything
$ echo 'car bat cod map scat dot abacus' | grep -o '.*'
car bat cod map scat dot abacus
$ # but here it stops at m
$ echo 'car bat cod map scat dot abacus' | grep -o '.*m'
car bat cod m
$ # stopped at dot, not bat or scat - match as much as possible
$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*t'
car bat cod map scat dot
$ # matching overall expression gets preference
$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*at'
car bat cod map scat
$ # precedence is left to right in case of multiple matches
$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m'
bat cod m
$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m*'
bat cod map scat dot abacus
+
will try to match 1 or more times- Another meta character that differs in syntax between BRE/ERE
$ echo 'abbbc' | grep -o 'b\+'
bbb
$ echo 'abbbc' | grep -oE 'b+'
bbb
$ echo 'abc ac adc abbc bbb bc' | grep -oE 'ab+c'
abc
abbc
$ echo 'abc ac adc abbc bbb bc' | grep -o 'ab*c'
abc
ac
abbc
- For more precise control on number of times to match,
{}
is useful- use
\{\}
for BRE
- use
- It can take one of four forms,
{m,n}
,{,n}
,{m,}
and{n}
$ # {m,n} - m to n, including both m and n
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{1,2}c'
abc
abbc
$ # {,n} - 0 to n times
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{,2}c'
ac
abc
abbc
$ # {m,} - at least m times
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2,}c'
abbc
abbbc
$ # {n} - exactly n times
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'
abbc
Character classes
- The meta character pairs
[]
allow to match any of the multiple characters within[]
- Meta characters like
^
,$
have different meaning inside and outside of[]
- Simple example first, matching any of the characters within
[]
$ echo 'do so in to no on' | grep -ow '[nt]o'
to
no
$ echo 'do so in to no on' | grep -ow '[sot][on]'
so
to
on
- Adding a quantifier
- Check out unix words) and sample words file
$ # words made up of letters o and n, at least 2 letters
$ grep -xE '[on]{2,}' /usr/share/dict/words
no
non
noon
on
$ # lines containing only digits
$ printf 'cat\nfoo\n123\nbaz\n42\n' | grep -xE '[0123456789]+'
123
42
- Character ranges
- Matching any alphabet, number, hexadecimal number etc becomes cumbersome if every character has to be individually specified
- So, there's a shortcut, using
-
to construct a range (has to be specified in ascending order) - See ascii codes table for reference
- Note that behavior of range will differ for other character encodings
- See Character Classes and Bracket Expressions as well as LC_COLLATE under Environment Variables sections in
info grep
for more detail
- Matching Numeric Ranges with a Regular Expression
$ printf 'cat\nfoo\n123\nbaz\n42\n' | grep -xE '[0-9]+'
123
42
$ printf 'cat\nfoo\n123\nbaz\n42\n' | grep -xiE '[a-z]+'
cat
foo
baz
$ # only valid decimal numbers
$ printf '128\n34\nfe32\nfoo1\nbar\n' | grep -xE '[0-9]+'
128
34
$ # only valid octal numbers
$ printf '128\n34\nfe32\nfoo1\nbar\n' | grep -xE '[0-7]+'
34
$ # only valid hexadecimal numbers
$ printf '128\n34\nfe32\nfoo1\nbar\n' | grep -xiE '[0-9a-f]+'
128
34
fe32
$ # numbers between 10-29
$ echo '23 54 12 92' | grep -owE '[12][0-9]'
23
12
- Negating character class
- By using
^
as first character inside[]
, we get inverted character class- As pointed out earlier, some meta characters behave differently inside and outside of
[]
- As pointed out earlier, some meta characters behave differently inside and outside of
$ # alphabetic words not starting with c
$ echo '123 core not sink code finish' | grep -owE '[^c][a-z]+'
not
sink
finish
$ # excluding numbers 2,3,4,9
$ # note that 200a 200; etc will also match, usage depends on knowing input
$ echo '2001 2004 2005 2008 2009' | grep -ow '200[^2-49]'
2001
2005
2008
$ # get characters from start of line upto(not including) known identifier
$ echo 'foo=bar; baz=123' | grep -oE '^[^=]+'
foo
$ # get characters at end of line from(not including) known identifier
$ echo 'foo=bar; baz=123' | grep -oE '[^=]+$'
123
$ # get all sequence of characters surrounded by unique identifier
$ echo 'I like "mango" and "guava"' | grep -oE '"[^"]+"'
"mango"
"guava"
- Matching meta characters inside
[]
- Most meta characters like
( ) . + { } | $
don't have special meaning inside[]
and hence do not require special treatment - Some combination like
[.
or=]
cannot be used in this order, as they have special meaning within[]
- See Character Classes and Bracket Expressions section in
info grep
for more detail
- See Character Classes and Bracket Expressions section in
$ # to match - it should be first or last character within []
$ echo 'Foo-bar 123-456 42 Co-operate' | grep -oiwE '[a-z-]+'
Foo-bar
Co-operate
$ # to match ] it should be first character within []
$ printf 'int a[5]\nfoo=bar\n' | grep '[]=]'
int a[5]
foo=bar
$ # to match [ use [ anywhere in the character list
$ # [][] will match both [ and ]
$ printf 'int a[5]\nfoo=bar\n' | grep '[[]'
int a[5]
$ # to match ^ it should be other than first in the list
$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | grep -owE '[a-z^0-9]{3,}'
a^2
b^2
2ab
- Named character classes
- Equivalent class shown is for C locale and ASCII character encoding
- See ascii codes table for reference
- See Character Classes and Bracket Expressions section in
info grep
for more detail
Character classes | Description |
---|---|
[:digit:] |
Same as [0-9] |
[:lower:] |
Same as [a-z] |
[:upper:] |
Same as [A-Z] |
[:alpha:] |
Same as [a-zA-Z] |
[:alnum:] |
Same as [0-9a-zA-Z] |
[:xdigit:] |
Same as [0-9a-fA-F] |
[:cntrl:] |
Control characters - first 32 ASCII characters and 127th (DEL) |
[:punct:] |
All the punctuation characters |
[:graph:] |
[:alnum:] and [:punct:] |
[:print:] |
[:alnum:] , [:punct:] and space |
[:blank:] |
Space and tab characters |
[:space:] |
white-space characters: tab, newline, vertical tab, form feed, carriage return and space |
$ printf '128\n34\nAB32\nFoo\nbar\n' | grep -x '[[:alnum:]]*'
128
34
AB32
Foo
bar
$ printf '128\n34\nAB32\nFoo\nbar\n' | grep -x '[[:lower:]]*'
bar
$ printf '128\n34\nAB32\nFoo\nbar\n' | grep -x '[[:lower:]0-9]*'
128
34
bar
- backslash character classes
Character classes | Description |
---|---|
\w |
Same as [0-9a-zA-Z_] or [[:alnum:]_] |
\W |
Same as [^0-9a-zA-Z_] or [^[:alnum:]_] |
\s |
Same as [[:space:]] |
\S |
Same as [^[:space:]] |
$ printf '123\n$#\ncmp_str\nFoo_bar\n' | grep -x '\w*'
123
cmp_str
Foo_bar
$ printf '123\n$#\ncmp_str\nFoo_bar\n' | grep -x '[[:alnum:]_]*'
123
cmp_str
Foo_bar
$ printf '123\n$#\ncmp_str\nFoo_bar\n' | grep -x '\W*'
$#
$ printf '123\n$#\ncmp_str\nFoo_bar\n' | grep -x '[^[:alnum:]_]*'
$#
Grouping
- Character classes allow matching against a choice of multiple character list and then quantifier added if needed
- One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters
- The meta characters
()
are used for grouping- requires
\(\)
for BRE
- requires
- Similar to
a(b+c)d = abd+acd
in maths, you geta(b|c)d = abd|acd
in regular expressions
$ # 5 letter words starting with c and ending with ty or ly
$ grep -xE 'c..(ty|ly)' /usr/share/dict/words
catty
coyly
curly
$ # 7 letter words starting with e and ending with rged or sted
$ grep -xE 'e..(rg|st)ed' /usr/share/dict/words
emerged
existed
$ # repeat a pattern 3 times
$ grep -xE '([a-d][r-z]){3}' /usr/share/dict/words
avatar
awards
cravat
$ # nesting of () is allowed
$ grep -E '([as](p|c)[r-t]){2}' /usr/share/dict/words
scraps
$ # can be used to match specific columns in well defined tables
$ echo 'foo:123:bar:baz' | grep -E '^([^:]+:){2}bar'
foo:123:bar:baz
Back reference
- The matched string within
()
can also be used to be matched again by back referencing the captured groups \1
denotes the first matched group,\2
the second one and so on- Order is leftmost
(
is\1
, next one is\2
and so on
- Order is leftmost
- Note that the matched string, not the regular expression itself is referenced
- for ex: if
([0-9][a-f])
matches3b
, then back referencing will be3b
not any other valid match of the regular expression like8f
,0a
etc - Other regular expressions like PCRE do allow referencing the regular expression itself
- for ex: if
$ # note how first three and last three letters are same
$ grep -xE '([a-d]..)\1' /usr/share/dict/words
bonbon
cancan
chichi
$ # note how adding quantifier is not same as back-referencing
$ grep -m4 -xE '([a-d]..){2}' /usr/share/dict/words
abacus
abided
abides
ablaze
$ # words with consecutive repeated letters
$ echo 'eel flee all pat ilk seen' | grep -iowE '[a-z]*(.)\1[a-z]*'
eel
flee
all
seen
$ # 17 letter words with first and last as same letter
$ grep -xE '(.)[a-z]{15}\1' /usr/share/dict/words
semiprofessionals
transcendentalist
- Spotting repeated words
$ cat story.txt
singing tin in the rain
walking for for a cause
have a nice day
day and night
$ grep -wE '(\w+)\W+\1' story.txt
walking for for a cause
- Note that there is an issue for certain usage of back-reference and quantifier
$ # no output
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
$ # works when nesting is unrolled
$ grep -m5 -xiE '[a-z]*([a-z])\1[a-z]*([a-z])\2[a-z]*' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
$ # no problem if PCRE is used instead of ERE
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
Multiline matching
- If input is small enough to meet memory requirements, the
-z
option comes in handy to match across multiple lines - Instead of newline being line separator, the ASCII NUL character is used
- So, multiline matching depends on whether or not input file itself contains the NUL character
- Usually text files won't have occasion to use the NUL character and presence of it marks it as binary file for
grep
$ # \0 for ASCII NUL character
$ printf 'red\nblue\n\0green\n' | cat -e
red$
blue$
^@green$
$ # see --binary-files=TYPE option in info grep for binary details
$ printf 'red\nblue\n\0green\n' | grep -a 'red'
red
$ # with -z, \0 marks the different 'lines'
$ printf 'red\nblue\n\0green\n' | grep -z 'red'
red
blue
$ # if no \0 in input, entire input read as single string
$ printf 'red\nblue\ngreen\n' | grep -z 'red'
red
blue
green
\n
is not defined in BRE/ERE- see unix.stackexchange - How to specify characters using hexadecimal codes for a workaround
- if some characteristics of input is known,
[[:space:]]
can be used as workaround, which matches all white-space characters
$ grep -oz 'Roses.*blue,[[:space:]]' poem.txt
Roses are red,
Violets are blue,
Perl Compatible Regular Expressions
$ # see also: https://github.com/learnbyexample/command_help
$ man grep | sed -n '/^\s*-P/,/^$/p'
-P, --perl-regexp
Interpret the pattern as a Perl-compatible regular expression
(PCRE). This is highly experimental and grep -P may warn of
unimplemented features.
- The man page informs that
-P
is highly experimental. So far, haven't faced any issues. But do keep this in mind.- newer versions of
GNU grep
has fixes for some-P
bugs, see release notes for an overview of changes between versions
- newer versions of
- Only a few highlights is presented here
- For more info
man pcrepattern
or read it online- perldoc - re - Perl regular expression syntax, also links to other related tutorials
- stackoverflow - What does this regex mean?
Backslash sequences
Some of the backslash constructs available in PCRE over already seen ones in ERE
\d
for[0-9]
\s
for[ \t\r\n\f\v]
\h
for[ \t]
\n
for newline character\D
,\S
,\H
,\N
etc for their opposites
$ # example for [0-9] in ERE and \d in PCRE
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oE '[0-9]+'
5
3
83
120
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '\d+'
5
3
83
120
$ # (?s) allows newlines to be also matches when using . meta character
$ grep -ozP '(?s)Roses.*blue,\n' poem.txt
Roses are red,
Violets are blue,
- See INTERNAL OPTION SETTING in
man pcrepattern
for more info on(?s)
,(?m)
etc - Specifying Modes Inside The Regular Expression also has some detail on such options
Non-greedy matching
- Both BRE/ERE support only greedy matching quantifiers
- match as much as possible
- PCRE supports non-greedy version by adding
?
after quantifiers- match as minimal as possible
- See this Python notebook for an interesting project on palindrome sentences
$ echo 'foo and bar and baz went shopping bytes' | grep -oi '\w.*and'
foo and bar and
$ echo 'foo and bar and baz went shopping bytes' | grep -oiP '\w.*?and'
foo and
bar and
$ # recall that matching overall expression gets preference
$ echo 'foo and bar and baz went shopping bytes' | grep -oi '\w.*and baz'
foo and bar and baz
$ echo 'foo and bar and baz went shopping bytes' | grep -oiP '\w.*?and baz'
foo and bar and baz
$ # minimal matching with single character has simple workaround
$ echo 'A man, a plan, a canal, Panama' | grep -oi 'a.*,'
A man, a plan, a canal,
$ echo 'A man, a plan, a canal, Panama' | grep -oi 'a[^,]*,'
A man,
a plan,
a canal,
Lookarounds
- Ability to add conditions to match before/after required pattern
- There are four types
- positive lookahead
(?=
- negative lookahead
(?!
- positive lookbehind
(?<=
- negative lookbehind
(?<!
- positive lookahead
- One way to remember is that behind uses
<
and negative uses!
instead of=
- When used with
-o
option, lookarounds portion won't be part of output
Fixed and variable length lookbehind
$ # extract digits preceded by single lowercase letter and =
$ # this is fixed length lookbehind because length is known
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '(?<=\b[a-z]=)\d+'
83
120
$ # error because {2,} induces variable length matching
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '(?<=\b[a-z]{2,}=)\d+'
grep: lookbehind assertion is not fixed length
$ # use \K for such cases
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '\b[a-z]{2,}=\K\d+'
5
3
- Examples for lookarounds
$ # extract digits that follow =
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '=\K\d+'
5
3
83
120
$ # digits that follow = and has , after
$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '=\K\d+(?=,)'
5
83
$ # extract words, but not those at start of line
$ echo 'car bat cod map' | grep -owP '(?<!^)\w+'
bat
cod
map
$ # extract words, but not those at start of line or end of line
$ echo 'car bat cod map' | grep -owP '(?<!^)\w+(?!$)'
bat
cod
$ # matching multiple search patterns in any order
$ grep -P '(?=.*are)(?=.*s).*d' poem.txt
Roses are red,
And so are you.
Ignoring specific matches
- A useful construct is
(*SKIP)(*F)
which allows to discard matches not needed - Simple way to use is that regular expression which should be discarded is written first,
(*SKIP)(*F)
is appended and then whichever is required by added after|
- See Excluding Unwanted Matches for more info
$ # all words except bat and map
$ echo 'car bat cod map' | grep -oP '(bat|map)(*SKIP)(*F)|\w+'
car
cod
$ # all words except those surrounded by double quotes
$ echo 'I like "mango" and "guava"' | grep -oP '"[^"]+"(*SKIP)(*F)|\w+'
I
like
and
Re-using regular expression pattern
\1
,\2
etc only matches exact string(?1)
,(?2)
etc re-uses the regular expression itself
$ # (?1) refers to first group \d{4}-\d{2}-\d{2}
$ echo '2008-03-24 and 2012-08-12 foo' | grep -oP '(\d{4}-\d{2}-\d{2})\D+(?1)'
2008-03-24 and 2012-08-12
Gotchas and Tips
- Always quote the search string (unless you know what you are doing :P)
$ # spaces are special
$ grep so are poem.txt
grep: are: No such file or directory
poem.txt:And so are you.
$ grep 'so are' poem.txt
And so are you.
$ # use of # indicates start of comment
$ printf 'foo\na#2\nb#3\n' | grep #2
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
$ printf 'foo\na#2\nb#3\n' | grep '#2'
a#2
- Another common problem is unquoted search string will be open to shell's own globbing rules
$ # sample output on bash shell, might vary for different shells
$ echo '*.txt' | grep -F *.txt
$ echo '*.txt' | grep -F '*.txt'
*.txt
- Use double quotes for variable expansion, command substitution, etc (Note: could vary based on shell used)
- See mywiki.wooledge Quotes for detailed discussion of quoting in
bash
shell
$ # sample output on bash shell, might vary for different shells
$ color='blue'
$ grep "$color" poem.txt
Violets are blue,
- Pattern starting with
-
$ # this issue is not specific to grep alone
$ # the command assumes -2 is an option and hence the error
$ echo '5*3-2=13' | grep '-2'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
$ # workaround by using \-
$ echo '5*3-2=13' | grep '\-2'
5*3-2=13
$ # or use -- to indicate no further options to process
$ echo '5*3-2=13' | grep -- '-2'
5*3-2=13
$ # same issue with printf
$ printf '-1+2=1\n'
bash: printf: -1: invalid option
printf: usage: printf [-v var] format [arguments]
$ printf -- '-1+2=1\n'
-1+2=1
- Tip: Options can be specified at end of command as well, useful if option was forgotten and have to quickly add it to previous command from history
$ grep 'are' poem.txt
Roses are red,
Violets are blue,
And so are you.
$ # use previous command from history, for ex up arrow key in bash
$ # then simply add the option at end
$ grep 'are' poem.txt -n
1:Roses are red,
2:Violets are blue,
4:And so are you.
- Speed boost if input file is ASCII
- See also unix.stackexchange - Counting the number of lines having a number > 100 - where
grep
is blazing fast compared to other solutions
$ time grep -xE '([a-d][r-z]){3}' /usr/share/dict/words
avatar
awards
cravat
real 0m0.145s
$ time LC_ALL=C grep -xE '([a-d][r-z]){3}' /usr/share/dict/words
avatar
awards
cravat
real 0m0.011s
- Speed boost by using PCRE for back-references
- might be faster when using quantifiers as well
$ time LC_ALL=C grep -xE '([a-z]..)\1' /usr/share/dict/words
bonbon
cancan
chichi
murmur
muumuu
pawpaw
pompom
tartar
testes
real 0m0.174s
$ time grep -xP '([a-z]..)\1' /usr/share/dict/words
bonbon
cancan
chichi
murmur
muumuu
pawpaw
pompom
tartar
testes
real 0m0.008s
Regular Expressions Reference (ERE)
Anchors
^
match from start of line$
match end of line\<
match beginning of word\>
match end of word\b
match edge of word\B
match other than edge of word
Character Quantifiers
.
match any single character*
match preceding character/group 0 or more times+
match preceding character/group 1 or more times?
match preceding character/group 0 or 1 times{m,n}
match preceding character/group m to n times, including m and n{m,}
match preceding character/group m or more times{,n}
match preceding character/group 0 to n times{n}
match preceding character/group exactly n times
Character classes and backslash sequences
[aeiou]
match any of these characters[^aeiou]
do not match any of these characters[a-z]
match any lowercase alphabet[0-9]
match any digit character\w
match alphabets, digits and underscore character, short cut for[a-zA-Z0-9_]
\W
opposite of\w
, short cut for[^a-zA-Z0-9_]
\s
match white-space characters: tab, newline, vertical tab, form feed, carriage return, and space\S
match other than white-space characters
Pattern groups
|
matches either of the given patterns()
patterns within()
are grouped and treated as one pattern, useful in conjunction with|
\1
backreference to first grouped pattern within()
\2
backreference to second grouped pattern within()
and so on
Basic vs Extended Regular Expressions
By default, the pattern passed to grep
is treated as Basic Regular Expressions(BRE), which can be overridden using options like -E
for ERE and -P
for Perl Compatible Regular Expression(PCRE). Paraphrasing from info grep
In Basic Regular Expressions the meta-characters
? + { | ( )
lose their special meaning, instead use the backslashed versions\? \+ \{ \| \( \)
Further Reading
man grep
andinfo grep
- At least go through all options ;)
- Usage section in
info grep
has good examples as well
- This chapter has also been converted to a book with additional examples, exercises and covers popular alternative
ripgrep
- A bit of history
- Q&A on stackoverflow/stackexchange are good source of learning material, good for practice exercises as well
- Learn Regular Expressions (has information on flavors other than BRE/ERE/PCRE too)
- Regular Expressions Tutorial
- rexegg - tutorials, tricks and more
- regexcrossword
- stackoverflow - What does this regex mean?
- online regex tester and debugger - by default
pcre
flavor
- Alternatives
- unix.stackexchange - When to use grep, sed, awk, perl, etc