Appendix A: Regular Expressions
It’s All Greek to Me
Regular Expressions
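• A pattern that matches a set of one or more strings
• May be a simple string, or may contain special characters (wildcards) or modifiers
• Used by programs such as vim, grep, awk, and sed
• Not the same as shell filename expansion (globbing): in the shell, * matches any run of characters, while in a regular expression * repeats the previous character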
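To see the globbing difference concretely, here is a minimal sketch in Python, using the standard `fnmatch` module for shell-style matching and `re` for regular expressions. `re.fullmatch` (whole-string matching) is chosen only to make the contrast clear; the pattern and sample strings are illustrative.

```python
import fnmatch
import re

# The same pattern text, "a*", interpreted two different ways.
pattern = "a*"

# Shell-style (glob) matching: * means "any run of characters",
# so "a*" means "starts with a".
glob_apple = fnmatch.fnmatchcase("apple", pattern)

# Regular expression: * means "zero or more of the previous character",
# so "a*" against a whole string only accepts "", "a", "aa", ...
regex_apple = re.fullmatch(pattern, "apple") is not None
regex_aaa = re.fullmatch(pattern, "aaa") is not None

print(glob_apple, regex_apple, regex_aaa)  # True False True
```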
Components
• Characters
– Literals
– Special Characters
• Delimiters
– Mark the beginning and end of a regular expression
– Usually /
– ' with grep (but not really: the single quotes are shell quoting, not part of the expression)
Simple Strings
• Contain no special characters
• Matches only that exact string, wherever it appears in a line
• Ex: /foo/ matches:
– foo
– tomfoolery
– bar.foo.com
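A minimal sketch of the /foo/ example using Python's `re.search`, which, like grep, reports a match if the pattern occurs anywhere in the line. The extra sample lines are made up to show what does not match (matching is case-sensitive by default).

```python
import re

# /foo/ has no special characters, so it matches the literal text "foo"
# wherever it appears in a line.
pattern = re.compile(r"foo")

lines = ["foo", "tomfoolery", "bar.foo.com", "fo o", "FOO"]
matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['foo', 'tomfoolery', 'bar.foo.com']
```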
Special Characters
• Can match multiple strings
• Can stand for zero or more characters
• Always match the longest possible string (we’ll see examples in a bit)
Periods
• Matches any single character
• Ex: /.ing/
– I was talking
– bling
– he called ingred
• Ex: /spar.ing/
– sparring
– sparking
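The period examples can be checked with a short Python sketch; Python's `.` behaves the same way for these inputs. The lines "ing" and "sparing" are added here only as illustrative non-matches.

```python
import re

# . matches exactly one character of any kind, including a space.
lines = ["I was talking", "bling", "he called ingred", "ing"]
dot_ing = [line for line in lines if re.search(r".ing", line)]
print(dot_ing)  # ['I was talking', 'bling', 'he called ingred']

# spar.ing needs exactly one character between "spar" and "ing".
spar = [w for w in ["sparring", "sparking", "sparing"] if re.search(r"spar.ing", w)]
print(spar)  # ['sparring', 'sparking']
```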
Brackets
• Define a character class
• Match any one character in the class
• If a caret (^) is the first character in the class, the class matches any character not in the class
• Most other special characters lose their special meaning inside a class
Brackets (cont.)
• Ex. /[jJ]ustin/ matches justin and Justin
• Ex. /[A-Za-z]/ matches any letter
• Ex. /[0-9]/ matches any digit
• Ex. /[^a-z]/ matches any character except a lowercase letter
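A minimal Python sketch of the bracket examples, using `re.findall` to list every single character (or word) each class picks out. The input strings are illustrative.

```python
import re

# [jJ] matches one character: either j or J.
names = re.findall(r"[jJ]ustin", "justin Justin JUSTIN")
print(names)  # ['justin', 'Justin']

# Ranges: [A-Za-z] is any letter, [0-9] any digit.
letters = re.findall(r"[A-Za-z]", "a1B2")
digits = re.findall(r"[0-9]", "Room 101b")
print(letters, digits)  # ['a', 'B'] ['1', '0', '1']

# A leading ^ negates the class: any character except a lowercase letter.
not_lower = re.findall(r"[^a-z]", "abC d7")
print(not_lower)  # ['C', ' ', '7']

# Inside a class, . and * lose their special meaning.
literal = re.findall(r"[.*]", "a.b*c")
print(literal)  # ['.', '*']
```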
Asterisks
• Zero or more occurrences of the previous character
• So /.*/ matches any number of any characters, including none
• Ex. /t.*ing/
– thing
– this is really annoying
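This Python sketch shows /t.*ing/ on the slide's examples plus two made-up inputs, and confirms that "zero occurrences" really is allowed. Python's `*` is greedy, which gives the same result as the longest-match rule here.

```python
import re

# .* means "any character, zero or more times", so t.*ing needs a t,
# then anything (possibly nothing), then ing.
results = {}
for text in ["thing", "this is really annoying", "ting", "tin"]:
    m = re.search(r"t.*ing", text)
    results[text] = m.group() if m else None
print(results)

# * on a single character: zero b's is allowed, so ab*c matches "ac".
zero_b = re.fullmatch(r"ab*c", "ac") is not None
print(zero_b)  # True
```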
Plus Signs and Question Marks
• Very similar to asterisks: they apply to the previous character
• + matches one or more occurrences (not zero)
• ? matches zero or one occurrence (no more)
• Ex. /2+4?/ matches one or more 2s followed by zero or one 4
– As whole strings, 22224 and 2 match
– 4 and 244 do not (though a search inside 244 still finds 24)
• Part of extended regular expressions (ERE)
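Python's `re` treats + and ? like egrep does, so it can illustrate /2+4?/. The sketch contrasts whole-string matching (`re.fullmatch`), which reproduces the match/no-match lists, with searching inside a line (`re.search`).

```python
import re

pattern = r"2+4?"  # one or more 2s, then an optional 4

# re.fullmatch tests the whole string, which reproduces the lists above.
whole = {s: re.fullmatch(pattern, s) is not None for s in ["22224", "2", "4", "244"]}
print(whole)  # {'22224': True, '2': True, '4': False, '244': False}

# re.search looks for a match anywhere, so "244" still yields "24".
searched = re.search(pattern, "244").group()
print(searched)  # 24
```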
Carets & Dollar Signs
• If a regular expression starts with ^, the match must be at the beginning of a line
• If a regular expression ends with $, the match must be at the end of a line
• ^ and $ are referred to as anchors
• Ex. /^T.*T$/ matches any line that starts and ends with T (a line containing only a single T does not match)
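A minimal sketch of the anchored example in Python. By default Python's ^ and $ anchor to the whole string, so the `re.MULTILINE` flag is used to make them work per line, as in grep and vim. The sample text is made up.

```python
import re

text = "TOUCH IT\nTT\nT\nsTarT\nTake it easy"
# re.MULTILINE makes ^ and $ match at each line boundary.
anchored = re.findall(r"^T.*T$", text, flags=re.MULTILINE)
print(anchored)  # ['TOUCH IT', 'TT']
```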
Quoting Special Characters
• To use a special character literally, put a backslash in front of it
• Ex. /and\/or/ matches and/or (the / is quoted because it is the delimiter)
• Ex. /\\/ matches \
• Ex. /\**/ matches any number of asterisks (including none)
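The quoting rules carry over to Python, with one difference: Python patterns are ordinary strings rather than /.../, so the slash needs no backslash. This sketch uses raw strings (r"...") so Python itself does not consume the backslashes; `re.escape` is a standard helper that inserts them automatically.

```python
import re

# No / delimiter in Python, so and/or needs no quoting here.
and_or = re.search(r"and/or", "cats and/or dogs").group()
print(and_or)  # and/or

# \\ in the pattern is one literal backslash.
one_backslash = re.fullmatch(r"\\", "\\") is not None
print(one_backslash)  # True

# \* is a literal asterisk; the following * repeats it.
stars = re.match(r"\**", "***abc").group()
print(stars)  # ***

# re.escape adds the backslashes for you.
escaped = re.escape("a.b*c")
print(escaped)  # a\.b\*c
```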
Longest Match
• Regular expressions match the longest string possible in a line
• Ex. I (Justin) like coffee (lots).
• /(.*)/
– Matches (Justin) like coffee (lots)
• /([^)]*)/
– Matches only (Justin), because [^)]* cannot match past a )
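Here is the coffee example in Python. One assumption to note: in vim's default syntax an unescaped ( is a literal character, but in Python (as in egrep) it starts a group, so the sketch writes \( and \) to match literal parentheses.

```python
import re

line = "I (Justin) like coffee (lots)."

# .* runs as far as it can, so the match ends at the LAST ")".
greedy = re.search(r"\(.*\)", line).group()
# [^)]* cannot cross a ")", so the match ends at the FIRST one.
bounded = re.search(r"\([^)]*\)", line).group()

print(greedy)   # (Justin) like coffee (lots)
print(bounded)  # (Justin)
```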
Boolean OR
• You can match either of two distinct strings using OR (the pipe, |)
• Ex. /CAT|DOG/
– Matches CAT or DOG anywhere in a line
• Simpler expressions can be written using just a character class
– e.g., /a[bc]/ instead of /ab|ac/
• Also part of extended regular expressions
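A short Python sketch of alternation. It shows that /CAT|DOG/ also finds CAT inside a longer word, and that a character class gives the same result as an OR of single-character alternatives. The inputs are illustrative.

```python
import re

# | tries each alternative; a match can sit inside a longer word.
found = re.findall(r"CAT|DOG", "CAT DOG COW CATALOG")
print(found)  # ['CAT', 'DOG', 'CAT']

# For single-character alternatives, a class does the same job.
words = ["ab", "ac", "ad"]
alt = [w for w in words if re.fullmatch(r"ab|ac", w)]
cls = [w for w in words if re.fullmatch(r"a[bc]", w)]
print(alt, cls)  # ['ab', 'ac'] ['ab', 'ac']
```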
Grouping
• You can apply special characters to groups of characters in parentheses
• Also called bracketing
• A group matches the same strings as the ungrouped expression
• But modifiers such as * then apply to the whole group
• Ex. /\(duck\)*\|\(goose\)/ in vim; '(duck)*|(goose)' with egrep
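This sketch uses egrep-style syntax in Python, where plain ( ) group and | means OR. It shows what grouping changes: the * applies to the whole word rather than to its last letter. `re.fullmatch` is used so each test covers the entire string.

```python
import re

# Without a group, * repeats only the k: duck* matches duc, duck, duckk, ...
no_group = re.fullmatch(r"duck*", "duckduck") is not None
# With a group, * repeats the whole word.
with_group = re.fullmatch(r"(duck)*", "duckduck") is not None
print(no_group, with_group)  # False True

# Group plus OR: any number of "duck"s, or a single "goose".
pattern = r"(duck)*|(goose)"
checks = {s: re.fullmatch(pattern, s) is not None
          for s in ["duckduckduck", "goose", "", "duckgoose"]}
print(checks)  # {'duckduckduck': True, 'goose': True, '': True, 'duckgoose': False}
```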
Using with vim
• Use regular expressions for searching and
substituting
• Searching:
– /string searches forward; ?string searches backward
• Substituting:
– :[g][address]s/string/replace[/g]
– g : global; substitute all lines
– string is an R.E.; replace can use & and quoted digits (see below)
– /g : global; replace all occurrences in the line
Using with vim (cont.)
• [address]
– n : line number
– n[+/-]x : line n plus or minus x lines
– n1,n2 : from line n1 to n2
– . : alias for current line
– $ : alias for last line in work buffer
– % : alias for entire work buffer
vim examples
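• /^if( finds a line that begins with if(
• /end\.$ finds a line that ends with end.
• :%s/[Jj]ustin/Mr\. Awesome/g replaces every justin or Justin in the buffer with Mr. Awesome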
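As a rough emulation outside the editor, the sketch below runs the same three patterns in Python over a list of made-up lines standing in for the vim buffer. Because ( is special in Python but literal in vim's default syntax, the first pattern escapes it.

```python
import re

buffer = ["if(x) {", "the end.", "weekend", "Justin and justin"]

# /^if(  : lines beginning with "if(" (escaped, since ( groups in Python)
starts = [line for line in buffer if re.search(r"^if\(", line)]
print(starts)  # ['if(x) {']

# /end\.$ : lines ending with "end."
ends = [line for line in buffer if re.search(r"end\.$", line)]
print(ends)  # ['the end.']

# :%s/[Jj]ustin/Mr\. Awesome/g : every line (%), every occurrence (g);
# re.sub replaces all occurrences by default.
substituted = [re.sub(r"[Jj]ustin", "Mr. Awesome", line) for line in buffer]
print(substituted[3])  # Mr. Awesome and Mr. Awesome
```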
Using with vim (cont.)
• Ampersand (&)
– Alias for the matched string when substituting
– Ex: :s/[A-Z][0-9]/_&_/ turns B7 into _B7_
• Quoted digit (\n)
– Refers to the nth quoted part, \( ... \), of the R.E.
– Can be used to rearrange columns
– Ex: :s/\([^,]*\), \(.*\)/\2 \1/ turns Smith, John into John Smith
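Python's `re.sub` supports the same two ideas with slightly different spelling. The whole match is `\g<0>` rather than `&`, and groups are plain ( ) rather than \( \). This sketch passes `count=1` to mimic :s without the g flag; the sample strings are illustrative.

```python
import re

# vim's & (whole match) is written \g<0> in a Python replacement string.
# count=1 mimics :s without the g flag (first match on the line only).
marked = re.sub(r"[A-Z][0-9]", r"_\g<0>_", "seat B7, row C12", count=1)
print(marked)  # seat _B7_, row C12

# \1 and \2 refer to the first and second groups: swap "Last, First".
swapped = re.sub(r"([^,]*), (.*)", r"\2 \1", "Smith, John")
print(swapped)  # John Smith
```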
Using with grep
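• To take advantage of extended regular expressions, use egrep or grep -E instead of plain grep (newer GNU grep releases warn that egrep is obsolescent, so grep -E is the safer choice)
• Put the expression in single quotes so the shell passes it to grep unchanged
• Ex:
– egrep '^T.*T$' myfile
– Lists all lines in myfile that begin and end with T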
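To show what grep does line by line, here is a minimal sketch of a grep-like filter in Python. `simple_grep` is a hypothetical helper written for this example, and a list of sample lines stands in for the contents of myfile. Python's `re` syntax is close to, but not identical to, grep -E.

```python
import re


def simple_grep(pattern, lines):
    """Yield each line containing a match for pattern (roughly like grep -E)."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


# Hypothetical stand-in for the contents of myfile.
myfile_lines = ["TNT", "Test", "TIGHT", "tact", "T"]
result = list(simple_grep(r"^T.*T$", myfile_lines))
print(result)  # ['TNT', 'TIGHT']
```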