CHAPTER 24
Text-Processing One-Liners
Even though this book is about using the shell’s command language, I use a fair number of calls to other utilities for text processing. Sed, awk, and grep are the primary UNIX text-processing utilities, although I have used others. This chapter gives you a collection of short and useful one-liners that illustrate quite a few methods for gathering specific information from various textual sources.
Very often when writing a script, you need to know where the source data can be found before you start pruning it for further processing. For instance, on a running Linux system you can find the load average in the first line of output from the top utility, in the output of the uptime and w commands, and in the /proc/loadavg file. There are almost always multiple ways to gather and process information, and the tools introduced in this chapter should give you an excellent start on knowing what you will need to do in many situations.
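Here is a minimal sketch of reading one of these sources. It assumes the Linux /proc/loadavg layout, in which the first three fields are the 1-, 5-, and 15-minute load averages. The helper name load_averages and the sample line are hypothetical, and the sample lets the example run even where /proc is unavailable:

```shell
# load_averages (hypothetical helper): print the 1-, 5-, and 15-minute load
# averages from text in the Linux /proc/loadavg format on standard input.
load_averages() {
    awk '{print $1, $2, $3}'
}

# Hypothetical sample line in the /proc/loadavg format, so the sketch
# also runs on systems without /proc.
echo "0.20 0.18 0.12 1/80 11206" | load_averages    # prints: 0.20 0.18 0.12

# On a Linux system, read the real file instead:
#   load_averages < /proc/loadavg
```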
For more information about any of these utilities, consult Appendix C of this book or
the man pages of individual utilities. This chapter is not intended to cover these utilities
exhaustively; several of these utilities have had complete books written about them.
An extremely common use of the utilities discussed in this chapter is to modify or filter a string obtained from one of a number of sources, such as an environment variable or the output of a system command. For consistency, the examples in this chapter echo the following variable and pipe it to the utility being illustrated:
VAR="The quick brown fox jumped over the lazy dog."
Displaying Specific Fields
The following example is a simple awk statement that extracts data fields from a string containing a record with multiple fields, assuming that whitespace characters separate the fields. The awk field variables are numbered from $1 up to the last field in the record. In our example string, there are nine fields separated by whitespace. The field variable $0 is special in that it holds the whole record, which here is the entire string. Quite often, the print statement will target only a single field, but this example shows how to extract and reorder several of the input fields:
echo $VAR | awk '{print $1, $8, $4, $5, $6, $7, $3, $9}'
This produces the following output:
The lazy fox jumped over the brown dog.
Specifying the Field Separator
Here is another simple use of awk, where the field separator is specified using the -F
command-line switch. Using this option causes the source string to be split up based
on something other than whitespace. In this case it is the letter o.
echo $VAR | awk -Fo '{print $4}'
This produces the following output:
ver the lazy d
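To see where that result comes from, the following sketch prints every field produced by splitting on o. Each field is numbered and wrapped in brackets so that leading and trailing spaces stay visible. It uses the NF variable and a for loop, both of which are covered later in this chapter:

```shell
VAR="The quick brown fox jumped over the lazy dog."

# Split on the letter o and print each field as "index: [field]".
echo "$VAR" | awk -Fo '{for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i}'
# Expected output:
# 1: [The quick br]
# 2: [wn f]
# 3: [x jumped ]
# 4: [ver the lazy d]
# 5: [g.]
```

Field 4 is the text between the third and fourth o characters, which matches the output above.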
Simple Pattern-Matching
Matching specific fields of the input is very useful in finding data quickly. A grep command can easily return lines that match a given string, but awk can return lines that match a specific value in a specific field. The following example finds and displays all lines in /etc/hosts whose second field is equal to the string casper. The test could be changed from equal (==) to not equal (!=) to find the lines whose second field is not casper. More complicated conditions can be built with awk’s usual logical operators (&&, ||, and !).
awk '$2 == "casper" {print $0}' /etc/hosts
This produces the following output:
172.16.5.4 casper casper.mydomain.com
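A pattern with no action prints each matching line, so the {print $0} action is optional. The sketch below shows that shorter form alongside the != variant. So that the result does not depend on your own /etc/hosts, it reads the hosts entries shown in this chapter from a hypothetical helper function, hosts_sample, instead of the real file:

```shell
# hosts_sample (hypothetical helper): emit the hosts entries shown in this
# chapter, standing in for /etc/hosts.
hosts_sample() {
    printf '%s\n' \
        '127.0.0.1 localhost' \
        '172.16.5.2 phred phred.mydomain.com' \
        '172.16.5.4 casper casper.mydomain.com'
}

# A pattern with no action prints matching lines, so {print $0} can be omitted.
hosts_sample | awk '$2 == "casper"'

# The negated test prints every line whose second field is NOT casper.
hosts_sample | awk '$2 != "casper"'
```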
Matching Fields Against Several Values
Another pattern-matching technique, which is similar to the previous one, is to look for
one of several alternatives in a specific field. The example here extends the previous one a
bit by looking for lines in my /etc/hosts file whose IP addresses (in field 1) start with
either 127 or 172. Note that the alternatives between the slashes (/) are separated by the pipe (|) character. This is the regular-expression notation for alternation, so the pattern reads “starting with 127 or starting with 172.” The pattern-matching operator ~ could also be replaced with the negated operator !~ to return the lines in the file that don’t match the expression.
awk '$1 ~ /^127|^172/ {print $0}' /etc/hosts
This produces the following output:
127.0.0.1 localhost
172.16.5.2 phred phred.mydomain.com
172.16.5.4 casper casper.mydomain.com
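Here is a sketch of the negated operator. It again uses the three entries above, supplied by a hypothetical hosts_sample function rather than /etc/hosts. It also shows /^(127|172)/, an equivalent way of writing the original pattern in which parentheses group the alternatives under a single ^ anchor:

```shell
# hosts_sample (hypothetical helper): the three entries shown above.
hosts_sample() {
    printf '%s\n' \
        '127.0.0.1 localhost' \
        '172.16.5.2 phred phred.mydomain.com' \
        '172.16.5.4 casper casper.mydomain.com'
}

# Negated match: print entries whose first field does NOT start with 127.
hosts_sample | awk '$1 !~ /^127/'

# Same matches as /^127|^172/, written with grouped alternatives.
hosts_sample | awk '$1 ~ /^(127|172)/'
```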
Determining the Number of Fields
This one-liner illustrates the use of a special awk internal variable NF, whose value is the number of fields in the current line of input. You may want to try changing the field separator as shown in the earlier example and note the difference in the result.
echo $VAR | awk '{print NF}'
This produces the following output:
9
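Trying that suggestion with the letter o as the separator gives a different count. The string contains four o characters, so it splits into five fields:

```shell
VAR="The quick brown fox jumped over the lazy dog."

# Default whitespace splitting: nine fields.
echo "$VAR" | awk '{print NF}'      # prints 9

# Splitting on the letter o: four separators yield five fields.
echo "$VAR" | awk -Fo '{print NF}'  # prints 5
```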
Determining the Last Field
This is a slightly modified version of the previous example; it adds a dollar sign ($) in front of the NF variable. This prints the value of the last field instead of the number of fields.
echo $VAR | awk '{print $NF}'
The following output results:
dog.
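NF is recomputed for every input line, so $NF tracks the last field even when lines differ in length. Here is a small sketch with three made-up lines:

```shell
# Each line has a different number of fields; $NF always names the last one.
printf '%s\n' 'one' 'one two' 'one two three' | awk '{print NF ": " $NF}'
# Expected output:
# 1: one
# 2: two
# 3: three
```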
Determining the Second-to-Last Field
We can use NF to get the second-to-last field of the string, as in the following example. This could easily be modified to reference other positions in the input relative to the last field. This and the previous two examples all relate directly to the standard numeric awk field variables: in our example string, $NF is equal to $9. Referring to a field through NF is one layer more abstract than referencing a fixed positional variable. It lets you compute the position of a particular field in a line of any length.
echo $VAR | awk '{print $(NF-1)}'
You get the following output:
lazy
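One caveat is worth a sketch. On a line with only one field, $(NF-1) is $0, the whole line. On an empty line it would ask for field -1, which is not a valid field reference. Guarding the action with a pattern avoids both cases. The first two input lines here are hypothetical:

```shell
# Print the second-to-last field only when a line has at least two fields.
printf '%s\n' 'one' 'one two' 'The quick brown fox jumped over the lazy dog.' |
    awk 'NF >= 2 {print $(NF-1)}'
# Expected output:
# one
# lazy
```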
Passing Variables to awk
In some cases you may not know until the command is run which field you want. You can deal with this by passing a value to awk when it is invoked. The following example shows how you can pass the value of the shell variable TheCount to an awk command. The -v switch tells awk that you are going to set a variable. It is followed by an assignment of the form name=value, which here sets the awk variable counter before any input is read.
TheCount=3
echo $VAR | awk -v counter=$TheCount '{print $counter}'
This produces the following output:
brown
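The field number is supplied at run time, so the same awk program can serve whatever position the shell computes. This sketch loops over a few arbitrarily chosen field numbers:

```shell
VAR="The quick brown fox jumped over the lazy dog."

# Extract several fields, one awk invocation per field number.
for n in 1 3 9; do
    echo "$VAR" | awk -v counter="$n" '{print $counter}'
done
# Expected output:
# The
# brown
# dog.
```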
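The -v switch is not supported by some older awk implementations, such as the original awk that predates nawk, so it may not be ideal when you’re shooting for portability. In that case, you can place the assignment after the awk program, where awk treats it as an assignment operand rather than a file name. This usage should do the trick: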
TheCount=3
echo $VAR | awk '{print $counter}' counter=$TheCount
It produces the following output:
brown
Using a Variable Passed to awk in a Condition
Here is another use of shell variables with the awk command. The NODE=$node assignment sets the internal awk variable NODE to the value of the shell variable $node. The awk command then checks whether the second field ($2) of each line in the input file, here /etc/hosts, is equal to the value of NODE. If it is, the third field ($3) is printed. The code works like the “Simple Pattern-Matching” example shown earlier, except that the value to compare against comes from the shell at run time instead of being written into the awk program, and the field printed can differ from the field tested.
awk -v NODE=$node '$2 == NODE {print $3}' /etc/hosts
The output depends on the contents of your /etc/hosts file, but the intended effect is to display the fully qualified domain name corresponding to the specified node name. Try setting the node variable to the name of your system before running this command. My system is named casper, and this is its hosts file entry:
172.16.5.4 casper casper.mydomain.com
Thus, if the system name stored in the node variable appears in field 2 of some line in /etc/hosts, the third field of that line is displayed. When I run this command after setting the shell variable node to casper, the output is the third field of the /etc/hosts entry for casper: casper.mydomain.com.
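To reuse the lookup, you can wrap the one-liner in a shell function. In this sketch, lookup_fqdn is a hypothetical name, and its optional second argument defaults to /etc/hosts. Passing - makes awk read standard input, so the example can run against the sample entry instead of your real hosts file:

```shell
# lookup_fqdn NODE [FILE] (hypothetical helper)
# Print field 3 of every line whose field 2 equals NODE.
# FILE defaults to /etc/hosts; "-" means standard input.
lookup_fqdn() {
    awk -v NODE="$1" '$2 == NODE {print $3}' "${2:-/etc/hosts}"
}

# Run against the sample entry from the text rather than /etc/hosts.
echo '172.16.5.4 casper casper.mydomain.com' | lookup_fqdn casper -
# prints: casper.mydomain.com
```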
Displaying a Range of Fields (Main Method)
Printing a range of fields from an input line usually cannot be expressed with simple syntax. Unless the range is fixed and short enough to list each field explicitly, you generally need to have awk loop through the field numbers, printing each one in turn. In this example, the for loop starts with a fixed field number (here, 3) and ends with the value of the NF variable; you can easily modify it to permit any range. The printf (formatted print) command in the body of the loop prints the current field, followed by a space. The final print statement outside the loop prints an empty string followed by the output record separator, which ends the line with a newline.
echo $VAR | awk '{for(i=3; i<=NF; i++) {printf "%s ",$i}; print ""}'
Here is the output:
brown fox jumped over the lazy dog.
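Note that printf "%s " leaves a trailing space before the final newline. The sketch below is a variant, not the original method. It takes both ends of the range as -v variables and caps the end at NF. It then prints OFS between fields and ORS after the last one, so no trailing space remains. The names print_range, first, last, and stop are hypothetical:

```shell
VAR="The quick brown fox jumped over the lazy dog."

# print_range FIRST LAST (hypothetical helper): print fields FIRST..LAST of
# each input line, separated by OFS (a space by default), with no trailing
# separator. LAST is capped at NF, so a range past the end of a line is safe;
# if FIRST is beyond the last field, nothing is printed for that line.
print_range() {
    awk -v first="$1" -v last="$2" '{
        stop = (last < NF) ? last : NF    # do not run past the last field
        for (i = first; i <= stop; i++)
            printf "%s%s", $i, (i < stop ? OFS : ORS)
    }'
}

echo "$VAR" | print_range 3 6     # prints: brown fox jumped over
echo "$VAR" | print_range 3 99    # prints: brown fox jumped over the lazy dog.
```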
Displaying a Range of Fields (Alternate Method)
One last use of external variables being passed to awk is related to potential problems with awk versions. In some cases, awk, nawk, and gawk handle the -v switch differently. There are also issues when passing variables whose values contain spaces into literal strings. Most awk programs given on the command line are enclosed in single quotes ('). Instead of passing a shell variable to awk, you can embed it directly in the program at the point where its value is needed. To do this, close the single-quoted text just before the variable and reopen it just after. If the value may contain spaces, also enclose the variable in double quotes, as in '"$var"', so that the shell does not split it into separate words. In the following example, the awk program starts with a single quote and then begins a for loop. The counter variable i is set to the initial value of 3, and the loop continues while i is less than or equal to $end. $end is a shell variable that sits between two single quotes. The first of these quotes ends the first part of the awk program, so the shell expands the value of $end. The second single quote, which follows $end, reopens the awk program, which includes the loop increment as well as the print statements. The final single quote ends the whole awk program.
This example is nearly the same as the range-printing solution shown earlier, and it illustrates the use of a shell variable within an awk command. The differences are that the ending value ($end) comes from the shell environment and that it is not contained within the single quotes of the awk program. Here the shell variable end is set to 6.
end=6
echo $VAR | awk '{for(i=3; i<='$end'; i++) {printf "%s ",$i}; print ""}'
Here is the output:
brown fox jumped over
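The same quote-splicing idea extends to a string that contains spaces. The shell variable must also be wrapped in double quotes, and the splice must sit inside an awk string literal. In this sketch, greeting is a hypothetical variable. A value containing a double quote or a backslash would still break or alter the awk program. That is one reason the -v form, shown for comparison, is often the safer choice where it is supported:

```shell
VAR="The quick brown fox jumped over the lazy dog."
greeting="Hello there"    # hypothetical value containing a space

# The program is assembled from three pieces:
#   '{print "'      single-quoted awk text that opens an awk string literal
#   "$greeting"     the shell value, double-quoted so it stays one word
#   '", $9}'        closes the awk string and finishes the program
echo "$VAR" | awk '{print "'"$greeting"'", $9}'    # prints: Hello there dog.

# The equivalent using -v, for comparison.
echo "$VAR" | awk -v g="$greeting" '{print g, $9}' # prints: Hello there dog.
```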
