OS=$(uname)
case $OS in
AIX|HP-UX) SWITCH='-t'
           F1=3
           F2=4
           F3=5
           F4=6
           echo "\nThe Operating System is $OS\n"
           ;;
Linux|SunOS) SWITCH='-c'
           F1=1
           F2=2
           F3=3
           F4=4
           echo "\nThe Operating System is $OS\n"
           ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
   echo "\n\t EXITING \n"
   exit 1
   ;;
esac
Listing 7.2 Case statement for the iostat fields of data.
Notice in Listing 7.2 that we use a single case statement to set up the environment
for the shell script to run the correct iostat command for each of the four Unix flavors.
If the Unix flavor is not in the list, then the user receives an error message before the
script exits with a return code of 1. Later we will cover the entire shell script.
Syntax for sar
The sar command stands for system activity report. Using the sar command we can take
direct sample intervals for a specific time period. For example, we can take 4 samples
that are 10 seconds each, and the sar command automatically averages the results for us.
Let’s look at the output of the sar command for each of our Unix flavors: AIX,
HP-UX, Linux, and Solaris.
AIX
# sar 10 4
AIX yogi 1 5 000125604800 07/26/02
17:44:54 %usr %sys %wio %idle
17:45:04 25 75 0 0
188 Chapter 7
17:45:14 25 75 0 0
17:45:24 26 74 0 0
17:45:34 25 75 0 0
Average 25 75 0 0
Now let’s look at the average of the samples directly.
# sar 10 4 | grep Average
Average 26 74 0 0
HP-UX
# sar 10 4
HP-UX dino B.10.20 A 9000/715 07/29/102
22:48:10 %usr %sys %wio %idle
22:48:20 40 60 0 0
22:48:30 40 60 0 0
22:48:40 12 19 0 68
22:48:50 0 0 0 100
Average 23 35 0 42
Now let’s only look at the average of the samples directly.
# sar 10 4 | grep Average
Average 25 37 0 38
Linux
# sar 10 4
Linux 2.4.2-2 (bambam) 07/29/2002
10:01:59 PM CPU %user %nice %system %idle

10:02:09 PM all 0.10 0.00 0.00 99.90
10:02:19 PM all 0.00 0.00 0.10 99.90
10:02:29 PM all 11.40 0.00 5.00 83.60
10:02:39 PM all 60.80 0.00 36.30 2.90
Average: all 18.07 0.00 10.35 71.58
Now let’s look at the average of the samples directly.
# sar 10 4 | grep Average
Average: all 18.07 0.00 10.35 71.58
Monitoring System Load 189
Solaris
# sar 10 4
SunOS wilma 5.8 Generic i86pc 07/29/02
23:01:55 %usr %sys %wio %idle
23:02:05 1 1 0 98
23:02:15 12 53 0 35
23:02:25 15 67 0 18
23:02:35 21 59 0 21
Average 12 45 0 43
Now let’s look at the average of the samples directly.
# sar 10 4 | grep Average
Average 12 45 0 43
What Is the Common Denominator?
With the sar command the only common denominator is that we can always grep on
the word “Average.” Like the iostat command, the fields vary between some Unix
flavors. We can use a similar case statement to extract the correct fields for each Unix
flavor, as shown in Listing 7.3.
OS=$(uname)
case $OS in
AIX|HP-UX|SunOS)
       F1=2
       F2=3
       F3=4
       F4=5
       echo "\nThe Operating System is $OS\n"
       ;;
Linux)
       F1=3
       F2=4
       F3=5
       F4=6
       echo "\nThe Operating System is $OS\n"
       ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
   echo "\n\t EXITING \n"
   exit 1
   ;;
esac
Listing 7.3 Case statement for the sar fields of data.
Notice in Listing 7.3 that a single case statement sets up the environment for the
shell script to select the correct fields from the sar command for each of the four Unix
flavors. If the Unix flavor is not in the list, then the user receives an error message
before the script exits with a return code of 1. Later we will cover the entire shell
script.
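As a minimal sketch of how these field numbers get used, the helper below picks the fields for a given operating system name and pulls them from the Average row. The function name sar_fields is hypothetical, and the input rows are canned copies of the sample output shown earlier, so nothing here runs sar itself.

```shell
#!/bin/sh
# Sketch only: sar_fields is a hypothetical helper, not part of the book's scripts.
# $1 is the operating system name as printed by uname; sar output arrives on stdin.

sar_fields() {
    os=$1
    case $os in
        AIX|HP-UX|SunOS) F1=2 F2=3 F3=4 F4=5 ;;
        Linux)           F1=3 F2=4 F3=5 F4=6 ;;
        *) echo "ERROR: $os is not a supported operating system" >&2
           return 1 ;;
    esac
    # The single quotes close around each $Fn so the shell substitutes the
    # field number, and awk sees a program such as {print $2, $3, $4, $5}
    grep Average | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}'
}

# Canned Average rows copied from the Solaris and Linux samples
printf 'Average 12 45 0 43\n' | sar_fields SunOS
printf 'Average: all 18.07 0.00 10.35 71.58\n' | sar_fields Linux
```

On a live system the canned printf would be replaced by something like sar 10 4 piped into the helper.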
Syntax for vmstat
The vmstat command stands for virtual memory statistics. Using the vmstat command,
we can get a lot of data about the system, including memory, paging space, page faults,
and CPU statistics. We are concentrating on the CPU statistics in this chapter, so let’s
stay on track. The vmstat command also allows us to take direct samples over intervals
for a specific time period. The vmstat command does not do any averaging for us,
so we are going to stick with two intervals. The first line of output is the average of
the system load since the last system reboot, as with the iostat command. The last line
contains the most current sample.
Let’s look at the output of the vmstat command for each of our Unix flavors, AIX,
HP-UX, Linux, and Solaris.
AIX
[root:yogi]@/scripts# vmstat 30 2
kthr memory page faults cpu

r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 23936 580 0 0 0 0 2 0 103 2715 713 8 25 67 0
1 0 23938 578 0 0 0 0 0 0 115 9942 2730 24 76 0 0
The last line of output is what we are looking for. This is the average of the CPU load
over the length of the interval. We want just the last four columns in the output. The
fields that we want to extract for AIX are in positions $14, $15, $16, and $17.
HP-UX
# vmstat 30 2
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
0 39 0 8382 290 122 26 2 0 0 0 3 128 2014 146 14 21 65
1 40 0 7532 148 345 71 0 0 0 0 0 108 5550 379 29 43 27
The HP-UX vmstat output is a long string of data. Notice for the CPU data that
HP-UX supplies only three values: user part, system part, and the CPU idle time. The
fields that we want to extract are in positions $16, $17, and $18.
Linux
# vmstat 30 2
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 244 1088 1676 21008 0 0 1 0 127 72 1 1 99

3 0 0 244 1132 1676 21008 0 0 0 1 212 530 37 23 40
Like HP-UX, the Linux vmstat output for CPU activity has three fields: user part,
system part, and the CPU idle time. The fields that we want to extract are in positions
$14, $15, and $16.
Solaris
# vmstat 30 2
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr cd f0 s0 in sy cs us sy id
0 0 0 558316 33036 57 433 2 0 0 0 0 0 0 0 0 111 500 77 2 8 90
0 0 0 556192 29992 387 2928 0 0 0 0 0 1 0 0 0 155 2711 273 14 60 26
As with HP-UX and Linux, the Solaris vmstat output for CPU activity consists of the
last three fields: user part, system part, and the CPU idle time. In the sample rows
shown here, these fields are in positions $20, $21, and $22.
What Is the Common Denominator?
There are at least two common denominators for the vmstat command output between
the Unix flavors. The first is that the CPU data is in the last fields. On AIX the data is in
the last four fields with the added I/O wait state. HP-UX, Linux, and Solaris do not list
the wait state. The second common factor is that the data is always on a row that is
entirely numeric. Again, we need a case statement to parse the correct fields for the
command output. Take a look at Listing 7.4.
OS=$(uname)
case $OS in
AIX)
       F1=14
       F2=15
       F3=16
       F4=17
       echo "\nThe Operating System is $OS\n"
       ;;
HP-UX)
       F1=16
       F2=17
       F3=18
       F4=1 # This "F4=1" is bogus and not used for HP-UX
       echo "\nThe Operating System is $OS\n"
       ;;
Linux)
       F1=14
       F2=15
       F3=16
       F4=1 # This "F4=1" is bogus and not used for Linux
       echo "\nThe Operating System is $OS\n"
       ;;
SunOS)
       F1=20
       F2=21
       F3=22
       F4=1 # This "F4=1" is bogus and not used for SunOS
       echo "\nThe Operating System is $OS\n"
       ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
   echo "\n\t EXITING \n"
   exit 1
   ;;
esac
Listing 7.4 Case statement for the vmstat fields of data.
Notice in Listing 7.4 that the F4 variable gets a valid assignment only on the AIX
match. For HP-UX, Linux, and Solaris, the F4 variable is set to 1, so it points to the $1
field. This bogus assignment is made so that we do not need a special vmstat command
statement for each operating system.
You will see how this works in detail in the scripting section.
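Because the CPU data always sits in the last fields of an all-numeric row, one alternative is to count backward from the end of the row instead of hard-coding positions. The sketch below is an illustration of that idea only, not the approach the chapter's script takes; the helper name vmstat_cpu and its argument (the number of CPU columns to keep) are assumptions.

```shell
#!/bin/sh
# Sketch only: vmstat_cpu is a hypothetical helper.
# $1 is the number of trailing CPU columns to print
# (4 for the AIX sample, 3 for the HP-UX, Linux, and Solaris samples).

vmstat_cpu() {
    ncols=$1
    awk -v n="$ncols" '
        # Remember the most recent row made up only of digits, dots, and blanks
        NF >= n && /^[ \t0-9.]+$/ { row = $0 }
        END {
            if (row == "") exit 1
            count = split(row, f, " ")
            out = f[count - n + 1]
            for (i = count - n + 2; i <= count; i++) out = out " " f[i]
            print out
        }'
}

# Canned Linux sample from this section; prints the us, sy, and id values
vmstat_cpu 3 <<'EOF'
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 244 1088 1676 21008 0 0 1 0 127 72 1 1 99
3 0 0 244 1132 1676 21008 0 0 0 1 212 530 37 23 40
EOF
```

Counting from the end trades the per-flavor field table for a single column count, at the cost of assuming that no flavor appends extra columns after the CPU data.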
Scripting the Solutions
Each of the techniques presented is slightly different in execution and output. Some
options need to be timed over an interval for a user-defined amount of time, measured
in seconds. We can get an immediate load measurement using the uptime command,
but the sar, iostat, and vmstat commands require the user to specify a period of time to
measure over and the number of intervals to sample the load. If you enter the iostat
or vmstat commands without any arguments, then the statistics presented are
an average since the last system reboot. Because we want current statistics, the scripts
must supply a period of time to sample. For iostat and vmstat we are always going to
initialize the INTERVAL variable to equal 2. The first line of output is measured since
the last system reboot, and the second line is the current data that we are looking for.
Let’s look at each of these commands in separate shell scripts in the following
sections.
Using uptime to Measure the System Load
Using uptime is one of the best indicators of the system load. The last columns of the
output represent the average of the run queue over the last 1, 5, and 15 minutes. A
run queue is where jobs wanting CPU time line up for their turn for some processing
time in the CPU. The priority of the process, or on some systems a thread, has a direct
influence on how long a job has to wait in line before getting more CPU time. Unix
priority values run in reverse: the lower the priority value, the more CPU time, and
the higher the priority value, the less CPU time.
The uptime command always has an average of the length of the run queue. The
threshold trigger value that you set will depend on the normal load of your system. My
little C-10 AIX box starts getting very slow when the run queue hits 2, but the S-80 at
work typically runs with a run queue value over 8 because it is a multiprocessor
machine running a terabyte database. With these differences in acceptable run queue
levels, you will need to tailor the threshold level for notification on a
machine-by-machine basis.
Scripting with the uptime Command
Scripting the uptime solution is a short shell script, and the response is immediate. As
you remember in the “Syntax” section, we had to follow the floating load statistics as
the time since the last reboot moved from minutes, to hours, and even days after the
machine was rebooted. The good thing is that the floating fields are consistent across
the Unix flavors studied in this book. Let’s look at the uptime_loadmon.ksh shell
script shown in Listing 7.5.
#!/bin/ksh
#
# SCRIPT: uptime_loadmon.ksh
# AUTHOR: Randy Michael
# DATE: 07/26/2002
# REV: 1.0.P
# PLATFORM: AIX, HP-UX, Linux, and Solaris
#
# PURPOSE: This shell script uses the "uptime" command to
#          extract the most current load average data. There
#          is a special need in this script to determine
#          how long the system has been running since the
#          last reboot. The load average field "floats"
#          during the first 24 hours after a system restart.
#
# set -x # Uncomment to debug this shell script
# set -n # Uncomment to check script syntax without any execution
#
###################################################
############# DEFINE VARIABLES HERE ###############
###################################################
MAXLOAD=2.00
typeset -i INT_MAXLOAD=$MAXLOAD
# Find the correct field to extract based on how long
# the system has been up, or since the last reboot.
if $(uptime | grep day | grep min >/dev/null)
then
     FIELD=11
elif $(uptime | grep day | grep hrs >/dev/null)
then
     FIELD=11
elif $(uptime | grep day >/dev/null)
then
     FIELD=10
elif $(uptime | grep min >/dev/null)
then
     FIELD=9
else
     FIELD=8
fi
###################################################
######## BEGIN GATHERING STATISTICS HERE ##########
###################################################
echo "\nGathering System Load Average using the \"uptime\" command\n"
# This next command statement extracts the latest
# load statistics no matter what the Unix flavor is.
LOAD=$(uptime | sed s/,//g | awk '{print $'$FIELD'}')
# We need an integer representation of the $LOAD
# variable to do the test for the load going over
# the set threshold defined by the $INT_MAXLOAD
# variable
typeset -i INT_LOAD=$LOAD
# If the current load has exceeded the threshold then
# issue a warning message. The next step always shows
# the user what the current load and threshold values
# are set to.
((INT_LOAD >= INT_MAXLOAD)) && echo "\nWARNING: System load has \
reached ${LOAD}\n"
echo "\nSystem load value is currently at ${LOAD}"
echo "The load threshold is set to ${MAXLOAD}\n"
Listing 7.5 uptime_loadmon.ksh shell script listing.
There are two statements that I want to point out in Listing 7.5. First, notice the
LOAD= statement. To make the variable assignment we
use command substitution, defined by the VAR=$(command statement) notation.
In the command statement we execute the uptime command and pipe the output to a
sed statement. This sed statement removes all of the commas (,) from the uptime
output. We need to take this step because the load statistics are comma separated. Once
the commas are removed, the remaining output is piped to the awk statement that
extracts the correct field, which is defined at the top of the shell script by the FIELD
variable and based on how long the system has been running.
In this awk statement notice how we find the positional parameter that the $FIELD
variable is pointing to. If you try to use the syntax $$FIELD, the result is the current
process ID ($$) and the word FIELD. To get around this little problem of directly
accessing what a variable is pointing to, we use the following syntax:
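# The $8 field of the awk input holds the value 34.
FIELD=8
# Wrong usage: the shell expands $$ to its process ID
echo $$FIELD
3243FIELD
# Correct usage inside an awk statement: the single quotes close
# around $FIELD, so awk receives the program {print $8}
echo "1 2 3 4 5 6 7 34" | awk '{print $'$FIELD'}'
34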
Notice that the latter usage is correct, and the actual result is the value of the $8 field,
which is currently 34. This is really telling us the value of what a pointer is pointing to.
You will see other uses of this technique as we go through this chapter.
The second command statement that I want to point out is the test of the INT_LOAD
value to the INT_MAXLOAD value, which are integer values of the LOAD and MAXLOAD
variables. If the INT_LOAD is equal to, or has exceeded, the INT_MAXLOAD, then we
use a logical AND (&&) to echo a warning to the user’s screen. Using the logical AND
saves a little code compared with an if...then statement. Keep in mind that both values
are truncated to integers, so a load of 2.97 is tested as 2.
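If a fractional threshold such as 2.50 matters on your machine, the comparison can be handed to awk, which compares decimal values directly. This is a sketch under that assumption and not part of the book's script; the function name load_exceeds is hypothetical.

```shell
#!/bin/sh
# Sketch only: load_exceeds is a hypothetical helper.
# Returns success (0) when the load in $1 is at or above the threshold in $2.

load_exceeds() {
    # Adding 0 forces a numeric comparison; awk exits 0 when the test is true
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 >= max + 0) }'
}

# Same logical AND style as the listing, but with decimal precision
load_exceeds 2.97 2.00 && echo "WARNING: System load has reached 2.97"
load_exceeds 2.10 2.50 || echo "A load of 2.10 is under the 2.50 threshold"
```

With integer truncation, a load of 2.10 and a threshold of 2.50 would both become 2 and trigger the warning; the decimal comparison avoids that.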
You can see the uptime_loadmon.ksh shell script in action in Listings 7.6 and 7.7.
# ./uptime_loadmon.ksh
Gathering System Load Average using the "uptime" command
System load value is currently at 1.86
The load threshold is set to 2.00
Listing 7.6 Script in action under “normal” load.
Listing 7.6 shows the uptime_loadmon.ksh shell script in action on a machine
that is under a normal load. Listing 7.7 shows the same machine under an excessive
load—at least, it is excessive for this little machine.
# ./uptime_loadmon.ksh
Gathering System Load Average using the "uptime" command
WARNING: System load has reached 2.97
System load value is currently at 2.97
The load threshold is set to 2.00
Listing 7.7 Script in action under “excessive” load.
This is about all there is to using the uptime command. Let’s move on to the sar
command.
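As an aside, the floating field can be sidestepped by discarding everything up to the load average label before picking a value. The following is a minimal sketch of that alternative, not the method used in Listing 7.5. The helper name first_load and the two uptime lines fed to it are made up for illustration, and it assumes the label reads "load average:" (an optional plural is tolerated).

```shell
#!/bin/sh
# Sketch only: first_load is a hypothetical helper that reads one line of
# uptime output on stdin and prints the first load average value.

first_load() {
    # Drop everything through the "load average:" label, then print the
    # first remaining field with its trailing comma removed
    sed 's/.*load average[s]*: *//' | awk '{ sub(/,$/, "", $1); print $1 }'
}

# Illustrative uptime lines (not captured from a real machine)
printf ' 10:02pm  up 3 days,  4:12,  2 users,  load average: 1.86, 1.40, 1.02\n' | first_load
printf ' 5:01pm  up 12 min,  1 user,  load average: 2.97, 1.10, 0.50\n' | first_load
```

On a live system the printf would be replaced with the uptime command itself. Locales that print a decimal comma would need different handling.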

Using sar to Measure the System Load
Most Unix flavors have sar data collection set up by default. This sar data is presented
when the sar command is executed without any switches. The data that is displayed is
automatically collected at scheduled intervals throughout the day and compiled into a
report at day’s end. By default, the system keeps a month’s worth of data available for
online viewing. This is great for seeing the basic trends of the machine as it is loaded
through the day. If we want to collect data at a specific time of day for a specific period
of time, then we need to add the number of seconds for each interval and the total
number of intervals to the sar command. The final line in the output is an average of all
of the previous sample intervals.
This is where our shell script comes into play. By using a shell script with the times
and intervals defined, we can take samples of the system load over small or large
increments of time without interfering with the system’s collection of sar data. This can
be a valuable tool for things like taking hundreds of small incremental samples as a
development application is being tested. Of course, this technique can also help in
troubleshooting just about any application. Let’s look at how we script the solution.
Scripting with the sar Command
For each of our Unix flavors the sar command produces four CPU load statistics. The
outputs vary somewhat, but the basic idea remains the same. In each case, we define
an INTERVAL variable specifying the total number of samples to take and a SECS variable
to define the number of seconds for each sample interval. Notice that we
used the variable SECS as opposed to SECONDS. We do not want to use the variable
SECONDS because it is a Korn shell built-in variable used for timing in a shell. As I
stated in the introduction, this book uses variable names in uppercase so the reader
will quickly know that the code is referencing a variable; however, in the real world
you may want to use the lowercase version of the variable name. It really would not
matter here because we are defining the variable value and then using it within the
same second, hopefully.
The next step in this shell script is to define which positional fields we need to
extract to get the sar data for each of the Unix operating systems. For this step we use
a case statement using the uname command output to define the fields of data. It turns
out that AIX, HP-UX, and SunOS operating systems all have the sar data located in the
$2, $3, $4, and $5 positions. Linux differs in this respect with the sar data residing in the
$3, $4, $5, and $6 positions. In each case, these field numbers are assigned to the F1, F2,
F3, and F4 variables inside the case statement.
Let’s look at the sar_loadmon.ksh shell script in Listing 7.8 and cover the remaining
details at the end.
#!/bin/ksh
#
# SCRIPT: sar_loadmon.ksh
# AUTHOR: Randy Michael
# DATE: 07/26/2002
# REV: 1.0.P
# PLATFORM: AIX, HP-UX, Linux, and Solaris
#
# PURPOSE: This shell script takes multiple samples of the CPU
#          usage using the "sar" command. The average of
#          sample periods is shown to the user based on the
#          Unix operating system that this shell script is
#          executing on. Different Unix flavors have differing
#          outputs and the fields vary too.
#
# REV LIST:
#
#
# set -n # Uncomment to check the script syntax without any execution
# set -x # Uncomment to debug this shell script
#
###################################################
############# DEFINE VARIABLES HERE ###############
###################################################
SECS=30      # Defines the number of seconds for each sample
INTERVAL=10  # Defines the total number of sampling intervals
OS=$(uname)  # Defines the Unix flavor
###################################################
##### SETUP THE ENVIRONMENT FOR EACH OS HERE ######
###################################################
# These "F-numbers" point to the correct field in the
# command output for each Unix flavor.
case $OS in
AIX|HP-UX|SunOS)
       F1=2
       F2=3
       F3=4
       F4=5
       echo "\nThe Operating System is $OS\n"
       ;;
Linux)
       F1=3
       F2=4
       F3=5
       F4=6
       echo "\nThe Operating System is $OS\n"
       ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
   echo "\n\t EXITING \n"
   exit 1
   ;;
esac
###################################################
######## BEGIN GATHERING STATISTICS HERE ##########
###################################################
echo "Gathering CPU Statistics using sar \n"
echo "There are $INTERVAL sampling periods with"
echo "each interval lasting $SECS seconds"
echo "\n Please wait while gathering statistics \n"
# This "sar" command takes $INTERVAL samples, each lasting
# $SECS seconds. The average of this output is captured.
sar $SECS $INTERVAL | grep Average \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
    | while read FIRST SECOND THIRD FOURTH
do
     # Based on the Unix Flavor, tell the user the
     # result of the statistics gathered.
     case $OS in
     AIX|HP-UX|SunOS)
            echo "\nUser part is ${FIRST}%"
            echo "System part is ${SECOND}%"
            echo "I/O Wait is ${THIRD}%"
            echo "Idle time is ${FOURTH}%\n"
            ;;
     Linux)
            echo "\nUser part is ${FIRST}%"
            echo "Nice part is ${SECOND}%"
            echo "System part is ${THIRD}%"
            echo "Idle time is ${FOURTH}%\n"
            ;;
     esac
done
Listing 7.8 sar_loadmon.ksh shell script listing.
In the shell script in Listing 7.8 we start by defining the data time intervals. In these
definitions we are taking 10 interval samples of 30 seconds each, for a total of 300
seconds, or 5 minutes. Then we grab the Unix flavor using the uname command and
assign the operating system value to the OS variable. Following these definitions
we define the data fields that contain the sar data for each operating system. In this
case Linux is the oddball with an offset of one position.
Now we get to the interesting part where we actually take the data sample. Look at
the following sar command statement, and we will decipher how it works.
sar $SECS $INTERVAL | grep Average \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
    | while read FIRST SECOND THIRD FOURTH
We really need to look at the statement one pipe at a time. In the very first part of the
statement we take the sample(s) over the defined number of intervals. Consider the
following statement and output:
SECS=30
INTERVAL=10
# sar $SECS $INTERVAL
AIX yogi 1 5 000125604800 07/31/02
19:24:00 %usr %sys %wio %idle
19:24:30 0 1 1 98
19:25:00 4 15 13 68
19:25:30 26 28 40 6
19:26:00 13 12 11 64
19:26:30 16 44 0 39

19:27:00 27 73 0 0
19:27:30 20 48 2 30
19:28:00 5 6 9 80
19:28:30 11 9 5 75
19:29:00 9 18 0 73
Average 13 26 8 53
The previous output is produced by the first part of the sar command statement.
Then, all of this output is piped to the next part of the statement, as shown here:
sar $SECS $INTERVAL | grep Average
Average 13 26 8 53
Now we have the row of data that we want to work with, which we grepped out
using the word Average as a pattern match. The next step is to extract the positional
fields that contain the data for user, system, I/O wait, and idle time for AIX. Remember
in the previous script section that we defined the field numbers and assigned them
to the F1, F2, F3, and F4 variables, which in our case results in F1=2, F2=3, F3=4, and
F4=5. Extending our previous command, we get the following statement:
sar $SECS $INTERVAL | grep Average \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}'
Notice that we continued the command statement on the next line by placing a
backslash (\) at the end of the first line of the statement. In the awk part of the statement
you can see a confusing list of dollar signs and "F" variables. The purpose of this set of
characters is to directly access what the "F" variables are pointing to. Let’s run through
this in detail by example.
The F1 variable has the value 2 assigned to it. This value is the positional location of
the first data field that we want to extract. So we want to access the value at the $2
position. Makes sense? When we extract the $2 data we get the value 13, as shown in the
previous step. Instead of going in this roundabout method, we want to directly access
the field that the F1 variable points to. Just remember that a variable is only a pointer to
a value, nothing more! We want to point directly to what another variable is pointing
to. The solution is to use the following syntax:
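$'$F1'
OR
\$$F1
The first form goes inside a single-quoted awk program: the quotes close just before
$F1 so that the shell substitutes the field number, and awk’s own dollar sign stays in
front of it. The second form goes inside a double-quoted awk program: the backslash
removes the special meaning of the first dollar sign so that the shell passes it through
to awk, while $F1 is still expanded. For this shell script we use the $'$F1' notation.
The result of this notation, in this example, is 13, which is the value that we want.
This is not smoke and mirrors when you understand how it works.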
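A minimal sketch of the field-pointer idea, using the AIX Average row from the example. The ROW variable is introduced here only for illustration, and the third form, which passes the number to awk with the standard -v option, is an alternative that the chapter's scripts do not use.

```shell
#!/bin/sh
# Sketch only: three ways to let a shell variable select an awk field.
F1=2
ROW='Average 13 26 8 53'   # illustrative copy of the AIX Average row

# Form 1: close the single quotes so the shell substitutes $F1;
# awk receives the program {print $2}
echo "$ROW" | awk '{print $'$F1'}'

# Form 2: double-quoted program; \$ is passed through to awk
# and $F1 is expanded by the shell
echo "$ROW" | awk "{print \$$F1}"

# Form 3: hand the number to awk as an awk variable named f
echo "$ROW" | awk -v f="$F1" '{print $f}'
```

All three forms select the second field of the row. The -v form keeps the awk program in one unbroken quoted string, which some readers find easier to follow.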
The final part of the sar command statement is to pipe the four data fields to a while
loop so that we can do something with the data, which is where we end the sar
statement and enter the while loop.
The only thing that we do in the while loop is to display the results based on the
Unix flavor. The sar_loadmon.ksh shell script is in action in Listing 7.9.
# ./sar_loadmon.ksh
The Operating System is AIX
Gathering CPU Statistics using sar
There are 10 sampling periods with
each interval lasting 30 seconds
Please wait while gathering statistics
User part is 13%
System part is 26%
I/O Wait is 8%
Idle time is 53%
Listing 7.9 sar_loadmon.ksh shell script in action.
From the output presented in Listing 7.9 you can see that the shell script queries the
system for its operating system, which is AIX here. Then the user is notified of the sam-
pling periods and the length of each sample period. The output is displayed to the user
by field. That is it for using the sar command. Now let’s move on to the iostat command.

Using iostat to Measure the System Load
The iostat command is mostly used to collect disk storage statistics, but by using
the -t, or -c command switch, depending on the operating system, we can see the CPU
statistics as we saw them in the syntax section for the iostat command. We are going to
create a shell script using the iostat command and use almost the same technique as we
did in the last section.
Scripting with the iostat Command
In this shell script we are going to use a very similar technique to the sar shell script in
the previous section. The difference is that we are going to take only two intervals with
a long sampling period. As an example, the INTERVAL variable is set to 2, and the
SECS variable is set to 300 seconds, which is 5 minutes. Also, because we have two
possible switch values, -t and -c, we need to add a new variable called SWITCH. Let’s
look at the iostat_loadmon.ksh shell script in Listing 7.10, and we will cover the
differences at the end in more detail.
#!/bin/ksh
#
# SCRIPT: iostat_loadmon.ksh
# AUTHOR: Randy Michael
# DATE: 07/26/2002
# REV: 1.0.P
# PLATFORM: AIX, HP-UX, Linux, and Solaris
#
# PURPOSE: This shell script takes two samples of the CPU
#          usage using the "iostat" command. The first set of
#          data is an average since the last system reboot. The
#          second set of data is an average over the sampling
#          period, or $INTERVAL. The result of the data acquired
#          during the sampling period is shown to the user based
#          on the Unix operating system that this shell script is
#          executing on. Different Unix flavors have differing
#          outputs and the fields vary too.
#
# REV LIST:
#
#
# set -n # Uncomment to check the script syntax without any execution
# set -x # Uncomment to debug this shell script
#
###################################################
############# DEFINE VARIABLES HERE ###############
###################################################
SECS=300     # Defines the number of seconds for each sample
INTERVAL=2   # Defines the total number of sampling intervals
STATCOUNT=0  # Initializes a loop counter to 0, zero
OS=$(uname)  # Defines the Unix flavor
###################################################
##### SETUP THE ENVIRONMENT FOR EACH OS HERE ######
###################################################
# These "F-numbers" point to the correct field in the
# command output for each Unix flavor.
case $OS in
AIX|HP-UX) SWITCH='-t'
           F1=3
           F2=4
           F3=5
           F4=6
           echo "\nThe Operating System is $OS\n"
           ;;
Linux|SunOS) SWITCH='-c'
           F1=1
           F2=2
           F3=3
           F4=4
           echo "\nThe Operating System is $OS\n"
           ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
   echo "\n\t EXITING \n"
   exit 1
   ;;
esac
###################################################
######## BEGIN GATHERING STATISTICS HERE ##########
###################################################
echo "Gathering CPU Statistics using iostat \n"
echo "There are $INTERVAL sampling periods with"
echo "each interval lasting $SECS seconds"
echo "\n Please wait while gathering statistics \n"
# Use "iostat" to monitor the CPU utilization and
# remove all lines that contain alphabetic characters
# and blank lines. Then use the previously defined
# field numbers, for example, F1=4, to point directly
# to the 4th position, for this example. The syntax
# for this technique is ==> $'$F1'.
iostat $SWITCH $SECS $INTERVAL | egrep -v '[a-zA-Z]|^$' \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
    | while read FIRST SECOND THIRD FOURTH
do
     if ((STATCOUNT == 1)) # Loop counter to get the second set
     then                  # of data produced by "iostat"
          case $OS in # Show the results based on the Unix flavor
          AIX)
                 echo "\nUser part is ${FIRST}%"
                 echo "System part is ${SECOND}%"
                 echo "Idle part is ${THIRD}%"
                 echo "I/O wait state is ${FOURTH}%\n"
                 ;;
          HP-UX|Linux)
                 echo "\nUser part is ${FIRST}%"
                 echo "Nice part is ${SECOND}%"
                 echo "System part is ${THIRD}%"
                 echo "Idle time is ${FOURTH}%\n"
                 ;;
          SunOS)
                 echo "\nUser part is ${FIRST}%"
                 echo "System part is ${SECOND}%"
                 echo "I/O Wait is ${THIRD}%"
                 echo "Idle time is ${FOURTH}%\n"
                 ;;
          esac
     fi
     ((STATCOUNT = STATCOUNT + 1)) # Increment the loop counter
done
Listing 7.10 iostat_loadmon.ksh shell script listing.
The similarities are striking between the sar implementation and the iostat script
shown in Listing 7.10. At the top of the shell script we define an extra variable,
STATCOUNT. This variable is used as a loop counter, and it is initialized to 0. We
need this counter because we have only two intervals, and the first line of the output is
the load average since the last system reboot. The second, and final, set of data is the
CPU load statistics collected during our sampling period, so it is the most current data.
Using a counter variable, STATCOUNT, we collect the data and display it on
the second loop iteration, or when STATCOUNT is equal to 1.
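The counter logic can be tried in isolation with canned rows. The sketch below mirrors the loop from the listing in portable sh syntax; the function name second_sample is hypothetical, and the two rows stand in for the numeric iostat output from a Linux system.

```shell
#!/bin/sh
# Sketch only: second_sample is a hypothetical helper that skips the first
# row on stdin (the since-reboot average) and prints the second row.

second_sample() {
    STATCOUNT=0
    while read FIRST SECOND THIRD FOURTH
    do
        # Only the second row, when the counter equals 1, is reported
        if [ "$STATCOUNT" -eq 1 ]; then
            echo "$FIRST $SECOND $THIRD $FOURTH"
        fi
        STATCOUNT=$((STATCOUNT + 1))
    done
}

# Canned numeric rows: since-reboot average first, current sample second
printf '23.15 0.00 26.09 50.76\n31.77 0.00 21.79 46.44\n' | second_sample
```

When only the final row matters, piping the numeric rows through tail -n 1 would give the same row without a counter. The counter version is shown because it is what the chapter's script uses.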
In the next section we use the Unix flavor given by the uname command in a case
statement to assign the correct switch to use in the iostat command. This is also where
the F1, F2, F3, and F4 variables are defined with the positional placement of the data
we want to extract from the command output.
Now comes the fun part. Let’s look at the iostat command statement we use to
extract the CPU statistics here.
iostat $SWITCH $SECS $INTERVAL | egrep -v '[a-zA-Z]|^$' \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
    | while read FIRST SECOND THIRD FOURTH
The beginning of the iostat command statement uses the correct command switch,
as defined by the operating system, and the sampling time and the number of
intervals, which is two this time. From this first part of the iostat statement we get the
following output on a Linux system.
SWITCH='-c'
SECS=300
INTERVAL=2
iostat $SWITCH $SECS $INTERVAL
Linux 2.4.2-2 (bambam) 07/31/2002
avg-cpu: %user %nice %sys %idle
23.15 0.00 26.09 50.76
avg-cpu: %user %nice %sys %idle
31.77 0.00 21.79 46.44
Remember that the first row of data is an average of the CPU load since the last
system reboot, so we are interested in the last row of output. If you remember from the
syntax section for the iostat command, the common denominator for this output is that
the data rows are entirely numeric characters. Using this as a criterion to extract data, we
add to our iostat command statement as shown here.
iostat $SWITCH $SECS $INTERVAL | egrep -v '[a-zA-Z]|^$'
The egrep addition to the previous command statement does two things for us.
First, it excludes all lines of the output that have alphabetic characters, leaving only the
rows with numbers. The second thing we get is the removal of all blank lines from the
output. Let’s look at each of these.
To omit the alpha characters we use the egrep command with the -v option, which
says to display everything in the output except the rows that the pattern matches. To
specify all alpha characters we use the following expression:
[a-zA-Z]
Then to remove all blank lines we use the expression:
^$
The caret character (^) anchors the match to the beginning of a line, and the dollar
sign ($) anchors it to the end of a line. Together, ^$ matches only lines that have
nothing between the beginning and the end, which are the empty lines. If you wanted
to remove all of the lines in a file that are commented out with a hash mark (#) in the
first column, then use ^#.
When we join these two expressions in a single extended grep (egrep), we get the
following extended regular expression:
egrep -v '[a-zA-Z]|^$'
At this point we are left with the following output:
23.15 0.00 26.09 50.76
31.77 0.00 21.79 46.44
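The filter can be tried without waiting for a real sampling run. The following is a minimal sketch that feeds the Linux sample output shown earlier through the same expression. The function name sample_iostat is hypothetical and only stands in for the real iostat command, the blank lines in the sample are an assumption about how the raw output is spaced, and grep -E is used as the POSIX spelling of egrep.

```shell
#!/bin/sh
# sample_iostat is a hypothetical stand-in that prints the captured
# Linux output, so no real iostat run is needed. The empty lines are
# an assumption about the spacing of the raw output.
sample_iostat()
{
    printf '%s\n' \
        'Linux 2.4.2-2 (bambam) 07/31/2002' \
        '' \
        'avg-cpu: %user %nice %sys %idle' \
        '23.15 0.00 26.09 50.76' \
        '' \
        'avg-cpu: %user %nice %sys %idle' \
        '31.77 0.00 21.79 46.44'
}

# -v inverts the match: drop every line that contains a letter,
# and every empty line.
FILTERED=$(sample_iostat | grep -Ev '[a-zA-Z]|^$')
echo "$FILTERED"
```

Only the two numeric rows should survive, because every header line contains at least one letter.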
This brings us to the next addition to the iostat command statement in the shell
script. This is where we add the awk part of the statement using the F1, F2, F3, and F4
variables, as shown here.
iostat $SWITCH $SECS $INTERVAL | egrep -v '[a-zA-Z]|^$' \
| awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}'
This is the same code that we covered in the last section, where we point directly to
what another pointer is pointing to. For Linux F1=1, F2=2, F3=3, and F4=4. With this
information we know that $'$F1' on the first line of output is equal to 23.15, and on
the second row this same expression is equal to 31.77. Now that we have the values,
we add a final pipe to a while loop. Remember that in the while loop we have added
a loop counter, STATCOUNT. On the first loop iteration, the while loop does nothing
except increment the counter. On the second loop iteration, the values 31.77, 0.00,
21.79, and 46.44 are assigned to the variables FIRST, SECOND, THIRD, and FOURTH,
respectively.
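The field selection and the loop counter can be exercised together on the two filtered rows. This is a sketch under stated assumptions, not the script's own code: the printf stands in for the filtered iostat output, and awk -v is swapped in as a common alternative for passing the field numbers, instead of closing and reopening the single quotes around each shell variable.

```shell
#!/bin/sh
# Field positions for Linux, as given in the text.
F1=1 F2=2 F3=3 F4=4
STATCOUNT=0

# printf stands in for: iostat ... | egrep -v '[a-zA-Z]|^$'
RESULT=$(printf '%s\n' '23.15 0.00 26.09 50.76' '31.77 0.00 21.79 46.44' \
    | awk -v f1="$F1" -v f2="$F2" -v f3="$F3" -v f4="$F4" \
          '{print $f1, $f2, $f3, $f4}' \
    | while read FIRST SECOND THIRD FOURTH
      do
          # Row 0 is the average since boot; report row 1 only.
          if [ "$STATCOUNT" -eq 1 ]
          then
              echo "$FIRST $SECOND $THIRD $FOURTH"
          fi
          STATCOUNT=$((STATCOUNT + 1))
      done)
echo "$RESULT"
```

Inside awk, $f1 means "the field whose number is stored in f1", which has the same effect as the $'$F1' construction while keeping the whole awk program inside one pair of quotes.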
Using another case statement with the $OS value the output is presented to the user
based on the operating system fields, as shown in Listing 7.11.
The Operating System is Linux
Gathering CPU Statistics using vmstat
There are 2 sampling periods with
each interval lasting 300 seconds
Please wait while gathering statistics
User part is 39.35%
Nice part is 0.00%
System part is 31.59%
Idle time is 29.06%
Listing 7.11 iostat_loadmon.ksh shell script in action.
Notice that the output is in the same format as the sar script output. This is all there
is to the iostat shell script. Let’s now move on to the vmstat solution.
Using vmstat to Measure the System Load
The vmstat shell script uses the exact same technique as the iostat shell script in the
previous section. Only AIX produces four fields of output; the remaining Unix flavors
have only three data points to measure for the CPU load statistics. The rest of the
vmstat output is for virtual memory statistics, which is the main purpose of this
command anyway. Let’s look at the vmstat script.
Scripting with the vmstat Command

When you look at this shell script for vmstat you will think that you just saw this shell
script in the last section. Most of these two shell scripts are the same, with only minor
exceptions. Let’s look at the vmstat_loadmon.ksh shell script in Listing 7.12 and
cover the differences in detail at the end.
#!/bin/ksh
#
# SCRIPT: vmstat_loadmon.ksh
# AUTHOR: Randy Michael
# DATE: 07/26/2002
# REV: 1.0.P
# PLATFORM: AIX, HP-UX, Linux, and Solaris
#
# PURPOSE: This shell script takes two samples of the CPU
# usage using the "vmstat" command. The first set of
# data is an average since the last system reboot. The
# second set of data is an average over the sampling
# period, or $INTERVAL. The result of the data acquired
# during the sampling period is shown to the user based
# on the Unix operating system that this shell script is
# executing on. Different Unix flavors have differing
# outputs and the fields vary too.
#
# REV LIST:
#
#
# set -n # Uncomment to check the script syntax without any execution
# set -x # Uncomment to debug this shell script
#

###################################################
############# DEFINE VARIABLES HERE ###############
###################################################
SECS=300 # Defines the number of seconds for each sample
INTERVAL=2 # Defines the total number of sampling intervals
STATCOUNT=0 # Initializes a loop counter to 0, zero
OS=$(uname) # Defines the Unix flavor
###################################################
##### SETUP THE ENVIRONMENT FOR EACH OS HERE ######
###################################################
# These "F-numbers" point to the correct field in the
# command output for each Unix flavor.
case $OS in
AIX) # AIX has four relative columns in the output
     F1=14
     F2=15
     F3=16
     F4=17
     echo "\nThe Operating System is $OS\n"
     ;;
HP-UX) # HP-UX has only three relative columns in the output
     F1=16
     F2=17
     F3=18
     F4=1 # This "F4=1" is bogus and not used for HP-UX
     echo "\nThe Operating System is $OS\n"
     ;;
Linux) # Linux has only three relative columns in the output
     F1=14
     F2=15
     F3=16
     F4=1 # This "F4=1" is bogus and not used for Linux
     echo "\nThe Operating System is $OS\n"
     ;;
SunOS) # SunOS has only three relative columns in the output
     F1=20
     F2=21
     F3=22
     F4=1 # This "F4=1" is bogus and not used for SunOS
     echo "\nThe Operating System is $OS\n"
     ;;
*) echo "\nERROR: $OS is not a supported operating system\n"
     echo "\n\t EXITING \n"
     exit 1
     ;;
esac
###################################################
######## BEGIN GATHERING STATISTICS HERE ##########
###################################################
echo "Gathering CPU Statistics using vmstat \n"
echo "There are $INTERVAL sampling periods with"
echo "each interval lasting $SECS seconds"
echo "\n Please wait while gathering statistics \n"
# Use "vmstat" to monitor the CPU utilization and
# remove all lines that contain alphabetic characters
# and all blank lines. Then use the previously defined
# field numbers, for example F1=20, to point directly
# to the 20th position, for this example. The syntax
# for this technique is ==> $'$F1' and points directly
# to the $20 field of the awk input line.
vmstat $SECS $INTERVAL | egrep -v '[a-zA-Z]|^$' \
| awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
| while read FIRST SECOND THIRD FOURTH
do
    if ((STATCOUNT == 1)) # Loop counter to get the second set
    then                  # of data produced by "vmstat"
        case $OS in # Show the results based on the Unix flavor
        AIX)
            echo "\nUser part is ${FIRST}%"
            echo "System part is ${SECOND}%"
            echo "Idle part is ${THIRD}%"
            echo "I/O wait state is ${FOURTH}%\n"
            ;;
        HP-UX|Linux|SunOS)
            echo "\nUser part is ${FIRST}%"
            echo "System part is ${SECOND}%"
            echo "Idle time is ${THIRD}%\n"
            ;;
        esac
    fi
    ((STATCOUNT = STATCOUNT + 1)) # Increment the loop counter
done
Listing 7.12 vmstat_loadmon.ksh shell script listing.
We use the same variables in Listing 7.12 as we did in Listing 7.10 with the iostat
script. The differences come when we define the “F” variables to indicate the fields to
extract from the output and in the presentation of the data to the user. As I stated before,
only AIX produces a fourth field of output.
In the first case statement, where we assign the F1, F2, F3, and F4 variables to the
field positions that we want to extract for each operating system, notice that only AIX
assigns the F4 variable to a valid field. HP-UX, Linux, and SunOS all have the F4 variable
assigned to field #1, F4=1. I did it this way so that I would not have to write the
vmstat command statement a second time to extract just three fields. This method
helps to make the code shorter and less confusing (at least I hope it is less confusing!).
There is a comment next to each F4 variable assignment that states that this field
assignment is bogus and not used in the shell script.
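To see why the placeholder is harmless, consider a single row of 16 numeric fields run through the same awk statement with the Linux field positions. This is only a sketch: the numbers in ROW are invented for illustration and are not real vmstat output, and the labels simply follow the ones the script prints. With F4=1 the awk statement still prints four values, but the fourth is a copy of field 1, and the three-field branch never displays it.

```shell
#!/bin/sh
# Linux field positions from Listing 7.12; F4=1 is the unused placeholder.
F1=14 F2=15 F3=16 F4=1

# Invented sample row with 16 fields; fields 14, 15, and 16 play the
# roles of user, system, and idle.
ROW='2 0 0 1024 20480 4096 65536 0 0 5 7 110 220 14 54 32'

REPORT=$(echo "$ROW" \
    | awk '{print $'$F1', $'$F2', $'$F3', $'$F4'}' \
    | while read FIRST SECOND THIRD FOURTH
      do
          # FOURTH holds field 1 (the value 2) and is simply ignored.
          echo "User part is ${FIRST}%"
          echo "System part is ${SECOND}%"
          echo "Idle time is ${THIRD}%"
      done)
echo "$REPORT"
```

The trade-off is one wasted field per row in exchange for a single pipeline that serves all four operating systems.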
Other than these minor changes the shell script for the vmstat solution is the same
as the solution for the iostat command. The vmstat_loadmon.ksh shell script is in
action in Listing 7.13 on a Solaris machine.
# ./vmstat_loadmon.ksh
The Operating System is SunOS
Gathering CPU Statistics using vmstat
There are 2 sampling periods with
each interval lasting 300 seconds
Please wait while gathering statistics
User part is 14%
System part is 54%
Idle time is 31%
Listing 7.13 vmstat_loadmon.ksh shell script in action.
Notice that the Solaris output shown in Listing 7.13 does not show the I/O wait
state. This information is available only on AIX for the vmstat shell script. The output
format is the same as the last few shell scripts. It is up to you how you want to use this
information. Let’s look at some other options that you may be interested in next.
Other Options to Consider
As with any shell script there is always room for improvement, and this set of shell
scripts is no exception. I have a few suggestions, but I’m sure that you can think of a
few more.
Stop Chasing the Floating uptime Field
In the uptime CPU load monitoring shell script we did not really have to track down
the location of the latest CPU statistics. Another approach is to use what we know
always to be true. Specifically, we know that the field of interest is always the third
field from the end of the uptime command output. Using this knowledge we
can use this little function, get_max, to find the total number of fields in the output. If
we subtract 2 from the total number of fields, then we always have the correct field.
The next code segment is an example of using this technique.
function get_max
{
    (($# == 0)) && return -1
    echo $#
}
########### MAIN ##############
MAX=$(get_max $(uptime)) # Get the total number of fields in uptime