Software Engineering For Students: A Programming Approach Part 28 pdf

248 Chapter 17 ■ Software robustness
Let us turn to examining how an exception is thrown, using the same example. In
Java, the method parseInt can be written as follows:
public int parseInt(String string) throws NumberFormatException {
    int number = 0;
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (c < '0' || c > '9') throw new NumberFormatException();
        number = number * 10 + (c - '0');
    }
    return number;
}
You can see that the heading of the method declares the exception that may be thrown,
along with the parameters and the return value. If this method detects that any of the
characters within the string are illegal, it executes a throw instruction. This
immediately terminates the method and transfers control to a catch block designed to
handle the exception. In our example, the catch block is within the method that calls
parseInt. Alternatively, the try-catch combination can be written within the same
method as the throw statement, or within any of the methods in the calling chain that
led to calling parseInt. Thus the designer can choose an appropriate place in the
software structure at which to carry out exception handling. The position in which the
exception handler is written helps to determine both the action to be taken and what
happens after it has dealt with the situation.
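The choice of where to place the handler can be illustrated with a minimal sketch. It reuses the parseInt method from the text; the class CallChainDemo and the methods readQuantity and handleInput are hypothetical names introduced only for this illustration. The intermediate method has no try-catch, so the exception passes through it to a handler further up the calling chain.

```java
// Sketch only: CallChainDemo, readQuantity and handleInput are hypothetical names.
class CallChainDemo {

    // The same logic as the parseInt method shown in the text.
    static int parseInt(String string) throws NumberFormatException {
        int number = 0;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (c < '0' || c > '9') throw new NumberFormatException();
            number = number * 10 + (c - '0');
        }
        return number;
    }

    // Intermediate method: there is no try-catch here, so an exception thrown
    // by parseInt passes straight through to the caller of readQuantity.
    static int readQuantity(String text) {
        return parseInt(text.trim());
    }

    // The handler is placed at this level, on the assumption that this is
    // where a sensible fallback value is known.
    static int handleInput(String text, int fallback) {
        try {
            return readQuantity(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(handleInput("42", -1));   // prints 42
        System.out.println(handleInput("4x2", -1));  // prints -1
    }
}
```

Moving the try-catch into readQuantity would be equally legal; the difference lies in which method decides what to do about the illegal character.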
SELF-TEST QUESTION


17.7 The method parseInt does not throw an exception if the string is
of zero length. Amend it so that it throws the same exception in this
situation.
What happens after an exception has been handled? In the above example, the catch
block ends with a return statement, which exits from the current method,
actionPerformed, and returns control to its caller. This is the appropriate action in
this case – the program is able to recover and continue in a useful way. In general the
options are either to recover from the exception and continue or to allow the program
to degrade gracefully. The Java language mechanism supports various actions:
■ handle the exception. Control flow then either continues on down the program or
the method can be exited using a return statement.
■ ignore the exception. This is highly dangerous and always leads to tears, probably
after the software has been put into use.
■ throw another exception. This passes the buck to another exception handler further
up the call chain, which the designer considers to be a more appropriate place to
handle the exception.
BELL_C17.QXD 1/30/05 4:24 PM Page 248
SELF-TEST QUESTION
17.8 What happens if the return statement is omitted in the above example
of the exception handler?
In the above example, the application program itself detected the exception.
Sometimes, however, it is the operating system or the hardware that detects an
exception. An example is an attempt to divide by zero, which would typically be detected
by the hardware. The hardware would alert the run-time system or operating system,
which in turn would enter any exception handler associated with this exception.
The mechanism described above is the exception handling facility provided in Java.
Similar mechanisms are provided in Ada and C++.
In old software systems the simplest solution to handling exceptions was to resort to
the use of a goto statement to transfer control out of the immediate locality and into
a piece of coding designed to handle the situation. The use of a goto was particularly
appealing when the unusual situation occurred deep within a set of method calls. The
throw statement has been criticized as being a goto statement in disguise. The
response is that throw is indeed a “structured goto”, but that its use is restricted to
dealing with errors and therefore it cannot be used in an undisciplined way.
In summary, exception handlers allow software to cope with unusual, but anticipated,
events. The software can take appropriate remedial action and continue with its tasks.
Exception handlers therefore provide a mechanism for forward error recovery. In Java,
the mechanism consists of three ingredients:
1. a try block, in which the program attempts to behave normally
2. the program throws an exception
3. a catch block handles the exceptional situation.
17.6 Recovery blocks
Recovery blocks are a way of structuring backward error recovery to cope with
unanticipated faults. In backward error recovery, periodic dumps of the state of the
system are made at recovery points. When a fault is detected, the system is restored to
its state at the most recent recovery point. (The assumption is that this is a correct
state of the system.) The system now continues on from the recovery point, using some
alternative course of action so as to avoid the original problem.
An analogy: if you trip on a banana skin and spill your coffee, you can make a fresh
cup (restore the state of the system) and carry on (carefully avoiding the banana skin).
As shown in Figure 17.3, backward error recovery needs:
1. the primary software component that is normally expected to work
2. a check that it has worked correctly
3. an alternative piece of software that can be used in the event of the failure of the
primary module.
We also need, of course, a mechanism for taking dumps of the system state and for
restoring the system state. The recovery block notation embodies all of these features.
Taking as an example a program that uses a method to sort some information, a fault
tolerant fragment of program looks like this:
ensure dataStillValid
by
    superSort
else by
    quickSort
else by
    slowButSureSort
else error
Here superSort is the primary component. When it has tried to sort the information,
the method dataStillValid tests to see whether a failure occurred. If there was a
fault, the state of the program is restored to what it was before the sort method was
executed. The alternative method quickSort is then executed. Should this also fail,
the third alternative, slowButSureSort, is executed. If this fails, there is no other
alternative available, and the whole component has failed. This does not necessarily
mean that the whole program will fail, as there may be other recovery blocks programmed
by the user of this sort module.
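Since Java has no recovery block construct, the scheme can only be imitated by hand. The following is a minimal sketch under stated assumptions: the class RecoveryBlock and the methods ensure, inSequence and sortReliably are hypothetical names, the state is a single int array saved by copying, and superSort contains a deliberate bug so that the fallback can be seen at work. Only two alternatives are shown to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch only: the class and helper names are hypothetical, not a standard API.
class RecoveryBlock {

    // Imitates "ensure test by a1 else by a2 ... else error". Returns true as
    // soon as one alternative passes the acceptance test, false if all fail.
    static boolean ensure(Predicate<int[]> acceptanceTest,
                          List<Consumer<int[]>> alternatives,
                          int[] data) {
        int[] recoveryPoint = data.clone();              // dump of the state
        for (Consumer<int[]> alternative : alternatives) {
            try {
                alternative.accept(data);
                if (acceptanceTest.test(data)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // A run-time check failed: treated like a failed acceptance test.
            }
            // Restore the state to what it was at the recovery point.
            System.arraycopy(recoveryPoint, 0, data, 0, data.length);
        }
        return false;                                    // "else error"
    }

    // Acceptance test standing in for dataStillValid: is the data in sequence?
    static boolean inSequence(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Primary component with a deliberate bug: the loop runs one step too far,
    // so it changes the data and then fails with an out-of-range subscript.
    static void superSort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
        }
    }

    // Simple insertion sort used as the last alternative.
    static void slowButSureSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int item = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > item) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;
        }
    }

    static boolean sortReliably(int[] data) {
        List<Consumer<int[]>> alternatives =
                Arrays.asList(RecoveryBlock::superSort, RecoveryBlock::slowButSureSort);
        return ensure(RecoveryBlock::inSequence, alternatives, data);
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 2};
        // prints: true [1, 2, 3]
        System.out.println(sortReliably(data) + " " + Arrays.toString(data));
    }
}
```

Copying the whole array before the first attempt is the simplest possible recovery point; it is also the part of the scheme whose cost grows with the size of the state.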

What kinds of fault is this scheme designed to cope with? The recovery block mechanism
is designed primarily to deal with unanticipated faults that arise from bugs (design
faults) in the software. When a piece of software is complete, it is to be expected that
there will be residual faults in it, but what cannot be anticipated is the whereabouts
of the bugs.
[Figure 17.3 Components in a recovery block scheme (labels: User, Normal module,
Checking module, Alternative module)]
Recovery blocks will, however, also cope with hardware faults. For example, suppose
that a fault develops in the region of main memory containing the primary sort method.
The recovery block mechanism can then recover by switching over to an alternative
method. There are stories that the developers of the recovery block mechanism at
Newcastle University, England, used to invite visitors to remove memory boards from
a live computer and observe that the computer continued apparently unaffected.
We now examine some of the other aspects of recovery blocks.
The acceptance test
You might think that acceptance tests would be cumbersome methods, incurring high
overheads, but this need not be so. Consider for example a method to calculate a square
root. A method to check the outcome, simply by multiplying the answer by itself, is short
and fast. Often, however, an acceptance test cannot be completely foolproof – because
of the performance overhead. Take the example of the sort method. The acceptance test
could check that the information had been sorted, that is, is in sequence. However, this
does not guarantee that items have not been lost or created. An acceptance test,
therefore, does not normally attempt to ensure the correctness of the software, but
instead carries out a check to see whether the results are acceptably good.
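The two acceptance tests mentioned here can be sketched as follows. The class and method names are hypothetical, and the tolerance used in the square root check is an assumed value chosen for illustration. The third method shows the more thorough (and more expensive) check that no items have been lost or created; it relies on the library sort being trusted.

```java
import java.util.Arrays;

// Sketch only: names and the tolerance value are assumptions for illustration.
class AcceptanceTests {

    // Square root: multiply the answer by itself and compare with the input.
    static boolean sqrtAcceptable(double x, double root) {
        return root >= 0 && Math.abs(root * root - x) <= 1e-9 * Math.max(1.0, x);
    }

    // Cheap test for a sort: the items are in sequence.
    static boolean inSequence(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Costlier test: the result holds exactly the same items as the input.
    // Both arrays are copied and sorted with the library sort, then compared.
    static boolean sameItems(int[] before, int[] after) {
        int[] expected = before.clone();
        int[] actual = after.clone();
        Arrays.sort(expected);
        Arrays.sort(actual);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 2};
        int[] faultyResult = {1, 1, 1};   // in sequence, but items were lost
        System.out.println(inSequence(faultyResult));        // prints true
        System.out.println(sameItems(input, faultyResult));  // prints false
    }
}
```

The faulty result passes the cheap test and fails the costly one, which is the trade-off between thoroughness and overhead described above.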
Note that if a fault such as division by zero, a protection violation or an array
subscript out of range occurs while one of the sort methods is being executed, then this
also constitutes the result of a check on the behavior of the software. (These are checks
carried out by the hardware or the run-time system.) Thus either software acceptance
tests or hardware checks can trigger fault tolerance.
The alternatives
The software components provided as backups must accomplish the same end as the
primary module. But they should achieve this by means of a different algorithm so that
the same problem doesn’t arise. Ideally the alternatives should be developed by
different programmers, so that they are not unwittingly sharing assumptions. The
alternatives should also be less complex than the primary, so that they will be less
likely to fail. For this reason they will probably be poorer in their performance (speed).
Another approach is to create alternatives that provide an increasingly degraded service.
This allows the system to exhibit what is termed graceful degradation. As an example of
graceful degradation, consider a steel rolling mill in which a computer controls a machine
that chops off the required lengths of steel. Normally the computer employs a
sophisticated algorithm to make optimum use of the steel, while satisfying customers’
orders. Should this algorithm fail, a simpler algorithm can be used that processes the
orders strictly sequentially. This means that the system will keep going, albeit less
efficiently.
Implementation
The language constructs of the recovery block mechanism hide the preservation of
variables. The programmer does not need to explicitly declare which variables should be
stored and when. The system must save values before any of the alternatives is executed,
and restore them should any of the alternatives fail. Although this may seem a formidable
task, only the values of variables that are changed need to be preserved, and the
notation highlights which ones these are. Variables local to the alternatives need not be
stored, nor need parameters passed by value. Only global variables that are changed need
to be preserved. Nonetheless, storing data in this manner probably incurs too high an
overhead if it is carried out solely by software. Studies indicate that, suitably
implemented with hardware assistance, the speed overhead might be no more than about 15%.
No programming language has yet incorporated the recovery block notation. Even
so, the idea provides a framework which can be used, in conjunction with any
programming language, to structure fault tolerant software.
17.7 n-version programming
This form of programming means developing n versions of the same software component.
For example, suppose a fly-by-wire airplane has a software component that decides how
much the rudder should be moved in response to information about speed, pitch, throttle
setting, etc. Three or more versions of the component are implemented and run
concurrently. The outputs are compared by a voting module; the majority vote wins and is
used to control the rudder (see Figure 17.4).
[Figure 17.4 Triple modular redundancy (labels: Input data; Version 1, Version 2,
Version 3; Voting module; Output data)]
It is important that the different versions of the component are developed by different
teams, using different methods and (preferably) at different locations, so that a
minimum of assumptions are shared by the developers. By this means, the modules will use
different algorithms, have different mistakes and produce different outputs (if they do)
under different circumstances. Thus the chances are that when one of the components
fails and produces an incorrect result, the others will perform correctly and the faulty
component will be outvoted by the majority.
Clearly the success of an n-programming scheme depends on the degree of independence
of the different components. If the majority embody a similar design fault, they will
fail together and the wrong decision will be the outcome. Independence is a bold
assumption, and some studies have shown a tendency for different developers to commit
the same mistakes, probably because of shared misunderstandings of the (same)
specification.
The expense of n-programming is in the effort to develop n versions, plus the processing
overhead of running the multiple versions. If hardware reliability is also an issue,
as in fly-by-wire airplanes, each version runs on a separate (but identical) processor.
The voting module is small and simple, consuming minimal developer and processor time.
An even number of versions is not appropriate, because the vote could then be tied.
The main difference between the recovery block and the n-version schemes is that
in the former the different versions are executed sequentially (if need be).
Is n-programming forward error recovery or is it backward error recovery? The
answer is that, once an error is revealed, the correct behavior is immediately available
and the system can continue forwards. So it is forward error recovery.
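A voting module for three versions can be sketched in a few lines. This is an illustration under stated assumptions: the class MajorityVoter and its vote method are hypothetical names, each version is modeled as a function from an int to an int, and the versions are called one after another rather than concurrently on separate processors.

```java
import java.util.function.IntUnaryOperator;

// Sketch only: MajorityVoter is a hypothetical name; the versions here run
// sequentially, which is a simplification of the concurrent scheme.
class MajorityVoter {

    // Returns the output that at least two of the three versions agree on.
    static int vote(int input,
                    IntUnaryOperator version1,
                    IntUnaryOperator version2,
                    IntUnaryOperator version3) {
        int a = version1.applyAsInt(input);
        int b = version2.applyAsInt(input);
        int c = version3.applyAsInt(input);
        if (a == b || a == c) return a;
        if (b == c) return b;
        // All three disagree: there is no majority to act on.
        throw new IllegalStateException("no majority among the versions");
    }

    public static void main(String[] args) {
        // Two versions halve the input in different ways; the third is faulty.
        int result = vote(10, x -> x / 2, x -> x >> 1, x -> x / 2 + 1);
        System.out.println(result);  // prints 5
    }
}
```

The faulty third version is simply outvoted. If two versions shared the same fault, the same code would return the wrong answer with no warning, which is why independence matters so much.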
17.8 Assertions
Assertions are statements written into software that say what should be true of the data.
Assertions have been used since the early days of programming as an aid to verifying the
correctness of software. An assertion states what should always be true at a particular
point in a program. Assertions are usually placed:
■ at the entry to a method – called a precondition, it states what the relationship
between the parameters should be
■ at the end of a method – called a postcondition, it states what the relationship
between the parameters should be
■ within a loop – called a loop invariant, it states what is always true, before and after
each loop iteration, however many iterations the loop has performed
■ at the head of a class – called a class invariant, it states what is always true before
and after a call on any of the class’s public methods. The assertion states a
relationship between the variables of an instance of the class.
An example should help to show how assertions can be used. Take the example of a class
that implements a data structure called a stack. Items can be placed in the data
structure by calling the public method push and removed by calling pop. Let us assume
that the stack has a fixed length, described by a variable called capacity. Suppose the
class uses a variable called count to record how many items are currently in the stack.
Then we can make the following assertions at the level of the class. The class invariant
is:
assert count >= 0;
assert capacity >= count;
These are statements which must always be true for the entire class, before or after
any use is made of the class. We can also make assertions for the individual methods.
Thus for method push, we can say as a postcondition:
assert newCount == oldCount + 1;
For the method push, we can also state the following precondition:
assert oldCount < capacity;
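The class invariant, precondition and postcondition given for push in this section can be placed in a Java class as sketched below. The class name BoundedStack, the array representation and the size method are assumptions made for the illustration; pop is left out. In Java the assert statement is only checked when assertions are enabled, for example by running with the -ea option, and a failed assert raises an AssertionError.

```java
// Sketch only: BoundedStack and its representation are hypothetical.
class BoundedStack {
    private final int[] items;
    private final int capacity;
    private int count = 0;

    BoundedStack(int capacity) {
        this.capacity = capacity;
        this.items = new int[capacity];
    }

    // The class invariant from the text, gathered into one method.
    private boolean classInvariant() {
        return count >= 0 && capacity >= count;
    }

    void push(int item) {
        assert classInvariant();
        assert count < capacity;          // precondition: there is room
        int oldCount = count;
        items[count] = item;
        count++;
        assert count == oldCount + 1;     // postcondition: one more item
        assert classInvariant();
    }

    int size() {
        return count;
    }

    public static void main(String[] args) {
        BoundedStack stack = new BoundedStack(2);
        stack.push(7);
        stack.push(9);
        System.out.println(stack.size());  // prints 2
    }
}
```

With assertions enabled, a third push on this stack fails at the precondition. With them disabled, the same call fails later, at the array access, with an ArrayIndexOutOfBoundsException.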
Note that truth of assertions does not guarantee that the software is working
correctly. However, if the value of an assertion is false, then there certainly is a
fault in the software. Note also that violation of a precondition means that there is a
fault in the user of the method; a violation of a postcondition means a fault in the
method itself.
There are two main ways to make use of assertions. One way is to write assertions as
comments in a program, to assist in manual verification. On the other hand, as indicated
by the notation used above, some programming languages (including Java) allow
assertions to be written as part of the language – and their correctness is checked at
run-time. If an assertion is found to be false, an exception is thrown.
There is something of an argument about whether assertions should be used only
during development, or whether they should also be enabled when the software is put
into productive use.
SELF-TEST QUESTION
17.9 Write pre- and post-conditions for method pop.
17.9 Discussion
Fault tolerance in hardware has long been recognized – and accommodated. Electronic
engineers have frequently incorporated redundancy, such as triple modular redundancy,
within the design of circuits to provide for hardware failure. Fault tolerance in software
has become more widely addressed in the design of computer systems as it has become
recognized that it is almost impossible to produce correct software. Exception handling
is now supported by all the mainstream software engineering languages – Ada, C++,
Visual Basic, C# and Java. This means that designers can provide for failure in an
organized manner, rather than in an ad hoc fashion. Particularly in safety-critical
systems, either recovery blocks or n-programming is used to cope with design faults and
enhance reliability.
Fault tolerance does, of course, cost money. It requires extra design and programming
effort, extra memory and extra processing time to check for and handle exceptions.
Some applications need greater attention to fault tolerance than others, and
safety-critical systems are more likely to merit the extra attention of fault tolerance.
However, even software packages that have no safety requirements often need fault
tolerance of some kind. For example, we now expect a word processor to perform
periodic and automatic saving of the current document, so that recovery can be
performed in the event of power failure or software crash. End users are increasingly
demanding that the software cleans up properly after failures, rather than leaving them
with a mess that they cannot salvage. Thus it is likely that ever-increasing attention
will be paid to improving the fault tolerance of software.
Summary
Faults in computer systems are caused by hardware failure, software bugs and user
error. Software fault tolerance is concerned with:
■ detecting faults
■ assessing damage
■ repairing the damage
■ continuing.
Of these, faults can be detected by both hardware and software.
One hardware mechanism for fault detection is protection mechanisms, which have
two roles:
1. they limit the spread of damage, thus easing the job of fault tolerance
2. they help find the cause of faults.
Faults can be classified in two categories – anticipated and unanticipated.
Recovery mechanisms are of two types:
■ backward – the system returns to an earlier, safe state
■ forward – the system continues onwards from the error.
Anticipated faults can be dealt with by means of forward error recovery. Exception
handlers are a convenient programming language facility for coping with these faults.
Unanticipated faults – such as software design faults – can be handled using either of:
■ recovery blocks, a backward error recovery mechanism
■ n-programming, a forward error recovery mechanism.
Assertions are a way of stating assumptions that should be valid when software
executes. Automatic checking of assertions can assist debugging.
Exercises
17.1 For each of the computer systems detailed in Appendix A, list the faults that can
arise, categorizing them into user errors, hardware faults and software faults. Decide
whether each of the faults is anticipated or unanticipated. Suggest how the faults
could be dealt with.
17.2 Explain the following terms, giving an example of each to illustrate your answer:
fault tolerance, software fault tolerance, reliability, robustness, graceful degradation.
17.3 Consider a programming language with which you are familiar. In what ways can you
deliberately (or inadvertently) write a program that will:
1. crash
2. access main memory in an undisciplined way
3. access a file protected from you.
What damage is caused by these actions? How much damage is possible?
Assuming you didn’t already know it, is it easy to diagnose the cause of the problem?
Contemplate that if it is possible deliberately to penetrate a system, then it is
certainly possible to do it by accident, thus jeopardizing the reliability and security
of the system.
17.4 “Compile-time checking is better than run-time checking.” Discuss.
17.5 Compare and contrast exception handling with assertions.
17.6 The Java system throws an IndexOutOfBoundsException exception if a program
attempts to access elements of an array that lie outside the valid range of
subscripts. Write a method that calculates the total weekly rainfall, given an array
of floating point numbers (values of the rainfall for each of seven days of the
week) as its single parameter. The method should throw an exception of the same
type if an array is too short. Write code to catch the exception.

17.7 Outline the structure of recovery block software to cope with the following situation.
A fly-by-wire aircraft is controlled by software. A normal algorithm calculates the
optimal speed and the appropriate control surface and engine settings. A safety module
checks that the calculated values are within safe limits. If they are not, it invokes an
alternative module that calculates some safe values for the settings. If, again, this
module fails to suggest safe values, the pilots are alerted and the aircraft reverts to
manual control.
17.8 Compare and contrast the recovery block scheme with the n-programming scheme
for fault tolerance. Include in your review an assessment of the development times
and performance overheads associated with each scheme.
17.9 Searching a table for a desired object is a simple example of a situation in which it
can be tempting to use a goto to escape from an unusual situation. Write a piece
of program to search a table three ways:
1. using goto
2. using exceptions
3. avoiding both of these.
Compare and contrast the three solutions.
17.10 Consider a program to make a copy of a disk file. Devise a structure for the program
that uses exception handlers so that it copes with the following error situations:
1. the file doesn’t exist (there is no file with the stated name)
2. there is a hardware fault when reading information from the old file
3. there is a hardware fault when writing to the new file.
Include in your considerations actions that the filing system (or operating system)
needs to take.
17.11 Explain the difference between using a goto statement and using a throw
statement. Discuss their relative advantages for dealing with exceptions.
17.12 “There is no such thing as an exceptional situation. The software should explicitly
deal with all possible situations.” Discuss.

17.13 Some word processors provide an undo command. Suppose we interpret a user
wanting to undo what they have done as a fault, what form of error recovery does
the software provide and how is it implemented?
17.14 Examine the architecture and operating system of a computer for which you have
documentation. Investigate what facilities are provided for detecting software and
hardware faults.
17.15 Compare and contrast approaches to fault tolerance in software with approaches for
hardware.
Answers to self-test questions
17.1 1. unanticipated
2. unanticipated
3. unanticipated
4. anticipated
5. anticipated
17.2 stack overflow
use of a null pointer
17.3 The module could check that all the items in the new array are in order.
(This is not foolproof because the new array could contain different data
to the old.)
17.4 Pro: prevent the spread of damage, assist in diagnosing the cause.
Cons: expensive hardware and software, reduction in performance (speed).
