Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design

654 5 Safety and Risk in Engineering Design
The actual degree of safety—incidents: This is evaluated according to the contri-
bution of the actual physical condition of the equipment to its safety, the actual
downtime frequency, as well as the actual reportable incident frequency, arising
from the functional failure history of the equipment resulting in an asset loss
consequence of failure. Besides safety operational and physical consequences of failure,
the other consequences (economic, environmental, systems and maintenance) are
typically measured as the cost of losses plus the cost of repair to the failed item
and to any consequential damage (although, in reality, all safety consequences are
eventually also measured as a cost risk). These cost risks of failure are also defined
as the result of multiplying the consequence of failure (i.e. the cost of losses plus the
cost of repair) by the probability of its occurrence.
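The definition of cost risk given above can be written out as a short calculation. The following Python lines are a minimal sketch only; the function name and the monetary figures are hypothetical and are not taken from the handbook.

```python
def cost_risk(cost_of_losses, cost_of_repair, probability):
    """Cost risk = consequence of failure x probability of its occurrence."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Consequence of failure: cost of losses plus cost of repair
    consequence = cost_of_losses + cost_of_repair
    return consequence * probability


# Hypothetical figures, for illustration only
risk = cost_risk(cost_of_losses=40_000.0, cost_of_repair=10_000.0, probability=0.25)
print(risk)  # 12500.0
```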
Reliability analysis in engineering design tends, however, to simplify these risks
to the point of impracticality where, for example, consideration is given only to sin-
gle modes of failure, or only to random failure occurrences, or to maintenance that
results in complete renewal and ‘as new’ conditions. In reality, the situation is much
more complicated with interacting multiple failure modes, variable failure rates, as
well as maintenance-induced failures that influence the rates of deterioration, and
subsequent failure (Woodhouse 1999).
It is somewhat unrealistic to assume a specific failure rate of equipment within
a complex integration of systems with complex failure processes. At best, the intrin-
sic failure characteristics of components of equipment are determined from quan-
titative probability distributions of failure data obtained in a somewhat clinical en-
vironment under certain operating conditions. The true failure process, however, is
subject to many other factors, including premature or delayed preventive mainte-
nance activities conducted during shutdowns of process plant.
It is generally accepted that shutdowns affect the failure characteristics of equip-
ment as a whole, although it is debatable whether the end result is positive or nega-
tive from a residual life point of view, where residual life is defined as the remaining
life expectancy of a component, given its survival to a specific age. This is a concept
of obvious interest, and one of the most important notions in process reliability and
equipment aging studies for safety criticality analysis.
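Residual life, as defined above, can be illustrated numerically as the area under the survival function beyond a given age, divided by the probability of surviving to that age. The sketch below assumes a Weibull life model with hypothetical parameters (scale 10, shape 2); neither the model nor the figures come from the handbook, and the integration is a simple trapezoidal approximation.

```python
import math


def weibull_survival(t, scale, shape):
    """Probability of surviving beyond age t under an assumed Weibull model."""
    return math.exp(-((t / scale) ** shape))


def mean_residual_life(age, survival, horizon, steps=20_000):
    """Expected remaining life given survival to `age`.

    Approximates the integral of S(u) from `age` to `horizon`, divided by
    S(age), using the trapezoidal rule. `horizon` must be large enough
    that S(horizon) is negligible.
    """
    s_age = survival(age)
    if s_age <= 0.0:
        raise ValueError("no survivors at this age")
    width = (horizon - age) / steps
    total = 0.5 * (s_age + survival(horizon))
    for i in range(1, steps):
        total += survival(age + i * width)
    return total * width / s_age


def wear_out(t):
    # Hypothetical wear-out behaviour: shape > 1 gives an increasing hazard
    return weibull_survival(t, scale=10.0, shape=2.0)


mrl_new = mean_residual_life(0.0, wear_out, horizon=100.0)
mrl_aged = mean_residual_life(8.0, wear_out, horizon=100.0)
print(round(mrl_new, 2), round(mrl_aged, 2))
```

Under this assumed wear-out model the residual life shrinks as the component ages, which is the behaviour that makes the timing of shutdown maintenance matter.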
Safety criticality analysis is thus always faced with combinations of interacting
failure modes and variable failure rates, where the cumulative effects are much more
important than estimates of specific probabilities of failure. Qualitative estimates of
how long equipment might last in certain engineering processes, based on operating
conditions and failure characteristics, are much more easily made than quantitative
estimates of the chances of failure of individual equipment. These cumulative effects
are represented in equipment survival curves where a best-fit curve is matched to
specific survival data, and a pattern of risks calculated that would be necessary for
these effects to be realised. In analysing survival data, there is often the need to
determine not only the survival time distribution but also the residual survival time
(or residual life) distribution. A typical equipment survival curve and hazard curve
are illustrated in Fig. 5.41a and 5.41b (Smith et al. 2000).
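A Kaplan–Meier survival curve of the kind referred to above can be estimated from failure records that include censored items (equipment still running when observation stopped). The following is a minimal sketch of the product-limit estimator in plain Python; the running times and failure flags are hypothetical and are not the data behind Fig. 5.41.

```python
def kaplan_meier(durations, failed):
    """Product-limit estimate of the survival function.

    durations: observed running times (e.g. months)
    failed:    True if the item failed at that time, False if it was
               still running when observation stopped (right-censored)
    Returns a list of (time, survival probability) at each failure time.
    """
    records = sorted(zip(durations, failed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        failures = 0
        leaving = 0
        # Collect every record that shares this observation time
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                failures += 1
            leaving += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored items both leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical running times in months for six items of rotating equipment
months = [6, 6, 10, 12, 15, 20]
failed = [True, False, True, True, False, True]
curve = kaplan_meier(months, failed)
for t, s in curve:
    print(t, round(s, 3))
```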
5.2 Theoretical Overview of Safety and Risk in Engineering Design 655
Fig. 5.41 a Kaplan–Meier survival curve for rotating equipment, b estimated hazard curve for
rotating equipment
Typical impact, risk exposure, lost performance, and direct cost patterns based on
shutdown maintenance intervals for rotating equipment, as well as risk-based maintenance
patterns based on shutdown maintenance intervals for rotating equipment,
are illustrated in Fig. 5.42a and 5.42b (APT Maintenance 1999).
b) Risk-Based Maintenance
Risk-based maintenance is fundamentally an evaluation of maintenance tasks, particularly
scheduled preventive maintenance activities in shutdown programs. It considers
the impact of bringing forward, or delaying, activities that are directed at
preventing cost risks to coincide with essential activities that address safety risks. If
the extent of these risks were known, and what they cost, the optimum amount of
risk to take, and planned costs to incur, could be calculated. Similarly, better decisions
could be made if the value of the benefits of improved performance, longer life
and greater reliability was known. These risks and benefits are, however, difficult to
quantify, and many of the factors are indeterminable. Cost/risk optimisation in this
context can thus be defined as the minimal total impact, and represents a trade-off
between the conflicting interests of the need to reduce costs at the same time as the
need to reduce the risks of failure. Both are measured in terms of cost, the former
being the planned downtime cost plus the cost of preventive maintenance in an attempt
to increase performance and reliability, and the latter being the cost of losses
due to forced shutdowns plus the cost of repair and consequential damage.
Fig. 5.42 a Risk exposure pattern for rotating equipment, b risk-based maintenance patterns for
rotating equipment
The total impact is the sum of the planned costs and failure costs. When this sum
is at a minimum, an optimal combination of the costs incurred and the failure risks
is reached, as illustrated in Fig. 5.43.
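The minimum of the total impact can be located by evaluating the sum of planned costs and failure costs over a range of candidate maintenance intervals. The sketch below is illustrative only: it assumes that the expected number of failures within an interval grows as a power law, (interval/scale)^shape, which is one common simplification and is not stated in the text. All names and figures are hypothetical.

```python
def total_impact_rate(interval, planned_cost, failure_cost, scale, shape):
    """Total impact per month for a given preventive maintenance interval.

    planned_cost: planned downtime cost plus preventive maintenance cost
                  incurred once per interval
    failure_cost: cost of losses plus repair and consequential damage
                  per failure
    scale, shape: parameters of the assumed power-law failure growth
    """
    planned = planned_cost / interval
    expected_failures = (interval / scale) ** shape
    failure = failure_cost * expected_failures / interval
    return planned + failure


def optimal_interval(candidates, **params):
    """Candidate interval with the lowest total impact rate."""
    return min(candidates, key=lambda t: total_impact_rate(t, **params))


# Hypothetical inputs: costs in $, intervals in months
params = dict(planned_cost=10_000.0, failure_cost=40_000.0, scale=24.0, shape=2.0)
best = optimal_interval(range(1, 37), **params)
print(best, round(total_impact_rate(best, **params), 2))  # 12 1666.67
```

Shorter intervals raise the planned cost term and longer intervals raise the failure cost term, so the sum has a single minimum under these assumptions.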
Fig. 5.43 Typical cost optimisation curve
Cost/risk trade-off decisions determine optimal preventive maintenance intervals
for plant shutdown strategies that consider component renewal or replacement criteria,
spares requirements planning, etc. Planned downtime costs plus the costs of
preventive maintenance are traded off against the risk consequences of premature or
deferred component renewals or replacements, measured as the cost of losses plus
the cost of repair. In each of these areas, cost/risk evaluation techniques are applied
to assist in the application of a safety-critical maintenance approach.
Component renewal/replacement criteria are directly determined by failure
modes and effects criticality analysis (FMECA), whereby appropriate maintenance
tasks are matched to failure modes. In applying FMECA, the criticality analysis
establishes a priority rating of components according to the consequences and measures
of their various failure modes, which helps to prioritise the preventive maintenance
activities for scheduled shutdowns. An example of an FMECA for process
criticality of a control valve, based on failure consequences (downtime) and failure
rate (1/MTBF), is given in Table 5.16.
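One way to see how downtime and failure rate combine is to compute, for each failure mode, the failure rate as 1/MTBF and the expected downtime per year. The sketch below uses the downtime and MTBF values listed for Table 5.16; ranking by expected annual downtime is an assumption made here for illustration, and the handbook does not state the rule used to assign its criticality ratings. The shortened cause labels are also introduced here.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    cause: str
    downtime_h: float    # D/T (h) per failure
    mtbf_months: float   # mean time between failures

    @property
    def failures_per_year(self):
        # Failure rate = 1/MTBF, converted from months to years
        return 12.0 / self.mtbf_months

    @property
    def downtime_h_per_year(self):
        return self.downtime_h * self.failures_per_year


modes = [
    FailureMode("Control valve", "solenoid, actuator or air receiver", 9, 12),
    FailureMode("Control valve", "no PLC output", 4, 6),
    FailureMode("Control valve", "valve disk damaged", 5, 6),
    FailureMode("Control valve", "valve stem cylinders seized", 5, 4),
    FailureMode("Instrument loop (press. 1)", "restricted sensing port", 0, 3),
]

# Assumed ranking rule: highest expected annual downtime first
ranked = sorted(modes, key=lambda m: m.downtime_h_per_year, reverse=True)
for m in ranked:
    print(f"{m.cause}: {m.downtime_h_per_year:.1f} h/year")
```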
Reliability, availability, maintainability and safety (RAMS) studies establish the
most effective combination of the different types of maintenance (i.e. a maintenance
strategy) for operational systems and equipment. The deliverable results are opera-
tions and maintenance procedures and work instructions in which the different types
of maintenance are effectively combined for specific equipment.
Failure modes and effects criticality analysis (FMECA), as given in Table 5.16,
is one of the most commonly used techniques for prioritising failures in equipment.
The analysis at systems level involves identifying potential equipment failure modes
and assessing the consequences of these for the system’s performance.
Table 5.17 shows the designation of maintenance activities, the appropriate maintenance
trade, and the recommended maintenance frequency for each failure mode,
based on MTBF. It is evident that some activities need to be delayed to coincide
with others.
Different types and levels of maintenance effort are applied, depending upon the
process or functional criticality (Woodhouse 1999):
• Quantitative risk and performance analysis (such as RAM and FMECA) is war-
ranted for about 5–10% of the most critical failure modes. This is where cost/risk
optimisation is applicable for significant costs or risks that are sensitive to high-
impact strategies.
Table 5.16 Typical FMECA for process criticality
| Component | Failure description | Failure mode | Failure consequences | Failure causes | D/T (h) (plus damage) | MTTR (h) (repair time) and damage | MTBF (months) | Process criticality rating |
|---|---|---|---|---|---|---|---|---|
| Control valve | Fails to open | TLF | Production | Solenoid valve fails, failed cylinder actuator or air receiver failure | 9 | 8 | 12 | Medium critical |
| Control valve | Fails to open | TLF | Production | No PLC output due to modules electronic fault or cabling | 4 | 2 | 6 | Medium critical |
| Control valve | Fails to seal/close | TLF | Production | Valve disk damaged due to corrosion wear (same causes as ‘fails to open’) | 5 | 4 | 6 | Medium critical |
| Control valve | Fails to seal/close | TLF | Production | Valve stem cylinders seized due to chemical deposition or corrosion | 5 | 4 | 4 | Medium critical |
| Instrument loop (press. 1) | Fails to provide accurate pressure indication | TLF | Maint. | Restricted sensing port due to blockage of chemical or physical accumulation | 0 | 1 | 3 | Low critical |
| Instrument loop (press. 2) | Fails to detect low pressure condition | TLF | Maint. | Low pressure switch fails due to corrosion or mechanical damage | 0 | 2 | 3 | Low critical |
| Instrument loop (press. 2) | Fails to detect low pressure condition | TLF | Maint. | Pressure switch relay or cabling failure | 0 | 8 | 4 | Low critical |
| Instrument loop (press. 2) | Fails to provide output signal for alarm | TLF | Maint. | PLC alarm function or indicator fails | 0 | 8 | 4 | Low critical |

Table 5.17 FMECA with preventive maintenance activities
| Component | Failure description | Failure causes | D/T (h) (plus damage) | MTTR (h) (repair time) and damage | MTBF (months) | Maintenance activity | Maintenance trade | Maintenance frequency |
|---|---|---|---|---|---|---|---|---|
| Control valve | Fails to open | Solenoid valve fails, failed cylinder actuator or air receiver failure | 9 | 8 | 12 | Service control valve. Replace components and test PLC interface | Instr. tech. | 12 monthly |
| Control valve | Fails to open | No PLC output due to modules electronic fault or cabling | 4 | 2 | 6 | Covered by control valve service as above | Instr. tech. | 12 monthly |
| Control valve | Fails to seal/close | Valve disk damaged due to corrosion wear (same causes as ‘fails to open’) | 5 | 4 | 6 | Remove control valve and check valve stem, seat and disk or diaphragm for deterioration or corrosion and replace with overhauled valve if required | Fitter | 6 monthly |
| Control valve | Fails to seal/close | Valve stem cylinders seized due to chemical deposition or corrosion | 5 | 4 | 4 | Covered by control valve condition assessment and replace components | Instr. tech. | 6 monthly |
| Instrument loop (press. 1) | Fails to provide accurate pressure indication | Restricted sensing port due to blockage of chemical or physical accumulation | 0 | 1 | 3 | Remove pressure gauge and check for blocked sensing lines and gauge deterioration. Replace with new gauge if required | Instr. tech. | 3 monthly |
| Instrument loop (press. 2) | Fails to detect low pressure condition | Low pressure switch fails due to corrosion or mechanical damage | 0 | 2 | 3 | Verify correct operation of pressure switch and wiring. Test alarm’s operation | Instr. tech. | 3 monthly |
| Instrument loop (press. 2) | Fails to detect low pressure condition | Pressure switch relay or cabling failure | 0 | 8 | 4 | Covered by switch operation verification | Instr. tech. | 3 monthly |
| Instrument loop (press. 2) | Fails to provide output signal for alarm | PLC alarm function or indicator fails | 0 | 8 | 4 | Covered by switch operation verification | Instr. tech. | 3 monthly |

• Rule-based analysis methods (such as RCM and RBI) are more appropriate for
about 40–60% of the critical failure modes, particularly if supplemented with
economic analysis of the resulting impact strategies. This is where cost/risk op-
timisation is applicable for the costs or risks for setting preventive maintenance
intervals.
• Review of existing maintenance (excluding simple FMEA studies) provides
a simple check at the lower levels of criticality to verify that there is a valid
reason for the maintenance activity, and that the cost is reasonable compared to
the consequences.
c) Safety Criticality Analysis and Risk-Based Maintenance
Safety criticality analysis was previously considered as the assessment of failure
risks. In this context, safety criticality analysis is applied to determine the essential
maintenance intervals, and the impact of premature or delayed preventive mainte-
nance activities where failure risks are considered to be safety critical. A safety/risk
scale is applied, based on a specific cost benchmark (usually computed as the cost
of output per time interval) related to the cost of losses and the likelihood of failure.
A safety criticality model to determine the optimal maintenance interval, and
the impact of premature or delayed preventive maintenance activities, considers the
following:
• A quantified description of the degradation process, using estimates wherever
data are not available, as well as identification of failure modes and related
causes.
• Cost calculations for material and maintenance labour costs for each failure
mode, including possible consequential damage.
• Cost/risk calculations for alternative preventive maintenance intervals based on
a specific cost benchmark related to the cost of losses and the likelihood of fail-
ure.
• Cost criticality rating of failure modes, and sensitivity testing to the limits of the
likelihood of failure under uncertainty of unavailable or censored data.
• Identification of key decision drivers (which assumptions have the greatest effect
upon the optimal decision), for review of the preventive maintenance program.
In many cases, there are several interacting failure modes, causes and effects, all
in the same evaluation.
The preventive maintenance program or, in the case of continuous processes, the
shutdown strategy thus becomes a compromise of scheduled times and costs. Some
activities will be performed ahead of their ideal timing, whilst others will be delayed
to share the downtime opportunity determined by safety-critical shuts.
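The compromise described above can be sketched as a simple bundling rule: each activity is moved to the shutdown opportunity nearest to its ideal interval, and the shift shows whether it is brought forward or delayed. The rule, the activity names and the six-month shutdown interval are assumptions made for illustration; a real program would weigh the cost and risk of each shift rather than only its size.

```python
def bundle_to_shutdowns(ideal_intervals, shutdown_interval):
    """Move each activity to the nearest multiple of the shutdown interval.

    Returns {activity: (scheduled interval, shift)} where a positive shift
    means the activity is delayed and a negative shift means it is
    performed ahead of its ideal timing. Ties go to the later shutdown.
    """
    schedule = {}
    for activity, ideal in ideal_intervals.items():
        # Nearest whole number of shutdown intervals, but never fewer than one
        multiple = max(1, int(ideal / shutdown_interval + 0.5))
        scheduled = multiple * shutdown_interval
        schedule[activity] = (scheduled, scheduled - ideal)
    return schedule


# Hypothetical ideal intervals in months
ideal = {
    "control valve service": 12,
    "valve stem check": 4,
    "pressure switch check": 3,
    "gauge inspection": 8,
}
plan = bundle_to_shutdowns(ideal, shutdown_interval=6)
for activity, (scheduled, shift) in plan.items():
    print(activity, scheduled, shift)
```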
The risks and performance impact of delayed activities, and the additional costs
of deliberate over-maintenance in others, both contribute to the costs for a partic-
ular shutdown program. The degree of advantage, on the other hand, is controlled
by the costs involved. The downtime impact (the cost of losses due to forced shut-
downs as a result of failure, plus the cost of repair to the failed item and to any
consequential damage) often dominates the direct cost advantage (planned shut-
down lost opportunity costs, use of facilities, materials and labour costs, etc.) of
shutting down and starting up again. Such a cost criticality analysis also reveals the
scope for de-bottlenecking improperly evaluated reliability constraints by eliminat-
ing frequent interim shutdowns and extending operational run lengths. The analysis
process is also able to calculate the net payback for such de-bottlenecking. The
grouping and re-grouping of activities as well as re-programming the preventive
maintenance program (i.e. combining activities in different bundles and moving the
bundles to shorter or longer intervals) are fundamentally a scheduling problem, requiring
the application of formalised risk analysis and decision criteria based on assessment
scales, and the use of computer automated computation. Table 5.18 shows
the application of cost criticality analysis to the FMECA for process criticality of
the control valve given in Table 5.17. It indicates the cost criticality rating of each
failure mode related to the cost of losses and the cost risk based on estimates of the
likelihood of failure. Table 5.19 shows a comparison between the process criticality
rating and the cost criticality rating of each failure mode of the control valve. In this
case, the ratings correspond closely with one another.
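The cost columns of Table 5.18 can be checked with a short calculation. In the sketch below, the total cost per failure is the production loss plus the material and labour cost. The hourly production loss of $7,650 is not stated in the text; it is inferred here from the table, where each production loss equals the downtime multiplied by that figure (for example $68,850 for 9 h). The risk values and cost criticality ratings are not reproduced, since their derivation is not given.

```python
# Assumption: $/h of lost production, inferred from Table 5.18 (68,850 / 9 h)
HOURLY_LOSS = 7_650

# (failure cause, downtime in h, material and labour $ per failure)
rows = [
    ("solenoid, actuator or air receiver", 9, 5_000),
    ("no PLC output", 4, 2_000),
    ("valve disk damaged", 5, 5_000),
    ("valve stem cylinders seized", 5, 5_000),
    ("restricted sensing port", 0, 500),
]


def cost_per_failure(downtime_h, repair_cost, hourly_loss=HOURLY_LOSS):
    """Return (production loss, total cost) per failure in $."""
    production_loss = downtime_h * hourly_loss
    return production_loss, production_loss + repair_cost


costs = {cause: cost_per_failure(dt, repair) for cause, dt, repair in rows}
for cause, (loss, total) in costs.items():
    print(f"{cause}: loss ${loss:,}, total ${total:,}")
```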
The maintenance frequencies of the preventive maintenance activities that were
typically based on the mean time between failures (MTBF) are, however, not related
to either the process criticality rating or the cost criticality rating. The maintenance
frequencies thus require review to determine the optimal maintenance intervals
whereby the impact of premature or delayed preventive maintenance activities
is considered.
This example of a relatively important item of equipment, such as a process control
valve, is typical of many such items of equipment in process plant where RAM, FMECA
or RCM analysis do not provide sufficient information for decisive decision-making,
as the equipment’s failure modes are not significantly high risk but rather medium
risk. Where the criticality ratings are not significant (i.e. there is no evidence of high
criticality), as in this case of the control valve, maintenance optimisation becomes difficult,
necessitating a review of the risk analysis and decision criteria according to qualitative
estimates.
d) Risk Analysis and Decision Criteria
In typical process plant shutdown programs, decisions concerning the extent and
timing of component renewal/replacement activities are generally determined by
the dominant failure modes that, in effect, relate to less than a third of the program’s
total preventive maintenance activities. Criticality ranking or prioritising of equipment
according to the consequences of failure modes is essential for a risk-based
maintenance approach, though comparative studies have shown that qualitative risk
ranking is, in many cases, just as effective in identifying the key shutdown drivers,
often at a fraction of the cost. Typically, these risks can be ranked by designating
Table 5.18 FMECA for cost criticality
| Component | Failure description | Failure mode | Failure causes | Defect. MATL & LAB ($)/failure (incl. damage) | Econ. $/failure (prod. loss) | Total $/failure (prod. and repair) | Risk | Cost criticality rating |
|---|---|---|---|---|---|---|---|---|
| Control valve | Fails to open | TLF | Solenoid valve fails, failed cylinder actuator or air receiver failure | $5,000 | $68,850 | $73,850 | 6.00 | Medium cost |
| Control valve | Fails to open | TLF | No PLC output due to modules electronic fault or cabling | $2,000 | $30,600 | $32,600 | 6.00 | Medium cost |
| Control valve | Fails to seal/close | TLF | Valve disk damaged due to corrosion wear (same causes as ‘fails to open’) | $5,000 | $38,250 | $43,250 | 6.00 | Medium cost |
| Control valve | Fails to seal/close | TLF | Valve stem cylinders seized due to chemical deposition or corrosion | $5,000 | $38,250 | $43,250 | 6.00 | Medium cost |
| Instrument loop (press. 1) | Fails to provide accurate pressure indication | TLF | Restricted sensing port due to blockage of chemical or physical accumulation | $500 | $0 | $500 | 2.00 | Low cost |