ROOT CAUSE FAILURE ANALYSIS

PLANT ENGINEERING MAINTENANCE SERIES
Vibration Fundamentals, R. Keith Mobley
Root Cause Failure Analysis, R. Keith Mobley
Maintenance Fundamentals, R. Keith Mobley

R. Keith Mobley
Newnes
Boston Oxford Auckland Johannesburg Melbourne New Delhi


Newnes is an imprint of Butterworth-Heinemann.
Copyright © 1999 by Butterworth-Heinemann
A member of the Reed Elsevier group
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
Recognizing the importance of preserving what has been written, Butterworth-Heinemann prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data
Mobley, R. Keith, 1943-
Root cause failure analysis / by R. Keith Mobley.
p. cm. - (Plant engineering maintenance series)
Includes index.
ISBN 0-7506-7158-0 (alk. paper)
1. Plant maintenance. 2. System failures (Engineering) I. Title. II. Series.
TS192.M625 1999
658.2'02-dc21 98-32097
CIP

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

The publisher offers special discounts on bulk orders of this book.
For information, please contact:
Manager of Special Sales
Butterworth-Heinemann
225 Wildwood Avenue
Woburn, MA 01801-2041
Tel: 781-904-2500
Fax: 781-904-2620

For information on all Newnes publications available, contact our World Wide Web home page at:
10 9 8 7 6 5 4 3
Printed in the United States of America
CONTENTS

Part I Introduction to Root Cause Failure Analysis
Chapter 1 Introduction
Chapter 2 General Analysis Techniques
Chapter 3 Root Cause Failure Analysis Methodology
Chapter 4 Safety-Related Issues
Chapter 5 Regulatory Compliance Issues
Chapter 6 Process Performance

Part II Equipment Design Evaluation Guide
Chapter 7 Pumps
Chapter 8 Fans, Blowers, and Fluidizers
Chapter 9 Conveyors
Chapter 10 Compressors
Chapter 11 Mixers and Agitators
Chapter 12 Dust Collectors
Chapter 13 Process Rolls
Chapter 14 Gearboxes/Reducers
Chapter 15 Steam Traps
Chapter 16 Inverters
Chapter 17 Control Valves
Chapter 18 Seals and Packing

Part III Equipment Troubleshooting Guide
Chapter 19 Pumps
Chapter 20 Fans, Blowers, and Fluidizers
Chapter 21 Conveyors
Chapter 22 Compressors
Chapter 23 Mixers and Agitators
Chapter 24 Dust Collectors
Chapter 25 Process Rolls
Chapter 26 Gearboxes or Reducers
Chapter 27 Steam Traps
Chapter 28 Inverters
Chapter 29 Control Valves
Chapter 30 Seals and Packing
Chapter 31 Others

List of Abbreviations
Glossary
References
Index

Part I
INTRODUCTION TO ROOT CAUSE FAILURE ANALYSIS

1
INTRODUCTION
Reliability engineering and predictive maintenance have two major objectives: preventing catastrophic failures of critical plant production systems and avoiding deviations from acceptable performance levels that result in personal injury, environmental impact, capacity loss, or poor product quality. Unfortunately, these events will occur no matter how effective the reliability program. Therefore, a viable program also must include a process for fully understanding and correcting the root causes that lead to events having an impact on plant performance.

This book provides a logical approach to problem resolution. The method can be used to accurately define deviations from acceptable performance levels, isolate the root causes of equipment failures, and develop cost-effective corrective actions that prevent recurrence. This three-part set is a practical, step-by-step guide for evaluating most recurring and serious incidents that may occur in a chemical plant.
Part One, Introduction to Root Cause Failure Analysis, presents analysis techniques used to investigate and resolve reliability-related problems. It provides the basic methodology for conducting a root cause failure analysis (RCFA). The procedures defined in this section should be followed for all investigations.
Part Two provides specific design, installation, and operating parameters for particular types of plant equipment. This information is mandatory for all equipment-related problems, and it is extremely useful for other events as well. Since many of the chronic problems that occur in process plants are directly or indirectly influenced by the operating dynamics of machinery and systems, this part provides invaluable guidelines for each type of analysis.

Part Three is a troubleshooting guide for most of the machine types found in a chemical plant. This part includes quick-reference tables that define the common failure or deviation modes. These tables list the common symptoms of machine and process-related problems and identify the probable cause(s).
PURPOSE OF THE ANALYSIS
The purpose of RCFA is to resolve problems that affect plant performance. It should not be an attempt to fix blame for the incident. This must be clearly understood by the investigating team and those involved in the process.
Understanding that the investigation is not an attempt to fix blame is important for two reasons. First, the investigating team must understand that the real benefit of this analytical methodology is plant improvement. Second, those involved in the incident generally will adopt a self-preservation attitude and assume that the investigation is intended to find and punish the person or persons responsible for the incident. Therefore, it is important for the investigators to allay this fear and replace it with the positive team effort required to resolve the problem.
EFFECTIVE USE OF THE ANALYSIS
Effective use of RCFA requires discipline and consistency. Each investigation must be thorough and each of the steps defined in this manual must be followed.
Perhaps the most difficult part of the analysis is separating fact from fiction. Human nature dictates that everyone involved in an event or incident that requires an RCFA is conditioned by his or her experience. The natural tendency of those involved is to filter input data based on this conditioning. This includes the investigator. However, often such preconceived ideas and perceptions destroy the effectiveness of RCFA.
It is important for the investigator or investigating team to put aside its perceptions,
base the analysis on pure fact, and not assume anything. Any assumptions that enter
the analysis process through interviews and other data-gathering processes should be
clearly stated. Assumptions that cannot be confirmed or proven must be discarded.

PERSONNEL REQUIREMENTS
The personnel required to properly evaluate an event using RCFA can be quite substantial. Therefore, this analysis should be limited to cases that truly justify the expenditure. Many of the costs of performing an investigation and acting on its recommendations are hidden but nonetheless are real. Even a simple analysis requires an investigator assigned to the project until it is resolved. In addition, the analysis requires the involvement of all plant personnel directly or indirectly involved in the incident. The investigator generally must conduct numerous interviews. In addition, many documents must be gathered and reviewed to extract the relevant information.
In more complex investigations, a team of investigators is needed. As the scope and complexity increase, so do the costs.
As a result of the extensive personnel requirements, general use of this technique should be avoided. Its use should be limited to those incidents or events that have a measurable negative impact on plant performance, personnel safety, or regulatory compliance.
WHEN TO USE THE METHOD
The use of RCFA should be carefully scrutinized before undertaking a full investigation because of the high cost associated with performing such an in-depth analysis. The method involves performing an initial investigation to classify and define the problem. Once this is completed, a full analysis should be considered only if the event can be fully classified and defined, and it appears that a cost-effective solution can be found.
Analysis generally is not performed on problems that are found to be random, nonrecurring events. Problems that often justify the use of the method include equipment, machinery, or systems failures; operating performance deviations; economic performance issues; safety; and regulatory compliance issues.
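The screening described in this section can be summarized as a simple decision rule. The following Python sketch is only an illustration: the class name, field names, and example cases are hypothetical and are not taken from the book, which describes the screening in prose only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialFindings:
    """Outcome of the initial investigation (hypothetical field names)."""
    fully_defined: bool         # the event could be classified and defined
    recurring: bool             # not a random, nonrecurring event
    measurable_impact: bool     # hurts performance, safety, or compliance
    cost_effective_fix: bool    # a cost-effective solution appears possible


def justifies_full_rcfa(findings: InitialFindings) -> bool:
    """Return True only when every screening condition is met."""
    return (
        findings.fully_defined
        and findings.recurring
        and findings.measurable_impact
        and findings.cost_effective_fix
    )


# Hypothetical cases, for illustration only.
chronic_trip = InitialFindings(True, True, True, True)
one_off_upset = InitialFindings(True, False, True, True)

print(justifies_full_rcfa(chronic_trip))   # True
print(justifies_full_rcfa(one_off_upset))  # False
```

Treating every condition as mandatory reflects the advice above that general use of the technique should be avoided; a plant could reasonably weight the conditions differently.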
2
GENERAL ANALYSIS TECHNIQUES
A number of general techniques are useful for problem solving. While many common, or overlapping, methodologies are associated with these techniques, there also are differences. This chapter provides a brief overview of the more common methods used to perform an RCFA.
FAILURE MODE AND EFFECTS ANALYSIS
A failure mode and effects analysis (FMEA) is a design-evaluation procedure used to identify potential failure modes and determine the effect of each on system performance. This procedure formally documents standard practice, generates a historical record, and serves as a basis for future improvements. The FMEA procedure is a sequence of logical steps, starting with the analysis of lower-level subsystems or components. Figure 2-1 illustrates a typical logic tree that results from an FMEA.
The analysis assumes a failure point of view and identifies potential modes of failure along with their failure mechanism. The effect of each failure mode then is traced up to the system level. Each failure mode and resulting effect is assigned a criticality rating, based on the probability of occurrence, its severity, and its detectability. For failures scoring high on the criticality rating, design changes to reduce it are recommended.
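The criticality rating can be sketched in a few lines of Python. This sketch assumes the widely used risk priority number (RPN) convention, in which occurrence, severity, and detection difficulty are each scored from 1 to 10 and multiplied together. The book does not state a formula, and the failure modes, scores, and threshold below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    occurrence: int   # 1 (rare) to 10 (almost certain)
    severity: int     # 1 (negligible) to 10 (catastrophic)
    detection: int    # 1 (always detected) to 10 (undetectable)

    def criticality(self) -> int:
        """Product of the three scores (assumed RPN convention)."""
        for score in (self.occurrence, self.severity, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.occurrence * self.severity * self.detection


def needs_design_change(modes, threshold=100):
    """Return modes at or above the threshold, highest criticality first."""
    flagged = [m for m in modes if m.criticality() >= threshold]
    return sorted(flagged, key=lambda m: m.criticality(), reverse=True)


# Hypothetical failure modes for a pump, for illustration only.
modes = [
    FailureMode("seal leak", occurrence=6, severity=5, detection=4),        # 120
    FailureMode("bearing seizure", occurrence=3, severity=9, detection=7),  # 189
    FailureMode("coupling wear", occurrence=4, severity=3, detection=2),    # 24
]
for mode in needs_design_change(modes):
    print(mode.name, mode.criticality())
```

With these assumed scores, bearing seizure and seal leak exceed the threshold and would be candidates for design changes, while coupling wear would not.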
Following this procedure provides a more reliable design. Also, such correct use of the FMEA process results in two major improvements: (1) improved reliability by anticipating problems and instituting corrections prior to producing product and (2) improved validity of the analytical method, which results from strict documentation of the rationale for every step in the decision-making process.
Figure 2-1 Failure mode and effects analysis (FMEA) flow diagram.
Two major limitations restrict the use of FMEA: (1) logic trees used for this type of analysis are based on probability of failure at the component level and (2) full application is very expensive. Basing logic trees on the probability of failure is a problem because available component probability data are specific to standard conditions and extrapolation techniques cannot be used to modify the data for particular applications.
FAULT-TREE ANALYSIS
Fault-tree analysis is a method of analyzing system reliability and safety. It provides an objective basis for analyzing system design, justifying system changes, performing trade-off studies, analyzing common failure modes, and demonstrating compliance with safety and environment requirements. It is different from a failure mode and effects analysis in that it is restricted to identifying system elements and events that lead to one particular undesired event. Figure 2-2 shows the steps involved in performing a fault-tree analysis.
Many reliability techniques are inductive and concerned primarily with ensuring that hardware accomplishes its intended functions. Fault-tree analysis is a detailed deductive analysis that usually requires considerable information about the system. It ensures that all critical aspects of a system are identified and controlled. This method represents graphically the Boolean logic associated with a particular system failure, called the top event, and basic failures or causes, called primary events. Top events can be broad, all-encompassing system failures or specific component failures.
Figure 2-2 Typical fault-tree process: define top event, establish boundaries, understand system, construct fault tree, analyze tree, take corrective action.
Fault-tree analysis provides options for performing qualitative and quantitative reliability analysis. It helps the analyst understand system failures deductively and points out the aspects of a system that are important with respect to the failure of interest. The analysis provides insight into system behavior.
A fault-tree model graphically and logically presents the various combinations of possible events occurring in a system that lead to the top event. The term event denotes a dynamic change of state that occurs in a system element, which includes hardware, software, human, and environmental factors. A fault event is an abnormal system state. A normal event is expected to occur.
The structure of a fault tree is shown in Figure 2-3. The undesired event appears as the top event and is linked to more basic fault events by event statements and logic gates.
Figure 2-3 Example of a fault-tree logic tree. [Legible primary events: Primary Fuse Failure (Closed); Primary Wiring Failure (Shorted).]
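The Boolean logic behind a fault tree can be illustrated with a short Python sketch. It supports the qualitative question (does the top event occur for a given set of failed primary events?) and a simple quantitative estimate. The estimate assumes that primary events are statistically independent and that each appears only once in the tree. The tree layout, event names, and probabilities are hypothetical and are not taken from Figure 2-3.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class PrimaryEvent:
    name: str
    probability: float  # assumed chance that this basic failure occurs


@dataclass(frozen=True)
class Gate:
    kind: str                                        # "AND" or "OR"
    inputs: Tuple[Union["Gate", PrimaryEvent], ...]

    def __post_init__(self):
        if self.kind not in ("AND", "OR"):
            raise ValueError("gate kind must be 'AND' or 'OR'")


Node = Union[Gate, PrimaryEvent]


def occurs(node: Node, failed: Set[str]) -> bool:
    """Qualitative check: does this node occur given the failed primary events?"""
    if isinstance(node, PrimaryEvent):
        return node.name in failed
    results = [occurs(child, failed) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)


def probability(node: Node) -> float:
    """Quantitative estimate, assuming independent, non-repeated primary events."""
    if isinstance(node, PrimaryEvent):
        return node.probability
    probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        result = 1.0
        for p in probs:
            result *= p
        return result
    none_occur = 1.0  # OR gate: 1 minus the chance that no input occurs
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


# Hypothetical tree: excess current reaches the load only if the fuse fails closed.
wiring = PrimaryEvent("wiring shorted", 0.01)
surge = PrimaryEvent("power surge", 0.02)
fuse = PrimaryEvent("fuse fails closed", 0.05)
top = Gate("AND", (Gate("OR", (wiring, surge)), fuse))

print(occurs(top, {"wiring shorted", "fuse fails closed"}))  # True
print(occurs(top, {"wiring shorted"}))                       # False
print(round(probability(top), 5))                            # 0.00149
```

Real fault trees often repeat a primary event in several branches; in that case the simple multiplication above no longer holds and a minimal cut set method would be needed.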
CAUSE-AND-EFFECT ANALYSIS
Cause-and-effect analysis is a graphical approach to failure analysis. This also is referred to as fishbone analysis, a name derived from the fish-shaped pattern used to plot the relationship between various factors that contribute to a specific event. Typically, fishbone analysis plots four major classifications of potential causes (i.e., human, machine, material, and method) but can include any combination of categories. Figure 2-4 illustrates a simple analysis.
Like most of the failure analysis methods, this approach relies on a logical evaluation of actions or changes that lead to a specific event, such as machine failure. The only difference between this approach and other methods is the use of the fish-shaped graph to plot the cause-effect relationship between specific actions, or changes, and the end result or event.
This approach has one serious limitation. The fishbone graph provides no clear sequence of events that leads to failure. Instead, it displays all the possible causes that may have contributed to the event. While this is useful, it does not isolate the specific factors that caused the event. Other approaches provide the means to isolate specific changes, omissions, or actions that caused the failure, release, accident, or other event being investigated.
Figure 2-4 Typical fishbone diagram plots four categories of causes.
SEQUENCE-OF-EVENTS ANALYSIS
A number of software programs (e.g., Microsoft’s Visio) can be used to generate a sequence-of-events diagram. As part of the RCFA program, select appropriate software to use, develop a standard format (see Figure 2-5), and be sure to include each event that is investigated in the diagram.
Using such a diagram from the start of an investigation helps the investigator organize
the information collected, identify missing or conflicting information, improve his or
her understanding by showing the relationship between events and the incident, and
highlight potential causes of the incident.
The sequence-of-events diagram should be a dynamic document generated soon after a problem is reported and continually modified until the event is fully resolved. Figure 2-6 is an example of such a diagram.
Proper use of this graphical tool greatly improves the effectiveness of the problem-solving team and the accuracy of the evaluation. To achieve maximum benefit from this technique, be consistent and thorough when developing the diagram. The following guidelines should be considered when generating a sequence-of-events diagram: Use a logical order, describe events in active rather than passive terms, be precise, and define or qualify each event or forcing function.

Figure 2-5 Symbols used in sequence-of-events diagram.
EVENTS: Events are displayed as rectangular boxes, which are connected by flow direction arrows that provide the proper sequence for events. Each box should contain only one event and the date and time that it occurred. Use precise, factual, non-judgmental words and quantify when possible.
QUALIFIERS: Each event should be clarified by using oval data blocks that provide qualifying data pertinent to that event. Each oval should contain only one qualifier that provides clarification, a unique restriction, or other condition that may have influenced the event. Each qualifier oval should be connected to the appropriate event box using a direction arrow that confirms its association to a specific event.
FORCING FUNCTIONS: Factors that could have contributed to the event should be displayed as a hexagon-shaped data box. Each hexagon should contain one concisely defined forcing function. Forcing functions should be connected to a specific event using a direction arrow that confirms its association with that event.
INCIDENT: The incident box contains a brief statement of the reason for the investigation. The incident box should be inserted at the proper point in the event sequence and connected to the event boxes using direction arrows. There should be only one incident data box included in each investigation.
ASSUMPTIONS: (text not recoverable)
Figure 2-6 Typical sequence-of-events diagram.
In the example illustrated in Figure 2-6, repeated trips of the fluidizer used to transfer flake from the Cellulose Acetate (CA) Department to the preparation area triggered an investigation. The diagram shows each event that led to the initial and second fluidizer trip. The final event, the silo inspection, indicated that the root cause of the problem was failure of the level-monitoring system. Because of this failure, Operator A overfilled the silo. When this happened, the flake compacted in the silo and backed up in the pneumatic-conveyor system. This backup plugged an entire section of the pneumatic-conveyor piping, which resulted in an extended production outage while the plug was removed.
Logical Order
Show events in a logical order from the beginning to the end of the sequence. Initially, the sequence-of-events diagram should include all pertinent events, including those that cannot be confirmed. As the investigation progresses, it should be refined to show only those events that are confirmed to be relevant to the incident.
Active Descriptions
Event boxes in a sequence-of-events diagram should contain action steps rather than passive descriptions of the problem. For example, the event should read: “Operator A pushes pump start button” not “The wrong pump was started.” As a general rule, only one subject and one verb should be used in each event box. Rather than “Operator A pushed the pump stop button and verified the valve line-up,” two event boxes should be used. The first box should say “Operator A pushed the pump stop button” and the second should say “Operator A verified valve line-up.”
Do not use people’s names on the diagram. Instead use job functions or assign a code designator for each person involved in the event or incident. For example, three operators should be designated Operator A, Operator B, and Operator C.
Be Precise
Precisely and concisely describe each event, forcing function, and qualifier. If a concise description is not possible and assumptions must be provided for clarity, include them as annotations. This is described in Figure 2-5 and illustrated in Figure 2-6.
As the investigation progresses, each assumption and unconfirmed contributor to the event must be either confirmed or discounted. As a result, each event, function, or qualifier generally will be reduced to a more concise description.
Define Events and Forcing Functions
Qualifiers that provide all confirmed background or support data needed to accurately define the event or forcing function should be included in a sequence-of-events diagram. For example, each event should include date and time qualifiers that fix the time frame of the event.
When confirmed qualifiers are unavailable, assumptions may be used to define unconfirmed or perceived factors that may have contributed to the event or function. However, every effort should be made during the investigation to eliminate the assumptions associated with the sequence-of-events diagram and replace them with known facts.
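The guidelines in this section map naturally onto a small data model. The Python sketch below is one possible representation, not a format prescribed by the book: the class names, field names, and timestamps are hypothetical, and Operator B and the restart event are invented for illustration. Only the overfilled silo, the fluidizer trip, and the level-monitoring failure come from the example in Figure 2-6.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Annotation:
    kind: str                # "qualifier", "forcing function", or "assumption"
    text: str
    confirmed: bool = False


@dataclass
class Event:
    actor: str               # job function or code such as "Operator A", never a name
    action: str              # one active verb phrase per event box
    when: datetime           # date and time qualifier that fixes the time frame
    confirmed: bool = False
    annotations: List[Annotation] = field(default_factory=list)

    def label(self) -> str:
        return f"{self.actor} {self.action}"


def in_sequence(events):
    """Logical order: earliest event first."""
    return sorted(events, key=lambda e: e.when)


def unresolved(events):
    """Annotations that still must be confirmed or discounted."""
    return [
        (e.label(), a.text)
        for e in in_sequence(events)
        for a in e.annotations
        if not a.confirmed
    ]


def refined(events):
    """Later-stage diagram: only events confirmed as relevant."""
    return [e for e in in_sequence(events) if e.confirmed]


# Timestamps are hypothetical.
events = [
    Event("Fluidizer", "trips", datetime(2000, 1, 1, 9, 30), confirmed=True),
    Event(
        "Operator A", "overfills silo", datetime(2000, 1, 1, 8, 15), confirmed=True,
        annotations=[Annotation("forcing function",
                                "level-monitoring system failed", confirmed=True)],
    ),
    Event(
        "Operator B", "restarts fluidizer", datetime(2000, 1, 1, 9, 45),
        annotations=[Annotation("assumption",
                                "restart made without clearing the line")],
    ),
]

print([e.label() for e in refined(events)])
print(unresolved(events))
```

Keeping unconfirmed events and assumptions in the same list, flagged rather than deleted, matches the advice to start with all pertinent events and narrow the diagram as facts are confirmed.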
3
ROOT CAUSE FAILURE ANALYSIS METHODOLOGY
RCFA is a logical sequence of steps that leads the investigator through the process of isolating the facts surrounding an event or failure. Once the problem has been fully defined, the analysis systematically determines the best course of action that will resolve the event and assure that it is not repeated. Because of the cost associated with performing such an analysis, care should be exercised before an investigation is undertaken.
The first step in this process is obtaining a clear definition of the potential problem or event. The logic tree illustrated in Figure 3-1 should be followed for the initial phase of the evaluation.
REPORTING AN INCIDENT OR PROBLEM
The investigator seldom is present when an incident or problem occurs. Therefore, the first step is the initial notification that an incident or problem has taken place. Typically, this report will be verbal, a brief written note, or a notation in the production log book. In most cases, the communication will not contain a complete description of the problem. Rather, it will be a very brief description of the perceived symptoms observed by the person reporting the problem.
Symptoms and Boundaries
The most effective means of problem or event definition is to determine its real symptoms and establish limits that bound the event. At this stage of the investigation, the task can be accomplished by an interview with the person who first observed the problem.
Perceived Causes of Problem
At this point, each person interviewed will have a definite opinion about the incident, and will have his or her description of the event and an absolute reason for the occurrence. In many cases, these perceptions are totally wrong, but they cannot be discounted. Even though many of the opinions expressed by the people involved with or reporting an event may be invalid, do not discount them without investigation. Each opinion should be recorded and used as part of the investigation. In many cases, one or more of the opinions will hold the key to resolution of the event. The following are some examples where the initial perception was incorrect.
Figure 3-1 Initial root cause failure analysis logic tree.
One example of this phenomenon is a reported dust collector baghouse problem. The initial report stated that dust-laden air was being vented from the baghouses on a random, yet recurring, basis. The person reporting the problem was convinced that chronic failure of the solenoid-actuated pilot valves controlling the blow-down of the baghouse, without a doubt, was the cause. However, a quick design review found that the solenoid-controlled valves normally are closed. This type of solenoid valve cannot fail in the open position and, therefore, could not be the source of the reported events.
A conversation with a process engineer identified the diaphragms used to seal the blow-down tubes as a potential problem source. This observation, coupled with inadequate plant air, turned out to be the root cause of the reported problem.
Another example illustrating preconceived opinions is the catastrophic failure of a Hefler chain conveyor. In this example, all the bars on the left side of the chain were severely bent before the system could be shut down. Even though no foreign object such as a bolt was found, a foreign object was assumed to be the cause of failure. From the evidence, it was clear that some obstruction had caused the conveyor damage, but the more important question was, Why did it happen?

Hefler conveyors are designed with an intentional failure point that should have prevented the extensive damage caused by this event. The main drive-sprocket design includes a shear pin that generally prevents this type of catastrophic damage. Why did the conveyor fail? Because the shear pins had been removed and replaced with Grade-5 bolts.
Event-Reporting Format
One factor that severely limits the effectiveness of RCFA is the absence of a formal event-reporting format. The use of a format that completely bounds the potential problem or event greatly reduces the level of effort required to complete an analysis. A form similar to the one shown in Figure 3-2 provides the minimum level of data needed to determine the effort required for problem resolution.
INCIDENT CLASSIFICATION
Once the incident has been reported, the next step is to identify and classify the type of problem. Common problem classifications are equipment damage or failure, operating performance, economic performance, safety, and regulatory compliance.
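A structured record makes the reporting and classification steps easier to apply consistently. The Python sketch below is an assumption-laden illustration: the five classifications come from the text, but the report fields are hypothetical because the form in Figure 3-2 is not reproduced here. The example values are drawn from the baghouse case above, except for the reporter's designation, which is invented.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class IncidentClass(Enum):
    EQUIPMENT = "equipment damage or failure"
    OPERATING = "operating performance"
    ECONOMIC = "economic performance"
    SAFETY = "safety"
    REGULATORY = "regulatory compliance"


@dataclass
class IncidentReport:
    """Hypothetical minimum report; not the layout of Figure 3-2."""
    reported_by: str = ""        # job function or code, not a person's name
    symptoms: str = ""           # what was actually observed
    boundaries: str = ""         # limits that bound the event
    perceived_cause: str = ""    # recorded, even if later shown to be wrong
    classification: Optional[IncidentClass] = None


def missing_fields(report: IncidentReport) -> List[str]:
    """Names of fields that are still empty and need follow-up."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


report = IncidentReport(
    reported_by="Operator A",  # hypothetical designation
    symptoms="dust-laden air vented from the baghouses on a random, yet recurring, basis",
    perceived_cause="chronic failure of the solenoid-actuated pilot valves",
)
print(missing_fields(report))  # ['boundaries', 'classification']
```

Recording the perceived cause as its own field keeps the reporter's opinion available to the investigation without treating it as a confirmed fact.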
