Safer Surgery
194
Performance Dimension Training (PDT): Increase rating accuracy by
facilitating dimension-relevant evaluations.
Behavioural Observation Training (BOT): Increase rating accuracy by
focusing on the observation of behaviour.
Frame of Reference Training (FOR): Increase rating accuracy by focusing
on the different levels of performance (Salas et al. 2001).
It is reportedly straightforward to train a single group of raters to achieve a
relatively high level of inter-rater agreement and accuracy when compared to a
standard set rated by an expert (Salas et al. 2001). The potential sources of
error mean that it is not as easy to translate this to many groups trained in
separate centres or by different trainers: between-group rater agreement and
accuracy are known to drop significantly unless such errors are addressed.
Salas et al. (2001) outlined guidelines for training raters in the use of behavioural
markers. These guidelines have been developed to try and minimize the sources of
error described above. There is limited recent evidence on effective rater training in
the medical domain and so these guidelines were followed to develop our rater training
for ANTS. These guidelines were previously followed by Flin and Glavin’s research
group (Fletcher et al. 2003) to evaluate inter-rater agreement for ANTS.
Baker et al. (2001) employed an eight-hour rater training programme for a
behavioural marker system and achieved an adequate level of inter-rater agreement
and accuracy in that time frame. Workpackage Report 7 from the University of
Aberdeen investigating ANTS reports a four-hour training programme (Fletcher
et al. 2002). With this minimal amount of training, inter-rater agreement of r_wg
= 0.55–0.67 was achieved at the element level, and 0.56–0.65 at the category
level, without any feedback or calibration.
During the initial evaluation of ANTS by the Scottish team, feedback from
experts and calibration were deliberately excluded from rater training to isolate
the impact of these on inter-rater agreement. In this way, the training provided to
their participants was intentionally limited to try to isolate the reliability of the
tool itself. With our research we hoped to move this forward a phase by including
feedback from experts to test calibration, and thus improve inter-rater agreement.
Rater Training Day
When planning the project we aimed to train 15 participants as raters, the
number we thought we could reasonably expect given that we could only recruit
from our own department. We chose a single day of training as a feasible length.
Were the college to roll out ANTS widely as an assessment tool, a short course
would be necessary if many assessors were to be trained.
Surprisingly, 26 participants applied and attended the training, which was held
at a venue with excellent audio-visual capacity, desk space and good catering for
participants and trainers.



Using ANTS for Workplace Assessment
195
Pre-reading was sent out prior to the workshop including the ANTS Handbook
(<www.abdn.ac.uk/ANTS>) and ‘Recommendations for the use of behavioural
markers’ (Klampfer et al. 2001).
An outline of the rater training is as follows:
welcome and introduction;
ANTS background information;
Behavioural Observation Training;
Rater Error Training;
Performance Dimension and Frame Of Reference training;
assessment practice and calibration.
The attendees found the behavioural observation training to be an enjoyable
experience. This session involved observing continuity errors in films, amongst
other activities. The performance dimension and frame of reference training utilized
the excerpts that were selected from the videos. This was the first opportunity our
raters had to practise their assessment skills.
Five videos were shown in the afternoon for the purpose of practice assessment
and calibration. ANTS scores from these five videos were collected for statistical
analysis. Score sheets were collected from each rater prior to any discussion about
the video. Expert ratings and a discussion followed for the purpose of calibration.
Many of the learning points that we gained from this project arose during these
sessions and will be discussed later.
Results
Participants rated performance in the five test videos, scoring in each case for 15
identified skill elements. Following lengthy discussion with our statistician,
intraclass correlation (ICC) (Shrout and Fleiss 1979) was chosen to demonstrate agreement
between the 26 raters. ICC is essentially another way of calculating inter-rater
agreement, with the advantage that the multiple sources of variability introduced
when investigating inter-rater agreement can be modelled explicitly:
ICC = var(target) / (var(target) + var(judge) + var(residual)).
Intraclass correlations were calculated for each element of ANTS (see Figure
12.1). A random effects linear model approach was used to estimate variance
components and, ultimately, to estimate the ICC. None of the elements even
reached the minimum acceptable value of 0.7, let alone the 0.9 that would be
considered necessary for a high-stakes assessment. We therefore demonstrated a
lack of reliability.
As you will see later in this chapter, we learnt a number of lessons while trying to
achieve inter-rater reliability with such a large group.
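The ICC defined above can be estimated from a targets × judges matrix of ratings via two-way ANOVA mean squares converted to variance components. Below is a minimal sketch of ICC(2,1) in the Shrout and Fleiss (1979) sense; the ratings matrix is invented for illustration, not data from the study:

```python
# Sketch of ICC(2,1): two-way random effects, single rater, absolute agreement.
# Mean squares from a two-way ANOVA are converted to the variance components
# in ICC = var(target) / (var(target) + var(judge) + var(residual)).
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ratings: n_targets x n_judges matrix of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-target means
    col_means = ratings.mean(axis=0)  # per-judge means

    # Two-way ANOVA mean squares
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    ss_err = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Variance components for target, judge and residual
    var_target = (ms_rows - ms_err) / k
    var_judge = (ms_cols - ms_err) / n
    var_resid = ms_err
    return var_target / (var_target + var_judge + var_resid)

# Illustrative: 5 video "targets" rated by 4 judges on one ANTS element
ratings = np.array([
    [3, 3, 2, 3],
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
], dtype=float)
print(round(icc_2_1(ratings), 2))
```

In practice a random effects linear model, as used in the study, handles unbalanced data more gracefully, but the variance-components structure is the same.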






Comparison of scores with those of the ‘expert raters’ also showed unsatisfactory
results.
Qualitative Data
Pre-workshop Questionnaire
All participants were involved in the supervision of ANZCA trainees, with
supervision experience ranging from 1 to 30 years. Of the 26 participants,
eight had one year or less of supervising experience.
Only seven participants already had a system for assessment in the workplace.
Of these, one was a regional education officer, and the others were generally
experienced (5–30 years as a consultant). New supervisors were less likely to have
a method for assessment. However, there were still some experienced anaesthetists
amongst those who felt they had no real system for assessment.
The most common aims the participants identied for the workshop were to
‘provide better or constructive feedback’, and ‘to have a systematic method for
assessment’ (22 out of 26). No one felt that the elements of the current in-training
assessment process were useful for assessment.
Figure 12.1 Intraclass correlations calculated for each component of ANTS
Post-workshop Questionnaire
All participants thought ANTS was a useful system for structuring assessment and
the majority found the system easy to use. Four participants found it difficult to
place appropriate rating scores next to the observed behaviour.
Pre-reading was universally found helpful and one participant even requested
extra pre-reading. The level of most of the training was scored as ‘just right’,
as was the amount of information in each section for the majority of
participants. Specific comments about the training were few, but were noted for
adapting future training sessions.
Most participants felt they needed more practice before using ANTS for assessment.

Despite this, most felt they had received enough training to use the ANTS system.
All participants felt that ANTS was useful for consultants to give training to
junior anaesthetists. One comment was that it sets a ‘gold standard’ for behaviours
in theatre. All felt it is important for trainees to have or develop these skills.
Each participant felt that ANTS was useful as a formative assessment tool.
The comments associated with this question included the fact that it was useful to
give structured feedback. Most also felt that it would highlight the importance of
non-technical skills.
The question as to whether ANTS was suitable as a summative assessment tool
divided the participants. Thirteen raters thought it was suitable as a summative
assessment tool and 13 thought more work should be done. Opinion was also
divided on the year of training in which summative assessment of non-technical
skills should be performed. This could be determined by a future project but would require
large numbers of both trainees and raters. The comments associated with this
question give an insight into what the average ANZCA fellow may think about
using ANTS in its current form.
Comments from ANZCA fellows regarding the use of ANTS as a summative
assessment tool:
Only a small number of expert raters should perform this assessment.
Such an assessment may have severe implications for supervision of
trainees.
This should only be used for trainees who are already struggling.
Anaesthesia trainees should be exposed to non-technical skills training
early to encourage good behaviours.
Discussion
It seems unusual that the Scottish team investigating ANTS achieved an inter-rater
reliability of r=0.5–0.7 with minimal training yet, in comparison, our correlation
is so poor. What was different? We were hoping to see the introduction of ANTS
as a summative assessment tool and unfortunately, this does not look to be a great
start.





We believe there are a number of reasons why we could not achieve sufficient
agreement between raters in one day. Some of these were obvious from the discussion
during calibration on the training day, and some have become clearer afterwards.
Following the viewing of each video, the expert ratings were discussed to see if
we could achieve further agreement amongst the group before the next video. We
observed a number of interesting opinions, discussions and thoughts about using
ANTS from our raters. At the time we thought we were seeing potential problems
with our rater training, but what we actually saw were some warnings for the use
of workplace-based assessment in general.
Misclassication
When discussing the viewed scenario it seemed that everybody was seeing the
same behaviours, yet there was a definite lack of agreement: the behaviours
were being placed into different elements.
Anybody who has used the ANTS system will know that one behaviour can
potentially be placed beside more than one element. When behaviours are placed
beside a different set of elements by raters, it can result in different ratings for each
element. This will ultimately result in a lack of agreement.
Disagreement on Safety Standards
Not everyone agrees on safety standards within anaesthesia. An example of this
is ‘test ventilation’. Some anaesthetists believe this is mandatory practice, while
others think it is dangerous. This is a problem that is directly relevant to the future of
workplace assessment. Since ANTS is essentially based on patient safety, it is vital
that assessors agree on what is ‘safe’ if they are to enforce this view on trainees.
We believe these two problems were mainly responsible for the lack of
inter-rater agreement. This was obvious from some of the ‘lively’ discussions during
calibration. However, this still does not account for the large difference between
the Scottish raters and the Australian ones.
Herein lies the next difference. When we designed our study, we kept in mind
the fact that ANTS could be used, in the future, on a large scale and by a variety
of anaesthetists.
The participants in our rater training day were all specialist anaesthetists of
varying experience, but without specic training in education or simulation. In
fact, most of them play no formal role in education or training. All the anaesthetists
in the Scottish study had some involvement in education and training activities
(Fletcher et al. 2002). The large difference in inter-rater agreement could be
an effect of education. The same argument that holds for teaching anaesthesia
trainees formally about non-technical skills, if we are to assess them, may apply to
consultants doing the assessing.
The Scottish raters were also assessing acted scenarios as opposed to the real
cases that our raters observed. This may be yet another source of bias that was
introduced. We learnt many lessons from our rater training day, and realized how
many sources of bias can be introduced.
Lessons Learnt
1. ANTS as an appropriate instrument for workplace-based assessment for
anaesthetists-in-training:
a. Validity. Content validity was established in previous work on
ANTS, and our newly trained raters agreed in the post-workshop
questionnaire. Criterion validity is almost impossible to establish due
to the lack of a gold standard. We compressed the data to improve our
reliability; whilst this appears to have face validity, we have not
formally assessed the validity of this approach.
b. Reliability. We failed to demonstrate acceptable inter-rater reliability
with the level of training we offered. Acceptable reliability could
be gained by compressing the data: if the data are compressed
to an average score from each rater, then inter-rater agreement
increases substantially. A longer course may offer higher reliability
but may not be feasible. Reliability is difficult to demonstrate because
variability is multifactorial; addressing it would require larger sample
sizes or greater control over video content, and changing the latter would
detract from the validity of the real-world experience.
c. Acceptability. There was high acceptability of this tool amongst
both video subjects and workshop participants. However, the
voluntary nature of the participation introduces bias. Data from
these motivated individuals may not translate to a wider population.
Resistance to implementation could be predicted to occur with many
of the stakeholders. This would include trainees needing to accept
the importance of this dimension of their practice and qualified
anaesthetists, who work with trainees, having to incorporate the
principles of ANTS explicitly in their practice. Those involved with
the central examination process would need to accept devolution of
their power and local centres would have to accept an increase in non-
clinical workload. Trainees may also worry about the introduction of
local bias into their assessment, a process which, until now, has been
central and viewed as impartial.
d. Feasibility. ANTS does not appear to be a feasible tool to use for
summative assessment in its current state. Significant training and
practice are likely to be needed, leading to low interest among potential
assessors. Widespread implementation would require large human
and financial resources. Potentially, it could be used as a screening
tool by assessors with limited training and using a compressed scoring
system. Those rated as underperforming could be assessed further

by a small number of highly trained and committed assessors. The
validity of this modication would also need to be examined.
2. General lessons learnt about implementation of a workplace-based
assessment tool.
a. The use of video footage was invaluable to teach potential assessors,
though still failed to show all the information the audience wanted.
High audiovisual quality was critical. Video has been shown to have
similar validity and reliability to real-time observation (Hays et al.
2002) but is unlikely to be useful in our setting for trainee assessment
due to the increased cost.
b. The workshop demonstrated that techniques borrowed from other
industries were appropriate in this setting. The dynamics of the group
impacted more on results than we would have expected, with vigorous
discussion becoming unhealthy at times. We appeared unable to
dislodge preconceptions about some behaviours, despite others in the
group making it clear these beliefs were held by only a very small
minority. Those who voiced dissenting opinions in the discussion corresponded
closely with the outliers in the rating process.
The workshop can therefore also be an opportunity to decide who is
reliable enough to become an assessor.
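The score ‘compression’ described in lesson 1(b), collapsing each rater's 15 element scores to a single mean before comparing raters, can be illustrated with invented scores (the element values below are hypothetical, not study data):

```python
# Illustration of score "compression": two raters disagree on several
# individual ANTS elements, yet their compressed (mean) scores coincide,
# so rater-to-rater agreement improves after averaging.
def compress(element_scores):
    """Collapse one rater's per-element scores to a single mean score."""
    return sum(element_scores) / len(element_scores)

rater_a = [3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 4, 3, 3, 2, 3]  # 15 ANTS elements
rater_b = [2, 4, 3, 3, 2, 4, 3, 3, 3, 2, 4, 3, 3, 3, 3]

disagreements = sum(1 for a, b in zip(rater_a, rater_b) if a != b)
print(disagreements)                          # 6 element-level disagreements
print(compress(rater_a), compress(rater_b))   # 3.0 3.0 - identical after compression
```

This shows why compression raises agreement, and also why it sacrifices the element-level diagnostic detail that makes ANTS useful for feedback.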
Conclusion
The overriding strength of ANTS is its content validity, with effective coverage
of the domains of non-technical practice of anaesthesia. This coverage is also its
downfall, giving it a complexity that limits its feasibility as a summative assessment
tool. The poor inter-rater reliability that we demonstrated is likely to be a feature of
any workplace-based assessment tool for anaesthesia as the subtleties of medical
practice make maintaining high validity and reliability together difficult. Simplifying
ANTS can improve reliability, and may do so without impairing validity, although
successfully applying the tool will remain complex. The well-mapped-out domains
of anaesthesia practice should make this an ideal speciality to pioneer workplace-
based assessment in medicine. Given our responsibility to government and patients to
provide safe, appropriately trained anaesthetists, it seems untenable not to introduce
an appropriate, comprehensive assessment programme.
References
Baker, D., Mulqueen, C. and Dismukes, R. (2001) Training raters to assess resource
management skills. In E. Salas, C. Bowers and E. Edens (eds), Improving
Teamwork in Organizations (pp. 131–45). Mahwah, NJ: Lawrence Erlbaum
Associates.
CanMEDS (2000) Extract from the CanMEDS 2000 Project Societal Needs
Working Group Report. Medical Teacher 22(6), 549–54.
Downing, S.M. (2004) Reliability: On the reproducibility of assessment data.
Medical Education 38(9), 1006–12.
Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2002)
WP7 Report: Evaluation of the Prototype Anaesthetist’s Non-Technical Skills
(ANTS) Behavioural Marker System: University of Aberdeen Workpackage
Report for SCPMDE. Available from < [last
accessed March 2009].
Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2003)
Anaesthetists’ Non-Technical Skills (ANTS): Evaluation of a behavioural
marker system. British Journal of Anaesthesia 90(5), 580–8.
Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2004)
Rating non-technical skills: Developing a behavioural marker system for use
in anaesthesia. Cognition, Technology and Work 6, 165–71.
Gleason, A.J., Daly, J.O. and Blackham, R.E. (2007) Prevocational medical training
and the Australian Curriculum Framework for Junior Doctors: A junior doctor
perspective. Medical Journal of Australia 186(3), 114–16.

Hays, R.B., Davies, H.A., Beard, J.D., Caldon, L.J.M., Farmer, E.A., Finucane,
P.M., McCrorie, P., Newble, D.I., Schuwirth, L.W. and Sibbald, G.R. (2002)
Selecting performance assessment methods for experienced physicians.
Medical Education 36(10), 910–17.
Klampfer, B., Flin, R., Helmreich, R.L., Hausler, R., Sexton, B., Fletcher, G., Field,
P., Staender, S., Lauche, K., Dieckmann, P. and Amacher, A. (2001) Enhancing
Performance in High Risk Environments: Recommendations for Using
Behavioural Markers. Zurich: Group Interaction in High Risk Environments.
Swissair Training Centre.
Salas, E., Bowers, C.A. and Edens, E. (eds) (2001) Improving Teamwork in
Organizations: Applications of Resource Management Training. Mahwah, NJ:
Lawrence Erlbaum Associates.
Shrout, P.E. and Fleiss, J.L. (1979) Intraclass correlations: Uses in assessing rater
reliability. Psychological Bulletin 86(2), 420–8.
Spike, N., Alexander, H., Elliott, S., Hazlett, C., Kilminster, S., Prideaux, D. and
Roberts, T. (2000) In-training assessment – its potential in enhancing clinical
teaching. Medical Education 34(10), 858–61.
Woehr, D.J. and Huffcutt, A.I. (1994) Rater training for performance appraisal: A
quantitative review. Journal of Occupational and Organizational Psychology
67(3), 189–205.
Chapter 13
Measuring Coordination Behaviour in
Anaesthesia Teams During Induction of
General Anaesthetics
Michaela Kolbe, Barbara Künzle, Enikö Zala-Mezö, Johannes Wacker
and Gudela Grote
Introduction
Working in groups is widespread in medicine, especially in the operating room.

Anaesthesia is a classic small-group performance situation where a variety of
organizational, group process and personality factors are crucial to outcomes such
as patient safety. Human factors such as breakdown in the quality of teamwork
have been identied as a main source of failures in medical treatment (Arbous
et al. 2001, Cooper et al. 2002, Gaba 2000, Helmreich and Davies 1996, Lingard
et al. 2004, Reason 2005, Sexton et al. 2000). There is growing evidence that
the ability of medical teams to deal with the required complex work processes
strongly depends on adaptive team coordination (e.g., Manser et al. 2008, Risser
et al. 1999, Rosen et al. 2008, Salas et al. 2007b, Schaafstal et al. 2001, Zala-Mezö
et al. 2009). Coordination has been defined as the ‘structured patterning
of within-group activities by which groups strive to achieve their goal’ (Arrow
et al. 2000, p. 104). However, for anaesthesia teams, there is very little empirical
evidence on which specific coordination behaviours help teams maintain
effective clinical performance – especially in transitions from routine situations
to the management of non-routine events. In our ongoing work, we attempt to fill
this gap by analysing coordination behaviour and clinical performance in routine
and non-routine events. In this chapter, we will analyse the relevance of adaptive
coordination in anaesthetic work and present our approach to measuring team
coordination behaviour in anaesthesia.
Teamwork in Anaesthesia
The induction of anaesthesia is particularly demanding compared to the other
tasks involved in the anaesthetic process (see Phipps et al. 2008). Clinical team
performance is inuenced by a variety of factors such as team member experience
