Safer Surgery
process produced a list of 150 unsorted non-technical skills, such as ‘coordinates
the team’ and ‘confirms understanding with assistant’, as raw input data for system
development in Phase 2.
Phase 2: Development of the NOTSS System
The goal of Phase 2 was to develop a system that could be used by surgeons to rate
other surgeons’ behaviours in vivo in the operating theatre rather than to develop
a comprehensive taxonomy or research instrument. The tri-level hierarchical
format used for behavioural marker systems in anaesthesia (Fletcher et al. 2004)
and European civil aviation (Flin et al. 2003) was adopted. This format structures
skills into category and element levels with observable behaviours (markers)
indicative of good and poor performance for each element. The prototype system
was developed in three stages to (i) refine the skill set that emerged from Phase 1,
(ii) sort those skills into a skills taxonomy, and (iii) identify observable behaviours
that were indicative of each skill in the taxonomy.
The aim of Stage 1 was to refine the skills that emerged from Phase 1 and
remove duplication without diluting the conceptual breadth of the skills that
emerged from the task analysis. This process was to form the basis of the system.
To achieve this, the multidisciplinary research group reduced and refined the list
of 150 skills extracted from the transcripts, considering the results of the literature
review, survey, and observations in theatre. The skills taxonomy was developed
according to design criteria derived from the JARTEL (Joint Aviation Requirements:
Translation and Elaboration of Legislation) project (Flin et al. 2003), an expert
panel on behavioural markers (Klampfer et al. 2001) and from Cognitive Task
Analysis (Seamster et al. 1997).¹ The reduced skills list was then thematically
organized, and broad categories emerged, each broken down into component elements.
In Stage 2, an iterative process was used with four independent panels of
consultant surgeons from four hospitals, who modified the structure into a skills
taxonomy. The panels checked the wording and labelling of elements, and ensured
that the framework was relevant to the surgical domain. This formed the basis for
the behavioural marker system.
In Stage 3, observable behaviours (markers) indicative of good and poor
performance were developed for each element by 16 consultant surgeons. The
surgeons were asked to think of behaviours that could be either directly observed
or inferred through communication. Two subsequent multidisciplinary review
meetings refined this set of illustrative behaviours, all phrased as active verbs.
This ensured that the system had cognitive and interpersonal functionality, was
grounded in surgery, and complied with the guidelines on system design (Gordon
1993) and criteria for development of behavioural markers mentioned earlier.
1 See Table 1 in Yule et al. (2006b) for the full set of design criteria.
Development and Evaluation of the NOTSS Behaviour Rating System
The NOTSS Rating Scale
The aim of the system is to allow surgeons to rate skills they observe. After
considering the possible rating scale formats, a four-point scale was chosen, as
follows: 4 good, 3 acceptable, 2 marginal, 1 poor, and N/A not applicable. The
‘not applicable’ rating applies when the behaviour was not required in a given
clinical scenario. If the skill should have been observed but was not, then a rating
of 1 (poor) should be given. Behaviours which potentially endanger patient safety
should also be given this rating.
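
The rating rules above can be restated as a short decision procedure. The following
Python sketch is illustrative only and is not part of the NOTSS materials: the names
`Rating` and `rate_element`, the boolean flags, and the order in which the rules are
checked are all assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class Rating(IntEnum):
    """Four-point NOTSS scale; 'not applicable' is represented by None below."""
    POOR = 1
    MARGINAL = 2
    ACCEPTABLE = 3
    GOOD = 4


def rate_element(required: bool,
                 observed: bool,
                 judged: Optional[Rating] = None,
                 endangers_patient: bool = False) -> Optional[Rating]:
    """Apply the rating rules described in the text (hypothetical helper).

    Returns None for 'not applicable', otherwise a Rating.
    """
    if endangers_patient:
        # Behaviour that potentially endangers patient safety is rated poor.
        return Rating.POOR
    if not required:
        # The behaviour was not required in this clinical scenario: N/A.
        return None
    if not observed:
        # The skill should have been shown but was not: rated poor.
        return Rating.POOR
    if judged is None:
        raise ValueError("an observed, required behaviour needs a rating")
    return judged


na = rate_element(required=False, observed=False)       # None, i.e. N/A
missed = rate_element(required=True, observed=False)    # Rating.POOR
seen = rate_element(True, True, judged=Rating.GOOD)     # Rating.GOOD
```

Representing N/A as `None` rather than as a fifth scale value keeps it out of any
later averaging of scores, which seems consistent with its meaning in the text.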
Phase 3: System Evaluation
The aim of Phase 3 was to evaluate the NOTSS v1.1 system, specifically to assess
its psychometric properties of (i) sensitivity, (ii) inter-rater reliability, and (iii)
internal structure and consistency. To exert some control over the evaluation and
stimuli used, we used a pseudo-experimental design which involved 44 consultant
surgeons rating standardized video clips of surgeons’ intraoperative behaviour.
To achieve this we filmed eleven video scenarios illustrating a range of surgeons’
non-technical skills in general and orthopaedic surgery. The scenarios were
filmed using a patient simulator in operating rooms with practising surgeons,
anaesthetists and nurses acting the main roles. The scenarios were designed by
surgeons, anaesthetists and psychologists who were experienced in non-technical
skills training. From these, three scenarios were selected for training and six for the
evaluation, the longest of which ran for 5 minutes and 40 seconds. The participating
surgeons attended a half-day training session on how to use the NOTSS system,
with some guidance on behaviour rating (Baker et al. 2001). They were instructed
to watch each scenario and to rate the observed skills of the consultant surgeon
using the NOTSS rating form. Participants were informed of the simulated nature
of the scenarios.
Table 2.1 shows the criteria for each of the evaluation metrics used in this
study and the corresponding results. For more details on the evaluation see
Yule et al. (2008a) and Yule et al. (2009). This table shows that the system was
moderately sensitive, but operated best when observers had to make a decision
regarding whether the behaviour was acceptable or not. Within-group agreement
was acceptable for the interpersonal skill categories but below acceptable criteria
for cognitive skills. Internal reliability was high, with an overall mean difference
of less than 0.25 scale points between category and element ratings.
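
To make the sensitivity and internal reliability criteria concrete, both can be
computed as mean absolute differences in scale points. The sketch below is a minimal
illustration, assuming absolute differences are also used for sensitivity (Table 2.1
states this explicitly only for internal reliability). The function names and the
example ratings are invented for illustration and are not study data.

```python
from statistics import mean


def sensitivity(rater_scores, reference_score):
    """Mean absolute scale-point difference between raters and a reference rating."""
    return mean(abs(score - reference_score) for score in rater_scores)


def internal_consistency(category_score, element_scores):
    """Mean absolute difference between a rater's element ratings and category rating."""
    return mean(abs(element - category_score) for element in element_scores)


# Invented example: four raters score a category whose reference rating is 3.
print(sensitivity([3, 4, 3, 2], reference_score=3))      # 0.5

# Invented example: a rater gives the category 3 and its three elements 3, 4, 3.
print(round(internal_consistency(3, [3, 4, 3]), 2))      # 0.33
```

On both measures, values tending to zero indicate closer agreement.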
There were also differences in the way the scenarios were rated: two scenarios
yielded either floor or ceiling ratings as the behaviours were explicitly good or
poor, whereas other scenarios displayed more ambiguous behaviours and were rated
in the mid-range of the scale. Orthopaedic surgeons were found to agree on rated
behaviours significantly more than general surgeons (Yule et al. 2008a).
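
Within-group agreement of this kind is commonly computed with the single-item r_wg
index usually attributed to James, Demaree and Wolf, which compares the observed
variance in a group's ratings with the variance expected if raters responded
uniformly at random. The chapter does not give the formula used, so the following
Python sketch shows only the standard single-item form, assuming a uniform null
distribution and truncation of negative values to zero. The function name and the
example ratings are invented.

```python
from statistics import variance


def r_wg(ratings, scale_points=4):
    """Single-item within-group agreement index (assumed standard form).

    r_wg = 1 - (observed variance / variance of a uniform null distribution),
    where the uniform null variance for an A-point scale is (A**2 - 1) / 12.
    """
    expected_variance = (scale_points ** 2 - 1) / 12
    observed_variance = variance(ratings)  # sample variance (n - 1 denominator)
    # Negative values (more disagreement than chance) are truncated to 0 here.
    return max(0.0, 1 - observed_variance / expected_variance)


# Invented ratings from one group of four raters on the 4-point scale.
print(round(r_wg([3, 3, 3, 4]), 2))   # 0.8
print(r_wg([2, 2, 2, 2]))             # 1.0
```

Under these assumptions, the criterion of .7 corresponds to an observed variance of
at most 0.375 on a four-point scale, which shows how little spread in ratings the
threshold tolerates.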
On the basis of the evaluation a number of changes were made to the taxonomy,
the most important being the removal of ‘Task Management’. This was done for three
reasons: conceptually, many of the task management behaviours were actually more
reflective of situation awareness; some reliability tests did not reach an acceptable
threshold for the category; and practically, removing a category and its elements from
the taxonomy reduced the cognitive load for raters, who have a finite capacity for
holding a number of categories and elements in working memory while engaged
in a real-time observation and rating task (Yule et al. 2008a). This produced the
NOTSS taxonomy version 1.2 (see Figure 2.2).
Table 2.1 Summary of NOTSS v1.1 evaluation results (see Yule et al. 2008a for detailed results)
| Type of evaluation | Why it is important | How calculated and criteria | Result of test |
|---|---|---|---|
| Sensitivity | This is a measure of how accurate the group of raters is in absolute ratings of behaviour compared with reference ratings | Mean number of scale point difference between raters and reference, represented as a decimal, usually <1 | Mean sensitivity across all categories was .67 |
| Within-group agreement (r_wg) | This is a measure of statistical agreement between a number of raters. In this study, it represents the degree to which the groups of participants agree on the absolute ratings they give to behaviours in the scenarios that reflect the NOTSS categories and elements | Scores lie between 0 (no agreement) and 1 (perfect agreement); scores above .7 are deemed acceptable. r_wg was calculated for the NOTSS categories and elements ratings for each of the 6 experimental groups | r_wg exceeded the criterion of >.7 for two categories: Leadership and Communication & Teamwork. r_wg for Decision-making and Task Management approached the criterion but the value of r_wg for Situation Awareness was .51 |
| Internal reliability | There should be a high degree of consistency between the category rating and the ratings for the two or three underpinning elements due to conceptual overlap | The mean absolute difference between raters’ element ratings and their rating for the corresponding category. Lower scores (tending to zero) indicate closer agreement | Mean difference for all categories was < 0.25 of a scale point between elements and category on a 4-point scale. Consistency between category and element deemed very high for all categories |
The NOTSS v1.2 Handbook
A user handbook (Flin et al. 2006b) was then written which contained background
information on the development of NOTSS, advice for using the system in clinical
practice, definitions and behavioural examples of the NOTSS categories and
elements, a set of rating forms for users, indicative good and poor behaviours
for each element, and advice on how to use the rating scale. Practical tips to help
surgeons embed non-technical skills observations into clinical practice were
included, as was advice for surgical trainers planning to use NOTSS with higher
surgical trainees.
Phase 4: System Usability
A follow-up study was conducted to evaluate system usability with 22 surgical
trainers and their trainees from three Scottish hospitals. The trainers were asked to
use the NOTSS rating form and supporting handbook to rate and provide feedback
to trainees as soon as possible after each of ten cases where the trainee had
contributed significantly to the operation. Inguinal hernia repair and laparoscopic
cholecystectomy were typical operations observed during this trial but it was
recommended that specific use of NOTSS be determined by the educational
needs of the trainees. For example, with junior trainees, the focus of training is
on developing basic surgical expertise, so it was advised that the NOTSS system
be used for general discussion of non-technical skills and their importance to
clinical practice. For more senior trainees such as specialist registrars (SpRs), it
was suggested that the NOTSS system be used to rate skills and provide feedback
during increasingly challenging cases.
Figure 2.2 NOTSS skills taxonomy v1.2
| Category | Elements |
|---|---|
| Situation Awareness | Gathering information; Understanding information; Projecting and anticipating future state |
| Decision-making | Considering options; Selecting and communicating option; Implementing and reviewing decisions |
| Communication and Teamwork | Exchanging information; Establishing a shared understanding; Coordinating team |
| Leadership | Setting and maintaining standards; Supporting others; Coping with pressure |
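
For readers who wish to use the taxonomy in software, for example to generate
electronic rating forms, the categories and elements of Figure 2.2 can be written as
a simple mapping. This is a sketch only: the variable and function names are
hypothetical, and the form layout is an assumption rather than the layout of the
published NOTSS rating form.

```python
# NOTSS v1.2 categories and elements, transcribed from Figure 2.2.
NOTSS_V1_2 = {
    "Situation Awareness": [
        "Gathering information",
        "Understanding information",
        "Projecting and anticipating future state",
    ],
    "Decision-making": [
        "Considering options",
        "Selecting and communicating option",
        "Implementing and reviewing decisions",
    ],
    "Communication and Teamwork": [
        "Exchanging information",
        "Establishing a shared understanding",
        "Coordinating team",
    ],
    "Leadership": [
        "Setting and maintaining standards",
        "Supporting others",
        "Coping with pressure",
    ],
}


def blank_rating_form(taxonomy):
    """Build an empty form: one slot per category and element (None = not yet rated)."""
    return {
        category: {"category_rating": None,
                   "element_ratings": {element: None for element in elements}}
        for category, elements in taxonomy.items()
    }


form = blank_rating_form(NOTSS_V1_2)
form["Leadership"]["element_ratings"]["Supporting others"] = 4  # 4 = good
print(len(form), sum(len(c["element_ratings"]) for c in form.values()))  # 4 12
```

Keeping the taxonomy as data rather than hard-coding it into the form means that a
revision such as the move from v1.1 to v1.2 would require changing only the mapping.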
Most of the consultant surgeons had been trained to use the system in the three-
hour group session for the system evaluation study reported previously. Those
who did not participate in this session were given the same training course in a
one-to-one setting. Trainees attended an information session about non-technical
skills and the usability trial at their hospital. During this session, it was explained
that the NOTSS system has been designed to aid the development of professional
skills and that we were evaluating the system rather than assessing their skills
during the study. An online post-trial questionnaire was used to establish if using
NOTSS was of any value as an adjunct to the currently available surgical education
and assessment methods. An initial invitation to complete it was followed up
with a reminder after one month and a further reminder a month later. Self-report
measures were selected as the most appropriate method of gathering data on user
experiences, although they are not without limitations, as such data are by their nature
subjective and susceptible to memory decay and social desirability bias.
In total, eleven consultant surgeons completed the usability trial. Data on
trainee surgeons were not tracked (to ensure that they were confident that the
purpose of the study was solely to assess the usability of the tool, rather than their
own competence) but analysis of completed feedback forms indicates that at least
12 trainees took part. The NOTSS system was used to observe and debrief on
non-technical skills during a total of 43 cases (mean 4 per consultant, range 1–8
cases). In all cases, the trainee was lead surgeon. In some cases the consultant
was an unscrubbed observer and on other occasions was scrubbed and assisting
as well as observing. The majority of trainers (90 per cent) thought that they had
received enough training to use the system and preferred to conduct the debrief
immediately after the operation (81 per cent) in the operating theatre suite. The
median length of debrief session was 3–5 minutes. See Figure 2.3 for an example
of a NOTSS rating card completed after a laparoscopic cholecystectomy which
mainly focused on the trainee’s ability to gather information about the patient,
communicate decisions to the team and work with the assistant and consultant
surgeon in a coordinated manner.
All trainers used ‘communication & teamwork’, 90 per cent used ‘situation
awareness’, 72 per cent used ‘decision-making’, and just over half (54 per cent)
used the ‘leadership’ category. Some categories were not used by some trainers due
to the level of the trainee and the complexity of the procedure being completed.
The majority of surgical trainers thought that the NOTSS system was useful for
debriefing trainees and a valuable adjunct to currently available assessment tools.
The trainers were all in agreement that NOTSS provided a common language to
discuss non-technical skills and was useful to support reflective practice, but there
were mixed opinions regarding the ease of rating non-technical skills. Although
45 per cent of trainers agreed that cognitive and interpersonal skills were easy
to rate, 27 per cent found interpersonal skills difficult to rate compared with
only 9 per cent who felt cognitive skills were difficult to rate (Yule et al. 2008b).
The remaining trainers were ambivalent regarding ease of rating. Time can be a
precious commodity in the operating theatre but only 9 per cent of trainers thought
using NOTSS to debrief added too much time to their operating list and 73 per cent
thought that routine use of NOTSS would enhance safety in the operating theatre.
All trainers thought that NOTSS has a place in surgical education and assessment.
Figure 2.3 Completed NOTSS rating form
Comments from trainers indicated that positive aspects of the system for surgical
education were the transparent structure; common language; ability to objectively
assess skills; framework for providing feedback; ease of use in real-life situations;
and that using the system made time to discuss aspects of surgical performance
that are ‘usually ignored’. Although some trainers reported no difficulties rating
behaviours using NOTSS, four main problems were articulated. These related to
understanding some descriptors in the NOTSS handbook; selecting an appropriate
trainee and case; observing and rating behaviours while also scrubbed; and an
over-reliance on communication to infer cognitive skills.
Discussion
The aims of the NOTSS project were to develop and evaluate a behavioural marker
system for surgeons’ non-technical skills using human factors methods and basing
the system development and associated rating scale on a skills taxonomy. These
aims were met and the prototype NOTSS system is being used by practising
surgeons and research groups in Australasia, Japan, Europe, and North America.
Further development of the tool is required and there remain some unanswered
questions such as the amount of training required for a practising surgeon to be
able to use the tool reliably, and whether observations and ratings have to be made
by surgeons (as opposed to anaesthetists, nurses or even psychologists) to be valid
and meaningful. A research group at Sheffield (see Chapter 4 of this volume)
are attempting to answer some of these questions. Other research teams have
developed tools to observe and rate the behaviours of surgical teams (Undre et al.
2007 – Imperial College) or have adapted the NOTECHS tool from civil aviation
(Flin et al. 2003) for use with surgeons in the operating theatre (Sevdalis et al. 2008
– Imperial College, Mishra et al. 2008 – University of Oxford). These lines of
research differ in concept and approach but nonetheless enrich our understanding
of non-technical skills in surgery.
The focus of surgical training still heavily favours technical skill acquisition,
yet surgeons increasingly operate in teams with whom they may be unfamiliar,
especially in an emergency setting. The adoption of specific training in
non-technical areas of expertise is still done on an ad hoc basis although the Royal
Colleges of Surgeons in Great Britain and Ireland all provide training in this
emerging area to some extent. These courses have so far been taken by enthusiastic
surgeons, both consultant and trainee, but are not compulsory aspects of surgical
training. The Royal College of Surgeons of Ireland, however, provides funding
for all trainee surgeons to attend a human factors training course. As part of the
NOTSS evaluation, it emerged that training in using the system was not sufficient
for many users as they did not have background knowledge in psychology and
human factors. Therefore, we developed and ran training courses for surgeons,
introducing human factors and the basics of workplace assessment of behaviour.
This developed into a two-day course specifically on the NOTSS system in
2006, run with the Royal College of Surgeons of Edinburgh. This course was
then further developed to include wider surgical safety issues to become the SOS
(Safer Operative Surgery) courses which were run in 2007. These courses were
designed for higher trainee and consultant surgeons only and were based on task
analysis of surgeons’ non-technical skills, the NOTSS behaviour rating system, and
underlying psychology (Flin et al. 2007). In 2008/09 the Royal College of Surgeons
of Edinburgh is developing these courses for a multidisciplinary audience.
The Future of Non-Technical Skills in Surgical Education
Although not formally achieved yet, the future of surgical training will need to
encompass more than just clinical and technical skills (Davidson 2002). If the
aviation model were to be adopted in surgery, then experienced consultant surgeons
would be taken off clinical work for a period to concentrate on assessing other
consultants’ non-technical skills. Assessments would be done using a framework
such as NOTSS to rate observable skills in a simulated environment and during
real cases in the operating theatre (similar to LOSA checks in aviation, see Chapter
25 in this volume by Musson). The assessors would be trained, calibrated, and
their competence to rate others assured at an acceptable a priori level. Crucially,
the assessments would be ‘high stakes’ and surgeons would have to pass the
assessment by displaying appropriate behaviours in order to continue operative
surgery. Surgeons who did not pass would be able to attend a remedial training
course for those skills requiring attention. This would require courses to be
developed (e.g., Flin et al. 2007), and the surgeon to then be assessed at a future
point before being allowed back into clinical practice. This process would apply to
consultant surgeons although senior trainee surgeons would be assessed and given
feedback on their non-technical skills as part of their ongoing training and may
have to pass a non-technical skills assessment as part of the selection process into
consultant grades. Research teams may be involved in the training and assurance
of assessors, instructors and practising surgeons, and would be interested in the
development of measures of behaviour and performance.
This model may not be appropriate for surgery and competence assessment
at this time, but in the near future, recertification will be introduced as a part of
revalidation, which will require global assessment of professional performance
including the skills referred to above. Moreover there are some promising
advancements: research teams are developing and validating observational tools and
collecting data with them, appraisals are commonplace, and the introduction of
Procedure-Based Assessments (PBAs) has demonstrated that there is more to
surgery than technical skills, and that workplace assessment is the method by
which consultant surgeons of the future will be assessed. Perhaps as important
is that in some hospitals non-technical language is becoming common parlance
both intra-operatively and in the coffee room. However, the surgeons who use
behaviour rating scales and discuss non-technical skills with their trainees are still
in the minority. For widespread change in practice to occur, a trigger is required,
such as official endorsement by the Postgraduate Medical Education and Training
Board (PMETB) or the Intercollegiate Surgical Curriculum Programme (ISCP),
or inclusion in the processes of revalidation of doctors, which is currently being
discussed.
The Future of NOTSS Research: Integrating Systemic Issues in the
Operating Theatre
NOTSS has been widely cited in the clinical literature, adopted by professional
bodies for training, and the system is being used by other research groups around
the world. However, a reliance solely on individual skills or even those of the
surgical team will not achieve the levels of safety required by patients. Feedback
from users of the NOTSS system indicated that aspects of surgery such as
scheduling, anaesthetic care, competence and experience of other staff, availability
of equipment in theatre, new technology and training also have an impact on
surgical performance and surgical outcomes. Attention to these components from
systems-based thinking has been found to be particularly useful in understanding
and improving the safety and reliability of complex systems in other
high-consequence industries such as power generation and aviation (Perrow 1999).
There is emerging research on the impact of distractions (Sevdalis et al. 2007) and
latent failures (Catchpole et al. 2007) on patient safety in the operating theatre,
and tools for understanding the systemic causes of adverse events in the operating
theatre (Taylor-Adams and Vincent 2004) but we do not yet have a complete
understanding of the systems aspects that affect patient safety.
The Accreditation Council for Graduate Medical Education in the USA
explicitly demands that resident trainee surgeons obtain specific knowledge,
skills and attributes to demonstrate ‘systems-based practice’ (ACGME, 2007).
Professional skills training needs to incorporate content on systems thinking in
order to meet the demands of modern surgery, and this content should be based
on research evidence. In addition to the dangers that systems pose for safety, there
are also strengths embedded in surgical systems that make surgeons and surgical
teams resilient in the face of dynamic, error-producing conditions. A new project,
funded by the Royal College of Surgeons of Edinburgh, is attempting to make
these aspects of the surgical system explicit and measurable. With this research
strategy, in time we will understand more about individual skills, the role of the
team and how they interact with the system to protect or harm patients, and have
evidence-based tools and training to support the surgeons of the future.
References
ACGME (2007) Common Program Requirements: General Competencies.
Accreditation Council for Graduate Medical Education. Available from:
<www.acgme.org/outcome/comp/GeneralCompetenciesStandards21307.pdf>
[accessed October 2008].
Baldwin, P.J., Paisley, A.M. and Paterson-Brown, S. (1999) Consultant surgeons’
opinions of the skills required of basic surgical trainees. British Journal of
Surgery 86, 1078–82.
Baker, D., Mulqueen, C. and Dismukes, R. (2001) Training raters to assess resource
management skills. In E. Salas, C. Bowers and E. Edens (eds), Improving
Teamwork in Organizations. New Jersey: LEA, 131–45.
Catchpole, K.R., Giddings, A.E.B., Wilkinson, M., Hirst, G., Dale, T. and de Leval,
M. (2007) Improving patient safety by identifying latent failures in successful
operations. Surgery 142, 102–10.
Christian, C., Gustafson, M., Roth, E., Sheridan T., Gandhi, T., Dwyer, K., Zinner,
M. and Dierks, M. (2006) A prospective study of patient safety in the operating
room. Surgery 139, 159–73.
Crandall, B., Klein, G. and Hoffman, R. (2006) Working Minds: A Practitioner’s
Guide to Cognitive Task Analysis. Boston: MIT Press.
Davidson, P. (2002) The surgeon of the future and implications for training. ANZ
Journal of Surgery 72, 822–8.
Edmondson, A.C. (2003) Speaking up in the operating room: How team leaders
promote learning in interdisciplinary action teams. Journal of Management
Studies 40(6), 1419–52.
Flanagan, J. (1954) The critical incident technique. Psychological Bulletin 51,
327–58.
Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2004)
Rating non-technical skills: Developing a behavioural marker system for use
in anaesthesia. Cognition Technology and Work 6, 165–71.
Flin, R., Goeters, K., Amalberti, R., et al. (2003) The development of the
NOTECHS system for evaluating pilots’ CRM skills. Human Factors and
Aerospace Safety 3, 95–117.
Flin, R., Yule, S., McKenzie, L., Paterson-Brown, S. and Maran, N. (2006a)
Attitudes to teamwork and safety in the operating theatre. The Surgeon 4,
145–51.
Flin, R., Yule, S., Paterson-Brown, S., Maran, N. and Rowley, D. (2006b) The Non-
Technical Skills for Surgeons (NOTSS) System Handbook (v1.2). Available at:
<www.abdn.ac.uk/iprc/notss>