Prediction of Short-Term Outcomes During
Critical Illness in Children


Physiologic instability is a key factor in the prediction of short-term outcomes in critically ill patients.
Prediction tools are central to controlling for severity of illness in studies and unit-based quality assessments for both internal and external benchmarking.
Regression analysis is typically the central technique for constructing outcomes prediction tools.
Assessment of the validity of prediction tools centers on two statistical measures: discrimination and calibration.
Although mortality has historically been the outcome of interest, prediction tools for morbidity have recently been developed, as well as for clinical outcomes such as length of stay and reintubation.

Ongoing efforts to provide high-quality, error-free care require
both the evaluation of complex systems and an assessment of the
quality of care. Outcomes research is an important aspect of both
requirements. Scoring systems add objectivity to these assessments, especially in critical care units. Controlling for population
differences, such as differences in severity of illness, enables both
the inclusion of different healthcare systems in a single investigative effort and contrasting individual healthcare systems in quality
of care assessments. Measuring mortality adjusted for physiologic
status and other case mix factors has been the core methodology
of adult, pediatric, and neonatal intensive care assessments for
decades for both internal and external benchmarking.
However, mortality rates in most pediatric intensive care units
(PICUs) have decreased since these methods were developed.
Medical therapies increasingly focus on reducing morbidity in
survivors. Unfortunately, most quantitative outcome assessment
methods continue to focus on the dichotomous outcomes of survival and death. Recently, there has been a new appreciation of the
importance of other patient outcomes, such as discharge functional status, and better understanding of their determinants. The
future will most likely see a diversity of patient outcomes of interest, methods to associate risk factors with these outcomes, and use
of these risk factors for outcome prediction.

Historical Perspective
The “modern” history of intensive care unit (ICU) scoring systems
started with the Clinical Classification Scoring (CCS) system and
the Therapeutic Intervention Scoring System (TISS).1 Although
simple, the CCS system established the basis of severity of illness
as a concept related to both physiologic instability and amount
and intensity of therapy, ranging from routine inpatient care to
the need for frequent physician and nursing assessments and/or
therapeutic interventions. The TISS was based on the concept
that sicker patients receive more therapy, such as mechanical ventilation or vasoactive agent infusions; thus, the number and sophistication of therapies serves as a proxy for severity of illness.
Initially, 76 therapies and monitoring techniques were graded
from 1 to 4 on the basis of complexity, skill, and cost. The TISS
score still exists today, although the number of therapies has been
reduced and objectivity has been added to the score.2
The concepts of sequential or multiple organ system failures
(MOSFs) were also important in the development of the concepts
of severity of illness. Mortality rates increased as the number of
failed organ systems increased. The MOSF syndrome was initially
described in children in 1986.3 Although there have been numerous minor adjustments to the definition of an organ system failure, it continues to be based on the initial concepts of failure defined as extreme physiologic dysfunction or use of a therapy
preventing that dysfunction.
Organ system failures have also been proposed as an outcome
measure; since death is uncommon in PICUs, it is appealing to
postulate that the number of organ failures or the temporal resolution of these organ failures could be a practical outcome. New
or progressive multiple-organ dysfunction has been used as an
outcome measure for large recently completed and ongoing studies.4,5 Additionally, recent studies have examined the relationship
between the number of dysfunctional organ systems and patient
outcomes, including in general pediatric critical care patients6 as
well as subgroups of patients with severe sepsis7 or bone marrow
transplantation.8

CHAPTER 12  Prediction of Short-Term Outcomes During Critical Illness in Children

Physiologic status is the underlying foundational concept for
MOSF and the TISS score. Conceptually, severity of illness may
be considered a continuous variable with extremes of outcomes
(survival, death) occurring at low and high values. The threshold
value determining survival or death is unknown and may vary
from patient to patient. Physiologic instability has been an exceptionally productive concept expressed in multiple scoring systems
in pediatric, neonatal, and adult intensive care with systems such
as the Pediatric Risk of Mortality (PRISM) score, Score for Neonatal Acute Physiology (SNAP), Acute Physiology and Chronic
Health Evaluation (APACHE), and many others.
Recently, the development of new morbidity during critical
illness has also been related to physiologic instability, with the
morbidity risk rising as the instability increases until, at higher
states of instability, high morbidity risk transitions to mortality
risk. Interest in and investigation of morbidity have been hindered by the lack of measurement methods that are reliable, relevant, and practical for large studies. The development of the
Functional Status Scale and its use in a national study of more
than 10,000 critically ill children hold promise that morbidity
will be a more important and relevant outcome in critical care
assessments.9 Since its publication, the Functional Status Scale has
been used to measure outcomes in general PICU patients as well
as subgroups of children with traumatic brain injury and other
traumatic injuries, those undergoing stem cell transplantation,
and those requiring extracorporeal membrane oxygenation.

Conceptual Framework
When possible, the severity method should include variables fundamental to the issues being assessed. The fundamental role of
pediatric critical care has been to monitor and treat physiologic
instability. The development of severity measures has mirrored
this role, first as descriptive categories, then as quantification of
therapy designed to treat physiologic instability, and, finally, with
physiologic instability itself as the foundational concept. Databases have become larger, and the availability of descriptive, categorical, and diagnostic data that they contain has increased.
These data can also be associated with severity of illness and are
being used for quality measures such as standardized mortality
ratios and measures of severity of illness in academic studies.
However, variables such as diagnosis and operative status are
proxy variables whose risk estimation is, at least in part, one or
more steps removed from physiologic status. Therefore, they are
only indirect measures of severity that are vulnerable to “gaming”
to alter an individual site’s results. Methods based on primarily
categorical data often do not perform well across variable critical
care environments.
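The standardized mortality ratio mentioned above can be illustrated with a minimal sketch. It assumes that each patient already has a model-estimated probability of death, so that expected deaths are the sum of those probabilities. The function name and the ten-patient data set are hypothetical and chosen only for illustration.

```python
def standardized_mortality_ratio(predicted_risks, died):
    """Observed deaths divided by expected deaths (sum of predicted risks)."""
    if len(predicted_risks) != len(died):
        raise ValueError("inputs must have the same length")
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    observed = sum(1 for d in died if d)
    return observed / expected


# Hypothetical unit: 10 admissions with invented model-estimated risks.
risks = [0.01, 0.02, 0.02, 0.05, 0.05, 0.10, 0.15, 0.20, 0.40, 0.50]
outcomes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

smr = standardized_mortality_ratio(risks, outcomes)
print(f"expected deaths = {sum(risks):.2f}, observed = {sum(outcomes)}, SMR = {smr:.2f}")
```

A ratio above 1 means more deaths were observed than the model expected for that case mix. The ratio is only as trustworthy as the severity model that supplies the expected deaths.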

Statistical Issues
Regression analysis is typically the central technique for constructing outcome prediction tools. The type of outcome variable
(e.g., continuous, dichotomous) is one determinant of the type of
regression analysis used. Multiple linear regression analysis is most
often used for models that seek to predict outcomes that are continuous variables (e.g., length of stay). Logistic regression analysis
is most often used for models that seek to predict outcomes that
are categorical variables (e.g., survival/death).
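The distinction between the two regression types can be sketched in a few lines. The example below is an illustration only, not the method used to build any published score. It assumes a single hypothetical predictor (an invented "instability score") and fits a least-squares line to a continuous outcome and a logistic model to a dichotomous one. The logistic model is fit by plain gradient descent, which was chosen here for brevity; statistical packages typically use other fitting procedures.

```python
import math


def fit_linear(xs, ys):
    """Least-squares line y = a + b*x for a continuous outcome."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sigmoid(z):
    """Map a linear predictor onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.05, steps=20000):
    """Logistic model P(y=1) = sigmoid(b0 + b1*x), fit by gradient descent
    on the mean negative log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad0
        b1 -= learning_rate * grad1
    return b0, b1


# Hypothetical data, invented for illustration only.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
length_of_stay_days = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5, 9.0]
died = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

a, b = fit_linear(scores, length_of_stay_days)   # continuous outcome
b0, b1 = fit_logistic(scores, died)              # dichotomous outcome
risk_low, risk_high = sigmoid(b0 + b1 * 1), sigmoid(b0 + b1 * 8)

print(f"length of stay = {a:.2f} + {b:.2f} * score")
print(f"predicted mortality risk at score 1: {risk_low:.2f}, at score 8: {risk_high:.2f}")
```

The linear model returns a value on the outcome's own scale (days), whereas the logistic model returns a probability, which is why it suits survival/death outcomes.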

As data science applications in medicine have become more
sophisticated and datasets have become larger, many areas of
analysis have incorporated the use of machine-learning models to
understand and predict patient outcomes. These can generally be
thought of as a continuum of methods to approach data with different strengths and weaknesses. Traditional statistical analysis
typically attempts to assign a relationship between a set of variables within a sample, while machine learning attempts to generate a function or pattern that can be generalized for prediction. In
general, the data characteristics assumed in a machine-learning
approach will be less restrictive than those for traditional statistical modeling. Finally, machine-learning approaches are especially
well suited to large datasets, while traditional statistical modeling
becomes more unwieldy with more complex inputs. However, as
with traditional statistical tests, machine-learning algorithms each
have unique characteristics that impact overall performance.
Regardless of how a prediction tool is created, the assessment
of its validity centers on two statistical measures: discrimination
and calibration.10 Discrimination is the accuracy of a model in
differentiating outcome groups and is most often assessed by the
area under the receiver operating characteristic curve (AUC),
which is equivalent to the C statistic. Broadly, this represents the
average sensitivity of the test when modeled over all possible
specificities. An AUC = 1 represents a model with perfect accuracy; an AUC = 0.5 represents a model with no apparent accuracy. A rough guide for model discriminatory performance is as
follows: AUC = 0.9–1.0 (excellent), 0.8–0.9 (good), 0.7–0.8
(fair), 0.6–0.7 (poor), and 0.5–0.6 (unacceptable).
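As a minimal sketch of how discrimination can be computed, the AUC equals the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function names and the eight-patient data set below are hypothetical. The handling of band boundaries (a boundary value is assigned to the higher band) and of values below 0.5 are assumptions, since the rough guide above does not specify them.

```python
def auc(predicted, died):
    """Probability that a random death outranks a random survivor
    (ties count one half); equivalent to the C statistic."""
    pos = [p for p, y in zip(predicted, died) if y == 1]
    neg = [p for p, y in zip(predicted, died) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rough_guide(value):
    """Label from the rough guide in the text; boundary handling is an assumption."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    if value >= 0.6:
        return "poor"
    return "unacceptable"


# Hypothetical predicted risks and outcomes, invented for illustration.
predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
died = [0, 0, 0, 1, 0, 0, 1, 1]

model_auc = auc(predicted, died)
print(f"AUC = {model_auc:.3f} ({rough_guide(model_auc)})")
```

The pairwise comparison is quadratic in the number of patients, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large datasets.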
Calibration refers to the ability of a model to assign the correct probability of outcome to patients over the entire range of
risk prediction. In practical terms for an outcome such as mortality, calibration assesses whether the model-estimated probability
of mortality for patients with a particular covariate pattern agrees
with the actual observed mortality rate. The most accepted
method for measuring calibration is the Hosmer-Lemeshow
goodness-of-fit test. Although the AUC is helpful in determining
overall characteristics of the test, it does not allow for comparison
between the individual specificity or sensitivity of the test. Additionally, as researchers use large datasets more commonly, an
artificial increase in the AUC may be seen due to the large sample
sizes, particularly when the model is overfit. The use of positive
predictive values, which incorporate the prevalence of the queried condition, may better represent the performance of predictive models. Finally, the AUC may not be fully representative of
unbalanced patient samples. This is a particular concern with
outcomes such as mortality in pediatric critical care, which occur
relatively rarely. It remains important to consider a variety of test
characteristics when assessing the suitability of a specific test or model.
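Two of the quantities discussed above can be sketched briefly. The first function computes the Hosmer-Lemeshow statistic by sorting patients by predicted risk, splitting them into equal-sized groups, and comparing observed with expected deaths in each group; the statistic is conventionally referred to a chi-square distribution with (groups − 2) degrees of freedom. The second function shows how the positive predictive value depends on prevalence. All names and numbers are hypothetical, and the two-group example is deliberately tiny so that the arithmetic can be checked by hand.

```python
def hosmer_lemeshow(predicted, observed, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic over risk-ordered groups."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        events = sum(y for _, y in chunk)
        denominator = expected * (1.0 - expected / size)
        if denominator > 0:
            statistic += (events - expected) ** 2 / denominator
    return statistic


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive predictions that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical cohort: 20 patients predicted at 10% risk, 20 at 50% risk.
predicted = [0.1] * 20 + [0.5] * 20
well_calibrated = [1] * 2 + [0] * 18 + [1] * 10 + [0] * 10   # 2 and 10 deaths
poorly_calibrated = [1] * 6 + [0] * 14 + [1] * 10 + [0] * 10  # 6 and 10 deaths

hl_good = hosmer_lemeshow(predicted, well_calibrated, groups=2)
hl_poor = hosmer_lemeshow(predicted, poorly_calibrated, groups=2)
print(f"Hosmer-Lemeshow statistic: calibrated {hl_good:.3f}, miscalibrated {hl_poor:.3f}")

# Same test characteristics, rare versus common outcome.
ppv_rare = positive_predictive_value(0.9, 0.9, prevalence=0.02)
ppv_common = positive_predictive_value(0.9, 0.9, prevalence=0.50)
print(f"PPV at 2% prevalence: {ppv_rare:.2f}; at 50% prevalence: {ppv_common:.2f}")
```

The second comparison illustrates the concern raised above about rare outcomes: with identical sensitivity and specificity, most positive predictions are false when the outcome is uncommon.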
An important issue in developing and evaluating severity models is the population used to derive and validate the method. The
models are based on the populations used to develop them. For
example, the Vermont Oxford Neonatal outcome predictor was
developed in a large population from inborn nurseries and has
been criticized for its lack of applicability to referral centers. The
Paediatric Index of Mortality (PIM) and its subsequent updates
(PIM2 and PIM3) were developed in predominantly Australian
and European populations where the relationship of categorical
and physiologic variables to outcome may be different than in the
United States or developing countries.


SECTION II  Pediatric Critical Care: Tools and Procedures

Current Prediction Tools for Assessment of
Mortality Risk
Neonatal Intensive Care Unit Prediction Methods
Three well-established prediction methods are used for the assessment of severity of illness and mortality risk in neonates: the
Clinical Risk Index for Babies II (CRIB II),11 SNAP-II,12 and the
Vermont Oxford Network risk adjustment.13 All scores can be
calculated during the first 12 hours of life.
CRIB II is the second generation of CRIB, which was developed in the United Kingdom from 812 neonates born at less than
31 weeks’ gestation or weighing less than 1500 g.14 CRIB II is a
simplified version of CRIB, validated on 3027 neonates born at
32 weeks’ gestation or less. It is a five-item score composed of sex,
gestation, birth weight, admission temperature, and worst base
excess in first 12 hours of life.
SNAP-II is the second generation of SNAP, which was a
physiology-based severity of illness score with 34 variables for
babies of all birth weights from the United States and Canada.15
SNAP-II simplified SNAP to six physiologic variables: mean
blood pressure, lowest temperature, Pao2/Fio2 ratio, lowest serum
pH, seizure activity, and urine output. In an effort to improve the
predictive capabilities of SNAP-II for mortality, three additional
variables were added: birth weight, small for gestational age, and
Apgar (appearance, pulse, grimace, activity, and respiration) score
below 7 at 5 minutes. The resulting nine-variable score for prediction of mortality risk was named Score for Neonatal Acute Physiology with Perinatal Extension (SNAPPE-II).
The Vermont Oxford Network is a network of more than 800
institutions worldwide that maintains databases on interventions
and outcomes for infants cared for at member institutions. The
basic Vermont Oxford Network risk adjustment model includes
variables for gestational age, race, sex, location of birth, multiple
birth, 1-minute Apgar score, small for gestational age, major birth
defect, and mode of delivery, with additional features included in
prediction models for very- and extremely-low-birth-weight infants, those with chronic lung disease, or those with birth defects.
Revalidation efforts of these tools employing a variety of data
sources have demonstrated largely similar discriminatory abilities
among the tools. Using data from the Vermont Oxford Network,
Zupancic et al. validated SNAPPE-II on nearly 10,000 infants
with similar performance to the Vermont Oxford Network risk
adjustment.16 Within this study cohort, the addition of congenital anomalies to SNAPPE-II improved discrimination significantly. Reid et al. compared CRIB-II and SNAPPE-II in a cohort
of Australian preterm infants and found similar performance between the tools and good overall discriminatory ability.17

Pediatric Intensive Care Unit Prediction Tools
The prediction of mortality in the PICU has centered primarily
on the use of two different acuity scoring systems, the PRISM
score18 and the PIM3.19 Historically, these systems have been
thought to be quite effective in discrimination but to lack robust
calibration.
PRISM is a fourth-generation physiology-based score for quantifying physiologic status and mortality prediction (Table 12.1). The
original tool was developed on 11,165 patients from 32 different
PICUs in the United States and includes 21 physiologic variables.
The mortality predictions are routinely updated, the last update
being completed on 19,000 patients. Among PRISM’s strengths are
its flexibility to extend beyond mortality prediction to provide risk-adjusted PICU length-of-stay estimates.20,21 Historically, PRISM
mortality risk assessments were made using physiologic data from
the initial 12 hours of PICU care. Notably, PRISM quantifies
physiologic status and uses categorical variables to facilitate accurate
estimation of mortality risk.
Recently, the Collaborative Pediatric Critical Care Research
Network (CPCCRN) of the National Institute of Child Health
and Human Development used data from more than 10,000 patients to improve PRISM by reducing bias and other potential
sources of error.22 The new version of PRISM uses only the first
PICU admission, and hospital outcome is predicted. Initially,
PRISM used PICU outcome and subsequent PICU admissions in
the same hospitalizations with additional mortality risk. However,
decisions around discharge timing and location are important
aspects of quality of care. For example, an inappropriately discharged PICU patient with a subsequent PICU readmission during the same hospitalization was previously credited as a good
outcome for the first admission, while the subsequent admission
had an additional mortality risk credited to the subsequent PICU
admission mortality risk. Therefore, the subsequent PICU admission mortality risk was inflated even though it was associated with
the premature or inappropriate discharge.
Second, the PRISM observation time period has changed from
the sampling period for the first 12 hours of care to a significantly
shorter time period (2 hours before admission to 4 hours after
admission for laboratory data and the first 4 hours of PICU care
for other physiologic variables) since this better represents the patient’s underlying physiology instead of response to therapy.23
Third, admission of cardiovascular surgery patients for “optimizing” therapy or observation before their intervention is now common in many institutions, which necessitated a new definition of
the PRISM observation period. An objective method to determine
the PRISM observation for cardiovascular patients is now available. Finally, when PRISM was initially developed, the scores for
physiologic derangements for each variable were calibrated to mortality odds ratios so that the PRISM score for each variable represented equivalent risk. Due to concerns that these variables may no
longer represent equivalent risk, the new PRISM algorithm partitions PRISM into neurologic and nonneurologic components for
outcome prediction. PRISM algorithms for mortality prediction
and morbidity prediction are publicly available.21,23
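The revised observation period described above can be expressed as a simple time-window check. This is a minimal sketch, not the published PRISM algorithm. The function and constant names are hypothetical, the treatment of the window boundaries as inclusive is an assumption, and the separate rules for cardiovascular surgery patients are deliberately not modeled.

```python
from datetime import datetime, timedelta

# Windows relative to PICU admission, as described in the text.
LAB_WINDOW = (timedelta(hours=-2), timedelta(hours=4))
PHYSIOLOGIC_WINDOW = (timedelta(0), timedelta(hours=4))


def in_observation_window(kind, measured_at, picu_admission):
    """Return True if a measurement falls inside the observation period.

    kind: "laboratory" or "physiologic" (hypothetical labels).
    Boundaries are treated as inclusive, which is an assumption.
    """
    if kind == "laboratory":
        start, end = LAB_WINDOW
    elif kind == "physiologic":
        start, end = PHYSIOLOGIC_WINDOW
    else:
        raise ValueError(f"unknown measurement kind: {kind}")
    offset = measured_at - picu_admission
    return start <= offset <= end


# Hypothetical admission time and measurements.
admitted = datetime(2024, 1, 1, 12, 0)
lab_before = in_observation_window("laboratory", admitted - timedelta(hours=1), admitted)
vitals_before = in_observation_window("physiologic", admitted - timedelta(hours=1), admitted)
lab_late = in_observation_window("laboratory", admitted + timedelta(hours=5), admitted)
vitals_early = in_observation_window("physiologic", admitted + timedelta(hours=3), admitted)

print(lab_before, vitals_before, lab_late, vitals_early)
```

Keeping the two windows as named constants makes the asymmetry explicit: laboratory values drawn shortly before admission count, whereas other physiologic values are taken only from PICU care.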
The PIM3 mortality prediction model was developed from
53,112 patients from 60 PICUs in Australia, New Zealand,
Ireland, and the United Kingdom (Table 12.2). PIM3 requires
10 variables collected from the time of initial patient contact to 1
hour after arrival in the PICU.19 In contrast with PRISM III,
PIM3 uses only four physiologic variables but includes six categorical variables that classify patients on the basis of reason for
admission, use of mechanical ventilation in the first hour, and
diagnostic risk strata. PIM3 has not been extensively tested in the
United States.
Numerous obvious differences distinguish PRISM III from
PIM3 (e.g., interval for data collection, number of physiologic
variables, inclusion of nonphysiologic data). The impact that these
differences have on mortality prediction in the form of bias must
be considered.10,21 Foundationally, PRISM quantifies physiologic
instability (PRISM III score) and uses categorical variables to facilitate accurate estimation of mortality risk while PIM estimates
mortality risk only. PIM has not performed well in cardiovascular
surgical populations in which outcome is strongly associated with
postoperative physiologic status.24 PIM3 uses a 1-hour (vs. 4 hours
for PRISM III) PICU observation time, which might imply that it
is potentially less affected by PICU therapies. However, the variable
observation period before PICU admission could impose significant institution-level bias on the basis of the percent of patients
transported to the PICU from other locations and involvement of
the PICU team in the transport or emergency department care.
PIM3 includes a therapeutic intervention (mechanical ventilation)
as a predictor variable that introduces bias from the prehospital and
emergency department settings and introduces a therapy into the
score when the use of the score to evaluate quality of care is closely
related to the provision of therapy.

TABLE 12.1  Pediatric Risk of Mortality (PRISM) Score IV
For computation of mortality and morbidity risk, physiologic variables are measured only in the first 4 hours of pediatric intensive care unit (PICU) care and
laboratory variables are measured in the time period from 2 hours before PICU admission through the first 4 hours. See references for the appropriate
time periods to assess cardiovascular surgical patients younger than 3 months of age. The neurologic PRISM IV consists of the mental status and pupillary
reflex parameters. Only the first PICU admission is scored. Check publications for the most up-to-date prediction algorithms.

[Table body only partly recoverable; surviving entries are listed, without age strata or point values.]
Systolic blood pressure (mm Hg)
Heart rate (beats/min)
Mental status: stupor/coma or GCS <8
Pupillary reflexes: one fixed; both fixed
Acidosis (pH or total CO2)
Pco2 (mm Hg)
Total CO2 (mmol/L)
Pao2 (mm Hg)
Potassium (mmol/L)
Blood urea nitrogen
White blood cell count (cells/mm3): <3000
Platelet count (×10^3 cells/mm3): 100–200
PT or PTT (sec): PT >22 or PTT >85; all other ages, PT >22 or PTT >57
Thresholds whose row labels did not survive: >200 mg/dL or >11; <33°C or >40°C; >11.9 mg/dL or >4.3; all other ages, >14.9 mg/dL or >5.4; >0.85 mg/dL or >75; >0.90 mg/dL or >80; >0.90 mg/dL or >80; >1.30 mg/dL or >115

GCS, Glasgow Coma Scale; PT, prothrombin time; PTT, partial thromboplastin time.
