

Statistics for Social and Behavioral
Sciences

Series Editor
Stephen E. Fienberg
Department of Statistics, Carnegie Mellon University
Pittsburgh
Pennsylvania
USA


Statistics for Social and Behavioral Sciences (SSBS) includes monographs and
advanced textbooks relating to education, psychology, sociology, political science, public policy, and law.

More information about this series at />

Russell G. Almond • Robert J. Mislevy
Linda S. Steinberg • Duanli Yan
David M. Williamson

Bayesian Networks in
Educational Assessment



Russell G. Almond
Florida State University
Tallahassee
Florida
USA

Duanli Yan
Educational Testing Service
Princeton
New Jersey
USA

Robert J. Mislevy
Educational Testing Service
Princeton
New Jersey
USA

David M. Williamson
Educational Testing Service
Princeton
New Jersey
USA

Linda S. Steinberg
Pennington
New Jersey
USA

Statistics for Social and Behavioral Sciences
ISBN 978-1-4939-2124-9
ISBN 978-1-4939-2125-6 (eBook)
DOI 10.1007/978-1-4939-2125-6
Library of Congress Control Number: 2014958291

Springer New York Heidelberg Dordrecht London
© Springer Science+Business Media New York 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the
whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in
this publication does not imply, even in the absence of a specific statement, that such names
are exempt from the relevant protective laws and regulations and therefore free for general
use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the
publisher nor the authors or the editors give a warranty, express or implied, with respect
to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)


Dedication
Forward into future times we go
Over boulders standing in our way
Rolling them aside because we know
Others follow in our steps one day
Under deepest earth the gems are found
Reaching skyward ’till we grasp the heights
Climbing up to where the view surrounds
Hidden valleys offer new delights
Inch by inch and yard by yard until
Luck brings us to the hidden vale
Desiring a place to rest yet still
Returning home now to tell the tale

Ever knowing when that day does come
New hands will take up work left undone


Acknowledgements

Bayesian Networks in Educational Assessment (BNinEA) is the direct issue
of two projects, and the descendant, cousin, or sibling of many more. We are
grateful for all we have learned in these experiences from many collaborators
and supporters over the years.
The first direct ancestor is the series of workshops we have presented at
the annual meeting of the National Council on Measurement in Education
(NCME) almost every year since 2001. We are grateful to NCME for this
opportunity and to ETS for support in developing the materials they have
granted us permission to use. Workshop participants will recognize many concepts, algorithms, figures, and hands-on exercises from these sessions.
Its second direct ancestor is the Portal project at Educational Testing
Service. It was here that Linda, Russell, and Bob fleshed out the evidence-centered design (ECD) assessment framework, implemented it in an object
model and design system, and carried out the applications using Bayes nets.
We are grateful to Henry Braun and Drew Gitomer, successive vice-presidents
of Research, and Len Swanson, head of New Product Development, for supporting Portal. Our collaborators included Brian Berenbach, Marjorie Biddle,
Lou DiBello, Howie Chernik, Eddie Herskovits, Cara Cahallan Laitusis, Jan
Lukas, Alexander Matukhin, and Peggy Redman.
Biomass (Chaps. 14 and 15) was a ground-up demonstration of Portal,
ECD design, standards-based science assessment, web-delivered interactive
testing, automated scoring of inquiry investigations, and Bayes net measurement models. The Biomass team included Andy Baird, Frank Jenkins,
our subject matter lead Ann Kindfield, and Deniz Senturk. Our subject matter consultants were Scott Kight, Sue Johnson, Gordon Mendenhall, Cathryn
Rubin, and Dirk Vanderklein.
The Networking Performance Skill System (NetPASS) project was an
online performance-based assessment activity for designing and troubleshooting computer networks. It was developed in collaboration with the Cisco Networking Academy Program, and led by John Behrens of Cisco. NetPASS featured principled design of proficiency, task, and evidence models and a Bayes




net psychometric model using the methods described in BNinEA. Team members included Malcolm Bauer, Sarah DeMark, Michael Faron, Bill Frey, Dennis
Frezzo, Tara Jennings, Peggy Redman, Perry Reinert, and Ken Stanley. The
NetPASS prototype was foundational for the Packet Tracer simulation system and Aspire game environment that Cisco subsequently developed, and
millions of Cisco Network Academy (CNA) students around the world have
used operationally to learn beginning network engineering skills.
The DISC scoring engine was a modular Bayes-net-based evidence accumulation package developed for the Dental Interactive Simulations Corporation
(DISC), by the Chauncey Group International, ETS, and the DISC Scoring Team: Barry Wohlgemuth, DISC President and Project Director; Lynn
Johnson, Project Manager; Gene Kramer; and five core dental hygienist members, Phyllis Beemsterboer, RDH, Cheryl Cameron, RDH, JD, Ann Eshenaur,
RDH, Karen Fulton, RDH, and Lynn Ray, RDH. Jay Breyer was the Chauncey
Group lead, and was instrumental in conducting the expert–novice studies and
constructing proficiency, task, and evidence models.
Adaptive Content with Evidence-based Diagnosis (ACED) was the brainchild of Valerie J. Shute. It had a large number of contributors including Larry
Casey, Edith Aurora Graf, Eric Hansen, Waverly Hester, Steve Landau, Peggy
Redman, Jody Underwood, and Diego Zapata-Rivera. ACED development
and data collection were sponsored by National Science Foundation Grant
No. 3013202. The complete ACED models and data are available online; see
the Appendix for details.
Bob’s initial forays into applying Bayesian networks in educational assessment were supported in part by grants from the Office of Naval Research
(ONR) and from the National Center for Research on Evaluation, Standards,
and Student Testing (CRESST) at the University of California at Los Angeles. We are grateful to Charles Davis, Project Officer of ONR’s Model-Based
Measurement program, and Eva Baker, Director of CRESST, for their support. Much of the work we draw on here appears in ONR and CRESST
research reports. The findings and opinions expressed in BNinEA, however,
do not reflect the positions or policies of ONR, the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational
Research and Improvement, or the U.S. Department of Education.
Some of the ideas in this book are based on Russell’s previous work on
the Graphical-Belief project. Thanks to Doug Martin at StatSci for sponsoring that project as well as the NASA Small Business Innovation Research
(SBIR) program for supporting initial development. David Madigan made a
number of contributions to that work, particularly pointing out the importance of the weight of evidence. Graphical-Belief is based on the earlier
work on Belief while Russell was still a student at Harvard. Art Dempster
and Augustine Kong both provided valuable advice for that work. The work
of Glenn Shafer and David Schum in thinking about the representation of
evidence has been very useful as well. Those contributions are documented in
Russell’s earlier book.



Along with the NCME training sessions and working on NetPASS and
DISC, David completed his doctoral dissertation on model criticism in Bayes
nets in assessment at Fordham University under John Walsh, with Russell
and Bob as advisors.
Hydrive was an intelligent tutoring system for helping trainees learn to
troubleshoot the hydraulics subsystems of the F-15 aircraft. Drew Gitomer
was the Principal Investigator and Linda was the Project Manager. The
project was supported by Armstrong Laboratories of the US Air Force, under
the Project Officer Sherrie Gott. Design approaches developed in Hydrive
were extended and formalized in ECD. Bob and Duanli worked with Drew
and Linda to create and test an offline Bayes net scoring model for Hydrive.
Russell and Bob used drafts of BNinEA in classes at Florida State University (FSU) and the University of Maryland, respectively. We received much
helpful feedback from students that clarified our ideas and sharpened our presentations. Students at Maryland providing editorial and substantive contributions included Younyoung Choi, Roy Levy, Junhui Liu, Michelle Riconscente,
and Daisy Wise Rutstein. Students at FSU providing feedback and advice
included Mengyao Cui, Yuhua Guo, Yoon Jeon Kim, Xinya Liang, Zhongtian
Lin, Sicong Liu, Umit Tokac, Gertrudes Velasquez, Haiyan Wu, and Yan Xia.

Kikumi Tatsuoka has been a visionary pioneer in the field of cognitive
assessment, whose research is a foundation upon which our work, and that
of many others in the assessment and psychometric communities, builds. We
are grateful for her permission to use her mixed-number subtraction data in
Chaps. 6 and 11.
Brent Boerlage, of Norsys Software Corp., has supported the book in a
number of ways. First and foremost, he has made the student version of Netica
available for free, which has been exceedingly useful in our classes and online
training. Second, he has offered general encouragement for the project and
offered to add some of our networks to his growing Bayes net library.
Many improvements to a draft of the book resulted from rigorous attention
from the ETS review process. We thank Kim Fryer, the manager of editing
services in the Research and Development division at ETS, Associate Editors
Dan Eignor and Shelby Haberman, and the reviewers of individual chapters:
Malcolm Bauer, Jianbin Fu, Aurora Graf, Shelby Haberman, Yue Jia, Feifei
Li, Johnny Lin, Ru Lu, Frank Rijmen, Zhan Shu, Sandip Sinharay, Lawrence
Smith, Matthias von Davier, and Diego Zapata-Rivera.
We thank ETS for their continuing support for BNinEA and the various
projects noted above as well as support through Bob’s position as Frederic
M. Lord Chair in Measurement and Statistics under Senior Vice-President
for Research, Ida Lawrence. We thank ETS for permission to use the figures
and tables they own and their assistance in securing permission for the rest,
through Juana Betancourt, Stella Devries, and Katie Faherty.
We are grateful also to colleagues who have provided support in more general and pervasive ways over the years, including John Mark Agosta, Malcolm
Bauer, Betsy Becker, John Behrens, Judy Goldsmith, Geneva Haertel, Sidney




Irvine, Kathryn Laskey, Roy Levy, Bob Lissitz, John Mazzeo, Ann Nicholson,
Val Shute, and Howard Wainer.
It has taken longer than it probably should have to complete Bayesian
Networks in Educational Assessment. For their continuing encouragement
and support, we are indebted to our editors at Springer: John Kimmel, who
brought us in, and Jon Gurstelle and Hannah Bracken, who led us out.


Using This Book

An early reviewer urged us to think of this book not as a primer in Bayesian
networks (there are already several good titles available, referenced in this
volume), but to focus instead on the application: the process of building the
model. Our early reviewers also thought that a textbook would be more useful
than a monograph, so we have steered this volume in that direction. In
particular, we have tried to make the book understandable to any reasonably
intelligent graduate student (and several of our quite intelligent graduate
students have let us know when we got too obscure), as this should provide
the broadest possible audience.
To that end, most chapters include exercises at the end. We have found,
through both our classes and the NCME training sessions, that students do
not learn from our lectures or writing (no matter how brilliant) but from
trying to apply what they have heard and read to new problems. We urge
all readers, even those just skimming, to try the exercises. Solutions are
available from Springer or from the authors.
Another thing we have found very valuable in using the volume educationally is starting the students early with a Bayesian network tool. Appendix A
lists several tools and gives pointers to more. Even in the early chapters,
merely using the software as a drawing tool helps get students thinking about
the ideas. Of course, student projects are an important part of any course
like this. Many of the Bayes net collections used in the examples are available
online; Appendix A provides the details.
We have divided the book into three parts, which reflect different levels
of complexity. Part I is concerned with the basics of Bayesian networks, particularly developing the background necessary to understand how to use a
Bayesian network to score a single student. It begins with a brief overview of
ECD. This approach is key to understanding how to use Bayesian networks
as measurement models that are an integral component of assessment design
and use from the beginning, rather than simply as a way to analyze data once it
is in hand. (To do the latter is to be disappointed—and it is not the fault of
Bayes nets!) It ends with Chap. 7, which goes beyond the basics to start to



describe how the Bayesian model supports inference more generally. Part II
takes up the issue of calibrating the networks using data from students.
This is too complex a topic to cover in great depth, but this section explores
parameterizations for Bayesian networks, looks at updating models from data
and model criticism, and ends with a complete example. Part III expands
from the focus on mechanics to embedding the Bayesian network in an assessment system. Two chapters describe the conceptual assessment framework
and the four-process delivery architecture of ECD in greater depth, showing
the intimate connections among assessment arguments, design structures, and
the function of Bayesian networks in inference. Two more chapters are then
devoted to the implementation of Biomass, one of the first assessments to be
designed from the ground up using ECD.
When we started this project, it was our intention to write a companion volume about evidence-centered assessment design. Given how long this
project has taken, that second volume will not appear soon. Chapters 2, 12,
and 13 are probably the best we have to offer at the moment. Russell has
used them with some success as standalone readings in his assessment design
class. Although ECD does not require Bayesian networks, it does involve a
lot of Bayesian thinking about evidence. Readers who are primarily interested
in ECD may find that reading all of Part I and exploring simple Bayes net
examples deepens their understanding of ECD; they can then move to Chaps. 12
and 13 for additional depth, and to the Biomass chapters to see the
ideas in practice.
Several of our colleagues in the Uncertainty in Artificial Intelligence community (the home of much of the early work on Bayesian Networks) have
bemoaned the fact that most of the introductory treatises on Bayesian networks fall short in the area of helping the reader translate between a specific
application and the language of Bayesian networks. Part of the challenge here
is that it is difficult to do this in the absence of a specific application. This
book starts to fill that gap. One advantage of the educational application is
that it is fairly easy to understand (most people having been subjected to
educational assessment at least once in their lives). Although some of the
language in the book is specific to the field of education, much of the development in the book comes from the authors’ attempt to translate the language
of evidence from law and engineering to educational assessment. We hope that
readers from other fields will find ways to translate it to their own work as
well.
In an attempt to create a community around this book, we have created
a Wiki for evidence-centered assessment design ( />ecdwiki/ECD/ECD/). Specific material to support the book, including example
networks and data, is available at the same site ( />BN/BN). We would like to invite our readers to browse the material there and
to contribute (passwords can be obtained from the authors).


Notation

Random Variables
Random variables in formulae are often indicated by capital letters set in italic
type, e.g., X, while a value of the corresponding random variable is indicated
as a lowercase letter, e.g., x.

Vector-valued random variables and constants are set in boldface. For
example, X is a vector valued random variable and x is a potential value
for X.
Random variables in Bayesian networks with long descriptive names are
usually set in italic type when referenced in the text, e.g., RandomVariable.
If the long name consists of more than one word, capitalization is often used
to indicate word boundaries (so-called CamelCase).
When random variables appear in graphs they are often preceded by an
icon indicating whether they are defined in the proficiency model or the evidence model. Variables preceded by a circle (○) are proficiency variables, while
variables preceded by a triangle (△) are defined locally to an evidence model.
They are often but not always observable variables.
The states of such random variables are given in typewriter font, e.g., High
and Low.
Note that Bayesian statistics does not allow fixed but unknown quantities.
For this reason the distinction between variable and parameter in classical
statistics is not meaningful. In this book, the term “variable” is used to refer
to a quantity specific to a particular individual taking the assessment and the
term “parameter” is used to indicate quantities that are constant across all
individuals.

Sets
Sets of states and variables are indicated with curly braces, e.g., {High, Medium,
Low}. The symbol x ∈ A is used to indicate that x is an element of A. The



elements inside the curly braces are unordered, so {A1, A2} = {A2, A1}.
The use of parentheses indicates that the elements are ordered, so that
(A1, A2) ≠ (A2, A1).
The symbols ∪ and ∩ are used for the union and intersection of two sets.
If A and B are sets, then A ⊂ B is used to indicate that A is a proper subset
of B, while A ⊆ B also allows the possibility that A = B.
If A refers to an event, then Ā refers to the complement of the event; that
is, the event that A does not occur.
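These set conventions correspond directly to Python's built-in set and tuple types; a small illustrative sketch (the state names are just examples):

```python
# Sets are unordered; tuples are ordered.
s1 = {"High", "Medium", "Low"}
s2 = {"Low", "High", "Medium"}
assert s1 == s2                      # {A1, A2} = {A2, A1}
assert ("A1", "A2") != ("A2", "A1")  # (A1, A2) ≠ (A2, A1)

a = {"High", "Medium"}
b = {"High", "Medium", "Low"}
assert "High" in a        # x ∈ A
assert a | b == b         # A ∪ B (union)
assert a & b == a         # A ∩ B (intersection)
assert a < b              # A ⊂ B (proper subset)
assert a <= b and b <= b  # A ⊆ B also allows A = B
```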
Ordered tuples indicating vector-valued quantities are indicated with
parentheses, e.g., (x1, . . . , xk).
Occasionally, the states of a variable have a meaningful order. The symbol
≺ is used to state that one state is lower than another. Thus Low ≺ High.
The quantifier ∀x is used to indicate “for all possible values of x.” The
quantifier ∃x is used to indicate that an element x exists that satisfies the
condition.
For intervals of real numbers a square bracket, ‘[’ (‘]’), is used to indicate
that the lower (upper) bound is included in the interval. Thus:
[0, 1] is equivalent to {x : 0 ≤ x ≤ 1}
[0, 1) is equivalent to {x : 0 ≤ x < 1}
(0, 1] is equivalent to {x : 0 < x ≤ 1}
(0, 1) is equivalent to {x : 0 < x < 1}
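These interval conventions are easy to express as membership tests; a minimal sketch (the helper name and parameters are ours, purely illustrative):

```python
def in_interval(x, lo, hi, closed_lo=True, closed_hi=True):
    """Membership test for a real interval.

    closed_lo / closed_hi say whether the square-bracket (included)
    convention applies at the lower / upper endpoint.
    """
    above = x >= lo if closed_lo else x > lo
    below = x <= hi if closed_hi else x < hi
    return above and below

assert in_interval(1.0, 0, 1)                       # 1 ∈ [0, 1]
assert not in_interval(1.0, 0, 1, closed_hi=False)  # 1 ∉ [0, 1)
assert not in_interval(0.0, 0, 1, closed_lo=False)  # 0 ∉ (0, 1]
assert in_interval(0.5, 0, 1, False, False)         # 0.5 ∈ (0, 1)
```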

Probability Distributions and Related Functions
The notation P(X) is used to refer to the probability of an event X. It is also
used to refer to the probability distribution of a random variable X with the
hope that the distinction will be obvious from context.
To try to avoid confusion with the distributions of the parameters of distributions, the term law is used for a probability distribution over a parameter
and the term distribution is used for the distribution over a random variable,
although the term distribution is also used generically.
The notation P(X|Y ) is used to refer to the probability of an event X
given that another event Y has occurred. It is also used for the collection of

probability distributions for a random variable X given the possible instantiations of a random variable Y . Again we hope that this loose use of notation
will be clear from context.
If the domain of the random variable is discrete, then the notation p(X) is
used for the probability mass function. If the domain of the random variable is
continuous, then the notation f (X) is used to refer to the probability density.
The notation E[g(X)] is used for the expectation of the function g(X)
with respect to the distribution P(X). When it is necessary to emphasize



the distribution, then the random variables are placed as a subscript. Thus,
EX [g(X)] is the expectation of g(X) with respect to the distribution P(X)
and EX|Y [g(X)] is the expectation with respect to the distribution P(X|Y ).
The notation Var(X) is used to refer to the variance of the random variable X. For a vector-valued X, Var(X) is a matrix giving Var(Xk) on the diagonal and the
covariance of Xi and Xj in the off-diagonal elements.
If A and B are two events or two random variables, then the notation
A ⊥⊥ B and I(A|∅|B) is used to indicate that A is independent of B. The
notations A ⊥⊥ B | C and I(A|C|B) indicate that A is independent of B
when conditioned on the value of C (or the event C).
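The distinction between marginal and conditional independence can be verified numerically from a joint distribution. Below is a minimal sketch (the probability values are ours, purely illustrative) of a three-variable joint in which A ⊥⊥ B | C holds but A and B are marginally dependent:

```python
from itertools import product

# Given C = c, A and B are independent coin flips with success prob p[c],
# so A ⊥⊥ B | C by construction, but A and B are marginally dependent.
p = {0: 0.2, 1: 0.9}
pc = {0: 0.5, 1: 0.5}

def joint(a, b, c):
    """P(A=a, B=b, C=c) for a, b, c in {0, 1}."""
    pa = p[c] if a else 1 - p[c]
    pb = p[c] if b else 1 - p[c]
    return pc[c] * pa * pb

# Conditional independence: P(a, b | c) = P(a | c) P(b | c) for all values.
for a, b, c in product([0, 1], repeat=3):
    pab_c = joint(a, b, c) / pc[c]
    pa_c = sum(joint(a, bb, c) for bb in [0, 1]) / pc[c]
    pb_c = sum(joint(aa, b, c) for aa in [0, 1]) / pc[c]
    assert abs(pab_c - pa_c * pb_c) < 1e-12

# Marginal dependence: P(a, b) differs from P(a) P(b).
pab = sum(joint(1, 1, c) for c in [0, 1])
pa = sum(joint(1, b, c) for b, c in product([0, 1], repeat=2))
pb = sum(joint(a, 1, c) for a, c in product([0, 1], repeat=2))
assert abs(pab - pa * pb) > 1e-3
```

This is the "common variable dependence" pattern discussed in Chap. 3: conditioning on C renders A and B independent.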
The notation N(μ, σ 2 ) is used to refer to a normal distribution with mean
μ and variance σ 2 ; N+ (μ, σ 2 ) refers to the same distribution truncated at
zero (so the random variable is strictly positive). The notation Beta(a, b) is
used to refer to a beta distribution with parameters a and b. The notation
Dirichlet(a1 , . . . , aK ) is used to refer to K-dimensional Dirichlet distribution
with parameters a1 , . . . , aK . The notation Gamma(a, b) is used for a gamma
distribution with shape parameter a and scale parameter b.
The symbol ∼ is used to indicate that a random variable follows a particular distribution. Thus X ∼ N (0, 1) would indicate that X is a random

variable following a normal distribution with mean 0 and variance 1.

Transcendental Functions
Unless specifically stated otherwise, the expression log X refers to the natural
logarithm of X.
The notation exp X is used for the expression eX , the inverse of the log
function.
The notation logit x, also Ψ(x), is used for the cumulative logistic function:

    logit x = Ψ(x) = e^x / (1 + e^x) .
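As a quick numerical check of this definition (the function name is ours, purely illustrative):

```python
import math

def logistic(x):
    """Cumulative logistic function Psi(x) = e^x / (1 + e^x)."""
    return math.exp(x) / (1 + math.exp(x))

assert logistic(0) == 0.5                               # Psi(0) = 1/2
assert abs(logistic(2) + logistic(-2) - 1.0) < 1e-12    # Psi(x) + Psi(-x) = 1
```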

The notation y!, y factorial, is used for y! = ∏_{k=1}^{y} k, where y is a positive
integer.
The notation Γ(x) is used for the gamma function:

    Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt .

Note that Γ (n) = (n − 1)! when n is a positive integer.
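This identity between the gamma function and the factorial is easy to verify with Python's standard library:

```python
import math

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 8):
    assert abs(math.gamma(n) - math.factorial(n - 1)) < 1e-9
```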
The notation B(a, b) is used for the beta function:

    B(a, b) = ∫_0^1 t^(a−1) (1 − t)^(b−1) dt = Γ(a)Γ(b) / Γ(a + b) .


The binomial coefficient notation (n over y) is used to indicate the combinatorial function n!/((n − y)! y!).
The extended combinatorial function (n over y1, . . . , yK) is used to indicate n!/(y1! · · · yK!),
where Σk yk = n.
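The binomial coefficient is available directly in Python's standard library (math.comb); the extended multinomial version is a short helper (its name is ours, purely illustrative):

```python
import math

# Binomial coefficient: n! / ((n - y)! y!)
assert math.comb(5, 2) == math.factorial(5) // (math.factorial(3) * math.factorial(2))
assert math.comb(5, 2) == 10

def multinomial(n, counts):
    """Extended combinatorial function n! / (y1! ... yK!), with sum(counts) == n."""
    assert sum(counts) == n
    result = math.factorial(n)
    for y in counts:
        result //= math.factorial(y)
    return result

# 4!/(2! 1! 1!) = 12 orderings of the multiset {a, a, b, c}
assert multinomial(4, [2, 1, 1]) == 12
```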
The notation Φ(x) is used for the cumulative unit normal distribution
function.


Usual Use of Letters for Indices
The letter i is usually used to index individuals.
The letter j is usually used to index tasks, with J being the total number
of tasks.
The letter k is usually used to index states of a variable, with K being the
total number of states. The notation k[X] is an indicator which is 1 when the
random variable X takes on the kth possible value, and zero otherwise.
If x = (x1, . . . , xK) is a vector, then x<k refers to the first k − 1 elements of
x, (x1, . . . , x(k−1)), and x>k refers to the last K − k elements, (x(k+1), . . . , xK).
They refer to the empty set when k = 1 or k = K, respectively. The notation x−k refers
to all elements except the kth; that is, (x1, . . . , x(k−1), x(k+1), . . . , xK).
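These sub-vector conventions correspond to simple slices (note the shift from the 1-based indexing used in the text to Python's 0-based indexing; the helper names are ours, purely illustrative):

```python
x = ["x1", "x2", "x3", "x4", "x5"]  # a vector with K = 5 elements

def before(x, k):
    """x<k: the first k - 1 elements (1-based k); empty when k == 1."""
    return x[:k - 1]

def after(x, k):
    """x>k: the last K - k elements; empty when k == K."""
    return x[k:]

def except_k(x, k):
    """x-k: all elements except the kth."""
    return x[:k - 1] + x[k:]

assert before(x, 1) == []
assert after(x, 5) == []
assert before(x, 3) == ["x1", "x2"]
assert after(x, 3) == ["x4", "x5"]
assert except_k(x, 3) == ["x1", "x2", "x4", "x5"]
```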


Contents

Part I Building Blocks for Bayesian Networks
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 An Example Bayes Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Cognitively Diagnostic Assessment . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Cognitive and Psychometric Science . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Ten Reasons for Considering Bayesian Networks . . . . . . . . . . . . . 14
1.5 What Is in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 An Introduction to Evidence-Centered Design . . . . . . . . . . . . . 19
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Assessment as Evidentiary Argument . . . . . . . . . . . . . . . . . . 21
2.3 The Process of Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Basic ECD Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.1 The Conceptual Assessment Framework . . . . . . . . . . . . . . 27
2.4.2 Four-Process Architecture for Assessment Delivery . . . . . 34
2.4.3 Pretesting and Calibration . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3 Bayesian Probability and Statistics: a Review . . . . . . . . . . . . . 41
3.1 Probability: Objective and Subjective . . . . . . . . . . . . . . . . . . 41
3.1.1 Objective Notions of Probability . . . . . . . . . . . . . . . . . . . 42
3.1.2 Subjective Notions of Probability . . . . . . . . . . . . . . . . . . . 43
3.1.3 Subjective–Objective Probability . . . . . . . . . . . . . . . . . . . 45
3.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Independence and Conditional Independence . . . . . . . . . . . . . 51
3.3.1 Conditional Independence . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.2 Common Variable Dependence . . . . . . . . . . . . . . . . . . . . 54
3.3.3 Competing Explanations . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.4 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.1 The Probability Mass and Density Functions . . . . . . . . . . 57
3.4.2 Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . 60



3.5 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.5.1 Re-expressing Bayes Theorem . . . . . . . . . . . . . . . . . . . . . 63
3.5.2 Bayesian Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5.3 Conjugacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5.4 Sources for Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.5.5 Noninformative Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.5.6 Evidence-Centered Design and the Bayesian Paradigm . . . 76

4 Basic Graph Theory and Graphical Models . . . . . . . . . . . . . . . 81
4.1 Basic Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1.1 Simple Undirected Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.1.2 Directed Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.1.3 Paths and Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 Factorization of the Joint Distribution . . . . . . . . . . . . . . . . . . . . . 86
4.2.1 Directed Graph Representation . . . . . . . . . . . . . . . . . . . . . 86
4.2.2 Factorization Hypergraphs . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.2.3 Undirected Graphical Representation . . . . . . . . . . . . . . . . 90
4.3 Separation and Conditional Independence . . . . . . . . . . . . . . . . . . 91
4.3.1 Separation and D-Separation . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.2 Reading Dependence and Independence from Graphs . . 93
4.3.3 Gibbs–Markov Equivalence Theorem . . . . . . . . . . . . . . . . . 94
4.4 Edge Directions and Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5 Other Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5.1 Influence Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4.5.2 Structural Equation Models . . . . . . . . . . . . . . . . . . . . . . . . 99
4.5.3 Other Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5 Efficient Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1 Belief Updating with Two Variables . . . . . . . . . . . . . . . . . . . . . . . 106
5.2 More Efficient Procedures for Chains and Trees . . . . . . . . . . . . . 111
5.2.1 Propagation in Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.2.2 Propagation in Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2.3 Virtual Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.3 Belief Updating in Multiply Connected Graphs . . . . . . . . . . . . . . 122
5.3.1 Updating in the Presence of Loops . . . . . . . . . . . . . . . . . . 122
5.3.2 Constructing a Junction Tree . . . . . . . . . . . . . . . . . . . . . . . 123
5.3.3 Propagating Evidence Through a Junction Tree . . . . . . . 134
5.4 Application to Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.4.1 Proficiency and Evidence Model Bayes Net Fragments . 137
5.4.2 Junction Trees for Fragments . . . . . . . . . . . . . . . . . . . . . . . 139
5.4.3 Calculation with Fragments . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.5 The Structure of a Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.5.1 The Q-Matrix for Assessments Using Only Discrete Items . 146
5.5.2 The Q-Matrix for a Test Using Multi-observable Tasks . 147
5.6 Alternative Computing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 149




5.6.1 Variants of the Propagation Algorithm . . . . . . . . . . . . . . 150
5.6.2 Dealing with Unfavorable Topologies . . . . . . . . . . . . . . . . . 150
6 Some Example Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.1 A Discrete IRT Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.1.1 General Features of the IRT Bayes Net . . . . . . . . . . . . . . . 161
6.1.2 Inferences in the IRT Bayes Net . . . . . . . . . . . . . . . . . . . . . 162
6.2 The “Context” Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.3 Compensatory, Conjunctive, and Disjunctive Models . . . . . . . . . 172
6.4 A Binary-Skills Measurement Model . . . . . . . . . . . . . . . . . . . . . . . 178
6.4.1 The Domain of Mixed Number Subtraction . . . . . . . . . . . 178
6.4.2 A Bayes Net Model for Mixed-Number Subtraction . . . . 180
6.4.3 Inferences from the Mixed-Number Subtraction Bayes Net . 184
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

7 Explanation and Test Construction . . . . . . . . . . . . . . . . . . . . . . . . 197
7.1 Simple Explanation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.1.1 Node Coloring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.1.2 Most Likely Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.2 Weight of Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.2.1 Evidence Balance Sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
7.2.2 Evidence Flow Through the Graph . . . . . . . . . . . . . . . . . . 205
7.3 Activity Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.3.1 Value of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.3.2 Expected Weight of Evidence . . . . . . . . . . . . . . . . . . . . . . . 213
7.3.3 Mutual Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

7.4 Test Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.4.1 Computer Adaptive Testing . . . . . . . . . . . . . . . . . . . . . . . . 216
7.4.2 Critiquing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.4.3 Fixed-Form Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.5 Reliability and Assessment Information . . . . . . . . . . . . . . . . . . . . 224
7.5.1 Accuracy Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.5.2 Consistency Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.5.3 Expected Value Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.5.4 Weight of Evidence as Information . . . . . . . . . . . . . . . . . . 232

Part II Learning and Revising Models from Data
8 Parameters for Bayesian Network Models . . . . . . . . . . . . . . . . . 241
8.1 Parameterizing a Graphical Model . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.2 Hyper-Markov Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
8.3 The Conditional Multinomial—Hyper-Dirichlet Family . . . . . . . 246
8.3.1 Beta-Binomial Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247



8.3.2 Dirichlet-Multinomial Family . . . . . . . . . . . . . . . . . . . . . . . 248
8.3.3 The Hyper-Dirichlet Law . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
8.4 Noisy-OR and Noisy-AND Models . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.4.1 Separable Influence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.5 DiBello’s Effective Theta Distributions . . . . . . . . . . . . . . . . . . . . . 254
8.5.1 Mapping Parent Skills to θ Space . . . . . . . . . . . . . . . . . . . . 256

8.5.2 Combining Input Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.5.3 Samejima’s Graded Response Model . . . . . . . . . . . . . . . . . 260
8.5.4 Normal Link Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
8.6 Eliciting Parameters and Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
8.6.1 Eliciting Conditional Multinomial and Noisy-AND . . . . . 269
8.6.2 Priors for DiBello’s Effective Theta Distributions . . . . . . 272
8.6.3 Linguistic Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
9 Learning in Models with Fixed Structure . . . . . . . . . . . . . . . . . . 279
9.1 Data, Models, and Plate Notation . . . . . . . . . . . . . . . . . . . . . . . . . 279
9.1.1 Plate Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
9.1.2 A Bayesian Framework for a Generic Measurement Model . . . . . . . . 282
9.1.3 Extension to Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
9.2 Techniques for Learning with Fixed Structure . . . . . . . . . . . . . . . 287
9.2.1 Bayesian Inference for the General Measurement Model 288
9.2.2 Complete Data Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
9.3 Latent Variables as Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.4 The EM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
9.5 Markov Chain Monte Carlo Estimation . . . . . . . . . . . . . . . . . . . . . 305
9.5.1 Gibbs Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
9.5.2 Properties of MCMC Estimation . . . . . . . . . . . . . . . . . . . . 309
9.5.3 The Metropolis–Hastings Algorithm . . . . . . . . . . . . . . . . . 312
9.6 MCMC Estimation in Bayes Nets in Assessment . . . . . . . . . . . . . 315
9.6.1 Initial Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
9.6.2 Online Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
9.7 Caution: MCMC and EM are Dangerous! . . . . . . . . . . . . . . . . . . . 324

10 Critiquing and Learning Model Structure . . . . . . . . . . . . . . . . . . 331

10.1 Fit Indices Based on Prediction Accuracy . . . . . . . . . . . . . . . . . . 332
10.2 Posterior Predictive Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
10.3 Graphical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
10.4 Differential Task Functioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
10.5 Model Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
10.5.1 The DIC Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
10.5.2 Prediction Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
10.6 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
10.6.1 Simple Search Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
10.6.2 Stochastic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356



10.6.3 Multiple Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
10.6.4 Priors Over Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
10.7 Equivalent Models and Causality . . . . . . . . . . . . . . . . . . . . . . . . . . 358
10.7.1 Edge Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
10.7.2 Unobserved Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
10.7.3 Why Unsupervised Learning Cannot Prove Causality . . . 360
10.8 The “True” Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
11 An Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
11.1 Representing the Cognitive Model . . . . . . . . . . . . . . . . . . . . . . . . . 372
11.1.1 Representing the Cognitive Model as a Bayesian Network . . . . . . . . 372
11.1.2 Representing the Cognitive Model as a Bayesian Network . . . . . . . . 377
11.1.3 Higher-Level Structure of the Proficiency Model; i.e., p(θ|λ) and p(λ) . . . . . . . . 379
11.1.4 High-Level Structure of the Evidence Models; i.e., p(π) . 381
11.1.5 Putting the Pieces Together . . . . . . . . . . . . . . . . . . . . . . . . 382
11.2 Calibrating the Model with Field Data . . . . . . . . . . . . . . . . . . . . . 382
11.2.1 MCMC Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
11.2.2 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
11.2.3 Online Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
11.3 Model Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
11.3.1 Observable Characteristic Plots . . . . . . . . . . . . . . . . . . . . . 398
11.3.2 Posterior Predictive Checks . . . . . . . . . . . . . . . . . . . . . . . . . 401
11.4 Closing Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

Part III Evidence-Centered Assessment Design
12 The Conceptual Assessment Framework . . . . . . . . . . . . . . . . . . . 411
12.1 Phases of the Design Process and Evidentiary Arguments . . . . . 414
12.1.1 Domain Analysis and Domain Modeling . . . . . . . . . . . . . . 414
12.1.2 Arguments and Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
12.2 The Student Proficiency Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
12.2.1 Proficiency Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
12.2.2 Relationships Among Proficiency Variables . . . . . . . . . . . 428
12.2.3 Reporting Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
12.3 Task Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
12.4 Evidence Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
12.4.1 Rules of Evidence (for Evidence Identification) . . . . . . . . 444
12.4.2 Statistical Models of Evidence (for Evidence
Accumulation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
12.5 The Assembly Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
12.6 The Presentation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458




12.7 The Delivery Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
12.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
13 The Evidence Accumulation Process . . . . . . . . . . . . . . . . . . . . . . . 467
13.1 The Four-Process Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
13.1.1 A Simple Example of the Four-Process Framework . . . . . 471
13.2 Producing an Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
13.2.1 Tasks and Task Model Variables . . . . . . . . . . . . . . . . . . . . . 474
13.2.2 Evidence Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
13.2.3 Evidence Models, Links, and Calibration . . . . . . . . . . . . . 486
13.3 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
13.3.1 Basic Scoring Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
13.3.2 Adaptive Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
13.3.3 Technical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
13.3.4 Score Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
14 Biomass: An Assessment of Science Standards . . . . . . . . . . . . . 507
14.1 Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
14.2 Designing Biomass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
14.2.1 Reconceiving Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
14.2.2 Defining Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
14.2.3 Defining Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
14.3 The Biomass Conceptual Assessment Framework . . . . . . . . . . . . 515
14.3.1 The Proficiency Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
14.3.2 The Assembly Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
14.3.3 Task Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
14.3.4 Evidence Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
14.4 The Assessment Delivery Processes . . . . . . . . . . . . . . . . . . . . . . . . 535

14.4.1 Biomass Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
14.4.2 The Presentation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
14.4.3 Evidence Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
14.4.4 Evidence Accumulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
14.4.5 Activity Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
14.4.6 The Task/Evidence Composite Library . . . . . . . . . . . . . . . 543
14.4.7 Controlling the Flow of Information Among the Processes . . . . . . . . 544
14.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
15 The Biomass Measurement Model . . . . . . . . . . . . . . . . . . . . . . . . . 549
15.1 Specifying Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
15.1.1 Specification of Proficiency Variable Priors . . . . . . . . . . . 552
15.1.2 Specification of Evidence Model Priors . . . . . . . . . . . . . . . 554
15.1.3 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
15.2 Pilot Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
15.2.1 A Convenience Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561



15.2.2 Item and other Exploratory Analyses . . . . . . . . . . . . . . . . 564
15.3 Updating Based on Pilot Test Data . . . . . . . . . . . . . . . . . . . . . . . . 566
15.3.1 Posterior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
15.3.2 Some Observations on Model Fit . . . . . . . . . . . . . . . . . . . . 575
15.3.3 A Quick Validity Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
15.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
16 The Future of Bayesian Networks in Educational Assessment . . . . . . . . 583
16.1 Applications of Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . 583
16.2 Extensions to the Basic Bayesian Network Model . . . . . . . . . . . . 586

16.2.1 Object-Oriented Bayes Nets . . . . . . . . . . . . . . . . . . . . . . . . 586
16.2.2 Dynamic Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . 588
16.2.3 Assessment-Design Support . . . . . . . . . . . . . . . . . . . . . . . . . 592
16.3 Connections with Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
16.3.1 Ubiquitous Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
16.4 Evidence-Centered Assessment Design and Validity . . . . . . . . . . 596
16.5 What We Still Do Not Know . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
A Bayesian Network Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
A.1 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
A.1.1 Bayesian Network Manipulation . . . . . . . . . . . . . . . . . . . . . 602
A.1.2 Manual Construction of Bayesian Networks . . . . . . . . . . . 603
A.1.3 Markov Chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . 603
A.2 Sample Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639


List of Figures

1.1 A graph for the Language Testing Example . . . . . . . . . . . . . . . . . . 5

2.1 The principal design objects of the conceptual assessment framework (CAF) . . . . 27
2.2 The proficiency model for a single variable, Proficiency Level . . 28
2.3 The measurement model for a dichotomously scored item . . . . . . 31
2.4 The four principal processes in the assessment cycle . . . . . . . . . . . 35

3.1 Canonical experiment: balls in an urn . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Graph for Feller’s accident proneness example . . . . . . . . . . . . . . . . 55
3.3 Unidimensional IRT as a graphical model . . . . . . . . . . . . . . . . . . . . 56
3.4 Variables θ1 and θ2 are conditionally dependent given X . . . . . . 56
3.5 Examples of discrete and continuous distributions. a Discrete distribution. b Continuous distribution . . . . 59
3.6 Likelihood for θ generated by observing 7 successes in 10 trials . 65
3.7 A panel of sample beta distributions . . . . . . . . . . . . . . . . . . . . . . . . 68
3.8 A panel of sample gamma distributions . . . . . . . . . . . . . . . . . . . . . 73

4.1 A simple undirected graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 A directed graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 A tree contains no cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.4 Examples of cyclic and acyclic directed graphs . . . . . . . . . . . . . . . 85
4.5 Filling-in edges for triangulation. Without the dotted edge, this graph is not triangulated. Adding the dotted edge makes the graph triangulated . . . . 86
4.6 Directed graph for P(A)P(B)P(C|A, B)P(D|C)P(E|C)P(F|E, D) . . . . 87
4.7 Example of a hypergraph (a) and its 2-section (b) . . . . . . . . . . . 89
4.8 Hypergraph representing P(A)P(B)P(C|A, B)P(D|C)P(E|C)P(F|E, D) . . . . 89
4.9 2-section of Fig. 4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90



4.10 D-Separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.11 Directed graph running in the “causal” direction: P(Skill) P(Performance|Skill) . . . . 95
4.12 Directed graph running in the “diagnostic” direction: P(Performance)P(Skill|Performance) . . . . 96
4.13 A graph showing one level of breakdown in language skills . . . . . 96
4.14 Influence diagram for skill training decision . . . . . . . . . . . . . . . . . . 98

4.15 Graph for use in Exercise 4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.1 Graph for the distribution of X and Y . . . . . . . . . . . . . . . . . . . . . . 107
5.2 The acyclic digraph and junction tree for a four-variable chain . 112
5.3 A junction tree corresponding to a singly-connected graph . . . . . 119
5.4 A polytree and its corresponding junction tree . . . . . . . . . . . . . . . 119
5.5 A loop in a multiply connected graph . . . . . . . . . . . . . . . . . . . . . . . 124
5.6 The tree of cliques and junction tree for Figure 5.5 . . . . . . . . . . . 124
5.7 Acyclic digraph for two-skill example (Example 5.5) . . . . . . . . . . 125
5.8 Moralized undirected graph for two-skill example (Example 5.5) 129
5.9 Two ways to triangulate a graph with a loop . . . . . . . . . . . . . . . . . 130
5.10 Cliques for the two-skill example . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.11 Junction tree for the two-skill example . . . . . . . . . . . . . . . . . . . . . . 132
5.12 Relationship among proficiency model and evidence model Bayes net fragments . . . . 138
5.13 Total acyclic digraph for three-task test . . . . . . . . . . . . . . . . . . . . . 141
5.14 Proficiency model fragments for three-task test . . . . . . . . . . . . . . . 141
5.15 Evidence model fragments for three-task test . . . . . . . . . . . . . . . . 141
5.16 Moralized proficiency model graph for three-task test . . . . . . . . . 142
5.17 Moralized evidence model fragments for three-task test . . . . . . . . 142

6.1 Model graph for five item IRT model . . . . . . . . . . . . . . . . . . . . . . . 160
6.2 The initial probabilities for the IRT model in Netica. The numbers at the bottom of the box for the Theta node represent the expected value and standard deviation of Theta . . . . 162
6.3 a Student with Item 2 and 3 correct. b Student with Item 3 and 4 correct . . . . 164
6.4 Probabilities conditioned on θ = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.5 Five item IRT model with local dependence . . . . . . . . . . . . . . . . . . 168
6.6 a Student with Item 2 and Item 3 correct with context effect. b Student with Item 3 and Item 4 correct with context effect . . . . 169
6.7 Three different ways of modeling observable with two parents . . 173
6.8 Initial probabilities for three distribution types . . . . . . . . . . . . . . . 174
6.9 a Updated probabilities when Observation = Right. b Updated probabilities when Observation = Wrong . . . . 175
6.10 a Updated probabilities when P1 = H and Observation = Right. b Updated probabilities when P1 = H and Observation = Wrong . . . . 176