Alignment

Alignment of Standards, Large-scale Assessments, and Curriculum:
A Review of the Methodological and Empirical Literature
Meagan Karvonen
Western Carolina University
Shawnee Wakeman and Claudia Flowers
University of North Carolina at Charlotte

Support for this research is provided by the National Alternate Assessment Center
(www.naacpartners.org) a five-year project funded by the U.S. Department of
Education, Office of Special Education Programs (No. H324U040001). The NAAC
represents a collaborative effort between the University of Kentucky, University of
North Carolina at Charlotte (UNCC), National Center on Educational Outcomes
(NCEO), the Center for Applied Special Technology (CAST), and the University of
Illinois at Urbana-Champaign. The opinions expressed do not necessarily reflect the
position or policy of the Department of Education, and no official endorsement
should be inferred.

Abstract
The purpose of this study was to provide a comprehensive review of the literature
on the alignment of academic content standards, large-scale assessments, and
curriculum. After reviewing the characteristics of 195 identified resources on
alignment published between 1984 and 2005, this review primarily focused on (1) a
comparison of features of alignment models and their methodologies, and (2) a
narrative and quantitative analysis of characteristics of 67 empirical alignment
studies. Based on this review, several recommendations for further research and
improvements in alignment technology were made.


Alignment of Standards, Large-scale Assessments, and Curriculum:
A Review of the Methodological and Empirical Literature
The educational community sometimes assumes that instructional systems
are driven by content standards, which are translated into assessment, curriculum
materials, instruction, and professional development. Research has shown that
teachers may understand what content is wanted and believe they are teaching
that content, when in fact they are not (Cohen, 1990; Porter, 2002). Improvements
in student learning depend on how well assessment, curriculum, and instruction
are aligned and reinforce a common set of learning goals, and on whether
instruction shifts in response to the information gained from assessments (National
Research Council, 2001). Alignment is often difficult to achieve because educational
decisions are frequently made at different levels of the educational agency. For
example, states may have one set of experts who develop written standards, a
second set of experts who develop the assessment, and a third set of experts who
train teachers in standards-based instruction. Finally, it is teachers who translate
academic standards into instruction.
In 1994 the Improving America’s Schools Act and Title I of the Elementary
and Secondary Education Act required states to set high expectations for student
learning, to develop assessments that measure those expectations, and to create
systems that hold educators accountable for student achievement. The No Child
Left Behind Act (2002) reiterated this emphasis on quality assessment of student
achievement; final NCLB regulations require that states’ assessment systems
“address the depth and breadth of the State’s academic content standards; are
valid, reliable, and of high technical quality; and express results in terms of the
State’s academic achievement standards” (55 Fed. Reg. 45038, emphasis added).
NCLB peer review guidance (U.S. Department of Education, 2004) indicates that
judgments about the compliance of states’ assessment systems with Title I
requirements will be made based on evidence submitted by states (e.g., alignment
studies) rather than on the assessments themselves. The Guidance further recommends
that states consider the following points about their assessments:

o Cover the full range of content specified in the State’s academic content
standards, meaning that all of the standards are represented legitimately in
the assessments; and

o Measure both the content (what students know) and the process (what
students can do) aspects of the academic content standards; and

o Reflect the same degree and pattern of emphasis apparent in the academic
content standards (e.g., if the academic content standards place a lot of
emphasis on operations then so should the assessments); and

o Reflect the full range of cognitive complexity and level of difficulty of the
concepts and processes described, and depth represented, in the State’s
academic content standards, meaning that the assessments are as
demanding as the standards; and

o Yield results that represent all achievement levels specified in the State’s
academic achievement standards. (U.S. Department of Education, 2004, p.
41)
These issues should be considered in the alignment of the state’s entire assessment
system, including assessments for students with disabilities and English language
learners. Low complexity methods, such as simply mapping assessment items back
to state content standards, are insufficient for peer review purposes (U.S.
Department of Education, 2004, p. 41).


Alignment can be formally defined as the degree of agreement, overlap, or
intersection between standards, instruction, and assessments. In other words,
alignment is the match between the written, taught, and tested curriculum (Flowers,
Browder, Ahlgrim-Delzell, & Spooner, in press). Accurate inferences about student
achievement and growth over time can only be made when there is alignment
between the standards (expectations) and assessments. From this perspective,
alignment has both content and consequential validity implications (Bhola, Impara,
& Buckendahl, 2003; La Marca, Redfield, Winter, Bailey, & Despriet, 2000).
The consequences of poorly aligned standards, assessments, and curriculum
are potentially significant for students and educational systems. Aligning curriculum
with assessments can result in improved test scores for students regardless of
background variables such as socioeconomic status, race, and gender. In contrast,
misalignment may reinforce differences among students based on their
sociocultural backgrounds, as those with more exposure to educational
opportunities in their everyday lives may still perform well when tests measure
content that is not taught in the classroom (English & Steffy, 2001). Strong evidence
of alignment between assessments and state standards supports the validity of
interpretations made about test scores.
For many years, states and test developers have relied on content experts
and other item reviewers to make judgments about whether test items reflect the
content of particular strands within state content standards. The AERA position
statement on high-stakes testing calls for alignment of assessments and curriculum
on the basis of both content and cognitive processes (AERA, 2000). Bhola et al.
(2003) emphasized the need to use more complex methods for examining
alignment that go beyond content and cognitive process at the item level. La Marca
et al. (2000) reviewed and synthesized conceptualizations of alignment and
methods for analyzing the alignment between standards and assessment. They
identified five dimensions that should be considered, based largely on Webb’s
(1999) work:
1. Content match, or the correspondence of topics and ideas in the standards
and the assessment,
2. Depth match, or level of cognitive complexity required to demonstrate
knowledge and transfer it to different contexts,
3. Relative emphasis on certain types of knowledge tasks in the standards and
the assessment system,
4. Match between the assessment and standards in terms of performance
expectations, and
5. Accessibility of the assessment and standards, so both are challenging for all
students yet also fair to students at all achievement levels.
The emphasis in this study is on the methodologies used to empirically
investigate alignment, and on the existing empirical evidence that might indicate
what degree of alignment has been achieved in large-scale assessment systems. In
addition to the focus on alignment of standards and assessments emphasized by La
Marca et al. (2000) and Webb (1999), this study examines the alignment of
standards and assessments with the curriculum taught in schools. This review and
synthesis of literature is intended to yield information about gaps in methodological
approaches to examining alignment, as well as areas in which additional empirical
investigations are needed to establish sound criteria for judging the quality of
alignment.
Methods
This section describes the literature search and identification procedures,
primary and secondary coding procedures, and data analysis strategies.
Literature Search and Identification Procedures
Cooper (1989) warned against overly narrow problem formulations in the early
stages of a literature review, as limited conceptual breadth poses a threat to the
validity of the study. Thus, the scope of the literature search was initially very broad.
The search targeted literature written between 1984 and 2005 that had a primary
focus on alignment. The scope included alignment between (1) assessment and
curriculum/instruction, (2) assessment and content standards, (3) content standards
and curriculum/instruction, (4) instruction and instructional materials, (5) two types
of standards, and (6) a combination of assessment, content standards, and
curriculum/instruction. Assessments included both general and special education instruments
that were either objective or alternative (e.g., performance-based, portfolio).
Classroom and district-level assessments were excluded from this study, but
alignment in higher education settings was included. Studies on alignment based on
standards at any level (e.g., district, state) were included.
A total of 28 terms or combinations of terms were used to define the research
base of alignment resources (e.g., sequential development; alignment and
curriculum; accountability, alignment, and assessment). Electronic and print
resources were used to identify materials for possible inclusion. Electronic
databases searched included InfoTrac, Google, ERIC, PsycINFO, Academic Search
Elite, Books in Print, and Dissertation Abstracts. The websites of assessment
organizations (e.g., Harcourt, Measured Progress, Buros Institute for Assessment
Consultation and Outreach), technical assistance centers (e.g., National Center on
Educational Outcomes), educational organizations (e.g., Council of Chief State
School Officers, National Center for Research on Evaluation, Standards, and Student
Testing [CRESST]), and state education agencies were also searched for
unpublished alignment material. As some sources returned a very large number
of potential hits (e.g., Google identified 5,690,000 hits for alignment and
assessment), the first 150 of those documents were reviewed for potential inclusion.
The reference lists of identified books and several seminal and recent works (e.g.,
Bhola, Impara, & Buckendahl, 2003; Case, Jorgensen, & Zucker, 2004; La Marca et
al., 2000; Webb, 1997) were also searched. Authors were contacted when
identified materials could not be located. Finally, a follow-up list of prominent
authors (e.g., Andrew Porter, Robert Rothman, John Smithson, Norman Webb) and
model names (e.g., Surveys of Enacted Curriculum, Achieve, Council for Basic
Education) was also searched in Google to ensure complete coverage of the
reference material.
Conceptual relevance of each source identified in the literature search was
determined by the study coordinator, who applied the inclusion criteria liberally
during the first round of literature identification. Resources that were of
questionable relevance were reviewed by a second author.
Coding Procedures
Initial coding was done on the entire set of identified documents in order to
broadly characterize the nature of the alignment literature. A secondary coding
scheme was applied to the empirical resources.
Initial coding procedures. Identified material was entered into a database by
reference and was coded according to three categories: (a) elements being aligned
(as described above), (b) type of document, and (c) purpose or focus of document.
The type of document was defined by five categories. Literature was coded as a
report if it was written as a non-published paper, technical report, dissertation, or
brief. Presentations included all papers or multimedia work presented to an
audience. Journal articles were published works found in a journal or newsletter
format. Books included any chapters in edited works or manuals disseminated by
states. Finally, other included all training materials, web pages dedicated to
alignment, and other relevant alignment work (e.g., state documents that discussed
alignment but did not include any empirical data or methodological descriptions).
The purpose or focus of the document was coded into six groups. Conceptual
included literature that either defined alignment, discussed the relationships among
standards, assessment, and curriculum, discussed reasons for alignment, or argued
the benefits of well-aligned systems or drawbacks of poorly aligned systems.
Resources that described a model or method to conduct alignment studies were
coded as methodological. Literature that focused on recommendations for policy
about alignment was coded as policy. Documents that included data collection
procedures and results from an original alignment study were coded as empirical.
Review/synthesis was coded for materials that described more than one primary
source for alignment. Finally, other was coded for miscellaneous foci that did not fit
other categories (e.g., state descriptions of alignment without methodological or
empirical components; instances where rubrics or test blueprints were used to
examine alignment).
Interrater reliability was obtained for each coded category. Two researchers
coded a sample of 80 documents (41%). A point-by-point method (the number of
agreements for occurrences and non-occurrences divided by the total number of
points, multiplied by 100) was used to calculate the reliability.
The average reliability for type of alignment was 88% (range of
50%-100% agreement). Because only two documents were identified as
addressing alignment between standards and curriculum/instruction, the reliability
percentage of 50% reflects one disagreement. The median agreement was 91%.
The average reliability for the type of document was 100%. The average reliability
for purpose or focus of document was 90% (range of 50%-100%). Policy was
identified as the focus of six documents by one researcher and of three by the
other, resulting in a 50% agreement rate. The median was 96%.
Consensus was reached on all disagreements across all categories.
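As an illustration of the point-by-point calculation described above, the following Python sketch computes percent agreement between two coders. The function name and the sample codes are hypothetical and are not drawn from the study's data; the sketch assumes each coder records one code per document for a given category.

```python
def point_by_point_agreement(codes_a, codes_b):
    """Return percent agreement: agreements / total points * 100.

    Agreement is counted for both occurrences (both coders marked the
    category) and non-occurrences (neither coder marked it).
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("Both raters must code the same set of points.")
    if not codes_a:
        raise ValueError("At least one coded point is required.")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# Hypothetical codes for ten documents: 1 = category marked, 0 = not marked.
rater_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
rater_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

agreement = point_by_point_agreement(rater_1, rater_2)
print(f"{agreement:.0f}% agreement")
```

With very few coded occurrences, a single disagreement moves the percentage sharply, which is consistent with the 50% figures reported for the two-document category.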

Secondary Coding Procedures. Using a coding form developed by the first
author, two researchers summarized information about the resources identified in
the first phase as empirical studies. Categorical data were recorded for type of
literature; content area(s) and grade levels; elements of the educational system
aligned; descriptions of the types of standards, assessment, and instructional
indicators; alignment methodology used; and entity that conducted the alignment
study. The second author coded three resources with a second coder for training
purposes, and then both people coded three additional resources and compared
codes before the second coder coded the remaining empirical studies
independently. Reliability on the secondary coding was 93% based on a sample of
11 resources (16% of the empirical literature). One researcher entered data into
SPSS and cleaned the database prior to analysis.
Data Analysis Strategies
Descriptive statistics were calculated on all primary codes for the entire set of
literature, and on the secondary codes for the subset of empirical literature.
Frequencies were also calculated for key characteristics of alignment studies, by
alignment methodology. Narrative descriptions of some articles were provided to
illustrate certain points about the literature.
Results
In this section, characteristics of the identified alignment literature are first
described. Then a subset of the literature (methodological and empirical) is
analyzed to address the following points:

1. A comparison of features of alignment models and their methodologies, and

2. A narrative and quantitative analysis of results from empirical alignment
studies.
Characteristics of the Literature
A total of 195 resources were identified during the search, nearly half of
which were reports (47%). Other documents included journal articles (21%),
presentation slides or papers (14%), and books and other resources (18%). Roughly
one-third of the resources (33.4%) were empirical, while 13.8% were conceptual,
13.8% were methodological, 9.2% were descriptive or reviews, and 4.6% had a
policy emphasis. Twenty-four percent of resources had other foci, such as state
descriptions or webpages about alignment issues (n=8), state reports described
above (n=25), and documents (state and individual authors) for professional
development. (Each resource could have more than one emphasis.)
One hundred fifty-seven of the resources had identifiable publication dates.
Six of those resources (3.8%) were published between 1984 and 1990, while
another six (3.8%) were published between 1991 and 1995. Between 1996 and
2000, 25 additional resources (15.9%) were published. The number of resources
published per year began increasing dramatically after 2000; between 2001 and
2005, 120 alignment resources (76.4%) were published. Empirical studies were the
most frequently identified category across all year spans, increasing from 4 in 1984-1990 to 43 in 2001-2005.
A diverse range of documents was identified and included in the review.
Research monographs and conceptual articles frequently cited in the alignment
literature (e.g., Bhola, Impara, & Buckendahl, 2003; Webb, 1997) were one common
type. Alignment reports, such as those published by organizations conducting
alignment studies (cf. Achieve, 2001), were another common type. PowerPoint
presentations from conferences or meetings (e.g., Potter, 2002), books and book
chapters (English & Steffy, 2001), ERIC documents (e.g., Madfes & Muench, 1999),
and dissertations (Moahi, 2004) were also identified. There were several types of
documents published by states. One type was state websites that described the
definition of alignment (e.g., Florida Department of Education, n.d.) or could be used
as a resource to teachers or districts (e.g., Oregon Department of Education, n.d.).
States also published test blueprints (e.g., Oklahoma Department of Education, n.d.)
or reports for peer review (North Dakota Department of Public Instruction, 2003).
Alignment Models and Methods
Bhola et al. (2003) reviewed existing alignment methodologies and
characterized them according to their level of complexity. Expert review of content
represented in state standards and on assessments to identify item-level matches
would be described as a low complexity method. At the other end of the spectrum is
Webb’s (1999) approach, which includes several indicators of alignment at both the
item and test level. The three moderate and high complexity methods used in the
empirical literature are briefly described here.
Achieve. In the Achieve model, the four dimensions for examining the degree
of alignment between an assessment and standards are (a) content centrality, (b)
performance centrality, (c) challenge, and (d) balance and range (Resnick,
Rothman, Slattery, & Vranek, 2003). Content centrality examines the quality of the
match between the content of each test question and the content of the related
standards. After a senior reviewer has matched test items to the test blueprint,
reviewers examine each item (in the blueprint) to determine whether it assesses the
academic content well, partially, or not at all. These judgments go deeper than the
one-to-one correspondence used in the blueprint. Performance centrality focuses on
the degree of the match between the type of performance (cognitive demand)
presented by each test item and the type of performance (e.g., select, identify,
compare, analyze, represent, use) described by the related standard. Reviewers
analyze each test item to determine whether the type of performance the item
requires matches the demand expected by the standard, and whether it does so well,
partially, or not at all. The criterion called challenge is applied to a set of items to
determine whether doing well on the set requires students to master challenging
subject matter. Reviewers consider two factors in evaluating sets of test items
against the challenge criterion: source of challenge and level of challenge. Source of
challenge attempts to uncover whether the individual test items in a set are difficult
because of the knowledge and skills they target, or for other reasons not related to
the subject matter, such as relying unfairly on students’ background knowledge.
Level of challenge compares the emphasis of performance required by a set of
items to the emphasis of performance described by the related standards. Reviewers
also judge whether the set of items has a span of difficulty appropriate for students
at a given grade. Finally, tests must cover the full range of standards with an
appropriate balance of emphasis across the standards. Evaluating balance and
range provides both qualitative and quantitative descriptive information about the
choices that test developers have made. Balance investigates whether there are
enough items to measure a content strand and, if so, whether the items in a set
focus on only a subset of it. Range is a measure of coverage or breadth (the
numerical proportion of all content addressed).
Surveys of Enacted Curriculum (SEC) Model. The SEC alignment approach
analyzes standards, assessments, and instruction using a common content matrix
with two dimensions for categorizing subject content: content topics and
cognitive demands (Porter, 2002). Using this approach, content
matrices for standards, assessments, and instruction are created and the
relationships between these matrices are examined. In addition to alignment
statistics that can be calculated from the two-dimensional matrix, content maps and
graphs can be produced to visually illustrate differences and similarities between
standards, assessments, and instruction. In practice there are usually five or more
content areas and six or more categories for cognitive demand upon which
alignment is analyzed. To analyze assessments and standards, a panel of content
experts conducts a content analysis and codes the assessment and/or standards by
topic and cognitive demand. Results from the panel are then placed in a topic by
cognitive demand matrix, with values in the cells representing the proportion of the
overall content description. While expert judgment is used to collect information
from academic standards and assessments for the two-dimensional matrices, teacher
surveys are typically used to collect data for the content of instruction. Content of
instruction is described at the intersection between topics and cognitive demand.
Teachers are surveyed on the amount of time devoted to each topic, and the
relative emphasis given to student expectations. These survey data are then
transformed into the proportion of total instructional time spent on each cell in the two-dimensional matrix.
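The cell-proportion logic described above can be sketched in Python. The index computed here, one minus half the sum of absolute cell differences, is the alignment index associated with Porter (2002); this review does not spell out the statistic, so its use here is an assumption. All function names and data values are hypothetical.

```python
def to_proportions(matrix):
    """Scale a topic-by-cognitive-demand matrix so its cells sum to 1."""
    total = sum(sum(row) for row in matrix)
    if total <= 0:
        raise ValueError("Matrix must contain a positive total.")
    return [[cell / total for cell in row] for row in matrix]


def alignment_index(matrix_x, matrix_y):
    """Return 1 - (sum of absolute cell differences) / 2.

    A value of 1.0 means identical relative emphasis in every cell;
    0.0 means no overlap in emphasis at all.
    """
    same_shape = len(matrix_x) == len(matrix_y) and all(
        len(row_x) == len(row_y) for row_x, row_y in zip(matrix_x, matrix_y)
    )
    if not same_shape:
        raise ValueError("Both matrices must use the same content matrix layout.")
    x = to_proportions(matrix_x)
    y = to_proportions(matrix_y)
    total_difference = sum(
        abs(a - b)
        for row_x, row_y in zip(x, y)
        for a, b in zip(row_x, row_y)
    )
    return 1.0 - total_difference / 2.0


# Hypothetical data: rows are 2 topics, columns are 3 cognitive demand categories.
instruction_hours = [[10, 5, 5], [10, 5, 5]]  # teacher-reported instructional time
assessment_points = [[4, 2, 4], [6, 2, 2]]    # panel-coded assessment score points

print(round(alignment_index(instruction_hours, assessment_points), 3))
```

Converting both sources to proportions first is what allows survey hours and coded score points, which are on different scales, to be compared cell by cell.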
Webb. Webb’s (1997, 1999) alignment model includes several indicators of
alignment at the item and test level. Categorical concurrence is the consistency of
categories of content in the standards and assessments. The criterion of categorical
concurrence between standards and assessment is met if the same or consistent
categories of content appear in both the assessment and the standards. For
example, if a content standard (or strand) is measurement in mathematics, does the
assessment have items that target measurement? It is possible for an assessment
item to align to more than one content standard. For example, an assessment
item that requires students to calculate surface area is aligned to the content
standard of measurement, but to answer it successfully the student must also
be able to multiply numbers, a skill aligned to the content standard of operations.
In this case the item is aligned to both content standards. The range-of-knowledge
correspondence criterion examines the alignment of assessment items to the
multiple objectives within the content standards. Range-of-knowledge
correspondence is used to judge whether the span of knowledge
expected of students by a standard is the same as, or corresponds to, the span of
knowledge that students need in order to correctly answer assessment items. The
range-of-knowledge numeric value is the percentage of content standards with at
least 50% of the objectives having one or more hits. For example, if there are five
objectives (e.g., length, area, volume, telling time, and mass) included in the
content standard of measurement, the minimum expectation is that at least three
of the objectives each have at least one related assessment item. The balance of
representation criterion is used to indicate the extent to which items are evenly
distributed across the content standards and the objectives under the content
standards. In a measurement content standard with five objectives, we would
expect items to be evenly distributed across the five objectives. In practice,
educational agencies may place greater emphasis on specific objectives and
content standards. In this case the assumption of an even distribution would be
replaced with the expected proportion, or emphasis, as specified by the educational
agency. Depth-of-knowledge (DOK) examines the consistency between the cognitive
demands of the standards and the cognitive demands of assessments. Completely
aligned standards and assessments require an assessment system designed to
measure in some way the full range of cognitive complexity within each specified
content standard. Webb identified four levels for assessing the DOK of content
standards and assessment items. DOK levels are Recall (Level 1), Skill or Concept
(Level 2), Strategic Thinking (Level 3), and Extended Thinking (Level 4). To
evaluate the DOK level accurately, each level needs to be behaviorally defined and
illustrated with examples of student behaviors. To examine the DOK, all items on the
assessment and all academic content standards are rated for DOK. We expect
assessments to have some items that are below the expected DOK, but there should
also be items at or above the expected DOK.
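Two of the indicators described above lend themselves to a short worked example. The Python sketch below computes the range-of-knowledge value as defined in the text (the percentage of content standards for which at least 50% of objectives have one or more hits) and the share of items rated at or above a standard's DOK level. The function names, data layout, and item counts are hypothetical, and no benchmark for an acceptable DOK share is assumed.

```python
def range_of_knowledge(hits_by_standard, threshold=0.5):
    """Percent of standards where at least `threshold` of objectives have >= 1 hit.

    `hits_by_standard` maps each content standard to a dict of
    objective name -> number of assessment items coded to that objective.
    """
    if not hits_by_standard:
        raise ValueError("At least one content standard is required.")
    standards_met = 0
    for objective_hits in hits_by_standard.values():
        covered = sum(1 for hits in objective_hits.values() if hits > 0)
        if covered / len(objective_hits) >= threshold:
            standards_met += 1
    return 100.0 * standards_met / len(hits_by_standard)


def share_at_or_above_dok(item_dok_levels, standard_dok):
    """Percent of items rated at or above the DOK level of their standard."""
    if not item_dok_levels:
        raise ValueError("At least one rated item is required.")
    at_or_above = sum(1 for level in item_dok_levels if level >= standard_dok)
    return 100.0 * at_or_above / len(item_dok_levels)


# Hypothetical item counts ("hits") per objective.
hits = {
    "measurement": {"length": 2, "area": 1, "volume": 1, "telling time": 0, "mass": 0},
    "operations": {"addition": 3, "subtraction": 0, "multiplication": 0, "division": 0},
}

print(range_of_knowledge(hits))
print(share_at_or_above_dok([1, 2, 2, 3], standard_dok=2))
```

In the hypothetical data, measurement has hits on three of five objectives and so meets the 50% minimum, while operations has hits on only one of four objectives and does not.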
Table 1 provides an overview of features of these alignment models, as well
as two others described in the literature. While most of the methods allow for both
item and test level alignment analysis on the basis of both content and cognitive
demand, there are some distinctions. The Achieve, Webb, and Council for Basic Education (CBE) models also include
methods for investigating alignment of difficulty. Webb’s model includes statistics
that indicate “good” alignment, as do some of Achieve’s indices. The SEC model is
the only one with established techniques that allow for alignment analysis among
standards, assessments, and instruction. SEC has an established method for
administering surveys and generating alignment reports online, and a new web-based
tool has recently been developed based on Webb’s method (Wisconsin Center
for Education Research, n.d.).


Evidence of Alignment from Empirical Literature
Of the 67 empirical studies, twelve (18%) were peer reviewed, while 49 were
not. The vast majority (n = 45, 67.2%) were
based on state standards, while seven (10.4%) were based on national standards
and eight (11.9%) were studies of alignment with other kinds of standards (e.g.,
higher education or international programs).
The types of materials and emphases of the empirical resources are
described in Table 2. Slightly more than half of the empirical studies (52%) were in
reports available from organizations that conduct alignment studies or from state
agencies. Roughly one-fifth (19%) of the empirical studies were located in journal
articles, and another 16% were in presentations. Nine percent of the empirical
studies consisted of dissertations, while the remaining 3% were books or book
chapters. The majority of the studies focused on alignment of standards and
assessments (72%), followed by the alignment of assessments and
curriculum/instruction (13%). Only 12% of the studies focused on alignment of all
three elements of the educational system (standards, assessments, and
curriculum/instruction). Many of the alignment studies focused on multiple
academic subjects; 75% were based on math, while 63% examined English
language arts. Fewer alignment studies examined science (19%) or social studies
(9%), and a few focused on other issues such as functional curriculum for students
with disabilities.
Features of the empirical studies were also examined for each major
alignment method (see Table 3). Webb’s method was most frequently represented
(n = 21, 31%). All of those studies examined alignment between standards and
assessments. Most of them covered assessments across K-12. Math and English
language arts were the most frequently studied content areas using Webb’s
method, although several also examined social studies and science. Studies using
the Achieve model occurred second most frequently (n = 12, 18%), with emphases
on math and English language arts. The Achieve studies also primarily examined
alignment between standards and assessments, although nearly half also examined
alignment between state standards and Achieve-derived, model content standards.
The Achieve studies did not tend to cover the entire K-12 grade span, but rather
grade bands or selected grade levels. The SEC method, used in five studies (7%),
focused on math and science content areas. One of these studies examined the
relationship between assessments and curriculum/instruction, while the other four
looked at three-way alignment (standards, assessment, and instruction). Two of the
SEC studies spanned grades K-12, while the others included one or more grade
bands. Notably, 28 empirical studies (42%) were based on
methodologies other than the top three recommended by the Council of Chief State School Officers (CCSSO). Upcoming stages
of analysis will examine these studies in more detail for evidence of other emerging,
viable methodologies.
Discussion
As states submit information about their assessment systems for peer review,
they will be asked to describe alignment of assessments with state standards on the
basis of content and cognitive demand, using moderate or complex methods.
Existing methods advocated by CCSSO provide options for examining alignment in
order to determine whether systems are adequately aligned (e.g., using Webb’s
benchmarks). While NCLB does not require alignment studies to consider classroom
instruction, ignoring this component of the educational system may mean studies
yield strong evidence of alignment that does not extend to the teachers who are
responsible for implementing instruction based on curriculum. Thus, systems that
appear to be well aligned may still yield poor performance if students are not taught
what is tested.
If alignment is to focus on the relationship between state standards and
assessments, it may be important to consider assessment systems in a way that
acknowledges their dynamic nature. For example, Ryan (2002) has written about
the stages of an assessment’s maturity and the need to investigate validity across
the stages (conceptualization, design, implementation, and operational forms).
Case, Jorgensen, and Zucker (2004) allude to the sequential development of
assessments and the importance of considering early stages of an assessment’s
development. Evidence collected at the test development stage is also appropriate
for peer review purposes (U. S. Department of Education, 2004). However,
with the exception of Achieve’s studies based on test blueprints, existing empirical
studies focus primarily on fully operational forms of assessments. Figure 1 illustrates
one way of matching existing alignment methods to stages of test
development. During the conceptualization phase, information about the
assessment’s purpose and intended inferences should be recorded. During the
design phase, methods of examining alignment between standards and
assessments may first be used. Once the assessment reaches the initial implementation
and fully operational form stages, the third element of the system (instruction) may
be included in alignment studies.
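The staged approach described above can be written out as a simple lookup. This is a hedged sketch only: the stage names follow those attributed to Ryan (2002) in the text, while the data structure, the function name, and the reading that three-way alignment applies from initial implementation onward are assumptions made for illustration.

```python
# Stages of assessment maturity named in the text (after Ryan, 2002),
# each paired with the alignment activity the paragraph suggests.
STAGE_ACTIVITIES = [
    ("conceptualization", "record the assessment's purpose and intended inferences"),
    ("design", "begin examining alignment between standards and assessments"),
    ("implementation", "include instruction in alignment studies"),
    ("operational forms", "include instruction in alignment studies"),
]


def elements_examined(stage):
    """Return the system elements an alignment study might cover at a stage."""
    order = [name for name, _ in STAGE_ACTIVITIES]
    position = order.index(stage)  # raises ValueError for an unknown stage
    if position == 0:
        return []  # no alignment study yet; purpose and inferences are documented
    if position == 1:
        return ["standards", "assessment"]
    # Assumption: instruction is added from initial implementation onward.
    return ["standards", "assessment", "instruction"]


for name, activity in STAGE_ACTIVITIES:
    print(f"{name}: {activity} -> {elements_examined(name)}")
```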
Recommendations for Research

Several existing conceptual resources have already identified needs for
improvement in alignment technologies to address such issues as the match between
the grain size of assessment items and state standards, the relationship between
alignment and achievement standards, and rater training (Bhola et al., 2003).
Additional gaps located via this review of literature include:

1. alignment methods applied to alternate assessments based on alternate and
modified achievement standards;
2. ways of aggregating alignment data, within assessment systems and across
studies based on identical methods;
3. validation of criteria for determining whether alignment is sufficient; and
4. evidence that the degree of alignment within an educational system has an
impact on student learning evidenced in achievement scores.
This review of literature was intended to illustrate areas of depth in the field’s
knowledge about alignment, and areas in which more research and improved
methods are needed. However, the field continues to change rapidly. Shortly before
this paper was finished, a compendium of six more reports on alignment was
released (CCSSO, 2006). The emphasis on well-aligned systems will continue to
drive the field to improve its methods and develop evidence for the importance of
alignment in large-scale assessment programs.



References
Achieve, Inc. (2001). Measuring up: A standards and assessment benchmarking
report for Massachusetts. Retrieved September 8, 2005, from
/>
American Educational Research Association (2000). AERA position statement
concerning high-stakes testing in preK-12 education. Retrieved July 29, 2005,
from />
Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states’
content standards: Methods and issues. Educational Measurement: Issues
and Practice, 22(3), 21-29.
Case, B. J., Jorgensen, M. A., & Zucker, S. (2004). Alignment in educational
assessment. Retrieved June 5, 2005, from
/>s.pdf
Cohen, D. K. (1990). A revolution in one classroom: The case of Mrs. Oublier.
Educational Evaluation and Policy Analysis, 12, 311-330.
Cooper, H. M. (1989). Integrating research: A guide for literature reviews (2nd ed.).
Newbury Park, CA: Sage.
Council of Chief State School Officers (2006). Aligning assessment to guide the
learning of all students: Six reports on the development, refinement, and
dissemination of the web alignment tool. Washington, DC: Author. Retrieved
March 17, 2006 from />PublicationID=293
English, F. W., & Steffy, B. E. (2001). Deep curriculum alignment: Creating a level
playing field for all children on high-stakes tests of educational accountability.
Lanham, MD: Scarecrow Press.
Florida Department of Education. (n.d.). Aligning curriculum, instruction, and
assessment. Retrieved September 7, 2005, from
www.osi.fsu.edu/waveseries/htmlversions/wave9.htm
Flowers, C., Browder, D., Ahlgrim-Delzell, L., & Spooner, F. (in press). Promoting the
alignment of curriculum, assessment, and instruction. In D. M. Browder & F.
Spooner (Eds.), Teaching reading, math, and science to students with
significant cognitive disabilities. Paul H. Brookes Publishing.
La Marca, P. M., Redfield, D., Winter, P. C., Bailey, A., & Despriet, L. H. (2000). State
standards and state assessment systems: A guide to alignment. Washington,
DC: Council of Chief State School Officers.
Madfes, T., & Muench, A. (1999). Learning from assessment: A middle school
mathematics professional development resource. (ERIC Document
Reproduction Service No. ED434827).
Moahi, S. (2004). The validity of the Botswana Junior Certificate mathematics
examination over time. Unpublished doctoral dissertation, University of
Arizona, Tempe.
National Research Council (2001). Knowing what students know: The science and
design of educational assessment. Washington, DC: National Academy Press.
No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).
North Dakota Department of Public Instruction. (2003). Submitting evidence of final
assessment system under Title I Elementary and Secondary Education Act:
North Dakota State Assessment Plan. Retrieved August 3, 2005, from
/>
Oklahoma Department of Education. (n.d.). Oklahoma school testing program test
blueprint grade 6 reading school year 2004-2005. Retrieved August 5, 2005,
from />
Oregon Department of Education. (n.d.). Curriculum alignment scoring guide.
Retrieved September 8, 2005, from
www.ode.state.or.us/schoolimprovement/schoolreview/schoolresources/curalignscoringguide.pdf
Porter, A. C. (2002). Measuring the content of instruction: Uses in research and
practice. Educational Researcher, 31(7), 3-14.
Potter, D. (2002). Assessment alignment issues: The case of South Carolina.
Retrieved September 13, 2005, from
/>tter.pdf
Resnick, L. B., Rothman, R., Slattery, J. B., & Vranek, J. L. (2003). Benchmarking and
alignment of standards and testing. Educational Assessment, 9, 1-27.
Ryan, K. (2002). Assessment validation in the context of high-stakes assessment.
Educational Measurement: Issues and Practice, 21(1), 7-15.
U. S. Department of Education. (2004). Standards and assessments peer review
guidance: Information and examples for meeting the requirements of NCLB.
Washington, DC: Author. Retrieved May 2, 2005 from
/>
Webb, N. L. (1997). Criteria for alignment of expectations and assessments in
mathematics and science education (Research Monograph No. 6). Madison,
WI: University of Wisconsin-Madison. Retrieved June 17, 2005, from
/>
Webb, N. L. (1999). Alignment of science and mathematics standards and
assessments in four states. (NISE Research Monograph No. 18). Madison, WI:
University of Wisconsin-Madison, National Institute for Science Education.
Wisconsin Center of Educational Research (n.d.). Web alignment tool. Madison, WI:
Author. Retrieved March 17, 2006 from
/>


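Table 1
Comparison of Features of Alignment Methods

Columns: Model; Complexity (Low, Med, High); All 3 elements included?; Content (item); Cognitive demand (item); Content (balance across items); Difficulty/challenge; Other; Criteria for judging “good” alignment?; Feasibility; Evidence/use (# studies)

Achieve. Complexity: High. All 3 elements included: No. Criteria for judging “good” alignment: Some. Feasibility: reviewers are trained; one day per assessment. Evidence/use: 12 studies.
CBE. Complexity: Moderate. All 3 elements included: No. Criteria for judging “good” alignment: No. Feasibility: reviewers work in pairs. Evidence/use: 0 studies.
SEC. Complexity: Moderate. All 3 elements included: Yes. Criteria for judging “good” alignment: No. Feasibility: 4 expert reviewers; computerized data collection and automated reports. Evidence/use: 5 studies.
Webb. Complexity: High. All 3 elements included: No. Criteria for judging “good” alignment: Yes. Feasibility: new manual and online system available; well known. Evidence/use: 21 studies.
Other. Complexity: Low. Criteria for judging “good” alignment: No. Feasibility: easier to conduct. Evidence/use: 1 study.

Entries listed under “Other”: accuracy of blueprint; source of challenge; item response type; national standards; use benchmark standards.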