Chapter 12: Testing and Assessment
Chapter 13: Research and Evaluation
© 2007 Thomson Brooks/Cole, a division of Thomson Learning
Testing and Assessment
Testing: a subset of assessment
Assessment includes:
Informal Assessment
Personality Testing
Ability Testing
The Clinical Interview
See Figure 12.1, p. 396
You will be administering and interpreting assessment
instruments
You may consult with others on their proper use
You may use them in program evaluation and research
You will read about them in the professional literature
School counselors: Sometimes the only expert on assessment
in the schools
Other counselors: Will likely be using them in your setting and
consulting with others who use them
Why testing? Why not testing? Testing is an additional
method of gaining information about your client
2200 BCE: Chinese developed essay-type tests for civil
service employees
Darwin: Set the stage for modern science and the
examination of differences
Wundt, Fechner: First experimental labs to examine
differences in people
Binet: Hired by Ministry of Public Education in France
to develop intelligence test
Binet’s test later became the “Stanford-Binet” after
revision by Terman
Spread of testing at beginning of 20th century:
Psychoanalysis spurred on development of objective and
projective personality tests
Industrial Revolution and need for vocational assessment
WWI: Ability and personality tests used to determine
placements of recruits
1940s and 1950s: advances in statistics led to better test
construction
1980s and on: Personal computers make tests easier to
develop, analyze, use, administer, and interpret
Ability Testing (Testing in the Cognitive Domain) (see Figure
12.2, p. 399)
Two types
▪ Achievement Testing (What one has learned)
▪ Aptitude Testing (What one is capable of learning)
Achievement Testing
▪ Survey Battery Tests
▪ Diagnostic Tests (see Box 12.1, p. 400: PL 94-142)
▪ Readiness Tests
Ability Testing (Testing in the Cognitive Domain) (see Figure
12.2, p. 399) (Cont’d)
Aptitude Tests (What one is capable of learning)
▪ Intellectual and Cognitive Functioning Testing
▪ Intelligence Tests
▪ Neuropsychological Assessment
▪ Cognitive Ability Tests
▪ Special Aptitude Tests
▪ Multiple Aptitude Tests
Personality Assessment (Testing in the Affective Domain; see
Figure 12.3, p. 399)
Objective Tests
Projective Tests
Interest Inventories
Informal Assessment (see Figure 12.4, p. 399)
Observation
Rating Scales (see Box 12.2, p. 404)
Classification Systems (see Box 12.3)
Environmental Assessment
Records and Personal Documents
Performance-Based Assessment
The Clinical Interview
Sets a tone for the types of information that will be covered
during the assessment process
Allows client to become desensitized to information that can
be very intimate and personal
Allows examiner to assess nonverbals of client while he or
she is talking about sensitive information
Allows examiner to learn problem areas firsthand
Gives client and examiner the opportunity to study each other’s
personality style to ensure they can work together
Norm-referenced Tests
Your results are compared to your peer group
Criterion-referenced Tests:
Preset learning goals are established
Examinee has increased time to meet educational goals
Often used for individuals with learning disabilities
Norm-Referenced and Criterion-Referenced Tests Can Be Standardized or Non-Standardized
Standardized: Given exactly the same way each time
Non-Standardized: Vary in how they are administered. Generally not as
rigorously researched as standardized tests (e.g., teacher-made tests)
See Table 12.1, p. 407
Relativity and Meaningfulness of Scores
Raw scores don’t hold much meaning unless you do
something to them
By comparing raw scores to those of an individual’s peer
group, you are able to:
▪ See how the individual did in comparison to similar
people
▪ Allow test takers who took the same test but are in
different norm groups to compare their results
▪ Allow an individual to compare his or her results on two
different tests
Some statistics help us make meaning of test scores
Measures of Central Tendency
▪ Mean
▪ Median
▪ Mode
Measures of Variability
▪ Range
▪ Interquartile Range
▪ Standard Deviation
▪ See Figure 12.5, p. 409
▪ See Figures 12.6 and 12.7, pp. 410 and 411
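The measures listed above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the score list and the `describe` helper are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption that may differ from the one used in the textbook figures.

```python
import statistics

# Hypothetical raw scores for eight test takers (illustrative data only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def describe(raw_scores):
    """Return central-tendency and variability statistics for a list of scores."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(raw_scores, n=4)
    return {
        "mean": statistics.mean(raw_scores),
        "median": statistics.median(raw_scores),
        "mode": statistics.mode(raw_scores),
        "range": max(raw_scores) - min(raw_scores),
        "iqr": q3 - q1,
        # Population SD: treats these scores as the whole group, not a sample.
        "sd": statistics.pstdev(raw_scores),
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Using `statistics.stdev` instead of `pstdev` would give the sample standard deviation, which is slightly larger for small groups.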
TYPES OF DERIVED SCORES
Percentile Rank
Normal Curve Equivalents (NCEs)
T-Scores
Stanines
Deviation IQ
Sten Scores
SAT/GRE Type Scores
Grade Equivalent Scores
ACT Scores
Idiosyncratic Publisher-Derived Scores
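Most of the derived scores above are linear transformations of a z-score (how many standard deviations a raw score lies from the norm-group mean). The sketch below illustrates several of them under stated assumptions: a deviation IQ standard deviation of 15 (some instruments use other values), a normally distributed norm group for the percentile rank, and the usual half-SD-wide bands for stanines and stens. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    return 50 + 10 * z        # mean 50, SD 10


def deviation_iq(z):
    return 100 + 15 * z       # mean 100, SD 15 (assumed)


def nce(z):
    return 50 + 21.06 * z     # Normal Curve Equivalent: mean 50, SD 21.06


def sat_gre_type(z):
    return 500 + 100 * z      # mean 500, SD 100


def stanine(z):
    # Nine bands, each half an SD wide, centered on 5; clamp to 1..9.
    return max(1, min(9, math.floor(2 * z + 5.5)))


def sten(z):
    # Ten bands, each half an SD wide, mean 5.5; clamp to 1..10.
    return max(1, min(10, math.floor(2 * z) + 6))


def percentile_rank(z):
    # Assumes the norm group's scores are normally distributed.
    return NormalDist().cdf(z) * 100


# Hypothetical example: raw score 65 on a test with norm mean 50 and SD 10.
z = z_score(65, 50, 10)
print(z, t_score(z), deviation_iq(z), stanine(z), sten(z))
print(round(percentile_rank(z), 1))
```

Because every score here is derived from the same z-score, results from two different tests can be compared once both are expressed on the same derived scale.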
Correlation coefficient: a basic statistic not directly related to
interpretation of tests but crucial in test construction
Ranges from -1.0 to +1.0
The closer to -1.0 or +1.0, the stronger the relationship
between variables
Positive correlation: tendency for two sets of scores to be
related in same direction
Negative correlation: tendency for two sets of scores to be
related in opposite direction
0 = no relationship between variables
See Figure 12.8, p. 413
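As an illustration of the range and direction described above, here is a minimal sketch of the Pearson product-moment correlation, written out by hand rather than taken from a library. The function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical score pairs for four examinees.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # scores rise together
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # one rises as the other falls
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1])) # no linear relationship
```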
Four Types
Validity: Is the test measuring what it’s supposed to
measure?
Reliability: Is the test accurate (consistent) in its
measurement?
Practicality: Is this a practical test to use?
Cross-Cultural Fairness: Has the test been shown to be fair
across different cultures?
Three types of validity
1. Content
2. Criterion-Related
▪ Concurrent
▪ Predictive
3. Construct
▪ Experimental
▪ Convergent
▪ Discriminant
▪ Factor Analysis
Face validity
Not a “real” type of validity. Does the test, on the surface, seem to
measure what it’s supposed to measure?
Some tests may be valid but may not seem to be measuring what
they’re supposed to measure
Is bias removed—as best as possible?
Does it predict well for all cultural groups?
Griggs v. Duke Power Company: Tests must be shown to
predict job performance
A number of ethical and legal issues have been addressed
(see later under “Ethical, Professional, and Legal Issues”)
See Table 12.2, p. 417: Summary of Types of Validity and
Reliability
Four types of reliability:
1. Test-Retest
2. Alternate (Parallel; Equivalent) Forms
3. Split-Half (Odd-Even)
4. Internal Consistency
▪ Cronbach’s Coefficient Alpha
▪ Kuder-Richardson
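To make internal consistency concrete, the sketch below computes Cronbach's coefficient alpha from a table of item scores (one row per examinee, one column per item). The function name and the data are hypothetical, and population variances are used throughout as an assumption; using sample variances consistently gives the same alpha. For items scored 0 or 1, this formula reduces to the Kuder-Richardson formula 20.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of examinees, each a list of item scores."""
    if len(item_scores) < 2:
        raise ValueError("need at least two examinees")
    k = len(item_scores[0])  # number of items on the test
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across examinees.
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Variance of each examinee's total score.
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three examinees, three items that agree perfectly.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(consistent))

# Hypothetical data: two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cronbach_alpha(unrelated))
```

Alpha approaches 1.0 when items rise and fall together across examinees and drops toward 0 when they are unrelated.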
Is this a realistic test to give?
Based on:
Cost
Time to administer
Ease of administration
Format of test
Readability of test
Ease of interpretation
Over 4000 assessment procedures
How do you find them?
Publisher resource catalogs
Journals
Source Books and On-Line Source “Book” Information
▪ Buros Mental Measurements Yearbook
▪ Tests in Print
Books on Testing and Assessment
Experts
The Internet
Info usually included:
1. Demographic information
2. Reason for referral
3. Family background
4. Other relevant information (e.g., legal, medical, vocational)
5. Behavioral observations
6. Mental status
7. Test results
8. Diagnosis
9. Recommendations
10. Summary
Usually a few pages long
Problems with:
Overuse of jargon
Focusing on assessment procedures and downplaying the person
Focusing on person and downplaying assessment results
Poor organization
Poor writing skills
Failure to take a position
Demographics
Caution in Using Assessment Procedures
Cultural bias continues to exist in testing
Standards and ethical codes have been developed to help us:
▪ Understand the cultural bias inherent in tests
▪ Know when a test should not be used due to bias
▪ Know what to do with test results when a test does not
predict well for minorities
Standards for effective use of assessment instruments
Association for Assessment in Counseling’s Standards for
Multicultural Assessment
Code of Fair Testing Practices in Education
ACA Ethics Code
Take A Stand—Do Something!
It is our duty and moral responsibility to do something when:
▪ Tests have been administered improperly
▪ Tests are culturally biased and the bias is not addressed
▪ Cheating has taken place
▪ Tests were used with limited validity or reliability