Vietnam National University, Ha Noi
College of Foreign Languages
Nguyễn Thị Hoàng Lân
An evaluation on the validity of English tests
used for English 10 at some Higher Secondary
Schools in the Middle and North of Viet Nam,
from Ha Tinh to Ha Nam
Đánh giá tính hiệu lực của các bài kiểm tra tiếng Anh dành cho
học sinh lớp 10 ở một số trường THPT miền Trung và miền Bắc Việt
Nam, từ Hà Tĩnh đến Hà Nam
MA Thesis
Field: Methodology
Code:
Supervisor: Dr. Hà Cẩm Tâm
Ha Noi, 2009
Table of contents
Introduction
1. Rationale of the study
2. Scope of the study
3. Aims of the study
4. Methods of the study
5. Research questions
6. Organization of the study
Development
Chapter 1: Literature review
1.1. Basic concepts of testing
1.2. Achievement Tests
1.2.1. Definition
1.2.2. Kinds of achievement tests
1.2.2.1. Final achievement tests
1.2.2.2. Progress achievement tests
1.3. Characteristics of a good language test
1.3.1. Reliability
1.3.2. Discrimination
1.3.3. Practicability
1.3.4. Validity
1.3.4.1. Content validity
1.3.4.2. Construct validity
1.3.4.3. Criterion-related validity
1.3.4.4. Face validity
1.3.4.5. Backwash validity
1.3.4.6. Sources of invalidity
1.4. Test items for phonetics, structures and vocabulary
1.4.1. Test items
1.4.2. Language components
1.4.3. The test item types used to evaluate phonetics, structures and vocabulary
1.5. Syllabus Objectives on language components
Chapter 2: The study
2.1. Research Questions
2.2. Data Description
2.3. Analytical framework for data analysis
2.3.1. Content validity
2.3.2. Construct validity
2.3.3. Face validity
2.3. Data Analysis and Discussion
2.3.1. Content validity of the tests used
2.3.1.1. Content validity of 45 minute written tests
2.3.1.2. Content validity of 15 minute written tests
2.3.1.3. Content validity of final written tests
2.3.2. Construct validity of the tests used
2.3.3. Face validity of the tests used
Conclusion
3.1. Conclusion
3.2. Implications
2. Limitations and Suggestions for further studies
References
Appendix: survey questionnaire
List of tables
Table 1: The test item types
Table 2: Syllabus Objectives
Table 3: Format of 45 minute tests and final tests
Table 4: Specification for forty-five minute tests
Table 5: The contents tested in the forty-five minute tests
Table 6: The contents tested in the fifteen minute tests
Table 7: Test Specification for final written tests
Table 8: The contents of language components in the final tests
Table 9: Specification for language components
Table 10: Teachers’ opinions on the investigated tests
Table 11: Testing techniques
Introduction
1. Rationale of the study
English has played an integral role in increasing the development of science,
technology, culture and international relations. This fact has resulted in the growing
demand for English language learning and teaching in many parts of the world. In addition,
the world-wide globalization process has confirmed English as the most widely used means of
international communication. The need to master English in order to access information and
interact with others is growing in many parts of the world. English
teaching is undoubtedly the ultimate capacity-building tool.
Fully recognizing the importance of this global language, the Vietnamese Ministry of
Education has encouraged and required pupils of Secondary Schools to learn it as a
compulsory subject for at least seven years. English is also a compulsory subject in the
Higher Secondary graduation examination.
To evaluate and assess the English learning and teaching process, testing is
employed as an important and powerful tool. Ever since
language began to be taught in formal settings, the development of tests to assess the
learner's performance has been an integral part of the language learning and teaching process.
Language testing, then, is central to language teaching, and it is widely
accepted that testing plays a significant role in the process of learning and teaching foreign
languages. The main purpose of language testing is to provide opportunities for learning,
both for the students who are being tested and for the professionals who are administering
the tests. Students can learn from the work they do with the teacher and
by themselves in preparation for the tests, from the opportunities arising during the tests for
developing what they know and what they can do, and especially from the feedback which they
receive after the tests, both from their own reflection and from the professionals who have
monitored their performance. Teachers can pinpoint strengths and
weaknesses in the learned abilities of the students and gain information about the
progress the students are making, what the students are likely to be able to do with the
language in a target context, and what the students know and do not know (both
explicitly and implicitly) about the target language. In general, a language test can be used to
“sample language behaviour and infer general ability in the language learnt” (Brown,
1994: 252). From the results of the tests, and depending on the kinds of tests
and their purposes, the teacher can infer a certain level of language
competence of his students in such different areas as grammar, vocabulary, pronunciation,
speaking, listening, writing and reading. Lauwerys & Scanlon (1969) contend in their
book that “testing is an important tool in educational research and for programme evaluation,
and may even throw light on both the nature of language proficiency and language
learning”.
“Language testing is a form of measurement. It is so closely related to teaching that we
cannot work in testing without being constantly concerned with teaching” (Heaton,
1988:5). Testing is therefore widely regarded as the most effective and fastest way to check
students' understanding. Besides, thanks to testing, teachers can evaluate the
effectiveness of the syllabus in use, including its contents, objectives and methods, and can identify and
locate the areas that their pupils find difficult in the learning process.
For the past ten years or so, there have been a number of changes in the practice of
English teaching in tertiary education in Viet Nam. Some concern methodology, from the
Grammar-Translation method to the Communicative approach. Some involve course books.
Some are concerned with technology, from traditional tape recorders to modern LCD
projectors. Some are related to testing. For example, at Higher Secondary Schools in recent
years there has been a shift in testing from subjective tests to objective tests, which has had a great
effect on the teaching and learning process. In testing pupils’ progress, teachers therefore
tend to design more objective tests, and many mid-term or final tests consist of multiple-choice
questions. This is considered good preparation for students to perform well in the
university entrance tests, which take the form of multiple-choice questions. However, the problem
is that English 10 is one of the three new course books of the Ministry of Education, which
focus on improving the four skills of reading, writing, speaking and listening and help
students to consolidate their grammar in the Language Focus part. Thus, multiple-choice
questions may fail to test pupils’ progress accurately. The question arising is
whether the tests used at High Schools test what students are supposed to acquire
according to the objectives of the textbook. This is also one of the major reasons why I
carried out the research on validity.
In addition, test researchers and developers have recognized that validity is critical
for tests and refer to it as an integral measurement quality, because this quality provides the
major justification for using test scores as a basis for making inferences or
decisions (Bachman and Palmer, 1996:19). From an educational perspective, both teachers
and students should have their voices heard about instructional content, mode of syllabus
delivery, and assessment. As noted above, validity is an indispensable quality of all
good tests. Opinions from test takers and test raters, therefore, are essential
to the process of test construction. More importantly, test writers cannot increase the
validity of a test after it has been constructed, since validity depends on the features of the test items that
make it up. From the outset of test construction, test validity should be the central
focus. Heaton (1988:60) argued that "face validity can provide not only a quick and
reasonable guide but also a balance to too great of concern with statistical analysis." He
stated that students' motivation is maintained if a test has good face validity and that most
students will try harder if the test looks sound. Thus, face validity plays a certain role in
any test and is also of great concern in this thesis. Moreover, the emphasis on test
validity is also confirmed by Hughes (1989): "the greater a test's content validity is, the
more likely it is to be an accurate measure of what it is to measure." To put it another
way, if major areas in the test specification are not identified or not represented, the test is
likely to be inaccurate. Furthermore, such an inaccurate test is likely to have a harmful
backwash effect because areas that are not represented or tested will probably be ignored in
teaching and learning. Bachman (1990: 289) also insists that "The most important quality
to consider in the development, interpretation and use of language tests is validity, which
has been described as a unitary concept related to the adequacy and appropriateness of the
way we interpret and use test scores." In general, the reasons discussed here provided
a strong impetus for this thesis to investigate the validity of the
achievement tests at Higher Secondary Schools from Ha Tinh to Ha Nam.
Some studies have been carried out in particular schools to design
an English achievement test for 10th-form pupils as a case study, such as the study by
Ta Thi Minh Hien (2005). However, there has not been any study investigating
and evaluating the tests used for 10th-form pupils at High Schools in the Middle and the
North of Viet Nam, even though good evaluation of tests can help us
measure the skills and knowledge of pupils more accurately. For example, test analysis can
help us remove weak items even before we record the results of the tests.
Another reason for the selection of this research topic lies in the fact that language
testing at Higher Secondary Schools has not received enough attention. As a teacher, I
have been involved in designing, administering and marking many kinds of English tests.
Yet I have witnessed no comprehensive or systematic evaluation of, or research on,
the effectiveness and appropriateness of these tests. No formal discussions or seminars on
test construction or test methods have been held. There is neither a language test
item bank nor a professional testing committee to judge the quality of the tests and take
responsibility for them.
For the above-mentioned reasons, as a learner, a teacher, and a beginning
researcher of English, the author has been encouraged to conduct the study entitled: “An
evaluation on the validity of English tests used for English 10 at Higher Secondary Schools
in the middle and north of Viet Nam, from Ha Tinh to Ha Nam” with a view to evaluating
the validity of the tests used for pupils at Higher Secondary School. It is hoped that the
study will benefit the author as well as teachers at Higher Secondary Schools and those
who are concerned with language testing in general and English testing techniques at
Higher Secondary School in particular.
2. Scope of the study
In this study the author intends to focus mainly on the content validity, construct
validity and face validity of progress achievement tests, including 15-minute tests and mid-term
tests, and of final achievement tests, consisting of final-term tests and final tests, in the school
years 2007-2008 and 2008-2009 at 12 high schools in 6 provinces from Ha Tinh to
Ha Nam. The results can be seen as the basis for providing some suggestions for test
designers as well as raters.
3. Aims of the study
In line with the reasons above, the study has the following aims:
- To assess the validity of tests used for English 10 at Higher Secondary Schools
from the Middle to the North of Viet Nam, focusing on content validity,
construct validity and face validity.
- To suggest some implications for designing a written English test in order to improve the
teaching and learning of English at Higher Secondary Schools in Viet Nam.
4. Methods of the study
In order to achieve the above aims, the study was carried out using the following
approaches.
On the basis of the theory and principles of language testing and the major characteristics of a
good test, especially of achievement tests, random samples of progress tests
(15-minute tests, 45-minute tests and mid-term tests) and final
achievement tests (final-term tests and final tests) from the school years 2007-2008
and 2008-2009 in a number of Higher Secondary Schools from the Middle to the North of
Viet Nam were analyzed. Content validity is evaluated by comparing
the test specification, which relies on the syllabus objectives of English 10, with the content
tested in the collected tests. Construct validity is assessed against the test specification
constructed from the theoretical background of testing and the syllabus of English 10.
A survey questionnaire was administered to teachers of the Upper Secondary Schools
to investigate their evaluative comments on the face validity of the tests they designed.
Besides the use of critical reading, analysis and questionnaires for data collection,
the study made use of other supporting methods such as interviews, informal discussions
and exchanges of opinion with teachers and students to gather the necessary information about the
learning, teaching and testing situations at High Schools.
The methods used in the study are both quantitative and qualitative.
5. Research questions
This study is implemented to find the answer to the following research question:
- Do the achievement tests for grade 10 pupils at Higher Secondary Schools meet the
following criteria: content validity, construct validity and face validity?
6. Organization of the thesis
This thesis comprises three parts:
Part one introduces the rationale of the study, the scope, the aims, the methods and the
research questions.
Part two is the development of the thesis, which is divided into two chapters.
Chapter one reviews the literature related to language testing (basic concepts, roles,
types of testing, criteria of a good test and test items for reading, writing, grammar and
vocabulary).
Chapter two presents the methodology, including the curricula of English 10, the data,
the participants and the analytical framework for data analysis (construct validity, content
validity, face validity), and the results and discussion (construct validity of the tests used,
content validity of the tests used, face validity of the tests used).
Part three presents the conclusion, comprising the main findings, implications
and suggestions for further studies.
Development
Chapter 1. Literature Review
This chapter reviews the theories and literature relevant to the topic under
investigation in the present study. The chapter starts with basic concepts of testing, and
then the definition and types of achievement tests are reviewed. A brief review of the major
characteristics of a good language test is presented with a major focus on test validity,
especially construct, content and face validity. Next, test items for phonetics, structures and
vocabulary are discussed. Finally, the curriculum of English 10 is presented with its objectives
and content.
1.1. Basic concepts of testing
Testing is an essential part of every teaching and learning experience and has become
one of the main aspects of methodology. Many researchers have defined
testing from different points of view.
Allen (1974: 313) sees testing as an instrument for giving students
a sense of competition rather than for showing how good their performance is and under which
conditions a test can take place. He contends that a “test is a measuring device which we use
when we want to compare an individual with other individuals who belong to the same
group.”
Carroll (1968: 46) holds that a psychological or educational test is a procedure
designed to elicit certain behavior from which one can make inferences about certain
characteristics of an individual. In other words, a test is a measurement instrument
designed to elicit a particular behavior of each individual.
Besides, Ibe (1981: 1) describes a test as "a sample of behavior under the control of
specified conditions" which "aims towards providing a basis for performing judgment". The term
"a sample of behavior" used here is rather broad and means something more than the
traditional paper-and-pencil types of test. Read (1983) shares the same idea in the sense that a
sample of behavior suggests that language testing certainly includes listening and speaking
skills as well as reading and writing ones.
However, Heaton (1988:5) looks at testing in a different way. In his opinion, tests
are a means of assessing students' performance and of motivating
students. He takes a positive view of tests, as many students are eager to take tests at
the end of a semester to find out how much they have learned. Importantly,
he also points out the relationship between testing and teaching.
Harrison (1986:1) notes that a test should be seen as a natural extension of classroom work, providing
teachers and students with useful information that can serve each as a basis for
improvement, rather than as a necessary but unpleasant imposition from outside the classroom.
That means a test is a useful tool to measure learners' ability in a certain situation,
especially in the classroom.
According to Bachman (1990:20), what distinguishes a test from other types of
measurement is that it is designed to obtain a specific sample of behavior. This distinction is
believed to be of great importance because it reflects the primary justification for the use
of language tests and has implications for how we design, develop and use them to
best effect. Thus, language tests can provide the means for focusing more closely on the specific
language abilities of interest.
Brown (1994:252) states that "A test, in plain or ordinary words, is a method of
measuring a person's ability or knowledge in a given area". Moore (1992:138) proposes
that evaluation is an essential tool for teachers because it gives them feedback concerning
what the students have learned and indicates what should be done next in the learning
process. Evaluation helps us to better understand students, their abilities, interests,
attitudes, and needs so as to teach more effectively and motivate them. However,
Brown (1994:373) stresses that tests are seen by learners as dark clouds hanging
over their heads, upsetting them with thunderous anxiety as they anticipate the lightning
bolts of questions they do not know and, worst of all, a flood of disappointment if they do not
make the grade.
From the above descriptions, although different researchers hold different points of
view on testing, testing is, in short, an effective means of measuring and assessing students'
language knowledge and skills. It is of great use to both language teaching and learning.
1.2. Achievement tests
Just as there are many purposes for which language tests are developed, so there are
many types of language tests. Some types of tests serve a variety of purposes while others
are more restricted in their applicability. The tests collected were designed on the basis of the
textbook English 10 and were intended to assess pupils' progress; therefore, this part presents the
definition and kinds of achievement tests.
1.2.1. Definition
Achievement tests are defined differently depending on researchers' points of view.
Hughes (1990:10) held that “achievement tests are directly related to language
courses, their purpose being to establish how successful individual students, groups of
students, or the courses themselves have been in achieving objectives.” Achievement tests
are usually administered after a course to the group of learners who took it. Brown
(1994:259) also suggests that “An achievement test is related directly to classroom lessons,
units or even total curriculum.” Achievement tests, in his view, “are limited to a
particular material covered in a curriculum within a particular time frame”. Another
comment on achievement tests, offered by Finocchiaro and Sako (1983:`5), is that
achievement tests or attainment tests are widely employed in many language teaching
institutions. They are used to measure the degree of control of discrete language and
cultural items and of integrated language skills acquired by the students within a specific
period of instruction in a specific course. Harrison (1983:7) states that “an
achievement test looks back over a longer period of learning than the diagnostic test, for
example, a year’s work, or the whole course, or even a variety of different courses.” He
adds that achievement tests are intended to show the standard which the pupils have
reached in relation to other pupils at the same level. Thus, achievement tests are
directly related to language courses. The purpose of this kind of test is to establish how
successful students, courses or the teaching itself have been in achieving the objectives
stated beforehand (in the program of the course, for example).
In short, achievement tests play a crucial role in school programs, especially in
evaluating the language knowledge and skills students have acquired during the course, and they
are widely used at different school levels.
1.2.2. Kinds of achievement tests
Achievement tests can be subdivided into final achievement tests and progress
achievement tests, classified according to the time of administration and the intended
objectives.
1.2.2.1. Final achievement tests
Final achievement tests are administered at the end of a course, and their purpose is to
measure the achievement of the course as a whole. These tests may be written and
administered by ministries of education, official examining boards, or by members of
teaching institutions. Obviously, the content of these tests must be related to the courses
with which they are concerned, but the nature of this relationship is a matter of
disagreement amongst language testers.
According to some testing experts, the content of a final achievement test should be
based directly on a detailed course syllabus or on the books and other materials used. This
is known as the syllabus-content approach. The test should has an obvious appearance for
it only contains what it is thought that the pupils have actually encouraged and therefore
can be considered, in this respect at least, a fair test. However, this test holds a
disadvantage that if the syllabus is badly designed, or the books and other materials are
badly chosen, then the results of the test can be very misleading. Successful performance
on the test may not truly indicate successful achievement of course objectives.
The alternative approach is to base the test content directly on the
objectives of the course, which has a number of advantages. First, it forces course designers
to be explicit about course objectives. This in turn puts pressure on those who are responsible for
the syllabus and the selection of books and materials to ensure that these are consistent
with the course objectives. Tests based on course objectives also work against the perpetuation
of poor teaching practice, something which course-content-based tests, almost as if part of a
conspiracy, fail to do. I strongly believe that test content based on course objectives is much preferable, as it
provides more accurate information about individual and group achievement and is
likely to promote a more beneficial backwash effect on teaching.
1.2.2.2. Progress achievement tests
Progress achievement tests are intended to measure the progress students are
making in order to plan future work (including remedial work). They are usually
administered at the end of a specific unit or lesson. Obviously, these tests should be related
to the course objectives. They should make a clear progression towards the final
achievement test based on course objectives. Then if the syllabus and teaching methods are
appropriate to these objectives, progress tests based on short-term objectives will fit well
with what has been taught. If not, there will be pressure to create a better fit. If it is the
syllabus that is at fault, it is the tester’s responsibility to make clear that it is there that
change is needed, not in the tests.
Moreover, while more formal achievement tests require careful preparation, teachers can
feel free to devise their own informal tests to make a rough check on pupils’ progress and to keep pupils on
their toes. Since such tests will not form part of formal assessment procedures, their
construction and scoring need not be tied strictly to the intermediate objectives on which the
more formal progress achievement tests are based. However, they can reflect the particular
“route” that an individual teacher is taking towards the achievement of objectives.
1.3. Characteristics of a good test
In order to make a good test, teachers have to take the various factors into
consideration such as the purpose of a test, the content of the syllabus, the pupils'
background and so on. In addition to these factors, test characteristics play a very
important role in constructing a good test. According to a number of leading scholars in
testing, such as Valette (1977), Harrison (1983), Weir (1990), Carroll and Hall (1985), Henning
(1987), and Brown (1994), all good tests have four main characteristics:
- Validity
- Reliability
- Practicability
- Discrimination
Each of these is discussed in more detail below.
1.3.1. Reliability
Reliability is a necessary characteristic of any good test. It is of primary importance
in the use of both public achievement and proficiency tests and classroom tests. An
appreciation of the various factors affecting reliability is important for the teacher at the
very outset, since many teachers tend to regard tests as infallible measuring instruments and
fail to realize that even the best test is indeed a somewhat imprecise instrument with which
to measure language skills.
A fundamental criterion against which any language test has to be judged is its
reliability. The concern here is with how far we can depend on the results that a test
produces. Three aspects of reliability are usually taken into account. The first concerns the
consistency of scoring among different markers. The second is the tester's concern with
how to enhance the agreement between markers by establishing, and maintaining adherence
to, explicit guidelines for the conduct of this marking. The third aspect of reliability is that
of parallel-forms reliability, the requirements of which have to be borne in mind when
future alternative forms of a test have to be devised.
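As a minimal illustration of the first aspect, the consistency between two markers can be estimated by correlating the scores they give to the same scripts. The sketch below is not part of the procedure used in this study: the scores and the function name `pearson_r` are hypothetical, and Pearson's correlation is assumed here as the index of agreement.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical marks (out of 10) given by two markers to the same five scripts.
marker_a = [6, 7, 8, 5, 9]
marker_b = [6, 8, 8, 4, 9]
print(round(pearson_r(marker_a, marker_b), 2))
```

A coefficient close to 1 would suggest that the two markers rank the scripts in much the same way; a low coefficient would suggest that the marking guidelines need to be made more explicit.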
The concept of reliability is particularly important when considering language tests
within the communicative paradigm. Moreover, Davies (1968) stresses that reliability is
the first essential for any test, but for certain kinds of language tests it may be very
difficult to achieve.
1.3.2. Discrimination
Another important feature of a test is its capacity to discriminate among the
different candidates and to reflect the differences in the performances of the individuals in
the group. This is true for both teacher-made tests and standardized tests. The extent of the
need to discriminate will vary depending on the purpose of the test. In many classroom
tests, for example, the teacher will be much more concerned with finding out how well the
pupils have mastered the syllabus and will hope for a cluster of marks around the 80
percent bracket. Nevertheless, there may be occasions on which the teacher
may require a test to discriminate to some degree in order to assess relative abilities and
locate areas of difficulty. In such cases, the items should be spread over a wide range of difficulty levels
in the test:
- extremely easy items
- very easy items
- easy items
- fairly easy items
- items below average difficulty level
- items of average difficulty level
- items above average difficulty level
- fairly difficult items
- difficult items
- very difficult items
- extremely difficult items
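Where pupils' item-level responses are available, the spread of difficulty across items can be checked with a simple calculation. The sketch below assumes the common facility-value index (the proportion of pupils answering an item correctly), which this chapter does not itself define; the response data and the name `facility_values` are hypothetical.

```python
def facility_values(responses):
    """Return the proportion of correct answers for each item.

    `responses` is a list of pupils; each pupil is a list of 1 (correct)
    or 0 (wrong), with one entry per item.
    """
    if not responses:
        raise ValueError("no responses given")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every pupil must have an answer for every item")
    n_pupils = len(responses)
    return [sum(row[i] for row in responses) / n_pupils for i in range(n_items)]


# Hypothetical results: four pupils, three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(facility_values(responses))
```

A value near 1 indicates a very easy item and a value near 0 a very difficult one, so a test intended to discriminate would be expected to show values spread across this range.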
1.3.3. Practicability
A test must be practical; in other words, it must be fairly straightforward to
administer. The most obvious practical considerations concerning tests are often overlooked.
Firstly, the length of time available for the administration of the test is frequently
misjudged even by experienced test writers, especially if the complete test consists of a
number of sub-tests. Another practical consideration concerns the answer sheets and the
stationery used. The use of answer sheets greatly facilitates marking and is
strongly recommended when large numbers of pupils are being tested. The question of
practicability is not confined solely to oral tests: such written tests as situational
composition and controlled writing tests depend not only on the availability of qualified
markers who can make valid judgments concerning the use of language, etc., but also on the
length of time available for the scoring of the test. A final point concerns the presentation
of the test paper itself: where possible, it should be printed or typewritten and appear neat,
tidy and aesthetically pleasing.
1.3.4. Validity
According to Hughes (1989:22), "A test is said to be valid if it measures
accurately what it is intended to measure". The test must aim to provide a true measure of
the particular skill which it is supposed to measure. When closely examined, however, the
concept of validity reveals a number of aspects, each of which deserves our attention.
1.3.4.1. Content validity
"A test is said to have content validity if its content constitutes a representative
sample of the language skills, structures, etc. with which it is meant to be concerned"
(Hughes, 1989:22). This kind of validity depends on careful analysis of the language
being tested and of the particular course objectives. It is obvious that a grammar test, for
instance, must be made up of items testing knowledge or control of grammar. But this in
itself does not ensure content validity. The test would have content validity only if it
included a proper sample of the relevant structures. Just what are the relevant structures
will depend, of course, upon the purpose of the test. Therefore, in order to judge whether or
not a test has a content validity, we need a specification of the skills or structures etc. that it
is meant to cover. Such a specification should be made at a very early stage in test
construction. It is not to be expected that everything in the specification will always appear
in the test; there may simply be too many things for all of them to appear in a single test.
But it will provide the test constructor with the basis for making a principled selection of
elements for inclusion in the test. A comparison of test specification and test content is the
basis for judgments as to content validity.
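The comparison of specification and test content described above can be illustrated with a
short Python sketch. It is only an illustration under stated assumptions, not a procedure
taken from Hughes: the area names, the intended proportions and the item tagging are all
hypothetical, and the helper names coverage_report and missing_areas are introduced here
for the example.

```python
from collections import Counter


def coverage_report(specification, item_tags):
    """Compare a test specification with the content of a test.

    specification: dict mapping each area to its intended share of the test (0..1)
    item_tags: list holding one area name per test item
    Returns a dict mapping each specified area to (intended share, actual share).
    """
    counts = Counter(item_tags)
    total = len(item_tags)
    report = {}
    for area, intended in specification.items():
        actual = counts.get(area, 0) / total if total else 0.0
        report[area] = (intended, actual)
    return report


def missing_areas(report):
    """Return the specified areas that no item in the test covers."""
    return sorted(area for area, (_, actual) in report.items() if actual == 0)


# Hypothetical specification and item tagging, for illustration only.
spec = {
    "tenses": 0.4,
    "passive voice": 0.2,
    "relative clauses": 0.2,
    "conditionals": 0.2,
}
items = ["tenses"] * 6 + ["passive voice"] * 2 + ["relative clauses"] * 2

report = coverage_report(spec, items)
for area, (intended, actual) in report.items():
    print(f"{area}: intended {intended:.0%}, actual {actual:.0%}")
print("Not represented:", missing_areas(report))
```

In this invented example one specified area receives no items at all, which is the kind of
gap that a judgment of content validity is meant to detect. The figures say nothing about
the quality of the individual items.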
What is important about content validity? First, the greater a test's content
validity, the more likely it is to be an accurate measure of what it is supposed to measure.
A test in which major areas identified in the specification are under-represented - or not
represented at all - is unlikely to be accurate. Secondly, such a test is likely to have a
harmful backwash effect. Areas which are not tested are likely to become areas ignored in
teaching and learning.
Anastasi (1982:131) defined content validity as "essentially the systematic
examination of the test content to determine whether it covers a representative sample of
the behavior domain to be measured". She offers a set of useful guidelines for establishing
content validity:
- The behavior domain to be tested must be systematically analyzed to make certain
that all major aspects are covered by the test items, and in the correct proportions.
- The domain under consideration should be fully described in advance, rather than
being defined after the test has been prepared.
- Content validity depends on the relevance of the individual's test responses to the
behavior area under consideration, rather than on the apparent relevance of item content.
The more a test simulates the dimensions of observable performance and accords
with what is known about that performance, the more likely it is to have content and
construct validity. According to Kelly (1978:8), content validity seems "an almost
completely overlapping concept" with construct validity, and for Moller (1982:68), "the
distinction between construct and content validity in language testing is not always very
marked, particularly for tests of general language proficiency."
1.3.4.2. Construct validity
Construct validity is defined by Anastasi (1982:144) as "the extent to which the
test may be said to measure a theoretical construct or trait. Each construct is developed to
explain and organize observed response consistencies. It derives from established
inter-relationships among behavioral measures ... Focusing on a broader, more enduring and
more abstract kind of behavioral description ... construct validation requires the gradual
accumulation of information from a variety of sources. Any data throwing light on the
nature of the trait under consideration and the conditions affecting its development and
manifestations are grist for this validity mill."
Construct validity is viewed from a purely statistical perspective in much of the
recent American literature (Bachman and Palmer, 1981a). It is seen principally as a matter
of the a posteriori statistical validation of whether a test has measured a construct that has
a reality independent of other constructs.
According to Hughes, A. (1989:26), a test, part of a test, or a testing technique is
said to have construct validity if it can be demonstrated that it measures just the ability
which it is supposed to measure. The word "construct" refers to any underlying ability (or
trait) which is hypothesised in a theory of language ability. For example, it can be argued
that a speed reading test based on a short comprehension passage is an inadequate measure
of reading ability (and thus has low construct validity) unless it is believed that the speed
reading of short passages relates closely to the ability to read a book quickly and efficiently
and is a proven factor in reading ability.
1.3.4.3. Criterion-related validity
Another approach to test validity is to see how far results on the test agree with
those provided by some independent and highly dependable assessment of the candidate's
ability. This independent assessment is therefore the criterion measure against which the
test is validated. Criterion-related validity is of two types: concurrent validity and
predictive validity.
Concurrent validity is the degree to which a test correlates with other tests testing
the same thing. In other words, if a test is valid it should give a similar result to other
measures that are valid for the same purpose. When considering concurrent validity, there
are several concerns.
First, the measure that is being used for comparison with the test in question must
itself be valid. If the measure is not valid, there is no point in testing another test's validity
against it. For instance, a teacher's ranking of students might be used as the criterion, but
such a ranking may be affected by a number of factors that are not related to the students'
actual proficiency. One possible solution is to average the rankings of several teachers to
make up for this.
Second, the measure must be valid for the same purpose as the test whose validity
is being considered. A reading test cannot be used to test the concurrent validity of a
grammar test. In addition, if teachers' rankings are being used, it is essential to make sure
that the teachers understand on what basis they are expected to rank the students. If the test
being considered is a grammar test, then the teachers should be asked to rank the students
according to their grammar proficiency, not their overall English language ability.
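The two concerns above can be made concrete with a small Python sketch. It assumes, purely
for illustration, that several teachers have each ranked the same students on the ability
the test targets (rank 1 = strongest), that the rankings are averaged, and that agreement
with the test is expressed as a Spearman rank correlation. The choice of coefficient, the
helper names and all of the data are assumptions made for this example and are not taken
from the sources cited.

```python
def ranks(values):
    """Rank values so that the highest value gets rank 1; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def mean_teacher_rank(rankings):
    """Average several teachers' rankings, student by student."""
    return [sum(column) / len(rankings) for column in zip(*rankings)]


# Hypothetical data: grammar test scores and three teachers' rankings
# of the same five students on grammar proficiency (1 = strongest).
scores = [9, 7, 8, 5, 6]
teacher_rankings = [
    [1, 3, 2, 5, 4],
    [1, 2, 3, 5, 4],
    [2, 3, 1, 5, 4],
]

averaged = mean_teacher_rank(teacher_rankings)
score_ranks = ranks(scores)                      # rank 1 = highest score
criterion_ranks = ranks([-m for m in averaged])  # rank 1 = lowest mean rank
rho = pearson(score_ranks, criterion_ranks)      # Spearman rank correlation
print(f"Spearman correlation between test and averaged rankings: {rho:.2f}")
```

A coefficient near 1 would indicate that the test orders the students much as the teachers
do. It supports a claim of concurrent validity only if the averaged ranking is itself a
valid measure of the same ability.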
It is said that predictive validity is different from concurrent validity in that
"instead of collecting the external measures at the same time as the administration of the
experimental test, the external measures will only be gathered some time after the test has
been given" (Alderson et al., 1995). To put it simply, predictive validity is the
extent to which the test in question can be used to make predictions about future
performance. For example, does a test of English ability accurately predict how well
students will get along in a university in an English-speaking country? There are
numerous problems with attempting to answer such questions. Measures of how
well a student does at university are sometimes employed to assess predictive validity,
but the problem is that there are many factors other than English proficiency involved in
academic success. Furthermore, it is not possible to know how the students who scored
low on the test, and therefore did not get to go to university, would have done if they had
been allowed to go. Even so, it is undeniable that prediction is an important and justifiable
use of language tests, and evidence that indicates a relationship between test performance
and the behaviour that is to be predicted provides support for the validity of this use of test
results. However, there is a wide range of situations in which we are not interested in
prediction at all, but in determining the levels of abilities of language learners.
In short, information about criterion relatedness, concurrent or predictive, is by
itself insufficient evidence for validation (Bachman, 1990: 253). That is one of the reasons
why, in this thesis, the author does not evaluate the criterion-related validity of the tests.
1.3.4.4. Face validity
Anastasi (1982:136) points out that face validity is not validity in the technical sense; it
refers, not to what the test actually measures, but to what it appears superficially to measure.
Face validity pertains to whether the test "looks valid" to the examinees who take it, the
administrative personnel who decide on its use and other technically untrained observers.
Fundamentally, the question of face validity concerns rapport and public relations. Lado
(1961), Davies (1968), Ingram (1977) and Palmer (1981) have all discounted the value of face
validity. If a test does not have face validity, though, it may not be acceptable to the
students taking it, or the teachers using it. If the students do not accept it as valid, their
adverse reaction to it may mean that they do not perform in a way that truly reflects their
ability. Anastasi (1982:136) takes a similar position: "Certainly if test content appears
irrelevant, inappropriate, silly or childish, the result will be poor co-operation, regardless of
the actual validity of the test. Especially in adult testing, it is not sufficient for a test to be
objectively valid. It also needs face validity to function effectively in practical situations."
In short, a test is said to have face validity if it looks as if it measures what it is
supposed to measure. For example, a test which was intended to measure pronunciation
ability but which did not require the candidate to speak (and there have been some) might be
thought to lack face validity. Face validity is hardly a scientific concept, yet it is very
important. A test which does not have face validity may not be accepted by candidates,
teachers, education authorities or employers. It may simply not be used; and if it is used,
the candidates' reaction to it may mean that they do not perform on it in a way that truly
reflects their ability. Face validity can be judged by teachers or pupils.
1.3.4.5. Backwash validity
Language teachers operating in a communicative framework normally attempt to
equip students with skills that are judged relevant to present or future needs. To the
extent that tests are designed to reflect these, the closer the relationship between the test
and the teaching that precedes it, the more likely the test is to have construct validity. A
suitable criterion for judging communicative tests in the future might well be the degree to
which they satisfy pupils, teachers and future users of test results, as judged by some
systematic attempt to gather data on the perceived validity of the test. If the first stage, with
its emphasis on construct, content, face and backwash validity, is bypassed, the resulting
test may well not suit the purpose for which it was intended.
On balance, special attention must be paid to the validity of a test when one
constructs it. Although there are many kinds of validity, according to Harrison's conclusion
face validity and content validity are the most vital for the teacher setting his own tests.
This view of validity provides a specific and useful framework for language test evaluation
and is also adopted in this thesis.
1.3.4.6. Sources of invalidity