
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF FOREIGN LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE STUDIES
……………… ***………………….









TRẦN THÚY QUỲNH


THE CONTENT VALIDITY OF THE CURRENT
ENGLISH ACHIEVEMENT TEST FOR SECOND YEAR
NON MAJOR STUDENTS AT PHUONG DONG
UNIVERSITY

(Đánh giá sự phù hợp về nội dung của bài kiểm tra tiếng Anh
cuối kỳ dành cho sinh viên không chuyên năm thứ hai
Trường Đại học dân lập Phương Đông)
M.A. MINOR THESIS


Field: ENGLISH TEACHING METHODOLOGY
Code: 60-14-10
Course: 18 (2009-2011)
Supervisor: Kim Van Tat, M.A.





HA NOI- SEPTEMBER 2011




TABLE OF CONTENTS
Page
Acknowledgement……………………………………………………… i
Abstract………………………………………………………………… ii
List of tables and figures…………………………………………………iii
The table of contents………………………………………………… iv

Chapter 1: Introduction
1.1. Rationale………………………………………………………… 1
1.2. Scope of study…………………………………………………….2
1.3. Aims of study…………………………………………………… 2
1.4. Methods of study………………………………………………….2
1.5. Research questions……………………………………………… 3
1.6. Design of study……………………………………………………3
Chapter 2: Literature review
2.1. Language testing
2.1.1. Definition of language testing…………………………………… 4
2.1.2. The roles of language testing…………………………………… 5
2.1.3. Relationship between testing and teaching- learning…………… 6
2.2. Major Characteristics of a good test
2.2.1. Test validity……………………………………………………….8

2.2.1.1. What is test validity? 8
2.2.1.2. Types of test validity………………………………………… 9
2.2.1.2.1. Face validity…………………………………………….9
2.2.1.2.2. Content validity……………………………………… 10
2.2.1.2.2.1. What is content validity? 10
2.2.1.2.2.2. How to make the test more valid? 11
2.2.2. Test reliability…………………………………………………… 13

2.2.3. Relationship between reliability and validity…………………… 16
2.2.4. Practicality…………………………………………………………17
2.2.5. Discrimination………………………………………………17
Chapter 3: The study
3.1. English learning, teaching and testing at Phuong Dong University
3.1.1. The students……………………………………………………… 19
3.1.2. The teachers……………………….……………………………….19
3.1.3. The course book “New Headway Elementary- The third edition” 19
3.1.4. Syllabus and its objectives……………………………………… 20
3.1.5. The final achievement test…………………………………………20
3.2. Research method……………………………………………………20
3.2.1. The survey questionnaires…………………………………………21
3.2.2. Document analysis……………………………………………… 21
3.3. Data analysis……………………………………………………… 22
3.3.1. Analysis of the final achievement test…………………………… 22
3.3.2. Analysis of the survey questionnaire for students…………………26
3.3.3. Analysis of the survey questionnaire for teachers…………………30
3.4. Results……………………………………………………………….32
Chapter 4: Recommendations and conclusions
4.1. Recommendations………………………………………………… 34
4.2. Conclusion …………………… ………………………………… 43

4.3. Limitations………………………….……………………………… 43
References…………………………………………………… 45
Appendixes
Appendix 1: The content of the course book……………… I
Appendix 2: Survey questionnaires for students………… IV
Appendix 3: Survey questionnaires for teachers……………V
Appendix 4: Answer key for reading task ………………….VII
Appendix 5: Answer key for the new final achievement test.VIII





LIST OF TABLES AND CHARTS
1. Table 1: Scores on test A (invented data) by Arthur Hughes
2. Table 2: Scores on test B (invented data) by Arthur Hughes
3. Table 3: The components of the final achievement test
4. Table 4: What students had been taught and what they had been tested on in Parts I, II and III of the test
5. Table 5: What students had been taught and tested on in the writing part
6. Table 6: Paper specification grids for the final achievement test
7. Chart 1: Students' comments on the validity of the test
8. Chart 2: Students' comments on the time allowance of the test
9. Chart 3: Students' comments on the difficulty level of the test
10. Chart 4: The results of the test
11. Chart 5: The purpose of the test

















Chapter 1: Introduction
1. 1. Rationale
These days, the need to learn English has become greater and greater. In Viet Nam, having recognized its importance, the Ministry of Education and
Training (MOET) has recently decided that English is a compulsory subject in most high
schools and universities. This decision requires both teachers and students to alter their
ways of teaching and learning. In addition, testing is an effective way to evaluate teaching
and learning, and the two are closely related: testing validates the teaching-learning process,
while teaching and learning provide a great source of language materials for testing to
exploit. Testing is therefore a matter of concern to all teachers.
During her time teaching at Phuong Dong University, the writer heard both teachers
and students complain that the English test did not faithfully reflect the
teaching and learning process; in other words, the test did not reflect what the students
had learnt and what the teachers had taught. What was tested was not really taught, and the test
measured neither the achievement of the course objectives nor the expected skills and
knowledge of the students. This concern is shared by test researchers such as Brown (1994: 373) and
Hughes (1989: 1), who observe of recent language testing:
“A great deal of language testing is of very poor quality. Too often language
tests have a harmful effect on teaching and learning, and too often they fail to
measure accurately whatever it is they are intended to measure.”
Another reason for the selection of this research topic lies in the fact that language
testing at Phuong Dong University has not been paid enough attention. Classroom
language tests were often written in a hurry because the teachers could not find time to
think carefully and plan them. Sometimes they did not have a clear idea of what they
were testing students on or why; they simply mixed various question
types, and as a result many students got low marks.
Due to its close relationship with language teaching and learning, testing deserves
proper attention from teachers and students so that it can have a positive backwash effect on the
teachers' teaching and give students satisfaction and encouragement in their study. In order to
design a good test that evaluates students' knowledge of and performance in English exactly, fairly
and effectively, teachers are supposed to have a good knowledge of test-writing
techniques and testing theories.
Because of all the above-mentioned reasons, the writer was encouraged to undertake this
study, entitled "Content validity of the current English achievement test for second-year
non-major students of English at Phuong Dong University", with the aim of finding out the
strengths and weaknesses of this test in terms of content validity and of suggesting,
where possible, some solutions for its improvement.
1.2. Scope of study
The scope of this thesis is limited to evaluating the final achievement
test in terms of its content validity by comparing the objectives, the syllabus and the
textbook allocation with the test contents. The study provides investigated and analyzed
data on the currently used test and proposes practical suggestions for its improvement.
Due to limitations of time, ability and conditions, it is impossible for the writer
to cover all the tests in use; only some suggestions for the improvement of this particular test are presented.
1.3. Aims of study
The study aims at checking the content validity of the final achievement test for
second-year non-major students at Phuong Dong University. It places a high emphasis on
analyzing the contents of the final achievement test.
The specific aims of this research are:
- To find out the strengths and weaknesses of the currently used test with reference to
the content validity.
- To suggest some improvements for the test.
1.4. Methods of study
In order to achieve the above-mentioned aims, a combination of many
methodologies was utilized.
Firstly, the writer based herself on the theories and principles of language
testing and the major characteristics of a good test, with a special focus on test content validity.
From her own reading, many reference materials were gathered and analyzed to draw
out a theoretical basis for evaluating the achievement test being used for second-year students
in terms of its content validity. Based on what the students had learnt in their first semester
and on the contents of this test, the writer examined its content validity.
In addition, qualitative methodologies involving data collected through survey
questionnaires were employed. Two sets of questionnaires were administered to both
English teachers and students at Phuong Dong University to investigate their evaluative
comments on the content validity of the final achievement test and their suggestions for its
improvement.
1.5. Research questions
In this study, the writer tries to answer the two following questions:
Question 1: What are the strengths and weaknesses of the final achievement test for second-year
non-major students at Phuong Dong University with reference to content validity?

Question 2: What are some suggested solutions for the improvement of the test?
1.6. Design of study
The thesis is organized into four major chapters:
1. Chapter 1 INTRODUCTION presents such basic information as: the rationale, the
aims, the methods, the research questions and the design of the study.
2. Chapter 2 LITERATURE REVIEW presents a review of related literature that
provides the theoretical basis for evaluating and building a good language test. This
review includes background on language testing, criteria of good tests and theoretical
issues on test content validity.
3. Chapter 3 THE STUDY presents the methods used in the research and shows
the detailed results of the surveys, including the questionnaires and the analysis of the
final achievement test, in order to find out its problems with reference to content
validity.
4. Chapter 4 RECOMMENDATIONS AND CONCLUSIONS. The recommendations
provide some suggestions for the improvement of the final achievement test based on
the theoretical and practical study above. The conclusions summarize the research,
its findings as well as its limitations.



Chapter 2: Literature review
This chapter provides an overview of the theoretical background of the study. It
includes three main sections.
2.1. Language testing
2.1.1. Definition of language testing
Testing is an important part of every teaching and learning experience and has become
one of the main aspects of methodology. The issue of language testing and its significant
role has been discussed a great deal by professionals and researchers worldwide, and
different definitions of language testing have been given from various points of view.
According to Allen (1974: 313), a test is an instrument that gives students
a sense of competition rather than simply showing how good their performance is, and it specifies
the conditions under which a test can take place. He says: "A test is a measuring device which we use when we
want to compare an individual with other individuals who belong to the same group."
Carroll (1986: 46) stresses that a psychological or educational test is a procedure
designed to elicit certain behavior from which one can make inferences about certain
characteristics of an individual. In other words, a test is a measurement instrument designed
to elicit a particular behavior from each individual.
According to Bachman (1990: 20), what distinguishes a test from other types of
measurement is that it is designed to obtain a specific sample of behavior. This distinction is
believed to be of great importance, as it reflects the primary justification for the use of
language tests and has implications for how we design, develop and use them. Thus, language
tests can provide the means for focusing more closely on the specific abilities of interest.
In Ibe's (1981: 1) view, a test is "a sample of behavior under the control of
specified conditions aimed toward providing a basis for performing judgment." The term a
sample of behavior used here is quite broad and covers something more than the
traditional paper-and-pencil formats.
Yet Heaton (1988: 5) holds a different opinion. In his view, tests are a
means of assessing the students' performance and of motivating the students. He looks at tests
positively, as many students are eager to take tests at the end of the semester to
know how much knowledge they have gained. One important thing is that he points out the
relationship between testing and teaching.

2.1.2. The roles of language testing
Language testing is a form of measurement. It helps the teachers:
+ To assess the learner's achievement in a language program, for example, to
evaluate the testee's language knowledge in relation to a given curriculum or material
which the testee has gone through in a given course.
+ To assess a learner's proficiency in a language in relation to future language use,
for example, to find out whether a person's language is good enough for him to become a tourist
guide. This concerns the future use of the language regardless of what language programs or
materials the testee went through.
+ To diagnose a learner's strengths and weaknesses in a language and to attempt to
explain why certain problems occur and what treatments could be used to tackle them.
+ To classify or place the testees in the appropriate language classes.
+ To measure the testee's aptitude for learning a language.
+ To evaluate the effectiveness of a language program. This is often done by using
experimental and control classes with the same educational objectives but different
methods and materials to achieve those objectives (Brown, 2000: 5).
In another way, Rebecca M. Valette (1977: 3) comments that classroom tests play
three important roles in a second language teaching program: defining course
objectives, stimulating student progress and evaluating class achievement.
Firstly, classroom tests help us to define the course objectives. Students are quick to
observe the types of tests given and to study accordingly. Thus, much as the teacher may
emphasize oral fluency in the classroom, if the tests are all written tests the students will soon
concentrate on perfecting the skills of reading and writing.
Secondly, tests help stimulate student progress. As much as possible, the time
given over to classroom testing should provide a rewarding experience. The test should
furnish an opportunity for the students to show how well they can handle the specific
elements of the target language; gone are the days when the teacher designed a test to point
up the students' ignorance or lack of application. Tests should be distinctly announced in
advance to permit the students to prepare adequately. If the students themselves are
expected to demonstrate their abilities, it is only proper that they should learn as soon as
possible after the test how well they did. The test best fulfills its function as a part of the
learning process if the correct performance is immediately confirmed and the errors are
pointed out.
The last role of testing is evaluating class achievement. Through frequent testing,
the teacher can determine which aspects of the program are presenting difficulties for
individual students and for the class as a whole. By analyzing the mistakes made on a given
test, the teacher can determine where to concentrate extra class drills and how best to assist
each student. At the same time, testing enables the teacher to discover whether the class
objectives are being met. Through tests, the teacher can evaluate the effectiveness of a new
teaching method, of a different approach to a difficult pattern, or of new materials. The
most familiar role of the classroom test is to furnish an objective evaluation of each
student's progress: his or her attainment of the course objectives and his or her performance in
relation to the rest of the class.

2.1.3. Relationship between testing and teaching- learning
In the past, teaching and testing used to be separated both theoretically and practically.
According to Williams (1983), a test is a necessary imposition from outside the classroom, but an
unpleasant one, for two main reasons. The first is that testing is concerned with
competition rather than cooperation. Thus, while classroom activities may involve pair
work and group work, such cooperation during a test is condemned as copying, and the
individual is expected to work alone. Even where group tests are perfectly possible, their
results may tell us very little about each individual in the group. In the same way, testing does
not admit cooperation between teachers and learners: the teacher who helps and
encourages the learners with their tasks and responds to their difficulties withdraws that
cooperation in a test situation. The other reason, which follows from the first, is that a test tends to
produce winners and losers. To be sure, those who come close to winning do not feel too
upset, but those who gain little from the experience may feel self-conscious.

Nowadays, a new trend and development with a remarkable emphasis on integrative
and communicative tests has brought about many innovations in English testing techniques.

Most researchers agree that teaching and testing are closely related. As Brown
(1994) states: "Teaching and testing are so interwoven and interdependent that it is difficult
to tear them apart." Tests are constructed primarily as devices to reinforce learning and
to motivate the students, and as a means of assessing the students' performance in the
language. In other words, a test is an extension of classroom work, providing teachers
and students with useful information that can improve both the teaching and the learning
process. In turn, teaching and learning provide a great source of language materials for
testing to exploit.
A good test is a valuable teaching device for several reasons. Firstly, a test provides
the teachers with information on how effective the teaching has been; it helps them find out
whether students are capable of performing the expected behavior, and from that the
characteristics of each individual can be known. Secondly, with the aid of tests, teachers can monitor and
evaluate students' learning and diagnose strengths and weaknesses as they occur. Last
but not least, based on the test results, the teachers can evaluate the effectiveness of the
syllabus as well as the methods and materials they are using.
However, testing can have either a harmful or a beneficial effect on teaching and learning. For
example, if a test is regarded as important, then preparation for it can come to dominate all
teaching and learning activities. If the end goal is simply to help students pass the test or
examination, many teachers will focus their teaching on the content of the test only, so the
teaching program may be distorted in many ways.

2.2. Major characteristics of a good test
Before writing a test, it is necessary to answer this question: "What are the
major characteristics of a good test?" Harrison (1983: 10) claims that there are four basic
characteristics of all good tests: validity, reliability, practicality and
discrimination.





2.2.1. Test validity
2.2.1.1. What is test validity?
Validity is one of the most important characteristics of a good test and has long been a
controversial issue. A recent trend in language testing discussion is to
consider validity as a unitary concept: what used to be treated as different types of validity
are now regarded as aspects of validity.
Henning (1987: 5) defines validity as follows:
"In general validity refers to the appropriateness of a given test or any of its
component parts as a measure of what it is purported to measure. A test is said to
be valid to the extent that it measures what it is supposed to measure. It follows that the
term valid when used to describe a test should usually be accompanied by the
preposition for. Any test then may be valid for some purposes, but not for others."
A test is considered valid when it specifically measures what it is supposed to
measure. A listening test with written multiple-choice options may lack validity if the
printed choices are so difficult to read that the exam actually measures reading
comprehension as much as it does listening comprehension; it is least valid for students
who are much better at listening than at reading. In other words, a test is valid when its results are
interpreted as appropriate to the purposes of the testing. That is, validity can be defined as the
degree to which a test actually tests what it is intended to test. For example, if the purpose
of a test is to assess the ability to communicate in English, the test is valid only if it actually
tests that ability. When test validity is defined as the degree to which a test
measures what it is supposed to measure, two important aspects follow. The first is that validity is
a matter of degree: some tests are more valid than others. The second is that tests are only valid or invalid in
terms of their intended uses. If a test is intended to test reading ability but it also tests
writing, then it may not be valid for testing reading alone, although it may test reading and writing
together.
Validity also refers to the appropriateness or correctness of the inferences and
decisions made about individuals and groups from the test results. Validity must be
considered in terms of the correctness of a particular inference about test takers. Therefore,
validity is not always easy to measure.

2.2.1.2. Types of test validity
There are many types of validity such as: face validity, content validity, construct
validity, concurrent validity and predictive validity. In this part, the writer will focus on
only two main types: face validity and content validity.
2.2.1.2.1. Face validity
When mentioning face validity, we should consider this question: "Does the test,
on the face of it, appear from the learners' perspective to test what it is designed to test?"
Face validity is almost always perceived in terms of content: if the test samples the actual
content of what the learner has achieved or expects to achieve, then face validity will be
perceived. According to Arthur Hughes (1989: 40), a test is said to have face validity if it
looks as if it measures what it is supposed to measure. For example, a test which pretended
to measure pronunciation ability but which did not require the candidate to speak might be
thought to lack face validity. Candidates, teachers and education authorities may not accept
a test which does not have face validity. Face validity concerns the appeal of the test to
popular or non-expert judgment, such as that of the candidates, the candidates' families and members
of the public, and it can be gauged by asking other teachers to give their opinions about the
test.
However, with the advent of communicative language testing, there has been
increased emphasis on face validity. It is important for a communicative language test to look
like something one might do "in the real world" with language, and such appeals
to "real life" are attributed to face validity. While students' opinions about a test are not
expert opinions, they can be important because they are one kind of response obtained from the people who
actually take the test. If a test does not appear valid to the test takers, they may not try their
best, so the perceptions of non-experts are useful.
In other words, face validity affects the response validity of the test. This view
of face validity provides a useful method for language test validation.

Face validity can provide not only a quick and reasonable guide but also a balance to
an excessive concern with statistical analysis. Moreover, students' motivation is maintained if a
test has good face validity. On the other hand, if the test appears to have little relevance in
the eyes of the students, it will clearly lack face validity. It is possible for a test to include
all the components of a particular teaching program being followed and yet at the same
time lack face validity. The concept of face validity is far from new in language testing, but
the emphasis now placed on it is relatively new. In the past, many test writers regarded face
validity simply as a public relations exercise. Today, most designers of communicative tests
regard face validity as the most important of all types of test validity.
2.2.1.2.2. Content validity
2.2.1.2.2.1. What is content validity?
Among the several kinds of validity, the simplest and most important one for
language teachers is content validity.
In Read's opinion (1983: 6), the most relevant type of validity for classroom testing
is content validity, which means that the contents of the test should reflect the contents and
the objectives of the syllabus being followed. In other words, if we want to find
out students' progress in what they have learnt, the test should contain a representative
sample of the items, rules, skills or functions that they are supposed to achieve. Obviously,
the test contents are the main concern if content validity is to be achieved.
Kerlinger (1973) defines content validity as the representativeness or sampling adequacy
of the content, the substance, the matter and the topics of a measuring instrument.
In the same way, Harrison (1983: 11) defines content validity as:
"Content validity is concerned with what goes into the test.
The content of a test should be decided by considering the purpose of the
assessment, and then drawing up a list known as a content specification."
According to Cyril J. Weir (1990), the purpose of content validation is to examine
whether the test is a good representation of the material that needs to be tested and to
ensure the defensibility and fairness of interpretations based on the test performances. It
involves looking at empirical evidence, the hard facts emerging from data from test trials
or operational administrations, and is established by comparing the test with its course
objectives. Last but not least, a test is said to be valid if it is relevant to the aims and
purposes of the learning areas on which it is set.
The main distinction between face validity and content validity was pointed out by
Alderson et al. (1995: 173) as follows:
"In face validation, we do not necessarily accept the judgment of others,
although we respect it, and appreciate that for those people it is real and important
and may, therefore, influence behaviors. In content validation, we gather judgments
from people we are prepared to believe."
In this case, if face validity is an appeal to lay observers, such as students and
administrators, then content validity is the opinion of the subject experts (i.e.,
teachers and test makers) as to whether a test is valid.
For Kelly (1978), content validity seems "an almost completely overlapping
concept" with construct validity. And for Moller (1982: 68), "The distinction between
construct and content validity in language testing is not always very marked, particularly
for tests of general language proficiency." In these cases, particular attention must be paid
to content validity in an attempt to ensure that the sample of activities included in a test is
as representative of the target domain as possible.
To sum up, the writer is in favor of Read's view that the most important
characteristic of a good test is content validity, which means that the contents of the test should
reflect the contents and the objectives of the syllabus being followed.

2.2.1.2.2.2. How to make the test more valid?
Firstly, in content validation we should look at whether the test is representative of
the skills it is trying to test. This means that we should look at the content of the test and
compare it with a statement of what the content ought to be. This involves looking at
the syllabus, in the case of an achievement test, and at the test specifications, and deciding what the
test was intended to test and whether it accomplishes what it is intended to do. In other
words, content validity depends on the particular course objectives. In addition, a test
will have content validity only if it includes a proper sample of the relevant structures.
Just what the relevant structures are will depend, of course, upon the purposes of the test. In
order to judge whether a test has content validity or not, we need a specification of the
skills or structures that it is meant to cover. Such a specification should be made at a very
early stage in test construction. It is not to be expected that everything in the specification
will always appear in the test, but it provides the test constructor with the basis for
making a principled selection of the elements to be included in the test. A comparison of the test
specifications and the test contents is the basis for judgments of content validity.

However, how important is content validity? Arthur Hughes (1989) pointed out two
important things about it. First, the greater a test's content validity, the more likely it is to be an
accurate measure of what it is supposed to measure: a test in which major areas identified
in the specification are not represented at all is unlikely to be accurate. Secondly, a test that lacks
content validity is likely to have a harmful backwash effect, since areas which are not tested are likely to become
areas ignored in teaching and learning. Too often the content of tests is determined by what
is easy to test rather than what is important to test. The best safeguard against this is to
write full test specifications and to ensure that the test content is a fair reflection of these.
In other words, when embarking on the construction of a test, the test writer should first
draw up a table of test specifications, describing in very clear and precise terms the
particular language skills and areas to be included in the test. If the test or sub-test being
constructed is a test of grammar, each of the grammatical areas should then be given a
percentage weighting, for example the future simple tense 10%, uncountable nouns 15%,
relative pronouns 10%, and so on. If the test or sub-test concerns reading, then each of the reading
sub-skills should be given a weighting in a similar way, for instance deducing word
meanings from contextual clues 20%, search-reading for specific information 30%, reading
between the lines and inferring 12%, intensive reading comprehension 40%, and so on.
According to Heaton (1982), by doing so the test writer attempts to quantify and
balance the test components, assigning a certain value to indicate the importance of each
component in relation to the other components of the test. In this way, the test should achieve
content validity and reflect the component skills and areas that the test writer wishes to
include in it.
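As a rough illustration of how such a comparison might be carried out in practice, the following Python sketch checks a hypothetical grammar sub-test against the percentage weightings given above. The item counts, the sub-test size and the 5-point tolerance are assumed values chosen purely for demonstration; they are not taken from the thesis or from Heaton.

# A minimal sketch: comparing a test's actual item distribution against the
# percentage weightings in its specification, so that under- or over-represented
# areas can be flagged. All figures below are illustrative assumptions.

specification = {            # target weighting (% of the grammar sub-test)
    "future simple tense": 10,
    "uncountable nouns": 15,
    "relative pronouns": 10,
}

items_per_area = {           # hypothetical item counts from an actual test paper
    "future simple tense": 4,
    "uncountable nouns": 2,
    "relative pronouns": 0,
}

total_items = 20             # hypothetical total number of items in the sub-test

for area, target in specification.items():
    actual = 100 * items_per_area.get(area, 0) / total_items
    status = "OK" if abs(actual - target) <= 5 else "CHECK"
    print(f"{area:22s} target {target:3d}%  actual {actual:5.1f}%  {status}")

# Areas marked CHECK deviate from the specification by more than 5 percentage
# points and would weaken the content validity of the sub-test.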
Anastasi (1982:131) defines content validity as: “essentially the systematic
examination of the test content to determine whether it covers a representative sample of
the behavior domain to be measured.” She provided a set of useful guidelines for
establishing content validity:
1. The behavior domain to be tested must be systematically analyzed to make
certain that all major aspects are covered by the test items, and in the correct
proportions.
2. The domain under consideration should be fully described in advance, rather
than being defined after the test has been prepared.

3. Content validity depends on the relevance of the individual's test responses to
the behavior area under consideration, rather than on the apparent relevance of item
content.
Brown (1994: 385) gives a list of factors necessary to improve test validity:
+ A carefully constructed, well-thought-out format
+ Items that are clear and uncomplicated
+ Directions that are crystal clear
+ Tasks that are familiar and related to the course work
+ A difficulty level that is appropriate to the students
+ Test conditions that are "biased for best", bringing out students' best performances.
In the same way, Moore (1992: 11) stressed: “Content validity is established by
determining whether the instrument's test items correspond to the content that the students
are supposed to learn."
Correspondingly, to evaluate a test's content validity, the test items should be
inspected with regard to their correspondence to the teachers' stated objectives.

In short, test content validity is the most important characteristic of a good test. The
basis to evaluate content validity is a comparison between the test specifications and the
test contents.

2.2.2. Test reliability
Reliability is another necessary characteristic of any good test, since a reliable test can be
used as a measuring instrument. If a test is administered to the same students on different
occasions (with no language practice work taking place between those occasions) and
produces different results, it is not reliable. A test is therefore said to be reliable if it produces
the same results when administered to the same students at different times.
There are two types of reliability. The first, called test-retest reliability, refers to the ability of a test to
produce consistent results from the same students whenever it is used; the other, inter-item
consistency, means that the items of the test should all measure the same thing.
Bachman (1990), a leading expert, describes reliability as "a quality of test score".
The most obvious way to examine it is simply to have people take the same test twice. We can look at the
hypothetical data in Table 1, which presents the scores obtained by five students who took a
100-item test A on a particular occasion and the scores they would have obtained if they had
taken it a day later. We should note the size of the differences between the two scores for each student:
Table 1: Scores on test A (invented data) by Arthur Hughes (1989: 30)

Students    Score obtained    Score which would have been obtained on the following day
Bill              68                  82
Mary              46                  28
Harry             39                  67
Don               43                  35
Sue               62                  49

Now have a look at Table 2, which displays the same kind of information for a second
100-item test, test B. Again, note the difference in scores for each student:
Table 2: Scores on test B (invented data) by Arthur Hughes (1989: 30)

Students    Score obtained    Score which would have been obtained on the following day
Bill              65                  69
Mary              48                  52
Harry             85                  90
Don               38                  35
Sue               52                  57

The differences between the two sets of scores are much smaller for test B than for
test A. Therefore, test B appears to be more reliable than test A.
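Hughes' point can also be expressed numerically. The minimal Python sketch below (an illustrative addition, not part of the original discussion) computes a simple test-retest correlation coefficient for each test, using the invented scores from Tables 1 and 2; a coefficient close to 1 indicates that the two administrations rank and score the students consistently.

# A minimal sketch: quantifying the comparison of tests A and B with a
# test-retest (Pearson) correlation computed on the invented scores above.

def pearson(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return numerator / denominator

# Scores from Tables 1 and 2 (Hughes 1989: 30, invented data).
test_a_day1 = [68, 46, 39, 43, 62]
test_a_day2 = [82, 28, 67, 35, 49]
test_b_day1 = [65, 48, 85, 38, 52]
test_b_day2 = [69, 52, 90, 35, 57]

print("Test A test-retest correlation:", round(pearson(test_a_day1, test_a_day2), 2))
print("Test B test-retest correlation:", round(pearson(test_b_day1, test_b_day2), 2))

Run on these figures, test B's coefficient comes out close to 1 while test A's is far lower, which matches the qualitative observation that test B is the more reliable of the two.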
Harrison (1969) states that there are several factors affecting the
reliability of a test:
+ Firstly, there is the extent of the sample of material selected for testing. While
validity is concerned with the contents of the sample, reliability is concerned with its size.
The larger the sample is, the greater the probability that the test is reliable. If there are very few
items in the test, the test may rely too heavily on luck: weak candidates may score 50% or
more on a short test.
+ The second factor affecting test reliability is the administration of the test. If
individual test items are too hard for everyone or too easy for everyone, they are not
reliable test items, because they do not differentiate between the strong and the weak candidates. An
important consideration here is whether the same test is administered to different
groups under different conditions or not.
+ The third factor is the test instructions: are the various tasks expected of the test takers
made clear to all candidates in the rubrics?
+ Another factor that influences the reliability of a test is how far the test is
based on passages and questions taken directly from a textbook rather than on the
syllabus underlying the textbook. An over-emphasis on
"quoting" the textbook in a test will produce results that do not reveal the learners' real
achievement or progress in terms of reading, writing, listening, speaking,
vocabulary and grammar; the results will only reveal how well students have memorized
the passages and the correct answers.
+ Last but not least, one of the most important factors affecting reliability is the
scoring of the test. Sometimes a test can be unreliable because of the way it is marked. For
example, if an average composition is marked immediately after a very good composition,
the average composition may be given a mark that is actually below average: the marker's
subconscious comparison of the two compositions will make the average composition
appear worse than it really is. However, if the same average composition is marked
immediately after a very poor composition, it may appear above average and be
awarded a higher mark than it deserves. In addition, different markers may award different
marks to the same composition; for example, some markers may be very lenient and
others unfairly strict.
To sum up, reliability is an undeniably important characteristic of a good test. If the
test results are not reliable, the assessment based on them is not reliable either. In order to make a test
more reliable, it is important for testers to consider many influential factors right from the
outset of the test construction process, such as test administration (scoring, timing, testing
conditions, and observation or control of the test), the size of the test, the test instructions
and the scoring methods.

2.2.3. Relationship between reliability and validity
Reliability and validity are essential measurement qualities of a good test. They are
the qualities that provide the major justification for using test scores and numbers as the basis for
making inferences or decisions (Bachman et al., 1996: 19).
They have a complicated relationship. On the one hand, it is possible for a test to be
reliable without being valid; that is, a test can give the same result time after time yet not
measure what it was intended to measure. On the other hand, if a test is not reliable, it
cannot be valid at all. To be valid, according to Hughes (1988: 42), a test must provide
consistently accurate measurements; it must therefore be reliable. A reliable test, however,
may not be valid at all. For example, in a writing test the candidates might be required to
translate a text of 500 words into their own language. This could well be a reliable test, but
it is unlikely to be a valid test of writing. In our efforts to make tests reliable, we must therefore be
wary of reducing their validity.
The problem is that while one can have test reliability without test validity, a test
can only be valid if it is also reliable. There is thus sometimes said to be a reliability-
validity tension. This tension exists in the sense that it is sometimes essential to sacrifice a
degree of reliability in order to enhance validity. However, if validity is lost in order to increase
reliability, we finish up with a test which is a reliable measure of something other than
what we wish to measure. Of the two concepts, if a choice has to be made, "validity, after
all, is the more important one" (Guilford, 1965: 481).
Moller (1981: 67) comments that while it is understood that a valid test must be
reliable, it would seem that in such a highly complex and personal behavior as using a
language other than one's mother tongue, validity could be claimed for measures that
might have a lower than normally acceptable level of reliability. Reliability is something
we should always try to achieve in our tests; test reliability cannot be ignored without a
harmful effect on the validity of the instrument.

Therefore, test validity and reliability are the two chief criteria for evaluating any
test, and the ideal test should be both valid and reliable. However, as discussed above, there is
sometimes a tension between the two, and efforts to maximize one can reduce the other.

2.2.4. Practicality
In addition to reliability and validity, practicality plays an important role in deciding
whether a test is good or not. The main question of practicality is administrative: a test
must be carefully organized well in advance. How long will the test take? What special
arrangements have to be made (for example, what happens to the rest of the class while
individual speaking tests take place)? Is any equipment needed (tape recorder, language lab,
overhead projector)? How is the marking of the work handled? How are the tests stored between
administrations? All of these questions are practical ones, since they help ensure the success of the
test and the testing (Heaton, 1988). Therefore, practicality includes financial limitations, time
constraints, and ease of administration, scoring and interpretation.
According to Brown (1994), a test which is prohibitively expensive, which takes a
student ten hours to complete, or which takes students a few minutes to do but teachers several hours
to evaluate, is impractical.
Another important aspect of practicality we have to consider is that the test should
have "instructional value" (Oller, 1979). The test should enhance the delivery of
instruction to the students. The teachers need to make clear and useful interpretations for
students to understand and learn better, and the instructions of the test should be clear and easy,
so that the students know what they have to do. Knowing what to do, they can get
higher marks. In contrast, a test that is too complicated or too difficult may not be practical for the
teachers or the students.
To sum up, in order to be useful and efficient, tests should be as economical as
possible in terms of time and cost. In addition, the test's instructions should be well written
so that students know what they ought to do.

2.2.5. Discrimination
Discrimination is another important factor that test designers have to consider when
writing a test. Heaton (1988) defines the discrimination of a test as its capacity to discriminate
among the different students and to reflect the differences in the performances of the individuals in
the group. A test cannot discriminate if its items are either too easy or too
difficult; the test items should therefore range from extremely easy items
to extremely difficult items. In another way, Harrison (1994: 14) defines discrimination
as "the extent to which a test separates the students from each other." Discrimination tells
us whether the test can differentiate between the more proficient students and the less
proficient ones. The extent to which it is needed depends on the purposes of the test. For example, if
a placement test is able to discriminate efficiently among students, it will be much easier to
divide students into suitable groups. In many classroom tests, by contrast, the teacher is much
more concerned with finding out how well the students have mastered the syllabus, and so
hopes for high results from the students.
Summary
In this chapter, the writer has reviewed definitions and the roles of language
testing and the four major characteristics of a good test, with the aim of highlighting
content validity and how to make a test more valid. In addition, the relationship
between reliability and validity has been presented, since the ideal test should be both valid and reliable.











Chapter 3: The study
3.1. English learning, teaching and testing at Phuong Dong University
3.1.1. The students
At Phuong Dong University, students come from different parts of the country.
Most of these students commonly did not spend much time learning English at high
school as they had to devote most of their time to learning different subjects, for
example: mathematics, physics, chemistry, drawing… in order to pass the universit y
entrance examination. Thus, they are real beginners of English when entering university,
and of different language proficiency levels.

3.1.2. The teachers

English teachers working with second-year students are of different ages: half of
them are from 45 to 55 years old and the rest from 25 to 38. They graduated
from three educational institutions: Ha Noi National University, Ha Noi Foreign Language
University and Phuong Dong University.

3.1.3. The course book: “New Headway Elementary- The third edition”
The book “New Headway-Elementary- The third edition” has been used as the
textbook to teach the second year students at Phuong Dong University. This material is
designed for students at elementary level.
It consists of 14 units, harmoniously designed with a powerful lexical component to
increase learners' vocabulary and develop awareness of English culture.
Each unit is divided into three parts, and each part lays a focus on grammar,
function or vocabulary. Every unit provides students with opportunities to learn and
develop their knowledge in the categories of grammar, vocabulary, communication skills
and pronunciation through practice activities in listening, speaking, reading and writing
(see Appendix 1, page I).



3.1.4. Syllabus and its objectives
For the first semester of the second year, units 7 to 14 are
taught in 45 periods (50 minutes per period), delivered within about 9 weeks.
Students continue to work on the four areas of grammar, vocabulary, communication skills and
pronunciation, and they have the chance to deal with different topics. The aims of
the course are to increase students' basic knowledge of vocabulary and grammar and
to give practice in the four basic language skills of listening, speaking, reading and writing
in social situations.

3.1.5. The final achievement test for second-year non-major students
The final achievement test consists of the following parts, whose types, numbers of items, tasks and marks are summarized in Table 3:
Part      Types                       Items   Tasks                                                          Marks
Part 1    Rewrite the sentences       5       Rewrite sentences so that there is no change of meaning       2
Part 2    Guided sentence building    5       Use the following sets of words to write complete sentences   2
Part 3    Correct mistakes            5       Find and correct one mistake in each sentence                  2
Part 4    Write a paragraph           1       Write a paragraph of 100-120 words about your capital city    4

Table 3: The components of the final achievement test
Looking at the marking criteria for the test, we can see that they have confused many teachers
and worried many students. It is very difficult for teachers to mark Part 4, as there are no detailed
marking criteria for aspects such as language, content and grammar.

3.2. Research method
In this study, both quantitative and qualitative methods are used: survey
questionnaires and document analysis. Given the scope and purposes of this study,
document analysis is taken as the main method to find out the strengths and the weaknesses
of the final achievement test with regard to content validity. In addition, survey
questionnaires help the writer collect more information from both teachers and students about
this test. Obviously, although each method helps to collect and confirm different kinds of
data, each has its own unavoidable shortcomings.

3.2.1. The survey questionnaires
There are many ways to collect data, and the survey questionnaire is an effective one
for several reasons. Firstly, questionnaires can be used to gather information about teachers'
and students' attitudes, views and thoughts regarding the content validity of the end-of-term-1 test.
Secondly, there is no confrontation between the person who conducts the survey and the
informants, because a questionnaire is simply a list of questions; the informants can therefore feel free to
express their thoughts. Thirdly, most of the questions are closed ones, so it
is easier for the writer to collect and analyze the data. Finally, questionnaires can gather a large number
of responses.


3.2.2. Document analysis
Besides the survey questionnaires, document analysis is considered the main method
for evaluating the final achievement test in terms of its content validity.
Firstly, the writer will analyze "The New Headway Elementary - the third
edition" to find out what the teachers have to teach and what the students ought to learn.
Because the purpose of this study is to investigate the content validity of the final
achievement test for second-year students at Phuong Dong University, analyzing this test is
an effective way to achieve this purpose. Basing on the theories about testing, test design
and the characteristics of a good test, the writer will analyze the test by comparing the course
objectives and what the students had learnt with the test contents, in order to find out the
strengths and weaknesses of the test and then give some suggested solutions for its
improvement.
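As a rough sketch of how such a comparison between the syllabus and the test contents might be tabulated, the following Python fragment maps the four test parts from Table 3 onto the language areas taught in units 7 to 14. The list of areas and the mapping are illustrative assumptions, not the writer's actual analysis.

# A minimal sketch: tabulating which taught areas are sampled by each part of
# the final achievement test. The mapping below is a hypothetical example.

syllabus_areas = ["grammar", "vocabulary", "reading", "writing",
                  "listening", "speaking", "pronunciation"]

test_parts = {
    "Part 1: Rewrite the sentences": {"grammar"},
    "Part 2: Guided sentence building": {"grammar", "vocabulary"},
    "Part 3: Correct mistakes": {"grammar"},
    "Part 4: Write a paragraph": {"writing", "vocabulary", "grammar"},
}

covered = set().union(*test_parts.values())
untested = [area for area in syllabus_areas if area not in covered]

print("Areas sampled by the test:", sorted(covered))
print("Areas taught but not tested:", untested)

# Any area that is taught but never tested is a candidate weakness in the
# content validity of the final achievement test.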
Last but not least, the writer will analyze the data from the survey questionnaires completed by
both teachers and students to see what their comments about this test are.
Summary
Evidently, it is important to use several methodologies so that the results obtained
can be compared and their authenticity ensured, and so that the informants' real
feelings and full views are expressed. Besides, document analysis is a rich source of the
