
DECLARATION
I certify that this minor thesis of the Study Project, entitled
Designing an English achievement test for the first-year non-English major students
in Son La Teachers’ Training College,
is submitted in total fulfillment of the requirements for the degree of Master of Arts.
Son La, February 2009
Nguyễn Thị Ngọc Thuý
ACKNOWLEDGEMENTS
In carrying out this MA coursework, I have been indebted to many people for their encouragement, cooperation, and advice.
First and foremost, I would like to express my deepest gratitude to Mr. Vũ Văn Phúc, my supervisor, for his useful advice, insightful ideas, and dutiful supervision.
I would also like to take this opportunity to thank all my colleagues in the English department at STTC (Son La Teachers’ Training College) for answering my survey questions and direct interviews, and for their constructive suggestions about this research.
I would like to give my special thanks to the first-year students at STTC who actively participated in the sample test and the surveys and responded to my interviews.
Last but not least, my sincere thanks go to my family, my classmates, and my friends, especially my husband, who encouraged and helped me to carry out this thesis.
ABSTRACT
Testing plays a very important role in the teaching process: it helps teachers to assess their own teaching as well as their students’ learning. Evaluating a test in terms of qualities such as reliability and validity is necessary to ensure the usefulness of this assessment instrument. However, this issue has received little consideration from teachers at Son La Teachers’ Training College. Thus, this minor study was designed to evaluate two qualities, reliability and validity, of the achievement test for first-year non-English major students at Son La Teachers’ Training College.


The MA thesis entitled “Designing an English achievement test for the first-year non-English major students in Son La Teachers’ Training College” deals with the problems of the current testing situation at the College. In doing the study, the author attempts to identify the possible reasons for the students’ low exam results in the English program and to propose an elementary achievement test for first-year non-English major students. The study is based both on an analysis of the English course, including the subjects, the syllabus, and the sample test, and on the analysis and interpretation of the test scores and test items of the suggested test.
LIST OF ABBREVIATIONS
1 STTC Son La Teachers’ Training College
2 FV Facility Value
3 Sd Standard deviation
4 p Level of difficulty
5 D Discrimination
6 g-d Good discrimination
7 m-d Medium discrimination
8 b-d Bad discrimination
9 b-i Bad item
10 α Coefficient alpha
11 IELTS International English Language Testing System
12 TOEFL Test of English as a Foreign Language
LIST OF TABLES AND FIGURES
TABLE
Table 1 Test item types for Reading and Writing skills and Grammar, Vocabulary
Table 2 Time frame and the units
Table 3 The key points of those 8 units
Table 4 The specification grids of the test
Table 5 Frequency distribution in the final achievement test

Table 6 The standard deviations of 5 scales in the test
Table 7 The interpretation of the item difficulty of the test
Table 8 The result of item discrimination of the test
Table 9 The coefficient alphas of the 5 scales in the test
FIGURES
Figure 3.3.4.1 Histogram of score distribution
TABLE OF CONTENTS
Page
Declaration i
Acknowledgement ii
Abstract iii
List of abbreviations iv
List of tables and figures v
Tables of contents vi
CHAPTER 1: INTRODUCTION 1
1.1 Rationale 1
1.2 Scope of the study 3
1.3 Aims of the study 3
1.4 Methods of the study 4
1.5 Research questions 4
1.6 Design of the study 4
CHAPTER 2: LITERATURE REVIEW 5
2.1 Basic concepts of testing 5
2.2 Types of tests 5
2.2.1 Proficiency tests 5
2.2.2 Achievement tests 6
2.2.3 Diagnostic tests 8
2.2.4 Placement tests 8
2.2.5 Progress tests 8

2.2.6 Direct versus indirect tests 9
2.2.7 Discrete point versus integrative testing 10
2.2.8 Norm – referenced versus criterion – referenced testing 11
2.2.9 Objective testing versus subjective testing 11
2.2.10 Communicative language testing 12
2.3 Characteristics of a good test 13
2.3.1 Validity 13
2.3.1.1 Construct validity 13
2.3.1.2 Content validity 13
2.3.1.3 Face validity 14
2.3.1.4 Backwash validity 14
2.3.1.5 Criterion-related validity 15
2.3.2 Reliability 15
2.3.3 Discrimination 15
2.3.4 Practicability 16
2.4 Test items for reading skill, writing skill, grammar, and vocabulary 17
2.4.1 Test items 17
2.4.2 Language components and language skills 17
2.4.3 The test item types used to evaluate language components and language skills 18
CHAPTER 3: THE STUDY 19
3.1 The subjects and the current English teaching, learning and testing
situations at STTC 19
3.1.1 Students and their backgrounds 19
3.1.2 The English teaching staff 19
3.1.3 English teaching and learning at STTC 20
3.1.4 The objectives of the elementary English course 21
3.1.5 Teaching material used for first-year students at STTC 22

3.2 The current testing situation at STTC 24
3.3 The proposed construction of the achievement test for the first year
students at STTC 25
3.3.1 Test objectives 25
3.3.2 The Paper Specification Grids for the 2nd Term Achievement Test 25
3.3.3 Data collection 26
3.3.4 Interpretation and test score analysis 27
3.3.4.1 The frequency distribution 27
3.3.4.2 The central tendency 27
3.3.4.3 The dispersion 28
3.3.5 Test items evaluation 28
3.3.5.1 The item difficulty 28
3.3.5.2 The item discrimination 30
3.3.6 Estimating the reliability of the test 32
3.4 Teachers’ and students’ comments 32
CHAPTER 4: CONCLUSION 34
4.1 Summary of the study 34
4.2 Limitation 34
4.3 Suggestion for further study 35
REFERENCES -1-
APPENDICES -3-
Appendix 1: The Sample of the 2nd Term Achievement Test -3-
Appendix 2: The result of test analysis using ITEMAN software -8-
CHAPTER 1: INTRODUCTION

1.1 RATIONALE
The importance of language testing is recognized by virtually all professionals in the field of language education. It is of special importance in an educational system that is highly competitive, since testing is not only an indirect stimulus to learning but also plays a crucial role in determining the success or failure of an individual's career, with direct implications for his or her future earning power. “Thus, testing is an important tool in educational research and for programme evaluation, and may even throw light on both the nature of language proficiency and language learning” (Lauwerys and Scanlon, 1969).
Likewise, in the process of teaching and learning a foreign language, testing plays a very important role. Language testing is one of the most important ways to evaluate what students acquire when they learn a foreign language. Through tests, teachers learn not only about the success or failure of learners but also how well the learners use what they have been taught. Moreover, the learners come to know what they have gained, what they can apply, and what they cannot. Moore (1992, p.138) states: “Evaluation is an essential tool for teachers because it gives them feedback concerning what the students have learned and indicates what should be done next in the learning process. Evaluation helps you to better understand students, their abilities, interests, attitudes and needs in order to better teach and motivate them.” Nga (1997, p.1) reaches the same conclusion: “Tests are assumed to be powerful determiners of what happens in classroom and it is commonly claimed that they affect teaching and learning activities both directly and indirectly.”
Therefore, testing is an important part of the teaching and learning process; but has it been given
adequate attention and careful study yet? Test researchers (Hughes, 1989; Brown, 1995;
Read, 1982; Hai, 1999; Tuyet, 1999) generally claim that, unfortunately, tests have got a bad rap in recent years, and not without reason. More often than not, tests are seen by learners “as dark clouds hanging over their heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of questions they do not know and, worst of all, a flood of disappointment if they do not make the grade” (Brown, 1994a, p.373). Hughes (1989, p.1) makes another comment on recent language testing: “It cannot be denied that a great deal of language testing is of very poor quality. Too often language tests have a harmful effect on teaching and learning and too often they fail to measure accurately whatever it is they are intended to measure.” This is coupled with the fact that teachers frequently lack formal training in educational measurement techniques and tend to be alienated from the testing process, regarding it as a necessary evil, an intrusion on their regular instructional activities.
At present, English tests at Son La Teachers’ Training College (STTC) have the following characteristics:
- Testing has not been given appropriate attention and careful study.
- Its role in teaching and learning has not been fully recognized.
- Most language teachers think that teachers should be responsible for making tests, because testing is one part of the teaching and learning activities that students have to pass.
- There has been a tendency to use commercial (ready-made) tests rather than teacher-made tests, since commercial tests are very convenient and do not take much time to construct. However, these selected tests may not be relevant to the objectives of the course.
- Test content is sometimes found to be unrelated to the objectives of the course, and very often many test items have not been dealt with in class.
- Students have complained that there is still a big gap between what is taught and what is tested. An instance of this is when tests designed for the pre-intermediate level are given to students at the elementary level; such tests are so difficult that only a few students can complete them, and they are therefore neither valid nor reliable.
- Tests are used exclusively for grading, and no feedback about the tests is given.
- There has been no discarding of bad tests or bad items. Some items are found to be so difficult that few testees can do them, whereas other items are so easy that all testees obtain the correct answers. Such items should be discarded or replaced.
- Moreover, because the writing and reading comprehension tests at the college are designed entirely with multiple-choice techniques, students can easily cheat by asking for and copying answers from their classmates.
- Apart from those tests that are carefully designed, some others are still of poor quality and do not accurately measure the students' real ability. Perhaps the test writer pays attention only to the fulfillment of his or her duty, which is to give tests, rather than to the effectiveness of the tests. Those tests often fail to measure accurately whatever they are intended to measure.
- Finally, the last testing problem at STTC is that some of the tests may lack reliability because they are not pre-tested anywhere else, for the sake of confidentiality. Indeed, for the sake of “confidentiality”, test designers are often asked to write tests at short notice, just some time before the tests are administered. In such circumstances, who can say for sure that the required standards and criteria will be met by the test writers?
Therefore, a well-designed test is necessary for every language level, and especially at the college level, since this is the elementary level, which aims at acquiring survival English and diagnosing students’ aptitudes in the course and what they have to study to improve both their knowledge and skills. In this minor thesis, the author draws on the knowledge of testing and of the local testing situation to propose a sample achievement test for first-year students who have been taught the student’s book New Headway English Course (elementary level) from unit 1 to unit 8.
1.2 SCOPE OF THE STUDY
The study focuses on the existing situation at Son La Teachers’ Training College. I design a sample test covering only reading and writing skills, grammar, and vocabulary. The study provides investigated and analyzed data from the achievement test for first-year non-English major students. Moreover, the teachers’ and students’ comments on the test and their suggestions for its improvement are presented in this thesis.
1.3 AIMS OF THE STUDY
The aim of the study is to report on research examining the current testing situation and language tests for non-English majors at STTC, with great emphasis on analyzing the results of the sample test and the teachers’ and students’ comments on the test and their suggestions for its improvement. The specific aims of the study are:
1. To investigate the STTC teachers’ evaluation and students’ evaluation of the
sample test concerning its content, time allowance and its format.

2. To investigate the teachers’ suggestions and students’ suggestions for improving
testing situations and language tests at STTC.
3. To propose an achievement test construction for the first-year students at STTC; a sample test will be designed based on the proposed test construction.
4. To offer some practical recommendations for improving the testing situation at STTC.
1.4 METHODS OF THE STUDY
In order to achieve the above aims, the study has been carried out with the following approach. Based on the theory and principles of language testing and the major characteristics of a good test, especially achievement tests, the author analyzes the results of the sample test and of a survey questionnaire completed by 10 teachers of English at STTC. Several other methods, such as interviews, informal discussions with students and teachers, and classroom testing observation, are also employed to gather further information.
1.5 RESEARCH QUESTIONS
The research questions of the study are as follows:
1. What should be done to improve the English testing situation for the first-year
students at STTC?
2. Which test components are considered appropriate for the English achievement test construction at STTC?
1.6 DESIGN OF THE STUDY
The minor thesis is organized into four chapters.
Chapter one is the introduction, consisting of the rationale, the aims, the methods, the research questions, and the design of the study.
Chapter two presents the literature review on the basic concepts of testing, types of tests, characteristics of good tests, test items, and test item types for language components and language skills.
Chapter three, which is the main part of the study, presents the analysis of the findings of the test design and some brief comments from teachers and testees.
Chapter four deals with some suggestions to improve the test and the summary of the research.
CHAPTER 2: LITERATURE REVIEW
2.1 BASIC CONCEPTS OF TESTING
According to Brown (1994, p.252), “A test, in plain or ordinary words, is a method of measuring a person’s ability or knowledge in a given area.” Moore (1992, p.138) proposes that evaluation is an essential tool for teachers because it gives them feedback concerning what the students have learned and indicates what should be done next in the learning process; evaluation helps us to understand students better, their abilities, interests, attitudes, and needs, in order to better teach and motivate them. However, Brown (1994, p.373) stresses that tests are seen by learners as dark clouds hanging over their heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of questions they do not know and, worst of all, a flood of disappointment if they do not make the grade. Read (1983, p.3) shares this idea, saying that a language test is a sample of linguistic performance or a demonstration of language proficiency. In other words, a test is not simply a set of items that can be objectively marked; it can also involve a subjective evaluation of spoken and written performance with the assistance of a checklist, a rating scale, or a set of performance criteria. Nga (1992, p.2) also confirms that tests commonly refer to a set of items or questions designed to be presented to one or more students under specified conditions. Harrison (1986, p.1) notices that a test can be a natural extension of classroom work, providing teachers and students with useful information that can serve as a basis for improvement, or a necessary but unpleasant imposition from outside the classroom. In short, a test is a useful tool to measure learners’ ability in a given situation, especially in the classroom.
2.2 TYPES OF TESTS
2.2.1 Proficiency Tests
According to Hughes (1990:9), “Proficiency tests are designed to measure people’s ability
in a language regardless of any training they may have had in that language.” That is to say
the content of a proficiency test is not based on the content or objectives of any language
course test takers may have followed. It is rather based on a specification of what they
have to be able to do in the language to meet the requirements of their future aims.
Other test specialists, such as Carroll and Hall (1985), Harrison (1986), and Henning (1987), share the same view: a proficiency test helps both teachers and learners to know whether the learners are able to follow a particular course or whether they need some pre-departure training. Other popular proficiency tests, such as TOEFL and IELTS, are used to test students’ proficiency for their study in English-speaking countries. In Vietnam, proficiency tests are offered at different levels, namely A, B, and C, for workers, engineers, teachers, architects, etc.
2.2.2 Achievement Tests
As mentioned above, not many teachers are interested in proficiency tests, since these are not based on any particular course book. Hughes (1990:10) states: “In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.” Achievement tests are usually carried out after a course, on the group of learners who took the course. Sharing Hughes’s view of achievement tests, Brown (1994:259) suggests: “An achievement test is related directly to classroom lessons, units or even total curriculum.” Achievement tests, in his opinion, “are limited to a particular material covered in a curriculum within a particular time frame.” Another useful comment on achievement tests, offered by Finocchiaro and Sako (1983:15), is that achievement or attainment tests are widely employed in language teaching institutions; they are used “to measure the amount or degree of control of discrete language and cultural items and of integrated language skills acquired by the students within a specific period of instruction in a specific course.” In his book, Harrison (1983:7) shows: “an achievement test looks back over a longer period of learning than the diagnostic test, for example, a year’s work, or even a variety of different courses.” He also points out that achievement tests are intended to show the standard which the students have reached in relation to other students at the same level.
There are two kinds of achievement tests: final achievement tests and progress
achievement tests.

Final achievement tests are those administered at the end of a course of study. They may
be written and administered by ministries of education, official examining boards, or by
members of teaching institutions. Clearly, the content of these tests must be related to the
courses with which they are concerned, but the nature of this relationship is still a matter of
disagreement amongst language testers.
According to some testing experts, the content of a final achievement test should be based directly on a detailed course syllabus or on the books and other materials used. This has been referred to as the syllabus–content approach. It has an obvious appeal, since the test contains only what the students are thought to have actually encountered, and thus can be considered, in this respect at least, a fair test. The disadvantage of this approach is that if the syllabus is badly designed, or the books and other materials are badly chosen, then the results of the test can be very misleading. Successful performance on the test may not truly indicate successful achievement of course objectives.
The alternative approach is to base the test content directly on the objectives of the course, which has a number of advantages. Firstly, it forces designers to make the course objectives explicit. Secondly, test takers show how far they have achieved those objectives. This in turn puts pressure on those who are responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice, which a course-content-based test, almost as if part of a conspiracy, fails to do. It is the author’s belief that basing test content on course objectives is much preferable: it provides more accurate information about individual and group achievement, and it is likely to promote a more beneficial backwash effect on teaching.
Progress achievement tests, as the name suggests, are intended to measure the progress that learners are making. Since “progress” here means progress towards achieving course objectives, these tests too should be related to objectives. They should represent a clear progression towards the final achievement test based on course objectives. If the syllabus and teaching methods are appropriate to these objectives, progress tests based on short-term objectives will fit well with what has been taught. If not, there will be pressure to create a better fit. If it is the syllabus that is at fault, it is the tester’s responsibility to make clear that it is there that change is needed, not in the tests.
In addition to more formal achievement tests, which require careful preparation, teachers should feel free to set their own informal checks on students’ progress to keep learners on their toes. Since such tests will not form part of formal assessment procedures, their construction and scoring need not be directed purely towards the intermediate objectives on which more formal progress achievement tests are based. Rather, they can reflect the particular “route” that an individual teacher is taking towards the achievement of objectives.
2.2.3 Diagnostic Tests
According to Hughes (1990:13), “Diagnostic tests are used to identify students’ strengths
and weaknesses. They are intended primarily to ascertain what further teaching is
necessary”. Brown (1994:259) proposes, “A diagnostic test is designed to diagnose a
particular aspect of a particular language.” Harrison (1983) remarks that this kind of test is used at the end of a unit in the course book or after a lesson designed to teach one particular point. Such a test makes it reasonably straightforward to find out which skills the learners apply well or badly. However, it is not so easy to obtain a detailed analysis of a learner’s command of grammatical structures. To be sure of this, we would need a number of examples of the choice the student made between the two structures in every context which we thought was significantly different and important enough to warrant obtaining information. Tests of this kind still need a tremendous amount of work to produce. Whether or not they become generally available will depend on the willingness of individuals to write them and of publishers to distribute them.
2.2.4 Placement tests
According to Hughes (1990:14), “Placement tests are intended to provide information
which will help to place students at the stage of the teaching programme most appropriate
to their abilities. Typically, they are used to assign students to classes at different levels.”
In other words, we use placement tests to place pupils into classes according to their ability
so that they can start a course approximately at the same level as the other students in the group.
2.2.5 Progress Tests

A progress test is designed to measure the extent to which the students have mastered the
material taught in the classroom. It is based on the language programme which the students have been following and is just as much an assessment of the teacher's own work as of the students' learning. Results obtained from progress tests enable the teacher to become more familiar with the work of each student and with the progress of the class in general. A progress test also aims at stimulating learning and reinforcing what has been taught. Good performances may act as a means of encouraging the students, and even poor performances may act as an incentive to more work.
According to Baker (1989, p.103), the frequent use of progress tests, as a goad to encourage application on the part of the learners, can also in theory serve as a basis for decisions on course content, learner placement, and future course design. He also concludes that the results of a progress test can indicate which parts of the course content have not been mastered by numbers of students and thus need remedial action. Moreover, a properly written progress test, sampling correctly from the course content, can be a pointer for learners to which parts of the course need more attention, and for course designers to which parts of the course have not been effective. Khoa (1999, p.13) likewise establishes: “A progress test is an ‘on-the-way’ achievement test, which is linked to the specific content of a particular set of teaching materials or a particular course of instruction.”
Progress tests are prepared by a teacher and given at the end of a chapter, a course, or a term. They may also be regarded as similar in nature to achievement tests, but narrower and much more specific in scope. These tests help teachers to judge their own degree of success in teaching and to identify the weaknesses of the learners. The use of progress tests is gaining force in many universities and colleges in Vietnam nowadays. They are part of what is generally known as “continuous assessment”, a process of assessment which takes into consideration the results scored by students on their progress tests.
2.2.6 Direct versus Indirect Tests
It is pointed out by Hughes (1990:15) that direct testing requires the candidate to perform

precisely the skills that we wish to measure. If we want to know how well the candidate
can write compositions, we ask them to write compositions. If we want to know how well
they pronounce words, we ask them to speak. The tasks, and the texts which are used,
should be as authentic as possible. In fact, the tasks cannot be fully authentic; nevertheless, the effort is to make them as realistic as possible. Direct testing is easier to
design when it is intended to measure the productive skills of speaking and writing since
the very acts of speaking and writing provide us with information about the candidate’s
ability. With listening and reading it is necessary to get candidates not only to listen or read
but also to demonstrate that they have done this successfully. He also indicates several
attractions of direct testing. Firstly, if teachers want to assess pupils’ ability, it is relatively
straightforward to create the conditions which will elicit the behaviour on which judgments are based. Secondly, in his opinion, at least in the case of the productive skills, the assessment and interpretation of students’ performance is quite straightforward. Thirdly, there is likely to
be a helpful backwash effect since practice for the test involves the practice of the skills
that we want to encourage.
By contrast, indirect testing tries to measure the abilities that “underlie” the skills in which we are interested (Hughes, 1990:15). One section of the TOEFL is considered an indirect measure of writing ability: the candidate has to identify which of several underlined elements is erroneous or inappropriate in formal standard English. Another example of indirect testing is Lado’s (1961) proposed method of testing pronunciation ability by a paper-and-pencil test in which the candidate has to identify pairs of words which rhyme with each other. The main problem with indirect tests is that the relationship between performance on them and performance of the skills in which we are usually interested tends to be rather weak in strength and uncertain in nature. We do not know enough about the component parts of composition writing to predict composition-writing ability accurately from scores on tests that measure the abilities which we believe underlie it. We may construct tests of grammar, vocabulary, discourse markers, handwriting, and punctuation; still, we will not be able to predict scores on compositions accurately, even if we ensure the representativeness of the composition scores by taking many samples.

2.2.7 Discrete Point versus Integrative Testing
According to Hughes (1990:16), discrete point testing refers to the testing of one element at a time, item by item; that is, the test involves a series of items, each of which tests a particular grammatical structure. On the contrary, integrative testing requires the candidate to combine many language elements in the completion of a task, such as writing a composition, taking notes while listening to a lecture, taking a dictation, or completing a cloze passage. Henning (1987) shares with Hughes the idea that discrete point tests will usually be indirect, while integrative tests will tend to be direct, although some integrative testing methods, such as the cloze procedure, are indirect. Similarly, he notes that the distinction between discrete point and integrative tests was originated by John Carroll (1961). Discrete point tests are designed to measure knowledge or performance in a very restricted area of the target language. On the other hand, integrative tests are said to tap a greater variety of language abilities. Moreover, Henning (1987) offers examples of integrative tests such as random cloze, dictation, oral interview, and oral imitation tasks.
2.2.8 Norm – Referenced versus Criterion – Referenced Testing
Imagine that a reading test is administered to an individual student. When teachers use
questions to see how the students perform the test, they may be given two kinds of
answers. The first kind would be that the student obtained a score that placed her or him in
the top ten per cent of candidates who have taken that test, or in the bottom five percent; or
that she or he did better than sixty percent of those who took it. Hughes (1990:17) defines:
“A test which is designed to give this kind of information is said to be norm – referenced.”
According to Henning (1987), a norm – referenced test must have been administered to a
large sample of people. For the purpose of language testing and testing in general, norm –
referenced tests have both strengths and weaknesses. On the positive side, comparisons can easily be made with the performance or achievement of a large population of students. On the negative side, norm-referenced tests are usually valid only for the population on which they have been normed.
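As a brief illustration (not from the thesis; the scores below are hypothetical), norm-referenced reporting expresses a testee's result relative to the group, for example as the percentage of candidates he or she outperformed:

```python
# Norm-referenced interpretation: report a score relative to the group
# (hypothetical data), as the percentage of candidates scoring below it.

def percentile_rank(score, group_scores):
    """Percentage of the group scoring strictly below the given score."""
    below = sum(s < score for s in group_scores)
    return 100.0 * below / len(group_scores)

group = [12, 15, 18, 20, 21, 23, 25, 27, 28, 30]
print(percentile_rank(23, group))  # -> 50.0: better than half the group
```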
The purpose of criterion-referenced tests is to classify people according to whether or not they are able to perform some task or set of tasks satisfactorily; the test must therefore match the teaching objectives closely. Criterion-referenced tests are not without their share of weaknesses: their objectives are often too limited and restrictive (Henning, 1987:7). Nevertheless, in the field of language measurement, criterion-referenced tests possess two positive virtues: they are helpful in clarifying objectives, and they motivate students by setting standards in terms of what they can do.
2.2.9 Objective Testing versus Subjective Testing
The difference between objective testing and subjective testing lies in the scoring. If no judgment is required on the part of the scorer, then the scoring is objective; a multiple-choice test, with the correct responses unambiguously identified, is a case in point. If judgment is called for, the scoring is said to be subjective. There are different degrees of subjectivity in testing: the impressionistic scoring of a composition may be considered more subjective than the scoring of short answers in response to questions on a reading task. In Oller’s view (1979), many tests, such as cloze tests, “lie somewhere between subjectivity and objectivity”. As a result, many testers seek objectivity in scoring, not only for the sake of objectivity itself but also for the greater reliability it brings.
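To make the distinction concrete, the following sketch (not from the thesis; the answer key and responses are hypothetical) shows why objective scoring is perfectly consistent across markers: with an unambiguous key, marking requires no judgment, so every scorer produces the same result.

```python
# Objective scoring of a multiple-choice test: the key is unambiguous,
# so the score is fully determined by the responses (hypothetical data).

ANSWER_KEY = ["B", "D", "A", "C", "B"]

def score_objectively(responses):
    """Count responses that match the key exactly; no scorer judgment."""
    return sum(r == k for r, k in zip(responses, ANSWER_KEY))

print(score_objectively(["B", "D", "C", "C", "B"]))  # -> 4 out of 5
```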
2.2.10 Communicative Language Testing
In recent years, in parallel with the development of communicative language teaching (CLT), communicative language testing has been the focus of a great deal of research on language testing. Discussions have centered on the desirability of measuring the ability to take part in acts of communication. In sum, it is assumed that the main function of language is to enable people to communicate with each other in society; as a result, testing language ability amounts to testing communicative ability, including reading and listening, the two receptive skills necessary for communication as a two-way process (Khoa, 1999). Communicative language testing may embrace a number of testing approaches, such as direct versus indirect testing, objective versus subjective testing, etc.
Based upon the theory that language ability is a complex and multifaceted construct, Bachman (1991, p.678) proposes the following characteristics of communicative tests: “First, such tests create an ‘information gap,’ requiring test takers to process complementary information through the use of multiple sources of input. Test takers, for example, might be required to perform a writing task that is based on input from both a short recorded lecture and a reading passage on the same topic. A second characteristic is that of task dependency, with tasks in one section of the test building upon the content of earlier sections, including the test taker's answers to those sections. Third, communicative tests can be characterized by their integration of test tasks and content within a given domain of discourse. Finally, communicative tests attempt to measure a much broader range of language abilities (including knowledge of cohesion, functions, and sociolinguistic appropriateness) than did earlier tests, which tended to focus on the formal aspects of language: grammar, vocabulary, and pronunciation.”
2.3 CHARACTERISTICS OF A GOOD TEST
In order to make a well-designed test, teachers have to take into consideration various factors such as the purpose of the test, the content of the syllabus, the students’ background, and so on. In addition to these factors, test characteristics play a very important role in constructing a good test. According to a number of leading scholars in testing, such as Valette (1977), Harrison (1983), Weir (1990), Carroll and Hall (1985), and Brown (1994), all good tests have four main characteristics: validity, reliability, practicality, and discrimination.
2.3.1 Validity
2.3.1.1 Construct validity
Construct validity is defined by Anastasi (1982:144) as “the extent to which the test may be said to measure a theoretical construct or trait. Each construct is developed to explain and organize observed response consistencies. It derives from establishing inter-relationships among behavioral measures, focusing on a broader, more enduring and more abstract kind of behavioral description. Construct validation requires the gradual accumulation of information from a variety of sources. Any data throwing light on the nature of the trait under consideration and the conditions affecting its development and manifestations are grist for this validity mill.”
Construct validity is viewed from a purely statistical perspective in much of the recent American literature (Bachman and Palmer, 1981). It is seen principally as a matter of the posterior statistical validation of whether a test has measured a construct that has a reality independent of other constructs.
2.3.1.2 Content validity
The more a test simulates the dimensions of observable performance and accords with what is known about that performance, the more likely it is to have content and construct validity. According to Kelly (1978:8), content validity seems “an almost completely overlapping concept” with construct validity, and Moller (1982:68) likewise questions the sharpness of the distinction between construct and content validity in language proficiency testing. Anastasi (1982:131) defines content validity as “essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.” She offers useful guidelines for establishing content validity:
- The behavior domain to be tested must be systematically analyzed to make certain that the major aspects are covered by the test items, in the correct proportions.
- The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.
- Content validity depends on the relevance of the content of the individual test items.
2.3.1.3 Face validity
Anastasi (1982:136) points out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears to measure to those who take it, the administrative personnel who decide on its use, and other technically untrained observers. Fundamentally, the question of face validity concerns rapport and public relations. Lado (1961), Davies (1968), Ingram (1977), Palmer (1981), and Bachman and Palmer (1981) have all discounted the value of face validity. If a test does not have face validity, though, it may not be acceptable to the students taking it or to the teachers using it. If the students do not accept it as valid, their adverse reaction to it may mean that they do not perform in a way that truly reflects their ability. Anastasi (1982:136) takes a similar line: “Certainly if test content appears irrelevant, inappropriate, silly or childish, the result will be poor cooperation, regardless of the actual validity of the test. Especially in adult testing, it is not sufficient for a test to be objectively valid. It also needs face validity to function effectively in practical situations.”
2.3.1.4 Backwash validity
Language teachers operating in a communicative framework normally attempt to equip students with skills that are judged relevant to present or future needs, and to the extent that tests are designed to reflect these, the closer the relationship between the test and the teaching that precedes it, the more the test is likely to enhance construct validity. A suitable criterion for judging communicative tests in the future might well be the degree to which they satisfy students, teachers, and future users of test results, as judged by some systematic attempt to gather data on the perceived validity of the test. If this first stage, with its emphasis on construct, content, face, and backwash validities, is bypassed, the resulting procedures may not suit the purpose for which the test was intended.
2.3.1.5 Criterion-related validity
This concept is concerned with the extent to which test scores correlate with a suitable external criterion of performance. Criterion-related validity consists of two types (Davies, 1977): concurrent validity, where the test scores are correlated with another measure of performance, usually an older established test, taken at the same time (Kelly, 1978; Davies, 1983), and predictive validity, where test scores are correlated with some future criterion of performance (Bachman and Palmer, 1981).
2.3.2 Reliability.
Reliability is a necessary characteristic of any good test. It is of primary importance in the use of public achievement and proficiency tests as well as classroom tests. An appreciation of the various factors affecting reliability is important for the teacher at the very outset, since many teachers tend to regard tests as infallible measuring instruments and fail to realize that even the best test is a somewhat imprecise instrument with which to measure skills.
A fundamental criterion against which any language test has to be judged is its reliability. The concern here is with how far we can depend on the results that a test produces. Three aspects of reliability are usually taken into account. The first concerns the consistency of scoring among different markers. The second concerns how the tester can enhance the agreement between markers by establishing, and maintaining adherence to, explicit guidelines for the conduct of the marking. The third aspect is that of the parallel forms of a test that may be devised. The concept of reliability is particularly important when language tests within the communicative paradigm are considered. Moreover, Davies (1968) stresses that reliability is the first essential for any test, but that for certain kinds of language test it may be very difficult to achieve.
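Although the thesis estimates reliability with the ITEMAN software (see Appendix 2), the underlying statistic, coefficient alpha, is simple to compute. The following sketch is illustrative only, with hypothetical item scores; it is not the author's exact procedure.

```python
# Coefficient alpha (Cronbach's alpha) for a small, hypothetical score
# matrix: rows are testees, columns are items (1 = correct, 0 = incorrect).
from statistics import pvariance

scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
]

k = len(scores[0])  # number of items
item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
total_variance = pvariance([sum(row) for row in scores])

# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"coefficient alpha = {alpha:.2f}")
```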
2.3.3 Discrimination
Another important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group. The extent of the need to discriminate will vary depending on the purpose of the test. In many classroom tests, for example, the teacher will be much more concerned with finding out how well the pupils have mastered the syllabus and will hope for a cluster of marks around the 80 and 90 per cent brackets. Nevertheless, there may be occasions on which the teacher requires a test to discriminate to some degree in order to assess relative abilities and locate areas of difficulty. The items in such a test should be spread over a wide range of difficulty levels, as follows (a computational sketch of the discrimination index is given after the list):
- Extremely easy items
- Very easy items
- Easy items
- Fairly easy items
- Items below average difficulty level
- Items of average difficulty level
- Items above average difficulty level
- Fairly difficult items
- Difficult items
- Very difficult items
- Extremely difficult items.
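The thesis reports item difficulty (p) and discrimination (D) in Chapter 3. The sketch below, with hypothetical 0/1 item scores, shows the conventional way these indices are computed; since the thesis itself uses the ITEMAN software, this is an illustration rather than the author's exact procedure.

```python
# Item difficulty (facility value, p) and discrimination (D) for a small,
# hypothetical score matrix: rows are testees, columns are items.

scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

n_items = len(scores[0])

# p: proportion of all testees answering the item correctly.
p_values = [sum(row[i] for row in scores) / len(scores) for i in range(n_items)]

# D: proportion correct in the upper-scoring half minus that in the
# lower-scoring half (upper and lower thirds, or 27%, are also common).
ranked = sorted(scores, key=sum, reverse=True)
half = len(ranked) // 2
upper, lower = ranked[:half], ranked[-half:]
d_values = [
    sum(row[i] for row in upper) / half - sum(row[i] for row in lower) / half
    for i in range(n_items)
]

for i, (p, d) in enumerate(zip(p_values, d_values), start=1):
    print(f"Item {i}: p = {p:.2f}, D = {d:+.2f}")
```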
2.3.4 Practicability.
A test must be practicable; in other words, it must be fairly straightforward to administer, yet the most obvious practical considerations concerning a test are often overlooked. Firstly, the length of time available for the administration of the test is frequently misjudged, even by experienced test writers, especially when the whole test consists of a number of sub-tests. Another practical consideration concerns the answer sheets and the stationery used. The use of answer sheets greatly facilitates marking and is strongly recommended when large populations of test takers are being tested. The question of practicability is not confined solely to oral tests; such written tests as situational composition and controlled writing depend not only on the availability of qualified markers who can make valid judgments concerning the use of language, but also on the length of time available for the scoring of the test. A final point concerns the presentation of the test paper itself: where possible, it should be printed or typewritten and appear neat, tidy, and aesthetically pleasing.
2.4 TEST ITEMS FOR READING SKILL, WRITING SKILL, GRAMMAR, AND
VOCABULARY
2.4.1 Test items
Tests usually consist of a series of items. Cohen (1992:488) defines an item as “a specific task to perform”, which can test one or more points or objectives. For example, an item may test one point, such as the meaning of a given word, or several points, such as the ability to obtain facts from a passage and then make inferences based on the facts. He also suggests that “sometimes an integrative item is really more a procedure than an item, as in the case of a free composition which could have a number of objectives”. Furthermore, he stresses that the objectivity of an item is determined by the way it is scored. A multiple-choice item, for example, is objective in that there is only one right answer. He also points out that a free composition may be more subjective in nature if the scorer does not look for any one right answer but rather for a series of factors, namely creativity, cohesion and coherence, grammar, and mechanics.
Item types for testing reading comprehension are ordering tasks, open-ended comprehension questions and answers, dichotomous items, summary writing, note-taking, guessing the meaning of unfamiliar words from context, making inferences, information transfer, multiple-choice items, jumbled sentences, jumbled paragraphs, completion exercises, matching words (or sentences), cursory reading, and gap-filling cloze tests. Item types for testing writing skills are multiple-choice items, matching items, editing, dictation, short-answer items, summary writing, sentence transformation, free writing, compositions and essays, error-recognition items, and ‘broken-sentence’ items. Item types for testing grammar are multiple-choice items, completion items, matching items, and word transformation.
2.4.2 Language components and language skills.
Linguistics is the study of phonology, syntax, and semantics. The first, phonology, is concerned with the sounds of a language and the way in which these are structured into segments such as syllables and words. The second, syntax, is concerned with the way we string words together in phrases, clauses, and sentences to build well-formed sentences. The third, semantics, is concerned with the way we assign meaning to a given unit of a language in order to communicate. Each of these has an additional level: phonology is supplemented by phonetics, the study of the physical characteristics of sound; syntax by morphology, the study of the structure of words; and semantics by pragmatics, the study of the situational constraints on meaning. The language components we focus on in this minor thesis are grammar, vocabulary, and phonology: grammar belongs to syntax, vocabulary belongs to semantics, and phonology belongs to phonetics. In addition, the language skills which we want to test are the reading and writing skills.
2.4.3 The test item types used to evaluate language components and language skills.
Table 1: Test item types for Reading and Writing skills and Grammar, Vocabulary
Reading: multiple-choice items; short-answer items; cloze items; word and sentence matching; picture and sentence matching; true/false items; completion items; questions and answers; split sentences; cloze; reading comprehension (open-ended questions); context-based items.
Writing: sentence building; sentence transformation; sentence completion; writing a letter of application; eliciting a narrative from a series of pictures; controlled writing tasks (based on a graph, plan, or drawing); free writing (letters, postcards, diaries, forms, directions, instructions); reordering.
Grammar and Usage: multiple-choice items; rearrangement items; completion items; transformation items; error-recognition multiple-choice items; ‘broken sentence’ items; pairing and matching items.
Vocabulary: multiple-choice items; matching items; word formation; items involving synonyms; reordering; definitions (explaining the meaning of each word); sentence completion; gap filling.
