
Nguyen Thi Phuong Thu August 2005
Vietnam National University, Hanoi
College of Foreign Languages
---------------
Designing & evaluating an English reading
test for the non-majors of Civil Engineering
at Haiphong Private University
Thiết kế và đánh giá một bài kiểm tra tiếng Anh chuyên ngành
cho sinh viên xây dựng dân dụng tại
Trường Đại học Dân lập Hải Phòng
M.A. Minor Thesis
Field: Methodology
Code: 50702
Course: K11
By: Nguyen Thi Phuong Thu
Supervisor: Tran Hoai Phuong, M.Ed.
Hanoi - August 2005
Acknowledgements
During the course of further study and of conducting this research, I was truly honored to receive guidance, assistance, and encouragement from various lecturers and supervisors. Among them, I would like first to express my sincere thanks to the leaders of the College of Foreign Languages, who gave me permission and created favorable conditions for my study and research.
I would also like to thank my supervisor, Mrs. Tran Hoai Phuong, M.Ed., who showed great sympathy and gave me invaluable help, guidance, and encouragement from the very start to the end of my research.
It is also my pleasure to give my special thanks to the students of classes XD 501, XD
502 and XD 503 at Hai Phong Private University who enthusiastically took part in doing the
test and helped me collect the results of the test.
I also benefited greatly from talks and discussions with my colleagues, so let me thank all of them for what they have directly or indirectly contributed.
And finally, I really want to thank my beloved husband, who always gives great support to my further study.
Nguyen Thi Phuong Thu
List of abbreviations
1. HPU Haiphong Private University
2. CE Civil Engineering
3. CEE Civil Engineering English
4. ESP English for Specific Purposes
5. MCQ Multiple Choice Question
6. T True
7. F False
8. M Mean
9. Σ Sum of
10. N The number of the scores
11. x The raw score
12. f The frequency with which a score occurs
13. H The highest value
14. L The lowest value
15. SD Standard Deviation
16. FV Item difficulty
17. R The number of the correct answers
18. ve very easy
19. e easy
20. d difficult
21. vd very difficult
22. D Item discrimination

23. CU The number of the correct answers of the upper half
24. CL The number of the correct answers of the lower half
25. gd good discrimination
26. md moderate discrimination
27. bi bad item
28. ρ Spearman rho correlation coefficient
29. SU Score on the upper half
30. SL Score on the lower half
Table of contents
Acknowledgements
List of abbreviations
Part I: Introduction
1. Rationale
2. Aims of the study
3. Scope of the study
4. Methods of the study
5. Design of the study
Part II: Development
Chapter one: Literature review
1.1. Language testing
1.2. Communicative language tests
1.3. Testing reading skills
1.3.1. Multiple choice questions
1.3.2. Short answer questions
1.3.3. Cloze
1.3.4. Selective deletion gap filling
1.3.5. C-tests
1.3.6. Cloze elide
1.3.7. Information transfer
1.3.8. Jumbled sentences
1.3.9. Matching
1.3.10. Jumbled paragraphs
1.4. Major characteristics of a good test
1.4.1. Reliability
1.4.2. Validity
1.4.2.1. Content validity
1.4.2.2. Face validity
1.4.2.3. Criterion-related validity
1.4.2.4. Construct validity
1.4.3. Practicality
1.4.4. Discrimination
1.5. Achievement tests
1.5.1. Class progress test
1.5.2. Final achievement test
Summary
Chapter two: Methodology
2.1. A quantitative study
2.2. The selection of participants
2.3. The materials
2.4. Methods of data collection and data analysis
2.5. Limitations of the research
Summary
Chapter three: Discussion
3.1. The content area of the test
3.2. The relative weights of the different parts of the test
3.3. Constructing the test
3.4. Administering the test
3.5. Marking the test
3.6. Test scores interpreting and evaluation
3.6.1. The frequency distribution
3.6.2. The central tendency
3.6.2.1. The mode
3.6.2.2. The median
3.6.2.3. The mean
3.6.3. The dispersion
3.6.3.1. The low-high
3.6.3.2. The range
3.6.3.3. The standard deviation
3.7. Test item analysis and evaluation
3.7.1. Item difficulty
3.7.2. Item discrimination
3.8. Estimating reliability
Summary
Part III: Conclusion and recommendations
References
Appendices

Part I: Introduction
1. Rationale
Testing is a matter of concern to all teachers, whether we are in the classroom or engaged in syllabus and materials design, administration, or research. We know quite well that good tests can improve our teaching and stimulate student learning. Although we may not want to become measurement experts, we may have to evaluate student performance periodically and prepare reports on student progress.
Haiphong Private University (HPU) is a university with a number of Civil Engineering (CE) classes for students of the Construction Department. Generally speaking, non-majors, especially the students of this department, lack background knowledge of English. The non-majors of CE learn General English (GE) during their first three terms to prepare for 120 periods of English for Specific Purposes (ESP) in the fourth term. In fact, this type of English is quite demanding for them, and many had to admit that they could not learn it well. As a result, many students failed each final examination.
The causes of this situation are various. It might be because some students are either too hesitant or too lazy to learn a new subject. It might also be because some students could not overcome the difficulties they usually meet during their study: their ESP course is too new or too demanding for them, or they have so many periods per week that little time is left for other subjects. However, a reason which is no less important, and which needs to be taken into account, is the matter of testing. In general, teachers at HPU are well qualified and teach enthusiastically with good methodology. However, the results of their students' tests are not always satisfactory; the scores gained were often lower than expected. Moreover, we teachers cannot deny the fact that sometimes test results do not accurately reflect the testees' language competence.
According to Brown (1994a: 373) and Hughes (1989: 1) “A great deal of language
testing is of very poor quality. Too often language testing has a harmful effect on teaching and
learning and too often they fail to measure accurately whatever it is they are intended to
measure.”
For all the above reasons, the author would like to take this opportunity to undertake the study entitled "Designing a reading test for the non-majors of Civil Engineering at Haiphong Private University", with a view to evaluating the students' reading ability after one term's study in the last school year (2004-2005), as well as to gaining some knowledge and experience of foreign language testing for herself.
2. Aims of the study
This minor thesis is aimed at designing an achievement test of ESP reading, which was conducted in a class of Civil Engineering English at HPU and treated as a final examination. The results of the test were then analysed, evaluated, and interpreted. The test takers are non-English majors.
The specific aims of the research are:
 to assess the learners' achievement in improving their reading skill in Civil Engineering English after the 120-period reading course;
 to measure their aptitude for the reading skill;
 to diagnose their strengths and weaknesses in reading the subject matter;
 to find out whether or not the test satisfies the qualities of a good test, and from there to measure the effectiveness of the teacher's teaching. If the test is not a good one, some suggestions will be made for a better test form.
3. Scope of the study
“Not all language tests are of the same kinds. They differ with respect to how they are
designed, and what they are for; in other words, in respect to test method and test purpose.”
(McNamara, 2000: 5). For example, in terms of method, there are paper-and-pencil language tests, performance tests, etc. In terms of purpose, there are achievement tests, proficiency tests, and so on. In fact, the same form of test may be used for different purposes, although in other cases the purpose may affect the form.
Due to the limitation of time and ability, it is impossible for the author to design tests
of all these types or of all the four language skills (speaking, writing, listening and reading).
Therefore, this minor thesis is limited to designing and evaluating an achievement test of ESP
reading for the non-majors at HPU and the reading tested was for communicative purposes.
4. Methods of the study
In this minor thesis the author designed an achievement test of reading, administered it and then evaluated it, so the method adopted is quantitative. The data were collected by testing the students' ability to read Civil Engineering English.
5. Design of the study
The study is composed of three parts:
*Part I presents basic information: the rationale, the aims of the study, the scope of the study, the methods of the study and, finally, the design of the study.
*Part II includes three chapters:

+ Chapter one is the literature review in which the literature that is related to language
testing and major characteristics of a good reading test is presented.
+ Chapter two is concerned with research methodologies including the methods
adopted in doing the research, the selection of participants, the materials, the methods of data
collection and data analysis.
+ Chapter three is the discussion, which is the main part of the study. This chapter
reviews how a reading test of Civil Engineering for the non-majors at HPU was designed,
administered, and then evaluated.
*Part III includes the conclusion and recommendations for further research on the topic.
Following these parts are the references and appendices.
Part II: Development
Chapter one: Literature review
This chapter provides an overview of the theoretical background of the research. It is composed of five sections. Section 1.1 offers an insight into the concept of language testing. Section 1.2 introduces communicative language tests. Testing reading skills is discussed in section 1.3, which is followed in section 1.4 by an investigation into the major characteristics of a good test. The final area, a brief review of achievement tests, is presented in section 1.5.
1.1. Language testing
An understanding of language testing is relevant both to those who are actually
involved in creating language tests, and also to those who are involved in using tests or the
information tests provide in practical research contexts. For this very reason, this section
wishes to take a close look at what a language test is.
Most researchers agree that language tests play many important roles in life. Firstly, the moment one takes a test can be an important transitional moment in one's life: a pupil wishing to enter a university has to pass the entrance tests, a job seeker may have to take a test so that the employer will know whether he is competent, and anybody who wants to drive a motorbike or a car has to pass a driving test. Secondly, language tests are important to many occupations. We teachers rarely teach without testing our students' performance in our subjects. Tests help us to put students in the right places; therefore language tests, used properly, can be a valuable teaching device for any teacher, and they contribute positively to the development of both teachers and learners. Last but not least, any researcher who needs a measurement of the language proficiency of his or her subjects cannot obtain it without using an existing test or designing his or her own.
For Carroll (1968), a test will certainly tell something about a testee's characteristics. Thanks to the results of a test, it is possible for a teacher to judge whether a student is good or bad at the subject tested. Carroll provides the following definition of a test: "a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual." (Carroll, 1968: 46)
According to Hughes (1989: 9), tests can be classified as follows:
 Proficiency tests
 Achievement tests
• Class progress tests
• Final achievement tests
 Diagnostic tests
 Placement tests
 Aptitude or Prognostic tests
 Direct tests versus indirect tests
 Discrete-point tests versus integrative tests
 Norm-referenced tests versus criterion-referenced tests
 Objective tests versus Subjective tests
 Communicative tests
Generally there are some approaches to tests, for example the essay-translation
approach, the structuralist approach, the integrative approach, or the communicative approach.
However, in this minor thesis I would like to adopt only the communicative approach to testing. This approach focuses on how the language is used in communication ('meaning' rather than 'form'), and it attempts to obtain different profiles of a learner's performance in the language.
The development and the use of language tests involve an understanding of the nature
of communicative language use and language ability, on the one hand, and of measurement
theory, on the other. Each of these areas is complex in its own right.
In short, like teaching, testing is important to any teacher as well as to any student. It is difficult to deny that testing cannot be separated from teaching; testing can even be seen as part of teaching. Therefore, we teachers should pay great attention to the issue of testing in our teaching.
1.2. Communicative language tests
There is one thing that is essential to the activities of designing a test and interpreting
the meaning of test scores. It is the view of language and language use embodied in the test.
The term ‘test construct’ refers to these aspects of knowledge or skill possessed by the
candidate which are being measured. To define test construct it is important to be clear about
what knowledge of language consists of and how that knowledge is used in actual performance
(i.e. language use). It is also essential to understand what view of language use the test takes, because if that view is different, the test will be different. As a result, the reporting of scores will be different, and test performance will be interpreted differently. Therefore, a difference of format between tests is not just incidental; it implies a difference between views of language and language use. Accordingly, communicative language tests differ from other types of tests, such as discrete-point tests or integrative and pragmatic tests, in the following aspects:
According to McNamara (2000: 17), a discrete-point test focuses on students' knowledge of the grammatical system, of vocabulary, and of aspects of pronunciation, and tends to test these aspects of knowledge in isolation. With this type of test, multiple choice questions are most suitable. The discrete-point tradition of testing is seen as focusing too much on knowledge of the formal linguistic system for its own sake rather than on the way that knowledge is used to achieve communication.
Also according to McNamara, integrative tests represent a newer orientation, in which knowledge of the relevant systemic features of language (pronunciation, grammar, vocabulary) is deployed together with an understanding of context. Yet these tests are regarded as time-consuming and difficult to score. An oral interview, for example, involves comprehension of extended discourse (both spoken and written), and so, besides the disadvantages mentioned above, it also requires trained raters.
Because of those disadvantages, another type of test, the pragmatic test, replaced the older ones. It focuses less on knowledge of language and more on the psycholinguistic processing involved in language use. With this type, the cloze test was seen as the most suitable and was once believed to be easy to construct and relatively easy to score. However, it soon turned out to be measuring the same kinds of things as discrete-point tests of grammar and vocabulary. It also failed to test communicative skills.
In the early 1970s, thanks to Hymes's theory of communicative competence (an understanding of language and the ability to use language in context, particularly in terms of the social demands of performance, i.e. knowing a language is more than knowing its rules of grammar), communicative language tests developed. They have the two following features:
'They are performance tests which require assessment to be carried out when the candidate is engaged in communication, either receptive or productive, or both.
They see language as a sociological phenomenon, focusing on the external, social functions of language, while integrative and pragmatic tests see language as an internal phenomenon. With this kind of test, the use of authentic texts and real-world tasks may be developed.' (McNamara, 2000: 16).
One of the distinguishing features that sets communicative tests apart from other types is that, besides the systemic features of language, they require students' careful study of communicative roles and tasks. All the reasons discussed above served as a strong impetus for this minor thesis to design a reading test of ESP for communicative purposes, i.e. a communicative language test.
1.3. Testing reading skills
In a reading test, items are often set based on the text itself, and often, within the same test, more than one type of the following items is used:
1.3.1. Multiple-choice questions (MCQs)
This is one of the most popularly used types for setting a reading comprehension test.
When doing this test the candidate is required to select the answer from a number of given
options, only one of which is correct. The marking is totally objective. Selecting and setting
items are, however, subjective processes, and the decision about which is the correct answer is
a matter of subjective judgment on the part of the item writer.
1.3.2. Short answer questions
In the test there are questions which require the candidates to write down specific
answers in spaces provided on the question paper.
1.3.3. Cloze
This type is also familiar to students. In the cloze procedure, words are deleted from a text after a few sentences of introduction are left intact. The deletion rate is mechanically set, usually at every fifth to eleventh word, because deleting too many or too few words can cause problems with test validity. Candidates have to fill each gap by supplying the word they think has been deleted.
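The mechanical procedure just described is easy to sketch in code. The deletion rate, lead-in length, and sample text below are illustrative assumptions, not the thesis's actual test materials:

```python
def make_cloze(text, n=7, lead_in_words=10):
    """Build a cloze test: after an untouched lead-in, delete every n-th word.

    Returns the gapped text and the list of deleted words (the answer key).
    """
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words):
        # Leave the introductory words intact, then delete every n-th word.
        if i >= lead_in_words and (i - lead_in_words) % n == n - 1:
            answers.append(word)
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

text = ("Concrete is a composite material made of cement, water and "
        "aggregate. When the mixture hardens it gains strength over time. "
        "Reinforced concrete also contains steel bars which resist tension.")
gapped, key = make_cloze(text, n=7, lead_in_words=10)
print(gapped)
print(key)
```

The answer key collected during deletion supports the exact-word scoring mentioned below for C-tests; an acceptable-word scoring would instead compare each response against a list of admissible fillers.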
1.3.4. Selective deletion gap filling
In selective deletion gap filling, items are selected for deletion based upon what is known about language, about difficulty in text, and about the way language works in a particular text.
1.3.5. C-Tests
In a C-test, every second word in a text is partially deleted. In an attempt to ensure unambiguous solutions, students are given the first half of each deleted word. The examinee completes the word on the test paper, and an exact-word scoring procedure is adopted.
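The C-test mutilation rule is likewise mechanical. In the sketch below, with an invented sentence, the first half of each even-numbered word is kept, rounding up for odd-length words; this rounding is one common convention, assumed here rather than stated in the thesis:

```python
def make_c_test(sentence):
    """Delete the second half of every second word, keeping the first half
    as a prompt; words are counted from 1, so words 2, 4, 6, ... are mutilated."""
    words = sentence.split()
    items, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % 2 == 0 and len(word) > 1:
            keep = (len(word) + 1) // 2          # first half (rounded up) stays
            items.append(word[:keep] + "_" * (len(word) - keep))
            answers.append(word)
        else:
            items.append(word)
    return " ".join(items), answers

stem, key = make_c_test("The beam carries the load of the floor above")
print(stem)
```

Because the examinee sees exactly how many letters are missing, exact-word scoring is straightforward: a response is correct only if it reproduces the stored answer.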
1.3.6. Cloze elide
In cloze elide test, words that do not belong to the original text are inserted into a
reading passage and candidates have to indicate where these insertions have been made.
1.3.7. Information transfer
This is a task where the information transmitted verbally is transferred to a non-verbal
form, e.g. by labeling a diagram, completing a chart or numbering a sequence of events. This

type of test is an objective method for testing the test takers’ understanding of the texts.
1.3.8. Jumbled sentences
This type of test is intended to test the student's understanding of a sequence of stages in a process or of events in a narrative. A successful student is one who can correctly reorder the jumbled sentences of a story.
1.3.9. Matching
Like the MCQ, matching is a familiar type of reading comprehension test. Candidates are required to identify the relationships between a list of entries in one column and a list of responses in another. Candidates may have to match word with word, sentence with sentence, picture with sentence, etc.
1.3.10. Jumbled paragraphs
Similar to tasks involving jumbled sentences, test tasks with jumbled paragraphs require students to rearrange the given paragraphs in the correct order. To do this, students have to read through the paragraphs to get the main idea of the whole text.
In short, different methods have been recommended for testing reading abilities, and a teacher may use one or another depending on his or her purposes. For example, to develop the communicative nature of tests, short answer questions, selective deletion gap filling, C-tests, information transfer techniques, and other restricted-response formats are often preferred.
1.4. Major characteristics of a good test
Tests can serve pedagogical purposes, to be sure. The most important consideration in designing a language test is its usefulness, which can be defined in terms of qualities such as reliability, validity, practicality, interactiveness, impact, and authenticity. Among these, the four qualities discussed below are the most critical for a good test.
1.4.1. Reliability
Reliability is apparently an essential quality of a test; if the scores of a test are not relatively consistent, they fail to provide us with information about the ability we want to measure. Reliability is considered a fundamental criterion against which any language test has to be judged.

‘Reliability is often defined as consistency of measurement’ (Bachman & Palmer,
1996:19). A reliable test score will be consistent across different characteristics of the testing
situation. Thus, reliability can be considered to be a function of the consistency of scores from
one set of test tasks to another. In other words, tests should not be elastic in their measurements: if a student takes a test at the beginning of a course and again at the end, any improvement in his score should be the result of differences in his skills and not of inaccuracies in the test. In the same way, it is important that the student's score should be the same (or as nearly the same as possible) whether he takes one version of the test or another, and whether one person marks the test or another. Reliability also means 'the consistency with which a test measures the same thing all the time' (Harrison, 1987). This can be presented as in Figure 1 below.
Figure 1: Reliability (scores on test tasks with characteristics A are consistent with scores on test tasks with characteristics A′)
There are therefore three aspects to reliability: the circumstances in which the test is taken,
the way in which it is marked, and the uniformity of the assessment it makes.
According to Hughes (1989), there are two components of test reliability: the performance of candidates from occasion to occasion, and the reliability of the scoring. To make tests more reliable, Hughes (1989) gives a long list of clear instructions on what we should do:
- take enough samples of behavior,
- do not allow candidates too much freedom in choosing what and how to answer,
- write unambiguous items,
- provide clear and explicit instructions,
- ensure that tests are well laid out and perfectly legible,
- make sure candidates are familiar with format and testing techniques,
- provide uniform and non-distracting conditions of administration,
- use items that permit scoring which is as objective as possible,
- make comparisons between candidates as direct as possible,
- provide a detailed scoring key,
- train scorers,
- agree on acceptable responses and appropriate scores at outset of scoring,
- identify candidates by number, not name, and
- employ multiple, independent scoring. (Hughes, 1989: 36-42)
The concept of reliability is particularly important when considering language tests within the communicative paradigm (Porter, 1983). Davies (1965: 14) also shares the same
view but he also admits that ‘reliability is the first essential for any test; but for certain kinds
of language test may be very difficult to achieve.’
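The thesis's abbreviation list (Spearman rho, SU, SL) suggests that reliability is later estimated by correlating candidates' scores on two halves of the test. As an illustration only, with invented half-test scores for six students, a split-half estimate using the simple 6Σd² form of Spearman's rho (adequate when ties are few) can be sketched:

```python
def rank(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(su, sl):
    """Spearman rank correlation between the two half-test score lists."""
    n = len(su)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(su), rank(sl)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Invented half-test scores for six students.
su = [14, 11, 9, 15, 8, 12]   # scores on one half of the items
sl = [13, 10, 10, 14, 7, 11]  # scores on the other half
rho = spearman_rho(su, sl)
print(round(rho, 3))
```

In practice, the half-test correlation is then usually stepped up to a full-test estimate with the Spearman-Brown formula, r_full = 2r / (1 + r).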
1.4.2. Validity
The second quality that affects test usefulness is validity. A test is said to be valid if it measures what it is intended to measure; a test may therefore be valid for some purposes but not for others. For example, if the purpose of a test is to test the ability to communicate in a foreign language, it is valid only if it actually tests that ability: if such a test is full of grammar questions, it cannot be considered valid. Likewise, if a test is meant to test reading ability but also tests writing, it fails to be valid as a test of reading.
However, it is impossible to say whether a test is valid or not valid at all because there
are degrees of test validity, i.e. this test may be more valid than that one. Therefore, Moore
(1992) defined validity as “the degree to which a test measures what it is supposed to
measure” . There are different types of validity such as content, face, construct, criterion-
related validity, and they will be all discussed below.
1.4.2.1. Content validity
Among different types of validity, content validity is said to be the most important one,
but it is also the simplest. "A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned." (Hughes, 1989: 22). In order to judge whether or not a test has content validity,

we need a specification of all the related aspects that the test is meant to cover, including the
skills or the structures. Such a specification should be made at a very early stage in test
construction.
According to Weir (1990: 24), the more a test simulates the dimensions of observable performance and accords with what is known about that performance, the more likely it is to have content validity and construct validity. Thus, for Kelly (1978: 8) content validity seems
‘an almost completely overlapping concept” with construct validity, and for Moller (1982: 68):
‘the distinction between construct and content validity in language testing is not always very
marked, particularly for tests of general language proficiency.’ Slightly different from other
researchers, Anastasi (1982: 131) defined content validity as: ‘essentially the systematic
examination of the test content to determine whether it covers a representative sample of the
behavior domain to be measured.’
So we could see that content validity has been defined differently, but most researchers
agree that content validity is highly important for the two following reasons. First, the greater a
test’s content validity is, the more likely it is to be an accurate measure of what it is supposed
to measure. A test in which major areas identified in the specification are under-represented or not represented at all is unlikely to be accurate. Secondly, such a test is likely to have a harmful
backwash effect. Areas which are not tested are likely to become areas ignored in teaching and
learning.
1.4.2.2. Face validity
A test is said to have face validity if it looks as if it measures what it is supposed to
measure. Face validity is hardly a scientific concept, yet it is very important. A test which does
not have face validity may not be accepted by candidates, teachers, education authorities or
employers.
1.4.2.3. Criterion-related validity
There are essentially two kinds of criterion-related validity: concurrent validity and
predictive validity. According to Viete (1992), concurrent validity is used to refer to the
relationship between the test results and the results of another assessment (using an
appropriate, reliable and validated assessment procedure) which was made at approximately

the same time. And predictive validity concerns the degree to which a test can predict
candidate’s future performance.
1.4.2.4. Construct validity
Like reliability, construct validity is essential to the usefulness of any language test.
The term construct validity refers to the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure. The purpose of construct validation is to provide evidence that the underlying theoretical constructs being measured are themselves valid. Typically, construct validation begins with a psychological construct that is part of a formal theory. The theory enables certain predictions about how the construct variable will behave, or be influenced, under specified conditions. The construct is then tested under the specified conditions. If the hypothesized results occur, the hypotheses are supported and the construct is said to be valid. Often this will involve a series of tests under a variety of conditions.
Test validity is the one that is always paid the most attention to since it is an
indispensable quality of all good tests. When constructing a test, the first thing to be focused
on is test validity. Hughes (1989: 22) agrees that if important parts are not defined or not represented in a test, it will fail to be accurate. He notes that "the greater a test's content validity is, the more likely it is to be an accurate measure of what it is supposed to measure."
1.4.3. Practicality
Another quality of a good test which should not be forgotten is its practicality.
Although it is different in nature from other qualities, practicality is not less important. Unlike
reliability and validity, practicality does not pertain to the uses that are made of test scores, but
primarily to the ways in which the test will be implemented in a given situation, and to whether
the test will be developed and used at all. Practicality often affects a tester’s decisions during
the development of a test, i.e., at every stage of his testing.
Practicality can be defined as ‘the relationship between the resources that will be
required in the design, development, and use of the test and the resources that will be
available for these activities'. (Bachman & Palmer, 1996: 35). This relationship can be represented by the formula below:

Practicality = Available resources / Required resources
When practicality ≥ 1, test development and use is practical.
When practicality < 1, test development and use is not practical.
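Bachman and Palmer state the ratio in the aggregate; one concrete reading, sketched below with invented resource names and figures (scorer hours, room hours, budget are illustrative), checks each resource separately and takes the tightest ratio, so a test counts as practical only when no single resource is overcommitted:

```python
def practicality(available, required):
    """One reading of the Bachman & Palmer ratio: compare each resource
    separately and report the tightest availability/requirement ratio,
    so the result is >= 1 only when no resource is overcommitted."""
    return min(available[name] / required[name] for name in required)

# Illustrative figures for a small reading test.
available = {"scorer_hours": 20, "room_hours": 6, "budget": 100}
required  = {"scorer_hours": 15, "room_hours": 4, "budget": 80}

p = practicality(available, required)
print(p, "practical" if p >= 1 else "not practical")
```

Taking the minimum rather than a sum avoids adding unlike units (hours and money) together; a surplus of one resource cannot compensate for a shortage of another.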
In a nutshell, when designing a test the tester should always bear in mind this quality, practicality, to ensure that the test is as economical as possible, both in time (preparation, sitting and marking) and in cost (materials and the hidden cost of time spent). In other words, a practical test is one that stays within the available resources: the required resources must not exceed the available resources.
1.4.4. Discrimination
Finally, a discussion of the basic concepts behind testing would be incomplete without
the treatment of the closely related idea of discrimination. According to Harrison (1994:14)
discrimination is ‘the extent to which a test separates the students from each other.’ However,
the extent of discrimination varies according to each kind of test. For instance, an achievement
test should result in a wide range of scores because it is easier to make decisions about where
to separate one group of students from another so that they can be awarded different grades. A
diagnostic test, however, may be intended to show that nearly all students have learnt the
material tested, and in this case they should all get fairly high scores.
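Harrison does not give a formula, but in classical item analysis the discrimination of a single item is often quantified with the upper-lower group index: candidates are ranked by total score, and the proportion of the bottom group answering the item correctly is subtracted from the proportion of the top group. A minimal sketch with invented data:

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item: proportion of the
    top-scoring group answering correctly minus the proportion of the
    bottom-scoring group, with groups ranked by total test score."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

# Invented data: ten candidates' totals and whether each got one item right.
scores = [90, 80, 70, 60, 50, 40, 30, 20, 10, 5]
correct = [True, True, True, False, True, False, False, False, True, False]
print(round(discrimination_index(correct, scores), 2))
```

An index near 1 means the item separates strong from weak candidates well; an index near 0 (as a diagnostic test might show) means nearly everyone performs alike on it.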
1.5. Achievement tests
Different researchers hold different views of what an achievement test is. According
to Harrison (1983: 65) ‘designing and setting an achievement test is a bigger and more formal
operation than the equivalent work for a diagnostic test, because the student's result is treated
as a qualification which has a particular value in relation to the results of other students. An
achievement test involves more detailed preparation and covers a wide range of material, of
which only the sample can be assessed.’
Heaton (1988) defines achievement tests as the ones that are “based on what the students are presumed to have learnt, not necessarily on what they have actually learnt nor on what has actually been taught.”
In Brown's point of view, “an achievement test is related directly to classroom lessons, units or even a total curriculum within a particular time frame” (Brown, 1994: 259). In other words, an achievement test measures a student's mastery of what should have been taught. It is thus concerned with covering a sample (or selection) which accurately represents the contents of a syllabus or a course book. Unlike a progress test, an achievement test should attempt to cover as much of the syllabus as possible: if the test is confined to only part of the syllabus, its contents will not reflect all that the students have learnt.
Achievement tests can be subdivided into class progress tests and final achievement tests.
1.5.1. The class progress test
The class progress test is often conducted during the course and is developed by the teacher himself after each chapter or each term. He constructs this type of test to judge how successful his teaching has been and to find out what his students have achieved from it. The class progress test is a teaching device and can be considered a good chance for the students to prepare for the final achievement test.
1.5.2. The final achievement test
The final achievement test is more formal and is intended to measure achievement on a larger scale (annual exams, entrance exams, final exams). It is usually written and administered not by the teacher himself but by ministries of education, boards of examiners, or members of teaching institutions. A final achievement test is often based on an adopted syllabus and follows either a syllabus-content approach or a syllabus-objective approach. In the former, its contents are based directly on the course syllabus or on the textbooks and other materials chosen; in the latter, they are based directly on the objectives of the course.
Summary
In this chapter I have briefly dealt with the concept of a language test, how it is defined and what matters in designing it. I have also discussed communicative language ability, within which communicative competence was considered. Finally, the definition of an achievement test and the testing of reading skills were presented, since both play an important role in this research.
Chapter two: Methodology
This chapter will include a brief introduction of a quantitative study, the selection of
participants who took part in doing the test, and the materials from which the test items were
taken. The methods of data collection and data analysis are presented afterwards. Finally come
the limitations of the research.
2.1. A quantitative study
Like qualitative research, quantitative research comes in many approaches including
descriptive, correlational, exploratory, quasi-experimental, and true-experimental techniques.
As a teacher of Civil Engineering English, I designed this reading test to understand better how things really operate in my own classroom and to describe my learners' performance in the reading skill. After a 120-period reading course, 50 students were chosen from three different classes (XD501, XD502, XD503) to do a reading test within the time given (60 minutes); the results collected from the test papers were then described using descriptive statistics. The correlational research technique was also used to calculate the reliability coefficient later in the study.
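The correlational computation of a reliability coefficient can be sketched as follows. One common choice (assumed here, not stated in the study) is a split-half estimate: the Pearson correlation between two halves of the test, stepped up to full-test length with the Spearman-Brown formula. The half-scores below are invented for illustration:

```python
import statistics

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(half_r):
    """Step a split-half correlation up to full-test reliability."""
    return 2 * half_r / (1 + half_r)

# Invented odd-item and even-item half-scores for eight candidates.
odd = [12, 9, 15, 7, 11, 14, 8, 10]
even = [11, 8, 14, 8, 10, 15, 7, 11]
print(round(spearman_brown(pearson(odd, even)), 2))
```

A coefficient approaching 1 indicates that the two halves rank candidates consistently, i.e. a reliable test.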
2.2. The selection of participants
The students at Haiphong Private University mainly come from different towns and
cities in the North of Vietnam. They are generally aged between 18 and 22, or older.
At the university, they study for eight terms over four years. These students are classified into majors and non-majors of English. The latter usually have to learn a foreign language, in this case English, in only two years of their whole student life. In the first three terms, they study General English, and in the fourth term English for Specific Purposes (ESP). After two years of English learning, they are required to be able to read and translate their ESP materials at intermediate level. However, students often enter the course with varying levels of English because at secondary school they learned different languages, including Russian, French, and Chinese. It is therefore important for teachers to apply appropriate methods in