
RESEARCH - DISCUSSION

“RELIABILITY” AND “VALIDITY”
IN CONSTRUCTING AND DESIGNING
ENGLISH PROFICIENCY TESTS:
POINTS FOR TEACHERS TO NOTE
NGUYỄN MẠNH TUẤN *
* Học viện Khoa học Quân sự (Military Science Academy)

Received: 24/4/2018; revised: 22/5/2018; accepted for publication: 20/6/2018

ABSTRACT
Testing is an indispensable part of foreign language programs in general and of English programs in particular. For this reason, attention to the “reliability” and “validity” of an English proficiency test is genuinely important, because most English teachers today have had almost no training in testing and assessment and rely mostly on intuition, experience and textbooks to construct and design an English test. For these reasons, this article raises and discusses several issues related to the construction and design of a proficiency test in an English program.
Keywords: English proficiency test, English program, reliability, validity

1. INTRODUCTION
As every nation becomes increasingly integrated into international business, both governments and the wider community recognise that a high level of English language ability among the workforce is imperative for success in almost every aspect of life. As a result, educational institutions have voiced widespread concern for a standard of English proficiency, together with a growing demand for valid English proficiency tests. To ensure a high standard of English proficiency among learners, education experts have made a number of efforts to provide as many reliable and valid English tests as possible.

Following a critical review of the literature on the English Proficiency Test and its validity and reliability, this paper aims to highlight the significance of the reliability and validity of the English Proficiency Test. To that end, the article addresses the following basic points:
- What an English Proficiency Test is;
- What the reliability of an English Proficiency Test is and how to achieve it;
- What the validity of an English Proficiency Test is and how to achieve it.

KHOA HỌC NGOẠI NGỮ QUÂN SỰ, Số 14 - 7/2018
2. DEFINITION OF ENGLISH PROFICIENCY TEST

The nature of the term “language proficiency” has long been an area of disagreement among eminent linguistic and educational experts, and no clear definition has emerged. A number of researchers, such as Bachman and Palmer (1996), favour the term “ability” over “proficiency”. Brown (2004) shares this view, explaining that the term “ability” is more consistent with the current understanding that specific components of language need to be assessed separately (p. 71). However, there is general agreement that both terms refer to constructs that can be specified and measured. Bachman and Palmer (1996) recommend that language ability consist of four component skills: listening, speaking, reading and writing. McNamara (2000) further suggests that the integrative nature of language ability should be evaluated by integrating several isolated components (grammar, vocabulary) with skill performances (reading, listening, writing and speaking). Meanwhile, Hughes, in Testing for Language Teachers (2003, p. 44), mentions that proficiency tests are those designed to measure people’s ability in a language.

From the ideas above, an English Proficiency Test can be defined as a test that measures the language ability of English language learners in terms of language components and language skill performances.
3. QUALITIES FOR A GOOD ENGLISH
PROFICIENCY TEST
There are six qualities needed for an English Proficiency Test, stated by Bachman and Palmer (1996), namely reliability, construct validity, authenticity, interactiveness, impact and practicality. They further indicate that the conventional means of defining such test qualities has been to some extent intuitive. In their view, therefore, test designers should try to attain a balance among these qualities.
A discussion of all these qualities would require considerable time and space. This paper therefore focuses on the first two, reliability and construct validity. Accordingly, three major issues relating to language test reliability and validity are clarified:

- The definition of English proficiency test reliability and validity;
- Factors influencing English proficiency test reliability and validity;
- How to achieve reliability and validity in an English proficiency test.
3.1. English Proficiency Test Reliability
Many attempts have been made to provide insight into the reliability of language proficiency tests. To define reliability, Henning (1987) holds that a test is regarded as reliable only when an examinee’s results on the same or a similar test prove consistent. Brown (1996) illustrates reliability by comparing language tests with measuring instruments: both are expected to give the same results whenever measurement occurs. In the same year, Bachman indicates that a language proficiency test demonstrates its reliability when the same test, or two tests at the same level of difficulty, administered two weeks apart, yield no significant difference in scores.

From these ideas, it can be inferred that the reliability of a language proficiency test is a function of accuracy. However, it is necessary to note that measuring language proficiency is a much more complicated process than other types of measurement, since it deals with abstract notions rather than objective reality.


3.1.1. Factors influencing English Proficiency Test Reliability
Accurately assessing students’ language ability requires teachers and other educational staff to be aware of the factors that must be taken into account. Brown (1996) divides these factors into three general categories: environmental factors, administrative factors, and features of the test items.
Environmental Factors
A number of environmental factors that negatively influence students’ language performance have been acknowledged. If a test is administered in a noisy, cramped setting where it is too hot or too cold, students’ results are likely to suffer. Likewise, if the test takes place in badly-lit surroundings, students’ performance is likely to be negatively affected.

Besides these objective factors, according to Henning (1987), test inconsistency can stem from psychological or physiological changes in the test takers. He further states that physical or psychological illness and the like may well result in an inaccurate reflection of students’ language proficiency. Although such factors are unpredictable and beyond teachers’ control, constant efforts should be made to create favourable testing conditions.
Administrative Factors

Factors related to administration procedures are also highlighted as contributing to a decline in students’ language performance. As Henning (1987) states, this results from testing procedures being applied differently to different groups of students, in different locations and on different days of testing. Moreover, test reliability also decreases because of factors such as unclear instructions or an unsuitable time for the test.

Features of the Test Items
It has been suggested that the length of a test, its difficulty and the manner in which it is implemented are factors affecting test reliability. First, it is argued that the longer a test is, the better it spreads students out according to their level of proficiency. Second, the level of test difficulty also contributes greatly to test reliability: tests that are too difficult or too easy fail to evaluate students’ proficiency accurately. Last but not least, it is often reported that test reliability also depends on the manner of the test, that is, the way in which students respond to the examination. Students who are familiar with test procedures tend to develop strategies and techniques for dealing with questions more effectively, which undermines test reliability.
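The points about test length and difficulty can be made concrete with two quantities from classical test theory. Neither is named in the sources cited above, so what follows is only an illustrative sketch: item facility (the proportion of students who answer an item correctly) serves as a rough difficulty index, and the Spearman-Brown formula estimates how reliability changes when a test is lengthened with comparable items. The function names and the sample scores are hypothetical.

```python
def item_facility(responses):
    """Return, for each item, the proportion of students who answered it correctly.

    responses: one row per student; each row holds 0/1 scores, one per item.
    """
    if not responses:
        raise ValueError("at least one student's responses are required")
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def spearman_brown(reliability, length_factor):
    """Estimate reliability after changing test length by length_factor.

    length_factor = 2 means the test is doubled using comparable items.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Hypothetical scores: 4 students, 3 items (1 = correct, 0 = incorrect).
    scores = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
    print(item_facility(scores))
    # Estimated reliability if a test with reliability 0.60 were doubled in length.
    print(round(spearman_brown(0.60, 2), 2))
```

An item whose facility is close to 0 or 1 is too hard or too easy to separate stronger from weaker students, which is the sense in which extreme difficulty harms reliability.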
3.1.2. Ways of Improving Test Reliability
Maximizing test reliability requires considerably complex methods. Owing to limits of time and space, only two of them, which can be easily applied by teachers, are discussed in this paper.
Test-Retest Method
In this method, the same test is administered twice to the same group of students. The second administration takes place no later than two weeks after the first. Students are neither informed of their results on the first test nor given feedback on their performance. They are also not warned about the second test and therefore do not prepare for it during this period. After the second test, individual results are arranged in two columns for comparison. If there is no significant difference, the test can be said to meet the reliability requirement. Although, as Brown (1996) states, this approach might sound strange and upset students who are asked to take the same test twice, it could prove to be a useful method of working out the reliability of a test.
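The comparison of the two columns of results can be sketched as follows. The text does not say which statistic to use, so this example assumes two common summaries: the Pearson correlation between the two administrations and the mean change in score. The function names and the scores are hypothetical, and any threshold for an acceptable correlation is left to the teacher's judgement.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each administration")
    return cov / sqrt(var_x * var_y)


def mean_difference(first, second):
    """Average change in score from the first to the second administration."""
    return sum(y - x for x, y in zip(first, second)) / len(first)


if __name__ == "__main__":
    # Hypothetical scores for the same five students, two weeks apart.
    test_1 = [55, 62, 70, 48, 81]
    test_2 = [57, 60, 72, 50, 79]
    print(round(pearson_r(test_1, test_2), 2))
    print(mean_difference(test_1, test_2))
```

A correlation close to 1 together with a mean difference close to 0 suggests that students were ranked and scored consistently on both occasions. The same comparison applies when the second administration uses a parallel form instead of the same test.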

Parallel Test Method
In this method, two tests that are equivalent in difficulty are administered to the same group of students. The same procedures as in the test-retest method are applied. Although the parallel test method sounds more natural than the test-retest method, it is more challenging because two versions of a test need to be designed with strict equivalence in difficulty. Consequently, the level of difficulty is defined first and the test items are then developed to match it, which demands a huge amount of effort from teachers and test designers.
3.2. Test Validity
As Hughes (1992) states, a test proves valid only when it corresponds to the language skills or structures that are to be measured. For example, when testing students’ knowledge of vocabulary they have just covered, students should be tested on what has already been presented to them. If the test includes vocabulary items for which students have yet to receive instruction and explanation, the test loses validity, since it fails to measure what it is designed to measure.
It would be a mistake to discuss language test validity without clarifying construct validity. According to Bachman (1996), “the so called construct validity is subordinate to the sense and rationality of interpretation of the language test scores, which means this interpretation is the assessment of language skills of the subject” (Bachman and Palmer, 1996, pp. 254-271). Bachman holds that by interpreting the test score, we can not only assess the language ability of the subject but also estimate the reasonableness of the language adopted in the test. For example, when the aim of the test is to evaluate students’ ability to use the passive voice, it is important that the test be designed to deal directly with this grammatical structure, in the hope that the scores will help us to assess our students’ language proficiency. If the test items somehow include other structures, such as conditionals, the test will lack validity.
From the ideas above, it could be said that construct validity concerns the interpretation of scores, from which students’ language proficiency and the appropriateness of test tasks can be estimated.
3.2.1. Factors that Affect Test Validity
A series of factors having negative effects on validity have been identified. Henning (1987), for example, lists some of them. The first factor that affects test validity is a mismatch between a test and the construct it is intended to measure. Bachman also proposes that inappropriate adaptation of tests is another detrimental factor. If, for instance, a test designed to assess the lexical level of first-year students is used with high school students, it is invalid. However, the problem is clarified further only when McNamara (2000) proposes two major factors: “irrelevant variance of validity” and “underrepresentation of validity”.
Irrelevant Variance of Validity
A test is classified as having “irrelevant variance” if it is too broad, comprising a number of variables that are irrelevant to the construct being interpreted. McNamara argues that this occurs when the tested knowledge or skill is set in a context that is either outside students’ experience or irrelevant to the content being tested. For example, in an oral test, candidates may be asked to discuss an abstract topic; if that topic does not interest them or is one of which they are ignorant, they are less likely to perform competently than when they are asked to speak on a more familiar topic at the same level of abstraction. In this case, the quality being tested, the ability to discuss an abstract topic in English, is confounded with the irrelevant requirement of having particular knowledge of a certain topic.
Underrepresentation of Validity
“Underrepresentation of validity” is the opposite of “irrelevant variance of validity”: the testing is insufficient, in that the test is either too narrow in terms of knowledge or fails to include important aspects of the construct. In other words, as Fulcher (2010) states, the extent to which a test fails to measure the relevant knowledge is the degree to which it under-represents the construct that is supposed to be tested.
3.2.2. Methods of Improving Language Proficiency Test Validity
When discussing how to determine test validity, Henning (1987) indicates that there are two main ways to establish it. One is the experimental method, in which data collection and statistical formulas are applied to calculate validity. The other is through non-experimental methods, which involve inspection, intuition and common sense. Since experimental methods require special training in statistics and the use of specialized computer programs to carry out complex calculations, this paper focuses on non-experimental methods.

Although, as many worry, a lack of experimental evidence may lead to a lack of objectivity, teachers can take a number of practical actions to improve the chances of upholding the validity of their tests. For example, if a teacher wants to evaluate his or her students’ knowledge of grammar at the end of an elementary course, he or she needs to know what knowledge of grammar at the elementary level consists of. He or she should then adopt test items that match what students have been exposed to during the course.
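The inspection described above, matching test items against what was taught, can be kept as a simple checklist. The sketch below is one hypothetical way of recording it and is not a procedure prescribed by Henning (1987) or the other sources: each item is labelled with the objective it targets, and the function reports items that fall outside the course (a possible source of irrelevant variance) and taught objectives that no item covers (possible underrepresentation). All names and data are illustrative assumptions.

```python
def review_coverage(taught_objectives, item_objectives):
    """Compare what a test covers with what the course taught.

    taught_objectives: objectives covered during the course.
    item_objectives: mapping from item label to the objective it targets.
    Returns (items outside the syllabus, taught objectives with no item).
    """
    taught = set(taught_objectives)
    tested = set(item_objectives.values())
    off_syllabus = sorted(
        item for item, objective in item_objectives.items() if objective not in taught
    )
    untested = sorted(taught - tested)
    return off_syllabus, untested


if __name__ == "__main__":
    # Hypothetical elementary grammar course and a three-item draft test.
    taught = ["present simple", "past simple", "passive voice"]
    items = {
        "Q1": "present simple",
        "Q2": "conditionals",  # not taught in this course
        "Q3": "passive voice",
    }
    off_syllabus, untested = review_coverage(taught, items)
    print("Items outside the syllabus:", off_syllabus)
    print("Taught objectives with no item:", untested)
```

The labelling of each item still depends on the teacher's own judgement, so the check organises the inspection and does not replace it.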

4. CONCLUSION AND IMPLICATIONS
FOR TEACHERS
This paper has provided a basic understanding of the English proficiency test, covering its definition and the qualities it requires. Of these qualities, “reliability” and “validity” were chosen for discussion, together with the factors that affect them and the methods used to improve them.

The paper is written in the hope of providing what is fundamental in designing and developing an English proficiency test. Without this knowledge, students will face a considerable challenge in the English learning process, because teachers will be unable to give them objective feedback on their progress. This lack of knowledge in turn has a negative effect on teachers as well: they will not be able to identify their students’ weaknesses or know how to build on their strengths.

For these reasons, it is important that teachers train themselves in matters related to assessment and testing. Educational institutions should also start offering courses in test design and development alongside other courses in English language teaching methodology.
References:
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, J. D. (1996). Testing in Language Programs. New Jersey: Prentice Hall Regents.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. White Plains, New York: Pearson Education.
Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.
Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Massachusetts: Heinle & Heinle.
Hughes, A. (1992). Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press.
McNamara, T. F. (2000). Communication and design of language tests. In H. G. Widdowson (Ed.), Language testing (pp. 13-22). Oxford, England: Oxford University Press.

A REVIEW OF ENGLISH PROFICIENCY TEST: RELIABILITY, VALIDITY, AND IMPLICATIONS FOR TEACHERS
NGUYEN MANH TUAN
Abstract: Testing is an indispensable component of foreign language programs in general and of English programs in particular. In this context, concerns about reliability and validity are important. Teachers with practically no training in test development often depend mostly on their own intuition, previous experience and textbooks. For these reasons, this article raises and discusses problems of test design and development in English programs.
Keywords: English proficiency test, English program, reliability, validity
Received: 24/4/2018; Revised: 22/5/2018; Accepted for publication: 20/6/2018
