How Do Teachers Observe and Evaluate
Elementary School Students’ Foreign
Language Performance? A Case Study
from South Korea
YUKO GOTO BUTLER
University of Pennsylvania
Philadelphia, Pennsylvania, United States

This study investigates how teachers observe and assess elementary
school students’ foreign language performance in class and how such
assessments vary among teachers. Twenty-six elementary school teachers and 23 English teachers at secondary schools in South Korea watched
videotapes of 6th-grade students’ group activities in English and were
asked to assess the students’ performance as if they were in their own
classrooms. The study found that the teachers varied substantially in
their overall evaluations both within and across school levels. A discussion held among the teachers after the individual assessments were completed showed that the elementary school teachers and secondary
school teachers differed with respect to (1) their views toward assessment criteria, (2) how to evaluate student confidence and motivation,
and (3) how to gauge students’ potential ability to communicate competently in a foreign language. Such differences between the elementary
and secondary school teachers appeared to be deeply rooted in their
respective teaching contexts. Using Davison’s (2004) framework for
analyzing teachers’ beliefs and practices in teacher-based assessment,
the current study suggests that both groups of teachers need to negotiate assessment criteria while paying close attention to the local context
and adapting their teaching practices to fit therein.

This study is concerned with how teachers observe and assess young
learners’ foreign language performance in class and how such
assessments vary among teachers. By asking elementary school teachers
and secondary school teachers to evaluate 6th-grade students’ English
abilities at the end of their elementary school education in South Korea,
the current study aims to examine similarities and differences in teachers’ observations both among teachers working at the same school level
and across different school levels. In doing so, it is hoped that this study
will help us better understand teacher observation, as one popular type
of teacher-based assessment, and help us enable a smoother transition
in assessment practice from elementary school English to secondary
school English.

TESOL QUARTERLY Vol. 43, No. 3, September 2009

Teacher-based assessment can be defined as “nonstandardized local
assessment carried out by teachers in the classroom” (Leung, 2005, p. 871).
A number of countries have begun heavily promoting teacher-based assessments as part of their language-in-education policies in recent times. The
degree to which teacher-based assessments have gained in prominence is
particularly evident at the elementary school level. We find a strong emphasis on the development of communicative competence, especially in the
oral domain, because it has become one of the central goals of English as
a foreign language education at the elementary school level (FLES). The
promotion of teacher-based assessment is also tied to the intentions of policy makers, who have strived to avoid traditional achievement tests, such as
paper-and-pencil standardized tests, at the elementary school level.
Despite the strong promotion of teacher-based assessment in various
educational contexts, including FLES, concerns have been cited with
regard to the validity, reliability, high costs, fairness, and logistical challenges in developing, administering, and scoring teacher-based assessment (e.g., Gattullo, 2000; Linn, Baker, & Dunbar, 1991; Rea-Dickins &
Gardner, 2000).
Part of the challenge of teacher-based assessment appears to come
from the dilemma between the pedagogic and measurement aspects of
such assessments. Namely, at the same time that these assessments are
supposed to help teach students, education policies often ask teacher-based assessment to fulfill an accountability requirement. Teachers worldwide encounter tension in meeting the pedagogical needs of students
while at the same time meeting the accountability requirements that are
often based on prescribed standards and criteria (e.g., Arkoudis &
O’Loughlin, 2004; Brindley, 1998, for a discussion of this tension in
Australia; Davison, 2004, for Australia and Hong Kong; Gardner & Rea-Dickins, 1999; Teasdale & Leung, 2000, for England).
South Korea is no exception to this trend. At the elementary school
level, among the various types of teacher-based assessments, teacher
observation has been promoted as a primary means of assessment.
Teachers are encouraged to observe students’ performance systematically during classroom activities and to use such observations for both
summative and formative purposes.1 However, in many cases, no specific
criteria for conducting observations have been provided, and we know
little about how teachers observe and assess their students’ performance
during classroom activities. At the secondary school level, teacher-based
assessment, including teacher observation, has also increasingly been
emphasized. However, practical and pedagogical challenges including
large class sizes and limited class hours leave teachers little time for making systematic observations for formative purposes. Under such conditions, parents and students have frequently cited their distrust of teacher
observation as a summative assessment (Butler, 2005).

1 Summative assessment is usually given to students at the end of an instructional sequence,
and the results are primarily used for giving students reports about their achievement.
Formative assessment is usually undertaken before and/or during an instructional sequence
and is primarily used to help students identify their strengths and weaknesses and, in
turn, to provide teachers with information for making instructional decisions.

A number of researchers have questioned the application of traditional measurement-based concepts of validity and reliability to teacher-based assessment (Brookhart, 2003; McMillan, 2003; Moss, 2003; Smith,
2003). Traditional validity and reliability theories are fundamentally concerned with the ability (or lack thereof) to generalize assessment-based
inferences, and the consistency of measures irrespective of the context,
form, time span, and raters involved in assessment. Such concepts are not
necessarily relevant or even compatible with teacher-based assessment,
which is highly context dependent and primarily formative in nature
(McNamara, 2001; Teasdale & Leung, 2000). Teacher-based assessment
should not be considered as a collection of miniature summative assessments (Rea-Dickins, 2007). Indeed, criteria that are derived from the psychometric tradition may not be appropriate for teacher-based assessment
(Leung, 2005).
As an alternative approach, Wiliam (2001) proposed construct-referenced
assessment, which is based on “the consensus of the teachers making the
assessment” (p. 172). In this approach, there is no predefined objective
criterion. Instead, teachers’ judgments are based on shared understandings of what a community of teachers in a given teaching context would
consider competency. Leung (2005) argues that the concept of construct-referenced assessment is “useful in that it opens the way to an examination of the kind of information teachers seek and the basis of their
decision making” (p. 880). This approach sheds light on the importance
of understanding teachers’ knowledge about assessment and paying
attention to the specific context in which the assessment is undertaken.
To date, researchers have only a limited understanding of the reasoning and criteria that teachers use in their teacher-based assessments for
young learners in English as a foreign language (EFL) contexts. In examining English as a second language (ESL) environments, a number of
studies have investigated how teachers understand and work with assessment criteria when they perform teacher-based assessments (e.g., Breen
et al., 1997; Davison, 2004; Leung, 1999; Teasdale & Leung, 2000). In
England, for example, Leung (1999) found that teachers do not seem to
make judgments simply based on students’ linguistic performance on a
given task, but rather that they make holistic judgments while bringing in
various external factors such as performance in previous activities
or performance outside of the classroom. Much of the previous work,
however, is primarily focused on investigating how teachers interpret prescribed assessment frameworks and how they apply these in their assessment practices.

As Rea-Dickins (2007) indicates, there are usually multiple motivations
for implementing FLES. Many FLES programs have not only linguistic
targets but also nonlinguistic objectives such as developing positive attitudes toward the target language as well as an appreciation of both foreign and domestic cultures. One may hypothesize that having so many
varied objectives for FLES programs could make it difficult for teachers
to reach a consensus among themselves on what specific objectives they
should try to achieve through assessment. Moreover, researchers have
observed a lack of consistency in foreign language teaching practices,
including assessment, between the elementary and secondary school levels (e.g., Bolster, Balandier-Brown, & Rea-Dickins, 2004; Butler, 2005).
We also have very little understanding of how secondary school language
teachers understand and evaluate the performance of those incoming
students who have come up through FLES programs.
This study therefore investigates how elementary and secondary school
teachers observe and assess 6th-grade students’ foreign language performance in class (at the end of their elementary school education) and
how those assessments vary among such teachers. The study focuses on
what abilities teachers pay attention to when assessing elementary school
students’ performance and what kinds of criteria and reference points
they use. The study attempts to address these topics by focusing on
English FLES in South Korea as a case study. As we shall see, South Korean
society has traditionally placed substantial value on measurement in its
educational system, but the government recently began promoting
teacher-based assessment as part of its language-in-education policy.
Unlike many of the cases that have been documented thus far, no prescribed assessment framework is available for teachers in South Korea;
rather, teachers are responsible for developing their own assessments as
part of their teaching practice.

ENGLISH AS A FOREIGN LANGUAGE EDUCATION
IN SOUTH KOREA
In South Korea, various types of assessment have played a significant
role in education and society as a whole. English, as one of the key academic subjects, has been used as a barometer of students’ general academic achievement and diligence. Grammar translation and vocabulary
exercises long dominated English classrooms at the secondary school
level and beyond; rigorous standardized assessments measuring students’
discrete linguistic knowledge had a substantial impact on students’ future
academic and career opportunities.
In the 1990s, as part of the South Korean government’s globalization
policy, the Ministry of Education shifted its English curriculum from
traditional grammar-translation instruction to communicative-based
instruction. Acquiring communicative competency, and oral communicative abilities in particular, became a central goal of English education.
In line with the promotion of communicative language teaching (CLT),
teachers are now coached to use various types of activities in class and
encouraged to use only English in their classrooms. A student-centered
approach has been strongly promoted in the current policy.
Along with this shift in teaching approach, the government introduced
a series of reforms in assessment. One such reform was the promotion of
teacher-based assessment as part of the assessment requirements. The 7th
National Curriculum (implemented in 1999) indicated that teachers
should assess students’ process of learning as well as the outcome of their
learning through ongoing observation and other forms of performance
assessment, as opposed to one-shot multiple-choice tests. Importantly,
the policy stressed the autonomy of schools and teachers in administering such assessments. Schools were now responsible for deciding the
methods, criteria, and frequency of assessments through discussions among
their teachers (see, e.g., Chungcheongnamdo Office of Education, 2007).
As part of the effort to enhance the communicative competence of its
citizens, the South Korean government introduced English as a compulsory subject at the elementary school level nationwide in 1997. In addition to the strong emphasis on oral communication as a central goal of
FLES, motivating students to learn English was set as another important
goal. A variety of group activities have been implemented in classrooms
based on the uniform national curriculum. With respect to assessment,
the government (via the Korea Institute of Curriculum and Evaluation,
or KICE) has suggested that teachers should periodically observe their
students’ performance in class and keep “observation records” on attitudes, oral skill development, and written skill development. KICE also
created a 5-point scale to serve as an example for these teacher observations (Lee, 2007). However, KICE did not provide precise criteria for
each point and domain. Individual teachers must decide how to use such
assessment information for summative and/or formative purposes.
Currently, report cards to students and parents at the elementary school
level in South Korea are based on verbal descriptions and not on a
numeric scale. However, parents may ask teachers to disclose any of
the information on their children’s performance in class that was used as
a basis for their evaluation. In practice, many teachers keep records of
one type of numeric scale or another, including standardized test scores,
in addition to verbal comments on their students’ performance.
Despite the promotion of teacher-based assessment at school, South
Korean society continues to emphasize a measurement-driven orientation toward assessment. Even among elementary school students, a number of large-scale standardized English tests such as the Test of English
for International Communication (TOEIC) Bridge and the Practical
English Level Test for Elementary English (PELT) are very popular
(Choi, 2008). Many students go to private English institutes where they
can access various types of proficiency tests including these standardized
tests. At the secondary school level, the pressure that students feel about
exams appears to have become even more intense. Under recent reforms
in English assessment, a certain portion of the students’ final grades
should now come from teacher-based assessment. Final grades, however,
are given to students on a numeric basis. Students and parents are greatly
concerned with their English grades at school as well as their scores on
various types of standardized tests of English, especially in relation to
accessing higher education and fulfilling their career aspirations. Under
substantial pressure from parents and students, teachers in South Korea
are expected to promote a learning culture in accordance with government policy, yet they must do so within an intensely exam-focused culture (Hamp-Lyons,
2007). Teacher-based assessment thus inevitably becomes part of accountability measures, at least to some degree.
Finally, it has been reported that elementary school teachers have had
little direct communication with secondary school English teachers
regarding instruction and assessment in South Korea (Butler & Lee,
2005). Teacher training is usually offered to elementary school teachers
and secondary school teachers separately, and teachers typically have few
opportunities to observe English classes at different school levels. As
such, assessment practices may differ significantly between elementary
and secondary school teachers.

COMMUNICATIVE ABILITIES IN A FOREIGN LANGUAGE
One of the challenges in developing teacher-based assessments in
FLES is the limited understanding of what specifically having communicative abilities in a foreign language entails. Language assessment constructs are not yet clearly understood, especially when pedagogical value
is primarily placed on assessment (McNamara, 2001; Rea-Dickins &
Gardner, 2000). In identifying constructs for teacher-based assessment
for young learners, the following three characteristics of existing models
of communicative abilities are especially problematic: (a) the notion that
communicative abilities reside in individuals, (b) the lack of a clear conceptualization of the affective aspects of communicative abilities, and (c)
the lack of a developmental perspective.
With regard to the first characteristic, current theories on communicative competence in a second or foreign language overemphasize individual performance as opposed to interactive performance (McNamara,
1996, 1997, 2001). McNamara argues that it is dangerous to consider
one’s performance on a performance test as being a mere reflection of
one’s individual competence; rather, performance is co-constructed
through interactions among various agents such as interlocutors, test
materials, raters, and so forth. In FLES programs, a frequently emphasized goal is developing students’ communicative abilities rather than
their discrete linguistic knowledge per se, and pair and group activities
are widely used. One can expect that a student’s performance is influenced by the nature of the activities and interactions with the student’s
interlocutors. In addition, which activities the teacher chooses to observe
and which aspects of the student’s performance the teacher chooses to
pay attention to all contribute to the dynamic interactions that influence
ratings and evaluations. However, it is not clear how best to understand
students’ communicative competence in such interactions.
Second, although affective aspects such as motivation and confidence
are often set as a key objective for FLES programs worldwide, including
in South Korea, current theories on communicative competence do not
agree on how best to conceptualize such affective factors in language
assessments (McNamara, 1996). Hymes (1972) distinguished ability for
use from knowledge in his model of communicative competence (which
was originally developed in the context of first language use). Hymes conceptualizes ability for use as one’s potential ability for performance, and it
includes various language-relevant cognitive and noncognitive factors
such as motivation. However, a model proposed by Canale and Swain
(1980) that has become one of the most influential models of communicative competence in second or foreign language acquisition carefully
excludes factors that are relevant to ability for use. There have been some
attempts to capture the affective dimensions of ability for use in successive models, such as Bachman’s (1990) strategic competence and Bachman
and Palmer’s (1996) affective schemata. However, as the term schemata indicates, affective dimensions in their models are conceptualized as cognitive entities in nature and are primarily considered to be a source of
response bias in assessment; the role of affective factors in language use
is far from clear (McNamara, 1996). In addition, the role of nonverbal
behavior in language assessment, such as body movements and facial
expressions, has yet to be sufficiently explored (Young, 2002).
One can also point to a lack of developmental perspectives in the current leading models of communicative competence. Such models identify and classify different constructs of communicative competence, but
they do not explain how different components interact with each other
or how such interactions may change over the course of individual
development. Are abilities in different constructs expected to develop
simultaneously, or are certain constructs more important than others at
some point in development? We do not yet have a comprehensive theory
of the development of communicative competence that teachers can use
when conducting formative assessments for their students.
If we are to introduce teacher assessment, therefore, an important starting point would be to understand teachers’ perceptions of what is important in carrying on competent communication in a foreign language.

RESEARCH QUESTIONS
The purpose of the study was to examine how teachers observe and
assess elementary school students’ foreign language performance in daily
classroom activities. More specifically, the study aimed to investigate the
following three questions:
1. How do teachers observe and assess elementary school students’ performance while they engage in group activities? How much consistency (or variability) is there in teachers’ assessments of student
performance while their students are interacting with other students?
2. What kinds of selective attention do teachers demonstrate in assessing their students’ performance (including both verbal and nonverbal aspects)? What criteria or methods do they rely on to observe and
assess their students’ performance? How do they negotiate criteria
among themselves through discussion?
3. Do elementary and secondary school teachers assess 6th-grade students differently?

METHOD
Participants
The participating teachers were recruited from an in-service teachers’ training site in central South Korea. The local provincial government selects groups of English teachers each year from elementary and
secondary schools from across the province to receive approximately two
months of professional training at various local sites. The elementary and
secondary school teachers receive their training separately. The current
study was conducted as part of the in-service training program. However,
participation in the current study was on a voluntary basis. With the help
of a training organizer, 26 elementary school teachers and 23 secondary
school teachers were recruited at one of the training sites during the summer of 2007. All the participants came from different schools and their
backgrounds were diverse. Table 1 summarizes the teachers’ profiles based
on a background survey that was distributed to the teachers prior to the
study. Notably, the secondary school English teachers who participated in
the current study appeared to be less familiar with the elementary school
English curriculum whereas the elementary school teachers who participated were more familiar with the secondary school English curriculum.
Very few of the teachers in either group had observed English classes at the
other school level. Although language assessment is considered an important component of professional development for English teachers in South
Korea, the participating teachers had received relatively little training on
how to conduct teacher-based assessment, and they had not had any extensive discussions on this topic with other teachers prior to this study.

TABLE 1
Participating Teachers’ Profiles

                                              Elementary           Secondary
                                              school teachers      school teachers
Subjects taught*
  English                                     11 (42.3%)           23 (100%)
  Multiple subjects                           15 (57.7%)           0 (0%)
Educational background
  4-year college                              15 (57.7%)           16 (69.6%)
  Some postgraduate education                 11 (42.3%)           7 (30.4%)
Average years of teaching (SD)**
  English                                     5.81 (3.91)          14.21 (7.42)
  Including other subjects                    10.01 (6.07)         15.11 (6.53)
Age
  20s                                         2 (7.7%)             4 (17.4%)
  30s                                         16 (61.5%)           5 (21.7%)
  40s                                         8 (30.8%)            14 (60.9%)
Gender
  Male                                        4 (15.4%)            10 (43.5%)
  Female                                      22 (84.6%)           13 (56.5%)
Hours of English taught per week
  Not taught yet                              1 (3.8%)             0 (0%)
  4 hours or less                             9 (34.6%)            0 (0%)
  4–10 hours                                  5 (19.2%)            2 (8.7%)
  10–20 hours                                 3 (11.5%)            14 (60.9%)
  More than 20 hours                          8 (30.8%)            7 (30.4%)
Average class size (SD)                       29.1 (10.15)         32.1 (6.24)
Familiarity with curriculum of a school level other than the
level they teach (i.e., familiarity with elementary school
curriculum for secondary school teachers and vice-versa)
  Fully understood                            16 (61.5%)           6 (26.1%)
  Good knowledge, if not full                 2 (7.7%)             11 (47.8%)
  Some knowledge                              3 (11.5%)            3 (13.0%)
  Little knowledge                            2 (7.7%)             0 (0%)
  (No response)                               3 (11.5%)            3 (13.0%)
Experience of observing English classes at a
school level other than the level at which
they teach                                    2 (7.7%)             0 (0%)

Note. * As of 2008 in South Korea, English is taught by homeroom teachers who teach multiple
subjects, as well as by teachers who specialize in teaching English only at elementary schools.
Teachers may change their status on their principals’ requests each year. As a result, English
teachers may become homeroom teachers and vice-versa. At the secondary school level, English
teachers are specialized and teach English only. ** SD = Standard deviations.

Materials and Procedures
The participating teachers watched videotapes of students’ group activities in English and were asked to assess the students’ communicative performance. This activity was conducted separately for the elementary school
teachers and the secondary school teachers. The video showed four 6th-grade students engaging in two different activities in a group: One was
a simple jigsaw activity and the other was a more complicated decision-making activity (Pica, Kanagy, & Falodun, 1993). In the first activity, the
students were asked to complete the weekly schedule for a student named
Minho on a collaborative basis. Each of the students had different information about the schedule and thus two-way interactions were required to
complete the task. However, only a limited set of expressions and vocabulary (e.g., “What does Minho do on Monday afternoon?” and “He plays
baseball”) were needed for the completion of the task. The second activity
was an open-ended shopping task wherein one student played a customer
and the rest of the students played shop owners. Each shop carried different items at different prices. The goals of this task included: (a) to buy a
list of items for a party and leave with as much money as possible (for the
shopper), and (b) to sell many items and make as much money as possible (for the shop owners). This task required the students to use a variety of English expressions and vocabulary related to shopping in order to
buy or sell goods and to negotiate prices (all of the necessary expressions
and vocabulary had already been covered in class based on the National
Curriculum). Unlike the first activity, the number of utterances produced and the content of the discussions among the students varied substantially in the second activity. This latter activity required some simple
math skills to help solve the problems encountered in the task itself. Both
activities were commonly used in 6th-grade English classrooms in South
Korea, and each activity lasted for 15 minutes. The participating teachers
could see the written notes that the students took during the activities.
Among the four students shown in the video, two were boys and two
were girls. They used English pseudonyms during the activities: Tom,
John, Jane, and Sally. In the video, the four students appeared to differ in
terms of their activeness/shyness and their general English proficiency
levels. The only objective measure of their proficiency available to the
researcher was their listening scores (raw scores) from the Cambridge
Young Learners’ English Test (the Movers level), which was administered
immediately before the taping of the video.2 The scores were 14 for Tom,
20 for John, 10 for Jane, and 16 for Sally (out of a possible total score
of 25). The activities were videotaped at the end of the school year (shortly
before the students graduated from elementary school and moved on to
secondary school). These video clips also had been used in past in-service
training sessions in Seoul and Pusan. However, none of the participating
teachers had seen the video prior to the current study.
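
To place the four raw listening scores on a common footing, the following minimal Python sketch converts them to percentages of the 25-point maximum and ranks the students. The scores and the maximum are taken from the text; the variable and function names are illustrative assumptions, and the percentages are simple arithmetic rather than figures reported in the study.

```python
# Raw listening scores reported in the text
# (Cambridge Young Learners' English Test, Movers level).
MAX_SCORE = 25
raw_scores = {"Tom": 14, "John": 20, "Jane": 10, "Sally": 16}


def to_percent(score: int, max_score: int = MAX_SCORE) -> float:
    """Convert a raw score to a percentage of the maximum possible score."""
    return 100.0 * score / max_score


# Percentage score for each student (names are the pseudonyms used in the video).
percent_scores = {name: to_percent(score) for name, score in raw_scores.items()}

# Order the students from highest to lowest raw listening score.
ranking = sorted(raw_scores, key=raw_scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {raw_scores[name]}/{MAX_SCORE} = {percent_scores[name]:.0f}%")
```

On this single listening measure, John scored highest and Jane lowest; it says nothing about the students' speaking performance in the video.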
Both groups of teachers individually assessed the students’ performances in two steps. The first step was a holistic assessment. The teachers
were asked to use a 5-point scale as well as to take verbal notes on each of
the four students, in accordance with the suggestions of the South Korean
government. The 5-point grading scale has been used in the South
Korean education system for years, and the current study, as suggested by
KICE, used the following scale: excellent (5), good (4), moderate (3), deficient
(2), and poor (1). The teachers were asked to take notes freely as they usually do in their own classrooms.
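
As an illustration only, the 5-point scale described above could be encoded and a set of ratings for one student summarized roughly as follows. This is a sketch under stated assumptions: the function name `summarize_ratings` and the example ratings are hypothetical and are not data or procedures from the study.

```python
from collections import Counter
from statistics import mean

# The 5-point holistic scale described in the text.
SCALE = {5: "excellent", 4: "good", 3: "moderate", 2: "deficient", 1: "poor"}


def summarize_ratings(ratings):
    """Summarize how widely a set of holistic ratings for one student is spread."""
    for rating in ratings:
        if rating not in SCALE:
            raise ValueError(f"rating {rating!r} is not on the 5-point scale")
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),
        # Distance between the highest and lowest rating given.
        "range": max(ratings) - min(ratings),
        # Number of raters who chose each scale point, from excellent down to poor.
        "distribution": {
            SCALE[point]: counts.get(point, 0)
            for point in sorted(SCALE, reverse=True)
        },
    }


# Hypothetical ratings from six imaginary raters (NOT data from the study).
example_ratings = [5, 4, 3, 4, 2, 3]
summary = summarize_ratings(example_ratings)
print(summary)
```

A wide range or a flat distribution in such a summary would correspond to the kind of rater variability that the study examines.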
The second step was a multiple-trait assessment (Hamp-Lyons, 1991). The students’ performances were evaluated based on three or four specific traits.
In this study, the traits were self-selected by individual teachers out of 12
prescribed traits. These traits were determined based on multiple sources
including the national curriculum, teachers’ manuals, and other commercial resources that are popular among Korean teachers. The traits
were (1) listening comprehension, (2) speaking fluency, (3) speaking accuracy,
(4) pronunciation, (5) range of vocabulary use, (6) content being spoken about
(content), (7) use of appropriate expressions in the given context (pragmatics),
(8) confidence in talking, (9) motivation, (10) task completion, (11) the ability to
interact effectively with other students (interpersonal), and (12) others. These traits
were certainly not exclusive or fixed traits but they often appeared in documents available to the teachers. In this study, these traits were used with
the intention of helping the teachers become aware of what they do when
they produce a single score (e.g., whether they base the score on an overall
judgment or rely on a single or multiple traits) and which abilities they pay
attention to when they make holistic judgments. The teachers were encouraged to identify those traits (any number of such traits) that they thought
they paid attention to while they observed the students’ performances.
The teachers were shown the video clips twice for each of the activities.

2 Cambridge Young Learners’ English Tests have three levels: Starters, Movers, and Flyers.
The Movers level and Flyers level correspond to the A1 and A2 levels of the Common
European Framework of Reference, respectively. The Flyers level is equivalent to the Key English
Test with respect to difficulty, but its vocabulary and content are designed to be more suitable for young learners (Cambridge ESOL, n.d.).

After individually assessing the students holistically and identifying the
traits they paid attention to, the teachers then discussed in a small group
how they evaluated each student and which criteria they used for their
assessments. All of these activities took place as part of the aforementioned teachers’ training program and took approximately 2 hours for
each group of teachers. All of the teachers’ small group discussions were
transcribed. For reporting the results, the transcribed data were translated into English and back translations were used to ensure accuracy.

RESULTS
This study found that both groups of teachers showed substantial variations in their holistic evaluations, both within the same group as well as
across groups. As we shall see, the most frequently chosen traits by both
groups of teachers were speaking fluency, confidence in talking, listening comprehension, motivation, and speaking accuracy. The results showed similar
tendencies in both groups of teachers. However, a qualitative analysis
revealed that the elementary and secondary school teachers interpreted
these traits and arrived at judgments in different ways. One could also
observe substantial variability within the same groups of teachers; different teachers relied on different reference points when assessing students.
Moreover, their judgments were frequently influenced in nonsystematic
ways by different aspects of the students’ behaviors. In this section, I first
review the results of a series of quantitative analyses, followed by a qualitative analysis of the discussions among the teachers. In particular, I focus
on the following three issues with respect to the qualitative data: (a) the
nature of the teachers’ observations; (b) the affective constructs
(i.e., confidence and motivation); and (c) the potential abilities, which
were conceptualized as abilities to acquire high English proficiency in
the future.

Quantitative Analyses of the Teachers’ Judgments
The Variability of Teachers’ Holistic Judgments
As we can see from Figures 1 and 2, elementary and secondary school teachers showed substantial variability in their holistic judgments in which they
used the 5-point scale (the same format that has been suggested by KICE).
In the first activity, the students were given an almost equal amount of time
to talk, and they used a limited number of fixed expressions and vocabulary. In this activity the students’ linguistic output was highly controlled,
and yet the teachers’ judgments of the students’ overall performance
showed substantial variability. This result may suggest that the teachers paid
attention to various aspects of the students’ performance, including both
linguistic and nonlinguistic aspects, in possibly complicated ways. One can
also observe variability in the teachers’ judgments regarding individual students. The teachers’ judgments were more widely spread for Tom and Sally
whereas they exhibited relatively more agreement with respect to John.

TESOL QUARTERLY

FIGURE 1
Teachers’ Holistic Analysis of Activity 1

FIGURE 2
Teachers’ Holistic Analysis of Activity 2

The elementary and secondary school teachers also evaluated individual students differently. The secondary school teachers tended to rate
Tom higher compared with the elementary school teachers in both activities. The reverse tendency was observed for John and Jane.3
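
The effect sizes reported alongside these comparisons (Footnote 3) can be related to the F statistics through a standard identity for one-way designs: eta squared (η²) equals (F × df_between) / (F × df_between + df_within). The following Python sketch applies that identity. The function name eta_squared_from_f is hypothetical and was not part of the original analysis, and values recovered from rounded F statistics need not match the published figures exactly.

```python
def eta_squared_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Recover eta squared for a one-way ANOVA from F and its degrees of freedom.

    Uses the identity eta^2 = (F * df_between) / (F * df_between + df_within),
    which follows from eta^2 = SS_between / SS_total in a one-way design.
    """
    if f_value < 0 or df_between <= 0 or df_within <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    explained = f_value * df_between
    return explained / (explained + df_within)


# Illustration with the first comparison in Footnote 3:
# Tom, Activity 1, F(1, 44) = 4.43
print(round(eta_squared_from_f(4.43, 1, 44), 2))  # 0.09
```

For Tom in Activity 1, F(1, 44) = 4.43 gives roughly 0.09, in line with the value reported in Footnote 3.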

Traits Chosen by the Teachers
Table 2 summarizes the traits chosen by the teachers for each activity.
This table lists the traits in descending order of frequency (note that the teachers were not asked to rank these traits; rather, they were asked to indicate
the traits that they paid attention to while they observed the students).
Speaking fluency, confidence in talking, listening comprehension, motivation, and
speaking accuracy were the traits most frequently chosen by both groups of
teachers. On the other hand, pronunciation was the least frequently chosen trait. There did not seem to be a notable difference in the choice of
prescribed traits between the elementary and secondary school teachers.

Qualitative Analysis of the Discussions Among the Teachers
The Nature of Observation: What Did the Teachers Look for?
Although in the quantitative analysis the elementary and secondary
school teachers did not show a notable difference in their choice of traits,
the qualitative analysis revealed that the two groups of teachers looked
for different things during their observations. Elementary school teachers, either consciously or unconsciously, tended to avoid setting any criteria, whereas secondary school teachers tended to depend on some form
of set criteria even when they made holistic judgments based on their
observations.
3 As far as the average scores are concerned, a series of one-way ANOVAs indicated that the
secondary school teachers gave significantly higher scores to Tom than the elementary
school teachers did in Activity 1 (F(1, 44) = 4.43, p < 0.05, η² = 0.09). A similar tendency was
observed in Activity 2 (F(1, 42) = 3.88, p = 0.056, η² = 0.09). The elementary school teachers
gave a higher evaluation for John in Activity 1 (F(1, 43) = 7.02, p < 0.05, η² = 0.14) and for
Jane in Activity 2 (F(1, 41) = 7.88, p < 0.01, η² = 0.19). No significant differences were
observed for Sally between the two groups of teachers for either activity.

TABLE 2
Traits Chosen by the Teachers
[Table: number and percentage of elementary school teachers (n = 26) and secondary school teachers (n = 23) who chose each trait in Activity 1 and Activity 2, listed in descending order of frequency.]

Some elementary school teachers suggested that identifying traits was
a somewhat artificial activity. A few of them indicated that they made
their judgments based on “overall judgment” (E8),4 “general performance” (E19), and “overall flow” (E9) without having any particular
criteria in mind. The elementary school teachers seemed to believe
strongly that evaluation at the elementary school level should focus only
on students’ strengths and that setting standard criteria or traits may lead
to a more measurement-oriented practice that should be avoided at the
elementary level. Some indicated that they set different criteria depending on the individual student because “I pay attention to the strength of
each student” (E4). Inconsistencies in their evaluations across students
did not seem to matter too much to these teachers.
Some elementary school teachers, however, did express concern regarding such inconsistencies. “If you don’t have any criteria, lower performers
tend to get higher evaluations than they are supposed to” (E3). Others
were concerned that holistic judgments without specific criteria could be
easily influenced by other students’ performance; one teacher stated, “I
don’t think holistic judgment is valid because I cannot help but compare
students with one another” (E7). Another teacher commented that “salient
features can greatly influence our judgment. High performers and troublemakers catch our attention, but it is difficult to assess middle-range students if you don’t have criteria” (E10). However, even those who supported
setting criteria acknowledged the practical challenges of doing so: “We are
not trained to do it and there is no clear guideline for us” (E1).
Another concern frequently cited by the elementary teachers was their
large class sizes. They commented that it was hard to pay attention to
multiple traits in each student during the activities. In practice, teachers
often appear to set only one or two criteria at a time, and they do so relatively flexibly depending on the activities in question. The assessment literature suggests that the multiple-trait approach has more diagnostic
merit than the holistic approach (Hamp-Lyons, 1991). It is interesting,
however, that neither group of teachers mentioned the possible diagnostic merits of having multiple traits in their judgment.
In general, the secondary school teachers opted for employing criteria
for observation. This difference may be due to their familiarity with various types of criterion-based oral and written assessments. One secondary
school teacher said that “holistic judgment without setting any criteria is
too subjective. My evaluation would differ depending on my mood” (S23).
Some teachers clearly advocated multiple-trait scoring (as opposed to
scoring based on a single trait): “it is hard to evaluate students by giving
them a single score. If a student is excellent in grammar and vocabulary,
then these qualities should be evaluated separately as such” (S2). Others
indicated that multiple traits help teachers be attentive to students’
performance. As with the elementary school teachers, however, they made
no clear statements indicating the potential pedagogical merits of applying multiple-trait scoring. Rather, more teachers supported having criteria from the viewpoint of fairness. A number of teachers indicated that
making holistic judgments is potentially unfair to students who fail to
show the abilities that a particular teacher thinks are important. Some
said that fairness should be attained by setting clearer and more detailed
criteria (as opposed to roughly identifying traits), and others felt that tasks
should be carefully designed so that everybody has enough output to be
assessed. Referring to Activity 2 in the video, a teacher said, “Sally would
make more mistakes if she spoke more. She had few mistakes because she
did not speak a lot. This is ‘less talk, better score’ and is unfair” (S1).

4 “E” refers to elementary school teachers and “S” refers to secondary school teachers; the number that follows indicates the teacher’s ID.

Different Understandings of Criteria and Conflicting Values Among Teachers
The discussions among the teachers also revealed that the teachers differed in their understanding of their self-chosen criteria. Their judgments
were also frequently influenced by different aspects of the students’ behaviors in nonsystematic ways. For example, speaking fluency, which was the
most frequently chosen trait by both groups of teachers, meant different
things to different teachers. For some teachers, fluency was synonymous
with general proficiency, and for others it meant “ability to grasp” (E19;
i.e., the ability to “pick up” vocabulary and expressions that were taught
and/or corrected by others). For yet other teachers it referred to having
no hesitation and/or displaying an “attitude that is not afraid to make mistakes” (E7, S5). Some teachers even equated fluency with confidence and
indicated that “a loud voice” (e.g., E11, E20, E22) was a sign of fluency.
Both groups of teachers frequently encountered a dilemma with
regard to how best to weigh the different traits in their holistic judgments. Speaking fluency and speaking accuracy appeared to present such a
dilemma to teachers, as one can see from the following exchange:
E12: I think that fluency is always the priority in speaking. It is followed by
grammar and pronunciation. Students need to be able to speak English
first.
E13: This is true, but the new national curriculum also emphasizes
accuracy.
E14: In fact, I usually don’t focus on grammar in speaking, but I picked up
accuracy today, because Tom’s speech was full of mistakes. Although fluency is important, we should correct his errors in order to help him speak
English accurately.
E13: Yes, we should not let students keep on speaking incorrectly.


E14: Do you mean that accuracy is more important than fluency?
E11, E12, E13: Well, fluency is more important …
E11: I think that for the 6th grade, fluency is the most important. And for
lower grade students who learn English for the first time, confidence and
attitude are more important.
E13: I have a slightly different view from you… . As for Tom, he just spoke.
He spoke fluently but did not use correct grammar.

The secondary teachers often interpreted good fluency as a sign of communicative competence, and they struggled when other traits showed
inconsistent results in student performance.
Among the traits that the teachers discussed, confidence in talking proved
to be a particularly interesting trait from a formative point of view, and
yet one may say that it could be a problematic construct from a summative point of view. As with speaking fluency, the teachers saw different student behaviors as evidence of confidence, including speaking with “a
loud voice” (e.g., E11, E20, E22, S18), “active participation in activities”
(e.g., E1, E7, E10, E12, E13, S2, S6, S22) and an “attitude of not being
afraid to make mistakes” (e.g., E7, S5, S6). Yet these same behaviors were
also perceived as negative by some teachers. Moreover, confidence tended
to attract teachers’ attention and could easily mask other abilities of the
students. The teachers often showed substantial disagreement in their
assessments when students were perceived as being confident but did not
show strong performance in linguistic aspects such as speaking accuracy
and pronunciation (or vice-versa):
E7: The boys made a lot of grammatical errors, I gave them 3 points overall. The girls performed better, so I gave them 4.
E8: You suggested that the boys had weaker grammatical ability. But I
found them confident because they were so active in the activities.
E7: Sally was less confident and less active.
E8: Judging from her written note [note: the teachers could see the written notes
which the students took during the activities], I thought that Sally’s writing was
good. She must have good English ability. But she did not show it in her
oral activity as much as I expected. That’s why I gave her lower scores.
E9: I thought that Sally’s writing was good, too, and I thought that she
would also be excellent in her speaking ability. I was surprised to find out
she was not very good in the activities.
E7: But Sally certainly has a good command of English. She just did not
display it!
E8: Sally is good at writing. She seemed to have good listening skills as
well. But I think that she does not have good speaking skills.


E9: Sally does not have any confidence in speaking.
E8: I gave Sally 2 points because I couldn’t see her participation in the
activity.
…….
E10: I gave Sally 5 points, because she always gave immediate answers. It’s
not true that Sally doesn’t have a good ability to communicate.


The wide variation in the teachers’ judgments that appeared in the quantitative analysis, particularly for Tom and Sally, turned out to stem from
such disagreements over how to value the students’ confidence.
Curiously, in the small group discussion, the secondary school teachers tended to place greater emphasis on students’ confidence and other
affective aspects in evaluating the 6th-grade students’ performance compared with the elementary school teachers, whereas the elementary
school teachers were more concerned with the linguistic aspects of student performance. For example, Tom was considered confident and was
more highly rated by the secondary school teachers than by the elementary school teachers, despite his perceived weaknesses in speaking accuracy
and pronunciation. This seemed to be due, in part, to different expectations of the goals of FLES between the two groups of teachers. One may
recall that, in the background survey, the secondary school teachers in
the current study were not very familiar with the elementary school curriculum and practice. It is important to note that the majority of the secondary school teachers appeared to have a different standard for the
elementary and secondary school levels: Namely, confidence in talking and
motivation were judged to be important at the elementary school level (as
in the case of the 6th-grade students), but not as much so at the secondary school level. In fact, some teachers explicitly stated their concern
over the perceived discrepancy in expectations between teachers at the
two school levels. For example,
I agree with you that confidence and motivation are important at elementary school. Tom made a good impression on the teachers with his
confidence and I think that speaking with lots of errors is fine at the elementary school level. But students like Tom don’t adapt to English classes
at secondary school well because teachers at secondary school expect students to speak accurately, as in saying “I go” and “He goes.” Tom would
not be ready to receive such treatment. As a result, teachers have difficulty
in teaching students like Tom. (S14)

In addition, although many secondary school teachers indicated that confidence in talking and motivation are important at the elementary school
level, they also admitted that such traits are difficult to assess, and thus
they questioned if such affective traits could be assessed objectively.
Many secondary school teachers agreed that the affective aspects of
communication would not play a significant role in the assessment of students at the secondary school level.

Potential Ability
Another concept that was difficult to specify was potential ability. The
teachers frequently mentioned the importance of students’ potential
ability to be a competent communicator in a foreign language. This
potential ability was predominantly cited by the elementary school teachers. In their discourse, confidence in talking and other affective aspects
such as motivation were often considered as a sign of such potential ability,
whereas some teachers included cognitive-based abilities such as “the
ability to self-correct” (E2) and “the creative use of language” (E7) as
indicative of such potential ability. Interpersonal-related skills such as
“the ability to listen to others carefully” (E18) and “helping others to
understand” (E26) were also often included. The teachers tended to give
higher evaluations to those students who were perceived to have such
potential ability. However, the potential ability referred to by the teachers
was not the same as the gap between ability for use and performance as
described in Hymes (1972). Rather, it appeared to be similar to Vygotsky’s
(1978) zone of proximal development, which refers to the gap in developmental levels between what a child can actually do individually and what
he or she can do with assistance from adults or more capable peers. We
should note, however, that the teachers tended to view such issues from a
long-term point of view (e.g., performance once students got to secondary school or their ultimate attainment of proficiency) rather than from
an immediate one. That is to say, the potential ability that the teachers in
this study referred to included cognitive and affective traits which were
considered helpful for the children to acquire communicative competence in the long run. Such attributes might be better characterized as an
aptitude for language learning, if we could broaden our current view of
aptitude from one that predominantly restricts it to the cognitive
domain.
Although the teachers saw students’ different behaviors as signs of
potential ability, the teachers differed with respect to who had such ability. The notion of potential ability appeared to stem from the elementary school teachers’ wish to shed light on the positive side of their
students’ performance. These teachers’ inclusion of such potential
abilities in their assessments of student performance may have some
important value from a pedagogical point of view, such as giving students essentially a delayed judgment, and in turn encouraging them.
However, it also raises a number of questions regarding how to conceptualize such abilities in language assessment if teacher-based assessment
is also used for summative purposes. It is also important to note that
such potential ability was rarely mentioned by the secondary school
teachers.

DISCUSSION
The current study investigated how elementary and secondary school
teachers in South Korea observe and assess elementary school students’
foreign language performance in class. The study found that there was
substantial variability among the teachers in their holistic judgments and
their attitudes toward holistic observation and the importance of establishing criteria. Although both groups of teachers chose similar traits for
their evaluations, including speaking fluency, confidence in talking, listening
comprehension, motivation, and speaking accuracy, the teachers differed in
the ways in which they interpreted such traits (or constructs) and in how
they arrived at their respective evaluations of individual students. Most
notably, the teachers appeared to disagree over the interpretations of
affective aspects such as confidence in talking and motivation. The notion of
potential ability was also frequently included in the elementary school
teachers’ judgments, but not in those of the secondary school teachers. It
was evident that elementary and secondary school teachers had different
expectations of FLES and different perceptions of how assessment should
be conducted for elementary school students.
In interpreting such variability and the differences among the teachers in this study, Davison’s “cline for mapping teacher assessment beliefs,
attitudes and practices” (2004, pp. 324–325) appears to be useful. As summarized in Table 3, in this framework, the teachers’ beliefs and practices
regarding teacher-based assessment are classified into five orientation
types according to the teachers’ views toward assessment tasks, assessment processes, assessment products, inconsistencies, and assessor needs.
In one of the extreme orientations, assessor as technician, the teachers are
highly restricted by criteria and see the assessment process very mechanically. At the other extreme, the assessor as God has a strong community-bound orientation, and the teachers in this orientation take a highly
personalized and intuitive approach toward assessment.
TABLE 3
Analysis of Teachers’ Beliefs and Practices in This Study Based on Davison (2004, p. 325)

Assessor as technician
View of the assessment task: Criterion-bound
View of the assessment process: Mechanistic, procedural, automatic, technical, seemingly universalized
View of the assessment product: Text focused
View of inconsistencies: Seemingly unaffected by inconsistencies
View of assessor needs: Need better assessment criteria

Assessor as interpreter of the law
View of the assessment task: Criteria-based
View of the assessment process: De-personalized, explicit, codified, legalistic, culturally detached
View of the assessment product: Text focused, but awareness of student
View of inconsistencies: Inconsistencies a problem, threat to reliability
View of assessor needs: Need better assessor training (in interpreting criteria)

Assessor as principled yet pragmatic professional
View of the assessment task: Criteria-referenced, but localized accommodations
View of the assessment process: Principled, explicit but interpretative, attuned to local cultures/norms/expectations
View of the assessment product: Text and student focused
View of inconsistencies: Inconsistencies inevitable, cannot necessarily be resolved satisfactorily, teachers need to rely on professional judgment
View of assessor needs: Need more time for moderation and professional dialogue (to make basis of judgments more explicit)

Assessor as arbiter of ‘community’ values
View of the assessment task: Community-referenced
View of the assessment process: Personalized, implicit, highly impressionistic, culturally-bound
View of the assessment product: Student focused
View of inconsistencies: Inconsistencies a problem, threat to validity, assessor training needs to be improved
View of assessor needs: Need better assessors (to uphold standards)

Assessor as God
View of the assessment task: Community-bound
View of the assessment process: Personalized, intuitive, beyond analysis
View of the assessment product: Student focused
View of inconsistencies: Seemingly unaffected by inconsistencies
View of assessor needs: System not open to scrutiny, not accountable, operated by the “chosen”

Korean teachers
Technician and interpreter-of-the-law side: Secondary school teachers →
Arbiter-of-‘community’-values and God side: ← Elementary school teachers

Context of teaching/assessment
Secondary school teachers: Transition from grammar-translation to communicative teaching, teachers more experienced with measurement-oriented assessments and standards, English teaching specialists, students assessed using standardized tests.
Elementary school teachers: English only recently introduced, emphasis on communicative competence and motivation, many teachers are generalists, mentality of avoiding measurement-oriented assessment and competition among students.

Note. Adapted from Chris Davison, Language Testing, 21, pp. 305–334, copyright © 2004 by Sage Publications. Reprinted by permission of Sage.

In the present data from South Korea, the elementary school teachers
tended to be more oriented toward the assessor as God and assessor as the
arbiter of “community” values viewpoints, whereas the secondary school
teachers tended to be more oriented toward the assessor as technician
and assessor as the interpreter of the law positions. As we have seen, some elementary school teachers avoided employing any standard criteria and
made their judgments in highly intuitive and personalized ways. Other
elementary school teachers were concerned with the inconsistencies in
their personal judgments and viewed these inconsistencies as a threat to
validity, as exemplified in the remark by one teacher that teachers only
pay attention to the salient features of student performance in holistic
assessments. These teachers were more open to depending on community norms but felt that they would need more training to be better assessors in order to apply standards in practice. On the other hand, the
secondary school teachers in this study tended to be more oriented
toward the assessor as technician and assessor as the interpreter of the law positions, the other end of the framework. Inconsistencies were clearly a
problem for the secondary school teachers, and many of them preferred
to have clear standards based in large part on a wish to maintain fairness.
For these teachers, fairness should be secured by having objective and
explicit criteria and applying them equally to everybody. A clear discrepancy in beliefs and practices in assessment appeared in this study between
the elementary school teachers and the secondary school teachers. Their
attitudes toward confidence in talking serve as a good example. Many secondary school teachers felt that confidence was a very important trait for
elementary school students, but many admitted that confidence was difficult to objectify and that it would not be highly valued as part of assessment once the students got to secondary school.
One can easily see that these differences in beliefs and practices
between the elementary and secondary school teachers are highly
embedded in their respective teaching and assessment contexts. English
FLES was recently introduced at the elementary school level in South
Korea as part of the reforms to change from traditional grammar-translation oriented language teaching to communicative-based language
teaching. Affective domains such as confidence in talking and motivation
have been strongly emphasized in setting the goals of FLES, as has developing communicative competence. The belief prevails among elementary school teachers that they should avoid measurement-oriented
assessment at the early stages of English learning and avoid competition
among students (Butler, 2005). At the same time, they are increasingly
held accountable for reporting individual students’ performance. The
dilemma they face and their confusion regarding assessment result in
many ways from such policy requirements as well as their various backgrounds. The inclusion of potential ability can be understood as their
attempt to shed light on their early learners’ strengths, and their motivation for doing so is readily understandable given the context in which
they are working.
Unlike the majority of elementary school teachers in South Korea who
may teach multiple subjects, the secondary school teachers are all trained
as English language teaching specialists. In their highly exam-oriented
secondary school culture, the teachers are much more used to using standards and criteria than their elementary school counterparts, though
many of them are not familiar with the newly implemented FLES curricula and assessment practices. Much of their assessment experience, both
as students and as teachers, has largely been with measurement-oriented
practices. They therefore still appeared to be heavily constrained by measurement-oriented notions of assessment. Although it may not be easy to
develop and adapt assessment for learning in teaching practice, the current
discrepancies in beliefs and practices in assessment between the elementary and secondary school levels are potentially very harmful for some
students, as articulated by one of the teachers cited previously.
Davison (2004) suggested a potential middle ground in her framework. This is the assessor as the principled yet pragmatic professional, or what
she also referred to as classroom-referenced assessment, as an alternative
approach to teacher-based assessment. Perhaps both elementary and secondary school teachers in South Korea need to work together to reach
this orientation and to narrow the gap between their assessment practices. In classroom-referenced assessment, “the assessor-teacher is attuned to
local cultures and expectations, yet is keen to articulate and interpret
community norms, to make explicit their own and others’ underlying criteria and to hold them up for critique” (p. 326). To date there has unfortunately been very little dialogue on assessment among teachers at the
different school levels in South Korea. In fact, this is not unique to South
Korea. This lack of dialogue appears to be a common problem in FLES
programs in different parts of the world (e.g., Bolster, Balandier-Brown, &
Rea-Dickins, 2004; Butler, 2005). As Davison indicated, although inconsistencies may be inevitable and may not necessarily be resolved in a
satisfactory fashion, a mutual understanding of each other’s teaching
practices is an indispensable first step toward helping students make a
smoother transition from elementary school to secondary school. The
current study suggests that some of the important issues that need to be
discussed among teachers include the identification of the most important traits at different grade levels, the role of affective aspects (such as
confidence and motivation) in assessment, how to account for students’
potential abilities, and so forth.

CONCLUSION
This study examined how teachers observe and assess elementary
school students’ foreign language performance in class, focusing on
FLES in South Korea as an example. It was found that the teachers within
the same school levels as well as across different school levels varied substantially in both their holistic judgment of the performance of young
learners working in groups as well as in how they arrived at such judgments. Although the elementary and secondary teachers chose similar