An Intensive Look at Intensity and
Language Learning
LAURA COLLINS
Concordia University
Montreal, Canada

JOANNA WHITE
Concordia University
Montreal, Canada

In this longitudinal study we investigated whether different distributions of instructional time would have differential effects on the
acquisition of English by young (aged 11–12 years) French-speaking
learners. Eleven classes of Grade 6 students (N = 230) in two versions of
a similar intensive English as a second language program were followed
throughout their intensive experience. In one program, the 400 hours
of instruction were concentrated in a 5-month block; in the other, the
400 hours were experienced in a series of intensive exposures across the
full 10-month academic year. Language development was compared
across the two contexts four times via a battery of comprehension and
production measures. Overall, the findings showed substantial progress
over time for both groups, with no clear learning advantage for either
concentrating or distributing the intensive experience. These results
are consistent with research comparing the effects of massed and
distributed conditions on the learning of complex skills in other
domains. The practical implications of the findings for the organization
of instructional time for second language learning, as well as directions
for future research in which variables such as age, proficiency, and
learning targets are manipulated, are discussed.
doi: 10.5054/tq.2011.240858

TESOL QUARTERLY Vol. 45, No. 1, March 2011

This article reports findings from a longitudinal study of the effects
of different distributions of instructional time on the learning of
English as a second language (ESL). The research question that
motivated the study was how varying the degree of intensity of
instruction (concentrated in an uninterrupted learning experience
versus distributed across a series of successive intensive experiences)
would affect different aspects of language proficiency over time. The
findings have implications for theoretical accounts of the optimal
distribution of time for language learning and for the learning of
complex skills in general (the spaced–distributed practice phenomenon). In addition, an assessment of the relative effectiveness of the
two types of intensive programs may inform decisions made by educators
who weigh a range of practical issues when implementing intensive
instruction in the various contexts in which such courses may be offered.

THE DISTRIBUTION OF TIME IN LANGUAGE LEARNING
Within the language learning literature, it has long been recognized
that a few hours a week of exposure to a new language, even if continued
for several years, does not allow students to attain very high levels of
proficiency. This is especially true when additional access to the target
language is not readily available outside the classroom. Benseler and
Schulz (1979), for example, noted that, in an early issue of the Modern
Language Journal, Hills (1919) had lamented the level of oral proficiency
in modern languages of young Americans at that time and included the
intensifying of second language (L2) instruction at the postsecondary
level among his proposed reforms. Several decades later, Stern (1985)
reported on similar concerns within the European context. He
described an initiative for improving secondary school students’ foreign
language skills, which involved concentrating the time available for
instruction into “compact” courses. There is also empirical evidence
demonstrating that the distribution of small amounts of instruction over
successive school years leaves students with limited communicative
abilities in the L2 (e.g., Donato, Tucker, Wudthayagorn, & Igarashi,
2000; Netten & Germain, 2004; Spada & Lightbown, 1989).
Although many learners of all ages continue to receive their language
instruction under limited exposure conditions, intensive courses are
increasingly common across a range of contexts, institutions, and
languages. In general, a course or program is deemed intensive when
the hours available for instruction are concentrated into blocks of time,
giving students exposure to the L2 for several hours a day. The length of
the intensive experience varies widely, however. When offered as
professional development by an employer, which may require releasing
employees from their regular responsibilities, the total amount of time
may be quite short (e.g., 25 hours concentrated into a week-long course
to enable receptionists in a French-speaking region of Quebec to take
basic phone messages in English). At the other extreme, intensive
programs may continue over several months as participants strive to
achieve sufficient proficiency to pursue postsecondary education in their
L2 (e.g., the multilevel intensive ESL programs offered at English-medium universities around the world), to meet the language
requirements of a particular job (e.g., the months of language training
in French and English provided to Canadian federal civil servants), or to
become citizens of an adopted country (e.g., the years of language
training and civics programs provided to newcomers in countries such as
Sweden). Among these prolonged programs are the well-documented
Canadian English and French intensive courses for children at the end
of elementary school, which typically last 5 months (Collins, Halter,
Lightbown, & Spada, 1999; Lightbown & Spada, 1994; Netten &
Germain, 2004).
There is evidence that an intensive language learning experience can
lead to substantial progress in an L2 in a relatively short amount of time,
for both children (Collins et al., 1999; Germain, Netten, & Movassat,
2004; Lightbown & Spada, 1994; White & Turner, 2005) and adults
(Freed, Segalowitz, & Dewey, 2004; Serrano, 2007). There is also some
evidence that the benefits of an intensive course may persist over time,
although this aspect has received less research attention (Dussault, 1997;
Lightbown & Spada, 1991). What is less clearly understood is whether
one continues to extract maximum benefit from an intensive experience
that persists over several months, or whether distributing the total time
available for instruction across a series of intensive experiences may yield
similar or possibly even superior results. In other words, how extensive
should an intensive language experience be? Are there advantages to
spacing intensive exposure?
The issue of distributing practice has received considerable attention
in the cognitive and educational psychology literature, in which the
effects of concentrating exposure in a single, or massed, learning
experience are contrasted with those of two or more spaced exposures to the
learning target, with total time on task held constant. The evidence in
support of the advantages to learning of spacing practice (summarized
in Dempster, 1996) is so robust and well established that a study
investigating the effects of distributed practice in teaching physics began
by justifying why there was even a need for such research: “with the
overwhelming preponderance of evidence supporting distributed
practice, it is natural to question the value of yet another study on this
topic” (Grote, 1995, p. 97). If these findings are relevant for language
learning, they would suggest an advantage for spacing intensive practice.
However, there are two aspects of the research done in this paradigm
that need to be considered when discussing the implications for
language learning: time and learning targets. In these studies, a practice
session typically lasts a matter of minutes (even in the concentrated
condition), and the total time devoted to learning may be spread over
just a few hours or days (Willingham, 2002). This is far less exposure
than even the shortest of intensive experiences, and in fact, even
nonintensive language programs provide longer and more sustained
exposure. In addition, the convincing findings for the spacing effect
have been observed for discrete learning targets, such as nonsense
syllables, mathematical operations, lists of uncommon or specialized
vocabulary, as well as for a range of motor skills, such as typing, ball toss,
upside-down printing, and gymnastics moves (Donovan & Radosevich,
1999). It is not clear whether these findings would be relevant for the
acquisition of a more complex skill like effective communication in an
L2, especially in light of the findings from Donovan and Radosevich’s
meta-analysis of the effect sizes of 63 studies of the spacing effect. They
found much less difference between the two types of practice when the
task involved a number of distinct behaviours and choices, and a degree
of uncertainty, such as air traffic controller simulations and music
performance. Furthermore, in the relatively few experiments in which
the targets were a closer approximation to L2 learning conditions, such
as the retention of ideas rendered in spoken or written form in first
language (L1; sentences, paragraphs, or lectures), the spacing effect was
reduced or eliminated when the exposure involved paraphrase
(Dellarosa & Bourne, 1985; Glover & Corkill, 1987) or changes to the
learning context (Verkoeijen, Rikers, & Schmidt, 2004), including the
same ideas uttered by different speakers (Dellarosa & Bourne, 1985).
The language program evaluation literature does not provide
conclusive evidence in favour of a more spaced or distributed intensive
experience, either. In L2 classroom-based research investigating the
effects of the distribution of time on language learning outcomes, when
total time for instruction is held constant, the findings show more
advantages for concentrating exposure (whether in the form of half days
or full days of instruction), when comparisons are made with limited
exposure of a few hours a week (Lapkin, Hart, & Harley, 1998; Serrano &
Muñoz, 2007; Spada & Lightbown, 1989; White & Turner, 2005).¹ In the
current study, however, given the well-documented shortcomings of
limited exposure, we were interested in examining a different distribution of more substantially concentrated amounts of instruction. This
issue has received less research attention. Collins et al. (1999) compared
two versions of a communicatively oriented intensive ESL² program for
beginner-level³ 11- to 12-year-old francophone students in French-speaking regions of Quebec: one in which students’ ESL instruction was
massed into full days over 5 months; and a more distributed version in
which students spent half days in ESL over the full 10 months of the
school year. Although both groups made substantial progress on the
various measures of written production and aural comprehension used
(oral production was not assessed), there were small but significant
differences in favour of the massed condition. However, there was a time
confound in this study. Because of the complexities of the cyclical
timetable used in the 10-month model of intensive ESL, students ended
up with fewer total hours of instruction than their counterparts in the 5-month model, which may have influenced the direction of the findings.
In addition, this study, like the intensive or limited-exposure comparison
studies cited earlier, looked at postprogram learning outcomes only, not
development over time. Thus we do not know whether different
distributions of intensity affect all aspects of proficiency (listening
comprehension, vocabulary growth, grammatical accuracy, etc.) in
similar ways and at similar rates. Furthermore, Collins et al. did not
investigate the impact of variations in intensity on oral skills. Given the
high priority assigned to this aspect of language learning in many
intensive language programs around the world (including intensive ESL
in Quebec), it is important to understand how the distribution of
instructional time may affect communicative competence.

¹ Serrano and Muñoz (2007) found no significant differences on the final outcomes among
the adult English as a foreign language (EFL) learners in three distributions of 110 hours
of instruction: intensive, semi-intensive, and extensive. However, post-hoc comparisons of
gain scores showed differences in favour of the two intensive groups.
² We refer to these as ESL programs, although they resemble EFL contexts, in that learners
typically have little or no exposure to English outside the classroom.
³ The students are not absolute beginners, but have very limited proficiency in English at the
outset of their intensive experience.

In summary, although the cognitive and educational psychology
literature points to advantages for spaced or distributed learning
conditions, there are limitations (boundary conditions) on the effect
when the to-be-learned targets are more closely related to the types of
skills typically associated with L2 learning. Furthermore, although the
language program literature has shown some advantages for concentrating L2 exposure, the evidence is not conclusive, because of the role of
intervening variables, the reliance on postprogram comparisons, and the
paucity of data on oral production skills.
The current study was designed to investigate these issues in a
naturally occurring context in Quebec, in which school boards have
been experimenting with different distributions of intensive ESL for
francophone children at the Grade 6 level (11- to 12-year-olds). In the two
conditions chosen for this study, the children received the same total
number of hours of instruction, but the time was distributed differently.
In the concentrated condition, the students were exposed to 4–5 hours
of English every day in a single 5-month intensive experience. In the
distributed condition, students received a series of mini-intensives
throughout the full 10 months of the school year; that is, 4–5 hours of
English per day in blocks of 4-, 5-, and 9-day exposures. We investigated
how varying the degree of intensity (concentrated versus distributed)
affected the development of different aspects of L2 proficiency over time
in this learner population.

METHODOLOGY
Context and Participants
The 230 Grade 6 (aged 11–12 years) francophone children who
participated in the study began learning ESL in Grade 3 (aged 8 years).
They had spent 90–100 hours, spread out over 3 years, in a regular,
limited-exposure ESL program. During the period covered by this study,
they were in one of two intensive ESL programs that afforded them
substantially increased exposure to English, but which differed with
respect to how the instructional time was distributed. There were five
intact classes of 5-month concentrated intensive ESL (n = 137), all
housed in the same school, and four intact classes of 10-month
distributed intensive ESL (n = 107), located in two different schools in
the same neighbourhood. The schools were situated in two towns, each
about 1 hour outside of Montreal. There were a number of similarities
between the concentrated and distributed instructional contexts. First,
English could be considered a foreign, rather than a second, language
because there were few opportunities for exposure to English outside of
the classroom. Second, the total number of hours of ESL instruction was
the same, approximately 400 hours. Third, teachers followed a theme-based approach to lesson planning and focused on the development of
speaking and listening skills and the expansion of the students’
vocabulary. A fourth similarity is that the regular French academic
curriculum was concentrated into half of the time normally allocated to
it. That is, French mother tongue, math, science, and social studies were
also taught intensively, with additional hours of homework required to
complete the grade-level objectives. Finally, participation in intensive
ESL in both intensive models was open to students from a range of
academic abilities.
The main difference between the contexts was the distribution of
instructional time. In the concentrated model, students had full days of
English every day for 5 months, from late January through June. They
had already completed their French academic program during the first
half of the year. In the distributed model, students had blocks of full
days of English alternating with blocks of full days of French for the
entire 10 months of the academic year. These distributed mini-intensives
followed one of two cycles, each giving students 9 full days of English
over a period of 18 days: (a) 4 days of English, 5 days of French, 5 days
of English, and 4 days of French; or (b) 9 days of English and 9 days of
French.
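The arithmetic of the distributed cycles can be checked with a short sketch. It assumes that the first cycle runs 4 days of English, 5 of French, 5 of English, and 4 of French, and that the second runs 9 days of English followed by 9 of French. The day encoding and the names `CYCLES` and `english_days` are illustrative, not part of the study's materials.

```python
# Illustrative encoding (an assumption): "E" = full day of English,
# "F" = full day of French.
CYCLES = {
    "4-5-5-4": "E" * 4 + "F" * 5 + "E" * 5 + "F" * 4,
    "9-9": "E" * 9 + "F" * 9,
}


def english_days(cycle: str) -> int:
    """Count the full days of English in one cycle."""
    return cycle.count("E")


for name, cycle in CYCLES.items():
    # Prints the cycle label, its length in school days, and its English days.
    print(name, len(cycle), english_days(cycle))
```

Under these assumptions both patterns span 18 school days and contain 9 days of English, which is consistent with the equal total instructional time reported for the two programs.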
The classes were taught by seven trained, experienced ESL teachers
who were all proficient speakers of English. They all had their own
classrooms, which they were free to decorate as they wished. Although
each ESL teacher in the concentrated model taught one class over the 5-month period from late January to June, each ESL teacher in the
distributed model taught two groups over 10 months, alternating with
the French teacher and following the pattern just described. Thus there
were five teachers for the five concentrated groups and two teachers for
the four distributed groups.

Procedure
All students were pretested at the beginning of their intensive ESL
program to establish that they had similar knowledge of English. They
were tested four more times at 100-hour intervals during their respective
programs: after 100, 200, 300, and 400 hours of instruction (henceforth
Time 1, Time 2, Time 3, and Time 4). Language proficiency tests
included measures of oral and written production and of aural and
written comprehension. In addition, although we had carefully selected
the participating teachers and groups so that they were as similar as
possible with respect to the points mentioned above (including students’
level of English, total instructional time, curriculum, and language
teaching approach) and had included several groups from each
condition to mitigate teacher effects, we know as experienced language
teachers ourselves that teachers and classes can differ from each other in
ways that cannot all be controlled for in classroom-based research. To
identify any instructional practices that might distinguish the groups or
be important in interpreting the findings, four classes, two from the 5-month program and two from the 10-month program, were observed
four times, each for an entire school day, once during each of the four testing
weeks. In addition, teacher and student questionnaires were administered to all participants during the final testing session.⁴ The data
collection schedule is presented in Table 1. The paper and pencil
measures were administered to whole classes and took 45–60 minutes to
complete. Times for oral measures are explained below.

Instruments
All tasks used in the study were either original or adapted versions of
tasks that had been used in previous studies with students of the same
age and proficiency level.

⁴ The teacher questionnaires also provided biographical data on the teachers’ experience
and education (reported earlier), whereas the student questionnaires yielded information
on language use and language attitudes, which will be relevant in follow-up studies of
individual performance.

TABLE 1
Data Collection Schedule

Measures             Pretest     Time 1            Time 2            Time 3            Time 4
Paper and pencil     AVR         AVR               ASC               Dictation         MEQ test
(N = 244)            Dictation   Y/N Vocabulary    Y/N Vocabulary    Y/N Vocabulary    Y/N Vocabulary
                     Cloze       Narrative         Narrative         Narrative         Narrative
Oral (N = 108)                   Information gap   Role-play         Information gap   Role-play

Note. AVR = aural vocabulary recognition; ASC = aural sentence comprehension; MEQ =
Ministry of Education of Quebec; Y/N = yes or no.

Pretests
The three pretests were an aural vocabulary recognition (AVR) test, a
dictation test, and a cloze test. The AVR required students to match 80
words spoken on a tape to pictures on a series of pages in a test booklet.
Scores were based on the total number of words correctly identified. No
data from students scoring above 75% on the AVR were retained for the
study. The cloze passage contained 10 blanks in a 54-word text about a
school routine. Students were given credit for any word that made sense
in the context (for sample items, see Collins et al., 1999). The dictation
was a 50-word text about a vacation, which had been used previously in
the Barcelona Age Factor Project (Muñoz, 2006). Students were given
one point for each correct word.
Longitudinal Tests
Vocabulary knowledge. A yes–no vocabulary recognition test, adapted
from Meara (1992), evaluated familiarity with the 1,000 most frequent
words of English. It consisted of a checklist that contained 120 real words
and 60 nonsense words. A different version of the test was administered
at each of the four testing times.
Narrative writing. A picture-prompted written narrative task was used
at each testing session (adapted from Collins et al., 1999; see also
Lightbown, Halter, White, & Horst, 2002). Four different pictures were
chosen to match learners’ developing vocabulary, based on our
knowledge of the classroom themes typically used during the intensive
programs. The first two pictures involved animals, family members, body
parts, and occupations, while the last two allowed students to imagine a
range of activities, relationships, and outcomes. The prompt in each case
was Imagine what is happening now, what happened before, and what is going to
happen next. Learners were given 15–20 minutes to write and were
encouraged to use their imagination and to provide as many details as
they could. No dictionaries were permitted, but to encourage students to
write as much as possible, they were told that they could use a French
word if they got stuck. There were two measures for this task: fluency
(number of words in the text) and grammatical knowledge (use of verb
inflections).
Listening skills. The students’ listening skills improve dramatically in
these intensive programs, creating the problem of potential ceiling
effects on the readministration of tasks that had been appropriate in the
initial testing times. Consequently, we used a different measure at each
testing time. At Time 1, we readministered the AVR, which measured the
ability to match individual words to pictures. At Time 2, a 20-item aural
sentence comprehension task required learners to match a sentence to
one of three pictures (Muñoz, 2006). At Time 3, we readministered the
dictation from the pretest. At Time 4, a 32-item general listening
comprehension test required the interpretation of short utterances. This
test was developed in the 1980s by the Quebec Ministry of Education for
Grade 9 ESL students and has been used frequently in Grade 6 intensive
ESL research in Quebec (see Collins et al., 1999 for a sample item).
Oral interaction. To measure communicative effectiveness, two timed,
paired oral interaction measures were used with a stratified subset of 12
students from each class (n = 108), selected on the basis of pretest
performance and in consultation with the teachers. Students were
paired with a different partner from their class for each of the four
testing times (information gap at Times 1 and 3; role play at Times 2 and
4), and each partner received the same score (see White & Turner, 2005,
regarding co-constructed oral scores).⁵
The information gap task was presented as a game similar to those
used in intensive classes. Each student in the pair described five items
that were missing in his or her partner’s picture (a school at Time 1, a
house at Time 3), so that the student’s partner could draw the objects in
the correct place. A screen prevented students from seeing each other’s
picture and forced them to rely on words, rather than gestures, to locate
objects in the different rooms. They had five minutes to complete the
task. The task was audio-recorded, and the drawings were collected.
Following the scoring procedure developed by White and Turner
(2005), three points were allocated for each item: one point if the
student drew the correct object, one point if it was in the correct room,
and one point if the object was in the correct location. The total possible
score was 30 (10 items × 3 points). One point was subtracted if L1 French was used to describe the object, room, or location.
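The scoring rule just described can be sketched in a few lines. This is an illustration under stated assumptions, not the instrument used by White and Turner (2005): the names `ItemResult`, `score_item`, and `score_pair` are hypothetical, the French penalty is assumed to apply at most once per item, and an item's score is assumed never to fall below zero.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    """Outcome for one described item (hypothetical record layout)."""
    correct_object: bool
    correct_room: bool
    correct_location: bool
    used_french: bool = False  # L1 French used while describing this item


def score_item(item: ItemResult) -> int:
    # One point each for the correct object, room, and location.
    points = sum([item.correct_object, item.correct_room, item.correct_location])
    # Assumption: one point is subtracted per item in which French was used.
    if item.used_french:
        points -= 1
    # Assumption: an item's score is floored at zero.
    return max(points, 0)


def score_pair(items: list[ItemResult]) -> int:
    """Total for a pair: 10 items (5 described by each partner), maximum 30."""
    return sum(score_item(item) for item in items)


perfect = [ItemResult(True, True, True) for _ in range(10)]
print(score_pair(perfect))  # 30
```

Because both partners receive the same score, a single call to `score_pair` over all 10 items gives the value assigned to each student in the pair.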
The role-play task was administered at Times 2 and 4. It was modeled
on one developed for the Barcelona Age Factor Project (Muñoz, 2006).
There were two versions of the task, which was completed in pairs,
recorded, and transcribed. In each, one learner playing a child
negotiated with another learner playing a parent over having a party
(Time 2) or getting a pet (Time 4). Learners were required to assume
their roles without the intervention of the task supervisor, and they were
expected to stay on topic and respond to each other in English, to
discuss different aspects of the situation, and to elaborate and justify their
points of view. They were given 2 minutes to negotiate a solution.
Because there were no established guidelines for scoring a role-play task
carried out by two students, we developed a five-level global rating
scheme for this task in collaboration with our Barcelona colleagues. See
Table A1 in Appendix A for a description of the rating scale.

⁵ Because of time constraints, only one oral task was administered at each testing session.
The information gap task was more appropriate than the role-play task at Time 1, because
it did not require extensive discourse.

The information gap and role-play tasks are typical of the ones used in
intensive classes, where the primary focus is on the development of oral
interaction skills. Students often engage in pair or small group problem-solving activities and skits, working cooperatively with different
classmates of varying linguistic abilities, and making the most of
whatever language they have to accomplish the task. The tasks thus
have high ecological validity. They also have construct validity, in that
learners demonstrate the type of competence targeted by the intensive
program, namely oral communicative effectiveness in terms of successful
mutual comprehension rather than grammatical accuracy.

Classroom observations
Members of the research team spent a full day observing and
videotaping in the classrooms of two teachers in each program the same
week that longitudinal data were collected, that is, Times 1, 2, 3, and 4.
The observations were later coded following an adapted version of the
Communicative Orientation of Language Teaching (COLT) scheme
(Spada & Fröhlich, 1995).

Questionnaires
At Time 4, questionnaires were administered to students and
teachers. The student questionnaire contained 12 items asking students
their attitudes toward English and the intensive experience, what they
could and could not do in English, and their opportunities for exposure
to English outside the classroom. The 19 items of the teacher
questionnaire asked teachers for information about the students in
their class, the ESL materials and activities they used, their focus on skills
and language features, their teaching experience and training, and their
own language learning experiences.

ANALYSES AND FINDINGS
This section is divided into three parts. The first reports results from
the pretests. The second reports results from the longitudinal tests of
different aspects of learning (vocabulary knowledge, listening skills, oral
interaction, and narrative writing skills). In the final part, relevant
information about the instructional contexts yielded from the classroom
observations and teacher questionnaires is summarized.

Pretests
Independent t-tests were run on each of the three pretest measures,
the AVR, cloze, and dictation. To maximize the probability of detecting
differences between groups, the alpha level was set at 0.05. The
analyses reveal no significant differences between the program groups
(concentrated and distributed) on any of the three pretest measures: the
AVR, t(236) = 3.69, p = 0.712; the cloze, t(236) = 2.46, p = 0.806; or
the dictation, t(236) = 1.710, p = 0.089. Thus students in the two
groups started their intensive experience with similarly limited knowledge of English.

Longitudinal Measures
The findings for the measures of different aspects of L2 knowledge
and performance over time are reported by proficiency area. A total of
five analysis of variance (ANOVA) tests and four t-tests were carried out.
To avoid a type 1 error inherent in multiple comparisons, the alpha level
for all main effects and interactions was set at 0.05 and adjusted for the
nine statistical tests to 0.005. For any post-hoc pairwise comparisons,
alpha was set at 0.05. The Bonferroni adjustment for the number of
pairwise comparisons was applied to the significance levels of the
comparisons. Partial eta squared (ηp²) was used to estimate the effect
size.⁶
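The alpha adjustment described above can be illustrated with a minimal sketch. It assumes the adjustment divides the family-wise alpha by the number of tests, as in a Bonferroni correction; the function names are illustrative. The exact quotient for nine tests is about 0.0056, so the reported threshold of 0.005 is slightly more conservative.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test alpha obtained by dividing the family-wise alpha by n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value: float, alpha: float = 0.005) -> bool:
    """Compare a p value with the adjusted threshold (default: the reported 0.005)."""
    return p_value < alpha


# Nine tests in total: five ANOVAs and four t-tests.
print(round(bonferroni_alpha(0.05, 9), 4))  # 0.0056
print(is_significant(0.006))                # False
```

With the 0.005 threshold, a result such as p = 0.006 is treated as nonsignificant even though it would pass an unadjusted 0.05 criterion.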
⁶ A reviewer asked whether multivariate analysis of variance (MANOVA) would be a more appropriate
procedure. Using MANOVA would indeed protect from the possibility of making a type 1
error inherent in using separate ANOVA tests or t-tests for each dependent variable.
However, because the tests that make up the dependent variables are not all administered
at the same time or in the same number of repetitions (for reasons described earlier), it is
impossible to enter all of them into one MANOVA. To avoid making a type 1 error, the
alpha was adjusted for the number of analyses carried out.

Vocabulary Recognition (Yes/No)
A mixed between-within ANOVA revealed a significant effect for time,
F(3, 642) = 126.350, p < 0.005, and a significant interaction between
time and group, F(3, 642) = 5.311, p < 0.005. The effect size for time was
moderate (ηp² = 0.371) and very small for the time–group interaction
(ηp² = 0.024). As can be seen in Figure 1, the performance over time of
the two groups was very similar; the only significant between-group
difference occurred at Time 4 (p < 0.05) and was very small, both in
terms of actual score difference (5%) and effect size (ηp² = 0.029).
There was also little difference in rate of learning within groups across
time. With the exception of the Time 1 and Time 2 comparison in the
concentrated intensive group and Time 2 and Time 3 in the distributed
group, all time comparisons were significant for both groups (see
Tables 3 and 4 in Appendix B for a complete summary of the within-group comparisons). Overall, the recognition of common English words
steadily improved in both groups, such that, by the end of the intensive
experience, it can be estimated that students were familiar with roughly
75% of the 1,000 most frequent words of the English language.
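The reported effect sizes can be recovered from the F ratios and their degrees of freedom using the standard identity for partial eta squared, ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the vocabulary recognition results; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Effect of time, F(3, 642) = 126.350
print(round(partial_eta_squared(126.350, 3, 642), 3))  # 0.371
# Time-by-group interaction, F(3, 642) = 5.311
print(round(partial_eta_squared(5.311, 3, 642), 3))    # 0.024
```

Both values agree with the effect sizes reported for the time effect and the time–group interaction.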
Listening Skills
As explained earlier, listening skills were measured by a different task
at each of the four times. They were thus analyzed through four separate
t-tests. The findings for each of the tasks are illustrated in Figures 2–5. At
Time 1 there was no significant difference between the groups on the
AVR, t(235) = 1.473, p = 0.142. At Time 2, there was also no significant
difference between the groups on the aural sentence comprehension
task, t(232) = 2.800, p = 0.006. However, at Time 3, there was a small but
significant difference between the groups on the dictation task in favour
of the concentrated group, t(233) = 3.056, p < 0.005. At Time 4, on the
MEQ test of general listening comprehension, the significant difference
was again in favour of the concentrated group, t(232) = 7.599, p <
0.005. In summary, small but significant differences in listening skills in
favour of the concentrated intensity group appeared midway through
the program (at Time 3) and persisted through to the end of the
program (at Time 4).

FIGURE 1. Vocabulary recognition (*p < 0.05).

Oral Interaction
As explained in the Methodology section, a subset of 108 students was
followed for the paired tasks over the four testing times. Interrater
reliability for both oral interaction tasks was high: 95% agreement for
information gap and 90% for role-play.
Information Gap
The findings for the information gap task are displayed in Figure 6. A
mixed between-within ANOVA revealed a significant effect for time,
F(1, 52) = 76.551, p < 0.005, with a moderate effect size (ηp² = 0.595).
There was no significant difference for group, F(1, 52) = 7.537, p = 0.008,
and no interaction between time and group, F(1, 52) = 0.639, p = 0.428.
Thus both groups improved significantly over time and at similar rates.
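
The effect sizes reported for the oral interaction tasks can be related to the F ratios through a standard identity: partial eta squared equals (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it to the time effect on the information gap task. The function name is ours, and the sketch assumes the reported values were derived from the univariate F ratio, which the article does not state.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, so dividing by
    # (F * df_effect + df_error) gives SS_effect / (SS_effect + SS_error).
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# Time effect on the information gap task: F(1, 52) = 76.551
info_gap_time = partial_eta_squared(76.551, 1, 52)

# Time effect on the role-play task: F(1, 47) = 51.951
role_play_time = partial_eta_squared(51.951, 1, 47)

print(f"information gap: {info_gap_time:.3f}")
print(f"role-play: {role_play_time:.3f}")
```

Under this assumption the computed values agree, to three decimal places, with the effect sizes reported for the two oral interaction tasks.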
Role Play
The role-play findings are displayed in Figure 7. A mixed between-within
ANOVA revealed a significant effect for time only, F(1, 47)⁷ = 51.951,
p < 0.005, with a moderate effect size (ηp² = 0.525). There was
no significant difference for group, F(1, 47) = 0.118, p = 0.773, and no
interaction between time and group, F(1, 47) = 3.252, p = 0.078. Thus
both groups showed similar performance on the role-play at Times 2 and
4, and both experienced a significant and comparable improvement on
this task over time.

⁷ The number of pairs for the role-play is slightly lower than for the information gap,
because an equipment malfunction resulted in the loss of some of the role-play data.

FIGURE 2. Listening comprehension: aural vocabulary recognition task.

FIGURE 3. Listening comprehension: aural sentence comprehension task.

Narrative Skills (Writing)
Each student produced four written narratives, one at each of the
testing times. Mixed between-within ANOVAs were performed on both
the length of the narratives in words (a fluency measure) and the use of
verbal morphology on a four-point scale (a grammatical knowledge
measure). The findings are displayed in Figures 8 and 9.

Length of narratives
There was a main effect for time, F(3, 630) = 78.755, p < 0.005, and
group, F(1, 210) = 10.716, p < 0.005. There was also an interaction
between time and group, F(3, 630) = 24.390, p < 0.005, with a moderate
effect size for the time difference (ηp² = 0.467) and a small effect size
for the interaction (ηp² = 0.187). The between-group pairwise
comparisons revealed a significant difference at Time 4 only (p < 0.05),
in favour of the concentrated group. The within-group pairwise
comparisons (see Table B3 in Appendix B) for the concentrated group
were all significant, indicating that the students’ narratives got
progressively longer over the course of the intensive experience. For
the distributed group (Table B4 in Appendix B), overall there was also a
significant improvement in length of narrative from the beginning to
the end of the intensive program (Time 1 and Time 4), but the profile
between the four sampling times shows flat performance between Times
1 and 2 and a drop-off in performance between Times 3 and 4 (see
Figure 8).

FIGURE 4. Listening comprehension: dictation task (*p < 0.005).

FIGURE 5. Listening comprehension: Ministry of Education of Quebec test (*p < 0.005).

FIGURE 6. Oral interaction: information gap.

FIGURE 7. Oral interaction: role-play.

Knowledge of verb inflections
The scoring of the verb inflections was based on the four-point scale
developed by Collins et al. (1999) for the same task. Level 1 represented
no use of inflections; Level 2, emergent use; Level 3, developing use; and
Level 4, productive use. Interrater reliability between two raters was 88%,
reflecting the difficulty of determining productive use in some of the
shorter texts produced at Times 1 and 2.
There was a main effect for time [F(3, 630) = 131.669, p < 0.005] and
for the interaction between time and group [F(3, 630) = 19.899, p <
0.001], with a moderate effect size for the time difference (ηp² = 0.385)
and a small effect size for the interaction (ηp² = 0.087). The
between-group pairwise comparisons revealed a significant difference at
Time 1 (p < 0.05), in favour of the distributed group, and at Times 3
(p < 0.05) and 4 (p < 0.05) in favour of the concentrated group. The
within-group pairwise comparisons (see Tables B5 and B6 in Appendix
B) for both groups showed significant differences in use of inflections
between Time 1 and Times 2, 3, and 4, indicating that the students’
knowledge of verb inflections in both groups progressively improved
over time. However, as Figure 9 shows, the concentrated group displayed
steady improvement between times, whereas the distributed group’s
progress appeared to level off between Times 3 and 4.

FIGURE 8. Writing: narrative length (*p < 0.05).

FIGURE 9. Writing: use of verb inflections in narratives (*p < 0.05).

To summarize the findings on the narrative task, although both groups
demonstrated improvement in ability to write sustained narratives in
English (as measured by length of compositions) and to use verbal
morphology to situate the events in time (as measured by the use of verb
inflections), the within-group analyses point to plateaus in improvement
in the distributed group, particularly between Times 3 and 4.
A synthesis of the findings across tasks for the two groups is displayed
in Table 2. The significant differences for the various measures at the
four sampling times are highlighted. Before turning to the interpretation of the student performance in the two models of intensity, we first
report on the main findings from the classroom observations and
teacher questionnaires.

TABLE 2
Summary of Significant Differences

Measure                                 Time 1   Time 2   Time 3   Time 4
Vocabulary recognition                    -        -        -        *C
Listening comprehension                   -        -        *C       *C
Oral production                           -        -        -        -
Written production: Length                -        -        -        *C
Written production: Verb inflections      *D       -        *C       *C

Note. *C = significant difference in favour of concentrated intensity; *D = significant difference
in favour of distributed intensity.

Classroom Observations and Teacher Questionnaires
The main purpose of the classroom observations and the teacher
questionnaires was to confirm that the teaching approaches in the two
models were similar. Indeed, all teachers in both models followed the
communicatively oriented approach mandated by the Quebec Ministry
of Education, emphasizing speaking and listening skills and the
development of students’ vocabulary through a series of common
themes. With rare exceptions, English was used at all times by the
teacher and the students for teacher–student and student–student
interactions. There were, however, three differences in the pedagogical
contexts of the two models that emerged from these qualitative
measures that merit comment.
In the concentrated model, all five teachers had worked at the same
school for many years and sometimes consulted with each other
regarding day-to-day teaching matters. Although the two teachers in
the distributed model also had some contact with each other, the
opportunities for collaboration were not as frequent. The closer
collaboration in the concentrated model yielded two pedagogical
practices relevant to our study. The first is that all five concentrated
model teachers assigned half an hour of TV watching in English as a
daily homework activity. This amount was increased to an hour in the
latter half of the program. This TV-watching homework was the initiative
of the teachers who participated in the study and not a “program”
difference per se. Students who were not able to complete the
homework on a given day were required to make up the time on a
subsequent day, and discussions of the content of programs watched
were a regular feature of these classes. In the distributed model,
although some students did indeed watch TV in English outside of class
time, this was not assigned as homework and was thus not a regular
feature of the program for all students. Consequently, students in the
concentrated model continually received additional daily exposure to aural
English.
Another pedagogical difference surfaced at Time 3, when verb
conjugation charts appeared on the walls of the concentrated students’
classrooms. Research assistants observed students consulting the charts
during the writing of their narratives, and the teachers confirmed that
they had begun encouraging students to make use of this reference
material during both oral and written production. These visual
reminders of verb tenses were not displayed in any of the distributed
model classrooms. That is not to say that this aspect of language did not
receive some pedagogical attention (i.e., it was present in some of the
language materials common to both groups), but it appeared to receive
greater sustained emphasis in the concentrated classes.
A third factor differentiating the two intensive contexts was the
scheduling of the academic testing in L1 French mandated by the
Ministry of Education for all Grade 6 students in the province of
Quebec. The tests focus on all academic subjects and are spread out over
several days. In the concentrated version of intensity, the testing took
place in January, at the end of the students’ intensive French L1
academic program, and prior to the intensive ESL experience. In the
distributed model of intensity, in which students were alternating L1
academic subjects with ESL across the 10 months of the school year,
students took the Ministry tests in June, along with the rest of the Grade
6 students in the province. This was also the same period of time in
which we were administering the Time 4 tasks. Not only did the
provincial testing affect the scheduling of the ESL sessions (some
regularly scheduled English periods were replaced by testing or review
sessions in French), it also affected students’ concentration on ESL,
including their normally enthusiastic participation in the language
measures used in this study. This was particularly noteworthy during the
writing of the narrative. It was always the final task in the battery, and
because students took different amounts of time to complete it (within
the overall time limit of 15 minutes), they were instructed to take out a
pleasure book to read in English while waiting for their other classmates
to finish the task. At Time 4, many distributed model students raced
through the task and then took out a French textbook or notebook to
review for one of the tests. Further evidence of the students’ shift in
focus was observed in the narrative itself, which was written in response
to a picture of a school-yard altercation among a group of students. One
participant who had written a well-structured 89-word narrative at Time 3
cut her narrative short at Time 4 to 67 words and added: “Today we will do
important exams and I hope I will have a good result, because it’s the Minister’s
exams.” Other students incorporated the exam theme into their
narrative, even though there was no obvious reason from the picture
to do so: “Today Lana are not happy. She have many exam.” “The boy’s said at
the girl’s: ‘you are so stupid because you have 5% in your exam.’”⁸

⁸ The authors of these two extracts had both written longer and better structured narratives
at Time 3 (121 versus 108 and 158 versus 107 words, respectively).


In the discussion section we refer to these factors in interpreting the
overall findings of student performance in the two models.
Summary of Findings
Varying the degree of intensity appears to have influenced the
language learning outcomes of these students in three main ways.
1. Table 2 shows that, when there were significant differences between the
two groups, in all but one case they were in favour of the more
concentrated version of intensity. The only exception to this trend was a
small difference in the use of verb inflections in favour of the distributed
model at Time 1; but neither group, in fact, used inflections productively
at this time.
2. Of all observed differences, the most consistent across time was with
listening skills and use of verb inflections in the latter half of the intensive
programs: At Times 3 and 4 the concentrated group out-performed the
distributed group on the final two listening comprehension measures
and on the verb morphology scale for the final two picture-prompted
narratives.
3. The greatest number of differences between the two groups was observed
at Time 4, during the last few weeks of the learners’ intensive experience.
The concentrated group had superior performance on three of the five
tests. Only the oral production tasks (information gap at Times 1 and 3;
role-play at Times 2 and 4) yielded similar results for the two groups.

In summary, there were significant differences between the two
groups on 7 of the 20 between-group comparisons displayed in Table 2.
Of the significant differences, six showed advantages for the concentrated group, compared to just one for the distributed group.
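
The tally in this summary can be checked by encoding the significant cells of Table 2 and counting them. The Python sketch below is illustrative only: the dictionary layout and variable names are ours, and the cell placement is read off the results reported for each measure ("C" favours the concentrated group, "D" the distributed group).

```python
from collections import Counter

TIMES = ("Time 1", "Time 2", "Time 3", "Time 4")

# Significant between-group differences per measure, as reported in the results.
# Cells that are absent showed no significant difference.
SIGNIFICANT = {
    "Vocabulary recognition": {"Time 4": "C"},
    "Listening comprehension": {"Time 3": "C", "Time 4": "C"},
    "Oral production": {},
    "Written production: length": {"Time 4": "C"},
    "Written production: verb inflections": {"Time 1": "D", "Time 3": "C", "Time 4": "C"},
}

# Five measures at four testing times give the full set of comparisons.
total_comparisons = len(SIGNIFICANT) * len(TIMES)

# Count how many significant cells favour each group.
favours = Counter(group for row in SIGNIFICANT.values() for group in row.values())
significant_total = sum(favours.values())

print(f"{significant_total} of {total_comparisons} comparisons were significant")
print(f"concentrated: {favours['C']}, distributed: {favours['D']}")
```

Keeping non-significant cells out of the dictionary makes the count of significant comparisons a simple sum over the entries.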

DISCUSSION
The research question posed at the outset of the study asked whether
varying the degree of intensity would affect the development of different
aspects of L2 proficiency over time for these young francophone ESL
learners. The findings of significant differences on 6 of the 20 measures
in favour of the concentrated group would seem to suggest that the
beginner level students in this study derived somewhat greater benefits
from a language learning program in which the hours available for
instruction were concentrated into a single, sustained intensive exposure
to the L2, as opposed to a series of intensive experiences distributed
across time. However, there are a number of reasons to consider a much
more nuanced interpretation of the results.
The first is the size of the observed between-group differences: Not
only did the gap between actual scores tend to be quite small, but the
magnitude of the effect sizes for group differences was also small. Score
differences and effect sizes for differences across time, however, were
more substantial, underlining the finding that students in both versions
of intensity made considerable progress.
In addition, although many key aspects of the contexts and
participants in the two intensive models were similar, information
obtained through classroom observations and teacher questionnaires
pointed to a few differences in the pedagogical contexts, unrelated to
the distribution of instructional time. These may explain some of the
between-group differences, notably those related to listening, use of verb
inflections, and performance overall at the end of the intensive
experience (Time 4). The TV homework feature of the concentrated
students’ intensive experience resulted in extra practice with listening
comprehension (3.5 hours a week in the first half of the program;
7 hours a week in the latter half), which may have contributed to the
superior performance of these students on the listening tasks at Times 3
and 4. The significant difference in the use of verb inflections in the
written narratives by concentrated model students at Times 3 and 4
coincided with the increased attention given to this aspect of language at
this point in their intensive experience, as evidenced by the verb chart
reminders on the classroom walls. Finally, the distributed model
students’ preoccupation with the end-of-year exams in L1 French may
help explain the superior performance of the concentrated group on
three of the five measures in English at Time 4, as well as the decline in
length of narrative from Time 3 to Time 4 in the distributed group. The
concentrated group had completed the provincial exams in French
before beginning their intensive ESL experience.
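
The weekly listening figures cited for the TV homework follow from the daily assignments described in the classroom observations (half an hour a day, later an hour a day). A minimal Python sketch of the arithmetic is given below; the function and constant names are ours, and the seven-day week is an assumption implied by the 3.5-hour figure rather than something stated outright.

```python
DAYS_PER_WEEK = 7  # assumption: the homework was assigned every day of the week


def weekly_hours(daily_minutes: int, days: int = DAYS_PER_WEEK) -> float:
    """Convert a daily viewing assignment in minutes into hours per week."""
    return daily_minutes * days / 60


first_half = weekly_hours(30)   # half an hour a day in the first half of the program
second_half = weekly_hours(60)  # an hour a day in the latter half

print(first_half, second_half)
```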
Given these observations about instructional practices, it is not
possible to argue with confidence that students in the concentrated
model were superior in listening, use of verb inflections, or overall
performance as a direct result of the distribution of instructional time.
Indeed, the more nuanced interpretation of the findings is that both
models of intensive instruction resulted in substantial ESL learning over
the course of the 400 hours by all students. Students in both programs
who had started off with very limited knowledge of English emerged at
the end of their intensive experience as confident, intermediate-level
users of the L2. Overall, both groups made considerable progress in
listening and oral skills and in recognizing common English words. They
also improved in their fluency at recounting a narrative (as measured by
length of the texts) and in their ability to use grammatical markers to
situate events in time in a narrative (as measured by use of verb
inflections in the narratives). Unlike differences between groups, which
tended to be small with small effect sizes, observed differences over time
for both groups were substantial and yielded moderate-to-large effect
sizes. Moreover, factors other than the distribution of intensity may have
played a role in the small number of significant between-group
differences that were found.
The finding of little practical difference between the two models of
intensive learning is consistent with research comparing the effects of
massed and distributed conditions on the learning of complex skills.
Donovan and Radosevich’s (1999) review of studies on the two
conditions concluded that the strong spacing effect observed in previous
task performance studies may be limited to relatively simple tasks. None
of the tests used to assess learning in this study measured a discrete set of
learning targets, and all required control over a range of types of
knowledge (lexical, grammatical, phonological) in contexts with varying
degrees of unpredictability. Although the yes/no vocabulary test
sampled individual words from among the 1,000 most frequent words
of English, none of the words had been the target of the type of
deliberate practice that is associated with the studies in which the strong
advantage for spaced practice has been observed. Thus one issue of
interest for future research is whether experimental manipulation of
predetermined learning targets, more in line with typical spacing effect
research, will show differential effects under different intensive learning
conditions. Potential language items for such research could be a set of
vocabulary items, a subsystem of the language (such as tense and
aspect), or aural and oral discrimination of specific speech sounds.
Another important direction for future research will be to explore
whether proficiency also influences the amount of learning that takes
place in massed and distributed learning contexts. Mumford, Costanza,
Baughman, Threlfall, and Fleishman (1994), for example, hypothesized
that participants who possess a certain knowledge threshold may extract
more benefits from massed (concentrated) practice than novices, who
must develop the knowledge structures needed to perform complex
tasks. In this study, the students were all novices at the outset; whether
more advanced learners would have extracted greater benefit from a
single intensive experience than several shorter ones remains an
empirical question.
In light of the modest differences between the two distributions of
instructional time and the substantial progress made in both program
versions, the findings do not suggest a clear learning advantage for
either concentrating or distributing the intensive experience, at least
among this population of learners. This finding is important in the
research context in which the study took place. Although we chose
schools where access to the special intensive program was not restricted
to students with special abilities, this is not typical of the 5-month model.
The perceived challenge of students completing the regular Grade 6
curriculum in L1 French in just 5 months often results in participation
being limited to students with strong academic profiles from the
previous school year. Spreading the French L1 portions of the
curriculum over the 10 months is seen as being less academically
challenging, which may result in the distributed version being offered to
a wider range of students.
The finding of strong results for mini-intensive L2 courses may be
important for other populations of learners and may meet local needs
better than the more common longer intensive courses. Immigrant and
refugee children in reception classes around the world, who are typically
separated for months from their same-age native-speaking peers, could
instead have a series of mini-intensives. This would give them periodic
language support while allowing them to be partially integrated into
regular classes, with access to rich input from teachers and classmates
and opportunities to make new friends. Not only would the children be
part of the regular school program, but their L2 teachers would also be
better integrated than when they are doing long stretches of intensive
teaching on their own.
Mini-intensives may also be more practical to organize than long
intensives and may make language instruction accessible to a wider
range of students. Adults, for example, especially those who are
employed and need to develop L2 skills as part of a job requirement,
may find it much easier to participate in a series of intensive experiences
spread out over time than to negotiate several weeks or months of
release time to take language courses. Finally, mini-intensives may also
increase motivation, because the plateaus that typically occur during L2
learning may be less evident in successive learning experiences than a
single intensive one.

CONCLUSION
As noted at the outset of this report, there is clear evidence that
limited exposure to an L2, even if continued over several school years,
does not afford students the opportunity to advance very far in their
learning. The research issue of this article was whether there were clear
advantages to different distributions of substantial amounts of L2
instructional time.⁹ Our findings suggest that, over the time period
represented by this study (400 hours distributed over 5 or 10 months),
students’ performance on the various measures of proficiency that we
used was quite comparable. Our findings also point to the importance in
classroom-based research of including qualitative measures (observing
classrooms, obtaining information from teachers) to document the
instructional contexts, even when those contexts are familiar and when
other measures of control over variables are in place. This information
proved to be crucial in interpreting our findings.

⁹ See Muñoz (2008) for discussion of substantial exposure in foreign language classrooms.

The longitudinal design of the study allowed us to track progress over
the several months of the intensive programs. However, it did not
continue beyond this time, which does not allow us to know whether the
knowledge acquired in a single sustained intensive experience is as
robust as that which is acquired in successive intensive experiences,
including whether the effects are the same for all aspects of language
proficiency. There is evidence from comparisons between intensive and
semester-length psychology courses for university students that the initial
superiority of a one-time intensive experience may diminish if the
knowledge does not continue to be used (Seamon, 2004). This could
suggest that successive intensive language experiences may prove
superior over time, but this is clearly an area for further research.
Answers to this and the questions raised earlier will be relevant not
only to language program planners but also to anyone involved in
teaching, administering, designing, and taking L2 courses.
ACKNOWLEDGMENTS
This research was supported through grants from the TESOL International Research
Fund and the Quebec Ministry of Education (Fonds Québécois de la Recherche sur
la Société et la Culture). A preliminary version of the findings was presented at the
joint AILA/AAAL conference in Madison, Wisconsin, in July 2005. We would like to
acknowledge the contributions of our project manager Suzy Springer, our statistics
advisor Randall Halter, our research collaborators Carmen Muñoz and Carolyn
Turner, and our large team of student research assistants. We are also grateful to
Patsy M. Lightbown and the anonymous reviewers of TESOL Quarterly for their
insightful comments on earlier versions of the manuscript.

THE AUTHORS

Laura Collins and Joanna White are both associate professors of TESL and applied
linguistics in the Department of Education at Concordia University in Montreal,
Quebec, Canada. In their teaching and research, they focus on ways of maximizing
the benefits of instruction for second language learners across a variety of classroom
contexts and language features.

REFERENCES
Benesler, D., & Schulz, R. A. (1979). Intensive foreign language courses. Arlington, VA:
Center for Applied Linguistics.
Collins, L., Halter, R. H., Lightbown, P. M., & Spada, N. (1999). Time and the
distribution of time in second language instruction. TESOL Quarterly, 33, 655–
680. doi:10.2307/3587881.
Dellarosa, D., & Bourne, L. (1985). Surface form and the spacing effect. Memory and
Cognition, 13, 529–537.
Dempster, F. (1996). Distributing and managing the conditions of encoding and
practice. In E. Bjork & R. Bjork (Eds.), Memory (pp. 317–344). San Diego, CA:
Academic Press.
Donato, R., Tucker, R., Wudthayagorn, J., & Igarashi, K. (2000). Converging
evidence: Attitudes, achievements, and instruction in the later years of FLES.
Foreign Language Annals, 33, 377–392. doi:10.1111/j.1944-9720.2000.tb00620.x.
Donovan, J., & Radosevich, D. (1999). A meta-analytic review of the distribution of
practice effect: Now you see it, now you don’t. Journal of Applied Psychology, 84,
795–805. doi:10.1037/0021-9010.84.5.795.
Dussault, B. (1997). Les effets à long terme de l’enseignement intensif de l’anglais, langue
seconde [Long-term effects of intensive ESL instruction] (Unpublished master’s
thesis), Université du Québec à Montréal, Montreal, Canada.
Freed, B., Segalowitz, N., & Dewey, D. (2004). Context of learning and second
language fluency in French: Comparing regular classroom, study abroad, and
intensive domestic immersion programs. Studies in Second Language Acquisition,
26, 275–301.
Germain, C., Netten, J., & Movassat, P. (2004). L’évaluation de la production orale
en français intensif: Critères et résultats. The Canadian Modern Language Review,
60, 295–308.
Glover, J., & Corkill, A. (1987). Influence of paraphrased repetitions on the spacing
effect. Journal of Educational Psychology, 79, 198–199. doi:10.1037/0022-0663.79.2.198.
Grote, M. (1995). Distributed versus massed practice in high school physics. School
Science and Mathematics, 95, 97–101. doi:10.1111/j.1949-8594.1995.tb15736.x.
Hills, E.C. (1919). Has the war proved that our methods of teaching modern
languages in the colleges are wrong? The Modern Language Journal, 4, 1–13.
doi:10.2307/313791.
Lapkin, S., Hart, D., & Harley, B. (1998). Case study of compact core French models:
Attitudes and achievement. In S. Lapkin (Ed.), French second language education in
Canada: Empirical studies (pp. 3–30). Toronto, Canada: University of Toronto
Press.
Lightbown, P. M., Halter, R., White, J., & Horst, M. (2002). Comprehension-based
learning: The limits of “Do it yourself.” The Canadian Modern Language Review, 58,
427–464.
Lightbown, P. M., & Spada, N. (1991). Étude à long terme de l’apprentissage intensif
de l’anglais, langue seconde, au primaire [Long-term study of intensive ESL
teaching in primary school]. The Canadian Modern Language Review, 53, 315–355.
Lightbown, P. M., & Spada, N. (1994). An innovative program for primary ESL in
Quebec. TESOL Quarterly, 28, 563–579. doi:10.2307/3587308.
Meara, P. (1992). EFL vocabulary tests. Swansea, Wales: University College, Centre for
Applied Language Studies.
Mumford, D., Costanza, D., Baughman, W., Threlfall, K., & Fleishman, E. (1994).
Influence of abilities on performance during practice: Effects of massed and
distributed practice. Journal of Educational Psychology, 86, 134–144.
doi:10.1037/0022-0663.86.1.134.
Muñoz, C. (2006). Age and the rate of foreign language learning. Clevedon, England:
Multilingual Matters.
Muñoz, C. (2008). Symmetries and asymmetries of age effects in naturalistic and
instructed L2 learning. Applied Linguistics, 29, 578–596. doi:10.1093/applin/amm056.
Netten, J., & Germain, C. (2004). Introduction: Intensive French. The Canadian
Modern Language Review, 60, 251–262.