A Corpus-Based Study of the Linguistic
Features and Processes Which Influence
the Way Collocations Are Formed: Some
Implications for the Learning of
Collocations
CRAYTON PHILLIP WALKER
University of Birmingham
Birmingham, England

In this article I examine the collocational behaviour of groups of
semantically related verbs (e.g., head, run, manage) and nouns (e.g.,
issue, factor, aspect) from the domain of business English. The results of
this corpus-based study show that much of the collocational behaviour
exhibited by these lexical items can be explained by examining some of
the linguistic features and processes which influence the way collocations are formed. These include the semantics of the individual items
themselves, the use of metaphor, semantic prosody, and the tendency
for many of the selected items to be part of larger phraseological units.
I show that it is possible to explain many of these collocations by
considering the linguistic features and processes which have influenced
the way they have been formed. My contention is that, if the learner is
encouraged to look for an explanation, it makes the process of learning
collocations more memorable.
doi: 10.5054/tq.2011.247710

The subject of collocation has received considerable attention in the
field of language teaching over recent years. A number of authors
(Lewis, 1993, 1997, 2000; McCarthy, 1990; Nation, 2001; Thornbury,
2002; Woolard, 2000) have represented collocations as being either
partially or fully arbitrary, and several studies (Benson, 1989; Nesselhauf,
2003, 2005; Smadja & McKeown, 1991) have even used arbitrariness as
part of their definition of what constitutes a collocation. Lewis claimed
that “collocation is an arbitrary linguistic phenomenon” (Lewis, 1997,
p. 32), and, as a consequence, teachers are urged not to attempt to
explain collocations to their learners.
If collocations are simply arbitrary combinations of words, it means
that the foreign language learner has little option but to memorise large
numbers of collocations with very little in the way of explanation or any

TESOL QUARTERLY Vol. 45, No. 2, June 2011

other help in memorising them. The learner is liable to become very
dependent on a dictionary, especially a collocational dictionary,
checking whether a particular combination is acceptable or not before
using it in his or her writing. If, on the other hand, there is some sort of
explanation as to why a particular word is frequently found in the
company of one or more others, it means that the foreign language
learner is able to understand how and why a particular combination is
frequently used by native speakers. Instead of trying to remember large
numbers of collocations, the learner would be able to produce some of
these combinations by using his or her understanding of the linguistic
features and processes which influenced the way they were formed.
More recently there have been a few publications (Crowther, Dignen,
& Lea, 2002; McCarthy & O’Dell, 2005) which have taken the position
that not all collocations are arbitrary and have started to present
collocations in such a way that students can begin to understand why one
particular word is frequently found in the company of another.

Unfortunately, there is very little research so far to support this position.
Although Kennedy (2003) did not go into the question directly, his
corpus-based research concerning the collocational behaviour of
adverbs of degree or amplifiers (e.g., absolutely, completely, utterly, rather,
about, somewhat) seems to show that the collocations they form are not
that arbitrary. Liu (2010) is one of the few studies which critically
examined the accepted definition of collocation and found that many
collocations can be explained using a combination of techniques drawn
from the disciplines of corpus linguistics and cognitive linguistics.
The aim of the current study is to show that collocation is not simply
an arbitrary phenomenon but is a process which can be partially
explained by examining some of the linguistic features and processes
which influence the way collocations are formed. In order to do this, the
study uses a corpus-based methodology to investigate the collocational
behaviour of groups of semantically related nouns and verbs taken from
the domain of business English. The study found that the process of
collocation is influenced by, for example, the precise meaning or
meanings of a particular lexical item, the use of metaphor, and any
phraseological behaviour or semantic prosody associated with the item.

COLLOCATION
In this article the term a collocation (countable noun) is used to refer
to a combination of two or more words which occur together or in close
proximity to each other in both written and spoken discourse, whereas
the term collocation (uncountable noun) is used in a more general sense
to refer to “the habitual co-occurrence of individual lexical items”
(Crystal, 2003, p. 82).
It is clear from the literature that a collocation is defined in a variety
of ways, and that these different definitions reflect differences in
approach, the only common denominator being that the term is used to
refer to some kind of syntagmatic relationship between words. However,
it is possible to group the different definitions into two broad categories,
those which use what I call a lexical approach to collocation (Carter,
1987; Cowie, 1998; Howarth, 1996, 1998) and those which use a
frequency or statistically based approach (Moon, 1998; Nesselhauf, 2003,
2005; Sinclair, 1991). Studies which follow a lexical approach use lexical
criteria to decide whether a particular combination can be classified as a
collocation or not. According to this approach a collocation will typically
exhibit a degree of fixedness and/or a lack of transparency in meaning.
There is a tendency with this type of approach to create categories (e.g.,
unrestricted, semirestricted, familiar, and restricted collocations; Carter,
1987, p. 63) based on the lexical characteristics exhibited by different
combinations.
Studies which use a frequency or statistically based approach generally
consider a collocation to be a co-occurrence of words within a certain
distance of each other. Collocations are seen as being co-occurrences
that are “more frequent than could be expected if words combined
randomly in a language” (Nesselhauf, 2005, pp. 11–12). Frequency-based
approaches are often associated with the work of Sinclair, whose
own approach to collocation was, in turn, influenced by the work of Firth
(1957, 1968). Collocations are viewed more in terms of probability,
where the strength of a particular collocation is assessed on the basis of
how frequently it appears in a large representative sample of discourse.
According to Halliday, “the native speaker’s knowledge of his language
will not take the form of his accepting or rejecting a given collocation:
he will react to something as more acceptable or less acceptable on a
scale of acceptability” (1966, p. 159). In other words, the question is not
whether something is a collocation or not but rather whether a
particular collocation is more or less acceptable.
This means that there are virtually no impossible collocations, but
that some collocations are much more likely to occur than others.
However, as Halliday has pointed out, there is a need for at least one
cutoff point in order to eliminate combinations which are simply the
result of a random distribution of items within the discourse. Sinclair,
writing in the Office of Scientific and Technical Information (OSTI)
report (Krishnamurthy, 2004) first circulated in 1970,¹ used the term
significant collocations to refer to combinations which co-occur more
frequently than “their respective frequencies and the length of the text
in which they appear would predict” (Sinclair, Jones, & Daley, 1970,
p. 10). Sinclair (1966) also used three very useful terms for any
discussion of collocation.

We may use the term node to refer to an item whose collocations we are
studying, and we may define a span as the number of lexical items on each
side of a node that we consider relevant to that node. Items in the
environment set by the span we will call collocates. (Sinclair, 1966, p. 415)

¹ The original OSTI report (1970) only had a limited distribution but has recently been
republished. This new edition, entitled English Collocation Studies, is edited by Ramesh
Krishnamurthy (2004).

Writing in the OSTI report, Sinclair went on to explain that there is
essentially no difference in status between the node and a collocate.
if word A is a node and word B one of its collocates, when B is studied as a
node, word A will be one of its collocates. In practice, however, it is
convenient to examine the behaviour of one item at a time and the use of the
two terms enables a useful distinction to be made when describing results.
(Sinclair et al., 1970, p. 10)

Sinclair and Jones (1974) proposed a span of four words on either side
of the node word. The following nomenclature is normally used to
describe the positions in the span: node –1 to node –4 describe the four
positions to the left of the node, and node +1 to node +4 describe the four
positions to the right, as can be seen in the example below:

the      long     and      painful  process  of       rebuilding  this     country
node –4  node –3  node –2  node –1  node     node +1  node +2     node +3  node +4

Although there is some statistical basis for using a span of four words
(Mason, 1997, 1999), the distance between a collocate and a node will
depend on both lexical and grammatical elements. For example, the
distance between the node and the collocate(s) will normally be greater
in the case of verb/noun collocations compared with adjective/noun or
noun/noun combinations, and consequently it may be necessary to use
a wider span when verb/noun collocations are being examined.
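
The node, span, and collocate terminology can be illustrated with a short Python sketch. This is only a minimal illustration, not the software used in the study: the function name `collocates_by_position` is hypothetical, and whitespace tokenisation is a simplifying assumption. The example sentence is the one shown in the span diagram above.

```python
from collections import Counter, defaultdict


def collocates_by_position(tokens, node, span=4):
    """Count the collocates of `node` at each position from -span to +span."""
    positions = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            # Skip the node itself and positions outside the text
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            positions[offset][tokens[j]] += 1
    return positions


# Simplifying assumption: tokens are obtained by splitting on whitespace
tokens = "the long and painful process of rebuilding this country".split()
profile = collocates_by_position(tokens, "process")

for offset in sorted(profile):
    print(f"node {offset:+d}: {dict(profile[offset])}")
```

Passing a larger `span` value would correspond to the wider span suggested above for verb/noun collocations.
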
Arguably, a frequency or statistical approach is more suited to a
corpus-based methodology, because it enables large quantities of spoken
or written discourse stored on a computer to be analysed by software
programmes (concordancing packages) which can extract the most
frequent, or the most statistically significant collocates associated with a
particular node. These programmes can be used to rank collocates
according to frequency or statistical significance for each of the different
positions within the span. It is also possible to specify a cutoff point, as
proposed by Halliday (1966), in order to eliminate combinations which
may simply be the result of random distribution. The approach to
collocation used in the current study has been influenced by both the
lexical and frequency or statistically based approaches.

SEMANTIC PROSODY
I would like to briefly discuss semantic prosody here in the
introductory section, because the concept is referred to a number of
times later in the article. The term semantic prosody² was first used by
Louw in an article published in 1993, where he credits Sinclair with
having provided him with both the idea and the term in a personal
communication. Sinclair (1991) examined the collocational behaviour
of the phrasal verb set in and found that most of the subjects associated
with it referred to “unpleasant states of affairs” (Sinclair, 1991, p. 74).
Louw suggested that semantic prosody is the result of a diachronic
process, whereby meaning has been transferred from one word or words
to another, and defined semantic prosody as being a “consistent aura of
meaning with which a form is imbued by its collocates” (1993, p. 157).
The term semantic prosody is also used by some writers (Nelson, 2006;
Sinclair, 1996, 2004a, 2004b; Stubbs, 2001, 2009) in a wider sense to
describe the way in which a lexical item can develop one of a range of
different prosodies such as “‘something nasty’ or ‘something worrying’
or ‘disturbing’ [. . .] ‘something magnificent’, ‘socially appropriate’
‘positively constructive’ etc.” (Sinclair, 2004b, p. 173). However, it can be
argued that when the term is used in this wider sense, it is simply
reflecting the rather complex and multifaceted nature of the meaning of
a lexical item. I have chosen to limit the use of the term in this article to
Louw’s original notion of a lexical item having either a positive or
negative prosody, depending on whether it is frequently associated with
collocates which refer to desirable or undesirable items or events.

METHODOLOGY
The main corpus used in the current study was the Bank of English
(BoE),³ which is a large corpus of general English consisting of 450
million words. A second more specialised corpus of business English was
also used in order to check that the results obtained from the corpus of
general English are valid in the domain of business English. The second
corpus, which was made up of commercial and financial data files from
the British National Corpus,⁴ contains 6.3 million words. In the current
study this second more specialised corpus is referred to as the British
National Commercial Corpus (BNCc).

² For a comprehensive account of semantic prosody, please refer to Stewart (2009).
³ The Bank of English (BoE) corpus is jointly owned by HarperCollins Publishers and the
University of Birmingham. During 2003 to 2006, when most of the research for this study
was carried out, the corpus contained 450 million words.

The lexical items selected for study (i.e., the nodes), referred to as the
selected items, were chosen for two reasons. First, because they are all
high-frequency items in the BNCc, and, second, because each item
within a particular group is a partial or close synonym of the other (e.g.,
process was chosen because it is a close synonym of procedure and system).
Experience gained from the pilot studies showed that it was more
fruitful to establish the collocational behaviour of a particular selected
item by comparing its collocational behaviour with that of a synonym or
near synonym. It was therefore decided to study the collocational
behaviour of groups of synonyms or near synonyms rather than of
individual items. Table 1 shows the four groups of items selected for
study (the table does not include plural forms, which were also studied).
Synonymy, near-synonymy, and frequency were not the only criteria
used when selecting the items. Some items were chosen because they are
particularly important within the context of teaching business English
(e.g., RUN,⁵ HEAD, MANAGE), whereas others were selected because, in
my experience,⁶ learners frequently have difficulties using the item or
items appropriately (e.g., issue, aspect, factor). These difficulties are
frequently caused by cross-linguistic factors, such as a level of semantic
incongruency between items in the learner’s first language and the
target language or the fact that the item already exists as a loan word in
the first language.

⁴ The British National Corpus (BNC) is a 100 million word corpus developed in the 1980s. It
is maintained and distributed by the Oxford University Computer Service (OUCS).
http://www.natcorp.ox.ac.uk
⁵ Capital letters are used to indicate that reference is being made to all members of a lemma.
RUN, for example, refers to run, ran, runs, running. In this study the lemma is only used
with verb forms, that is, the members of Group 3.
⁶ I spent 18 years in Germany teaching business English in large organisations such as Bosch
GmbH, Audi AG, Siemens AG, and Deutsche Bank AG.
⁷ The t-score is a statistical instrument which is used to measure distribution, or more
specifically how the distribution of something deviates from what is standard. For more
information regarding t-score, please refer to Barnbrook (1996) and Hunston (2002).

TABLE 1
The Four Groups of Selected Items

Group   Item                                     
1       issue, aspect, factor                    Noun forms
2       aim, objective, target, goal*            Noun forms
3       RUN, HEAD, MANAGE, DEAL with, HANDLE     Verb forms
4       system, process, procedure               Noun forms

As already mentioned, it is often difficult to attach a precise level of
significance to a list of collocates ranked solely according to the number of
times they occur (raw frequency) together with the node. For this reason
statistical measures such as t-score⁷ are used in order to assign a more
precise level of significance to each co-occurrence. For example, any
collocate with a t-score of 2.00 or above can be regarded as significant
(Barnbrook, 1996, p. 98); that is, the way that it combines with the node is
not simply the result of random distribution. The t-score value usually
reflects how frequently a particular combination occurs in the corpus; that
is, the more frequent the collocation, the higher the t-score. Given that
there have been a number of reservations expressed about the use of
statistical measures in corpus research (Clear, 1993; Stubbs, 1995), both
t-score and raw frequency data are included in the current study.
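
The article does not spell out the formula behind its t-score values. The sketch below assumes the approximation commonly used in corpus work, t = (O − E) / √O, where O is the observed co-occurrence frequency and E is the frequency expected if the two words were distributed randomly. The function names and the node and collocate counts are hypothetical and serve only as illustration; the 2.00 threshold is the one cited from Barnbrook (1996).

```python
import math


def expected_frequency(node_freq, collocate_freq, corpus_size, window=1):
    """Co-occurrences expected by chance across `window` span positions."""
    return node_freq * collocate_freq * window / corpus_size


def t_score(observed, expected):
    """Approximate t-score, (O - E) / sqrt(O); 0.0 if the pair never co-occurs."""
    if observed == 0:
        return 0.0
    return (observed - expected) / math.sqrt(observed)


# Hypothetical node and collocate counts in a 450-million-word corpus
expected = expected_frequency(
    node_freq=50_000, collocate_freq=4_000, corpus_size=450_000_000
)
t = t_score(observed=316, expected=expected)
print(f"expected = {expected:.2f}, t = {t:.2f}, significant = {t >= 2.00}")
```

Under this assumed formula, a very small expected frequency leaves the score close to √O, which would be in line with the observation above that more frequent collocations tend to receive higher t-scores.
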
The first stage of the research consisted of establishing a collocational
profile for each of the selected items using a corpus of general English,
in this case the BoE. This involved identifying the most frequent
collocates for each of the positions within a span of four words to the left
and right of the node. The easiest way to do this with the BoE is to use
the picture function, which identifies and ranks the most frequent
collocates for each of the positions within the span. Table 2 shows a
t-picture for the node word aspect where the collocates are ranked
according to their t-score values, with the highest (i.e., the most
significant collocates) at the top of each of the four columns to the left
and to the right of the node.
During the first stage of the research, all the relevant information from
the collocational profile was carefully recorded. This involved listing each
of the 20 most frequent collocates together with its t-score value and raw
frequency; that is, the number of times the collocate was found to occur
with the node in this particular position. If one takes the first group of
selected items (issue, aspect, factor) as an example, an examination of the
corpus data reveals both shared collocates, that is, collocates which are
frequently associated with all the selected items in the group, and
characteristic collocates, that is, collocates which are more frequently
associated with one item in the group. It is significant that, as a general
rule, shared collocates are more frequent than characteristic ones.

TABLE 2
A t-Picture for the Node Word Aspect Where the Most Frequent Collocates Are Ranked
According to Their t-Score Values

node –4     node –3     node –2     node –1     node        node +1     node +2     node +3     node +4
is          the         most        every       aspect      of          the         life        is
perhaps     is          the         one         aspect      is          this        s           s
there       on          a           this        aspect      ratio       his         game        that
this        there       an          an          aspect      between     our         work        which
not         <p>         on          important   aspect      which       their       lives       life
it          about       one         another     aspect      computing   your        is          was
but         to          only        any         aspect      call        life        business    work
has         or          about       other       aspect      however     it          job         and
<p>         this        every       some        aspect      though      her         policy      system
focus       was         this        particular  aspect      to          my          relations   write

Once the data had been recorded, they could then be manually
examined for characteristic collocations (e.g., controversial issue, worrying
aspect, growth factor), which reflect the precise meaning of individual
items within the group. The data were also examined for fixed or
semifixed phrases (e.g., every aspect of, take issue with), for collocations
which reflect either different polysemous or homonymous forms (e.g.,
the latest issue of, a controversial issue, a share issue), for signs of a particular
semantic prosody (e.g., a long and difficult process) and for the use of
metaphor (e.g., meet the 3% target).
The aim of the second stage of the research was to establish a
collocational profile for each selected item using a corpus of business
English. By comparing the two profiles (i.e., the profile obtained from
the BNCc and the profile obtained using the BoE) it was possible to
establish whether there are any significant differences in the way the
selected items are used in a business domain compared with a more
general one. In this particular case, the results showed that there were
very few differences in the way the selected items are used in the two
domains and, as a result, much of the data used in this article have been
taken from the BoE because, as the larger of the two corpora, it is liable
to yield more reliable results.


RESULTS AND DISCUSSION
Results from the current study show that there are a number of
linguistic features and processes which influence the way in which
collocations are formed. The first of these is concerned with semantic
and pragmatic features associated with the selected item (i.e., the node
word) itself.

Semantics and Usage
The corpus data show how items such as issue, aspect, or factor are
frequently used as cohesive devices in both spoken and written
discourse. Halliday and Hasan (1976) used the term general noun⁸ to
refer to a class of nouns (and noun phrases) which are frequently used
as cohesive devices in text. They are part of the system of deixis in
English and function as proforms, which typically refer to either
individual items (e.g., place, man, woman, boy) or to whole stretches of
discourse (e.g., situation, question of, issue). Seven out of the ten most
frequent node –1 collocates associated with issue, aspect, and factor
belong to a group of evaluative adjectives (important, key, main, major,
crucial, critical, vital) which seem to have the same semantic function:
that of attributing a level of importance to the node word. These shared
collocates were found to be associated mainly with the way in which issue,
aspect, and factor are used as general nouns. Here is one example taken
from the BoE data, where issue refers forward to what the writer regards
as being the most important issue in the presidential election.

By far the most important issue in the campaign was the state of the national
economy. Clinton won because he presented himself as a competent,
moderate alternative to a president who was perceived as having failed to
manage the economy.

⁸ Francis (1994, pp. 83–88) used the terms “advance labels” and “retrospective labels” to
refer to nouns or noun groups which are frequently used to label stretches of text.
Partington (1998) also examined the way in which these general nouns function as
cohesive devices.

Although the shared collocates are generally the most frequent collocates,
it can be argued that the characteristic collocates, which normally occur
slightly lower down in any list of collocates ordered by t-score or
frequency, are more useful to learners as they highlight slight but
significant differences in the way that the selected items from a particular
group are used. In the case of the items from Group 1, for example, an
issue is frequently seen as something which is contentious and controversial,
whereas an aspect is something which can be worrying or disturbing
(Table 3). Factor, on the other hand, was found to be frequently associated
with more technical usages (e.g., growth factor) but is also used in a kind of
pseudotechnical way (e.g., feel-good factor), which may be an attempt to
bring a sense of objectivity to something that can only really be measured
by more subjective means. Table 3 shows the most frequent node –1
collocates associated with issue, aspect, and factor. In the sensitive
issue example, issue is the node and sensitive is the collocate which appears
in the node –1 position. The values in Table 3 show that this collocation
occurs 316 times in the BoE, and its t-score value of 17.77 is a measure of the
statistical significance of this combination; that is, the collocation is not
simply the result of chance, as the t-score is well above 2.00.
Although it is clear from the data that all three Group 1 items are
frequently used as cohesive devices, it is also clear from the characteristic
collocates that they do not all have the same meaning and associations.
By choosing one item over another, the user is obviously making some
form of evaluation and is not simply referring to another item or stretch
of discourse in a neutral manner. It is precisely these slight but
significant differences in usage, and therefore in meaning, that learners
need to be aware of in order to use the target language effectively.

TABLE 3
The Most Frequent Characteristic Collocates Associated With Issue, Aspect, and Factor

                 Issue                  Aspect                 Factor
Node –1          t-Score   Frequency    t-Score   Frequency    t-Score   Frequency
sensitive        17.77     316          2.00      4            1.00      1
contentious      15.45     239          2.45      6            1.41      2
controversial    15.09     229          6.32      40           1.00      1
worrying         2.64      7            8.42      71           3.60      13
disturbing       1.99      4            6.41      41           3.16      10
pleasing         0.00      0            5.29      28           2.23      5
risk             1.00      1            0.00      0            19.61     386
feel-good        0.00      0            0.00      0            18.11     329
growth           2.00      4            0.00      0            13.69     189

Note. Data are from the Bank of English.
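
One way to make the notion of a characteristic collocate concrete is to ask which item in a group accounts for most of a collocate's occurrences. The sketch below uses the raw node –1 frequencies reported in Table 3. The function name `characteristic_item` and the 0.7 dominance threshold are assumptions made for illustration, not part of the study, where the data were examined manually.

```python
# Raw node -1 frequencies as reported in Table 3 (Bank of English)
FREQUENCIES = {
    "sensitive":     {"issue": 316, "aspect": 4,  "factor": 1},
    "contentious":   {"issue": 239, "aspect": 6,  "factor": 2},
    "controversial": {"issue": 229, "aspect": 40, "factor": 1},
    "worrying":      {"issue": 7,   "aspect": 71, "factor": 13},
    "disturbing":    {"issue": 4,   "aspect": 41, "factor": 10},
    "pleasing":      {"issue": 0,   "aspect": 28, "factor": 5},
    "risk":          {"issue": 1,   "aspect": 0,  "factor": 386},
    "feel-good":     {"issue": 0,   "aspect": 0,  "factor": 329},
    "growth":        {"issue": 4,   "aspect": 0,  "factor": 189},
}


def characteristic_item(counts, dominance=0.7):
    """Return the item with at least `dominance` of a collocate's
    occurrences across the group, or None if no item dominates."""
    total = sum(counts.values())
    if total == 0:
        return None
    item, freq = max(counts.items(), key=lambda pair: pair[1])
    return item if freq / total >= dominance else None


for collocate, counts in FREQUENCIES.items():
    print(f"{collocate:14s} -> {characteristic_item(counts)}")
```

A collocate spread evenly across all three items returns `None` under this rule, which loosely corresponds to the shared collocates described earlier.
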

Polysemy and Homonymy
Where a word has a number of different senses it is normally the
collocates in the surrounding cotext which can be used to disambiguate
the item. It is possible, for example, to identify three different senses of
issue in the corpus data, each associated with different characteristic
collocates (e.g., contentious issue, latest issue, share issue). The values in
Table 4 show, for example, that there are 535 occurrences of the
collocation political issue, 452 of latest issue, and 1,024 of rights issue in the
BoE.
The corpus data from the current study show that one of the most
significant features which influences the way collocations are formed is
the semantics of the selected item, and, where the item has two or more
distinct senses, each of them is generally associated with a different set of
characteristic collocates. Hoey (2005) argued that, where ambiguity is
possible, speakers deliberately avoid collocates that increase this
ambiguity and generally choose ones which decrease it.

TABLE 4
The Most Frequent Collocates Associated With Three Meanings of Issue

Issue (meaning 1)                       Issue (meaning 2)                   Issue (meaning 3)
Node –1         t-Score   Frequency     Node –1     t-Score   Frequency     Node –1     t-Score   Frequency
political       22.05     535           latest      20.80     452           rightsᵃ     33.48     1,024
Palestinian     18.09     332           current     18.57     368           bond        19.41     377
contentious     15.43     239           next        16.78     282           share       18.66     349
controversial   14.96     229           special     14.45     240           stock       6.95      49
thorny          12.59     159           last        14.02     197           currency    6.76      46

Note. Data are from the BoE. ᵃ Collocations such as human rights issue or civil rights issue are not
included in the values for frequency or t-score.

However, it was not always so easy to discern a clear number of
different senses for a particular selected item from the corpus data. An
examination of the most frequent node –1 collocates for system, for
example, shows how it is used to refer to a variety of different types of
system and that it is possible to group these collocates according to the
type of system they refer to (Table 5). In this case the collocates have
been grouped together to reveal seven different types of system, but this
is based on my own rather subjective judgement, and the number of
different types of system would seem to vary according to who is doing
the grouping. The Collins COBUILD Advanced Learner’s Dictionary
(Sinclair, 2006), for example, lists six different types of system, whereas
the Oxford Advanced Learner’s Dictionary (Wehmeier, 2005) and the
Longman Dictionary of Contemporary English (Summers, 2003) only list
three and four different types, respectively.

TABLE 5
The Most Frequent Node –1 Collocates Associated With Seven Different Types of System

Node –1                  t-Score   Frequency
Social systems
  legal                  39.31     1,554
  education              36.90     1,380
Political systems
  capitalist             18.78     354
  democratic             18.31     341
Business systems
  management             18.78     461
  accounting             10.56     112
Technical systems
  computer               40.11     1,610
  telephone              17.16     305
Transport systems
  transport              23.62     565
  rail                   14.99     225
Biological systems
  immune                 55.43     3,074
  nervous                46.67     2,200
Geographical systems
  solar                  40.17     1,615
  river                  11.04     122

Note. Data are from the Bank of English.

The corpus data also contained a number of verbal collocates which
were associated with specific types of system. For example, verbs such as
DEPRESS and STIMULATE were found to be associated with biological
systems, INSTALL and ASSEMBLE with technical systems, and REFORM
and RESTRUCTURE with social systems. The following concordance lines
taken from the BoE serve to illustrate some of these associations.

key factor, because that can depress the human immune system." Analyses
a chemical transmitter that stimulates the heart, digestive system and
s one of the first brewers to install a cellar cooling system free from
his fever, Professor Saito assembled a temporary distillation system
ent has unveiled its plan for reform of the banking system. Treasury office
of worker participation, to restructure the social security system, and

These collocations result from the semantic relationship which exists
between the verb and the relevant noun phrase, and the majority of
these verbs appear to have precise meanings which limit the number of
possible associations. On the other hand, the higher frequency verbs
such as DEVELOP, INTRODUCE, and USE, which seem to have less
precise meanings, were found to be associated with a larger number of
different types of system. INTRODUCE, for example, was found to be
associated with at least five different types of system (political, social,
technical, business, transport), as can be seen from these concordance
lines taken from the BoE data.
the first country to introduce a state education system. 1877: Edison rec
ity is also planning to introduce a pensions forecasting system that will
agency had attempted to introduce a new computer system and compulsory passp
production manager had introduced a daily bonus system but he proposed that
a positive move towards introducing an integrated transport system. Rod Lit

A comparison of the data for node –1 collocates in the two corpora
showed that collocates which refer to business or technical systems (e.g.,
management, computer) occur more frequently in the BNCc, whereas
collocates which refer to biological or geographical systems (e.g.,
immune, solar) occur more frequently in the BoE. Differences in the
frequencies in the two corpora of collocates which refer to either social,
political, or transport systems were found to be less significant. These
findings would seem to reflect the difference in the content of the two
corpora, and it is only to be expected that a corpus of business English
will include more occurrences of collocates which refer to business and
technical systems.
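
The article does not state how frequencies were compared across two corpora of such different sizes. One common option, assumed here purely for illustration, is to normalise raw counts to occurrences per million words. The corpus sizes and the BoE count for computer system are taken from the text and Table 5; the BNCc count is hypothetical.

```python
BOE_SIZE = 450_000_000   # words, as reported for the Bank of English
BNCC_SIZE = 6_300_000    # words, as reported for the BNCc


def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


boe_rate = per_million(1_610, BOE_SIZE)   # computer system, from Table 5
bncc_rate = per_million(90, BNCC_SIZE)    # hypothetical BNCc count
print(f"BoE: {boe_rate:.2f} per million; BNCc: {bncc_rate:.2f} per million")
```

Normalised in this way, a collocate can be relatively more frequent in the smaller corpus even when its raw count there is far lower.
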

Semantic Prosody
There is evidence from both corpora to show that the word process may
have a negative semantic prosody and that this has a significant
influence upon its collocational behaviour. Corpus data for both the
singular and plural forms of process show how they are associated more
frequently with adjectives which refer to negative attributes rather than
with adjectives which refer to positive ones. However, this negative
semantic prosody only seems to be associated with the individual items
(i.e., process or processes) and not with noun phrases containing process or
processes (e.g., learning process, manufacturing process, biological processes,
etc.). The left-hand column of Table 6 shows the most frequent
attributive adjectives associated with process (e.g., long, lengthy, slow,
gradual + process), whereas the data in the right-hand columns show how
their antonyms are less frequently associated with process (e.g., short,
quick, fast, painless + process). The values also show how this negative
prosody does not occur consistently throughout and that, for example,
process is also associated (although not quite so frequently) with more
positive adjectival collocates such as simple and easy.
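The t-scores reported alongside the frequencies can be read with the help of a short calculation. The sketch below assumes the formulation of the t-score commonly used in corpus linguistics, t = (O - E) / sqrt(O), where O is the observed co-occurrence frequency and E is the frequency expected by chance. The function names, the span parameter, and the corpus figures are illustrative assumptions and are not taken from the study, which may have calculated its values differently.

```python
import math


def expected_cooccurrences(node_freq: int, collocate_freq: int,
                           corpus_size: int, span: int = 1) -> float:
    """Co-occurrences expected by chance if the two words were independent.

    `span` is the number of positions around the node that are searched
    (1 for a single fixed position such as node -1).
    """
    return node_freq * collocate_freq * span / corpus_size


def t_score(observed: int, expected: float) -> float:
    """Collocation t-score under the assumed formula (O - E) / sqrt(O)."""
    if observed <= 0:
        raise ValueError("observed frequency must be positive")
    return (observed - expected) / math.sqrt(observed)


# Hypothetical figures, chosen only to illustrate the arithmetic.
expected = expected_cooccurrences(node_freq=50_000, collocate_freq=20_000,
                                  corpus_size=450_000_000)
print(round(expected, 2))
print(round(t_score(304, expected), 2))
```

Under this formula the t-score can never exceed the square root of the observed frequency, and it approaches that limit when the expected frequency is small. This is consistent with pairs such as short + process in Table 6, where a frequency of 3 is reported with a t-score of 1.73, which is approximately the square root of 3.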
Further evidence for this negative prosody can be seen in the pattern
adjective and adjective + process. The adjectives which most frequently
appear within this pattern are nearly all negative, as can be seen in these
concordance lines taken from the BoE data.
italism could only be a slow and gradual process because of the generali
the case through the long and expensive process of trial and appeals. La
spent fuel is a dangerous and complex process. But there is no law to is
the beginning of a long and painful process of rebuilding this count
-optic cable. It’s an expensive and slow process. There are estimates tha
scheme cuts out a lengthy and difficult process of obtaining rechecks of
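
A pattern such as adjective and adjective + process can be retrieved from plain text with a simple search. The sketch below is only an approximation under stated assumptions: it matches any two words joined by and immediately before process, without part-of-speech tagging, so a real study would need tagged data or manual filtering. The function name and the sample sentences (adapted from the concordance lines above) are illustrative.

```python
import re

# Two words joined by "and", directly followed by "process" or "processes".
PATTERN = re.compile(r"\b([a-z-]+) and ([a-z-]+) process(?:es)?\b", re.IGNORECASE)


def coordinated_modifiers(text: str) -> list[tuple[str, str]]:
    """Return (first, second) word pairs found in '<w1> and <w2> process'."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(text)]


sample = (
    "It could only be a slow and gradual process. "
    "The case went through the long and expensive process of trial and appeals. "
    "It is the beginning of a long and painful process of rebuilding."
)
for pair in coordinated_modifiers(sample):
    print(pair)
```

Counting the words returned in each slot would then show whether negative items such as slow, long, or painful dominate the pattern in a given sample.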

There is also corpus evidence from the current study to show how a
negative or positive semantic prosody can only really be attributed to one
or more senses of a word or phrase and not to an item as a whole. The
data for DEAL with, for example, show how it has at least seven different
but related senses, but only three are associated with collocates which
refer to negative items or events (e.g., deal with stress, the problem,
wrongdoers). It is therefore only possible to attribute a negative semantic
prosody to a few of the different senses.

TABLE 6
Most Frequent Node –1 Attributive Adjectives Associated With Process (left-hand columns) and
How Frequently Their Antonyms Occur in the Node –1 Position With Process (right-hand
columns)

Node –1    t-Score  Frequency        Node –1    t-Score  Frequency
long       17.42    304              short      1.73     3
lengthy    11.74    138
slow       16.90    286              quick      3.60     13
gradual    12.20    149              fast       1.73     3
complex    13.37    179              simple     9.16     84
difficult  11.31    128              easy       6.56     43
painful    12.44    155              painless   3.74     14

Note. Data are from the Bank of English.



Metaphor
Another linguistic feature which influences the process of collocation
is the use of metaphor. Data from both corpora show how some of the
features associated with the literal senses of target and goal are retained
when the items are used metaphorically. For example, in its literal sense,
a prototypical target is something which has been identified, something
which is to be aimed for, and something which you can either hit or
miss. The literal senses of goal refer to either the wooden structure or to
something which is scored when the ball enters the area formed by the
posts and crossbar. Some of the features associated with the literal senses
of goal, such as the fact that a player strives to score a goal during the
game of football or that one can generally see if a goal has been scored
or not, influence the way the word is used metaphorically. This retention
of features can be seen in the way that, for example, the metaphorical
senses of target and goal are more frequently associated with verbs such as
SET, HIT, MISS, REACH and MEET (Table 7). However, the data also
show how only certain features are mapped (Kövecses, 2002) from the
literal onto the metaphorical sense, and that other features such as the
fact that a target is often destroyed, or that a goal is rectangular, would
seem to be ignored when the items are used metaphorically. Findings
from the current study add to the weight of evidence from other corpus-based
studies (Deignan, 1997, 2005), which show how only certain
features associated with the literal sense of an item are mapped onto the
metaphorical.
TABLE 7
Data Show How the Verbs SET, HIT, MISS, MEET, and REACH Occur Far More Frequently With
Target and Goal
Aim
Node –3

t-Score

Frequency

SET
HIT
MEET
REACH
MISS

1.00
1.41
2.45
1.00
1.00

1
2
6
1
1

Goalᵃ

Objective

Target

t-Score

Frequency

t-Score

Frequency

2.64
1.00
2.45
2.45
1.00

7
1
6
7
1

18.91
8.72
8.72
7.14
3.46

331
76
76
51
12

t-Score

Frequency

11.57
11.18
4.00
6.40
3.61

134
125
16
41
13

Note. Data are from the Bank of English. ᵃIn the case of goal, only the metaphorical sense has
been included in the data.

The fact that target, and to a lesser extent goal, were found to be
associated with numerical values supports the proposition that the
metaphorical senses of target and goal are frequently associated with
exact values and that this feature of exactness has been mapped from
the literal to the metaphorical senses of both items. It is also clear from
the data that this feature of exactness is largely lacking in the case of aim
and objective. The following examples taken from the BoE show how
numerical values are associated with the metaphorical senses of target
and goal.
than anything seen so far to meet that 3% target. But, as the prime minist in
January - more than double the 2,500 target for job losses outlined look
forward to surpassing the $1 million goal for the Hospice Endowment
showrooms, with an eventual sales target of 100,000 cars a year -double they
did not wait longer than the target of 18 months. Health watchdogs
extra year of life to achieve its goal of 1000 processors. Fuchi says

This feature of exactness can also be seen in the way that target is
frequently combined with prepositions such as on, above, or below in
order to describe, for example, the financial position of a company or
project. The following examples taken from the BNCc data illustrate the
way these prepositional phrases are used to describe the relationship
between the planned and the actual situation.
the bonuses but you tell me I am on target for the large bonus in April. was
sorry twenty seven percent above target er for the quarter and most of that
profit levels were 37 per below target in 1949, 19 per below in 1950 a


Phraseology
Some of the selected items were found to be associated with a range of
different types of fixed or semi-fixed phraseological units. The
preposition of, for example, directly follows aspect and aspects in 75% of
all occurrences of the items in the BoE. In the majority of cases this is
not because aspect of or aspects of are frequent combinations in
themselves, but because they are elements of a whole series of longer
sequences (Table 8). The phrases all aspects of, some aspects of, one aspect of,
and every aspect of, for example, account for 23% of the total number of
occurrences of aspect of and aspects of in the BoE. It was also found that
the phrase one aspect of occurs far more frequently in the corpus data
than other combinations such as two aspects of or three aspects of, an
indication that one aspect of is more than simply a loose grouping of
items.
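The proportions discussed above can be obtained by counting the word that occupies the node –1 position, that is, the word immediately before aspect of or aspects of. The following is a minimal sketch on an invented three-sentence sample; the tokenisation, the function name, and the data are assumptions made for illustration and do not reproduce the BoE query.

```python
import re
from collections import Counter


def left_collocates(text: str, phrase: str) -> Counter:
    """Count the word immediately to the left of each occurrence of `phrase`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    # Start at 1 so that every match has a left-hand neighbour.
    for i in range(1, len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            counts[tokens[i - 1]] += 1
    return counts


# Invented sample text, not corpus data.
sample = (
    "Every aspect of the plan was reviewed. One aspect of the plan failed. "
    "We checked every aspect of the budget."
)
counts = left_collocates(sample, "aspect of")
total = sum(counts.values())
for word, freq in counts.most_common():
    print(f"{word:<8}{freq:>3}{freq / total:>8.0%}")
```

Running the same count separately for aspect of and aspects of keeps the singular and plural forms apart, which is how the two halves of a table of this kind would be filled in.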
TABLE 8
The Most Frequent Node –1 Collocates Associated With Aspect of and Aspects of

Aspect of                                        Aspects of
Node –1    t-Score  Frequency  % of total        Node –1    t-Score  Frequency  % of total
every      36.94    1,373      15%               all        40.50    1,707      13%
one        27.35    786        9%                other      30.06    939        7%
an         21.85    534        6%                many       25.36    666        5%
this       21.28    520        6%                some       23.45    588        4%
important  20.51    426        5%                certain    19.66    391        4%

Note. Data are from the Bank of English.

The phraseological units associated with aspect are both fixed and
compositional. However, there are also examples in the corpus data of
units which are less fixed and only partially compositional, and TAKE
issue with is an example of one of these. Although the meaning of the
phrase appears to be related to one of the three different senses of issue,
the phrase TAKE issue with would also seem to have its own distinct
meaning (i.e., to disagree with something someone said). It can be seen
from the corpus data that the collocates associated with issue when it
appears as part of the phrase TAKE issue with are very different (e.g.,
polite, strong, fierce) to those associated with issue when it occurs as a single
item (e.g., controversial, contentious, political). The following examples
taken from the BoE data serve to illustrate this point.
editor, Ian Black, took polite issue with some of Pilger’s more outla of the
stiffs May I take gentle issue with Morton Schatzman’s pessimism in
the Netherlands - took strong issue with his colleague. While he diff salty
but Debs and I took fierce issue with him, having helped ourselves

The fact that a phrase such as TAKE issue with was found to be associated
with its own set of characteristic collocates would seem to suggest that
the phrase has developed a meaning of its own, probably as a result of
some form of lexicalisation process.
It is obvious from the corpus data that some of the selected items are
associated with various types of phraseological units and that these units
generally have their own collocational behaviour. Consequently, any
phraseological behaviour associated with a particular lexical item needs
to be taken into account when attempting to describe and explain its
collocational behaviour.

Some Implications for the Learning of Collocations
Evidence from the current study shows that, far from being purely arbitrary
combinations of words, some collocations can be partially or fully
explained by considering one or more linguistic features or processes
which played a part in their formation. In order to make the process of
learning collocations more meaningful, and hence more memorable,
language learners need to be aware of these explanations. A study of
collocational exercises in three course books designed to teach business
English (Walker, 2008, p. 198) found that this type of exercise typically
asks the learner to “match items on the left with items on the right.” In
order to successfully match all the items in an exercise, the learner will
often have to take four or five different linguistic features or processes
into account. This type of exercise should instead focus on one feature or
process at a time in order to present not only the collocations themselves
but also an explanation of why these words are frequently found
together.
A contemporary English language teaching course book contains
many grammatical exercises that are designed so that the learner is able
to derive the rule from the results of the exercise, and current
methodology frequently emphasises the importance of allowing the
learners to deduce grammatical rules for themselves (Brown, 2001;
Cook, 2001; Harmer, 2007). There is no reason why many of the
exercises which present or practise collocations could not be designed in
exactly the same way. Learners would be asked to complete the
collocational exercise and to speculate as to the reason why, for
example, verbs such as SET, REACH, and MEET are associated with target
and goal rather than with aim or objective. A collocational exercise could,
for instance, focus on the different senses associated with a polysemous
or homonymous item or the way that some of these senses may be
associated with a negative semantic prosody. Where possible, collocations
should be explained in the language classroom in order to help
with their memorability, and encouraging learners to look for an
explanation will help them to increase their awareness of the linguistic
features and processes which influence the way collocations are formed.
As part of the current study (Walker, 2009), the entries in three
learner’s dictionaries and three collocational dictionaries for each of the
fifteen selected items were examined and their content compared with
the findings from this study. Results from the examination of the
learner’s dictionaries⁹ showed that most of the collocations included in
the entries were chosen in order to exemplify different aspects of the
definition of the headword. Although most of the collocations included
in the entries are the same as or similar to those revealed by the current
study, it is apparent that these dictionaries have tended to select the
most frequent collocates (e.g., important/major/key/crucial factor; Longman
Dictionary of Contemporary English [Summers, 2003, p. 561]), whereas
findings from the current study show that it would be beneficial for
learners if these dictionaries included more characteristic collocates
(e.g., risk/growth/feel-good factor). The entries in the three dictionaries
often failed to explain important differences in meaning between items
such as aim, objective, target, and goal or RUN, HEAD, and MANAGE. If the
dictionaries focused less on the most frequent collocates and included
more characteristic collocates (i.e., the slightly less frequent collocates),
it would help to bring these slight but significant differences in meaning
to the fore.

⁹ The three dictionaries examined were the Collins COBUILD Advanced Learner’s English
Dictionary (5th edition), the Longman Dictionary of Contemporary English (4th edition), and
the Oxford Advanced Learner’s Dictionary (7th edition).

A comparison of the collocates listed in the entries for the selected
items in the three collocational dictionaries showed that there is a
considerable lack of agreement in the content of the three dictionaries.
The results of the comparison showed that, for instance, only 3% of the
total number of collocates listed appear in all three dictionaries and that
more than 80% appear in only one of the three. This lack of agreement
seems to result from differences in what each of the dictionaries regards
as a collocation. The BBI Dictionary of English Word Combinations (Benson,
Benson, & Ilson, 1997), for example, includes large numbers of
grammatical collocations (e.g., concerned about, blockade against, angry
at) in its entries, whereas both the Oxford Collocations Dictionary
(Crowther et al., 2002) and the Dictionary of Selected Collocations (Hill &
Lewis, 2002) concentrate more on lexical collocations.
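Agreement figures of this kind can be computed by treating each dictionary entry as a set of collocates. The sketch below uses three small invented sets (the variable names and contents are hypothetical, not the dictionaries' actual entries) and assumes that each distinct collocate is counted once. It reports what share of all listed collocates appears in every source and what share appears in only one.

```python
from collections import Counter


def overlap_shares(*sources: set[str]) -> tuple[float, float]:
    """Return (share listed in every source, share listed in exactly one)."""
    listed_in = Counter(item for source in sources for item in source)
    total = len(listed_in)
    if total == 0:
        return 0.0, 0.0
    in_all = sum(1 for n in listed_in.values() if n == len(sources))
    in_one = sum(1 for n in listed_in.values() if n == 1)
    return in_all / total, in_one / total


# Hypothetical collocates of "factor" in three invented dictionary entries.
dict_a = {"key", "major", "risk", "deciding"}
dict_b = {"key", "crucial", "growth"}
dict_c = {"key", "major", "feel-good", "external"}

share_all, share_one = overlap_shares(dict_a, dict_b, dict_c)
print(f"in all three: {share_all:.0%}, in only one: {share_one:.0%}")
```

In this invented example one of the eight distinct collocates is shared by all three entries and six appear in only one, which mirrors on a small scale the kind of disagreement described above.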
There are also differences in the way that the three collocational
dictionaries order the collocates within an entry. The BBI Dictionary and
the Dictionary of Selected Collocations list collocates alphabetically, whereas
the Oxford Collocations Dictionary groups collocates with similar or related
meanings together. This helps to show how the collocates relate to the
different senses of the headword in exactly the same way that grouping
the most frequent collocates of system revealed seven or so different types
of system. Grouping collocates alphabetically obscures this semantic
relationship and, once again, encourages the learner to think of
collocations as being arbitrary combinations.

Results from the current study show that some collocations are not
simply arbitrary combinations and can, to some extent, be explained. An
examination of three business English course books, learner’s
dictionaries, and more specialised dictionaries of collocation shows that
collocations are often presented and practised with little or no
explanation as to why a native speaker frequently uses particular
combinations. Liu (2010) showed that, by combining techniques
used in corpus linguistics with approaches used in cognitive linguistics, it
is possible to demonstrate how many collocations are either partially or
fully motivated. Unfortunately for the learner, although a corpus-based
cognitive analysis may be successful in explaining collocations which
have been formed as a result of polysemy or homonymy, the use of
metaphor, or simply as a result of the precise semantics of the node and
its collocate(s), it may be less successful in explaining collocations which
have been influenced by factors such as semantic prosody or the
phraseological behaviour of the node.


CONCLUSION
The current study only looked at a total of fifteen lexical items.
Although the collocational behaviour of each form within a lemma (a
total of twelve different forms) was also examined, this is still a minute
sample of the total number of items in the language, and consequently
any findings can only be regarded as preliminary. However, despite the
obvious limitations of the study, results still show that not all collocations
are arbitrary, and therefore any definition of collocation which sees it as
being purely “an arbitrary linguistic phenomenon” (Lewis, 1997, p. 32)
has to be something of an overgeneralisation.

ACKNOWLEDGMENTS
I would like to thank Professor Susan Hunston for her very helpful comments on a
draft of this manuscript and my three anonymous reviewers for their thoughtful and
constructive feedback.

THE AUTHOR
Crayton Walker is a lecturer in applied linguistics at the University of Birmingham in
Birmingham, England. He has a background in teaching English, with over 20 years
of experience teaching business English in Germany.

REFERENCES
Barnbrook, G. (1996). Language and computers: A practical introduction to the computer
analysis of language. Edinburgh, Scotland: Edinburgh University Press.
Benson, M. (1989). The structure of the collocational dictionary. International Journal
of Lexicography, 2, 1–14. doi:10.1093/ijl/2.1.1.
Benson, M., Benson, E., & Ilson, R. (1997). The BBI dictionary of English word
combinations. Amsterdam, The Netherlands: John Benjamins.
Brown, D. H. (2001). Teaching by principles: An interactive approach to language pedagogy.
Harlow, England: Longman.
Carter, R. (1987). Vocabulary: Applied linguistic perspectives. London, England: Allen
and Unwin.
Clear, J. (1993). From Firth principles: Computational tools for the study of
collocation. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and
technology: In honour of John Sinclair (pp. 271–292). Amsterdam, The Netherlands:
John Benjamins.
Cook, V. (2001). Second language learning and language teaching. London, England:
Arnold.
Cowie, A. P. (1998). Phraseology: Theory, analysis and application. Oxford, England:
Clarendon Press.
Crowther, J., Dignen, S., & Lea, D. (Eds.). (2002). Oxford collocations dictionary for
students of English. Oxford, England: Oxford University Press.
Crystal, D. (2003). A dictionary of linguistics and phonetics. Oxford, England: Blackwell.



Deignan, A. (1997). A corpus-based study of some linguistic features of metaphor
(Unpublished doctoral dissertation). University of Birmingham, Birmingham,
England.
Deignan, A. (2005). Metaphor and corpus linguistics. Amsterdam, The Netherlands:
John Benjamins.
Firth, J. R. (1957). Papers in linguistics 1934–1951. Oxford, England: Oxford
University Press.
Firth, J. R. (1968). Descriptive linguistics and the study of English. In F. R. Palmer
(Ed.), Selected papers by J. R. Firth (pp. 96–113). London, England: Longman.
Francis, G. (1994). Labeling discourse: An aspect of nominal-group lexical cohesion.
In M. Coulthard (Ed.), Advances in written text analysis (pp. 83–101). London,
England: Routledge.
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. Bazell, J. Catford, M. A. K.
Halliday, & R. Robins (Eds.), In memory of J. R. Firth (pp. 148–162). London,
England: Longman.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London, England:
Longman.
Harmer, J. (2007). The practice of English language teaching (4th ed.). London,
England: Longman.
Hill, J., & Lewis, M. (2002). LTP dictionary of selected collocations. Boston, MA: Heinle &
Heinle.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London,
England: Routledge.
Howarth, P. A. (1996). Phraseology in English academic writing. Tübingen, Germany:
Max Niemeyer Verlag.
Howarth, P. A. (1998). Phraseology and second language proficiency. Applied
Linguistics, 19, 24–44. doi:10.1093/applin/19.1.24.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge, England: Cambridge
University Press.
Kennedy, G. (2003). Amplifier collocations in the British National Corpus:
Implications for English language teaching. TESOL Quarterly, 37, 467–487.
doi:10.2307/3588400.
Kövecses, Z. (2002). Metaphor: A practical introduction. Cambridge, England:
Cambridge University Press.
Krishnamurthy, R. (2004). English collocation studies: The OSTI report. London,
England: Continuum.
Lewis, M. (1993). The lexical approach. Hove, England: Language Teaching
Publications.
Lewis, M. (1997). Implementing the lexical approach. Hove, England: Language
Teaching Publications.
Lewis, M. (2000). Teaching collocations. Hove, England: Language Teaching
Publications.
Liu, D. (2010). Going beyond patterns: Involving cognitive analysis in the learning of
collocations. TESOL Quarterly, 44, 4–30. doi:10.5054/tq.2010.214046.
Louw, W. (1993). Irony in the text or insincerity of the writer: The diagnostic
potential of semantic prosodies. In M. Baker, G. Francis, & E. Tognini-Bonelli
(Eds.), Text and technology: In honour of John Sinclair (pp. 157–176). Amsterdam,
The Netherlands: John Benjamins.
McCarthy, M. (1990). Vocabulary. Oxford, England: Oxford University Press.
McCarthy, M., & O’Dell, F. (2005). English collocations in use: Intermediate. Cambridge,
England: Cambridge University Press.
Mason, O. (1997). The weight of words: An investigation of lexical gravity. Proceedings
of PALC 97 (pp. 361–375). Lodz, Poland: University of Lodz.
Mason, O. (1999). Parameters of collocation: The word in the centre of gravity. In J.
Kirk (Ed.), Corpora galore. Amsterdam, The Netherlands: Rodopi.
Moon, R. E. (1998). Fixed expressions and idioms in English: A corpus-based approach.
Oxford, England: Clarendon Press.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, England:
Cambridge University Press.
Nelson, M. (2006). Semantic associations in business English: A corpus-based
analysis. English for Specific Purposes, 25, 217–234. doi:10.1016/j.esp.2005.02.008.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and
some implications for teaching. Applied Linguistics, 24, 223–242. doi:10.1093/
applin/24.2.223.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam, The Netherlands:
John Benjamins.
Partington, A. (1998). Patterns and meanings. Amsterdam, The Netherlands: John
Benjamins.
Sinclair, J. (1966). Beginning the study of lexis. In C. Bazell, J. Catford, M. A. K.
Halliday, & R. Robins (Eds.), In memory of J. R. Firth (pp. 410–430). London,
England: Longman.
Sinclair, J. (1991). Corpus, concordance and collocation. Oxford, England: Oxford
University Press.
Sinclair, J. (1996). The search for units of meaning. Textus, 9, 75–106.
Sinclair, J. (2004a). The lexical item. In J. Sinclair, & R. Carter (Eds.), Trust the text:
Language, corpus and discourse (pp. 131–148). London, England: Routledge.
Sinclair, J. (2004b). Lexical grammar. In J. Sinclair, & R. Carter (Eds.), Trust the text:
Language, corpus and discourse (pp. 164–176). London, England: Routledge.
Sinclair, J. (2006). Collins COBUILD advanced learner’s English dictionary (5th ed.).
Glasgow, Scotland: Harper Collins Publishers.
Sinclair, J., Jones, S., & Daley, R. (1970). The OSTI report. Birmingham, England:
University of Birmingham.
Sinclair, J., & Jones, S. (1974). English lexical collocations: A study in computational
linguistics. In J. Foley (Ed.), J. M. Sinclair on lexis and lexicography (pp. 110–128).
Singapore: University of Singapore Press.
Smadja, F., & McKeown, K. (1991). Using collocations for language generation.
Computational Intelligence, 7, 229–239. doi:10.1111/j.1467-8640.1991.tb00397.x.
Stewart, D. (2009). Semantic prosody: A critical evaluation. London, England:
Routledge.
Stubbs, M. (1995). Collocations and semantic profiles: On the cause of the trouble with
quantitative studies. Functions of Language, 2, 23–55.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford,
England: Blackwell.
Stubbs, M. (2009). The search for units of meaning: Sinclair on empirical semantics.
Applied Linguistics, 30, 115–137. doi:10.1093/applin/amn052.
Summers, D. (2003). Longman dictionary of contemporary English (4th ed.). Harlow,
England: Longman.
Thornbury, S. (2002). How to teach vocabulary. Harlow, England: Longman.
Walker, C. (2008). A corpus-based study of the linguistic features and processes which
influence the way collocations are formed (Unpublished doctoral dissertation).
University of Birmingham, Birmingham, England.
Walker, C. (2009). The treatment of collocations by learners’ dictionaries,
collocational dictionaries and dictionaries of business English. International
Journal of Lexicography, 22, 281–299. doi:10.1093/ijl/ecp016.
Wehmeier, S. (2005). Oxford advanced learner’s dictionary (7th ed.). Oxford, England:
Oxford University Press.


Woolard, G. (2000). Collocations: Encouraging learner independence. In M. Lewis
(Ed.), Teaching collocations (pp. 28–46). Hove, England: Language Teaching
Publications.

TESOL QUARTERLY