Achieve, attain and accomplish from a corpus-based perspective

VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POSTGRADUATE STUDIES

LÊ THỊ THU HỒNG

“ACHIEVE”, “ATTAIN” AND “ACCOMPLISH”
FROM A CORPUS-BASED PERSPECTIVE

“Achieve”, “attain” và “accomplish”
dưới góc nhìn của phương pháp khối liệu

M.A. MINOR PROGRAM THESIS

Field: English Linguistics
Code: 60220201
Supervisor: Dr. Trần Thị Thu Hiền

HANOI, 2017

DECLARATION OF AUTHORSHIP

I hereby declare that the thesis entitled “ACHIEVE, ATTAIN AND
ACCOMPLISH FROM A CORPUS-BASED PERSPECTIVE” is the result of my
own study. It was conducted under the scientific guidance of Dr. Trần Thị Thu Hiền.
The data and conclusions of the study presented in the thesis have never been
published in any form.


ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my supervisor, Dr. Trần Thị Thu
Hiền, for her immense support and invaluable guidance, without which my study
would be far from finished. I am also grateful to all the lecturers and staff
at the Faculty of Postgraduate Studies, University of Languages and International
Studies, Vietnam National University, Hanoi. Their support and consideration
have enabled me to pursue the course. Last but not least, my sincere thanks go to
my beloved family for their love, encouragement and support while I was
conducting this research.


ABSTRACT

This descriptive research uses corpus linguistic methods to differentiate the
three synonymous verbs achieve, attain and accomplish by identifying their
similarities and differences in meaning and usage. The data collection
instruments include two large corpora, namely the Corpus of Contemporary
American English and the Collins Wordbank Online, and six dictionaries.
Statistical analysis combined with intuition-based interpretation of the data
reveals three main findings: (1) the three verbs have both overlapping and
exclusive senses, whose frequencies differ across the words; (2) regarding
register, all three verbs are most frequent in academic journals, even though
accomplish has a lower level of formality than the other two; and (3) in terms of
collocational properties, despite a few shared collocates, each verb tends to
co-occur with a distinctive group of nouns as its object.
Keywords: near-synonym, corpus linguistics, word sense, collocation



TABLE OF CONTENTS
DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
1.1. Rationale
1.2. Aim and objectives of the study
1.3. Research questions
1.4. Research methods
1.5. Scope of the study
1.6. Significance of the research
1.7. Organization of the study
CHAPTER 2. LITERATURE REVIEW
2.1. Synonymy
2.1.1. Synonymy as absolute synonymy
2.1.2. Synonymy as near-synonymy
2.1.3. Near-synonymic differences
2.2. Corpus linguistics
2.2.1. Corpus
2.2.2. Corpus linguistics
2.2.3. Corpus linguistics in synonymy study
2.3. Previous studies
CHAPTER 3. METHODOLOGY
3.1. Research approaches
3.2. Data sources
3.3. Data collection procedure
3.3.1. Phase 1 - Word senses and frequencies of senses
3.3.2. Phase 2 - Register
3.3.3. Phase 3 - Collocational properties
CHAPTER 4. FINDINGS AND DISCUSSION
4.1. Word senses and frequencies of senses
4.1.1. Word senses
4.1.2. Frequencies of senses
4.2. Register
4.3. Collocational properties
4.3.1. Preferred collocation
4.3.2. Less preferred and anti-collocation
CHAPTER 5. CONCLUSION
5.1. Concluding remarks
5.2. Implications
5.3. Limitations of the study and recommendations for further research
REFERENCES
APPENDIX


LIST OF TABLES
Table 2.1. Dimensions of denotation variations
Table 4.1. Dictionary senses of achieve, attain and accomplish
Table 4.2. Sense distribution of achieve, attain and accomplish
Table 4.3. Frequencies of achieve, attain and accomplish in different genres
Table 4.4. Top mutual collocates of achieve, attain and accomplish
Table 4.5. Top object collocates of achieve only, attain only and accomplish only


LIST OF FIGURES
Figure 2.1. Classification of synonymic difference by Edmonds (1999) and Edmonds and Hirst (2002)
Figure 2.2. Gove’s (1973) entry (abridged) for the near-synonyms of lie
Figure 3.1. Corpus command for frequencies on the COCA (screenshot)
Figure 3.2. Command for collocation in the CWO (screenshot)
Figure 4.1. The proportion of tokens in different genres for achieve, attain and accomplish
Figure 4.2. Sketch difference of objects between achieve and attain
Figure 4.4. Sketch difference of objects between attain and accomplish
Figure 4.5. Summary of preferred, less preferred and anti-collocates of achieve, attain and accomplish


CHAPTER 1. INTRODUCTION
1.1. Rationale
More than a linguistic instrument, English, as a world language, broadens its
learners’ horizons and exposes them to various perspectives. Hence, the teaching
of English as a second or foreign language has never ceased to be vital. Indeed,
it is hardly surprising that English as a school subject accounts for more
teaching hours in classrooms all over the world than any other subject.
Vietnam, in the process of renovating its education and particularly its
English language teaching, has placed emphasis on the development of teachers’
and learners’ proficiency in the language.
As a Vietnamese learner and teacher of English, the author has recognized the
difficulties that non-native speakers face in understanding and using the correct
vocabulary in different contexts. This challenge becomes even more significant
when it comes to choosing among confusing synonyms. Among the various groups of
confusing synonyms, achieve, attain and accomplish appear to be among the most
challenging for English learners as well as for the author. Are they completely
the same in meaning? If not, how are they different? In which aspects do they
resemble or differ from each other? Motivated by the desire to better understand
this issue, the author intends to investigate the meanings and usages of these
often-misused synonyms.
Since the arrival of information technologies and the development of
computers, corpora have evolved into enormous electronic collections of
authentic texts which provide invaluable insights into the distribution of words
in a language. They can assist language researchers as well as language users,
especially non-native ones, in differentiating between near-synonyms on the basis
of their patterns of distribution, retrieved automatically from corpora. This
encourages the author to exploit this promising tool to examine the
near-synonyms mentioned.


1.2. Aim and objectives of the study
The study aims to distinguish the three synonyms achieve, attain and
accomplish from a corpus-based perspective, as an illustrative example of one way
to identify the nuances of meaning and usage between synonymous words.
It is the author’s assumption that the corpus linguistic approach applied in the
study can show that achieve, attain and accomplish are near-synonyms which
have overlapping senses as well as distinct shades of meaning. They may also be
similar or different in terms of usage, more specifically in their genre
preferences and collocational behavior.
Given this aim and assumption, the objectives of the study are
to (1) identify the overlapping and exclusive senses of each synonym and the
frequencies of these senses, (2) determine the genre preferences of each verb, and
(3) describe and compare the collocational properties of the target verbs.
1.3. Research questions
In order to fulfil the above objectives, the study addresses the following
research questions:
(1) What are the similarities and differences in word senses and frequencies of
senses of achieve, attain and accomplish?
(2) What are the similarities and differences in register of achieve, attain and
accomplish?
(3) What are the similarities and differences in collocational properties of
achieve, attain and accomplish?
1.4. Research methods
The study relied on a corpus-based approach, with data collected from two
large corpora (the Corpus of Contemporary American English and the Collins
Wordbank Online) and six dictionaries. Dictionary consultation revealed
similarities and differences in the senses of achieve, attain and accomplish,
while an analysis of concordances from the corpora showed the frequencies of these
senses. The two corpora were then used to extract data on the register of the
synonymous verbs and, finally, to compare their collocational patterns. The
results were analyzed and interpreted by the author.


1.5. Scope of the study
This study does not aim to be an extensive account of all aspects of
synonymy. Rather, it covers only some of the most practical patterns of usage,
including word senses and sense frequencies, style or register, and collocational
behavior. In addition, given space constraints, the investigation is limited to
three specific verbs, achieve, attain and accomplish, and excludes the other
members of their synonym cluster. Similarly, variation among different varieties
of English is not considered, for that would complicate the comparison. In terms of
collocational properties, owing to the same space restriction, only collocating
objects are analyzed.
1.6. Significance of the research
This piece of research is significant in the context of both English linguistics
research and English teaching and learning in Vietnam for two reasons.
Firstly, while the number of studies applying corpora in linguistics in general and
in lexical semantics in particular is soaring worldwide, such research in Vietnam
is still very limited. In fact, to the best of the author’s knowledge, very little
research on similar topics has been done in Vietnam, which underlines the need
for this study. Secondly, this research and its results would, in the author’s
opinion, facilitate the teaching and learning of English for EFL teachers and
learners. Teachers of English often find themselves asked questions about
synonyms such as “How are these words different?”, “Are they the same in every
situation?”, “Can they substitute for each other?”, etc. More often than not,
they may have no better answer than “It is just the way it is”, which can hardly
improve students’ language ability. This research, especially its methodology and
results, illustrates a promising tool and method for synonym differentiation for
EFL teachers and students.
1.7. Organization of the study
The study consists of five chapters. Chapter One gives a brief overview of
the study, including the rationale, aim and objectives, research questions, research
methods, the scope of the study and the significance of the research. Chapter Two
presents the literature review on corpus linguistics and synonymy study, with an
emphasis on the framework for this research. Chapter Three outlines the research
methodology, with a detailed description of the research approach, data
collection instruments and data collection procedure. Chapter Four presents the
findings from the dictionary and corpus data and the analysis of those findings.
Finally, Chapter Five sums up and interprets the findings described in the
previous chapter and points out the relevance of this research to teaching and
learning English in general and vocabulary in particular; it also discusses the
limitations of the study and makes recommendations for further research.


CHAPTER 2. LITERATURE REVIEW
This chapter provides an overview of synonymy and synonymic difference, along
with the literature on the corpus-based approach to synonymy study.
2.1. Synonymy
Defining synonymy has long been challenging. If synonyms are understood as
words with exactly the same meaning, it can be argued that such words do not
exist in any language. However, if the scope of synonyms is widened to include
words with merely similar meanings, virtually any pair of words could be
considered synonyms at some level. This paradox arises from the many different
approaches to defining synonymy, which are reviewed in this section.
2.1.1. Synonymy as absolute synonymy
Some linguists, such as Lyons (1977), have treated synonyms as absolute
synonyms, that is, words which are interchangeable in all possible contexts without
any change in meaning. This view, however, has been challenged by Quine (1951), on
the grounds that it is impossible to determine whether the expressions before and
after substitution have the same meaning. From a different angle, Goodman (1952)
claims that no two words can have the same meaning, for there will always be
some contexts in which two putative synonyms are not completely interchangeable.
Even if absolute synonyms are arguably possible, pragmatic and empirical evidence
shows that they are very rare. Clark (1992), in her Principle of Contrast, holds that
language constantly changes to eliminate absolute synonyms: if an absolute synonym
did not take on new nuance(s) of meaning, it would fall into disuse.


2.1.2. Synonymy as near-synonymy
It is widely agreed that absolute synonyms are virtually non-existent.
However, there are nearly absolute synonyms which can substitute for each other in
context with only minor differences in the overall expression. Lexicographers
clearly acknowledge that synonymy is a matter of degree, given that every
dictionary of synonyms in fact differentiates between near-synonyms. Synonymy is
defined in terms of similarity in meaning, although how similar the meanings must
be is still a matter of debate. Traditionally, synonyms are defined as closely
related words that differ in minor ways, but a broader definition includes words
with merely one or more related characteristics of meaning (Egan, 1973). To be
specific, Roget applied the principle of “the grouping of words according to
ideas” (Chapman, 1992), while the lexicographers of Webster’s New Dictionary of
Synonyms used the following more precise definition (as in Edmonds, 1999):
A synonym, in this dictionary, will always mean one of two or more words in the
English language which have the same or very nearly the same essential meaning
[…] the two or more words which are synonyms can be defined in the same terms
up to a certain point.
(Gove, 1973)

Ultimately, how open the definition of synonym is in each dictionary depends on
the dictionary’s purpose: Roget’s Thesaurus is likely to be better for word
searching, whereas Webster’s New Dictionary of Synonyms appears to be superior
for word discrimination. Because this study aims to identify the differences
among near-synonyms, the latter serves as one effective tool for data collection.

Like lexicographers, semanticists appear to agree that synonymy is a matter of
degree. Ullmann (1962) defined near-synonyms as having similar “objective”
meaning but possibly different emotive, stylistic or dialectal meaning. Lyons
(1995) argued that near-synonyms are “more or less similar, but not identical in
meaning”. He also drew a distinction between near-synonyms and partial synonyms,
though it is not clear why. Partial synonyms fail to qualify as absolute synonyms
because either they are not “complete”, i.e. not identical “on all dimensions of
meaning”, or they are not “total”, i.e. not “synonymous in all contexts” (Lyons,
1981). For example, big and large are partial synonyms because, despite being
complete synonyms, they are not total synonyms: a big mistake is fine
whereas a large mistake is unacceptable.
Giving a more precise definition of synonymy, Cruse (1986) differentiated two
kinds of near-synonymy which roughly correspond to Lyons’ classification. One
is cognitive synonyms, which are words that have the same truth conditions but
differ in expressive meaning, style or register, such as fiddle and violin. The
other is plesionyms, which are lexical items that do not have exactly the same
truth conditions but still yield semantically similar expressions, for instance
foggy and misty.
Unfortunately, this distinction seems impractical for determining synonym
differences, for it only covers propositional meaning, one among the many types
of synonym variation. Moreover, having two definitions of synonyms would only
complicate the categorization.
To address this problem, Edmonds (1999) introduced the concept of
granularity into the definition of synonymy, that is, the level of detail used to
describe or represent the meanings of words. Because it makes it possible to mark
the difference between the essential and peripheral meanings of a word, this
concept helps construct a more rigorous definition of synonym. However, it is
still difficult to set a benchmark for the appropriate level of granularity in the
representation of word meaning at which to define near-synonyms precisely. In an
attempt at a rigorous definition of near-synonymy, Edmonds proposed that:
Near-synonyms preserve truth conditions, or propositional meaning, to a level of
granularity of representation consistent with language independence in most
contexts when interchanged.

In the same vein, DiMarco, Hirst and Stede (1993) (in Edmonds and Hirst, 2002)
claimed that near-synonyms are words that are close, but not identical, in
meaning. They “vary in their shades of denotation, connotation, implicature,
emphasis or register”. Similarly, Inkpen and Hirst (2006) emphasized that
near-synonyms are not completely interchangeable but differ in denotational or
connotational meaning; they may also vary in grammatical or collocational behavior.
Overall, these notions can hardly settle the debate on synonymy, but they provide
theoretical implications for lexical semantics.
2.1.3. Near-synonymic differences
As presented in the previous section, it is generally agreed that absolute
synonyms virtually do not exist. Synonymy is widely understood as near-synonymy,
for which examples can be found easily. Thin, slim and skinny all denote a body
shape; however, while thin carries a neutral tone, slim and skinny convey,
respectively, a positive and a negative attitude on the part of the speaker.
Similarly, pissed, drunk and inebriated are, respectively, informal, neutral and
formal expressions of the same denotation, which is being affected by alcohol to
the extent of losing control of one’s faculties or behavior, according to the
Cambridge English Dictionary.

In any discussion of near-synonyms, the most discussed concept is
synonymic difference (Edmonds and Hirst, 2002), for there must be some
distinctions between two putative synonyms that make them non-identical. As
illustrated by the examples above, near-synonyms may differ not only in
denotational meaning but also in other aspects of their meaning. Understanding
synonym differences is crucial in language use, especially for EFL learners, who
usually lack native linguistic intuition in word selection.
There are multiple ways in which synonyms can differ. Cruse (1986) lists
four broad types of difference in synonymic meaning:
- denotational or propositional meaning
- stylistic meaning (dialect and register)
- expressive meaning (affect, emotion and attitude), and
- presupposed meaning (selectional and collocational variations)
DiMarco, Hirst and Stede (1993) investigated synonyms in terms of semantic
and stylistic distinctions, i.e. denotational and connotational differences. However,
this categorization does not seem precise enough. Denotation refers to the literal,
explicit meaning of a word, while connotation covers every other aspect of
meaning. This makes the term connotation too broad and ambiguous to serve as a
criterion for distinguishing synonyms.
With a classification broadly similar to Cruse’s, Gove (1973)
argues that synonyms may have distinctions in:
- implications
- connotations, and
- applications

Gove’s criteria include both propositional and peripheral meaning; however, it is
unclear why he did not include stylistic difference in the categorization despite
his extensive discussion of the matter. All of the above classifications are
combined by Edmonds (1999) and Edmonds and Hirst (2002) to develop a categorization
of synonymic differences with more sub-classes. This categorization also comprises
four main types of variation, which are illustrated in Figure 2.1.
Denotational variation among near-synonyms has proved to be the most
complicated to sort out. It involves differences not only of simple features but of
“full-fledged concepts or ideas”, in relation to the roles and aspects of a context.
According to Edmonds (1999), many of the concepts or ideas in which near-synonyms
differ can be considered dimensions of variation, such as continuous or binary
dimensions, different phases of a process, reference to world knowledge, etc. See
Table 2.1 for examples of synonyms with different dimensions of variation. Within
the limited scope of the study, the author does not go into such a detailed
classification but simply treats the different dimensions of denotational variation
as denotational variations, or nuances in word senses.
Among the variations in manner of expression, the aspect of denotational
variation most relevant to this study is the synonymic difference in frequencies of
senses. This is the frequency with which a synonym expresses a specific sense in
real language use, which dictionaries usually indicate with frequency terms such as
always, often, usually, etc. However, such frequency terms cannot adequately
specify how similar or different the frequencies of expression are between two
synonyms. Take Gove’s (1973) entry for the near-synonyms of lie (shown in
Figure 2.2) as an example. The frequency terms “usually”, “often” and “sometimes”
give only a very vague idea of the words’ frequencies of senses, i.e. dictionary
users can hardly determine which sense of which word is more prominent or common
than the others.


[Figure 2.1: tree diagram of the four main types of synonymic difference.
DENOTATIONAL VARIATION: fine-grained technical variations; abstract denotational
variations; manner of expression (indirectness, emphasis, frequency of sense);
dimensions of denotational variation (basic dimension, complex dimension,
specificity, fuzzy and overlapping words).
STYLISTIC VARIATION: dialect; register.
EXPRESSIVE VARIATION: emotive aspects; attitudinal aspects.
STRUCTURAL VARIATION: collocational aspects; syntactic aspects.]
Figure 2.1. Classification of synonymic difference by Edmonds (1999) and
Edmonds and Hirst (2002)


[Table 2.1: only the row labels are recoverable: continuous dimension; binary
dimension; multi-value dimension; complex (process); complex (world-knowledge);
complex (inference); specificity; extensional overlap; fuzzy overlap.]
*The first term is the most general term
Table 2.1. Dimensions of denotation variations
Lie is usually felt to be a term of extreme opprobrium because it implies a flat and unquestioned
contradiction of the truth and deliberate intent to deceive or mislead.
Falsehood may be both less censorious than lie and wider in its range of application…. Like lie,
the term implies known nonconformity to the truth, but unlike lie, it does not invariably suggest a
desire to pass off as true something known to be untrue.
Untruth is often euphemistic for lie or falsehood and may carry similar derogatory implications.
… Sometimes, however, untruth may apply to an untrue statement made as a result of ignorance or
a misconception of the truth.
Fib is an informal or childish term for a trivial falsehood; it is often applied to one told to save
one’s own or another’s face.
Misrepresentation applies to a misleading and usually an intentionally or deliberately misleading
statement which gives an impression that is contrary to the truth.

Figure 2.2. Gove’s (1973) entry (abridged) for the near-synonyms of lie
This is where corpus linguistic methodology proves promising, for it provides
statistical data which are likely to ease the process of calculating frequencies of
expression. This is discussed in more detail in Chapter 3.
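
To make the idea of frequencies of senses concrete, the following is a minimal
sketch of how the share of each sense could be computed once a sample of
concordance lines has been labelled by hand. It is an illustration only, not the
procedure used in this study: the sense labels, the sample and the function name
sense_distribution are hypothetical.

```python
from collections import Counter

# Hypothetical hand-labelled concordance sample: (verb, sense label).
# Both the labels and the counts are invented for illustration.
annotated_lines = [
    ("achieve", "succeed in reaching a goal"),
    ("achieve", "succeed in reaching a goal"),
    ("achieve", "succeed in reaching a goal"),
    ("achieve", "gain a status or level"),
    ("attain", "succeed in reaching a goal"),
    ("attain", "gain a status or level"),
]


def sense_distribution(lines):
    """Return {verb: {sense: percentage of that verb's labelled lines}}."""
    counts = {}
    for verb, sense in lines:
        counts.setdefault(verb, Counter())[sense] += 1
    distribution = {}
    for verb, sense_counts in counts.items():
        total = sum(sense_counts.values())
        distribution[verb] = {
            sense: round(100 * n / total, 1) for sense, n in sense_counts.items()
        }
    return distribution


distribution = sense_distribution(annotated_lines)
for verb, shares in distribution.items():
    print(verb, shares)
```

Percentages of this kind replace vague dictionary labels such as “usually” or
“often” with figures that can be compared directly across synonyms.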

Another notable point in synonym differentiation is stylistic variation,
which includes dialect and stylistic tone, or register. While dialectal differences
relate closely to language users, register variation is more closely associated
with the setting in which a text occurs, which makes it feasible to compare on the
basis of corpus data. Register dimensions such as formality are absolute and can be
compared on the same finite scale, with a range of possible values from low to high.
For example, pow wow appears in informal contexts, while meeting appears in neutral
contexts and assembly in more formal ones.
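
A comparison of register on corpus data can be sketched as follows. Because the
genre sections of a corpus differ in size, raw counts are commonly converted to a
rate per million words before they are compared. The counts and section sizes
below are hypothetical placeholders rather than figures from any corpus, and the
function name per_million is introduced for illustration only.

```python
def per_million(raw_count, section_size):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / section_size


# Hypothetical data for one verb: genre -> (raw count, words in that section).
sections = {
    "spoken": (1_200, 100_000_000),
    "academic": (5_400, 90_000_000),
}

normalized = {
    genre: round(per_million(count, size), 1)
    for genre, (count, size) in sections.items()
}

# The genre with the highest normalized rate is the verb's preferred genre.
preferred_genre = max(normalized, key=normalized.get)
print(normalized, preferred_genre)
```

Comparing normalized rates rather than raw counts prevents a larger section from
appearing to favor a word simply because it contains more text.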
Distinguishing synonyms also involves expressive variation, which consists of
two main categories of difference. One concerns the speaker’s emotions, and the
other the speaker’s attitude or judgement toward the referent. However, this
cannot be judged from the corpus data and is therefore not studied within the scope
of this research.
Finally, near-synonyms may differ in their structural patterns, i.e. in their
collocational and syntactic behavior. In terms of syntactic behavior, near-synonyms
can differ in their grammatical patterns. For instance, John teaches tricks to the
dog is acceptable while *John instructs tricks to the dog is not. Collocational
variation, on the other hand, concerns the words which can combine with the word in
question. For example, one can say make a cake but not *do a cake.
This notion of collocational variation overlaps with the co-occurrence approach,
which is based on the assumption that the semantic and functional traits of a lexical
item can be shown through its distributional characteristics. This assumption can be
traced back to Firth’s famous saying in 1957, “you shall know a word by the
company it keeps”. Similarly, Bolinger (1968) claimed that a difference in syntactic
form always indicates a difference in meaning. Harris (1970) endorsed this
assumption when he asserted:
If we consider words or morphemes A and B to be more different in meaning than
A and C, then we will often find that the distributions of A and B are more different
than the distributions of A and C. In other words, difference of meaning correlates
with difference of distribution.

Cruse (1986) also stated that “the semantic properties of a lexical item are
fully reflected in appropriate aspects of the relations it contracts with actual and
potential contexts”.
This theory has been the underlying logic for a great many studies on
synonymy, in which collocational distribution and/or syntactic distribution is
exploited. Some of these studies are Church et al. (1998) on strong and powerful,
Partington (1998) on absolutely, completely, and entirely, and Biber et al. (1998) on
big, large and great. In these studies, differences in the collocational properties of
the words under study indicate differences in their meanings. To be specific, these
meaning differences were interpreted from the distribution of formal elements
around the words within the contexts provided by the corpora.
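The kind of collocational distribution these studies rely on can be illustrated with a minimal sketch. It assumes a simple fixed-size window around the node word and a whitespace-tokenized text; the function name `collocates`, the window size and the toy sentence are all hypothetical choices made for illustration and are not taken from any of the cited studies or from real corpus data.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented toy text for illustration only (NOT real corpus data).
text = ("she drank strong tea and he drove a powerful car "
        "they like strong tea but need a powerful engine")
tokens = text.split()

strong = collocates(tokens, "strong", window=1)
powerful = collocates(tokens, "powerful", window=1)

# Collocates shared by the two near-synonyms in this toy text.
shared = set(strong) & set(powerful)
```

In this toy text the two adjectives share no immediate neighbours, which is the sort of distributional contrast that corpus-based synonym studies interpret as a difference in meaning. Real analyses would typically add lemmatization and a statistical association measure on top of raw counts.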
Following Pearce (2001), this study will look at collocations as a set of
three joint subclasses, which are:
- preferred collocation (words which are collocates of the target word),
- less-preferred collocation (words which tend not to be used with the target
word although, if used, they do not lead to unnatural readings), and
- anti-collocation (words which must not be used with the target word since
they will lead to unnatural readings).
This classification by Pearce, in the view of the author of this study, offers a
clearer path into the investigation of near-synonyms’ collocations in light of the
co-occurrence approach.
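One possible way to put the three subclasses into practice is sketched below. This is only an illustrative assumption, not Pearce’s own procedure: the function `classify`, the 0.5 share threshold, and the placeholder verb names and counts are all hypothetical. The sketch compares how often one candidate collocate occurs with each member of a synonym set.

```python
def classify(counts_by_word, preferred_share=0.5):
    """Label each synonym's relation to one collocate from co-occurrence counts.

    Hypothetical rule, for illustration only:
      - zero co-occurrences       -> anti-collocation (candidate only)
      - share >= preferred_share  -> preferred
      - otherwise                 -> less-preferred
    """
    total = sum(counts_by_word.values())
    labels = {}
    for word, n in counts_by_word.items():
        if n == 0:
            labels[word] = "anti-collocation (candidate)"
        elif n / total >= preferred_share:
            labels[word] = "preferred"
        else:
            labels[word] = "less-preferred"
    return labels


# Invented counts for one hypothetical collocate (NOT corpus findings).
example = {"verb_a": 80, "verb_b": 20, "verb_c": 0}
labels = classify(example)
```

A zero count is marked only as a candidate because absence from a corpus does not by itself show that a combination is unnatural; that judgement would still need a larger sample or native-speaker confirmation.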
Overall, the four main types of synonym variation are denotational, stylistic,
expressive and structural. This categorization will be applied in the analysis to
identify meaning distinctions among the three verbs achieve, attain and
accomplish.
In summary, in order to find the similarities and differences among the three
verbs achieve, attain and accomplish, the study is based on Edmonds’ classification
of synonymic differences, with a focus on three aspects, namely denotational
(word senses and frequencies of senses), stylistic and collocational variation.


2.2. Corpus linguistics
2.2.1. Corpus
First and foremost, it is necessary to define corpora, which have been
described in numerous ways during their decades of development. One definition
is by Kennedy (1998), who identifies a corpus as “a collection of texts in an
electronic database”. This definition seems to overlook one of the most important
characteristics of corpora: that they are designed to be representative of a language
and balanced across its varieties (Gries, 2009). This means that a corpus should
reflect all the different linguistic varieties in the proportions in which they occur
in the language. Even though such a theoretically ideal corpus design is a challenge
not yet overcome by corpus compilers, corpora are anything but random collections
of texts. Leech (1992) defines corpora more strictly as “generally assembled with
particular purposes in mind, and are often assembled to be representative of some
language or text type”. However, this definition is still not strict enough, for it
misses one criterion that a text must meet to qualify for a corpus. All texts that form
a corpus must have occurred in natural communicative settings, not have been
formulated for the sole purpose of being gathered into a corpus. Covering all these
criteria, McEnery, Xiao and Tono (2006) propose a more satisfying definition: a
corpus is “a collection of machine-readable authentic texts which is sampled to be
representative of a particular language or language variety”. It is important to note
that, unlike their paper-based predecessors, modern electronic corpora in
combination with corpus software have immense advantages in language study,
such as easy manipulation of data at minimal cost, accurate data processing and
reduced human bias. Crystal (1985) adds that this collection of data can be used
“as a starting point of linguistic description or as a means of verifying hypotheses
about a language”.
Another point to cover is the various types of corpora. In fact, corpora differ
in various ways. First, there are general corpora, which depict a language as a whole,
and specific corpora, which represent only a particular variety of the language.
Second, diachronic corpora and synchronic corpora differ in terms of their time
span: the former cover changes over time while the latter only provide language
data at one specific point in time. Another distinction is between monolingual and
parallel corpora, which provide texts in one language and in multiple languages
respectively. Finally, corpora can differ in terms of whether they are fixed in size.
A corpus which stays the same once created is static, while one which is constantly
extended with updated data is dynamic.
2.2.2. Corpus linguistics
It is still a matter of debate whether corpus linguistics is a branch of
linguistics or a methodology. On the one hand, it is said that corpus linguistics has
become an independent ‘philosophical approach’ (Leech, 1992); on the other hand,
it is considered a methodology that is not restricted to a particular aspect of
language (McEnery et al., 2006). Corpus linguistics treats ‘naturally occurring’
language as a credible source for the investigation and classification of linguistic
structures. Similarly, Hanks (2008) states that corpus linguistics is primarily
concerned with interpreting observed language in order to arrive at statements on
patterns in word meaning or syntactic composition. Gries (2009) lists a number of
areas in which corpus linguistics aids investigation:
- Phonology: how far the degree of phonological assimilation or reduction of an
expression can be predicted from its components’ frequency of co-occurrence, as
in Bybee and Scheibman (1999)
- Morphology: what regular and irregular verb forms suggest about the
probabilistic nature of the linguistic system, as in Baayen and Martin (2005)
- Syntax: how to predict which syntactic choice speakers will make, as in Leech et
al. (1994)
- Semantics and pragmatics: how near-synonyms differ from each other, as in
Okada (1999), Oh (2000), Gast (2006) and Gries and David (2006).
2.2.3. Corpus linguistics in synonymy study
In order to determine the similarities and differences between synonyms, one
could consult a number of sources (Edmonds, 1999). The first is one’s own intuition;
however, this could be too biased to produce reliable synonym distinctions.
Secondly, it could be helpful to consult dictionaries, the much less biased work of
generations of lexicographers. Although dictionary definitions and usage notes serve
as a decent source of data for synonym comparison, this source alone is not in-depth
and detailed enough. A more fruitful source for analysis is raw text corpora. As
presented earlier, corpora with their powerful computer databases and language
analysis tools enable researchers to judge word behavior in a myriad of authentic
contexts. This opens the door to inferring the meanings of words from their
repeated syntactic and collocational patterns. Therefore, it is reasonable to use
corpora as a source of data for investigating synonyms. This is advocated by Church
et al. (1994), who claim that collocational and constructional similarity collected
from corpora can be used to investigate semantic relations like synonymy and
antonymy. A review of past studies on synonymy in light of the corpus linguistic
approach can be found in 2.4.
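To illustrate how a corpus tool lets the researcher inspect a word in its authentic contexts, the following is a minimal sketch of a keyword-in-context (KWIC) concordance. It assumes a whitespace-tokenized text; the function name `kwic`, the context width and the example sentence are hypothetical and are not drawn from the corpus used in this study.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented example sentence for illustration only (NOT corpus data).
tokens = "They hope to achieve their goals and to achieve lasting peace".split()
lines = kwic(tokens, "achieve", width=2)

for left, node, right in lines:
    # Align the node word in a central column, as concordancers usually do.
    print(f"{left:>12}  {node}  {right}")
```

Reading down the aligned node column makes repeated patterns to the left and right of the word easy to spot, which is the basis for the collocational and syntactic observations discussed above.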
2.3. Previous studies
In Vietnam, corpus-based linguistic research is still limited in number, and
much of it concerns languages other than English. For example, Dao (2011)
emphasized the importance of corpus linguistics and corpus technology in
teaching and learning Vietnamese as a foreign language, and Nguyen (2016) studied
variation modes of speech sounds in Vietnamese by using a Sino-Vietnamese corpus
of yuanyun. Studies concerning English language phenomena seem scarce; one
example is Luu (2016), a critical discourse analysis of power relations in The New
York Times’ reconstruction of global climate change conferences. Despite
considerable effort to find previous studies in Vietnam on synonymy in light of the
corpus approach, the author has not been able to find one.
On the international scale, research on synonymy within corpus linguistics is
much easier to find. In fact, the number of studies on near-synonymy has surged
during the last few decades along with the arrival of electronic corpora and
computerized language tools. One of the most well-known is Kennedy’s (1991)
study of between
