VIETNAM NATIONAL UNIVERSITY
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF GRADUATE STUDIES
NGUYỄN THỊ VÂN HẠNH
LEXICAL AND MORPHOLOGICAL CHARACTERISTICS OF
MEDICO-PHARMACEUTICAL TEXTS AND PEDAGOGICAL
IMPLICATIONS
(NHỮNG ĐẶC ĐIỂM VỀ MẶT TỪ VỰNG VÀ HÌNH THÁI HỌC
CỦA CÁC VĂN BẢN Y-DƯỢC VÀ ỨNG DỤNG
TRONG GIẢNG DẠY)
M.A. Combined Programme Thesis
Field: English linguistics
Code: 602215
HANOI, APRIL 2008
Supervisor: Dr. Kiều Thu Hương
STATEMENT OF AUTHORSHIP
This work contains no material which has been accepted for the award of any other
degree in any university or other tertiary institution and, to the best of my knowledge and
belief, contains no material previously published or written by another person, except where
due reference has been made in the text.
Hanoi, April 2008
Nguyen Thi Van Hanh
ACKNOWLEDGEMENTS
I would like first and foremost to express my sincere and deep gratitude to my supervisor,
Dr. Kieu Thi Thu Huong, for her deliberate guidance and invaluable critical feedback and
suggestions during the writing of this study. Her constant support, encouragement and
patience are highly appreciated. But for her help, this work would not have been
completed.
I would like to take this opportunity to express my sincere thanks for the support and
encouragement from Assoc. Prof. Dr. Le Hung Tien toward the completion of my thesis.
I would also like to thank all teachers from the English Department at Hanoi University
of Pharmacy for their unconditional support and their useful ideas for my study.
Particularly, I owe my thanks to Mrs. Nguyen Do Thu Hoai, Head of the English
Department, who has continuously encouraged me and shared with me her experience
relating to teaching and learning ESP at HUP.
My appreciation also goes to the professors who participated in my inter-rater reliability
check, for their valuable feedback.
I am also indebted to all other people whose suggestions, support and encouragement
have contributed to the completion of my thesis.
ABSTRACT
English for pharmacy at Hanoi University of Pharmacy (HUP) has been taught for three
decades; however, there has been little empirical research on medico-pharmaceutical
English texts which are used for this English for Specific Purposes (ESP) course. This
research has been conducted in order to provide teachers and students at HUP with a
detailed analysis of the lexical and morphological characteristics of the corpus of texts
they are working with and to draw implications for teaching and learning.
To achieve the above aims, this corpus-based study investigates lexical characteristics of
the corpus of medico-pharmaceutical texts used in a pilot ESP course at HUP. This is
carried out by classifying vocabulary into four levels using primarily the RANGE
program (Nation, 2006) and the four-point rating scale by Chung and Nation (2003), and
by exploring the morphological characteristics of this ESP corpus mainly with the Simple
Concordance Program (Reed, 1997-2008). The results show that the size and the
coverage of technical vocabulary are in line with the results of previous similar
studies, strongly suggesting that the coursebook materials are manageable for
students. The morphological analysis presents the frequency, origin, formation, meanings
and functions of the most frequently used affixes in the corpus, revealing that many
technical words in the corpus share the same origin and formation by means of their
affixes. The morphological characteristics are, therefore, important in helping students
to acquire technical vocabulary.
The results of the lexical and morphological analyses in this study suggest
various implications for course design, materials evaluation, and materials development,
as well as for teaching, learning, and testing ESP at HUP in particular and for EFL
teaching and learning more widely. The tools and methods employed in this study
are also intended to assist teachers and researchers in the field of ESP to deal with
technical vocabulary.
TABLE OF CONTENTS
Acknowledgements
Abstract
PART ONE: INTRODUCTION
1. Rationale
2. Aims of the study
3. Research questions
4. Research methods
5. Scope of the study
6. Significance of the study
7. Structure of the thesis
PART TWO: DEVELOPMENT
CHAPTER 1: THEORETICAL BACKGROUND
1.1. An overview of lexicon
1.1.1. Some basic concepts
1.1.1.1. Word and lexeme
1.1.1.2. Word classes
1.1.1.3. Closed system versus open classes
1.1.2. Lexical relations
1.1.2.1. Collocation
1.1.2.2. Polysemy and homonymy
1.1.3. Word types, word tokens and lemmas
1.2. An overview of morphology
1.2.1. Some basic concepts
1.2.2. Inflection, derivation and compounding
1.2.2.1. Inflection
1.2.2.2. Derivation
1.2.2.3. Compounding
1.2.2. The historical sources of English word formation
1.2.3. Characteristics of Germanic and non-Germanic derivation
1.3. Text analysis
1.3.1. Quantitative versus qualitative text analysis
1.3.2. Corpus linguistics and corpus-based approach to text analysis
1.3.4. Tools for corpus-based analyses
1.4. ESP texts
1.4.1. ESP texts and technical vocabulary
1.4.3. Corpus-based approach and analysis tools in ESP
1.5. English for medicine and pharmacy
CHAPTER 2: LEXICAL CHARACTERISTICS OF MEDICO-PHARMACEUTICAL TEXTS AT HANOI UNIVERSITY OF PHARMACY
2.1. Methodology
2.1.2. The selection of texts
2.1.3. Major methods for data analysis
2.1.4. Major tools for data analysis
2.1.5. The inter-rater reliability check
2.1.5.1. Introduction of the inter-rater reliability check
2.1.5.2. The results of the inter-rater reliability check
2.2. Lexical features of the corpus of texts at HUP
2.2.1. Initial description and discussion of the data
2.2.1.1. General statistics of the corpus
2.2.1.2. Processing of the data against the first 2,000 most frequent words in GSL
2.2.1.3. Processing of the data against the AWL
2.2.1.4. Processing of the data from word list 4
2.2.2. In-depth description and discussion of technical vocabulary
2.2.2.1. The size of technical vocabulary in the ESP texts
2.2.2.2. The importance of technical vocabulary in the ESP texts
CHAPTER 3: MORPHOLOGICAL CHARACTERISTICS OF MEDICO-PHARMACEUTICAL TEXTS AT HANOI UNIVERSITY OF PHARMACY
3.1. Methodology
3.2. Discussion of inflectional suffixes in the corpus
3.2.1. Suffix -ed
3.2.2. Suffix -ing
3.3. Discussion of derivational affixation in the corpus
3.3.1. Suffix –tion
3.3.2. Suffix –al
3.3.3. Suffix –ic, -ical and -ous
3.3.4. Suffix -ine, -ium and -ia
PART THREE: CONCLUSION
1. Conclusion
2. Major findings
2.1. Major findings concerning lexical characteristics
2.2. Major findings concerning morphological characteristics
3. Implications
3.1. Implications for course designers, materials evaluators and materials developers
3.1.2. For course designers
3.1.3. For materials evaluators
3.1.4. For materials developers
3.2. Implications for EFL/ESP teaching and learning
3.2.1. Implications for teachers
3.2.2. Implications for students
3.3. Implications for testing
3.4. Other implications
4. Suggestions for further research
REFERENCES
APPENDIX 1
APPENDIX 2
LIST OF TABLES AND FIGURES
Table 1. Typical differences between lexical words and function words
Table 2. Germanic and non-Germanic derivation
Table 3. Association patterns in language use
Table 4. Percentage of each vocabulary level in academic language courses
Table 5. Effectiveness of the four ways of identifying technical terms
Table 6. Sample classification in the inter-rater reliability check
Table 7. Marked words for the inter-rater reliability check
Table 8. Inter-rater reliability accuracy score calculated by the number of words assigned to four steps by rater 1 and by the researcher
Table 9. Inter-rater reliability accuracy score calculated by the number of words assigned to four steps by the rater 2 and by the researcher
Table 10. Coverage of texts by the various levels of vocabulary types and tokens by RANGE program
Table 11. Ratio between number of input files and number of types found
Table 12. Word classes vs. word list 1
Table 13. The most frequent words vs. word list 1
Table 14. The most frequent words vs. word list 2
Table 15. The most frequent words vs. word list 3
Table 16. The most frequent words vs. word list 4
Table 17. Coverage of levels of vocabulary types in the corpus of ESP texts
Table 18. Coverage of levels of vocabulary frequency in the corpus of ESP texts
Table 19. A sample of raw data for developing a glossary of technical words
Table 20. A sample of raw data for developing a glossary from low frequency words
Table 21. Past participles and their frequency of occurrences
Table 22. Present participle/gerund and their frequency of occurrences
Table 23. The most common suffixes in the corpus
Table 24. Words with suffix –ation and their frequency
Table 23. Words with suffix –al and their frequency
Table 25. Words with suffix –ic and their frequency
Table 26. Summary of the most frequently met suffixes
Table 27. Sample of an exercise applicable to teaching technical vocabulary
Figure 1. Antonymy and synonymy for polysemic and homonymic words
Figure 2. Word morphological structure
Figure 3. A sample of word morphological structure
Figure 4. A sample of concordance of words with suffix -ed
LIST OF ABBREVIATIONS
Abbreviations
AWL : Academic Word List
EFL : English as a Foreign Language
ESP : English for Specific Purposes
GE : General English
GSL : General Service List
HUP : Hanoi University of Pharmacy
TTR : Type-Token Ratio
SCP : Simple Concordance Program
PART ONE
INTRODUCTION
1. Rationale
English for Pharmacy was first introduced to pharmacy students at Hanoi University of
Pharmacy (HUP) during the 1970s. The course has so far made a remarkable contribution
to the university curriculum. Nevertheless, the course faces some difficulties.
English for Pharmacy is currently taught for one semester, which is equal to 45
contact hours. The limited time allotment does not allow the syllabus to cover
the subject-matter content in any depth. Furthermore, according to a recent survey
carried out by the English Department at HUP, 47% of students at HUP thought the ESP
course was difficult for them, and the same proportion of students wanted a longer time
allotment for the course. This is explained by the fact that many students feel they do not
have time to get acquainted with and practise skills and sub-skills while participating in
the course.
As a matter of fact, the major tasks throughout the course are concerned with reading
comprehension. Other activities such as speaking/presentation or writing are
included, but are not dominant. The lessons in class can provide students with only a
rough comprehension of the texts, whose content is either pharmacy-oriented or
both pharmacy-oriented and medicine-oriented (hereinafter called
medico-pharmaceutical texts). It is notable that the students have taken few courses
on professional subjects in their curriculum, which indicates that their background
knowledge of their major is scattered and insufficient. Accordingly, the texts used during
the ESP course are only at a moderate level of difficulty with regard to specialist
knowledge, so that students can understand them without previous specialist
background. Despite this, students cannot commit themselves to understanding the texts
thoroughly, and therefore they do not acquire enough knowledge to perform the
comprehension tasks. Although they are instructed to deal with the texts in a basic way,
they still struggle to comprehend the linguistic characteristics of the texts. It is
because of these difficulties and the learning needs of the students that a more thorough
analysis of the English texts they study in class is required.
A literature search revealed that research into linguistic characteristics, especially
lexical and morphological features, which are the two significant distinguishing
features of English texts for Medicine and Pharmacy, is still modest compared with
other branches of study in applied linguistics in general and in the study of ESP in
particular. Specifically, at the College of Foreign Languages, there has been MA research
on designing a theme-based ESP reading syllabus for students at Hanoi Medical
University (Nguyen Thi Thuy Huong, 2004) and another study on students’ evaluation of the
current course at Hanoi University of Pharmacy (Nguyen Do Thu Hoai, 2004). This
clearly indicates that ESP for pharmacists in the context of Vietnam calls for more
applied linguistic research.
These factors justify a research study that examines the lexical and morphological
features of pharmaceutical and medical texts.
2. Aims of the study
The study is aimed at:
1. finding out the features of the corpus of texts at the lexical and
morphological levels, and
2. drawing implications on the basis of the analysis of the lexical and morphological
characteristics which have been examined.
3. Research questions
There are two questions which will be answered through the data discussion and
pedagogical implications:
1. What are the lexical and morphological characteristics of the texts?
2. How are these characteristics valuable to various aspects of teaching and learning
ESP at HUP in particular as well as teaching and learning ESP in general?
4. Research methods
There are a number of approaches to text analysis, notably the discourse approach,
which includes the genre-based approach. A wide variety of applied linguistic research has
been carried out using these approaches. In the field of English for Specific
Purposes, the popular method is genre analysis (Dudley-Evans, 1994). A newer
approach to ESP text analysis is the corpus-based approach from corpus
linguistics. This approach, with both statistical and linguistic methods, and both
automatic and interactive techniques, produces useful information on the size, the
importance and other characteristics of technical vocabulary in the corpus, as well as other
applicable results such as input materials for course design and revision, and glossaries and
vocabulary lists compiled according to the specific goals set for each type of list for the course.
Seeing these advantages of a corpus-based approach to studying the pilot corpus of ESP
texts used at HUP, the researcher carries out the study based on the methods of
corpus-based text analysis. There are various tools, some of which are computer-based,
used in corpus linguistics. In particular, the analysis of lexical features investigates
the vocabulary of the corpus by identifying four types of vocabulary, and the
main tool for carrying out this analysis is the RANGE and FREQUENCY
program (hereinafter called RANGE) developed by Nation (2006). The methods that
Chung and Nation (2003) applied in their own study on recognising and analysing
technical vocabulary play an important role in the lexical analysis. At the morphological
level, derivational and inflectional affixes which appear in the corpus are
subjected to analysis using mainly the Simple Concordance Program (Reed, 1997-2008).
However, it is anticipated that some features of the data will prove problematic in the
light of the framework followed, as lexico-morphological features do have
exceptions. Where this occurs, some modifications will be applied to the corpus. The validity
of the linguistic analysis will be supported by observation and informal interviews with
both teachers and students at HUP; the latter, however, are secondary sources of
information.
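The classification of vocabulary into levels described above can be illustrated with a minimal Python sketch. It is not the RANGE program itself, and it assumes that the four levels correspond to three base word lists plus a residual category for words found in none of them. The miniature word lists, the sample sentence and the function name `coverage_by_level` are hypothetical and serve only to show how token coverage per level might be computed.

```python
import re
from collections import Counter

# Hypothetical miniature word lists (assumption: three base lists);
# real base lists contain many hundreds of word families.
WORD_LISTS = {
    1: {"the", "of", "is", "a", "in", "and", "used", "to"},
    2: {"medicine", "pain", "treat"},
    3: {"analysis", "data", "research"},
}
RESIDUAL_LEVEL = 4  # tokens found in none of the lists


def tokenize(text):
    """Lower-case the text and split it into (optionally hyphenated) word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def coverage_by_level(text):
    """Return {level: (token_count, percentage_of_all_tokens)}."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        # Assign the token to the first list that contains it, else to the residual level.
        level = next(
            (lvl for lvl, words in WORD_LISTS.items() if token in words),
            RESIDUAL_LEVEL,
        )
        counts[level] += 1
    total = len(tokens)
    return {
        lvl: (counts[lvl], round(100 * counts[lvl] / total, 1) if total else 0.0)
        for lvl in (1, 2, 3, RESIDUAL_LEVEL)
    }


# Invented sample sentence, for illustration only.
sample = "Aspirin is used to treat pain and the analysis of data"
result = coverage_by_level(sample)
```

In this toy example the residual level catches "aspirin", the kind of word that would then be examined by hand (for instance with a rating scale) to decide whether it is technical vocabulary.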
Before reporting on the analysis, there are some umbrella terms that require definition,
such as lexicon, morphology, lexeme, morpheme, English for Specific Purposes, text
analysis, corpus linguistics, corpus-based approach and other sub-concepts. The
conceptualisation of these key terms is based on publications by, for
example, Biber et al. (1998), Biber et al. (1999), Celce-Murcia and Larsen-Freeman
(1983), Carstairs-McCarthy (2002), Bauer (1983), Hutchinson and Waters (1987),
Chung and Nation (2004), and Lankamp (1988).
5. Scope of the study
It is worth noting again that the study is focused on the linguistic characteristics of
medico-pharmaceutical texts at the levels of lexicon and morphology. Ideally,
medical English in general is investigated at the following linguistic levels
(Lankamp, 1988):
(i) discourse
(ii) semantics
(iii) morphology
(iv) lexicon
It is clear that lexical characteristics distinguish medico-pharmaceutical English
from other English registers. Lankamp (1988) adds
that contrasts between medical English and other varieties of English are also
discernible at the discourse and morphological levels of analysis. However, while the discourse
level needs to be researched in a different study under a different approach, lexical and
morphological features of ESP corpora can be analysed under the same approach with the
same group of lexical analysis tools. In particular, in the target corpus of ESP texts in this
study, lexicon and morphology are interrelated, which allows these two
levels to be combined in the process of developing teaching implications
from the analyses. Consequently, the closely related lexical and morphological features
are chosen as the focus of analysis of this corpus, which consists of 8
texts from the newly designed pilot coursebook by the English Department at HUP.
6. Significance of the study
Teaching and learning through the medium of the texts analysed in this study
currently rely on impressionistic methods. The findings of this
study will therefore hopefully serve as a modest reference source which helps teachers and
students at HUP to gain a systematic grasp of the corpus of technical words and of ways to
deal with them.
7. Structure of the thesis
The study is divided into three parts: Introduction, Development and Conclusion. Part
One - Introduction - presents the rationale, the aims, the research questions, the
applicable methods, the scope, the significance and the structure of the study. Part Two -
Development - which is the main part of the study, consists of three chapters. Chapter 1
provides a theoretical background for the research development. This chapter reviews
the published literature on basic concepts in lexicon and morphology, as well as on
text analysis, corpus linguistics, the corpus-based approach and ESP.
Chapter 2 and Chapter 3 respectively give a description of lexical and morphological
features of medico-pharmaceutical texts used in teaching ESP at HUP. In these two
chapters, lexical and morphological features of the target corpus of the texts are analysed
employing a corpus-based approach using the methods and tools mentioned in the first
section of each chapter. Part Three - Conclusion - summarises the major findings from
this study and suggests implications for course design, materials development, teaching,
learning and testing. This part also proposes some suggestions for further research.
PART TWO
DEVELOPMENT
CHAPTER 1
THEORETICAL BACKGROUND
Understanding and analysing the lexical and morphological characteristics of the
selected corpus of medico-pharmaceutical texts requires various concepts and theoretical
background from the fields of lexicology, morphology, text analysis, corpus linguistics and
English for medicine and pharmacy. This chapter will deal with the basic concepts and
ideas to set the theoretical background for the analyses which will be carried out later in
this study.
Various authors offer different definitions and discussions of some basic concepts;
however, the most significant publication on which this study is based is Biber
et al. (1999). This book gives a comprehensive account of English grammar based on
different large-scale grammar books; however, the feature that distinguishes it
from other grammar books is that it describes not only the nature of the language, but also
the actual use of each grammatical feature, based on corpus-analytic research into four
registers: conversation, fiction, newspaper language and academic prose. The
comprehensiveness of its description of English grammar and the fact that
the book is itself a corpus-based research study with data on the real use of linguistic
features are the major factors which account for the heavy reliance on the book in this
study.
1.1. An overview of lexicon
Lexicon is defined in Richards et al. (1992:212) as “a set of all the words and idioms of
any language”. In the Oxford English Dictionary (Oxford University, 1989), lexicon is “the
complete set of meaningful units in a language”. Other basic concepts within lexicon will
be presented in the following section.
1.1.1. Some basic concepts
1.1.1.1. Word and lexeme
In a language, a grammatical unit consists of one or more elements. The hierarchy of a
grammatical unit is shown in the way “that clause consists of one or more phrases, a
phrase consists of one or more words, a word consists of one or more morphemes, etc.”
(Biber et al., 1999:50). It is the words of a language that are the focus when vocabulary
of that language is spoken of.
Biber et al. (1999:51) define words as “the basic elements of language”, adding that “they are
clearly shown in writing; they are the units which dictionaries are organised around”.
Carstairs-McCarthy’s definition elaborates on Biber et al.’s definition of word as follows:
“words…are units of language which are basic in two senses, both
1. in that they have meanings that are unpredictable and so they must be
listed in dictionaries, and
2. in that they are the building-blocks out of which phrases and sentences
are formed”.
(Carstairs-McCarthy, 2002:5)
A simple description of the characteristics of words is presented in Biber et al. (1999:51).
According to them, phonologically, words may be preceded and followed by a pause;
orthographically, they are separated by spaces or punctuation marks; syntactically, they may
be used alone as a single utterance; and semantically, words can be assigned one or more
meanings in a dictionary.
Another term frequently used in lexicology is lexeme. Whereas words are understood as
orthographic words, which are word forms separated by spaces in written texts and the
corresponding forms in speech as discussed above, lexemes are the smallest units of a
lexicon, but may also occur in the form of a phrase, a compound word, or in special
combinations. Biber et al. define lexeme as “a group of word forms that share the same
basic meaning and belong to the same word class” (Biber et al., 1999:54). A lexeme is
an abstract unit, but it can be understood simply as a base to which different inflections
attach to make word forms. For example, speak is a lexeme, whereas speaks and
speaking are inflected forms of speak. The dictionary information on a lexeme as a
dictionary entry generally includes its pronunciation, part of speech, inflected forms, and
various meanings, generally grouped according to its senses and sub-senses.
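The relation between a lexeme and its inflected word forms can be sketched in Python as a simple lookup. This is only an illustration: the table `FORM_TO_LEXEME` is a hypothetical hand-made mapping rather than a real lexicon, and the function name `group_by_lexeme` is an assumption introduced here.

```python
from collections import defaultdict

# Hypothetical hand-made lookup table: word form -> lexeme (citation form).
FORM_TO_LEXEME = {
    "speak": "speak", "speaks": "speak", "speaking": "speak",
    "spoke": "speak", "spoken": "speak",
    "drug": "drug", "drugs": "drug",
}


def group_by_lexeme(word_forms):
    """Group word forms under their lexeme; unknown forms stand for themselves."""
    groups = defaultdict(set)
    for form in word_forms:
        form = form.lower()
        groups[FORM_TO_LEXEME.get(form, form)].add(form)
    return dict(groups)


groups = group_by_lexeme(["Speaks", "speaking", "spoke", "drugs", "drug"])
```

Here the five word forms are reduced to two lexemes, speak and drug, which is the kind of grouping a dictionary entry performs when it lists inflected forms under one headword.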
Every lexeme or lexical item in the language must be entered in the lexicon (which is a
comprehensive list of all words and productive derivational affixes in the language) and
represented on a number of levels, which include at least the following, according to
Celce-Murcia and Larsen-Freeman (1983:49):
1. spelling (orthography)
2. phonetic representation
3. syntactic features and restrictions
4. semantic features and restrictions
5. morphological regularity or irregularity
According to Celce-Murcia and Larsen-Freeman (1983:50), these different types of
information serve different functions. “Orthographical information is used when we
alphabetise things, phonological information is used when we make words rhyme, and
syntactic information is used when we match determiners and nouns appropriately”, and
“semantic information is used when we accept a lexical item in certain constructions as
meaningful.”
1.1.1.2. Word classes
According to Biber et al. (1999:55), there are three major word classes in English lexicon
which can be summarised as follows:
- Lexical words: carry meaning in a text and are generally stressed. They are the words
that remain in the language of lecture notes, headlines and so on. Lexical words are
numerous and are members of open classes (see 1.1.1.3). They have a complex internal
structure and there are four main classes of lexical words: nouns, verbs, adjectives, and
adverbs.
- Function words: bind the text together. Function words serve two major roles:
indicating relationships between lexical words or larger units, or indicating the way in
which a lexical word or larger unit is to be interpreted. Function words belong to closed
systems. They have high frequency and tend to occur in any text, whereas the
occurrence of lexical words varies greatly in frequency. The differences between lexical
words and function words are shown in the following table:
Features          Lexical words   Function words
frequency         low             high
head of phrase    yes             no
length            long            short
lexical meaning   yes             no
morphology        variable        invariable
openness          open            closed
number            large           small
stress            strong          weak
Table 1. Typical differences between lexical words and function words
(Biber et al., 1999:55)
- Inserts: a more recently recognised class. They do not form an integral part of a
syntactic structure but are inserted freely into the text, e.g. hm, yeah, bye.
Lexical words are predominant in news text, whereas lexical and function words are
more evenly distributed in a conversational passage, which also includes a fair number of
inserts. Inserts do not usually appear in academic prose (Biber et al., 1999:61).
1.1.1.3. Closed system versus open classes
Both Biber et al. (1999) and Celce-Murcia and Larsen-Freeman (1983) provide a
clear description of closed systems and open classes. According to them, words
fall into one of these two classes:
- Closed systems: contain a limited number of members, and new members are not
easily added. These are mainly function words.
- Open classes: membership is indefinite and unlimited. These are generally lexical
words.
Every lexical item from either closed systems or open classes belongs to a part of speech:
noun, auxiliary verb, verb, adjective, adverb, determiner, intensifier, or
preposition. The major parts of speech (nouns, verbs, adverbs and adjectives) constitute
open lexical categories. The other parts of speech (e.g., determiners, intensifiers,
prepositions, and auxiliary verbs) constitute closed lexical categories, since they contain
far fewer items than the open ones and they do not readily add new items or discard old
ones (Celce-Murcia and Larsen-Freeman, 1983:49).
Biber et al. (1999:56) also state that the number of function words in closed categories does
not increase very quickly, whereas new lexical words in open categories may be
readily created by using the regular word formation processes of the language.
1.1.2. Lexical relations
Lexical units, especially lexical items, are related to each other in several ways.
Given the nature of the words in the corpus of texts to be studied, three kinds of
relatedness in terms of word meaning are most relevant here: collocation, polysemy
and homonymy.
1.1.2.1. Collocation
“Collocations are the associations between lexical words so that the words co-occur
more frequently than expected by chance” (Biber et al., 1999:988). A collocation is an
association of words, and some words are more firmly associated with each other than
others. Collocations, according to Biber et al. (1999:988), depend not only on the
meaning of the associated words themselves, but also largely on the contexts in which
they occur. The individual words in collocations retain their own meaning, and they
obtain their extended meaning through associating with other words. Some examples are:
make a laugh, but not *do a laugh,
big problem, but not *large problem.
Therefore, a laugh is a collocate of make, and problem is a collocate of big. These
combinations show that words with similar meanings can be
distinguished by their preferred collocations: make, rather than do, collocates
with a laugh, and big, rather than large, collocates with problem.
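The idea that collocates co-occur "more frequently than expected by chance" can be made concrete by comparing the observed frequency of a word pair with the frequency expected if the two words were independent. The sketch below uses pointwise mutual information (PMI), one common association measure; it is offered as an illustration only and is not a measure attributed to Biber et al. (1999). The toy text and the function name `pmi` are hypothetical.

```python
import math
from collections import Counter


def pmi(tokens, first, second):
    """Pointwise mutual information of the adjacent pair (first, second).

    PMI = log2( P(first, second) / (P(first) * P(second)) ).
    A positive value means the pair occurs more often than chance predicts.
    """
    n_tokens = len(tokens)
    if n_tokens < 2:
        return float("-inf")  # no adjacent pairs exist
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(first, second)] / (n_tokens - 1)
    if observed == 0:
        return float("-inf")  # the pair never occurs in the text
    expected = (unigrams[first] / n_tokens) * (unigrams[second] / n_tokens)
    return math.log2(observed / expected)


# Toy text, invented for illustration only.
text = (
    "a big problem is a big problem and a large house is a large house "
    "but a big house is not a problem"
)
tokens = text.split()
score_big = pmi(tokens, "big", "problem")
score_large = pmi(tokens, "large", "problem")
```

In this toy text, big problem receives a positive score, while large problem never occurs and receives negative infinity, mirroring the preference described above. With a real corpus, the same comparison would require far more data to be meaningful.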
1.1.2.2. Polysemy and homonymy
Polysemy and homonymy are closely related lexical relations; however, it is
not easy to distinguish these concepts. Theoretical linguistics treats them as two
kinds of lexical ambiguity.
Polysemy
A word is polysemous (or polysemic) when it has two or more related meanings
(Finegan, 2000:195). For example, the word plain can have several related meanings as
follows:
(1) “easy, clear” (plain English)
(2) “undecorated” (plain white shirt)
(3) “not good-looking” (plain Jane)
(Finegan, 2000:195)
Apresjan (1974:16) classifies polysemy into two types:
(a) metaphor: senses are related by analogy
E.g.: The word table has different meanings related to each other:
(1) a thin flat piece of stone/metal/wood with four legs
(2) part of a machine tool on which work is operated
(3) a level area, a plateau
(4) the people seated at a table
(5) the food on the table
(Vo Dai Quang, 2003:26)
(b) metonymy: senses are related by connectedness. The second meaning is formed on the
basis of the first, and the third is based on the second and so on.
E.g.: “Rabbit” has polysemic senses as “the animal” and “the meat of that animal”; the
meaning of the latter is based on that of the former.
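Polysemy is a property of the language system rather than of a particular utterance: in
a given context, a word is normally used with only one of its meanings. In reading texts,
polysemy is nevertheless a common phenomenon, and it causes difficulty for non-native
readers, who must select the intended sense from the context.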
Homonymy
“Words are homonymic when they have the same written or spoken form but different
senses” (Finegan, 2000:196). They are not connected semantically; for instance, “punch
1” means “blow with a fist” while “punch 2” means “a drink”. Homonyms can be
classified according to either sound form or meaning. There are several sub-types
of homonymy; however, because this study deals with written language, only
the relevant sub-types are summarised as follows (Vo
Dai Quang, 2003):
(a) Homonymy according to sound form:
-Full/absolute homonyms: These homonyms are identical in both pronunciation and
spelling and are of the same part of speech.
E.g.: seal (n): a design printed on paper by means of a stamp vs. seal (n): a sea animal
bank (n): a financial institution vs. bank (n): a sloping side of a river.
-Partial homonyms: are words identical in pronunciation or spelling, and are
homonymous only in some forms of their respective paradigms. They may be of the same
or different parts of speech.
E.g.: still (adj): quiet
still (adv): yet
(b) Homonymy according to types of meaning:
-Lexical homonyms: words of the same part of speech but of different meanings and
there is no semantic relationship between them.
-Grammatical homonyms: words of different parts of speech.
E.g.: light (v) – light (n), asked (simple past) – asked (past participle)
Lyons (1995:58) concludes “homonymy (whether absolute or partial) is a relation that
holds between two or more lexemes, polysemy is a property of a single lexeme.” A
difficulty, however, arises in distinguishing between polysemy and homonymy, i.e., how
to know whether the words are separate lexical items rather than a single word with
different senses. According to Finegan (2000:196), a clear distinction between polysemy
and homonymy must involve several criteria, none of which is sufficient by itself.
The first criterion, according to Lyons (1995:59), is etymology, or a word’s historical
origin. As an example of homonymy, bank meaning “financial institution” is borrowed
from Italian, while bank meaning “sloping side of a river” can be traced back to a
Scandinavian word. Another criterion for distinguishing between polysemy and homonymy
is whether the senses are semantically related (Lyons, 1995:28). Semantic relatedness
is usually present where metaphorical extension has occurred, as with foot meaning
“terminal part of a body”, which also came to mean “the lowest part of a hill or a
mountain”. Both senses refer to the lowest part of something, which suggests that they
have something in common and are therefore senses of a single word. Moreover, the
senses of a polysemic word may share the same synonyms and antonyms; this criterion,
however, is limited in scope, since not all words have synonyms and/or antonyms.
Consider the following example:
Word     Sense               Synonym    Antonym
plain    easy, clear         simple     complex
plain    undecorated         simple     complex
sound    stretch of water    ??         ??
sound    noise               ??         ??
Figure 1. Antonymy and synonymy for polysemic and homonymic words
(Finegan, 2000:196)
Homonymy and polysemy are two notable phenomena not only in General English (GE) but
also in ESP. They are therefore worth analysing in ESP texts, where a large number of
technical words share orthography and/or sense. This account anticipates the analysis
later in this study, where the lexical features of the ESP texts are examined.
1.1.3. Word types, word tokens and lemmas
In a sentence, a word may appear twice or more, as the words the and is do here:
The sun is shining and the girl is playing with her toys under the shade of a tree.
Each occurrence of such a word is a distinct token of a single type (Carstairs-McCarthy,
2002:5). Thus, in the above sentence, there are 18 tokens and 15 types (the occurs
three times and is twice). By analogy, two performances of the same tune, or two copies
of the same book, are distinct tokens of one type.
According to Biber et al. (1999), the relationship between the number of different word
forms, or types, and the number of running words, or tokens, is called the type-token
ratio (or TTR):
TTR = (types / tokens) x 100
Biber et al. (1999) also note that TTR varies with the length of the text: longer texts
have many more repeated words and therefore a much lower TTR, and the same
relationship between TTR and text length is found in all registers. Surprisingly, the TTR
in academic prose is somewhat lower than in fiction and news, according to Biber et al.
(1999:53).
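The counts and the ratio above can be checked mechanically. The following is only a minimal sketch: it assumes that tokens are runs of letters, that upper and lower case are not distinguished (so The and the are one type), and that punctuation is ignored. The function names tokenize and type_token_ratio are illustrative and do not come from Biber et al. (1999).

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Return (number of tokens, number of types, TTR).

    TTR = (types / tokens) x 100, as defined in the text above.
    """
    tokens = tokenize(text)
    types = set(tokens)  # each distinct word form counts once
    ttr = len(types) / len(tokens) * 100 if tokens else 0.0
    return len(tokens), len(types), ttr

sentence = ("The sun is shining and the girl is playing with her toys "
            "under the shade of a tree.")
n_tokens, n_types, ttr = type_token_ratio(sentence)
print(n_tokens, n_types, round(ttr, 1))  # 18 15 83.3
```

Under the same assumptions, repeating the sentence twice doubles the tokens to 36 while the types stay at 15, so the TTR falls to about 41.7. This is an artificial illustration of why longer texts tend to show a lower TTR.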
Another concept is the lemma, which consists of a headword and its inflected forms
(Chung and Nation, 2004:253). In the example below, plays and playing contain the same
headword but with different inflections:
Tom usually plays tennis in the afternoon but he is playing football this afternoon.
Plays and playing, therefore, are both inflected forms of the lemma play.
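To make the difference between counting types and counting lemmas concrete, the sketch below groups the word forms of the example sentence under their headwords. The lookup table LEMMA_OF is a hypothetical, hand-made mapping that covers only this sentence; it is an assumption for illustration, not the procedure of Chung and Nation (2004), and a real study would rely on a full lemma list or a lemmatiser.

```python
import re
from collections import defaultdict

# Hypothetical lookup built by hand for this one sentence (an assumption):
# inflected form -> headword. Forms not listed are treated as their own headword.
LEMMA_OF = {"plays": "play", "playing": "play", "is": "be"}

def group_by_lemma(text):
    """Map each headword to the set of word forms found for it in the text."""
    groups = defaultdict(set)
    for token in re.findall(r"[a-z']+", text.lower()):
        groups[LEMMA_OF.get(token, token)].add(token)
    return groups

sentence = ("Tom usually plays tennis in the afternoon but he is playing "
            "football this afternoon.")
groups = group_by_lemma(sentence)
print(sorted(groups["play"]))  # ['playing', 'plays']
print(len(groups))             # 12
```

With this toy mapping the sentence has 14 tokens and 13 types but only 12 lemmas, because plays and playing are counted once under play.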
In some studies, the lemma is used as the counting unit instead of the word type or
the word token. Even within a single study, word types, word tokens and lemmas may
all serve as counting units in different sections; in the study by Chung and Nation
(2004), for example, the unit of counting is the lemma. The choice of counting unit
depends on the purpose and the scope of a study. In chapters 2 and 3 of this study,
however, only word types and word tokens are used as counting units.