
Vietnam National University, Hanoi
University of Languages and International Studies
Faculty of Post-Graduate Studies


NGUYỄN THỊ HIỀN

A COMPARATIVE STUDY ON HOW HESITATION AND
RESERVEDNESS IS EXPRESSED VIA PROSODIC MEANS IN
ENGLISH AND THE EQUIVALENT EXPRESSIONS IN
VIETNAMESE

(Nghiên cứu so sánh sự do dự và dè dặt được thể hiện thông qua phương
tiện ngôn điệu trong tiếng Anh và các hình thức diễn đạt tương đương
trong Tiếng Việt)

M.A. Minor Program Thesis


Field: English Linguistics
Code: 60 22 15





HANOI - 2010
Supervisor: Nguyễn Hương Giang, M.A





TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
PART A: INTRODUCTION
1. Rationale of the study
2. Aims of the study
3. Scope of the study
4. Methods of the study
5. Design of the study
PART B: DEVELOPMENT
Chapter 1: THEORETICAL BACKGROUND
1.1 Literature review
1.2 Hesitation and reservedness
1.2.1 Definition of hesitation
1.2.2 Definition of reservedness
1.2.3 Types of hesitation and reservedness in spontaneous speech
1.3 Prosody
1.3.1 Definition of prosody
1.3.2 Prosodic features
1.3.2.1 Pitch (or fundamental frequency)
1.3.2.2 Loudness (or intensity)
1.3.2.3 Tempo
1.3.2.4 Length (or duration)
Chapter 2: HESITATION AND RESERVEDNESS EXPRESSED VIA PROSODIC FEATURES IN ENGLISH AND VIETNAMESE
2.1 Procedures
2.1.1 Collecting samples of spontaneous speech
2.1.2 Methods
2.1.3 Methodological difficulties
2.2 Data analysis
2.2.1 Prosodic feature analysis of hesitation and reservedness in the English samples
2.2.1.1 Pitch contour
2.2.1.2 Duration
2.2.1.3 Speaking tempo
2.2.1.4 Loudness (or intensity)
2.2.1.5 Summary
2.2.2 Prosodic feature analysis of hesitation and reservedness in the Vietnamese samples
2.2.2.1 Pitch contour
2.2.2.2 Duration
2.2.2.3 Speaking tempo
2.2.2.4 Loudness (or intensity)
2.3 Comparison: prosodic cues for hesitation and reservedness in English and Vietnamese
2.3.1 Similarities
2.3.2 Differences
2.3.3 Summary
Chapter 3: IMPLICATIONS FOR TEACHING ENGLISH LANGUAGE SPEAKING
3.1 Hesitation phenomena in English Language Teaching classroom (ELT)
3.2 Using filled pauses to gain processing time
3.3 Implications for improving students' speaking fluency
3.4 Summary
Chapter 4: SOME KEY FINDINGS
4.1 Hesitation and reservedness in English and Vietnamese speech
4.2 Hesitation strategies
PART C: CONCLUSION
1. Concluding remarks
2. Suggestions for further study
REFERENCES
APPENDIX

LIST OF FIGURES
Figure 1: Praat Editor showing waveform, spectrogram and TextGrid
Figure 2: Illustration of pitch contour of “No” on Praat Screen
Figure 3: Pitch contour of the sentence “Um…u…no. I don't think so. I can't think of anything”
Figure 4: Illustration of pitch contour of two hesitation points M and E

LIST OF TABLES
Table 1: Summary of vocalized fillers in English and the equivalents in Vietnamese
Table 2: Speaking rate (syllables/second) before and after each pause
Table 3: Summary of prosodic features which contribute to the expression of hesitation

PART A: INTRODUCTION
1. Rationale of the study
For many people, learning English is not easy, and mastering it is harder still. Some
people mistakenly assume that a learner who has a large vocabulary or a good knowledge of
grammar will be able to speak English fluently. This assumption is a real problem in
teaching and learning a foreign language at schools, universities and language centers,
because it leads learners to spend a great deal of time and effort on achievements that
fall short of their expectations. In English lessons, when teachers ask students to answer
questions or read a text aloud, the students often feel embarrassed and shy, and many
respond only in a whisper or a murmur. This is not because they lack vocabulary or do not
know how to arrange words grammatically, but because they are afraid of mispronouncing
words or making mistakes. These phenomena directly or indirectly reduce both the
effectiveness of, and the motivation for, learning a foreign language.
Among the phonetic aspects of language, prosody is widely regarded as a crucial factor in
mastering communicative skills. People often talk about intonation, stress or rhythm,
while the term prosody is rarely mentioned. In linguistics, however, intonation, stress
and rhythm are themselves components of prosody.
The topic “a comparative study of how hesitation and reservedness is expressed via
prosodic features and its equivalents in Vietnamese” was chosen for my M.A. thesis for the
following reasons. First of all, prosody is a relatively new area, and few student studies
have been based on it because it is difficult and challenging. Yet prosody plays an
important role in the comprehension of spoken language: it helps listeners to recognize
spoken words, to resolve global and local ambiguities and to process discourse structure.
It is therefore worth conducting a study in this area. Secondly, in the world of English
language teaching (ELT), the communicative value of hesitation in speech has been largely
ignored. The prevailing view seems to be that hesitations are evidence of disfluency and
should therefore be discouraged. This view is inadequate in that it assumes that fluency is
directly related to communicative ability while disfluency is inversely related to it. The
present study examines the evidence that speech hesitation sometimes supports and enhances
communication and suggests ways in which it may be dealt with in the ELT classroom.
Finally, there are few cross-linguistic studies of hesitation from the perspective of
prosody with English as the point of reference. I hope that my work can give more insight
into the similarities and differences in the prosodic expression of hesitation between two
languages, of which English is a stress-timed language and Vietnamese is a tonal language.
2. Aims of the study
The main aims of this study are:
- To explore prosodic features in English which express hesitation and
reservedness,
- To provide a brief account of similarities and differences between hesitation and
reservedness expressed via prosodic features in English and Vietnamese,
- To give some proposals for further study and suggestions for improving speaking
skills.
To fully achieve the stated aims, the study should answer the following basic questions:
- What are the prosodic features used to express hesitation and reservedness in
English spontaneous speech?
- What are the similarities and differences in the expression of hesitation and
reservedness by prosody in English and Vietnamese?
- What tips can be used to improve speaking fluency?
3. Scope of the study
Many aspects of hesitation phenomena and prosodic features deserve exploration. However,
owing to limited time and facilities, this thesis focuses only on the following aspects:
- Hesitation and reservedness in spontaneous speech;
- Typical types of hesitation and reservedness in English and their equivalents in
Vietnamese, including silent pauses, filled pauses, repetitions and syllable lengthening;
- Only the prosodic features of pitch, length, loudness and tempo are explored in the
expression of hesitation and reservedness;
- British English is taken as the standard variety of English, and the Northern dialect is
taken as the standard variety of Vietnamese;
- Non-English-major students at Ha Tinh Education and Information Center are surveyed to
find out how frequently hesitation phenomena occur in the English language teaching
classroom. On that basis, some tips are suggested for improving speaking fluency with
hesitation strategies.
4. Methods of the study
In order to explore the differences and similarities in expressing hesitation and
reservedness via prosodic features, comparative study (CS) is used as the key method of
the study. Here, English is employed as the instrumental language. In addition,
systematization and generalization from previous studies are integrated as a reference for
this thesis.
To analyze the expression of hesitation and reservedness via prosody precisely, computer
software such as PRAAT is applied, even though transcribing prosodic features is
challenging and costly for an M.A. student. The data for analysis come from interviews in
different situations recorded from Vietnamese channels such as VTV1 and from textbooks
such as TOEFL and NEW HEADWAY. Other sources are also used in this study.
A survey is also conducted to find out how students express their hesitance and how
they adopt hesitation strategies to gain speaking fluency. The data are then collected,
analyzed and synthesized.
5. Design of the study
This study is organized into three separate parts: introduction, development and
conclusion.
Part A, “INTRODUCTION”, gives the readers an overview of the reasons for choosing the
topic, the aims, the scope, the methods applied and the design of the study.
Part B, “DEVELOPMENT”, plays the most crucial role and is considered the backbone of the
study. This part consists of three main chapters. Chapter 1 presents the theoretical
background of hesitation, reservedness and prosody. Chapter 2 explores the similarities
and differences of hesitation and reservedness expressed via prosodic features in English
and in Vietnamese equivalent expressions. Chapter 3 suggests some tips for the teacher to
improve the students' English speaking fluency.
Part C, “CONCLUSION”, gives the readers some concluding remarks as well as suggestions
for further study.





PART B: DEVELOPMENT
Chapter 1: THEORETICAL BACKGROUND
In order to lay the basis for analyzing and synthesizing the data in the main part of the
study, a comprehensive understanding of the theoretical background is necessary. This
chapter first reviews the history of research on hesitation phenomena, which is also the
basis on which the study is conducted. The nature of hesitation and reservedness in
spontaneous speech is then described through definitions and types. Prosody is the other
core concern of the study, so its concepts and features are also clarified, for English
and Vietnamese side by side. This makes it possible to analyze the similarities and
differences between the two languages in the later chapter.
1.1 Literature review
The existence of hesitation and reservedness phenomena is a universal characteristic of
spontaneous speech in any language. Hence, these phenomena have attracted the attention of
many researchers all over the world who aspire to find out what features contribute to the
impression of hesitant speech.
Much contemporary research on hesitation phenomena in speech derives from the work of
Goldman-Eisler (1968: 48), who argues that the analysis of speech pauses provides an
external window upon the internal constructive processes of speech selection and
organization. Following Goldman-Eisler, a fair amount of work has been done on hesitations
in spontaneous dialogues and monologues. For example, Beattie (1979: 61-78) suggests that
hesitation phenomena can be useful in studying various psycholinguistic processes.
In this study, I am interested in the prosodic features that express hesitation and
reservedness; therefore, the studies of Eklund (2004) and Lovgren & Van Doorn (2005)
provide a useful background, as they show that pauses and retardations are among the
acoustic correlates of hesitation. It is also worth mentioning the study of Rolf Carlson,
Kjell Gustafson and Eva Strangert (2006: 1), who show that the total duration increase,
that is, the combination of pause duration and final lengthening, is a potential cue to
hesitation.
The results of the above-mentioned English studies are extremely valuable for the present
study. For Vietnamese, however, there are not yet any works on the expression of
hesitation via prosody to refer to. Only some studies related to Vietnamese tones can be
considered, such as Tiếng Việt mấy vấn đề Ngữ âm - Ngữ pháp - Ngữ nghĩa by Cao Xuân Hạo
(1998) and Ngữ âm Tiếng Việt by Đoàn Thiện Thuật (2003). On the basis of the available
materials, this study offers further insight into hesitation expressed by prosodic means
in both English and Vietnamese.
1.2 Hesitation and reservedness
1.2.1 Definition of hesitation
The term “hesitation” is defined from many different perspectives. First of all, from the
perspective of psychology, the Oxford Advanced Learner's Dictionary, 5th Edition
(1995: 559) gives a clear concept: “Hesitation is the status of being slow to speak or act
because one feels uncertain or unwilling, to PAUSE in doubt or being worried about or
shy of doing something”. From a similar point of view, the Macmillan Dictionary for
Advanced Learners, 2nd Edition (2002), a popular online dictionary for English learners,
also explains that hesitation is a pause before doing something, or a feeling that you
should not do it, especially because you are nervous, embarrassed, or worried. Synonyms or
related words for this meaning of hesitation include “uncertainty, doubt, reservation,
question, reserve”.
With regard to hesitation in spontaneous speech, linguists have given many definitions,
but it is not easy to arrive at a common one. Firstly, Fox Tree and Clark (1997: 152)
defined hesitation as a phenomenon which occurs when “the speaker does not
immediately find an adequate option for language production and is compelled to
temporarily delay the output to solve his or her difficulties”. Later, Rolf Carlson, Kjell
Gustafson and Eva Strangert gave a similar concept of hesitation in another study
(2006: 21–24): “hesitation is the phenomenon when you are uncertain to what to say or
you have problems in lexical access or in the structuring of utterances or in searching
feedback from a listener”.

Clearly, Fox Tree and Clark and Rolf Carlson and his colleagues explain in similar terms
when hesitation occurs in spontaneous speech. They focus on the causes of hesitation
rather than on its expression. In this study, it is the expression of hesitation that is
of primary concern.
1.2.2 Definition of reservedness
The word “reservedness” derives from the English adjective “reserved”, which means
tending to avoid showing one's feelings or expressing one's opinions to other people
(the Oxford Advanced Learner's Dictionary, 5th Edition, 1995: 997), e.g. “He answers in
a reserved manner”. In another dictionary, reservedness is understood as the attitude or
behavior of someone who tends not to talk about or show their feeling of doubt about
whether something is good or right (The Macmillan Dictionary for Advanced Learners,
2nd Edition, 2002). From a psychological perspective, the term “reservedness” can be
understood in much the same way as hesitation, since both refer to doubt about doing
something or uncertainty in giving an utterance. In linguistics, however, the term
“reservedness” is seldom used because it is considered a synonym of hesitation.
1.2.3 Types of hesitation and reservedness in spontaneous speech
In spontaneous speech, a hesitant speaker can convey his message in various ways. For
example, while studying verbal planning in children's speech, Brian MacWhinney & Harry
Osser (1977: 980) identified nine types of hesitation in speech which can be recognized
by hearers: silent pauses, filled pauses, drawls, initial segment phonological
repetitions, word-included phonological repetitions, word repetitions, sentence
incompletions, false starts and phonological corrections. These nine types can be grouped
into five: silent pauses, filled pauses, drawls, repetitions and false starts. Similarly,
Hieke (1981: 157-158) offered a classification of hesitation phenomena comprising silent
pauses, filled pauses, prospective repeats, syllabic prolongations and false starts. Hieke
links hesitations to quality control in the speech production process.
Owing to limited time and the context of this study, I focus only on the most common
types of hesitation phenomena from the prosodic point of view. The following types are
therefore studied in detail: silent pauses, filled pauses, repetitions and syllable
prolongation (lengthening).

1.2.3.1 Silent pause
In his study of hesitation phenomena, Lounsbury (1954: 99) shows that “hesitation
pauses are indicative of the strength of association between sequential linguistic
elements”. Silence has its own communicative value. A speaker may deliberately put pauses
into his speech to make the listener's job easier, to help the listener segment the
speech, or to give the listener time to parse it. Pauses may occur at syntactic
boundaries, for breathing, or as hesitation pauses. To differentiate among these types of
pauses, consider the following example from a conversation in Davidson's study, in which
a silence occurs.
A: Well did you want me to just pick you- get into Robinson's so you could buy a
little pair of slippers?
(silence)
A: I mean or can I get you something?
(based on Davidson, 1984: 104)
The silence follows the proposal or request that the speaker makes to the hearer. The
speaker's reformulation after it suggests that the speaker takes the silence to mean that
the hearer is reluctant, has not heard, or is for some other reason slow to respond.
Hence, this silence is an intentional signal; in other words, it is a hesitation pause.
1.2.3.2 Filled pause
Filled pauses are hesitation sounds that speakers employ to indicate uncertainty or to
maintain control of a conversation while thinking of what to say next. Filled pauses do
not add any new information to the conversation and do not alter the meaning of what is
uttered. They are also called fillers and are regarded as extralinguistic noise. In
English, the set of filled pauses includes /ah, eh, er, uh, um, erm, hm/. Among them, a
nasalized “um” and an oral “ah” are the most common. Other sounds or non-lexemes can
occasionally be used as filled pauses, and some speakers may use the expressions “well”,
“I mean” and “you know”. In Vietnamese, the characteristic fillers are /ừm/, /à/ and /ờ/.
Examples (1) and (2) illustrate the most prevalent forms of filled pauses in English and
Vietnamese.
(1) A: Tomorrow will you go to the cinema with me?
B: I… uh … am busy
(2) A: Em còn đứng đấy làm gì nữa? (Why are you still standing there?)
B: Em nghĩ … ừm… là chúng ta có thể giúp anh ta. (I think … um … that we can help him.)
It can be seen that /uh/ in English and /ừm/ in Vietnamese are not uttered randomly by the
speakers. In both situations, the speakers produce /uh/ and /ừm/ before their answers
because they feel hesitant. The table below summarizes the filled pauses commonly found
in English and their equivalents in Vietnamese.
Table 1: Summary of vocalized fillers in English and the equivalents in Vietnamese

Vocalized fillers in English | Equivalents in Vietnamese | Examples
Er | À ờ | London is the capital of ….er…England (London là thủ đô của… à ờ…Anh)
Hmm | ừm, hừm | Hmm. I am not so sure. (Ừm. Tôi không chắc lắm.)
Uh | ừm | “Uh I don't know how to use it” (Ừm Tôi không biết dùng nó thế nào cả).
Um, umn | ừm | “75 divided by 5 is um 15.” (75 chia 5 bằng ừm 15)
Ah | a, à | “Ah well, I will try.” (À, tôi sẽ thử).

1.2.3.3 Repetitions
Repetition is a common phenomenon in spontaneous speech. In interaction with spoken
dialogue systems, for instance, users often repeat themselves when they fail to make
themselves understood. Repetition occurs when a unit of speech, such as a sound, syllable,
word or phrase, is repeated, e.g. “to-to-to-tomorrow”.
Repetitions in spontaneous speech in most cases involve a first instance of the repeated
word (R1), a possible silent pause (SIL), a second instance of the repeated word (R2), and
continuation of the utterance. An example is given below:
(a) I might (R1) might (R2) have to go to the cla- class.

(b) I might (R1) (SIL) might (R2) have to go to the cla- class
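As an illustration only, the pattern R1 (SIL) R2 described above can be looked for
automatically in a plain-text transcript. The following Python sketch is not part of the
procedure used in this study; the function name find_repetitions and the assumption that
annotations such as (R1), (R2) and (SIL) are written in parentheses are made up for the
example.

```python
import re
import unicodedata


def find_repetitions(utterance):
    """Return (word, count) pairs for words repeated back to back.

    Parenthesised annotations such as (R1), (R2) or (SIL) are ignored, so
    "might (R1) (SIL) might (R2)" counts as one repetition of "might".
    """
    # Compose accented letters so that Vietnamese words stay in one piece.
    text = unicodedata.normalize("NFC", utterance)
    # Remove every annotation written in parentheses (an assumed convention).
    text = re.sub(r"\([^)]*\)", " ", text)
    # Keep runs of letters, optionally with an apostrophe part ("don't").
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())

    repetitions = []
    i = 0
    while i < len(tokens):
        j = i
        # Extend the run while the next token equals the current one.
        while j + 1 < len(tokens) and tokens[j + 1] == tokens[i]:
            j += 1
        if j > i:
            repetitions.append((tokens[i], j - i + 1))
        i = j + 1
    return repetitions


print(find_repetitions("I might (R1) (SIL) might (R2) have to go to the cla- class"))
```

Only identical adjacent words are detected; a cut-off word followed by its full form, such
as “cla- class”, is deliberately left out because it is not an exact repetition.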
In Vietnamese, repetition often occurs in dialogues as a way of expressing the
speaker's attitude. As in English, Vietnamese speakers may repeat a word, e.g. “Tôi tôi
tôi xin lỗi em. Có lẽ không cần nữa” (“I, I, I am sorry. Perhaps it is no longer needed”;
from the short story Con bé và Gã lang thang: 15 by Chiêu Hoàng).
1.2.3.4 Syllable lengthening
Apart from hesitation markers like filled pauses, syllable lengthening is quite common
in spontaneous speech. The speaker can lengthen syllables or words; for example, English
speakers often lengthen the words “a:nd” and “we:ll”, as in “Yesterday he came a:nd asked
about your book”. Moreover, the most common instance of lengthening occurs when the
article “the” is pronounced as “thee” and its final vowel is drawn out beyond its usual
duration.
In Vietnamese, syllable lengthening is quite common in daily conversation, as in the
example below:
Teacher: Tại sao em không làm bài tập? (Why didn't you do your homework?)
Student: E:m…xin lỗi thầy nhưng mà:… (I:… am sorry, sir, bu:t…)
The student is embarrassed when the teacher asks him why he has not finished his
homework. In order to avoid giving a complete answer, the student lengthens the words
“em” and “mà” in the same utterance.
1.3 Prosody
1.3.1 Definition of prosody
Prosody is an important component of language and speech, so it is necessary to consider
the definitions and concepts of this term from different points of view. In linguistics,
prosody can be defined as the features that do not determine what people are saying, but
rather how they are saying it. The term “prosody” can be used broadly to mean
“a time series of speech-related information that is not predictable from a reasonable
window (i.e. word-sized or sentence-sized) applied to the phoneme sequence” (Cutler, A.,
Dahan, D., van Donselaar, 1997: 141-201). In this sense, prosody is a parallel channel of
communication, carrying information that cannot simply be deduced from the lexical
channel. On this understanding, hand gestures and eyebrow and face motions can be
considered prosody, because they carry information that modifies and can even reverse the
meaning of the lexical channel.
With respect to speech, prosody comprises only the suprasegmental features of
fundamental frequency, duration and intensity, which contribute to the melody of speech.
Most linguists agree that these features are the main components of prosody in English.
However, the terms used may differ, depending on whether the features are considered from
the speaker's point of view (physiological/production), the listener's point of view
(perception), or as an acoustic manifestation (measurement).
1.3.2 Prosodic features
Learners who have mastered the pronunciation of the individual sounds of English and the
way sounds combine in syllables have gone a long way toward mastering the sound system.
However, the characteristic “melody” of the language is as important to learners' ability
to make themselves understood as any of the other features of the sound system. This
melody, that is, the prosodic features of the language, includes variations in pitch,
loudness, tempo and length.
1.3.2.1 Pitch (or Fundamental Frequency)
Pitch is an important component in establishing the intonation of utterances, especially
in English. It is determined by the frequency of vibration of the vocal cords and is
perceived by the listener as the relative height of speech sounds. Pitch corresponds to
the fundamental frequency (F0) of the speech signal, which is calculated as the number of
repetitions, or cycles, of its waveform per second and is given in Hertz (Hz). Pitch
varies over an entire phrase or sentence, which is manifested in different pitch curves.
In English, three basic movements, fall, rise and level, combine to form the pitch
patterns fall, rise, fall-rise, rise-fall and level. Each pattern carries its own
function: fall (an impression of completeness and finality), rise (a certain degree of
doubt or uncertainty), rise-fall (a strong feeling of approval or certainty), fall-rise
(limited agreement, or a response with reservations or hesitation) and level (a feeling
of saying something routine, uninteresting or boring). For example:
- Today we learn English phonetics (With a falling tone, the speaker makes a statement)
- Today we learn English phonetics (With a rising tone, the speaker puts a question to the
hearer).
Unlike English, Vietnamese is a tonal language with six tones (Ngo Nhu Binh,
2001: 12-14). The tones have distinctive pitch contours: Ngang has an almost level contour
(e.g. ma); Sắc has a high rising contour (e.g. má); Ngã also has an overall rising
pattern, but is interrupted by glottalization in the middle (e.g. mã); Huyền has a falling
contour (e.g. mà); Nặng has a dropping contour interrupted by glottalization (e.g. mạ);
and Hỏi falls gradually and then rises in its last third back to the original level
(e.g. mả). A change of pitch in Vietnamese can change the meaning of a word, but it cannot
change the meaning of the whole utterance in the way that intonation does in English. For
example: “Đấy là một nhà thơ lớn” (that is a famous poet) versus “Đấy là một nhà thờ lớn”
(that is a big church). In the first example the word “thơ” carries the level tone,
whereas in the second example the word “thờ” carries the falling tone.
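The definition of F0 given in this section, the number of cycles of the waveform per
second, can be made concrete with a small Python sketch. It is an illustration only and
is not meant to reproduce the pitch analysis performed by PRAAT: it uses a plain
autocorrelation search, a synthetic tone instead of recorded speech, and an assumed search
range of 75 to 500 Hz. The function name estimate_f0 is hypothetical.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz from the lag at which the signal best matches itself."""
    min_lag = int(sample_rate / f0_max)   # shortest period considered, in samples
    max_lag = int(sample_rate / f0_min)   # longest period considered, in samples
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: large when the waveform repeats every `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # One cycle lasts best_lag samples, so there are sample_rate / best_lag cycles per second.
    return sample_rate / best_lag


# A synthetic 200 Hz tone (50 ms at 8000 samples per second) stands in for a voiced sound.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(estimate_f0(tone, sample_rate))
```

Applied to successive short stretches of an utterance, such estimates form a sequence of
F0 values whose movement corresponds to the falling, rising or level contours discussed
above.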
1.3.2.2 Loudness (or intensity)
According to Peter Roach (1983: 72-73), most people seem to feel that stressed syllables
are louder than unstressed ones; in other words, loudness is a component of prominence.
If one syllable in a sequence of identical syllables is uttered louder than the others, it
will be heard as stressed. Nevertheless, it is difficult for a speaker to change only the
loudness of a syllable, and the perceptual effect of stress produced by loudness alone is
not strong. Loudness is a perceptual response to the physical property of intensity.
Unlike English, Vietnamese is a syllable-timed language in which the rhythm appears to be
fairly even, with each syllable giving the impression of having about the same duration
and force as any other; therefore, Vietnamese speakers tend to stress a syllable by
increasing only its volume. When the volume changes, the change in loudness extends over
the whole word.
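The distinction drawn above between perceived loudness and physical intensity can be
illustrated with a short Python sketch. It assumes that a syllable is available as a list
of amplitude samples and compares two syllables by the ratio of their root-mean-square
(RMS) amplitudes expressed in decibels. The function names and the synthetic “syllables”
are hypothetical and serve only as an example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(samples_a, samples_b):
    """Intensity level of A relative to B in decibels (positive if A is stronger)."""
    return 20.0 * math.log10(rms(samples_a) / rms(samples_b))


def tone(amplitude, n=400, freq=200.0, rate=8000):
    """Synthetic stand-in for a syllable: a sine tone with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


soft_syllable = tone(0.1)
loud_syllable = tone(0.2)
# Doubling the amplitude raises the level by about 6 dB.
print(round(level_difference_db(loud_syllable, soft_syllable), 2))
```

The value obtained is a physical measurement of intensity; whether a listener hears the
second syllable as stressed is a separate, perceptual question.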
1.3.2.3 Tempo
Every speaker knows how to speak at different rates, and this linguistic use of speaking
rate is frequently called tempo. In the study of speech rate, it is usual to measure either
syllables per second or phonemes per second. Most speakers seem to produce speech at a
rate of five or six syllables per second, or ten to twelve phonemes per second.
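The syllables-per-second measure described above amounts to a simple division. The
following Python sketch shows it for two stretches of speech; the function name
speaking_rate and the figures used are hypothetical and are not taken from the data of
this study.

```python
def speaking_rate(syllable_count, duration_seconds):
    """Speaking rate in syllables per second for one stretch of speech."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_seconds


# Hypothetical stretches before and after a pause: (number of syllables, seconds).
before_pause = (11, 2.0)
after_pause = (9, 2.5)

print(speaking_rate(*before_pause))  # rate of the stretch before the pause
print(speaking_rate(*after_pause))   # rate of the stretch after the pause
```

Comparing the two values shows whether the speaker slows down or speeds up around a
pause.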

Many studies have shown that the speech rate of stress-timed languages (e.g. English) is
generally lower than that of syllable-timed languages (e.g. Vietnamese), owing to the fact
that syllable-timed languages tend to have a relatively simple syllable structure.
E.g.: ba /ba/ father /ˈfɑːðə/
mẹ /me/ mother /ˈmʌðə/
1.3.2.4 Length (or duration)
The amount of time that a sound lasts is a very important feature of that sound. In the
study of speech, it is usual to use the term “length” for the listener's impression of how
long a sound lasts, and “duration” for the physical, objectively measurable time. For
example, I might listen to a recording of the following syllables and judge that the first
two contain short vowels while the vowels in the second two are long: / bit bet bi:t bæt /;
that is a judgment of length. But if I use a laboratory instrument to measure those
recordings and find that the vowels last for 100, 110, 170 and 180 milliseconds
respectively, I have made a measurement of duration.

Chapter 2: HESITATION AND RESERVEDNESS VIA PROSODIC FEATURES IN
ENGLISH AND VIETNAMESE
In this main part of the study, the comparative study approach is applied to the analysis
of English and Vietnamese samples. The analysis aims to uncover which prosodic features
contribute to the expression of hesitation and reservedness in English and Vietnamese
spontaneous speech. Here, English is used as the instrumental language. The prosodic
features of pitch, duration, loudness and tempo are examined with the PRAAT software to
determine whether they have any influence on hesitant speech. On the basis of the data
collected and analyzed, the author draws out the similarities and differences between
English and Vietnamese prosodic features in showing hesitation and reservedness.
2.1 Procedures
2.1.1 Collecting samples of spontaneous speech
In this study, the following data are used:
- In English: three job interviews extracted from online TOEFL tests. Each extract lasts
about 5-7 minutes. The applicant in the first interview is male; the applicants in the
other two interviews are female. Each applicant is in a different mood (overconfident, shy
and technical). However, the study focuses only on the prosodic cues for hesitation;
therefore, psychological factors are not considered.
- In Vietnamese: three interviews recorded from the program “Gõ cửa ngày mới” on VTV1 (a
national television channel). Each interview lasts 4-5 minutes, and famous people are
invited to talk about their careers and lives. The interviewees in the first and the last
interviews are male; the interviewee in the other interview is female.
Most of the selected samples in both English and Vietnamese are interviews. The reason is
that in an interview one person asks and the other answers, so the speakers are not always
equally confident about or committed to what they are saying. When asked a question, for
instance, they can be certain or rather doubtful about the correctness of their answer,
and they may be unable to respond at all, even though in some cases it might feel as if
the answer lies on the tip of the tongue. These characteristics make interviews
appropriate for analyzing hesitation phenomena in spontaneous speech.

2.1.2 Methods
After careful listening, the author analyzed the recordings in both English and
Vietnamese using automatic speech digitization. PRAAT, a computer program for speech
analysis developed by P. Boersma and D. Weenink at the Phonetic Sciences Department of the
University of Amsterdam, was used. Duration, fundamental frequency, loudness and tempo
were analyzed; the average F0 was measured in Hz and duration in milliseconds. Read more
about PRAAT software program at
In order to mark hesitation phenomena in each extract, transcriptions were also
used. The following transcription conventions were defined during the analysis. All
texts are written in lowercase letters. “ : ” indicates lengthening of a syllable, “{.}” refers to a silent
pause, “* *” encloses an inserted filled pause, “{number}” marks a pause with its duration
in milliseconds (e.g. {102} means a pause lasting 102 milliseconds), and “//”
marks a tone unit boundary.
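The transcription conventions above can be read mechanically. The following is a minimal sketch in Python, not part of the original analysis: the function name `parse_transcript`, the regular expressions and the sample line are all assumptions introduced for illustration, and the lengthening pattern assumes that “:” is written directly after the lengthened syllable.

```python
import re

# Patterns for the markers defined in the transcription conventions.
TIMED_PAUSE = re.compile(r"\{(\d+)\}")     # {number}: pause length in ms
FILLED_PAUSE = re.compile(r"\*([^*]+)\*")  # *er*: inserted filled pause
LENGTHENED = re.compile(r"(\w+):")         # no: (assumed placement of ":")


def parse_transcript(line):
    """Collect the hesitation markers found in one transcribed line."""
    return {
        "timed_pauses_ms": [int(ms) for ms in TIMED_PAUSE.findall(line)],
        "untimed_pauses": line.count("{.}"),  # {.}: silent pause, no duration
        "filled_pauses": [f.strip() for f in FILLED_PAUSE.findall(line)],
        "lengthened": LENGTHENED.findall(line),
        # "//" marks tone unit boundaries; empty edge pieces are dropped.
        "tone_units": [u.strip() for u in line.split("//") if u.strip()],
    }


# Invented line (not taken from the corpus), used only to exercise the parser.
sample = "// well {.} i *er* think so {450} // no: {1200} //"
result = parse_transcript(sample)
print(result)
```

A structure like this would make it possible to tally pauses per extract without re-reading the transcripts by hand.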
2.1.3 Methodological difficulties
A total of 19 minutes 29 seconds of English speech and 13 minutes 32 seconds of
Vietnamese speech was collected from six subjects. However, several difficulties arose during
data collection and analysis that are worth mentioning here. Firstly, processing audio and video data
required powerful computer resources and software with which the author was unfamiliar;
indeed, this was the first time the author had the chance to carry out phonetic analysis with
computer software. Besides, transcribing and analyzing audio and video data is extremely
time-consuming, especially as the author had to calculate the speaking rate. Another
difficulty is that purely automated speech processing is error-sensitive and still requires
subjective judgment. Finally, the recordings were downloaded from available sources
rather than recorded directly; therefore, the sound quality was not good and some errors
could occur when playing the recordings.
2.2 Data analysis
2.2.1 Prosodic feature analysis of hesitation and reservedness in the English samples
In order to characterize the speech at the prosodic level, prosodic values such as
duration, F0 slopes, speaking tempo and loudness were measured by Praat version 5.1.3.7.
All the samples were recorded and saved in a single .wav file for each block for each
participant. Not all the sentences in the three English samples were analyzed. After listening
carefully to each corpus, the author selected only 20 typical sentences containing as much
hesitation as possible. These sentences were digitized and then transcribed in Praat using
TextGrids. Figure 1 illustrates a sentence entered into Praat with a TextGrid for
analysis. In this figure, the blue line, the grayish image, the red speckles and the yellow or green line
show the pitch contour, spectrogram, formants and intensity respectively. From this display we can obtain the values of
fundamental frequency in Hz, duration in seconds and speaking tempo in syllables/second.
Pauses appear as blank stretches (see the Praat screen in Figure 1).

Figure 1: Praat Editor showing waveform, spectrogram and TextGrid
2.2.1.1 Pitch contour
An important feature to take into account when modeling pitch in hesitation
phenomena is the F0 slope. For example, syllables at the end of a sentence are mainly
pronounced with a descending pitch slope, whereas an interrogative sentence ends with a rising
pitch. Therefore, it is reasonable to investigate whether there is a standard pitch slope for
hesitations or whether, on the other hand, it depends on other aspects, such as semantics or
syntax.
We will examine the pitch contour in typical examples extracted from different samples.
We begin with the second sample, in which the applicant appears unconfident
right at the beginning of the interview. Consider the following dialogue:
(1) PP: Welcome. Now I know you live locally, so I trust you didn’t have to travel too far.
Candidate: No.
(Interview 2, line 1-4)
The candidate responds to the question with the answer “no”, produced with a pitch that falls
from rather high to low and then rises to about the middle of the voice range. In this context, the
candidate is quite nervous as she begins answering the interviewer’s question, so
she is reluctant to agree with what the interviewer says; in other words, she responds with
reservation. On the Praat screen, this is represented by a fall-rise pitch contour (see the black
curve in Figure 2).

Figure 2: Illustration of pitch contour of “No” on Praat screen.
In another example, hesitation is expressed more clearly when filled pauses and silent pauses
appear together. Consider the following extract:
(2) PP: Finally, are there any questions you’d like to ask us?
[She thinks for a while]
C: Um….u…no. I don’t think so. I can’t think of anything.
(Interview 2, line 78-80)

Figure 3: Pitch contour of the sentence “Um…u…no.
I don’t think so. I can’t think of anything”
On the Praat screen, the answer “no” is again represented by a fall-rise pitch contour.
The fall-rise tune in this case clearly shows reservation on the part of the
speaker, since it is preceded by the filled pauses “um” and “u” and by long silent pauses. We
can also use fundamental frequency (F0) to demonstrate the change in pitch contour. F0
is measured in Hz within a range of 75-500 Hz. The word “no” /nəu/ is a single
syllable which has an onset. The F0 value at the onset (F0 start value) and the F0 value at the
end of the syllable (F0 end value) are measured in Hz. From automatic analysis using a Praat
TextGrid, we get an F0 start value of 228.1 Hz, which decreases to 181.3 Hz and then rises again
to an F0 end value of 200.1 Hz. This means that the pitch descends and then rises again.
In conclusion, when the speaker wants to imply limited agreement or respond
with reservations, he can use a fall-rise pitch to convey his intention.
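The fall-rise reading of the three F0 values reported for “no” (228.1 Hz, 181.3 Hz, 200.1 Hz) can be checked with a small sketch. This is an illustration only, not the procedure used in the study: the function `classify_contour` and the 10 Hz threshold for treating a pitch movement as audible are assumptions introduced here.

```python
def classify_contour(f0_start, f0_turn, f0_end, threshold_hz=10.0):
    """Label a three-point F0 contour (start, turning point, end), all in Hz.

    threshold_hz is an assumed cut-off: smaller movements count as "level".
    """
    def direction(delta):
        if delta > threshold_hz:
            return "rise"
        if delta < -threshold_hz:
            return "fall"
        return "level"

    first = direction(f0_turn - f0_start)   # movement into the turning point
    second = direction(f0_end - f0_turn)    # movement out of the turning point
    if first == second:
        return first
    return f"{first}-{second}"


# F0 values reported in the text for "no" in example (2).
label = classify_contour(228.1, 181.3, 200.1)
print(label)  # fall-rise
```

With these values the pitch drops by about 47 Hz and then recovers by about 19 Hz, so the contour is labelled as a fall-rise under the assumed threshold.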
2.2.1.2 Duration
Duration is the most important cue to the impression of hesitation. Firstly, we pay
attention to the duration of pauses and determine how it changes in hesitant
speech. The first parameter we test is the length of pauses,
which is measured in milliseconds (ms). Length is scaled automatically using the standard
duration tier manipulation settings of Praat version 5.1.3.7. Consider the following
examples, in which pause durations are given in milliseconds between curly brackets:
(3) // Yeah, no problem.{120} I left loads of time ’cos you know what the trains are like
nowadays. {240}// And I wasn’t really sure where the Heppleworth site was, but
er…{333} the directions you *er* sent me were crystal clear.// (Interview 1, line 6-11)
(4) // Um…{5005} u {1831} no. {1221}// I don’t think so. {2099} // I can’t think of
anything.// (Interview 2, line 80)
(5) //…{2332} no, sorry {598} that’s the molecular ion.{2710} The base peak is….{2881}
the most intense peak. {1086} // All the other peaks are relative to this. {927}// I can’t
believe I got them mixed up.// (Interview 3, line 29-30)
In example 3, there are very short silent pauses (120 ms, 240 ms), which appear when the
applicant has apparently figured out what he wants to convey (subsequent delivery is fairly
fluent) but is simply stopping to breathe. In other words, these pauses play a
demarcative role between different syntactic components. The pause after the filler /er/
lasts longer (333 ms), as the applicant is trying to find his next words. However, the
increase in duration in this case is not significant. In contrast, example 4 contains significant pauses of
more than one second, which act as a delay device signalling “no immediately
forthcoming talk”. The pause after “um” even lasts about five seconds and occurs
when the speaker is struggling with her hesitant response. Similarly, in example 5 the
applicant finds it difficult to express her idea, so she tries to delay each
next act. The minimum pause duration in example 5 is 598 ms, which is 265 ms longer than the
maximum duration in example 3 (333 ms). In terms of silent pauses, it can be concluded
that the longer the pause is, the stronger the impression of hesitance.
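The comparison of pause lengths across examples (3) to (5) can be reproduced with a short calculation. The sketch below uses only the pause durations given in braces in the three examples; the dictionary layout and the helper `summarize` are assumptions made for illustration.

```python
# Pause durations in ms, copied from the {number} markers in examples (3)-(5).
pauses_ms = {
    "example 3": [120, 240, 333],
    "example 4": [5005, 1831, 1221, 2099],
    "example 5": [2332, 598, 2710, 2881, 1086, 927],
}


def summarize(durations):
    """Return the shortest, longest and mean pause of one example (in ms)."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": sum(durations) / len(durations),
    }


summary = {name: summarize(values) for name, values in pauses_ms.items()}

# Shortest pause of the hesitant example 5 vs. longest pause of the fluent example 3.
gap_ms = summary["example 5"]["min"] - summary["example 3"]["max"]

for name, stats in summary.items():
    print(name, stats)
print("gap between example 5 minimum and example 3 maximum:", gap_ms, "ms")
```

The gap comes out at 265 ms, matching the figure stated in the text, and every pause in example (4) exceeds one second.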
We next test whether the duration of filled pauses and lengthened syllables
changes. In examples (3), (4) and (5), there are three filled pauses,
“er”, “um” and “u”, and one lengthened syllable, “no”. The average duration of the filled
pauses is 510 ms and the average duration of syllable lengthening is 600 ms, whereas
other, normal words in the same utterances have an average duration of only 100 ms or less. This
means that filled pauses and lengthened syllables show an overall increase in duration in
cases of hesitation.
2.2.1.3 Speaking tempo
Speaking tempo (or speaking rate) is also examined for its influence on
hesitant speech. Phonetic speaking rate is usually expressed on a syllables-per-second scale