Studying the phonetic characteristics of glottalized tones in Vietnamese expressive speech

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN THI LAN

INFORMATION TECHNOLOGY

STUDYING THE PHONETIC CHARACTERISTICS OF
GLOTTALIZED TONES IN VIETNAMESE EXPRESSIVE
SPEECH

MASTER THESIS OF SCIENCE
INFORMATION TECHNOLOGY

2013B
Hanoi – 2015


Department: INFORMATION TECHNOLOGY

SUPERVISOR:
Dr Tran Do Dat


COMMITMENT
I hereby declare that I am the person responsible for conducting this study.
All referenced figures are cited with their sources. The results presented are
truthful and have not been published in any other person’s work.
NGUYEN Thi Lan


ACKNOWLEDGEMENT
This is the second time that I sit here, at Hanoi University of Science and
Technology, with the great honor of writing these words of gratitude to the people who
have been supporting me since the first moment I entered the university. The first
acknowledgement was written in my graduation thesis 2.5 years ago, and today this
one awakens a special emotion in me.
I wish to thank all my professors and colleagues at the School of Information and
Communication Technology and MICA International Research Institute, who have
helped me with generous support. The advice and knowledge they imparted to me
are gratefully appreciated and inspired me greatly to finish this thesis.
Special thanks to my supervisor Dr. Tran Do Dat and colleagues of the Speech
Communication Department, MICA Institute, including Dr. Do Thi Ngoc Diep,
Nguyen Thi Thu Trang, Nguyen Tuan Ninh, Tran Thi Anh Xuan, Dr. Nguyen Viet
Son, Dr. Nguyen Cong Phuong, Nguyen Duc Anh and Nguyen Tien Thanh, for
the advice and encouragement they gave me, and especially to Dr. Mac Dang Khoa
and Dr. Alexis Michaud for their thorough review and invaluable suggestions.
Thanks also to the two thesis reviewers, Assoc. Prof. Truong Ninh
Thuan (VNU) and Dr. Vu Thi Huong Giang (SOICT, HUST), for their valuable
comments, which greatly improved the presentation of the thesis.
Special thanks to my family and friends, who always stand by me and lift me up
when I am down. Without them, my life would be meaningless!
NGUYEN Thi Lan


CONTENTS
COMMITMENT
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
Chapter 1. OVERVIEW
1.1 Background knowledge
1.1.1 Vietnamese phonetics and phonology
1.1.2 The phonetic characteristics of complex lexical tone system in Vietnamese
1.2 Glottalized tones in the context of expressive speech: raising issues
1.3 The scope of the thesis
1.4 Conclusion
Chapter 2. BUILDING VIETNAMESE ATTITUDINAL SPEECH CORPUS FOR SENTENCE-FINAL PARTICLES
2.1 Method of using expressive morphemes carrying lexical tones – Sentence-final particles
2.2 Designing sample corpus
2.3 The progress of building the sample corpus
2.3.1 Elicitation method and speakers
2.3.2 Recording conditions
2.3.3 Post-processing and annotation
2.4 Conclusion
Chapter 3. ANALYSING VARIATION IN REALIZATION OF GLOTTALIZED TONES BY VARIOUS ATTITUDES
3.1 Analysis Method
3.2 A pilot data analysis: important new discoveries and an insight into the use of expressive morphemes and glottalized tones
3.2.1 Comparison of attitudes: Surprise and Declaration
3.2.2 Comparison of attitudes: Irritation and Declaration
3.3 Proposals for a full-scale study and statistical analyses of phonation-types based on EGG and DEGG signal
3.3.1 Observation about the irregularities of DECPA
3.3.2 Building a tool for detection of creaky and pressed voice in Matlab
3.4 A full-scale study: detailed analysis results
3.4.1 Surprise and Declaration
3.4.2 Irritation and Declaration
3.5 Discussion
3.6 Conclusion
CONCLUSION AND PERSPECTIVES
REFERENCES
PUBLICATIONS
APPENDIX A: CONTEXTS OF EACH COLLECTED SENTENCE
APPENDIX B: FIGURES OF AVERAGED F0&Oq CONTOURS OF EACH SPEAKER WITH STANDARD DEVIATION FOR THE USED ATTITUDES


LIST OF ABBREVIATIONS
SFP – Sentence-final particle
EGG – Electroglottography
DEGG – The derivative of the electroglottography signal
IPA – International Phonetic Alphabet
DECPA – Derivative-Electroglottographic Closure Peak Amplitude
X-SAMPA – The Extended Speech Assessment Methods Phonetic Alphabet
F0 – Fundamental frequency


LIST OF TABLES
Table 1-1 Vietnamese consonants
Table 1-2 Vietnamese vowels/diphthongs
Table 1-3 Phonetic characteristics of Vietnamese initial consonants
Table 1-4 Phonetic characteristics of Vietnamese final consonants
Table 1-5 Phonetic characteristics of Vietnamese vowels/diphthongs
Table 1-6 Summarized description of 8 tones of Vietnamese
Table 2-1 Intended attitudes
Table 2-2 List of speakers
Table 3-1 Statistics of Mechanism I-A/Pressed Voice/Mechanism I-B of tone 6a with attitude surprise
Table 3-2 Statistics of Mechanism I-A/Creaky Voice/Mechanism I-B of tone 3 with attitude declaration
Table 3-3 Statistics of Mechanism I-A/Pressed Voice/Mechanism I-B of tone 3 with attitude irritation


LIST OF FIGURES
Figure 1-1 Schematic diagram of Hanoi Vietnamese tones (Michaud, 2004a)
Figure 2-1 Speaker F7 (left) and M10 (right) in the recording booth
Figure 2-2 Sentence and Syllable Level Annotation with SoundForge (above) and Praat (below) of the corpus
Figure 3-1 Visualization of closing instant synchronized with EGG (above) and DEGG (below) signals (Henrich, 2001)
Figure 3-2 Visualization of opening instant synchronized with EGG (above) and DEGG (below) signals (Henrich, 2001)
Figure 3-3 Example of EGG and DEGG signals with indication of glottis closure and opening
Figure 3-4 Two realizations of glottalization on SFP /a6a/ with two attitudes. (a): declarative/neutral; (b): surprise. Speaker M7
Figure 3-5 Average curves of F0 and Oq over 6 tokens of /a6a/, speaker M7
Figure 3-6 Two realizations of glottalization on SFP /ɗa3/ of two attitudes. (a): declarative/neutral; (b): irritation. Speaker M6
Figure 3-7 Average curves of F0 and Oq over 6 tokens of /ɗa3/, speaker M6
Figure 3-8 Determining mechanisms of voice based on DECPA and F0 parameters (each point of F0&Oq contour corresponds with a cycle on DEGG signal)
Figure 3-9 Determining the duration of pressed voice based on local dipping of Oq (each point of F0&Oq contour corresponds with a cycle on DEGG signal)
Figure 3-10 The tool for detection integrated three analysis modules
Figure 3-11 Some visually illustrative figures of creaky voice from the detection tool
Figure 3-12 Some visually illustrative figures of pressed voice from the detection tool
Figure 3-13 Averaged F0 contours of SFP /a6a/ of 10 male speakers: surprise and declaration
Figure 3-14 Averaged Oq contours of SFP /a6a/ of 10 male speakers: surprise (left) and declaration (right)
Figure 3-15 Averaged F0 contours of SFP /ɗa3/ of 10 male speakers: irritation and declaration
Figure 3-16 Averaged Oq contours of SFP /ɗa3/ of 10 male speakers: declaration
Figure 3-17 Averaged Oq contours of SFP /ɗa3/ of 10 male speakers: irritation
Figure 3-18 Proposed model for combination of speaker attitude, voice quality and glottalized tone in Vietnamese expressive speech processing


INTRODUCTION
Nowadays, the use of speech in human-machine interaction is gradually becoming a
major trend that promises to replace traditional interaction devices such as the mouse,
keyboard and screen. However, a high-quality human-machine interaction
system that behaves fully like a human being is still beyond
our reach. One of the primary reasons is the lack of advanced techniques
for precisely processing (either synthesizing or recognizing) the expressive content of
human utterances.
Expression here refers to the attitudinal or emotional aspects of
speech, which can convey much linguistic information. In this
perspective, the attitudinal aspects of utterances, also called speaker
attitudes, are of no small importance. Since speaker attitudes play such an important role
in interactions between humans, they need to be taken into account in the
interaction between humans and machines (Picard, 1997). Attitudinal information in
a spoken utterance can be lexically encoded but can also be conveyed by intonation,
including modifications of voice quality (Seibert, 2003).
However, modifying these features in Vietnamese is quite complex because
intonation interacts with lexical tones; the situation becomes even more
complicated with the glottalized tones, which are
tone ngã and tone nặng. Furthermore, how this interplay is expressed in expressive
speech, how it is realized and through which mechanisms are
among the many questions that remain open.
Among the eight tones of Vietnamese, tone ngã and tone nặng are considered the
most complicated since they are accompanied by glottalization. In most
cases, with simpler tones, the interaction between intonation and tone can
be described by changes in fundamental frequency, intensity or duration,
whereas with these two glottalized tones, these parameters are
not sufficient since their glottalization can vary a lot depending on
context. Many studies have approached this interaction, but
they tend to avoid its most complicated aspect, namely glottalization
in Vietnamese.
Therefore, with a view to applications in Vietnamese speech processing, the ultimate
objective is to provide sufficient detail on the interplay between glottalized tones
and intonation for both automatic speech recognition and text-to-speech
systems to encode and decode attitudinal information in speakers’ utterances.
The thesis is organized as follows:
Chapter 1 presents an overview of Vietnamese phonetics and phonology, tones and the
expression of attitudes, as well as the open issues to be addressed and
the approach of the thesis.
Chapters 2 and 3 present the proposed methods for data acquisition and analysis, which
are based on EGG and DEGG signals, in order to clarify the mechanism of interaction
between glottalized tones and expressive speech intonation.
Finally, the Conclusion and Perspectives section summarizes the findings and outlines
perspectives for expanding the
study to cover a wider range of speaker attitudes and more tones in Vietnamese.
The obtained results include:
- Thesis report
- Attitudinal corpus: recorded with 10 male and 10 female speakers
- Method and tool for the detection and quantification of creaky and pressed voice in the Surprise, Irritation and Declaration attitudes
- 1 international conference paper: INTERSPEECH 2013
- 1 national journal paper: Journal of Science & Technology of Technical Universities in Vietnam, 101 (2014)


Chapter 1. OVERVIEW

Like any other language, Vietnamese has a rich system of consonants and
vowels together with various rules for forming meaningful words. However,
one of the characteristics that makes it especially attractive to
researchers is its complex lexical tone system. Why is this system considered
complex, and why was the study of its tones chosen as
the main focus of the thesis? Furthermore, the author also studied
expressive speech and argues that the relationship between tonal realization
and attitudinal expression in Vietnamese should be taken seriously; is this a
feature that distinguishes Vietnamese from other languages? This chapter gives a
brief introduction to Vietnamese phonetics and phonology.
The section on raising issues then addresses the questions above and states
our research interests.
1.1 Background knowledge
1.1.1 Vietnamese phonetics and phonology
Many works have studied the Vietnamese phonological system
over the years, such as (Doan, 1977), (Nguyen, Edmondson, & Jerold, 1998), (Hwa-Froelich, Hodson, & Edwards, 2002), (Nguyen, Carre, & Castelli, 2008), (Michaud
& André-Georges, 2010) and (Hajek, 2008). These works differ in how they
set up the Vietnamese phonological system, but in general, the
consonants and vowels of Vietnamese can be summarized as in Table
1-1 and Table 1-2 respectively, in both the IPA and X-SAMPA symbol systems
(Doan, 1977):
Where:
1: /zi/ if initial followed by consonant, ê or nothing
2: final only for this phoneme
3: final except after u, o, ô
4: ngh – initial only (before i, e, ê); ng – initial except before i, e, ê
5: gh – initial before i, e, ê; g – initial except before i, e, ê
6: initial except before i, e, ê, y; final after u, o, ô
7: initial before i, e, ê, y
Table 1-1 Vietnamese consonants



Table 1-2 Vietnamese vowels/diphthongs

Specifically, there are 24 consonant phonemes in total across initial and final
positions; some occur only initially or only finally, others occur in both positions,
and several occur only after certain vowels. In addition, there are 9 long
vowels, 4 short vowels and 3 diphthongs, which are combinations of single vowels.
Table 1-3 and Table 1-4 describe the phonetic characteristics of these consonants. In
these tables, phonemes are represented in the format “IPA-symbol (X-SAMPA-symbol)”, where the (X-SAMPA-symbol) part is omitted if it is the same as the IPA-symbol. The phonemes /ŋ/ and /k/ each have a variant as final consonants after /u ɔ o/: /ŋm/ is a labial-velar nasal and /kp/ is a voiceless labial-velar plosive (Hajek, 2008) (Doan, 1977).
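The labelling convention described above can be illustrated with a minimal Python sketch. The mapping below is only a small excerpt using standard X-SAMPA codes and is an assumption made for illustration; the symbol set actually used in Table 1-3 and Table 1-4 may differ, and the function name phoneme_label is hypothetical.

```python
# Hypothetical excerpt of an IPA -> X-SAMPA mapping (standard X-SAMPA codes).
# The tables of the thesis may use a different or larger symbol set.
IPA_TO_XSAMPA = {
    "ŋ": "N",
    "ɲ": "J",
    "ɣ": "G",
    "ɔ": "O",
    "ɛ": "E",
    "m": "m",
    "t": "t",
}


def phoneme_label(ipa):
    """Return 'IPA (X-SAMPA)', omitting the bracketed part when both are identical."""
    xsampa = IPA_TO_XSAMPA.get(ipa, ipa)  # unknown symbols fall back to the IPA form
    return ipa if xsampa == ipa else f"{ipa} ({xsampa})"


for symbol in ["ŋ", "m", "ɔ"]:
    print(phoneme_label(symbol))
```

Symbols whose X-SAMPA code equals the IPA symbol are printed alone, which mirrors the rule stated for the tables.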


Table 1-3 Phonetic characteristics of Vietnamese initial consonants

Consonants in green bold do not exist in the Northern dialect. In addition, in this dialect:
- ch- /c/ and tr- /ʈ/ are pronounced alike
- d-, gi- /z/ and r- /ʐ/ are pronounced alike
- x- /s/ and s- /ʂ/ are pronounced alike
Table 1-4 Phonetic characteristics of Vietnamese final consonants

Table 1-5 presents the phonetic characteristics of 16 vowels and diphthongs in
Vietnamese. Similar to other languages, they are distinguished from each other
based on which part of the tongue is involved (front, central, back) and how high
the tongue is when the sound is produced (high, mid, low).



Table 1-5 Phonetic characteristics of Vietnamese vowels/diphthongs

The above is a brief introduction to Vietnamese phonetics and phonology. The next
section presents one aspect that is always a challenge to anyone who
wants to approach Vietnamese: its tone system.
1.1.2 The phonetic characteristics of complex lexical tone system in
Vietnamese
Vietnamese is a tonal language, that is, the meaning of each word depends on the
"tone" with which it is pronounced. Many other languages also use tones, such as
Mandarin and Thai. However, the Vietnamese tone system is
relatively complex in comparison since it has a six-tone paradigm
for sonorant-final syllables and a two-tone paradigm for obstruent-final syllables
(Michaud, 2004a). The experiment in (Michaud, 2004a) warrants the conclusion that the
rising (5b) and drop (6b) tones (i.e. the tones of syllables ending in /p/, /t/ or /k/,
called checked syllables)
are not glottalized, either in final or non-final position. Therefore, it can be said
that there are 8 different tones in Vietnamese. The work on oral airflow
(Michaud, Vu, Angelique, & Bernard, 2006) brings out a clear difference
between these two sets of rhymes: tone 6a (drop tone in unchecked syllables)
has low oral airflow; tones 5b and 6b have relatively high oral airflow, getting
close to the range of breathy voice.



Table 1-6 Summarized description of 8 tones of Vietnamese

Specifically, a phonetically detailed description of each tone, summarized
from (Thompson, 1987), (Mixdorff, Nguyen, Fujisaki, & Luong, 2003), (Nguyễn,
1997) and (Michaud, 2004a), is as follows:
Tone 1 – level tone (“ngang”) is modal and sometimes lax and its contour is
nearly level in non-final syllables not accompanied by heavy stress, although even
in these cases it probably trails downward slightly.
Tone 2 – falling tone (“huyền”) is lax, starts quite low and trails downward toward
the bottom of the voice range. It is often accompanied by a kind of breathy voicing
(voiceless + modal), reminiscent of a sigh. For some speakers it is even lax to the
point of breathiness with somewhat lowered subglottal air pressure.
Tone 3 – broken tone (“ngã”) is high and rising, the F0 contour being similar
to that of tone 5, but it is accompanied by a rasping voice quality (strong creaky
voice starting toward the middle of the vowel and lessening as the end of
the syllable is approached) occasioned by tense glottal stricture. In careful speech
such syllables are sometimes interrupted completely by a glottal stop (or a rapid
series of glottal stops). Its trajectory therefore sometimes shows a characteristic
break in the voicing at about half of the total duration of the syllable. Many
speakers begin the vowel with modal voice, followed by strong creaky voice
starting toward the middle of the vowel.
Tone 4 – curve tone (“hỏi”) is tense and drops rather abruptly. It starts with modal
voice phonation, which moves increasingly toward tense voice with accompanying
harsh voice (although the harsh voice seems to vary according to speaker). In final
syllables, and especially in citation forms, this is followed by a sweeping rise
at the end, and for this reason it is often called the ‘dipping’ tone. However,
non-final syllables seem only to have a brief level portion at the end, and this is
exceedingly elusive in rapid speech. Although tone 4 is usually described as a low
falling and then rising tone, not all Vietnamese speakers have the rising part. Curve
and broken tones are both tense but their tension is not alike and is not distributed
across the syllable in the same way.
Tone 5a – rising tone (“sắc”) is high and rising (perhaps nearly level in
rapid speech) and tense. Phonetically, tone 5a is produced with modal voice.
Tone 6a – drop tone (“nặng”) is also tense; it starts somewhat lower than tone 4.
Syllables bearing tone 6a have the same rasping voice quality as tone 3, drop very
sharply and are almost immediately cut off by a strong glottal stop. Tone 6a is
much shorter than other tones with a tendency to go lower.
As for tones 5b and 6b, the orthography identifies tone 5b with tone 5a as sắc and
tone 6b with tone 6a as nặng; these are the names that the tones carry in
present-day Vietnamese orthography. However, tones 5b and 6b are not
glottalized, either in final or non-final position (Michaud, 2004a). Tone 6a is
characterized by a gesture of strong constriction that is distinct from creaky voice;
tone 6b drops more sharply than tone 2, but it is never accompanied by the
breathy quality of tone 6a.
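The eight-tone inventory described above (a six-tone paradigm for sonorant-final syllables and a two-tone paradigm for syllables ending in /p/, /t/ or /k/) can be summarized in a small Python sketch. The data structure and the function possible_tones are hypothetical and serve only as an illustration; the glottalized flag follows the statement in this thesis that tones 3 (ngã) and 6a (nặng) are the glottalized tones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    label: str         # tone number used in this chapter
    name: str          # orthographic name
    descriptor: str    # English descriptor
    checked: bool      # True for syllables ending in /p/, /t/ or /k/
    glottalized: bool  # True for the two glottalized tones (3 and 6a)


TONES = [
    Tone("1", "ngang", "level", False, False),
    Tone("2", "huyền", "falling", False, False),
    Tone("3", "ngã", "broken", False, True),
    Tone("4", "hỏi", "curve", False, False),
    Tone("5a", "sắc", "rising", False, False),
    Tone("6a", "nặng", "drop", False, True),
    Tone("5b", "sắc", "rising", True, False),
    Tone("6b", "nặng", "drop", True, False),
]

OBSTRUENT_FINALS = {"p", "t", "k"}


def possible_tones(final_consonant):
    """Return the tone labels available for a syllable with the given final."""
    checked = final_consonant in OBSTRUENT_FINALS
    return [tone.label for tone in TONES if tone.checked == checked]


print(possible_tones("k"))  # two-tone paradigm of checked syllables
print(possible_tones("n"))  # six-tone paradigm of sonorant-final syllables
```

An empty string can be passed for open syllables, which then fall under the six-tone paradigm in this sketch.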

Figure 1-1 Schematic diagram of Hanoi Vietnamese tones (Michaud, 2004a)

This section has reviewed the features of Vietnamese tones that
need to be taken into account when approaching the language. The next section
discusses glottalized tones in the context of expressive speech, first in other
languages and then in Vietnamese.
1.2 Glottalized tones in the context of expressive speech: raising issues
Glottalization is a challenge for speech processing because it disrupts F0 estimation
(it becomes unclear how F0 should be measured), which raises problems for averaging
and for building a model. In particular, most current speech synthesis and recognition
systems do not model the control of glottalization because of its complexity.
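The measurement problem can be made concrete with a minimal Python sketch that computes cycle-by-cycle F0 as the inverse of each glottal cycle duration. The cycle durations below are invented for illustration and are not taken from the corpus; the helper names cycle_f0 and variation are hypothetical.

```python
from statistics import mean, pstdev


def cycle_f0(periods_ms):
    """Cycle-by-cycle F0 in Hz from glottal cycle durations in milliseconds."""
    return [1000.0 / period for period in periods_ms]


def variation(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return pstdev(values) / mean(values)


# Hypothetical cycle durations (ms), invented for illustration only.
modal_periods = [8.0, 8.1, 7.9, 8.0, 8.2, 8.0]
glottalized_periods = [8.0, 12.5, 9.0, 20.0, 11.0, 25.0]

for name, periods in [("modal", modal_periods), ("glottalized", glottalized_periods)]:
    f0 = cycle_f0(periods)
    print(f"{name}: mean F0 = {mean(f0):.1f} Hz, variation = {variation(f0):.2f}")
```

With regular cycles the mean is a meaningful summary, whereas with irregular cycles the values jump from cycle to cycle, so a single averaged F0 value describes the token poorly.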
In languages such as English, the issue may appear secondary, as glottalization is
not phonological in the standard variety. Glottalization is a characteristic of certain
sociolects: creak in “drawl”; ‘glottaling’ of /t/, which is becoming increasingly
common in familiar speech, used to be stigmatized as “working-class”/vulgar
(Fabricius & Anne, 2002). Among the national languages of Europe, only Danish
possesses phonological glottalization (stød) (Fischer-Jørgensen & Eli, 1989). There
exist languages in which glottalization is controlled in greater phonological detail,
for instance languages of the Mon-Khmer family, but these languages
are relatively less well studied, and given the present state of the documentation,
the study of the fine phonetic detail of these phenomena in discourse is seldom
perceived as a priority by linguists (DiCanio & Christian, 2009).
Hanoi Vietnamese has a key role to play here: it has extremely rich glottalization
phenomena; and as the official standard of a country with about 90 million
inhabitants, it receives increasing attention from specialists of speech technology. A
salient aspect of the Hanoi Vietnamese tone system is the use of phonation-type
characteristics (Nguyen et al., 1998) (Brunelle, Nguyen, & Nguyen, 2010) (Kirby,
2010) (Brunelle, 2009a), absent from other dialects (Tran, 1969). Hanoi Vietnamese
makes use of glottalization as part of the lexical specification of some of its lexical
tones. In particular, tones 6a and 3 are glottalized. Tone 3 (also referred to by its
orthographic label, ngã, or the English descriptor ‘broken tone’) is a rising tone with
a strong glottalization in its first half. Tone 6a (orthographic nặng, ‘drop tone’)
starts on a middle pitch and usually falls dramatically because of a strong
glottalization in its second half. It has been reported that glottal constriction for tone
6a is consistently present both in a ‘neutral’ context and in an ‘emphatic/impatient’
context (Michaud & Vu, 2004).
Glottalization in Vietnamese is not only a distinctive characteristic of tone: fine
details in its phonetic realization can convey intonational information. Vietnamese
has salient intonational phenomena (Tran & Castelli, 2008). The surface realization
of tones depends greatly on intonation: phrasing, prominence, and the expression of
attitudes and emotions. Therefore, it appeared worthwhile to investigate how
speaker attitude affects the realization of glottalization, a phonetic dimension which
is cross-linguistically known to convey “paralinguistic” information (Fónagy,
1983) (Gobl & Ní Chasaide, 2003). Specifically, the research issue is: how fine-grained
details in the phonetic realization of glottalized tones convey attitudinal
information in Vietnamese expressive speech.
This is a challenge for speech processing: models such as Fujisaki’s (Mixdorff et
al., 2003), which focuses exclusively on F0, would require substantial additions
before they can handle such phenomena. New-generation speech processing for
Vietnamese will require facing the challenge of synthesis/fine tuning of phonation
types.
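To show what an F0-only model covers, the following minimal Python sketch implements the general form of the Fujisaki command-response model, in which the logarithm of F0 is the sum of a base value, phrase components and accent (or tone) components. The parameter values alpha, beta and gamma are commonly quoted defaults and are assumptions made here, not the values fitted for Vietnamese in (Mixdorff et al., 2003); the function names are hypothetical.

```python
import math


def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent/tone control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_f0, phrase_cmds, tone_cmds):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset_time, amplitude)
    tone_cmds:   list of (onset_time, offset_time, amplitude)
    """
    log_f0 = math.log(base_f0)
    for onset, amplitude in phrase_cmds:
        log_f0 += amplitude * phrase_response(t - onset)
    for onset, offset, amplitude in tone_cmds:
        log_f0 += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return math.exp(log_f0)


# Hypothetical command values, for illustration only.
for time in [0.0, 0.1, 0.3, 0.6]:
    value = fujisaki_f0(time, 100.0, [(0.0, 0.5)], [(0.2, 0.4, 0.3)])
    print(f"t = {time:.1f} s, F0 = {value:.1f} Hz")
```

The only output of such a model is an F0 contour; it has no parameter that describes phonation type, which is why creaky or pressed voice on tones 3 and 6a falls outside its scope.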
1.3 The scope of the thesis
In view of the context set out above, the goal of the present study is to investigate
the phonetic characteristics of glottalized tones in Vietnamese expressive speech,
focusing on sentence-final particles. Due to limitations of the present study,
applications in speech processing will not be attempted. The aim of the present
study is to provide a sufficiently detailed analysis of production data to pave the
way for fresh work on the synthesis and recognition of attitudes in Vietnamese in
the future.
More precisely, we concentrate on tone 3 and tone 6a with three
attitudes: Declaration, Surprise and Irritation, since these attitudes are the most
clearly perceived (Mac, 2009). The objective is to answer the question of how these
attitudes change the realization of glottalization on these two tones and the use
of their special voice qualities. Even so, the speech corpus is not
limited to these tones and attitudes, so that it can also serve further research.
1.4 Conclusion
This chapter has presented an overview of Vietnamese phonetics and phonology as well as
the phonetic characteristics of its lexical tone system. The
open issues concerning glottalized tones and expressive speech, and the author’s
interest in them, were then set out as the main focus of the thesis. The next chapter
proposes an approach that uses expressive morphemes called sentence-final particles as the
objects for studying glottalization in the interaction between lexical tone function and
attitudinal function, and presents the construction of our corpus for this
research.


Chapter 2. BUILDING VIETNAMESE ATTITUDINAL SPEECH CORPUS FOR
SENTENCE-FINAL PARTICLES
As announced in the last chapter, this chapter focuses on the construction of the
speech corpus used to investigate the interplay between glottalized
tones and attitudinal expression in Vietnamese. Several SFPs that
carry both lexical tones and attitudinal information were used to construct target
sentences that concentrate on basic speaker attitudes and glottalized tones.
There already exists a corpus designed for the study of social attitudes in
Vietnamese (Mac et al., 2009), but it does not contain SFPs. We therefore decided
to record new data. Speech data acquisition is an underestimated challenge (Niebuhr
and Michaud), especially when attempting to capture such elusive aspects of speech
as attitudinal information. Special attention was therefore paid to the elaboration of
materials and recording procedures.
The research was divided into two phases, and a different corpus was built for each.
The first phase was a pilot study
with a small corpus and four speakers to initially explore hypotheses on SFPs,
glottalized tones and speaker attitudes. The second phase, with a larger
corpus recorded with 20 speakers, expanded on the results of the pilot study.
Within the scope of the thesis, we aimed to demonstrate the qualitative
observations by concentrating the analyses on tone 3, tone 6a, the three studied
attitudes and male speakers; the rest of the corpus is reserved for further
research. This chapter presents both corpora.
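The restriction of the analysis to a subset of the corpus can be sketched in Python as follows. The record layout, the field names and the sample records are hypothetical and serve only as an illustration; the speaker codes follow the M/F numbering used in the figure captions, and the attitude label "other" stands for any attitude outside the three studied ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    speaker: str   # e.g. "M7" or "F7"; the first letter encodes the speaker's sex
    sfp: str       # sentence-final particle carried by the target sentence
    tone: str      # "3", "6a", ...
    attitude: str  # "declaration", "surprise", "irritation", ...


STUDIED_TONES = {"3", "6a"}
STUDIED_ATTITUDES = {"declaration", "surprise", "irritation"}


def in_thesis_scope(token):
    """True for tokens analysed in the thesis: male speakers, tones 3 and 6a,
    and the three studied attitudes."""
    return (
        token.speaker.startswith("M")
        and token.tone in STUDIED_TONES
        and token.attitude in STUDIED_ATTITUDES
    )


# Hypothetical records, invented for illustration only.
corpus = [
    Token("M7", "a6a", "6a", "surprise"),
    Token("F7", "a6a", "6a", "surprise"),
    Token("M6", "ɗa3", "3", "irritation"),
    Token("M10", "a6a", "6a", "other"),
]

selected = [token for token in corpus if in_thesis_scope(token)]
print([token.speaker for token in selected])
```

Keeping the full set of records and filtering at analysis time leaves the remaining tokens available for the further research mentioned above.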
2.1 Method of using expressive morphemes carrying lexical tones – Sentence-final particles
Languages differ in the means that they offer for the expression of attitudes and
emotions. In English, intonation is known to fulfill a considerable range of
functions, including subtle nuances related to attitudes and emotions. Japanese and
Cantonese are famous examples of languages that possess morphemes which have
been described as performing functions that intonation does in a language such as
English (Chan & Marjorie, 1999). For instance, the Cantonese particle /mɛ˥/ is
suffixed to a declarative sentence to convert
the sentence into a question of disbelief or surprise (Wu 2008, p. 24) or a “query to
the truth of something” (Kwok 1984, p. 88).
The particles specifically called sentence-final particles (hereafter SFPs) constitute
a marginal class of expressive words indicating speech act types,
evidential/epistemic nuances, and affective/emotional colouring. There are about
ten SFPs in Mandarin, thirty in Cantonese (Kwok & Helen, 1984), and about the
same number in Vietnamese (Tran, 2010); SFPs are ubiquitous in casual,
conversational speech. SFPs “often carry much of the meaning and function that
intonation does in non-tone languages” (Chan & Marjorie, 1998); the relationship is
not simply one of functional equivalence between intonation and SFPs, however,
since SFPs also carry intonational information: sentence-level intonational
phenomena are known to cluster on SFPs. One and the same SFP can take on
different nuances (creating different sense-effects) depending on the intonational
realization of the SFP itself (the ‘tune’ that it carries) and of the sentence as a
whole.
In Vietnamese, where they clearly have a tone of their own, SFPs provide an
exemplary illustration of the superposition of tone and intonation. An important
proportion of sentence-level intonation, conveying sentence mode, attitudes... is
concentrated at the end of the utterance, on the SFP(s) (Do, Tran, & Georges, 1998).
This superposition affects F0 (Nguyen & Tran, 2012), but also phonation types. The
purpose of the present study is to investigate how speaker attitude affects the
realization of glottalization for the two glottalized tones 6a and 3 (orthographic
nặng and ngã) carried by SFPs. A pilot study (Nguyen, Michaud, Tran, & Mac,
2013) suggests that glottalization is phased earlier for surprise than for declaration,
and that irritation also tends to be reflected in earlier glottalization, but with an
