
NATIONAL CENTRAL UNIVERSITY

Department of Computer Science and Information Engineering
Master Thesis

Complex-valued Gaussian Process Regression
for speech separation

Graduate Student: Le Dinh Nguyen
Advisor: Professor Jia-Ching Wang

June 2017


NATIONAL CENTRAL UNIVERSITY

Department of Computer Science
Master Thesis

Complex-valued Gaussian Process Regression
for speech separation

Graduate Student: Le Dinh Nguyen
Advisor: Professor Jia-Ching Wang

June 2017






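Abstract (Chinese version)
Speech separation is a challenging problem in signal processing that plays an important role in many real-world applications, such as speech recognition systems and telecommunications. Its main goal is to estimate the speech of each individual speaker from a mixture containing multiple speakers. Because speech signals in natural environments are often disturbed by noise or by other speech, speech separation has become an attractive research topic.
On the other hand, the Gaussian process (GP) is a kernel-based machine learning method that has been widely applied in signal processing. In this study, we propose a method based on Gaussian process regression (GPR) to model the nonlinear relationship between mixed speech signals and clean speech; the reconstructed speech signal is obtained from the mean function of the GP model. The hyper-parameters of the model are optimized with the conjugate gradient method. Experiments on the TIMIT speech database show that the proposed method achieves better performance.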



Abstract
Speech separation is a challenging signal processing problem that plays a
significant role in improving the accuracy of various real-world applications,
such as speech recognition systems and telecommunications. Its main goal is to
isolate or estimate the target voice of each speaker from a mixture of speech
uttered by several speakers at the same time. Because speech signals
collected in natural environments are frequently corrupted by noise,
speech separation has become an attractive research topic over the past several
decades.
In addition, the Gaussian process (GP) is a flexible kernel-based learning
method that has found widespread application in signal processing. In this
thesis, a supervised method is proposed for the speech separation problem.
We focus on modeling a nonlinear mapping between mixed and
clean speech based on GP regression, in which the reconstructed audio signal is
estimated by the predictive mean of the GP model. The nonlinear conjugate gradient
method is used to optimize the hyper-parameters. An
experiment on a subset of the TIMIT speech dataset is carried out to confirm the
validity of the proposed approach.
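The GP regression step summarized above (predictive mean as the reconstructed output, hyper-parameters tuned by nonlinear conjugate gradient) can be sketched as follows. This is a minimal, real-valued sketch under stated assumptions, not the thesis's complex-valued implementation: it assumes a squared-exponential kernel with one length-scale l_d per input dimension (the kernel actually used is not stated in this front matter), replaces mixed/clean speech features with a synthetic noisy sine, and uses SciPy's `scipy.optimize.minimize(method="CG")`, a Polak-Ribière nonlinear conjugate gradient, to minimize the negative log marginal likelihood. The helper names `se_ard_kernel`, `unpack`, `neg_log_marginal_likelihood` and `gp_predictive_mean` are hypothetical. The predictive mean is f̄* = K(X*, X) [K(X, X) + σn² I]⁻¹ y.

```python
import numpy as np
from scipy.optimize import minimize

JITTER = 1e-8  # small constant added to the diagonal for numerical stability


def se_ard_kernel(A, B, lengthscales, signal_std):
    """Squared-exponential kernel with one length-scale l_d per input dimension."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales  # shape (n, m, D)
    return signal_std**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def unpack(log_theta, D):
    """Map log hyper-parameters to (length-scales, signal std, noise std)."""
    return np.exp(log_theta[:D]), np.exp(log_theta[D]), np.exp(log_theta[D + 1])


def neg_log_marginal_likelihood(log_theta, X, y):
    """Negative log marginal likelihood of the GP and its gradient w.r.t. log_theta."""
    n, D = X.shape
    ell, sf, sn = unpack(log_theta, D)
    Kf = se_ard_kernel(X, X, ell, sf)
    K = Kf + (sn**2 + JITTER) * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    nlml = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)

    # d(NLML)/d(theta_j) = -0.5 * tr((alpha alpha^T - K^{-1}) dK/d(theta_j))
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    W = np.outer(alpha, alpha) - K_inv
    grad = np.empty_like(log_theta)
    for d in range(D):
        sq_dist = (X[:, None, d] - X[None, :, d]) ** 2 / ell[d] ** 2
        grad[d] = -0.5 * np.sum(W * Kf * sq_dist)  # dK/dlog l_d = Kf * sq_dist
    grad[D] = -0.5 * np.sum(W * 2.0 * Kf)  # dK/dlog sigma_f = 2 Kf
    grad[D + 1] = -0.5 * np.trace(W) * 2.0 * sn**2  # dK/dlog sigma_n = 2 sigma_n^2 I
    return nlml, grad


def gp_predictive_mean(X_train, y_train, X_test, log_theta):
    """Predictive mean f_bar_* = K(X_*, X) [K(X, X) + sigma_n^2 I]^{-1} y."""
    ell, sf, sn = unpack(log_theta, X_train.shape[1])
    K = se_ard_kernel(X_train, X_train, ell, sf) + (sn**2 + JITTER) * np.eye(len(y_train))
    K_star = se_ard_kernel(X_test, X_train, ell, sf)  # shape (n_test, n_train)
    return K_star @ np.linalg.solve(K, y_train)


# Hypothetical toy stand-in for (mixed feature -> clean feature) pairs: a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

theta0 = np.log(np.array([1.0, 1.0, 0.3]))  # [l_1, sigma_f, sigma_n] in log space
res = minimize(neg_log_marginal_likelihood, theta0, args=(X, y), jac=True, method="CG")

X_test = np.linspace(-2.0, 2.0, 5)[:, None]
mean = gp_predictive_mean(X, y, X_test, res.x)
print("optimized [l_1, sigma_f, sigma_n]:", np.exp(res.x))
print("predictive mean:", mean)
```

Optimizing in log space keeps the length-scales and standard deviations positive without explicit constraints, so an unconstrained method such as conjugate gradient applies directly, and supplying the analytic gradient avoids finite-difference evaluations of the likelihood.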



Acknowledgements
The work presented in this thesis was carried out at the Department of
Computer Science and Information Engineering of National Central University,
Taiwan, during the years 2015-2017.
First of all, I wish to express my deepest gratitude to my research advisor,
Professor Jia-Ching Wang, for guiding and encouraging me in my research. That
this thesis was finished at all is in great part due to his endless enthusiasm for
discussing my work.
I especially thank Ms. Sih-Huei Chen, who greatly supported me on the
theoretical side and helped me take my initial thesis proposal and develop it into a
true body of work, resulting in several joint conference and workshop papers.
I would like to thank the students in the laboratory for many interesting
discussions, their help, and for making life at the laboratory so enjoyable.
In particular, I would like to thank Ms. Sih-Huei Chen for discussing and co-working on the research, and Mr. Tuan Pham for helping me become familiar with source
separation.
The financial support provided by the National Central University fellowship
program and by my advisor, Professor Jia-Ching Wang, is gratefully acknowledged.

In addition, I wish to thank my family for their support in all my efforts.



Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Aim and Objective
1.3 Thesis Overview
Chapter 2 Background knowledge
2.1 Gaussian Process
2.1.1 Introduction
2.1.2 Covariance functions
2.1.3 Optimization of hyper-parameters
2.2 Short-time Fourier transform
2.2.1 Introduction
2.2.2 Spectrogram of STFT
2.2.3 Inverse short-time Fourier transform
2.3 Overlap-add method
2.4 Complex-valued Derivatives
2.4.1 Differentiating complex exponentials of a real parameter
2.4.1.1 Differentiating complex exponentials
2.4.2 Differentiating functions of a complex parameter
Chapter 3 Employed systems
3.1 System overview
3.1.1 Real-valued GP-based system for source separation
3.1.2 Complex-valued GP-based system for source separation
3.2 GP regression-based source separation
3.2.1 Real-valued GPR-based source separation
3.2.2 Complex-valued GPR-based source separation
Chapter 4 Experiments
4.1 Real-valued GP regression-based model for source separation
4.2 Complex-valued GP regression-based model for speech enhancement
Chapter 5 Conclusions and future work
Bibliography




List of Figures
Figure 1.1 Cocktail party problem
Figure 1.2 An example of single-channel source separation
Figure 2.1 GP model for regression
Figure 2.2 GP model for regression
Figure 2.3 Overlapping windows
Figure 2.4 STFT of a signal
Figure 2.5 2-D representation of a spectrogram
Figure 2.6 iSTFT process
Figure 2.7 A general diagram of an OLA analysis and synthesis system
Figure 2.8 Linear convolution
Figure 2.9 OLA overview
Figure 2.10 An example of OLA
Figure 3.1 Real-valued GPR-based system
Figure 3.2 Complex-valued GPR-based system
Figure 4.1 Spectrograms of the mixture, one source, and one de-noised speech signal



List of Tables
Table 2.1 List of common kernel functions
Table 4.1 Source separation performance using a 512-point STFT
Table 4.2 Source separation performance using a 1024-point STFT
Table 4.3 SNR and SegSNR (dB) averaged over white noise
Table 4.4 SNR and SegSNR (dB) averaged over babble noise




List of Symbols and Abbreviations
Symbols
È → Joint distribution
x* → Test input
l_d → Characteristic length-scale
θ → Set of hyper-parameters
→ Derivative function
f̄* → Predictive mean
cov(f*) → Predictive covariance
σ² → Variance
I → Identity matrix
z → Complex number
z_R → Real part of z
z_I → Imaginary part of z
∇ → Gradient


Abbreviations
DNN → Deep neural networks
GP → Gaussian process
GPR → Gaussian process regression
NMF → Nonnegative matrix factorization
SCSS → Single-channel speech separation
STFT → Short-time Fourier transform
DFT → Discrete Fourier transform
STFTM → STFT magnitude
FT → Fourier transform
FFT → Fast Fourier transform
iSTFT → Inverse short-time Fourier transform
iFFT → Inverse fast Fourier transform
SDR → Source-to-distortion ratio
SAR → Source-to-artifacts ratio
SIR → Source-to-interference ratio
SNR → Signal-to-noise ratio
SegSNR → Segmental signal-to-noise ratio
i.i.d. → Independent and identically distributed