Email:
C9-411 Dai Co Viet str. 1, Hanoi
Pham Van Tien, Dr. rer. nat. , Embedded Networking Research Group
Faculty of Elec. and Telecom, Hanoi University of Science and Technology
Audio and speech signal
processing
Tien Pham Van, Dr. rer. nat.
Hanoi University of Science and
Technology
CuuDuongThanCong.com
Agenda
• Concepts
• Signal characteristics
• Compression techniques
The Nature of Sound
• Sound is a physical phenomenon produced by the
vibration of matter and transmitted as waves.
• However, the perception of sound by human beings is a
very complex process. It involves three
systems:
- the source which emits sound;
- the medium through which the sound
propagates;
- the detector which receives and interprets the
sound.
• The sounds we hear every day are very complex.
Every sound comprises waves of many
different frequencies and shapes. The
simplest sound we can hear is a sine wave.
• Sound waves can be characterised by the
following attributes:
Period, Frequency, Amplitude, Bandwidth,
Pitch, Loudness, Dynamic.
Pitch and Frequency
• Period is the interval at which a periodic signal repeats.
• Pitch is a perception of sound by human beings. It
measures how ‘high’ a sound is as perceived by a
listener.
• Frequency measures a physical property of a wave. It is
the reciprocal of the period: f = 1/P.
The unit is hertz (Hz) or kilohertz (kHz).
• Infrasound: 0 – 20 Hz
• Human hearing range: 20 Hz – 20 kHz
• Ultrasound: 20 kHz – 1 GHz
• Hypersound: 1 GHz – 10 THz
• Musical instruments are tuned to produce a set of fixed
pitches.
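The relation f = 1/P and the frequency bands listed on this slide can be sketched in Python. The function names, and the choice to treat each band's lower bound as inclusive, are assumptions made for illustration only.

```python
def frequency_from_period(period_s: float) -> float:
    """Return the frequency in Hz for a period given in seconds (f = 1/P)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def classify_band(freq_hz: float) -> str:
    """Name the band a frequency falls into, using the ranges from the slide.

    Boundary handling (lower bound inclusive) is an assumption.
    """
    if freq_hz < 0:
        raise ValueError("frequency must be non-negative")
    if freq_hz < 20:
        return "infrasound"
    if freq_hz < 20e3:
        return "human hearing range"
    if freq_hz < 1e9:
        return "ultrasound"
    if freq_hz <= 10e12:
        return "hypersound"
    return "above hypersound"


# A wave that repeats every 2.5 ms has a frequency of about 400 Hz.
f = frequency_from_period(0.0025)
print(f, classify_band(f))
```

For instance, a period of 2.5 ms corresponds to roughly 400 Hz, which lies in the human hearing range.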
The characteristics of sound waves
[Figure: waveform annotated with "Amplitude", "Cycle", "Time for one cycle" and "pitch"; horizontal axis: distance along wave]
Loudness and Amplitude
• The other important perceptual quality is loudness or volume.
• Amplitude is the measure of sound levels. For a digital sound,
amplitude is the sample value.
• Sounds differ in loudness because they carry
different amounts of power.
• The unit of power is the watt. The intensity of a sound is the amount
of power transmitted through an area of 1 m² oriented
perpendicular to the propagation direction of the sound.
• If the intensity of a sound is 1 watt/m², we may start to feel the
sound, and the ear may be damaged.
• An intensity of 1 watt/m² is known as the threshold of feeling. If the
intensity is 10⁻¹² watt/m², we may just be able
to hear it. This is known as the threshold of hearing.
• The relative intensity of two different sounds is
measured using the unit bel or, more commonly, the
decibel (dB). It is defined by: relative intensity in
dB = 10 log10(I2/I1)
• Very often, we compare a sound with the threshold
of hearing.
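As a minimal sketch of the decibel definition, the Python function below computes the relative intensity of a sound, by default against the threshold of hearing (10⁻¹² watt/m²). The function and constant names are illustrative assumptions.

```python
import math

# Threshold of hearing in watt/m^2, as given on the slide.
THRESHOLD_OF_HEARING = 1e-12


def relative_intensity_db(i2: float, i1: float = THRESHOLD_OF_HEARING) -> float:
    """Relative intensity of i2 with respect to i1, in dB: 10 * log10(i2 / i1)."""
    if i1 <= 0 or i2 <= 0:
        raise ValueError("intensities must be positive")
    return 10.0 * math.log10(i2 / i1)


# The threshold of feeling (1 watt/m^2) compared with the threshold of hearing.
print(round(relative_intensity_db(1.0), 6))  # about 120 dB
```

Under this definition the threshold of feeling sits about 120 dB above the threshold of hearing, and doubling the intensity adds roughly 3 dB.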
• 160 dB Jet engine
• 130 dB Large orchestra at fortissimo
• 100 dB Car on highway
• 70 dB Voice conversation
• 50 dB Quiet residential areas
• 30 dB Very soft whisper
• 20 dB Sound studio
Dynamic and Bandwidth
• Dynamic range is the difference between the loudest and softest sound levels.
• For example, a large orchestra can reach 130 dB at
its climax and drop to as low as 30 dB at its softest,
giving a range of 100 dB.
• Bandwidth is the range of frequencies a device can
produce, or a human can hear.
e.g. FM radio: 50 Hz – 15 kHz
Children’s ears: 20 Hz – 20 kHz
Older ears: 50 Hz – 10 kHz
Computer Representation of Sound
• Sound waves are continuous, while computers are
good at handling discrete numbers.
• In order to store a sound wave in a computer,
samples of the wave are taken.
• Each sample is represented by a number, the ‘code’.
• This process is known as digitisation.
• This method of digitising sound is known as pulse
code modulation (PCM).
Example waveforms
[Figure: example waveforms of a piano, a pan flute and a snare drum]
Capture and playback
of digital audio
[Figure: capture and playback chain. Air pressure variations are captured via a microphone; the Analogue to Digital Converter (ADC) converts the signal into binary (discrete form, e.g. 0101001101 0110101111); the Digital to Analogue Converter (DAC) converts it back into voltage; the output is air pressure variations again.]
The Analogue to Digital
Converter (ADC)
• An ADC is a device that converts analogue signals into digital
signals.
• An analogue signal is a continuous value.
– It can take any value on an infinite scale.
• A digital signal is a discrete value.
– It takes one of a finite set of values (usually an integer).
• An ADC is synchronised to a clock.
It will monitor the continuous analogue signal
at a set rate and convert what it sees into a
discrete value at that specific moment in time
The process of converting analogue sound to digital
sound is called sampling. PCM (Pulse Code
Modulation) is used.
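The sampling step can be illustrated with a short Python sketch: a pure sine tone stands in for the analogue signal, and it is read at a fixed rate. The function name and the sine-wave input are assumptions for illustration; a real ADC reads a voltage from a microphone instead.

```python
import math


def sample_sine(freq_hz: float, sample_rate_hz: int, duration_s: float,
                amplitude: float = 1.0) -> list:
    """Sample a sine tone at a fixed rate and return the list of sample values.

    Sample n is taken at time n / sample_rate_hz, mimicking an ADC that
    reads the analogue signal once per clock tick.
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


# 10 ms of a 440 Hz tone sampled at 8 kHz yields 80 samples.
samples = sample_sine(440.0, 8000, 0.01)
print(len(samples))
```

Each value returned here is still a real number; turning it into an integer code is the job of quantisation.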
Digital sampling
[Figure: digital sampling of a waveform; label: Sampling frequency]
Sampling
Two parameters:
Sampling Rate
– Frequency of sampling
– Measured in hertz
– The higher the sampling rate, the higher the sound quality, but
the larger the storage size.
– Standard sampling rates:
- 44.1 kHz for CD audio
- 22.05 kHz
- 11.025 kHz for speech
- 5.5125 kHz for audio effects
Sample size
The resolution of a sample is the number of
bits it uses to store a given amplitude value,
e.g.
- 8 bits (256 different values)
- 16 bits (65,536 different values)
A higher resolution gives higher quality but
requires more memory (or disk storage).
Quantisation
Audio samples are usually represented as
integers (discrete numbers), i.e. in digital form.
[Figure: quantised waveform; vertical axis: levels 0 to 15; horizontal axis: sample points]
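A minimal sketch of uniform quantisation in Python, assuming input samples normalised to the range [-1.0, 1.0] and the 16 levels (0 to 15, i.e. 4 bits) shown in the figure. The function names and the normalised input range are assumptions for illustration.

```python
def quantise(sample: float, bits: int = 4) -> int:
    """Map a sample in [-1.0, 1.0] to an integer code in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))  # out-of-range input is clipped
    # Scale to 0 .. levels-1 and round to the nearest level.
    return int((clipped + 1.0) / 2.0 * (levels - 1) + 0.5)


def dequantise(code: int, bits: int = 4) -> float:
    """Map an integer code back to an approximate sample in [-1.0, 1.0]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0


for value in (-1.0, 0.3, 1.0):
    code = quantise(value)
    print(value, code, round(dequantise(code), 4))
```

The gap between the original value and the dequantised one is the quantisation error; using more bits per sample shrinks it, at the cost of storage.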
Calculating the size
of digital audio
The formula is as follows:
$$\text{file size} = \frac{\text{rate} \times \text{duration} \times \text{resolution} \times \text{number of channels}}{8}$$
The answer will be in bytes.
Where:
sampling rate is in Hz
duration/time is in seconds
resolution is in bits (e.g. 8 or 16); dividing by 8 converts bits to bytes
number of channels = 1 for mono, 2 for stereo, etc.
Example:
Calculate the file size for 1 minute of 44.1 kHz, 16-bit, stereo
sound.
$$\text{file size} = \frac{\text{rate} \times \text{duration} \times \text{resolution} \times \text{number of channels}}{8}$$
Where:
sampling rate is 44,100 Hz
duration/time is 60 seconds
resolution is 16 bits
number of channels for stereo is 2
$$\text{file size} = \frac{44100 \times 60 \times 16 \times 2}{8} = 10{,}584{,}000 \text{ bytes}$$
That is about 10.6 MB for one minute of CD-quality stereo sound.
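The size calculation can be checked with a small Python function. The function name is an assumption; the formula is the one from the slides (rate × duration × resolution × channels, divided by 8 to convert bits to bytes).

```python
def audio_size_bytes(sample_rate_hz: int, duration_s: float,
                     resolution_bits: int, channels: int) -> float:
    """Uncompressed PCM audio size in bytes.

    rate * duration * resolution * channels gives the size in bits;
    dividing by 8 converts it to bytes.
    """
    return sample_rate_hz * duration_s * resolution_bits * channels / 8


# One minute of 44.1 kHz, 16-bit, stereo sound.
size = audio_size_bytes(44100, 60, 16, 2)
print(size)        # size in bytes
print(size / 1e6)  # size in megabytes (decimal)
```

The same function shows how quickly size drops with lower settings: 10 seconds of 11.025 kHz, 8-bit, mono sound takes 110,250 bytes.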
Simple Audio Compression Methods
• Silence Compression - detect the "silence", similar to run-length
coding
• Adaptive Differential Pulse Code Modulation (ADPCM) e.g., in CCITT
G.721 -- 16 or 32 Kbits/sec.
– Encode the difference between two or more consecutive samples; the
difference is then quantized, which is where the loss comes from
– Adaptive quantization
– It is necessary to predict where the waveform is headed
– Apple has a proprietary scheme called ACE/MACE, a lossy scheme that
tries to predict where the wave will go in the next sample. It gives about 2:1
compression.
• Linear Predictive Coding (LPC) fits the signal to a speech model and then
transmits the parameters of the model. It sounds like a computer talking; 2.4
kbits/sec.
• Code Excited Linear Predictor (CELP) does LPC, but also transmits the
error term, giving audio conferencing quality at 4.8 kbits/sec.
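The idea of encoding quantized differences can be sketched in Python as plain differential PCM with a fixed step size. This is a simplified illustration, not G.721 and not adaptive: a real ADPCM coder also adapts the step size and uses a better predictor. The function names and the "previous reconstructed sample" predictor are assumptions.

```python
def dpcm_encode(samples, step):
    """Encode each sample as a quantized difference from the predicted value.

    The prediction is the previously reconstructed sample, so the encoder
    tracks exactly what the decoder will rebuild and errors do not accumulate.
    """
    codes = []
    predicted = 0
    for s in samples:
        code = round((s - predicted) / step)  # quantize the difference (lossy)
        codes.append(code)
        predicted += code * step              # mirror the decoder's state
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    value = 0
    for c in codes:
        value += c * step
        out.append(value)
    return out


original = [0, 3, 7, 12, 10, 4]
codes = dpcm_encode(original, step=2)
print(codes)
print(dpcm_decode(codes, step=2))
```

With a step of 2 the reconstruction differs from the original by at most 1 per sample, which is the loss introduced by quantizing the difference; the codes themselves are small numbers that need fewer bits than the raw samples.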
LPC
• First, the observation of input and output sequences produces
a model with a number of poles, or formants.
• The resulting set of coefficients can then describe the behaviour of a
system that is not yet known, and is used for predicting a sample.
This set of coefficients is an all-pole model, a simplified version of
the acoustic model of the speech production system.
• The analysis then estimates the values of a discrete-time signal as
a linear function of previous samples. The spectral envelope is
represented in a compressed form, using the information of
the linear predictive model.
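Estimating a sample as a linear function of previous samples can be sketched as follows. This uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to fit the all-pole model; the slides do not say which method is intended, so the choice and all function names are assumptions.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0 .. max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Fit predictor coefficients a[1..order] so that
    x[n] is approximated by a[1]*x[n-1] + ... + a[order]*x[n-order].

    Returns (coefficients, residual prediction error energy).
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # signal is all zeros or already perfectly predicted
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def predict_next(history, coeffs):
    """Predict the next sample from the most recent len(coeffs) samples."""
    return sum(c * history[-(k + 1)] for k, c in enumerate(coeffs))


# Synthetic test signal: impulse response of a known two-pole system.
signal = [1.0, 1.5]
for _ in range(398):
    signal.append(1.5 * signal[-1] - 0.7 * signal[-2])

coeffs, err = lpc(signal, 2)
print([round(c, 4) for c in coeffs])
```

On this synthetic signal the fitted coefficients come out close to the 1.5 and -0.7 used to generate it; a speech coder would transmit such coefficients per frame instead of the samples themselves.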
Psychoacoustic Model
Human hearing and voice
– Frequency range is about 20 Hz to 20 kHz, most sensitive at 1 to 5 kHz.
– Dynamic range (quietest to loudest) is about 96 dB
– Normal voice range is about 500 Hz to 2 kHz
• Low frequencies are vowels and bass
• High frequencies are consonants
How sensitive is human hearing?
To answer this question we look at the following concepts:
– Threshold of hearing
Describes the notion of “quietness”
– Frequency Masking
A component (at a particular frequency) masks components at neighboring
frequencies. Such masking may be partial.
– Temporal Masking
When two tones (samples) are played close together in time, one can mask the
other.