Multimedia Slides, Chapter 1: Audio


Email:
C9-411 Dai Co Viet str. 1, Hanoi


Pham Van Tien, Dr. rer. nat. , Embedded Networking Research Group
Faculty of Elec. and Telecom, Hanoi University of Science and Technology

Audio and speech signal processing

Tien Pham Van, Dr. rer. nat.
Hanoi University of Science and Technology


Agenda

• Concepts
• Signal characteristics
• Compression techniques


The Nature of Sound


• Sound is a physical phenomenon produced by the
vibration of matter and transmitted as waves.


• However, the perception of sound by human beings is a
very complex process. It involves three
systems:
- the source which emits sound;
- the medium through which the sound
propagates;
- the detector which receives and interprets the
sound.

• The sounds we hear every day are very complex.
Every sound is composed of waves of many
different frequencies and shapes. The
simplest sound we can hear is a sine wave.
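A minimal Python sketch of this idea, assuming the wave is evaluated at discrete instants t = n / sample_rate: `sine_wave` produces a pure tone, and `complex_tone` adds several sine waves together to build a richer sound. The function names, the 8000 Hz evaluation rate and the partial frequencies are illustrative assumptions, not values from the slides.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate, duration_s):
    """Samples of a pure tone A * sin(2 * pi * f * t), taken at t = n / sample_rate."""
    n_samples = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def complex_tone(partials, sample_rate, duration_s):
    """Sum of several sine waves; partials is a list of (freq_hz, amplitude) pairs."""
    waves = [sine_wave(f, a, sample_rate, duration_s) for f, a in partials]
    return [sum(values) for values in zip(*waves)]

# A 440 Hz pure tone, and the same tone with two extra partials (illustrative values).
pure = sine_wave(440, 1.0, 8000, 0.01)
rich = complex_tone([(440, 1.0), (880, 0.5), (1320, 0.25)], 8000, 0.01)
```

Each partial contributes its own frequency and amplitude; their sum no longer has a simple sine shape, which is why real sounds have more complex waveforms.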


• Sound waves can be characterised by the
following attributes:

Period, Frequency, Amplitude, Bandwidth,
Pitch, Loudness, Dynamic range.

Pitch and Frequency


• Period is the interval at which a periodic signal repeats.
• Pitch is a perception of sound by human beings. It
measures how ‘high’ the sound is as perceived by a
listener.
• Frequency measures a physical property of a wave. It is
the reciprocal of the period: f = 1/P.
The unit is the hertz (Hz) or kilohertz (kHz).


• Infra-sound: 0 – 20 Hz
• Human hearing range: 20 Hz – 20 kHz
• Ultrasound: 20 kHz – 1 GHz
• Hypersound: 1 GHz – 10 THz

• Musical instruments are tuned to produce a set of fixed
pitches.
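The relationship f = 1/P and the frequency ranges above can be expressed in a few lines of Python. This is a sketch: the function names are hypothetical, and a frequency that falls exactly on a boundary is assigned to the higher range, a convention the slides do not specify.

```python
def frequency_from_period(period_s):
    """Frequency is the reciprocal of the period: f = 1 / P (P in seconds, f in Hz)."""
    return 1.0 / period_s

# Frequency ranges from the list above, as (upper bound in Hz, name).
BANDS = [
    (20.0, "infrasound"),
    (20e3, "human hearing range"),
    (1e9, "ultrasound"),
    (10e12, "hypersound"),
]

def classify(freq_hz):
    """Name of the range a frequency falls into (a boundary belongs to the higher range)."""
    for upper, name in BANDS:
        if freq_hz < upper:
            return name
    return "above hypersound"

print(frequency_from_period(0.001), classify(440))
```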
[Figure: pitch]


Loudness and Amplitude


• The other important perceptual quality is loudness or volume.


• Amplitude is the measure of sound levels. For a digital sound,
amplitude is the sample value.


• Sounds differ in loudness because they carry
different amounts of power.
• The unit of power is the watt (W). The intensity of a sound is the amount
of power transmitted through an area of 1 m² oriented
perpendicular to the propagation direction of the sound.
• If the intensity of a sound is 1 W/m², we may start to feel the
sound, and the ear may be damaged. This is known as the threshold of feeling.
If the intensity is 10⁻¹² W/m², we may just be able
to hear it. This is known as the threshold of hearing.


• The relative intensity of two different sounds is
measured in bels (B) or, more commonly,
decibels (dB). It is defined by:
relative intensity in dB = 10 log10(I2/I1)
• Very often, we compare a sound with the threshold
of hearing, i.e. I1 = 10⁻¹² W/m².
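The decibel formula translates directly into code. In this sketch the function name is hypothetical, and the reference intensity I1 defaults to the threshold of hearing, 10⁻¹² W/m², so the threshold of feeling at 1 W/m² comes out at 120 dB.

```python
import math

THRESHOLD_OF_HEARING = 1e-12  # W/m^2

def relative_intensity_db(i2, i1=THRESHOLD_OF_HEARING):
    """Relative intensity of I2 with respect to I1, in decibels: 10 * log10(I2 / I1)."""
    return 10 * math.log10(i2 / i1)

# Threshold of feeling (1 W/m^2) compared with the threshold of hearing.
feeling_db = relative_intensity_db(1.0)
# Doubling the intensity of any sound adds about 3 dB.
doubling_db = relative_intensity_db(2.0, 1.0)
print(feeling_db, doubling_db)
```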

• 160 dB: jet engine
• 130 dB: large orchestra at fortissimo
• 100 dB: car on highway
• 70 dB: voice conversation
• 50 dB: quiet residential areas
• 30 dB: very soft whisper
• 20 dB: sound studio


Dynamic Range and Bandwidth


• Dynamic range is the range of sound levels, from the softest to the loudest.
• For example, a large orchestra can reach 130 dB at
its climax and drop to as low as 30 dB at its softest,
giving a range of 100 dB.
• Bandwidth is the range of frequencies a device can
produce, or a human can hear, e.g.:
– FM radio: 50 Hz – 15 kHz
– Children’s ears: 20 Hz – 20 kHz
– Older ears: 50 Hz – 10 kHz
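Both definitions are simple enough to check numerically. The sketch below uses hypothetical function names; it reproduces the orchestra example and tests whether a frequency lies inside one of the bandwidths listed above.

```python
def dynamic_range_db(loudest_db, softest_db):
    """Dynamic range: the span between the loudest and softest levels, in dB."""
    return loudest_db - softest_db

# Bandwidths from the examples above, as (lowest, highest) frequency in Hz.
BANDWIDTHS = {
    "FM radio": (50, 15_000),
    "children's ears": (20, 20_000),
    "older ears": (50, 10_000),
}

def within_bandwidth(freq_hz, name):
    """True if freq_hz lies inside the named bandwidth (bounds inclusive)."""
    low, high = BANDWIDTHS[name]
    return low <= freq_hz <= high

orchestra_range = dynamic_range_db(130, 30)  # the large-orchestra example
print(orchestra_range, within_bandwidth(12_000, "FM radio"))
```

A 12 kHz tone, for instance, lies inside the FM radio bandwidth but above the bandwidth listed for older ears.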


Computer Representation of Sound


• Sound waves are continuous, while computers are
good at handling discrete numbers.
• To store a sound wave in a computer,
samples of the wave are taken.
• Each sample is represented by a number, the ‘code’.
• This process is known as digitisation.
• This method of digitising sound is known as pulse
code modulation (PCM).
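The digitisation steps above can be sketched as follows. This is a minimal illustration, assuming the continuous wave is modelled as a Python function of time with values in [-1.0, 1.0] and that each sample is coded as a signed n-bit integer; the function name `digitise` and the 1 kHz test signal are hypothetical.

```python
import math

def digitise(signal, sample_rate, duration_s, bits):
    """Sample a continuous signal and code each sample as a signed n-bit integer.

    `signal` is any function of time (in seconds) returning a value in [-1.0, 1.0];
    it stands in for the continuous sound wave.
    """
    max_code = 2 ** (bits - 1) - 1      # e.g. 127 for 8 bits
    min_code = -(2 ** (bits - 1))       # e.g. -128 for 8 bits
    codes = []
    for n in range(int(sample_rate * duration_s)):
        t = n / sample_rate             # sampling instant
        value = signal(t)               # take a sample of the wave
        code = round(value * max_code)  # quantise to the nearest integer level
        codes.append(max(min_code, min(max_code, code)))  # clip to the valid range
    return codes

# A 1 kHz sine wave, sampled at 8 kHz for 1 ms with 8-bit codes.
codes = digitise(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 0.001, 8)
print(codes)
```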


Example waveforms

[Figure: example waveforms of a piano, a pan flute and a snare drum]


Capture and playback of digital audio

• Air pressure variations are captured via a microphone.
• The ADC (Analogue to Digital Converter) converts the signal into binary (discrete form), e.g. 0101001101 0110101111.
• The DAC (Digital to Analogue Converter) converts the binary data back into voltage, which is turned back into air pressure variations.
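The capture and playback chain can be simulated with an idealised converter pair. This sketch assumes hypothetical `adc` and `dac` functions with a symmetric input range of ±v_ref volts and signed n-bit codes; real converters add noise, offset and filtering that are ignored here. It shows that the round trip is lossy, but that each error stays within half a quantisation step.

```python
def adc(voltages, bits, v_ref=1.0):
    """Idealised ADC: map voltages in [-v_ref, v_ref] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for v in voltages:
        code = round(v / v_ref * max_code)                 # nearest integer level
        codes.append(max(-max_code, min(max_code, code)))  # clip out-of-range input
    return codes

def dac(codes, bits, v_ref=1.0):
    """Idealised DAC: convert integer codes back into voltages."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code * v_ref for c in codes]

original = [0.0, 0.3, -0.71, 0.999, -0.42]  # illustrative microphone voltages
restored = dac(adc(original, 8), 8)
step = 1.0 / (2 ** 7 - 1)                    # voltage between adjacent 8-bit levels
errors = [abs(a - b) for a, b in zip(original, restored)]
print(max(errors), step / 2)
```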


The Analogue to Digital Converter (ADC)

• An ADC is a device that converts analogue signals into digital signals.
• An analogue signal is a continuous value: it can take any value on an infinite scale.
• A digital signal is a discrete value: it has a finite value (usually an integer).
• An ADC is synchronised to a clock.

The ADC monitors the continuous analogue signal
at a set rate and converts what it sees at each
specific moment in time into a discrete value.

The process of converting analogue sound to digital
is called sampling, and the samples are coded using
PCM (Pulse Code Modulation).


Digital sampling

[Figure: sampling frequency]


Sampling

Two parameters:
• Sampling rate
– The frequency of sampling, measured in hertz (Hz).
– The higher the sampling rate, the higher the sound quality, but the larger the storage size.
– Standard sampling rates:
- 44.1 kHz for CD audio
- 22.05 kHz
- 11.025 kHz for speech
- 5.5125 kHz for audio effects


Sampling

• Sample size
– The resolution of a sample is the number of bits used to store each amplitude value, e.g.:
- 8 bits (256 different values)
- 16 bits (65,536 different values)
– A higher resolution gives higher quality but requires more memory (or disk storage).
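The effect of resolution can be quantified. An n-bit sample has 2ⁿ possible values, and the ratio between the full amplitude range and one quantisation step gives a theoretical dynamic range of 20 log10(2ⁿ) ≈ 6.02 n dB. This formula is a standard result added here for illustration; it is not stated in the slides. The factor is 20 rather than 10 because intensity is proportional to the square of amplitude. The function names below are hypothetical.

```python
import math

def quantisation_levels(bits):
    """Number of distinct values an n-bit sample can take."""
    return 2 ** bits

def pcm_dynamic_range_db(bits):
    """Theoretical dynamic range of n-bit PCM: 20 * log10(2 ** n) dB (amplitude ratio)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16):
    print(f"{bits} bits: {quantisation_levels(bits)} values, "
          f"about {pcm_dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives roughly 96 dB, close to the dynamic range of human hearing quoted later in these slides.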


Quantisation

• Audio samples are usually represented as integers (discrete numbers), i.e. in digital form.

[Figure: sample points quantised to integer levels from 0 to 15]

Calculating the size of digital audio


• The formula is as follows:

file size = (sampling rate × duration × resolution × number of channels) / 8

The answer will be in bytes, where:
– sampling rate is in Hz
– duration (time) is in seconds
– resolution is in bits (e.g. 8 or 16); dividing by 8 converts bits to bytes
– number of channels = 1 for mono, 2 for stereo, etc.
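The formula can be wrapped in a small helper. The name `audio_file_size_bytes` is hypothetical; the function returns the uncompressed (PCM) size in bytes and ignores any file-header overhead.

```python
def audio_file_size_bytes(sample_rate_hz, duration_s, resolution_bits, channels):
    """Uncompressed audio size in bytes: rate x duration x resolution x channels / 8."""
    return sample_rate_hz * duration_s * resolution_bits * channels / 8

# One minute of 16-bit stereo sound at each of the standard sampling rates.
for rate in (44_100, 22_050, 11_025):
    size = audio_file_size_bytes(rate, 60, 16, 2)
    print(f"{rate} Hz: {size:,.0f} bytes")
```

Halving the sampling rate halves the size, which is the storage trade-off described for the sampling rate and the resolution.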


• Example: calculate the file size of 1 minute of 44.1 kHz, 16-bit, stereo sound, where:
– sampling rate is 44,100 Hz
– duration (time) is 60 seconds
– resolution is 16 bits
– number of channels for stereo is 2

file size = (sampling rate × duration × resolution × number of channels) / 8


file size = (44,100 × 60 × 16 × 2) / 8 = 10,584,000 bytes (about 10.6 MB, or 10.1 MiB)


Simple Audio Compression Methods


• Silence compression: detect the "silence" and code it in a way similar to run-length
coding.
• Adaptive Differential Pulse Code Modulation (ADPCM), e.g. in CCITT
G.721, at 16 or 32 kbit/s:
– Encode the difference between two or more consecutive samples; the
difference is then quantised, hence the loss.
– Adaptive quantisation.
– It is necessary to predict where the waveform is headed.
– Apple has a proprietary scheme called ACE/MACE: a lossy scheme that
tries to predict where the wave will go in the next sample. It gives about 2:1
compression.
• Linear Predictive Coding (LPC) fits the signal to a speech model and then
transmits the parameters of the model. It sounds like a computer talking, at 2.4
kbit/s.
• Code Excited Linear Predictor (CELP) does LPC but also transmits an
error term, giving audio-conferencing quality at 4.8 kbit/s.
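Two of these ideas can be sketched in a few lines of Python. `compress_silence` replaces runs of near-silent samples with a run length, in the spirit of run-length coding; the silence threshold rule is an assumption. `dpcm_encode` and `dpcm_decode` implement plain differential PCM with a fixed quantisation step: each difference from the previous reconstructed sample is quantised, which is where the loss comes from. This is a simplified sketch, not G.721 or ACE/MACE; a real ADPCM coder would also adapt the step size and use a better predictor. All function names are hypothetical.

```python
def compress_silence(samples, threshold):
    """Replace runs of near-silent samples with ("silence", run_length) tokens.

    A sample counts as silence if abs(sample) <= threshold (an assumed rule);
    every other sample is kept as ("sample", value).
    """
    out, run = [], 0
    for s in samples:
        if abs(s) <= threshold:
            run += 1                      # extend the current silent run
            continue
        if run:
            out.append(("silence", run))  # close the run before a loud sample
            run = 0
        out.append(("sample", s))
    if run:
        out.append(("silence", run))      # trailing silence
    return out

def dpcm_encode(samples, step):
    """Plain DPCM: quantise the difference from the previous reconstructed sample."""
    codes, prediction = [], 0
    for s in samples:
        q = round((s - prediction) / step)  # quantised difference: the lossy step
        codes.append(q)
        prediction += q * step              # mirror what the decoder will rebuild
    return codes

def dpcm_decode(codes, step):
    """Rebuild samples by accumulating the dequantised differences."""
    out, value = [], 0
    for q in codes:
        value += q * step
        out.append(value)
    return out

signal = [0, 0, 0, 10, 12, 15, 15, 9, 0, 0]
print(compress_silence(signal, threshold=0))
diff_codes = dpcm_encode(signal, step=2)
print(diff_codes, dpcm_decode(diff_codes, step=2))
```

The encoder predicts from the reconstructed value rather than the original one, so encoder and decoder stay in step and quantisation errors do not accumulate.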


LPC


• First, observing the input and output sequences produces
a model with a number of poles, or formants.
• The resulting set of coefficients can then describe the behaviour of a
system that is not yet known, and is used to predict a sample.
This set of coefficients is an all-pole model, a simplified version of
the acoustic model of the speech production system.
• The analysis then estimates the values of a discrete-time signal as
a linear function of previous samples. The spectral envelope is
represented in compressed form, using the information of
the linear predictive model.
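A common way to obtain the predictor coefficients is the autocorrelation method with the Levinson-Durbin recursion. The slides do not specify a method, so this Python sketch is one illustrative choice; the function names and the test signal are hypothetical. Each coefficient weights one previous sample, matching the description of the prediction as a linear function of previous samples.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0 .. max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Predictor coefficients a[1..order] via the Levinson-Durbin recursion.

    The model predicts x[n] as a[1]*x[n-1] + ... + a[order]*x[n-order].
    Returns (coefficients, final prediction-error energy). Assumes x is not all zeros.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k         # prediction error shrinks with each order
    return a[1:], error

def predict(x, coeffs, n):
    """Linear prediction of x[n] from the previous len(coeffs) samples."""
    return sum(c * x[n - 1 - j] for j, c in enumerate(coeffs))

# Illustrative signal: each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(signal, order=1)
print(coeffs, predict(signal, coeffs, 5), signal[5])
```

In a speech coder this analysis is repeated on short frames of the signal, and only the coefficients (plus, for CELP, an error term) are transmitted instead of the samples.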

Psychoacoustic Model

Human hearing and voice
– Frequency range is about 20 Hz to 20 kHz, most sensitive at 1 to 5 kHz.
– Dynamic range (quietest to loudest) is about 96 dB.
– Normal voice range is about 500 Hz to 2 kHz.
  • Low frequencies are vowels and bass.
  • High frequencies are consonants.

How sensitive is human hearing?
To answer this question, we look at the following concepts:
– Threshold of hearing: describes the notion of "quietness".
– Frequency Masking: a component (at a particular frequency) masks components at neighbouring frequencies. Such masking may be partial.
– Temporal Masking: when two tones (samples) are played close together in time, one can mask the other.
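The threshold of hearing varies with frequency. A widely used analytical approximation, due to Terhardt, is sketched below in Python; the formula is not part of these slides and is included only to illustrate the idea. It gives threshold values in dB SPL (sound pressure level) and reaches its minimum at a few kHz, consistent with hearing being most sensitive between 1 and 5 kHz. `is_audible` is a hypothetical helper that checks a lone pure tone against this threshold and ignores masking.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(freq_hz, level_db_spl):
    """A lone pure tone is treated as audible if it exceeds the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

for f in (100, 1_000, 3_300, 10_000):
    print(f"{f} Hz: {threshold_in_quiet_db(f):.1f} dB SPL")
```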
