ISSN:2249-5789

Deepa G M et al , International Journal of Computer Science & Communication Networks,Vol 3(1), 15-20

An Overview of Acoustic Side-Channel Attack
Deepa G.M., G. SriTeja & Professor S. Venkateswarlu
Department of Computer Science & Engineering
KL University, Andhra Pradesh, India

Abstract
The need for security is an important aspect of protecting private information. Side-channel attacks are a recent class of attacks that are very powerful in practice. Most side-channel attack research has focused on electromagnetic emanations (TEMPEST) and power consumption; however, one of the oldest eavesdropping channels is acoustic emanation. This paper focuses on the acoustic side-channel attack and surveys the methods and techniques employed in it. We also look at some of the devices that are under threat from such an attack and, finally, at countermeasures that help reduce the risk.

1. Introduction

Security is central to privacy, and a system is only as strong as its weakest link. We live in a world in which sensitive data is controlled and distributed by computer systems. Much effort has been put into protecting this information with a wide array of cryptographic schemes, protocols and security systems, but many concerns remain for systems whose physical implementation can be accessed. For example, systems such as ATMs, which we use in everyday life, are vulnerable to implementation attacks on their cryptographic protocols.

Physical attacks on cryptographic embedded devices take advantage of implementation-specific characteristics to recover the secret parameters involved in the computation. Side-channel attacks are a class of such physical attacks in which an attacker tries to exploit physical information leakage from those devices. The side-channel leakages include power consumption, electromagnetic radiation, sound (acoustic) emanation, light emanation, etc., as shown in Fig 1.1 [1]. Attacks that exploit these leakages can be classified as invasive, semi-invasive or non-invasive:

Invasive attack: this attack involves tampering with the device to get direct access to its internal components.

Semi-invasive attack: this kind of attack involves access to the device, but without making any direct contact with its internal components; a fault-based attack is an example.

Non-invasive attack: this attack involves close observation of externally available information, which is often leaked unintentionally. Examples are the electromagnetic attack, the power attack and the acoustic attack.

Fig 1.1 Different leakages of cryptographic device (an embedded cryptographic device leaking visible light, power consumption, execution time, electromagnetic radiation, faulty output and sound emanation)

1.1 Known Side-channel Attacks
The best-known side-channel attacks are as follows:
Timing Attack: A timing attack obtains a user's private information by carefully measuring the time it takes the user to carry out cryptographic operations. The objective of this attack is simple: to exploit the timing variance in the operation.
Fault Attack: Fault attacks are a practical and effective way of attacking cryptographic hardware devices such as smart cards. The attacker intentionally induces a fault in the device in order to learn about the cryptographic operation and to retrieve at least part of the secret key. Countermeasures include protecting the device from induced faults and preventing it from performing the same operation repeatedly.
Power Attack: The principle of the power attack is that the power consumption of a cryptographic device may reveal much information about the operations that are taking place and the parameters involved.
Electromagnetic Attack: The components of electrical devices usually emit some electromagnetic radiation while operating. The EM attack is a non-invasive attack in which the electromagnetic radiation emitted by such a device is used to analyze the device's internal operation.
Acoustic Attack: Emanations produced by electronic devices have long been a source of attacks on the security of systems, and acoustic emanation is one such kind. This attack is based on the assumption that the sound of the clicks differs slightly from key to key; by analyzing this difference, we can guess the text that is being produced. We discuss this in detail in the following sections.
This paper outlines the concept of the acoustic side-channel attack. Section 2 gives a brief idea of the acoustic attack. Section 3 describes the technical requirements for carrying out this attack, and Section 4 lists the devices that can be attacked using acoustic emanation and, finally, ideas for countermeasures.

2. Acoustic Attack
Audio emanated by a device is a potential vulnerability that side-channel attacks can exploit. The acoustic attack is mainly based on the hypothesis that the sound produced by the keys differs slightly from key to key, although the clicks of different keys sound similar to the human ear. Figure 2.1 shows an overview of the acoustic attack. It has two main phases, called the training phase and the recognition phase.
Training phase: In this phase, a sequence of words from a dictionary is analyzed for its characteristic sound features, which are stored in a database. For the best results, the setting should be close to the setting in which the actual attack is carried out.

The main steps of the training phase are as follows:
1. Feature extraction: The feature-extraction technique is taken from speech recognition and music processing. The most interesting features for printed sounds occur above 20 kHz, and a logarithmic scale cannot be assumed for them. We therefore split the recording into single words based on the intensity of the frequency band between 20 kHz and 48 kHz, and spread the filter frequencies linearly over this frequency range. We subsequently use digital filter banks to perform sub-band decomposition on each word [3]. Sub-band decomposition gives better results than simple resolution. The output of the sub-band decomposition is smoothed to make it more robust to environmental noise. The extracted features are stored in a database, which is used in further processing.
2. Computation of the language model: To support the recognition phase, we complement the acoustic information with information about how likely words are to occur in their linguistic context (e.g., the sequence "such as the" is much more likely than "such of the"). More specifically, we estimate n-gram probabilities for each word in our lexicon, i.e., the likelihood that the word occurs after a sequence of n − 1 given words. These probabilities make up a (statistical) language model. The probabilities are computed from frequency counts of n-place sequences (n-grams) in a corpus of text documents. Extracting these frequencies from a sufficiently large corpus makes up the second step of the training phase.
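The feature-extraction step above can be illustrated with a small sketch. It is not the filter-bank implementation of [3] or [4]: it swaps in a naive DFT and sums the spectral energy in equal-width (linear) bands between two frequencies, then smooths the result with a moving average. The function names, the 96 kHz sampling rate, the number of bands and the 30 kHz test tone are all assumptions made for the illustration.

```python
import cmath
import math

def band_energies(samples, sample_rate, f_lo, f_hi, n_bands):
    """Energy of `samples` in n_bands equal-width (linear) bands on [f_lo, f_hi)."""
    n = len(samples)
    energies = [0.0] * n_bands
    width = (f_hi - f_lo) / n_bands
    for k in range(n // 2 + 1):                 # non-negative frequency bins only
        freq = k * sample_rate / n
        if not (f_lo <= freq < f_hi):
            continue
        # Naive DFT coefficient for bin k (acceptable for a short window).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        band = min(int((freq - f_lo) / width), n_bands - 1)
        energies[band] += abs(coeff) ** 2
    return energies

def smooth(values, radius=1):
    """Moving average over neighbouring bands to reduce sensitivity to noise."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical recording: 192 samples of a 30 kHz tone sampled at 96 kHz.
SAMPLE_RATE = 96_000
tone = [math.sin(2 * math.pi * 30_000 * t / SAMPLE_RATE) for t in range(192)]

# Seven linear bands of 4 kHz each between 20 kHz and 48 kHz.
energies = band_energies(tone, SAMPLE_RATE, 20_000, 48_000, 7)
features = smooth(energies)
strongest_band = energies.index(max(energies))   # the band covering 28-32 kHz
```

In a real attack the feature vector of each word would be stored in the database; here `features` simply holds the smoothed band energies of the test tone.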
Figure 2.1 Overview of acoustic attack (training phase: training data, acoustic feature extraction, features, database; recognition phase: attack data, acoustic feature extraction, features, select candidate words, words, ordered words)

Recognition phase: This phase uses the characteristic features of the trained words to recognize new sound recordings of printed text, supported by suitable language-correction techniques. The main steps are as follows:
1. Select candidate words: We start by extracting the features of the target text, as in the first step of the training phase. We then compare the features of the recorded attack data with the characteristics of the words in the database. If the features extracted from different recordings of the same word were always identical, one would obtain a unique correspondence between trained features and target features. However, measurement variations, environmental noise, etc. show that this is not the case. Multiple recordings of the same word sometimes yield different features; for example, printing the same word at different places in the document results in different acoustic emanations. Conversely, recordings of words that differ significantly in their spelling might yield almost identical sound features. We therefore let the selected trained word be a random variable conditioned on the printed word, i.e., every trained word is a candidate with a certain probability. With sufficiently good feature extraction and distance computation between two features, the probabilities of one or a few such trained words will dominate for each printed word. The output of the first recognition step is a list of the most likely candidates, given the acoustic features of the target word.
2. Language-based reordering to reduce the word error rate: Finally, we try to find the most likely sequence of printed words. Although always picking the most likely word based on the acoustic signal alone might already yield a suitable recognition quality, we use Hidden Markov Model (HMM) technology, in particular language models and the Viterbi algorithm, which is regularly used in speech recognition, to determine the most likely sequence of printed words. Intuitively, this technology works well for us because most errors that we encounter in the recognition phase are due to incorrectly recognized words that do not fit the context; by making use of linguistic knowledge about likely and unlikely sequences of words, we have a good chance of detecting and correcting such errors [4]. The use of HMM technology yields accuracy rates of 70% on average for words from the general-purpose corpus, and up to 95% for the domain-specific corpus.
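The candidate-selection step can be sketched as a nearest-neighbour search over the feature database. This is only an illustration under stated assumptions: the Euclidean distance, the three-dimensional feature vectors and the tiny database below are hypothetical and are not taken from [4].

```python
import math

def top_candidates(target, database, k=3):
    """Return the k trained words whose feature vectors are closest to `target`."""
    scored = []
    for word, features in database.items():
        # Euclidean distance is an assumed stand-in for the distance measure.
        scored.append((math.dist(target, features), word))
    scored.sort()
    return [word for _, word in scored[:k]]

# Hypothetical database of trained words and their feature vectors.
database = {
    "the":  [0.90, 0.10, 0.30],
    "then": [0.80, 0.20, 0.40],
    "of":   [0.10, 0.90, 0.50],
    "such": [0.40, 0.40, 0.90],
}

# Hypothetical features extracted from one word of the attack recording.
target = [0.88, 0.12, 0.31]
candidates = top_candidates(target, database, k=2)
```

With these made-up vectors the two closest trained words are "the" and "then"; the language-based reordering step then chooses among such candidates using context.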

3. Technical Requirements for Acoustic Attack
In this section we look at the technical requirements of the acoustic attack.

3.1 Analyzing Audio Frequency
Choosing the correct features of a keystroke is critical in differentiating between the keys. Such features should be consistent for individual keystrokes: they should appear each time a given key is pressed, and they should be unique, varying from key to key. Experimental work in [6] shows that the best features for speech recognition are in the frequency domain, not the time domain. The difference between the frequency responses of different key presses comes from the physical location of the keys on the keyboard.
To compare the frequency responses of different keys, there are a variety of ways to compare signals [6]:
i. Sum of squared differences. Given two arrays holding the FFTs of two keystrokes, the difference of each pair of corresponding FFT values is squared and added to a cumulative sum. A lower sum means a better match.
ii. Peak alignment. Given two arrays and one of various peak-detection schemes, the arrays are aligned at their peaks, and then the sum of squared differences is computed. The goal is to minimize needless error resulting from skew.
iii. Sliding. One array is slid over the other array, and for each slide the sum of squared differences is taken.
iv. Convolution. The two arrays are convolved, and the index of the maximum value is then used as an offset to find where the two arrays best line up. This is nearly a repeat of the peak alignment method, but with a greater mathematical basis for believing that the method results in a logical comparison of signals.
v. Compare. The compare method is used to compare unknown key presses with our known training data. The lowest result from the compare method with a known key indicates that the unknown key is the same as that known key.
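Methods i, iii and v above can be sketched in a few lines. This is a minimal illustration and not the code used in [6]: the spectra are made-up six-value arrays, and normalizing the sliding score by the overlap length is an assumption added so that different shifts are comparable.

```python
def ssd(a, b):
    """Method i: sum of squared differences between two equally long spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sliding_ssd(a, b, max_shift=2):
    """Method iii: slide one array over the other and keep the best score.

    The score of each shift is divided by the overlap length (an assumption
    of this sketch) so that short overlaps are not favoured automatically.
    """
    best = float("inf")
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            pairs = list(zip(a[shift:], b))
        else:
            pairs = list(zip(a, b[-shift:]))
        if pairs:
            score = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
            best = min(best, score)
    return best

def classify(unknown, training, compare=ssd):
    """Method v: the known key with the lowest comparison result wins."""
    return min(training, key=lambda key: compare(unknown, training[key]))

# Hypothetical training spectra for two keys.
training = {
    "a": [1, 5, 2, 0, 0, 0],
    "s": [0, 0, 3, 1, 4, 0],
}

# Unknown key press: the spectrum of "a", skewed by one bin.
unknown = [0, 1, 5, 2, 0, 0]

plain_guess = classify(unknown, training, ssd)
sliding_guess = classify(unknown, training, sliding_ssd)
```

On this toy data the plain sum of squared differences is misled by the one-bin skew and picks "s", while the sliding comparison recovers "a", which is the kind of needless error the alignment methods are meant to avoid.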

3.2 Triangulation Method
The triangulation method is used to determine the position of the key. Given a source of sound, i.e., a key press (K5), and two distinct recording devices, i.e., microphones (M1 and M2), in different positions as shown in Figure 3.1, we can measure the difference in the time it takes for the sound to reach each microphone. Assuming the microphones are well positioned, each key produces a unique difference in the time to arrival (TTA) at the two microphones. This difference in TTA is proportional to the difference between the distances from the key to each microphone. We can exploit this fact to guess which keys are pressed simply by listening for them.
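The arithmetic behind this method can be sketched as follows. The layout is hypothetical: five keys in a row with a 19 mm pitch, two microphones on the same line on either side of the row, and a speed of sound of 343 m/s. The function names are assumptions made for the illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees Celsius

def arrival_time_difference(source, mic1, mic2):
    """Difference in time to arrival (seconds) of one sound at the two microphones."""
    return (math.dist(source, mic1) - math.dist(source, mic2)) / SPEED_OF_SOUND

def guess_key(measured_diff, key_positions, mic1, mic2):
    """Pick the key whose predicted TTA difference is closest to the measured one."""
    return min(
        key_positions,
        key=lambda name: abs(
            arrival_time_difference(key_positions[name], mic1, mic2) - measured_diff
        ),
    )

# Hypothetical layout, in metres.
keys = {f"K{i + 1}": (0.019 * i, 0.0) for i in range(5)}
M1 = (-0.30, 0.0)
M2 = (0.50, 0.0)

# Simulated measurement for K5 with a small timing error of 2 microseconds.
measured = arrival_time_difference(keys["K5"], M1, M2) + 2e-6
guessed = guess_key(measured, keys, M1, M2)

# Spacing between the TTA differences of two neighbouring keys.
neighbour_gap = abs(
    arrival_time_difference(keys["K5"], M1, M2)
    - arrival_time_difference(keys["K4"], M1, M2)
)
```

In this layout neighbouring keys differ by about 0.11 ms in TTA difference (twice the key pitch divided by the speed of sound), so a measurement error of a few microseconds still leaves the guess unambiguous.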


3.3 Processing Technology – HMM
This section describes a technique based on language models that further improves the quality of the reconstruction by improving the word recognition rate.
3.3.1 Hidden Markov models (HMMs)
HMMs are graphical models for recovering a sequence of random variables that cannot be observed directly from a sequence of given variables. The random variables are modeled as hidden states, the output variables as observed states. HMMs have been used for many tasks that deal with language processing, such as speech recognition [7, 8, 9], handwriting recognition [11] and part-of-speech tagging [10, 12].
Formally, an HMM of order $d$ is defined by a five-tuple $(Q, O, A, B, I)$, where $Q = (q_1, q_2, \dots, q_N)$ is the set of (hidden) states, $O = (o_1, o_2, \dots, o_M)$ is the set of observations, $A : Q^{d+1} \to [0, 1]$ is the matrix of state transition probabilities (i.e., the probability of reaching state $q_{d+1}$ when being in state $q_d$ with history $q_1, \dots, q_{d-1}$), $B : Q \times O \to [0, 1]$ are the emission probabilities (i.e., the probability of observing a specific output $o_i$ when being in state $q_j$), and $I : Q^d \to [0, 1]$ is the set of initial probabilities (i.e., the probability of starting in state $q_i$). Figure 3.2 shows a graphical representation of an HMM, where white circles represent hidden states and grey circles represent observed states.
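Figure 3.2. HMM (hidden states q1, q2, ..., qN linked by transitions a12, a23, ..., aN-1N; observed states o1, o2, ..., oM linked to the hidden states by emissions b11, b22, ...)

Figure 3.1 Triangulation Method (key K5 and microphones M1 and M2)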

We use HMMs in two phases: the training phase and the recognition phase. For the training phase, the initial probabilities, which model the probability of starting in a given state, and the transition probabilities, which model the likelihood of different words following each other in an English text, can be obtained by building a language model from a large text corpus. For the second phase, determining the most likely sequence of hidden states (i.e., the recorded text), we can use the Viterbi algorithm [13].
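As a rough illustration of the decoding step, the following sketch runs the Viterbi algorithm on a first-order HMM (order d = 1, a simplification assumed here) whose hidden states are words. All probabilities, the four-word lexicon and the observation labels x1, x2, x3 are made up for the example; they stand in for the language model and the acoustic features and are not values from [4].

```python
def viterbi(observations, states, initial, transition, emission):
    """Most likely hidden-state sequence for a first-order HMM."""
    # psi[s][q]: probability of the best path that ends in state q at step s.
    psi = [{q: initial[q] * emission[q][observations[0]] for q in states}]
    back = [{}]
    for obs in observations[1:]:
        prev = psi[-1]
        row, pointers = {}, {}
        for q in states:
            best_prev = max(states, key=lambda p: prev[p] * transition[p][q])
            row[q] = emission[q][obs] * prev[best_prev] * transition[best_prev][q]
            pointers[q] = best_prev
        psi.append(row)
        back.append(pointers)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda q: psi[-1][q])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

states = ["such", "as", "of", "the"]
initial = {"such": 0.4, "as": 0.2, "of": 0.2, "the": 0.2}
transition = {  # hypothetical bigram probabilities P(next word | word)
    "such": {"such": 0.1, "as": 0.6, "of": 0.1, "the": 0.2},
    "as":   {"such": 0.2, "as": 0.1, "of": 0.1, "the": 0.6},
    "of":   {"such": 0.2, "as": 0.1, "of": 0.1, "the": 0.6},
    "the":  {"such": 0.4, "as": 0.2, "of": 0.3, "the": 0.1},
}
emission = {  # hypothetical P(acoustic observation | printed word)
    "such": {"x1": 0.8, "x2": 0.1, "x3": 0.1},
    "as":   {"x1": 0.3, "x2": 0.6, "x3": 0.1},
    "of":   {"x1": 0.2, "x2": 0.7, "x3": 0.1},
    "the":  {"x1": 0.1, "x2": 0.1, "x3": 0.8},
}

observed = ["x1", "x2", "x3"]
decoded = viterbi(observed, states, initial, transition, emission)

# Baseline: pick the acoustically most likely word for each observation alone.
acoustic_only = [max(states, key=lambda q: emission[q][o]) for o in observed]
```

With these numbers the acoustic-only baseline yields "such of the", because the second sound slightly favours "of", whereas the Viterbi path is "such as the": the transition probabilities outweigh the small acoustic preference.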


Building the language model: A language model of size n assigns a probability to each sequence of n words. The probability distribution can be estimated by computing the frequencies of all n-grams in a large text corpus. Because n-grams with probability 0 would never be selected by the Viterbi algorithm, we smooth the probabilities by assigning a small probability to each unseen n-gram. The length of an n-gram determines how many words of context are taken into account by the language model. Higher values of n can lead to better models but also require exponentially larger corpora for an accurate estimation of the n-gram probabilities. The higher the value of n, the larger the likelihood that some n-grams never appear in the corpus, even though they are valid word sequences and thus may still appear in the text.
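A minimal sketch of such a model for n = 2 is given below. It uses additive smoothing, which is one common way of giving unseen n-grams a small probability and is an assumption here, not necessarily the smoothing used in [4]; the toy corpus, the function name and the smoothing constant 0.1 are likewise hypothetical.

```python
from collections import Counter

def train_bigram_model(tokens, smoothing=0.1):
    """Return a function P(word | previous) estimated from bigram counts."""
    vocabulary = sorted(set(tokens))
    context_counts = Counter(tokens[:-1])            # how often each word has a successor
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    def probability(previous, word):
        # Additive smoothing: every bigram, seen or not, gets `smoothing` extra counts.
        numerator = bigram_counts[(previous, word)] + smoothing
        denominator = context_counts[previous] + smoothing * len(vocabulary)
        return numerator / denominator

    return probability, vocabulary

# Hypothetical toy corpus; a real model needs a large text corpus.
corpus = ("such as the cat such as the dog "
          "one of the best such as the fish").split()

probability, vocabulary = train_bigram_model(corpus)

p_seen = probability("such", "as")      # bigram that occurs three times
p_unseen = probability("such", "of")    # bigram that never occurs
total = sum(probability("such", w) for w in vocabulary)
```

Here `p_unseen` is small but non-zero, so the Viterbi algorithm can still select "such of" if the acoustic evidence is strong enough, and the probabilities of all words following "such" still sum to 1.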
Reordering words based on the obtained language model: Having built the language model, we can reorder the candidate words using the model to select the most likely word sequence [4]. This task is addressed by the Viterbi algorithm [13], which takes as input an HMM $(Q, O, A, B, I)$ of order $d$ and a sequence of observations $a_1, \dots, a_T \in O^T$. Its state consists of $\Psi$, indexed by $T \times Q^d$. First, the $d$-th step is initialized (the earlier steps are unused) according to the initial distribution, weighted with the observations:

$$\Psi_{d,i_1,\dots,i_d} = I_{i_1,\dots,i_d} \prod_{k=1,\dots,d} B_{i_k,a_k} \qquad \forall\, 1 \le i_1, \dots, i_d \le N$$

In the recursion, for increasing indices $s$, the maximum over all previous values is taken:

$$\Psi_{s,i_1,\dots,i_d} = B_{i_d,a_s} \max_{i_0 \in Q} \left( A_{i_0,i_1,\dots,i_d}\, \Psi_{s-1,i_0,\dots,i_{d-1}} \right) \qquad \forall\, s > d,\ 1 \le i_1, \dots, i_d \le N$$

The sequence of hidden states can finally be obtained by backtracking through the indices that contributed to the maximum value in each recursion step.

4. Devices under Threat and their Countermeasures

Secret information leakage caused by emanations from electronic devices has been a topic of concern for a long time. Emanations such as the sound produced by electronic devices can come from different sources [5]. Sound, as a wave, carries information in the form of frequency, wavelength and amplitude, which can be measured by an audio capturing device such as a microphone. The most productive sources for acoustic attacks have been keyboards, the keypads of ATMs and the sounds of printers, and such attacks are mainly used for capturing login details and passwords and for recovering other secret information.

An obvious countermeasure against the acoustic attack is a silent keyboard, which produces little sound. It can be a keyboard made of rubber [14], a touchpad, or a keyboard based on touchscreen or TouchStream technologies [15]. Virtual keyboards that can be projected onto a flat surface have also appeared [16], and printers can be fitted with acoustic shielding foam, which minimizes the emitted sound. These choices are more expensive than the standard mechanical keyboard, and typing on a standard keyboard is much more comfortable than typing on a touchscreen or a rubber keyboard.

The measures mentioned above reduce the emanation of sound from devices. There are also other ways to prevent these attacks from taking place:
Distance: the recognition rate drops substantially if the distance between the device and the microphone is increased.
Obstacle: any obstacle between the device and the microphone can prevent the sound from reaching the recording device (microphone).
Avoiding contact with microphones: the absence of microphones near the emanating device is sufficient to protect privacy.

5. Conclusion

This paper gives an overview of the acoustic side-channel attack and presents techniques such as the Hidden Markov Model (HMM), the triangulation method and word reordering with the Viterbi algorithm for recognizing the data that has been recorded. Finally, it lists some countermeasures to avoid and overcome the attack.

References

[1] YongBin Zhou and DengGuo Feng. Side-Channel Attacks: Ten Years after Its Publication and the Impacts on Cryptographic Module Security Testing. State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing, 100080, China.
[2] Thomas S. Messerges. Power Analysis Attack Countermeasures and Their Weaknesses. Security Technology Research Laboratory, Motorola Labs, Motorola.
[3] Meinard Müller. Information Retrieval for Music and Motion. Springer, 2007.
[4] Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal and Caroline Sporleder. Acoustic Side-Channel Attacks on Printers. Saarland University, Computer Science Department, Saarbrücken, Germany; Saarland University, Computer Linguistics Department, Saarbrücken, Germany.
[5] Richard Frankland. Side-Channels, Compromising Emanations and Surveillance: Current and Future Technologies.
[6] Dmitri Asonov and Rakesh Agrawal. "Keyboard Acoustic Emanations". IBM.
[7] Lawrence R. Rabiner. "A tutorial on hidden Markov models and selected applications in speech recognition."
[8] Biing-Hwang Juang and Lawrence R. Rabiner. "Hidden Markov models for speech recognition."
[9] Frederick Jelinek. "Statistical Models for Speech Recognition". MIT Press.
[10] Kenneth W. Church. "A stochastic parts program and noun phrase parser for unrestricted text".
[11] R. Nag, Kin Hong Wong and Frank Fallside. Script recognition using Hidden Markov Models.
[12] Steven DeRose. Grammatical category disambiguation by statistical optimization. Computational Linguistics.
[13] Hidden Markov Models. …ts/HiddenMarkovModels.html
[14] The virtually indestructible keyboard.
[15] TouchStream keyboards.
[16] Canesta keyboards.
