
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS
Fooling Deepfake detectors with fake personas using
semantic adversarial examples

NGUYEN HONG NGOC

School of Information and Communication Technology

Supervisor:

Assoc. Prof. Huynh Thi Thanh Binh
Supervisor’s signature

Institution:

School of Information and Communication Technology

Co-supervisor: Prof. Yew Soon Ong
Institution:

Nanyang Technological University, Singapore

May 19, 2022


Graduation Thesis Assignment
Name: Nguyen Hong Ngoc
Phone: +84947265498
Email:


Class: 20BKHDL-E
Affiliation: Hanoi University of Science and Technology
I, Nguyen Hong Ngoc, hereby warrant that the work and presentation in this thesis
were performed by myself under the supervision of Assoc. Prof. Huynh Thi Thanh Binh
and Prof. Yew Soon Ong. All the results presented in this thesis are truthful and are not
copied from any other works. All references in this thesis, including images, tables,
figures, and quotes, are clearly and fully documented in the bibliography. I will take full
responsibility for any copied content that violates school regulations.

Student

Signature and name

Nguyen Hong Ngoc


Acknowledgement
This Master thesis would not have been possible without the support of many
people. First of all, I would like to acknowledge and give my warmest thanks to
my supervisor, Assoc. Prof. Huynh Thi Thanh Binh, who has given me a lot of
motivation to complete this work.
I also thank Prof. Yew Soon Ong, Dr. Alvin Chan, and especially Dr. Nguyen Thi
My Binh for being wonderful mentors and for all their support; I could not have made it
without your help and guidance. I would also like to thank my committee members for
their thoughtful comments and suggestions on completing this thesis.
I would also like to give special thanks to my wife, Pham Diem Ngoc, and my family as
a whole for their mental support during my thesis writing process; you truly mean the
world to me. Furthermore, without my friends Vinh Tong, Quang Thang, Minh Tam,
Thanh Dat, and Trung Vu, I could hardly have melted away all the tension from my
work. Thank you for always accompanying me through ups and downs.


Finally, this work was funded by Vingroup and supported by the Vingroup
Innovation Foundation (VINIF) under project code VINIF.2020.ThS.BK.06. I
enormously appreciate all the financial support from Vingroup, which allowed me to
stay focused on my research without worrying about financial burdens.


Abstract
Recent advances in deep generative modeling techniques such as Generative Adversarial
Networks (GANs) can synthesize high-quality media content (including images, videos, and
sounds). This content, collectively known as deepfake, can be very difficult to distinguish
from real content due to its extremely realistic look and high resolution. The initial purpose
of synthesizing media content was to provide more examples for training deep models, thus
improving the performance and robustness of these models. Nowadays, however,
deepfakes are also being abused for many cybercrimes such as fake personas, online
fraud, misinformation, or producing media featuring people without their consent.
Deepfake has become an emerging threat to human life in the age of social networks.

To fight against and prevent the aforementioned deepfake abuses, forensic systems
with the ability to detect synthetic content have recently been extensively studied by
the research community. At the same time, anti-forensic deepfakes are being
investigated to understand the gaps in these detection systems and pave the way for
improvement. In the scope of this Master thesis, I investigate the threat of anti-forensic
fake personas built with semantic adversarial examples, where a fraudster
creates a fake personal profile from multiple anti-forensic deepfakes portraying a single
identity. To comprehensively study this threat model, three approaches that an attacker
may use to conduct such attacks are considered, encompassing both white- and black-box
scenarios. A range of defense strategies is then proposed with the aim of improving
the robustness of current forensic systems against such threats. Experiments show
that while the attacks can bypass current detection, the proposed defense approaches
that consider the multi-image nature of a fake persona can effectively mitigate this
threat by lowering the attack success rate. The results of this thesis can help strengthen
the defense in the fight against the many cybercrimes utilizing deepfakes.

Student

Signature and Name

Nguyen Hong Ngoc


TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION .......................................................... 1
1.1 Deepfake ........................................................................ 1
1.2 Applications of deepfake ........................................................ 3
1.2.1 Image editing ................................................................ 3
1.2.2 Digital cinematic actors ..................................................... 4
1.2.3 Generating training examples ................................................. 5
1.3 Deepfake abuses ................................................................. 6
1.3.1 Disinformation ............................................................... 6
1.3.2 Fake personas/identities ..................................................... 7
1.4 Forensic and anti-forensic deepfake ............................................. 8
1.5 Research challenge: Anti-forensic deepfake personas ............................ 10
1.6 Motivations .................................................................... 11
1.7 Thesis methodology ............................................................. 11
1.8 Contributions .................................................................. 12
1.9 Thesis organization ............................................................ 13

CHAPTER 2. BACKGROUND ............................................................ 14
2.1 Deepfake generators ............................................................ 14
2.1.1 Autoencoder ................................................................. 14
2.1.2 Generative Adversarial Networks ............................................. 15
2.2 Semantic modification for GAN .................................................. 16
2.3 Deepfake forensic systems ...................................................... 17
2.4 Attacks to deepfake forensic systems ........................................... 18
2.4.1 Spatial transformations ..................................................... 18
2.4.2 Pixel-level adversarial examples ............................................ 19
2.4.3 Semantic adversarial examples ............................................... 19

CHAPTER 3. ANTI-FORENSIC FAKE PERSONA ATTACK ..................................... 21
3.1 Problem modeling ............................................................... 21
3.2 White-box approaches ........................................................... 21
3.2.1 Two-phases approach ......................................................... 22
3.2.2 Semantic Aligned Gradient Descent approach .................................. 23
3.3 Black-box approach ............................................................. 25
3.3.1 Introduction to Evolutionary Algorithms ..................................... 25
3.3.2 Semantic Aligned Evolutionary Algorithm ..................................... 26

CHAPTER 4. DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS .......................... 30
4.1 Defense against Single-image Semantic Attack task ............................. 30
4.2 Defenses against anti-forensic fake persona attack ............................ 31
4.2.1 Naive Pooling defense ....................................................... 32
4.2.2 Feature Pooling defense ..................................................... 33

CHAPTER 5. EXPERIMENT RESULTS AND ANALYSIS ....................................... 35
5.1 Experiment setup ............................................................... 35
5.1.1 General setup ............................................................... 35
5.1.2 Hyper-parameters setting .................................................... 37
5.2 Single-image Semantic Attack task evaluation .................................. 37
5.2.1 Baseline .................................................................... 38
5.2.2 Two-phases white-box approach evaluation .................................... 39
5.2.3 SA-GD white-box approach evaluation ......................................... 40
5.2.4 SA-EA black-box approach evaluation ......................................... 40
5.2.5 Comparison between the approaches for SiSA .................................. 41
5.2.6 Visual quality evaluation ................................................... 42
5.2.7 Computational time evaluation ............................................... 44
5.3 Anti-forensic fake persona attack evaluation .................................. 45
5.4 Discussions .................................................................... 46
5.4.1 Visual quality trade-off between approaches ................................. 46
5.4.2 Query-based defenses ........................................................ 47
5.4.3 Ethical discussions ......................................................... 47

CHAPTER 6. CONCLUSION AND FUTURE WORKS ........................................... 48
6.1 Contributions .................................................................. 48
6.2 Limitations and future works .................................................. 48


LIST OF FIGURES

1.1 Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated from StyleGAN2 [2]. .... 1
1.2 Barack Obama deepfake video created from a random source video. .... 2
1.3 The four types of face manipulation in deepfake. .... 3
1.4 Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair, or swapping the gender. .... 4
1.5 CGI in the Rogue One movie to recreate young Princess Leia, later improved with deepfakes by fans. .... 5
1.6 Deepfake video of Donald Trump aired by Fox affiliate KCPQ. .... 6
1.7 With the rise of deepfake technology, any social account could be fake. .... 7
1.8 Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student. .... 8
1.9 The original deepfake image is detected as 'fake' by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected as 'real'. .... 9
1.10 An attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake. .... 12
2.1 Architecture of an autoencoder, including an encoder and a decoder. .... 15
2.2 Architecture of a Generative Adversarial Network, including a generator and a discriminator [23]. .... 15
2.3 The modeling of a simple GAN-based deepfake generator. The GAN generator takes latent code z as input and outputs the deepfake image x. .... 16
2.4 Semantically modifying the attribute smile of a face image using the attribute vector Va = smile. The attribute vector is learned from the latent space, using the method proposed in [24]. .... 17
2.5 Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation. .... 18
2.6 The creation of pixel-level adversarial examples, which uses gradient back-propagation to update the perturbations. The loss function here is the prediction score Fd(x) of the detector. .... 19
2.7 The creation of semantic adversarial examples based on gradient back-propagation. Different from pixel-level adversarial examples, the gradient is back-propagated to update the perturbation δ, which is added directly to the original latent code z. .... 20
3.1 Two-phases approach illustration. Phase 1: semantically modify the original deepfake x along the target attributes to create x′ = G(z + αVA). Phase 2: add a pixel-level adversarial perturbation σ to create the anti-forensic deepfake x′ + σ. .... 22
3.2 Gradient back-propagation step of the Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent. This step is similar to the semantic adversarial example attack. .... 24
3.3 Example of semantically aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and only one attribute vector Va targeted. In the case of two or more target attributes, the perturbation is projected onto the space spanned by the target attribute vectors. .... 24
3.4 An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c. .... 27
3.5 An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m. .... 27
3.6 An example of random noise mutation in SA-EA. Chromosome c is mutated to chromosome c′ by adding noise uniformly sampled in the range ∆. .... 28
4.1 Retraining the deepfake detector with the addition of semantic attack images. .... 30
4.2 Illustration of the Naive Max-pooling defense, where m images of the profile are fed into the detector D to get m corresponding prediction scores. Then, the m prediction scores are fed through a max-pooling layer to get the overall score of the profile. .... 32
4.3 Illustration of the Feature Max-pooling defense, where m images of the profile are fed into the CNN layers of the detector and then into a max-pooling layer to get the profile feature vector. Lastly, the profile feature vector is fed into the fc layer to get the prediction. .... 33
5.1 Two-phases white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes). .... 39
5.2 The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes). .... 40
5.3 The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes). .... 41
5.4 The ASR of the SA-GD white-box, SA-EA black-box, and grid-search approaches given the same h⊥ (average value across target attributes). .... 42
5.5 FID_CelebA score (smaller is better) of each attack approach against the original and defense detectors. The red dashed line shows the FID_CelebA value of StyleGAN-generated images from the input latent codes. .... 43
5.6 Two-phases approach: samples of inputs and corresponding outputs with different target attributes (ϵ = 0.25). Inputs are predicted 'fake' while outputs are predicted 'real'. .... 43
5.7 SA-GD approach: samples of inputs and corresponding outputs with different values of the orthogonal threshold h⊥. Besides the target attribute age, other attributes such as smile, pose, hairstyle, and background are sometimes changed, more often and more intensely as the orthogonal threshold h⊥ increases. .... 44
5.8 The P-ASR of the two-phases approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs. Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile). .... 45
5.9 Exaggerated examples of how a larger perturbation affects the visual quality: the two-phases approach generates noisier images while SA-GD/SA-EA output non-target attribute changes. .... 46



LIST OF TABLES

5.1 Comparison of the accuracy (Acc.) and the average precision (AP) between the defense and the original detector; test sets are from [17] (no-crop evaluation). .... 38


CHAPTER 1. INTRODUCTION
In this chapter, deepfake technology, together with its promising applications as well as
its malicious abuses, is introduced. The concepts of forensic deepfake systems and
anti-forensic deepfake examples are also presented. Lastly, the research challenges of
the thesis are raised, and the motivations behind these challenges are discussed.

1.1 Deepfake
Originating from a Reddit user who shared synthetic fake pornography videos featuring
the faces of celebrities, the term "deepfakes" refers to high-quality media content
generated with deep-learning generative techniques. Even though the term has only
been popular since 2019, techniques of image manipulation were developed as far back
as the 19th century and were mostly applied to motion pictures. The technology steadily
improved during the 20th century, and more quickly with the invention of digital
video [1]. Deepfake technology has been developed by researchers at academic
institutions, beginning in the 1990s, and later by amateurs in online communities. Over
the last few years, deepfake has drastically improved in generation quality due to
advances in graphic computational power and deep learning techniques.

Figure 1.1: Examples of deepfake images from the website thispersondoesnotexist.com.
These images are generated from StyleGAN2 [2].
Nowadays, with the power of artificial intelligence and deep learning, the quality of deepfake
synthetic content has been enhanced to a remarkably realistic level. For instance, in
Figure 1.1, these two seemingly normal facial photos of two normal people turn out to be
deepfake images, taken from the website thispersondoesnotexist.com. True to
its name, these two people do not exist, since these images are generated completely
at random by a computer, to be more specific, by a deep generative architecture called
StyleGAN2 [2]. Even if we examine these images carefully, it is nearly impossible


to tell any difference between these deepfake images and real ones, not to mention that
the resolution of these deepfakes is also remarkably high, with razor-sharp image quality.
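To give a rough sense of how such images come about, the sketch below samples a random latent code and passes it through a generator. This is only a toy stand-in written in PyTorch; the real images on thispersondoesnotexist.com come from the full pretrained StyleGAN2 model [2], which the placeholder network here does not reproduce.

```python
import torch
import torch.nn as nn

# Minimal sketch of entire face synthesis. The generator below is only a
# stand-in; in practice G would be a pretrained StyleGAN2 generator that maps
# a latent code z to a high-resolution face image.
latent_dim, img_size = 512, 64  # illustrative sizes, not StyleGAN2's real output size

G = nn.Sequential(              # placeholder for a real generator network
    nn.Linear(latent_dim, 3 * img_size * img_size),
    nn.Sigmoid(),               # keep pixel values in [0, 1]
    nn.Unflatten(1, (3, img_size, img_size)),
)
G.eval()

with torch.no_grad():
    z = torch.randn(1, latent_dim)  # random seed in latent space
    x = G(z)                        # synthesized image, shape (1, 3, 64, 64)

# Each fresh z yields a different image of a person who does not exist.
```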

Deepfake gained a lot of attention in 2018 when Jordan Peele and BuzzFeed
cooperated to synthesize a fake PSA video delivered by Barack Obama [3] utilizing
deepfake technology. From an arbitrary source video of a person giving a random
speech, deepfake can swap the face and the voice of the person with the face and
voice of Barack Obama while the content of the speech is unchanged (Figure 1.2).
Even though the deepfake video was supposed to be for entertainment purposes, the
realism of its visual and audio content made many wonder about the safety of the
technology and the possibility of abusing deepfake for cybercrimes.

Figure 1.2: Barack Obama deepfake video created from a random source video.
Deepfake comes in many forms, from the most common form of images [4]–[6] to videos
[7]–[9] and even audio deepfakes [9], [10]. The Barack Obama deepfake video mentioned above (Figure 1.2) is an example that combines all of these forms together.
Among the subjects of deepfakes, the most widely studied is the human facial deepfake, as it
could be of great use in many applications. Within the field of human facial deepfakes,
there are four common types of face manipulation techniques (Figure 1.3) [11]:

• Entire face synthesis: Refers to the case where an entire facial image is generated/synthesized by computer techniques. The face image is synthesized from a
random seed and usually belongs to a non-existent person.
• Face identity swap: Deepfakes where a target facial image of a person is
swapped with a source facial image of another person. To be more specific, only
the face identity is swapped while other content in the image is unchanged.
• Facial attribute manipulation: Manipulation of a target facial image to
change certain attributes such as hairstyle, eyeglasses, or even age. For
instance, this manipulation technique can semantically change a facial
image to create an older look of the person.
• Facial expression manipulation: Manipulation of a target facial image to
change the expression of the person, such as smile, surprise, anger, etc.


Figure 1.3: The four types of face manipulation in deepfake.
Even though each type of face manipulation has its own applications, in the
scope of this thesis, I exclusively study face synthesis techniques [11], in which
entire face images of non-existent people are generated.
1.2 Applications of deepfake
1.2.1 Image editing
One of the most well-known applications of deepfake technology is image editing.
Faceapp is a famous application that allows image editing
using deepfake. Faceapp provides dozens of different filters that can be applied to
users' uploaded images to create various effects. These filters usually apply the
aforementioned facial attribute manipulation deepfake, targeting different attributes
that can semantically modify the image in the most realistic way. Figure 1.4
illustrates a few of the most popular filters in Faceapp, including:

• Older filter: creates an image of an older self from the input image, allowing
users to see what they may look like in the future.
• Genderswap filter: swaps the gender of the person in the input image, allowing
users to see what they would look like as the opposite gender.
• Cartoon filter: creates a cartoon version of the input image.
• Add facial hair filter: adds facial hair to the input image.

Figure 1.4: Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair or swapping the gender.
Facial expression manipulation deepfakes can also be used to edit images and videos. People may use expression manipulation to change the expression of a person in images or
videos as they desire. Furthermore, face identity swap deepfakes can be used for image
editing by allowing users to insert their face identity into the images of others.

Compared to traditional image processing techniques (e.g. tools such as OpenCV
and frameworks such as Photoshop), deepfake image editing has the advantage of
being fully automatic: with a well-trained deepfake generative model, an
input image passed through the generator is automatically transformed. In contrast, with
traditional techniques, each input must be manually handled, which often takes a
lot of time and effort. Moreover, deepfake can give a very natural look to
the edited image, which with traditional techniques depends a lot on the skills of the editor.
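As a rough illustration of why this editing is automatic, the sketch below shows the kind of latent-space operation that typically underlies such filters: the edit amounts to one addition in latent space followed by one forward pass through the generator. The generator G and the attribute direction v_age here are placeholders introduced for illustration, not the actual models behind Faceapp.

```python
import torch
import torch.nn as nn

# Sketch of automatic attribute-based editing in latent space, the kind of
# operation behind filters such as "older" or "add facial hair". Both the
# generator G and the attribute direction v_age are placeholders; in practice
# G is a pretrained generative model and v_age is a direction learned from
# its latent space.
latent_dim, img_size = 512, 64
G = nn.Sequential(
    nn.Linear(latent_dim, 3 * img_size * img_size),
    nn.Sigmoid(),
    nn.Unflatten(1, (3, img_size, img_size)),
)

z = torch.randn(1, latent_dim)          # latent code of the input face
v_age = torch.randn(latent_dim)         # placeholder "age" direction
v_age = v_age / v_age.norm()            # unit-length attribute vector
alpha = 3.0                             # editing strength

with torch.no_grad():
    x_original = G(z)                   # original face
    x_older = G(z + alpha * v_age)      # same face, pushed along the "age" direction
```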

1.2.2 Digital cinematic actors
As mentioned above, one of the biggest applications of deepfake is creating digital
actors in the cinematography industry. Although image manipulation appeared a long time
ago in the form of computer-generated imagery (CGI), more recent deepfake technology
promises even better quality in a much shorter time and with much less effort.
Deepfake technology has already been used by fans to insert faces into existing
films, such as the insertion of Harrison Ford's young face onto Han Solo's face in the
movie Solo: A Star Wars Story, and similar techniques were used for the acting of
Princess Leia in the movie Rogue One [1].

Figure 1.5: CGI in the Rogue One movie to recreate young Princess Leia, later
improved with deepfakes by fans.

As shown in Figure 1.5, CGI was used to recreate young Princess Leia, based on facial
expressions scanned from another actress using hundreds of motion sensors. With
deepfake technology, instead of motion sensors to capture the facial expressions, we
only need reference videos, which can be either original videos of the target (in this
case, Princess Leia from the original movies) or the source video in which we want to
replace the target's face. The quality of deepfake videos is getting better every day,
while the cost in time and resources to synthesize deepfakes is much lower than the
cost of CGI.

1.2.3 Generating training examples
One other important application of deepfake is to generate more examples for training
deep neural networks. As many of us may know, the capability of artificial neural networks, regardless of size, is highly dependent on the data on which the networks are
trained. If the data is too small or too biased, the performance of the networks in real
life may be significantly affected. For instance, a face recognition system that is only
trained on facial images of young people will perform poorly when recognizing
the faces of older people. An animal image classifier that is trained only with images of
black cats will likely not be able to correctly classify images of white cats. Since the
beginning of the era of Artificial Intelligence (AI), biased data has always been one of the
biggest problems when training neural networks. The problem also takes a great deal of
effort to solve, because the only way to make biased data unbiased is to collect even
more data to improve the diversity of training samples. Collecting data is usually very
time-consuming and often costs a fortune.
Deepfake comes in handy as a promising answer to the biased-data problem without costing too many resources. With deepfake, people can easily generate new examples to
improve the diversity of the training dataset. For instance, in the above face recognition
system example, we can use deepfake to generate older versions of the young
people's images in the dataset and use those deepfakes to train the model. In
cases where a deep learning system lacks training data, a deepfake generator can
be used to generate more data for training. This solution saves a lot of time
and money for the system designers compared to manually collecting real data.
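A rough sketch of this augmentation idea follows, with a placeholder generator standing in for a real pretrained deepfake model: synthetic samples are generated from random latent codes and simply appended to the existing training set.

```python
import torch
import torch.nn as nn

# Rough sketch of augmenting a training set with generated examples. The
# generator is a placeholder; in practice it would be a pretrained deepfake
# model (for example, one producing older-looking versions of existing faces).
latent_dim, img_size = 128, 64
G = nn.Sequential(
    nn.Linear(latent_dim, 3 * img_size * img_size),
    nn.Sigmoid(),
    nn.Unflatten(1, (3, img_size, img_size)),
)

real_images = torch.rand(100, 3, img_size, img_size)   # the existing, too-small dataset

with torch.no_grad():
    z = torch.randn(50, latent_dim)                     # one latent code per new sample
    synthetic_images = G(z)                             # extra, generated training examples

augmented_images = torch.cat([real_images, synthetic_images])  # 150 training images
```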

1.3 Deepfake abuses
In contrast to these promising applications, deepfakes are mostly being abused for
many illegal activities and cybercrimes. The two most dangerous crimes that can be
committed with deepfakes are disinformation and fake personas/identities.
1.3.1 Disinformation
Deepfake's remarkable performance in generating photo-realistic content involving faces
and humans has raised concerns about issues such as the malicious use of fake media to
spread misinformation [12] and fabricating content of people without their consent [13].

With deepfakes, one can also spread fake news and hoaxes targeting celebrities,
backed up by convincingly high-quality images and videos. For instance, deepfakes
originated from synthetic pornographic videos featuring the faces of
celebrities, which can be used to blackmail or discredit these people without
their consent. A report published in October 2019 by Dutch cyber-security startup
Deeptrace estimated that 96% of all deepfakes online were pornographic [1].

Figure 1.6: Deepfake video of Donald Trump aired by Fox affiliate KCPQ.
Some people can also use deepfake videos to misrepresent well-known politicians in
videos, targeting their rivals to gain an advantage in politics. Some incidents have been
recorded in the past: in January 2019, Fox affiliate KCPQ aired a deepfake video
of Donald Trump during his Oval Office address, mocking his appearance and
skin color [1] (Figure 1.6). In April 2020, the Belgian branch of Extinction Rebellion
published a deepfake video of Belgian Prime Minister Sophie Wilmès on Facebook [1].

1.3.2 Fake personas/identities
Deepfake is also being abused to create fake personas/identities and to impersonate
other people. For instance, someone with access to the technology may open
product or social accounts using the identities of others, or even of non-existent people,
with the intention of committing cybercrimes such as scams and financial fraud. Criminals
can easily pretend to be other people online and commit crimes without the
consequence of being tracked (see Figures 1.7 and 1.8). With the support of
deepfake, they can even generate photo-realistic ID card images to gain the trust of
others, thus successfully scamming in online transactions.

Figure 1.7: With the rise of deepfake technology, any social account could be fake.

A famous example of an online fake persona deepfake is the case of the Twitter account
of Andrew Walz. According to this account, Andrew was a congressional candidate running
for office in Rhode Island, who called himself a "Republican" with the tagline "Let's make
changes in Washington together". Walz's Twitter account was complete with his picture and
a prized blue check-mark, showing that he had been verified by Twitter as one of the
accounts of congressional and gubernatorial candidates (Figure 1.8). Andrew Walz, however, was actually the creation of a 17-year-old high-school student. During his holiday
break, this student created a website and Twitter account for this fictional candidate [14].
The Twitter profile picture was downloaded from the website thispersondoesnotexist.com.

These are just a few of the many abuses of deepfake, which are increasing in quantity
and quality every day. Even though deepfake has a lot of great applications, we
need to be more aware and cautious of deepfake's potential threat.


Figure 1.8: Andrew Walz was, according to his Twitter account and webpage,
running for a congressional seat in Rhode Island. In reality, Mr. Walz does not
exist and is the creation of a 17-year-old high-school student.
1.4 Forensic and anti-forensic deepfake
Since the advent of deepfake abuses and cybercrimes, a wide array of defenses has been
proposed to mitigate this emerging threat and prevent the risk. These defenses usually aim
to counter deepfake by detecting and classifying deepfake content among real content,
and are also known as deepfake forensic/detection systems. In recent years, deepfake
forensic systems have been extensively studied and developed by the research community.
Most forensic systems can be divided into two main groups:

• The first group of measures seeks to detect fake content based on high-level semantic features such as behavioral cues [13], like inconsistent blinking of the eyes
[15]. These methods have the advantage of fast validation of new
instances but are usually quickly outdated as deepfake technology
improves over time. Today's deepfake content has developed to near-perfect
quality and exceptionally natural looks, which makes these
high-level features highly realistic and indistinguishable.
• The second group of defenses is based on low-level features underneath
the image pixels, obtained by training a convolutional neural network (CNN)
to classify images/videos as either fake or real [16]–[19] (a minimal sketch
of such a detector is given after this list). These forensic detectors
normally achieve state-of-the-art performance due to the CNN's ability
to automatically learn feature extraction.
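To make the second group concrete, here is a minimal sketch of what such a detector looks like; the tiny network below is only a toy stand-in for the much larger CNN classifiers used in practice.

```python
import torch
import torch.nn as nn

# Toy sketch of a CNN-based deepfake detector: a binary classifier mapping an
# RGB image to a single "fake" probability. Real forensic detectors [16]-[19]
# are far deeper and are trained on large sets of real and generated images.
class ToyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)    # low-level pixel features
        return torch.sigmoid(self.fc(h))   # probability that x is fake

detector = ToyDetector()
image = torch.rand(1, 3, 256, 256)         # an image to be checked
p_fake = detector(image)                   # near 1 -> 'fake', near 0 -> 'real'
```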
On the opposite side of forensic systems, we have anti-forensic deepfakes: deepfake
examples that are specially crafted to bypass forensic systems, fooling these detectors
into classifying synthetic content as real. These anti-forensic deepfakes, also called adversarial examples, are most commonly generated by using gradient back-propagation to add
imperceptible adversarial perturbations to the pixels of the original deepfake [12], [20].
Figure 1.9 illustrates such an adversarial example. Despite the fact that, to human
eyes, the deepfake image seems to remain unchanged after adding the perturbations,
forensic systems are fooled and decide that the image is real. Many experiments
have shown that recent deepfake forensic systems are extremely vulnerable to
adversarial examples, revealing a big gap in current detection techniques [12].
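The sketch below illustrates this gradient-based recipe with a single FGSM-style step (real attacks usually iterate and tune the perturbation budget); the detector and the image here are placeholders, not the actual systems studied later in this thesis.

```python
import torch
import torch.nn as nn

# Minimal FGSM-style sketch of a pixel-level anti-forensic perturbation. The
# detector is a placeholder differentiable classifier that outputs the
# probability an image is fake; x_fake stands in for an actual deepfake image.
detector = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
epsilon = 0.03                                    # perturbation budget (illustrative)
x_fake = torch.rand(1, 3, 256, 256)               # stand-in for a deepfake image

x_adv = x_fake.clone().requires_grad_(True)
fake_score = detector(x_adv).mean()               # detector's 'fake' score for the image
fake_score.backward()                             # gradient w.r.t. the image pixels

# One signed-gradient step that lowers the 'fake' score; pixels stay in [0, 1].
x_adv = (x_adv - epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
```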

Figure 1.9: The original deepfake image is detected as 'fake' by the forensic system.
However, after adding specially crafted imperceptible adversarial perturbations,
the deepfake image, even though it looks the same, is detected as 'real'.

Forensic versus anti-forensic deepfake has been a back-and-forth battle for years. While
forensic systems improve over time thanks to recent advanced classification
techniques and extensive training, anti-forensics is also getting more effective with
new generative networks developed every day. Nonetheless, forensics (defenses)
and anti-forensics (attacks) are two sides of the same problem. To make progress on one
side, researchers must have knowledge of both sides. For instance, knowledge
obtained from attack methods can be used to understand the weaknesses of the
defenses and thus to propose counter-techniques that mitigate the attacks.
Understanding this relationship between forensics and anti-forensics, the fight against
deepfake abuses normally follows two main groups of approaches. The first group
explores and discovers different types of attacks on the forensic
detectors, since it is crucial to be aware of possible attacks and prepare the
corresponding counter-defenses against them. The second group proposes
techniques that focus directly on improving the forensic systems, either to boost
the performance of the deepfake detectors in general or simply to gain
robustness against a certain type of attack. Either way, both groups of approaches
are equally important and must be pursued simultaneously for the best efficiency.



