VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

LÊ ĐỨC HUY

AUTHENTICATION VIA DEEP LEARNING FACIAL
RECOGNITION WITH AND WITHOUT MASK AND
TIMEKEEPING IMPLEMENTATION AT WORKING
SPACES

Major: Computer Science
Major code: 8480101

MASTER’S THESIS

HO CHI MINH CITY, July 2023




THIS THESIS IS COMPLETED AT
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU-HCM

Supervisors:
Assoc. Prof. Quản Thành Thơ.
Dr. Nguyễn Tiến Thịnh.

Examiner 1: Assoc. Prof. Đỗ Văn Nhơn.

Examiner 2: Assoc. Prof. Bùi Hoài Thắng.

This master’s thesis was defended at Ho Chi Minh City University of Technology,
VNU-HCM, on July 10th, 2023.

Master’s Thesis Committee:
1. Chairman: Assoc. Prof. Phạm Trần Vũ.
2. Secretary: Dr. Nguyễn Lê Duy Lai.
3. Examiner 1: Assoc. Prof. Đỗ Văn Nhơn.
4. Examiner 2: Assoc. Prof. Bùi Hoài Thắng.
5. Commissioner: Dr. Mai Hoàng Bảo Ân.

Approval of the Chairman of the Master’s Thesis Committee and the Dean of the Faculty of
Computer Science and Engineering after the thesis has been corrected (if any).

CHAIRMAN OF THESIS COMMITTEE


HEAD OF FACULTY OF COMPUTER
SCIENCE AND ENGINEERING


VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom - Happiness

THE TASK SHEET OF MASTER’S THESIS
Full name: Lê Đức Huy

Student ID: 2170306

Date of birth: 22/12/1996

Place of birth: Ho Chi Minh City

Major: Computer Science

Major ID: 8480101

I. THESIS TITLE (In Vietnamese): HỆ THỐNG NHẬN DIỆN KHUÔN MẶT
CÓ VÀ KHÔNG CÓ KHẨU TRANG DỰA TRÊN NỀN TẢNG HỌC SÂU
TRONG THỊ GIÁC MÁY TÍNH VÀ ỨNG DỤNG TRONG VIỆC CHẤM
CÔNG TẠI CÁC DOANH NGHIỆP HIỆN NAY.

II. THESIS TITLE (In English): AUTHENTICATION VIA DEEP
LEARNING FACIAL RECOGNITION WITH AND WITHOUT MASK
AND TIMEKEEPING IMPLEMENTATION AT WORKING SPACES.
III. TASKS AND CONTENTS:
- Conduct research on modern machine learning and deep learning architectures.
- Conduct research on applied techniques involved in biometric authentication in all walks of life.
- Establish a face recognition model to turn theory into reality.
- Survey datasets to find those appropriate for training the model.
- Conduct end-to-end model training.
- Evaluate the model with metrics such as Loss and Precision.
- Conceptualize the idea, draw the flow diagram and design the time-keeping application using the face recognition method.
- Build the application and put it under quality assurance.

IV. THESIS START DATE: 06/02/2023
V. THESIS COMPLETION DATE: 09/06/2023


VI. SUPERVISORS: Assoc. Prof. Quản Thành Thơ, Dr. Nguyễn Tiến Thịnh.

Ho Chi Minh City, date ... ... ...
SUPERVISOR 1
(Full name and signature)

Quản Thành Thơ

SUPERVISOR 2
(Full name and signature)

Nguyễn Tiến Thịnh

CHAIR OF PROGRAM COMMITTEE
(Full name and signature)

DEAN OF FACULTY OF COMPUTER SCIENCE AND ENGINEERING
(Full name and signature)


ACKNOWLEDGEMENTS

I am truly elated to have this chance to express my deepest gratitude to my
supervisors, Assoc. Prof. Quan Thanh Tho and Dr. Nguyen Tien Thinh, for all the
advice and elucidation on the road to the final target of this Master of Computer
Science program. Their professionalism and character paved the way and encouraged
me to fulfill the needs of research and to complete this thesis.

I am forever grateful to my family; the belief of my parents and my sister kept my
spirit strong and kept me motivated throughout even the most difficult times. And
without the continuous support of all my friends, I could not stand where I am today.


ABSTRACT
Face recognition has so far played a crucial role in authentication, where it is regarded
as one of the most secure and effective forms of biometry. However, masked faces in
the wake of Covid-19 brought a huge challenge to existing techniques, since part of
the face is covered, and occlusion has since become a heated topic of research once
again. Motivated by the wish to contribute to security in the face recognition industry,
I have followed the issues left uncovered by many state-of-the-art machine learning
models and arrived at the following proposals to alleviate, and to some extent
enhance, the procedure for face verification under masks, built around these specific
elements:
• The first use of the Siamese Neural Network (SNN) for recognizing human
faces while wearing masks, instead of reusing pre-trained state-of-the-art
models. The training and testing data are collected from the MLFW (Masked
Labeled Faces in the Wild) dataset so that the resulting SNN meets the
accuracy expectations set out from the start.
• The advantage and effectiveness of using ensemble learning to separate the
training of models for different purposes: faces with a mask and faces without
a mask. This also highlights its ability to rule out the security breaches that a
single model trained on mixed datasets would be exposed to.
• Emulation and visualization of the results in a way that mimics real
time-keeping circumstances in production and enterprises in the long run. For
the deployment pipeline, I employ Flask and Streamlit, leading infrastructure
for Python web applications, building on what they already provide.


TÓM TẮT LUẬN VĂN
Nhận diện khuôn mặt cho đến nay đã đóng một vai trò vô cùng quan trọng trong việc
xác thực và được xem là phương thức bảo mật sinh trắc học an toàn và hiệu quả nhất.
Tuy nhiên, việc nhận diện những khuôn mặt đeo khẩu trang sau đại dịch Covid-19 đã
mang đến một thách thức to lớn đối với các kỹ thuật hiện có và trở thành một đề tài
được giới chuyên môn hết sức quan tâm. Nhằm gây dựng sự đóng góp cho vấn đề
bảo mật trong bài toán nhận diện khuôn mặt, học viên đã thực hiện nghiên cứu theo
định hướng giải quyết vấn đề còn tồn đọng của các mô hình học máy tiên tiến nhất
hiện nay và đưa ra giải pháp để đáp ứng việc xác minh khuôn mặt kể cả khi đeo khẩu
trang dựa trên các yếu tố cụ thể như sau:
• Lần đầu áp dụng Siamese Neural Network (SNN) cho bài toán nhận diện
khuôn mặt có đeo khẩu trang thay vì sử dụng lại các mô hình hiện đại đã được
huấn luyện từ trước. Các tập dữ liệu huấn luyện và kiểm tra được thu thập dựa
trên tập MLFW (Masked Labeled Faces in the Wild) để mô hình SNN có thể
cho ra kết quả xác thực với độ chính xác đáp ứng được kỳ vọng đã đặt ra ngay
từ khi bắt đầu triển khai.
• Tận dụng ưu điểm và tính hiệu quả của việc áp dụng ensemble learning để
phân chia nhiệm vụ huấn luyện cho các mô hình phục vụ nhu cầu các bài toán
khác nhau: nhận diện được khuôn mặt khi có đeo hoặc không đeo khẩu trang
riêng biệt. Điều này cũng giúp cho tính bảo mật được đảm bảo so với việc
huấn luyện một mô hình chung cho tác vụ nhận diện khuôn mặt kể cả khi có
đeo và không đeo khẩu trang.
• Mô phỏng và trực quan hóa kết quả nhằm đánh giá thực tế khả năng chấm
công trong môi trường doanh nghiệp trong tương lai. Đối với bước đầu trong
công tác triển khai, học viên đã sử dụng cơ sở hạ tầng công nghệ tiên tiến của
Flask và Streamlit kế thừa từ những gì họ đã xây dựng trên nền tảng lập trình
trang web Python như hiện tại.


THE COMMITMENT OF THE THESIS’S AUTHOR

I hereby confirm that this thesis and the work presented in it are entirely my own.
Wherever I have consulted the work of others, this is always clearly stated. All
statements taken literally from other writings, or referred to by analogy, are marked
and their sources are always given. This paper has not been submitted to another
examination office, either in the same or a similar form.

I agree that the present work may be verified with anti-plagiarism software.

THE THESIS’S AUTHOR

Lê Đức Huy


TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION .......... 1
1.1. Introduction .......... 1
1.2. Problem statement .......... 1
1.3. Objectives and missions .......... 3
1.4. Scope of the thesis .......... 4
1.5. Thesis contributions .......... 5
1.6. Thesis structure .......... 5
CHAPTER 2: BACKGROUND KNOWLEDGE .......... 7
2.1. Face recognition .......... 7
2.2. Convolutional Neural Network (CNN) .......... 8
2.2.1. Convolution .......... 11
2.2.2. Pooling .......... 13
2.2.3. Cross-Entropy Loss .......... 14
2.3. Siamese Neural Network (SNN) .......... 15
2.3.1. Overall of Siamese Neural Network .......... 15
2.3.2. Loss function of SNN .......... 16
2.3.3. Discussion on SNN .......... 17
2.3.4. Evaluation metrics .......... 18
2.4. Ensemble learning .......... 20
CHAPTER 3: RELATED WORKS .......... 29
3.1. Global feature support .......... 31
3.1.1. Appearance based .......... 31
3.1.2. Model based .......... 31
3.2. Local feature support .......... 32
3.2.1. Learning based .......... 32
3.2.2. Hand-crafted based .......... 32
3.3. One-shot learning .......... 33
3.4. Discussion .......... 34
CHAPTER 4: THE PROPOSED MODEL AND IMPLEMENTATION .......... 36
4.1. Reference model .......... 36
4.2. Datasets and pre-process .......... 37
4.2.1. Labeled Faces in the Wild (LFW) datasets .......... 37
4.2.2. Masked Labeled Faces in the Wild (MLFW) datasets .......... 38
4.3. Application architecture .......... 40
4.3.1. New hire model training .......... 40
4.3.2. Verification process .......... 41
4.3.3. Admin portal .......... 42
4.3.4. Database management server (DBMS) .......... 43
4.3.5. Streamlit User Interface .......... 44
4.3.6. Flask .......... 45
4.4. Proposed Model .......... 45
4.4.1. Motivation and idea .......... 45
4.4.2. Parameters configuration .......... 47
4.4.3. Experimental results and Discussion .......... 48
4.5. Ablation study .......... 52
4.6. Time-keeping application .......... 53
4.7. Multiple models for recognizing employees .......... 55
CHAPTER 5: CONCLUSION .......... 56
REFERENCES .......... 58


TABLE OF FIGURES


Figure 1.1: The face recognition and time-keeping application pipeline architecture ...... 2
Figure 2.1: The flowchart of Face Recognition .......................................................... 8
Figure 2.2: Human brain processes the image and recognizes ................................... 9
Figure 2.3: The process of extracting hidden attributes from of the face ................. 10
Figure 2.4: The sample calculation of convolution................................................... 11
Figure 2.5: The sample convolutional neural network for image classification ....... 12
Figure 2.6: A depiction of shared weights in convolutional neural network ............ 13
Figure 2.7: A sample calculation of max pooling ..................................................... 14
Figure 2.8: The sample Siamese Neural Network for face recognition .................... 16
Figure 2.9: A confusion matrix and its actual denotation ......................................... 19
Figure 2.10: Ensemble learning ................................................................................ 21
Figure 2.11: Bagging concept ................................................................................... 22
Figure 2.12: Boosting concept .................................................................................. 25
Figure 2.13: Stacking concept ................................................................................... 27
Figure 3.1: The technique taxonomy for Face Recognition ..................................... 30
Figure 4.1: The complete reference model for Face Recognition ............................ 37
Figure 4.2: Labeled Faces in the Wild datasets ........................................................ 38
Figure 4.3: MLFW is constructed by adding mask to the images in LFW with
perturbation for achieving diverse generation effect ................................................ 40
Figure 4.4: Training model process .......................................................................... 40
Figure 4.5: Verification process ................................................................................ 41
Figure 4.6: The admin portal for time-keeping boards and visualizations ............... 42
Figure 4.7: Timesheet in the application ................................................................... 54
Figure 4.8: Face ID with a Mask in an iPhone.......................................................... 55


TABLE OF TABLES

Table 1.1: The output use cases of the face recognition model ..................................3
Table 2.1: The development of Bagging concept .....................................................23
Table 2.2: The development of Boosting concept ....................................................25
Table 3.1: The summary of recent works relating to one-shot learning ...................33
Table 4.1: The originally given parameters ..............................................................47
Table 4.2: Parameters configuration .........................................................................48
Table 4.3: Overview of training, validation and testing image set ...........................49
Table 4.4: Summary of performance outcome on different face recognition
baselines. “#Models” is the number of models used in the method for evaluation ..50
Table 4.5: Comparison of model-training and model-testing time in seconds of each
epoch for different face recognition models .............................................................51
Table 4.6: PDSN model experiment .........................................................................53


CHAPTER 1: INTRODUCTION
1.1. Introduction

After the Covid-19 pandemic, the biggest hit to daily life for over three years now, the
world is gradually healing, but the virus remains a vicious threat and no one can
predict when a new mutant will suddenly appear. With that challenge in mind,
business enterprises are eager to adapt to post-Covid-19 social distancing to some
extent, ranging from wearing masks to contactless authentication in public places. In
the midst of the Covid-19 resurgence, we also faced the hindrance of timekeeping
handled manually through online spreadsheets, which caused huge delays in regular
reporting. These two problems add to the existing difficulties that have urged
scientists to dive deep into Artificial Intelligence (AI) and Machine Learning to
mitigate them and, more positively, to contribute to the accomplishments of AI in all
walks of life. To tackle these issues, one can see the potential of face recognition
using the canonical Siamese Neural Network [1], a biometric authentication method
integrated with a time-tracking system; the problem, however, occurs when a subject
is wearing a mask. Recent studies indicate promising results in both face mask
detection and masked face recognition using DeepMaskNet [2]. At first glance this
seems to make the obstacles fade away, but for the specific problem above nothing
can be concluded without real experiments; the practicality and the true ability of the
Siamese Neural Network to recognize similarities between mask-wearing faces are
still open questions.
1.2. Problem statement

The input to the face recognition model is a human face, with or without a mask,
captured as an image (in PNG or JPG format) by the live camera system. The model
returns an output consisting of two main components:
• The face verification result: True or False, and, if True, the name of the recognized person.
• Optionally, whether the face is masked or unmasked (a minimal sketch of this output structure follows the list).
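
As a concrete illustration only, the two components could be carried in a small Python structure like the minimal sketch below; the class and field names (RecognitionResult, is_match, person_name, has_mask) are hypothetical and are not taken from the implementation described later in this thesis.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionResult:
    # Component 1: verification outcome and, when matched, the person's name
    is_match: bool
    person_name: Optional[str] = None
    # Component 2: whether the captured face was wearing a mask (if reported)
    has_mask: Optional[bool] = None

# Example: a successful match of a masked face, as in Table 1.1 below
result = RecognitionResult(is_match=True, person_name="John McCarthy", has_mask=True)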


The face recognition model is presented in detail below:

Figure 1.1: The face recognition and time-keeping application pipeline architecture

To ensure the quality of the model, the input image has to cover the whole face, either
with or without a mask; this is the ideal setting for face recognition. Face recognition
models have long been researched, and scientists have proposed many improvements
in accuracy, even in unconstrained environments [3], but most of the time the output
contains one crucial part: the assertion of True or False. In other cases, the percentage
of face matching is used for measuring, anticipating and predicting human faces in
the future.


Take the motivating example below, starting with a human face image captured from
the live camera and showing an output for each step:

Table 1.1: The output use cases of the face recognition model

Input                                | Output at step 1                    | Output at step 2 | Output at step 3
Human face image (with mask)         | Mr. John McCarthy: True (with mask) | Access Granted   | Timein: June 10th, 2023 07:49:05AM
Human face image (without mask)      | Mr. Andrew Le: True (without mask)  | Access Granted   | Timein: June 10th, 2023 09:16:23AM
Human face image (with/without mask) | Unable to recognize                 | Access Denied    | Timein: No record found

1.3. Objectives and missions

This thesis aims at a thorough understanding of face recognition models based on
machine learning and deep learning, with the following objectives:
• Understand the fundamental principles of machine learning and deep learning.
• Identify the problems of face recognition (especially with masks) and ways to
resolve them based on recent face recognition articles.
• Analyze all the procedures, assess the feasibility of each solution and draw
conclusions on the pros and cons of the proposed solution.
• Research popular and appropriate human face datasets (with and without
masks) and collect them in advance for later use.
• Put the face recognition model to a real test, understand it and suggest
enhancements for better accuracy and performance.
• Gain experience with the algorithmic logic, deep learning, face measurement
and recognition strategies used in this thesis, and open up opportunities for
future research and, potentially, deployment of the product for mass usage.

Tasks involved in the thesis:
o Collect selected papers on face recognition from recent years.
o Research past and current obstacles or unsolved cases in face recognition
with masks.
o Conduct experiments with several face recognition methods, especially for
masked faces, and propose the most appropriate one for both masked and
unmasked recognition, based on the feasibility of the timeline and scope.
o Identify the output and expectations of the model to support the collection of
relevant datasets.
o Establish the model using existing frameworks, libraries and tools, then test it
and validate the results.
o Provide the final conclusion and the further steps to be taken in the long run.
1.4. Scope of the thesis

There is a long line of face recognition research and applications, hence the scope of
this thesis has been delimited as follows:

• Build the face recognition model using a Convolutional Neural Network for
recognizing a human face with and without a mask (stating the name of the user).
• The datasets used have to be popular for evaluation purposes and include a
variety of facial components.
• Machine learning technology: representation using Euclidean distance;
evaluation using Precision, Recall and F1; optimization using Adam or
Stochastic Gradient Descent and the configuration of relevant parameters
(a small distance-based verification sketch follows this list).
• A basic Python web UI application for the user experience, built with Flask
and Streamlit.
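
As a minimal sketch of the "representation using Euclidean distance" item above, two face embeddings can be compared by their L2 distance and declared a match when the distance falls under a threshold. The embedding dimension and the threshold value here are placeholder assumptions, not values used in the thesis.

import numpy as np

def verify(embedding_a, embedding_b, threshold=1.0):
    # Euclidean (L2) distance between two face embeddings;
    # the pair counts as the same person when the distance is below the threshold.
    distance = np.linalg.norm(np.asarray(embedding_a) - np.asarray(embedding_b))
    return distance < threshold, distance

# Toy 128-dimensional embeddings standing in for the model's outputs
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=128), rng.normal(size=128)
is_match, dist = verify(emb_a, emb_b)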


1.5. Thesis contributions

In this thesis, the author proposes a machine learning solution such that:
• It can be applied to face recognition on both masked and unmasked human
face datasets.
• The model acts as a baseline for improving the security of face recognition
with ensemble learning.
• The quality of the model behind a ubiquitous Python web application proves
advanced enough to fulfill the needs of enterprises.
1.6. Thesis structure

The structure of the thesis consists of 5 chapters:

- Chapter 1: Introduction. The first chapter is dedicated to an overview of face
recognition and its current implementation across different sectors of
information technology. An important part of this chapter is to restate the issue
of time-keeping at firms today, which pushes developers to take multiple steps
toward the use of advanced, newly introduced neural networks and deep
learning methods. The plan, orientation, targets and milestones are also
indispensable parts and are briefly established in this chapter.

- Chapter 2: Background knowledge. This is where the author presents all
related knowledge about neural networks, deep learning methods, Python
web development and the database management system involved, in order to
fully support the project.

- Chapter 3: Related works. Other methods and models are also taken into
consideration when implementing face recognition for mass usage. However,
the focus is narrowed down to the work of highly reputed authors and
publishers. Citation rules are strictly applied in this chapter.

- Chapter 4: The proposed model and implementation. The most appropriate
models and methods are proposed, with reasons stated in this chapter, leaving
room for breakthrough findings and improvements. Datasets: the choice of
datasets depends on how the model is established and trained, but a large
number of records and characteristics inside the datasets is required to verify
the accuracy of the model; detailed instructions on how to form and use the
datasets are introduced in this chapter. Implementation: this is the most
important part of the project, so a deep dive into the existing model equips
the author with a mechanism and a thorough understanding of the project's
goal; how the model is built, coded and compiled is captured in this part.
Experimental results and discussion: this section reports the results of the
applied model and compares it with other relevant models; loss functions and
optimization are well defined and stated, opening up opportunities for further
research.

- Chapter 5: Conclusion. The author reviews all phases of the project, noting in
particular the merits and demerits throughout the implementation period.
Other advice or suggestions for near-future improvements of the project are
stated.


CHAPTER 2: BACKGROUND KNOWLEDGE
This thesis is anchored in two main bodies of theory: machine learning, which
inspires the model to be built, and software development, specializing in web
applications. Based on these fundamentals, the background knowledge of the thesis
follows that trail: the definition of face recognition in general, the Convolutional
Neural Network used to extract latent features, the Siamese Neural Network [4] used
to tackle the problems of limited time, resources and variation during face
recognition, and finally the loss function used for mathematical optimization and
decision-making.

2.1. Face recognition

A facial recognition system is a technology capable of matching a human face from
a digital image or a video frame against a database of faces. Such a system is typically
employed to authenticate users through ID verification services, and works by
pinpointing and measuring facial features from a given image [5].

Facial recognition (Figure 2.1) [6] identifies a human face in a two-dimensional
image. First, the face has to be separated from the noisy background. The face is then
cropped to a desired size and converted to grayscale. This step is useful for local
feature support, since it enables accurate localization of landmarks before the final
step of feature extraction, where multiple neural networks and filters are involved in
representing the complete face so it can be compared against the existing data in the
database.
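
As an illustrative outline of these steps (detect the face, separate it from the background, crop, convert to grayscale and resize), the sketch below uses OpenCV's stock Haar-cascade detector; the 105x105 target size and the detector choice are assumptions made for the sketch, not the exact preprocessing used in this thesis.

import cv2

def preprocess_face(image_path, size=(105, 105)):
    # Detect the face, crop it out of the frame, convert to grayscale
    # and resize to the input size expected by the recognition model.
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face found in the frame
    x, y, w, h = faces[0]                # take the first detected face
    return cv2.resize(gray[y:y + h, x:x + w], size)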

Turning to the algorithms that make these steps programmable, there were traditional
mathematical computations that assess the relative positions of the eyes, nose,
cheeks, chin, etc., but they still have difficulty identifying a complete face, since their
matching capability is limited to pre-trained datasets and vulnerable to face variations
as a result of overfitting. This problem has urged data scientists to explore a new type
of machine learning method called deep-learning architectures (described in detail in
the next section).

Figure 2.1: The flowchart of Face Recognition

2.2. Convolutional Neural Network (CNN)

In deep learning, the Convolutional Neural Network (CNN, or ConvNet) is a class of
Artificial Neural Network (ANN) most commonly applied to analyzing visual images
[7]. CNNs are well known for the shared-weight architecture of their convolution
kernels, or filters, which slide along input features and provide translation-equivariant
responses known as feature maps. Counter-intuitively, most CNNs are not invariant
to translation, due to the downsampling operations they apply to the input. CNNs
have many applications in image and video recognition, including recommender
systems, image classification, image segmentation, medical image analysis, natural
language processing, brain–computer interfaces, and financial time series.

Thanks to its ability to extract latent features, the CNN has been widely used for
image classification problems. Consider the example below of how a human being
uses the neurons in the brain to think, remember and process an image, and how this
compares with an artificial neural network.

Figure 2.2: Human brain processes the image and recognizes

Figure 2.2 illustrates how a machine can reproduce the process, native to the human
brain, of identifying the correct face among others. The input parameters consist of
multiple components of the face such as the eyes, nose, mouth, etc. However, these
do not play the most important role in deciding the identity of the face; hidden
attributes, ranging from the sinuses to the jaw or hairstyle, are the ones in charge.
This takes machines a step further, since human beings may over time suffer from
memory loss or keep their focus only on the main characteristics of a face. Therefore,
the input parameters stated above, combined with hidden attributes extracted by the
machine learning model, make a strong step towards accuracy, authority and security,
supporting the effectiveness of face recognition in this thesis.

Figure 2.3: The process of extracting hidden attributes from the face

Figure 2.3 shows the basic learning process of an artificial neural network. From the
input values, the neural network performs processing operations to extract the latent
features of the data. These hidden attributes can become the input parameters of
subsequent layers until the final result appears. There are also shared weights
(described in detail in section 2.2.1) in the Convolutional Neural Network, which
reduce the training time of the model instead of involving a huge number of
parameters, bringing a huge advantage over fully-connected layers.

In a standard artificial neural network, a hidden layer is formed by neural nodes
arranged in series that, through information processing, create the neural nodes of
the next layer. This comes with the aid of fully-connected layers, where each node of
the current layer connects to each node of the next layer. The problem occurs when
the number of input parameters grows rapidly, which puts a burden on complexity
and performance. For instance, feeding an image of size 64x64x3 into the network
requires all pixels to be converted into nodes, meaning 64 x 64 x 3 = 12,288 input
nodes. Multiplying this number of nodes by the size of the first hidden layer (take
1,000 neurons as an example) pushes the number of weights well past 12 million
(a quick check of these figures is sketched below). That is a huge number, considering
the image in question is captured at a low resolution compared to today's standard,
and this network has only a single hidden layer. With that being said, specific
mathematical operations are needed to mitigate this burden; outstanding among them
are convolution and pooling.
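
The figures above can be checked in a few lines of Python; the 1,000-neuron first hidden layer is the same illustrative assumption as in the text.

# Fully-connected layer on a 64x64x3 input image, as in the example above
input_nodes = 64 * 64 * 3             # 12,288 input nodes, one per pixel value
hidden_neurons = 1000                 # illustrative size of the first hidden layer
weights = input_nodes * hidden_neurons
print(input_nodes, weights)           # 12288 and 12288000 (over 12 million weights)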

2.2.1. Convolution
The Convolutional Neural Network (illustrated in Figure 2.5) starts from the idea of
performing the convolution calculation depicted in the figure below:

Figure 2.4: The sample calculation of convolution
Take the input image as a 7x7 matrix consisting of values 0 or 1, and a 3x3 filter
matrix. The convolution is computed as follows:



S_{ij} = (I * K)_{ij} = \sum_{m} \sum_{n} I(m, n) \, K(i - m, j - n)        (2.1)

where i and j index the position of an element of the result, and m and n run over the
rows and columns of the input matrix.
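
Equation (2.1) can be realized in a few lines of NumPy. The sketch below slides the kernel over a toy 7x7 input, mirroring Figure 2.4, in the cross-correlation form that CNN frameworks compute in practice (flipping the kernel recovers the textbook convolution of Eq. (2.1)); the matrix values are placeholders, not those of the figure.

import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image and sum the element-wise products
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 7x7 binary image and 3x3 filter, as in Figure 2.4
image = np.random.randint(0, 2, (7, 7)).astype(float)
kernel = np.array([[1., 0., 1.], [0., 1., 0.], [1., 0., 1.]])
print(conv2d_valid(image, kernel).shape)   # (5, 5) feature map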

Figure 2.5: The sample Convolutional Neural Network for image classification
One of the most recognizable features of the Convolutional Neural Network is the
use of shared weights. Shared weights means that the same weights are reused for
each kernel, so neurons in the first hidden layer can discover the similarities between
different regions, in other words the latent features of the input (as shown in Figure
2.6). This reduces the number of parameters while encouraging the discovery of the
main features of the image. For instance, for the 7x7 image matrix of Figure 2.4 with
2 kernels of size 3x3, each feature map needs 3 x 3 = 9 weights and produces 25
neurons in the second layer. The 2 feature maps bring the total to 2 x 9 = 18
parameters, significantly fewer than for a fully-connected layer (assuming 10 neurons
in the first layer, the number of weights would be 7 x 7 x 10 = 490 >> 18; a
parameter-count check is sketched below). This comparison indicates that the
convolutional layer needs just a fraction of the parameters yet retrieves as many latent
features as the fully-connected layer.
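
These hand counts can be verified with a standard deep learning library; the sketch below uses PyTorch layer definitions purely as a sanity check, with biases disabled to match the counts above.

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, bias=False)
fc = nn.Linear(in_features=7 * 7, out_features=10, bias=False)

conv_params = sum(p.numel() for p in conv.parameters())   # 2 kernels x 3 x 3 = 18
fc_params = sum(p.numel() for p in fc.parameters())       # 49 inputs x 10 neurons = 490
print(conv_params, fc_params)                              # 18 << 490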


