
Hanoi University of Science and Technology
School of Information and Communication Technology

Master Thesis in Data Science
Unified Deep Neural Networks for Anatomical Site
Classification and Lesion Segmentation for Upper
Gastrointestinal Endoscopy
NGUYEN DUY MANH


Supervisor: Dr. Tran Vinh Duc

Hanoi, October 2022


Author’s Declaration
I hereby declare that I am the sole author of this thesis. The results presented
in this work are not copied from any other works.

STUDENT

Nguyen Duy Manh


Contents

Abstract
List of Figures
List of Tables
List of Acronyms

1 Introduction
   1.1 General introduction
   1.2 Objectives
   1.3 Main contributions
   1.4 Outline of the thesis

2 Artificial Intelligence and Machine Learning
   2.1 Basic concepts
   2.2 Types of learning
      2.2.1 Supervised learning
      2.2.2 Unsupervised learning
      2.2.3 Reinforcement learning
   2.3 Techniques
      2.3.1 Deep Learning
         2.3.1.1 Deep Learning and Neural Networks
         2.3.1.2 Perceptron
         2.3.1.3 Feed forward
         2.3.1.4 Recurrent Neural Network
         2.3.1.5 Deep Convolutional Network
         2.3.1.6 Training a Neural Network
      2.3.2 Convolutional Neural Network
         2.3.2.1 Image kernel
         2.3.2.2 The convolution operation
         2.3.2.3 Motivation
         2.3.2.4 Activation function
         2.3.2.5 Pooling
      2.3.3 Fully convolutional network
      2.3.4 Some common convolutional network architectures
         2.3.4.1 VGG
         2.3.4.2 ResNet
         2.3.4.3 DenseNet
         2.3.4.4 UNet
      2.3.5 Vision Transformer
         2.3.5.1 The Transformer
         2.3.5.2 Transformers for Vision
      2.3.6 Multi-task learning
      2.3.7 Transfer learning
      2.3.8 Avoid overfitting

3 Methodology
   3.1 EndoUNet
      3.1.1 Overall architecture
      3.1.2 Encoder
      3.1.3 Segmentation decoder
      3.1.4 Classifiers
   3.2 SFMNet
      3.2.1 Overall architecture
      3.2.2 Encoder
      3.2.3 Compact generalized non-local module
      3.2.4 Squeeze and excitation module
      3.2.5 Feature-aligned pyramid network
      3.2.6 Classifiers
   3.3 Metrics and loss functions
   3.4 Multi-task training

4 Experiments
   4.1 Datasets
   4.2 Data preprocessing and data augmentation
   4.3 Implementation details
   4.4 Experimental results

5 Conclusion and future work

References


Abstract
Image processing is a subfield of computer vision concerned with comprehending
and extracting data from digital images. It has applications in various fields,
including face recognition, optical character recognition, automated
manufacturing inspection, medical diagnostics, and tasks connected to
autonomous vehicles, such as pedestrian detection. In recent years, deep neural
networks have become one of the most popular image processing approaches thanks
to a number of significant advancements.

The use of machine learning in biomedical applications can be structured into
three main orientations: (1) as a computer-aided diagnosis to help physicians
make efficient and early diagnoses, with better harmonization and fewer
contradictory diagnoses; (2) to enhance the medical care of patients with
better-personalized therapies; and (3) to improve human well-being, for example
by analyzing the spread of disease and social behaviors in relation to
environmental factors [1]. In this work, I propose models for the first
orientation that are capable of handling multiple simultaneous tasks pertaining
to the upper gastrointestinal (GI) tract. The models were evaluated on a
dataset of 11,469 endoscopic images and produced promising results.


List of Figures

2.1 Reinforcement learning components
2.2 Relationship between AI, ML, and DL
2.3 Neural Network
2.4 Illustration of a deep learning model [2]
2.5 Perceptron
2.6 Architecture of a CNN [3]
2.7 Example of convolution operation [4]
2.8 Sparse connectivity, viewed from below [2]
2.9 Sparse connectivity, viewed from above [2]
2.10 Common activation functions [5]
2.11 Max pooling
2.12 Average pooling
2.13 Architecture of an FCN [6]
2.14 Architecture of VGG16 [7]
2.15 A residual block [8]
2.16 DenseNet architecture vs ResNet architecture [9]
2.17 UNet architecture [10]
2.18 Attention in Neural Machine Translation
2.19 The Transformer - model architecture [11]
2.20 Vision Transformer architecture [12]
2.21 Common form of multi-task learning [2]
2.22 The traditional supervised learning setup
2.23 Transfer learning
3.1 Architecture of EndoUNet
3.2 VGG19-based shared block
3.3 ResNet50-based shared block
3.4 DenseNet121-based shared block
3.5 EndoUNet decoder configuration
3.6 SFMNet architecture
3.7 Grouped compact generalized non-local (CGNL) module [13]
3.8 A Squeeze-and-Excitation block [14]
3.9 Overview comparison between FPN and FaPN [15]
3.10 Feature alignment module [15]
3.11 Feature selection module [15]
4.1 Demonstration of the upper GI tract
4.2 Some samples in the anatomical dataset
4.3 Some samples in the lesion dataset
4.4 Some samples in the HP dataset
4.5 Image augmentation
4.6 Learning rate in the training phase
4.7 EndoUNet - Confusion matrix on the anatomical site classification task on a fold
4.8 SFMNet - Confusion matrix on the anatomical site classification task on a fold
4.9 Confusion matrices on the lesion classification task on a fold
4.10 Some examples of the lesion segmentation task


List of Tables

3.1 Detailed settings of MiT-B2 and MiT-B3
4.1 Number of images in each anatomical site and lighting mode
4.2 Accuracy comparison on the three classification tasks
4.3 Dice score comparison on the segmentation task
4.4 Number of parameters and speed of models


List of Acronyms

GI    Gastrointestinal
HP    Helicobacter pylori
AI    Artificial Intelligence
ML    Machine Learning
DL    Deep Learning
NN    Neural Network
DNN   Deep Neural Network
CNN   Convolutional Neural Network
RNN   Recurrent Neural Network
MTL   Multi-task Learning
RL    Reinforcement Learning


Chapter 1
Introduction
1.1 General introduction


The upper GI tract comprises the oral cavity, the esophagus, the stomach, and
the duodenum [16]. Common diseases of the upper GI tract include inflammation
(esophagitis, gastritis), peptic ulcers, and malignancies. Upper GI tract
cancers, including esophageal and gastric cancer, rank among the top
malignancies in both prevalence and mortality [17]. However, the high miss rate
of these lesions during endoscopy remains a major issue worldwide, and
especially in developing countries [18]. Improving the detection rate would
therefore increase the chances of patients receiving timely medical treatment
and prolong survival time [19].

Esophagogastroduodenoscopy (EGD) is a diagnostic procedure that visualizes the
upper part of the GI tract down to the duodenum. It is an exploration method
that accurately detects lesions of the GI tract that are difficult to identify
with other tools (biomarkers or imaging). However, a substantial lesion miss
rate, defined as a negative finding on endoscopy in patients diagnosed with
lesions within the following three years, has been reported in the literature.
In a study published by Menon et al. [19], this rate was 11.3% and similar for
both esophageal and gastric cancers. In another paper, Shimodate et al. from a
Japanese institution [20] concluded that the miss rate of gastric superficial
neoplasms (GSN) was 75.2%. There are many reasons for this situation, such as
the heterogeneous quality of endoscopy systems, endoscopists' differing levels
of experience in technical performance and lesion evaluation, and lack of
patient tolerance of the procedure. Therefore, computer-aided diagnosis is
desirable to help improve the reliability of this procedure.
Deep learning (DL) has achieved remarkable success in solving various computer
vision tasks. In recent years, several DL-based methods have been proposed to
deal with EGD-related tasks, such as informative frame screening, anatomical
site classification, gastric lesion detection, and diagnosis. However, previous
works often solve these tasks separately. A computer-aided system capable of
simultaneously solving all the tasks using separate task-specific DL models
would therefore require large amounts of memory and computational resources,
which makes it difficult to deploy such systems on low-cost devices. On the
other hand, collecting and annotating patients' data for medical imaging
analysis is challenging in practice, and the lack of data can significantly
reduce the models' performance.

To address these issues, this thesis proposes two models to simultaneously
solve four EGD-related tasks: anatomical site classification, lesion
classification, HP classification, and lesion segmentation. Each model includes
a shared encoder for learning a common feature representation, followed by four
output branches that solve the four tasks. As a result, the models benefit
greatly from multi-task training, since they can learn a powerful joint
representation from an extensive dataset that combines multiple data sources,
each collected for a single task. Experiments show that the proposed models
yield promising results in all tasks and achieve competitive performance
compared to single-task models.

1.2 Objectives

This work aims to build unified models that tackle multiple tasks related to
the upper gastrointestinal tract: anatomical site classification, lesion
classification, HP classification, and lesion segmentation.

1.3 Main contributions

The main contributions of this study are as follows:
• Introduce two unified deep learning-based models to simultaneously solve
four tasks related to the upper GI tract: a CNN-based baseline model and
a Transformer-based model.
• Evaluate the proposed methods on a Vietnamese endoscopy dataset.

1.4 Outline of the thesis

The rest of this thesis is organized as follows:
Chapter 2 presents an overview of the concepts of Artificial Intelligence,
Machine Learning, Deep Learning, and related techniques.
Chapter 3 proposes two models to simultaneously solve tasks related to the
upper gastrointestinal tract.
Chapter 4 presents the content of the experiments and the results obtained.
Chapter 5 concludes the thesis.


Chapter 2
Artificial Intelligence and Machine Learning

2.1 Basic concepts

Over the last few decades, several definitions of Artificial Intelligence (AI)
have surfaced. John McCarthy [21] defined AI as the science and engineering of
creating intelligent machines, particularly intelligent computer programs. AI
is related to the similar challenge of using computers to comprehend human
intelligence, but it is not limited to biologically observable approaches.

The ability to simulate human intelligence distinguishes AI from logic programming in computer languages. In particular, AI enables computers to gain human
intelligence, such as thinking and reasoning to solve problems, communicating
via understanding language and speech, and learning and adapting.
Artificial Intelligence is, in its simplest form, a field that combines
computer science and robust datasets to enable problem-solving. In addition, it
includes the subfields of machine learning and deep learning, which are
commonly associated with artificial intelligence.

AI can be categorized in different ways. This thesis divides AI into two
categories based on its strength: weak AI and strong AI.
Weak AI, also known as Narrow AI, is a sort of AI that has been trained to do a
particular task. Weak AI imitates human perception and aids humanity by automating
time-consuming tasks and data analysis in ways that humans cannot always perform.
This sort of artificial intelligence is more accurately described as “Narrow”, as it lacks
general intelligence and instead possesses intelligence tailored to a certain field or
task. For instance, an AI that is great at navigation is typically incapable of playing
chess, and vice versa. Weak AI helps transform massive amounts of data into useful
information by identifying patterns and generating predictions. Most of the AIs that we
see today are weak AIs, with typical examples such as virtual assistants
(Apple's Siri or Amazon's Alexa), Facebook's news feed, spam filtering in email
management applications (Gmail, Outlook), and autonomous vehicles (Tesla,
VinGroup).
Despite its strengths, weak AI has the potential to cause damage in the event
of a system failure. For instance, spam filtering systems may misidentify
essential emails and place them in the spam folder, or self-driving car systems
may cause traffic accidents owing to miscalculations.
Strong AI consists of both Artificial General Intelligence (AGI) and Artificial
Super Intelligence (ASI). Artificial General Intelligence is a speculative kind
of AI in which a machine has intelligence equivalent to that of a person and is
capable of self-awareness, problem-solving, learning, and future planning.
Similarly, Artificial Super Intelligence (also known as superintelligence) is a
theoretical kind of AI in which a machine has intelligence and capacities
superior to those of the human brain.

Strong AI is currently only a concept, with no examples in practice.
Nonetheless, academics continue to conduct research and search for development
avenues for this form of AI.
Machine Learning (ML) is a branch of AI and computer science that focuses on
the use of data and algorithms to imitate the way that humans learn, gradually
improving its accuracy [22].

2.2 Types of learning

Given that the focus of the field of Machine Learning is "learning", three
broad categories of learning are employed to acquire knowledge: Supervised
Learning, Unsupervised Learning, and Reinforcement Learning.

2.2.1 Supervised learning

In supervised learning, the computer is given labeled examples so that for each
input example, there is a matching output value. This strategy is intended to assist
model learning by comparing the output value created by the model with the real
output value to identify errors and then progressively modifying the model to reduce
errors. Supervised learning employs learned patterns to predict output values for
never-before-seen data (not present in the training data). For classification
and regression problems, supervised learning has proved to be accurate and fast.
• Classification is the process of discovering a function that divides a
dataset into classes according to certain parameters. A computer program is
trained on the training dataset and, based on this training, classifies the
data into various classes. Classification has many use cases, such as spam
filtering, customer behavior prediction, and document classification.
• Regression is a method for identifying relationships between dependent and
independent variables. It aids in forecasting continuous variables, such as
market trends and home prices.
Supervised learning works by modeling the relationships and dependencies
between the target prediction output and the input features, so that output
values can be predicted for new data based on the associations learned from the
training datasets. A minimal sketch of the classification setting is given below.
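
The following sketch is illustrative only, with made-up data: a nearest-centroid
classifier learns one prototype per class from labeled examples and then assigns
a never-before-seen point to the class of the closest prototype.

    import numpy as np

    def fit(X, y):
        # Learn one centroid (mean feature vector) per class label.
        return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

    def predict(centroids, x):
        # Assign x to the class whose centroid is nearest.
        return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

    # Made-up labeled examples: inputs X and matching output labels y.
    X = np.array([[1.0, 1.2], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9]])
    y = np.array([0, 0, 1, 1])

    model = fit(X, y)
    print(predict(model, np.array([2.8, 3.0])))  # -> 1, a never-before-seen input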

2.2.2 Unsupervised learning

In contrast, the input data are not labeled in unsupervised learning.
Unsupervised learning describes a class of problems that involves using a model
to describe or extract relationships in data. Machines can learn to recognize
complex processes and patterns without human supervision. This method is
especially beneficial when specialists do not know what to search for in the
data, and the data itself does not offer targets. In practice, the amount of
unlabeled data is significantly greater than the amount of labeled data; hence,
unsupervised learning algorithms play a crucial role in machine learning.
Among the many use cases of unsupervised learning, two main problems are often
encountered: clustering, which involves finding groups in data, and density
estimation, which involves summarizing the data distribution.
• Clustering: k-means clustering is a well-known technique of this type, where
k is the number of clusters to discover in the data. It shares the same goal as
classification; however, no labels are provided, so the system must make sense
of the data and cluster it on its own (a minimal sketch follows this list).
• Density estimation: an example of a density estimation algorithm is Kernel
Density Estimation, which uses small groups of closely related data samples to
estimate the distribution at new points in the problem space.
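
As referenced in the clustering bullet above, here is a minimal sketch of
k-means with k = 2; the 2-D points are made up for illustration.

    import numpy as np

    def kmeans(X, k, steps=10, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
        for _ in range(steps):
            # Assignment step: attach each point to its nearest center.
            labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
            # Update step: move each center to the mean of its assigned points.
            centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        return centers, labels

    # Two loose groups of unlabeled points.
    X = np.array([[0.9, 1.1], [1.0, 0.8], [4.0, 4.2], [4.1, 3.9]])
    centers, labels = kmeans(X, k=2)
    print(labels)  # two groups discovered without any labels
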
Due to its complexity and implementation difficulty, this sort of machine
learning is not as popular as supervised learning, although it enables the
solution of problems that humans would ordinarily avoid.

2.2.3 Reinforcement learning

Reinforcement learning (RL) describes a class of problems where an agent
operates in an environment and must learn to act using feedback. According to
[23], reinforcement learning is learning what to do, that is, how to map
situations to actions, so as to maximize a numerical reward signal. The learner
is not told which actions to take but instead must discover which actions yield
the most reward by trying them.
Reinforcement learning has five essential components: the agent, the
environment, states, actions, and rewards. The RL algorithm (called the agent)
improves periodically by exploring the environment and passing through the
different possible states. To maximize performance, the agent automatically
determines the ideal behavior; feedback (the reward) is what allows the agent
to improve its behavior.

Figure 2.1: Reinforcement learning components
The idea can be translated into the following steps of an RL agent:
1. The agent observes an input state
2. An action is determined by a decision-making function (policy)
3. The action is performed
4. The agent receives a scalar reward or reinforcement from the environment
5. Information about the reward given for that state/action pair is recorded
In RL, there are two types of tasks: episodic and continuous.
• Episodic task: a task that has a terminal state. This creates an episode: a
list of states, actions, rewards, and new states. Video games are a typical
example of this type of task.
• Continuous task: in contrast to an episodic task, a continuous task has no
terminal state and will never end. In this case, the agent has to learn how to
choose the best actions while simultaneously interacting with the environment.
For example, a personal assistance robot does not have a terminal state.
Two of the most widely used algorithm families in RL are Monte Carlo and
Temporal Difference (TD) learning. The Monte Carlo method involves learning
from experience: it learns through sequences of states, actions, and rewards.
Suppose our agent is in state s1, takes action a1, gets a reward of r1, and is
moved to state s2; this whole sequence is an experience. TD learning is a
method for predicting the expected value of a variable across a sequence of
states. TD employs a mathematical trick to substitute complicated reasoning
about the future with a simple learning procedure that yields the same
outcomes. Instead of computing the whole future reward, TD attempts to forecast
the combination of the immediate reward and its own prediction of the future
reward at the next moment. A minimal sketch of this update is given below.

2.3 Techniques

2.3.1 Deep Learning

Since AI has been around for a long time, it has a vast array of applications
and is divided into numerous subfields. Deep Learning (DL) is a subset of ML,
which is itself a branch of AI. Figure 2.2 is a visual representation of the
relationship between AI, ML, and DL.

Figure 2.2: Relationship between AI, ML, and DL

2.3.1.1 Deep Learning and Neural Networks

In recent years, Machine Learning has achieved considerable success in AI
research, enabling computers to outperform or come close to matching human
performance in various domains, including facial recognition, speech
recognition, and language processing.
Machine Learning is the process of teaching a computer how to accomplish a task
instead of programming it step by step. Upon completion of training, a Machine
Learning system should be able to make precise predictions when presented with
data.
Deep Learning is a subset of Machine Learning that differs in some important
respects from traditional Machine Learning, allowing computers to solve a wide
range of complex problems that were previously unsolvable. As an example of a
simple Machine Learning task, we can predict how ice cream sales will change
based on the outdoor temperature. Making predictions using only a few data
features in this way is relatively simple and can be done using a Machine
Learning technique called linear regression, sketched below.
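
A minimal sketch of this ice cream example, with made-up numbers, fits sales as
a linear function of temperature by ordinary least squares:

    import numpy as np

    temps = np.array([18.0, 22.0, 26.0, 30.0, 34.0])       # outdoor temperature (degrees C)
    sales = np.array([120.0, 150.0, 185.0, 210.0, 240.0])  # ice cream sales per day

    # Ordinary least-squares fit of sales ~ w * temp + b.
    w, b = np.polyfit(temps, sales, deg=1)
    print(w * 28.0 + b)  # predicted sales on a 28-degree day
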
However, numerous problems in the real world do not fit into such simplistic
frameworks. Recognizing handwritten numerals is an illustration of one of these
difficult real-world issues. To tackle this issue, computers must be able to
handle the wide diversity of data presentation formats. Each digit from 0 to 9
can be written in an unlimited number of ways; the size and shape of each
handwritten digit can vary dramatically depending on the writer and the context.
Allowing the computer to learn from previous experiences and comprehend the
data via a system consisting of many layers of concepts is an effective method
for resolving these issues. This strategy enables computers to tackle complex
problems by building them up from smaller ones. If this hierarchy is
represented by a graph, the graph is formed by many layers, which is what makes
the model deep [2]. That is the idea behind neural networks.
A neural network is a model made up of many neurons. Each neuron is an
information-processing unit capable of receiving input, processing it, and giving
appropriate output. Figure 2.3 is the visual representation of a neural network.

Figure 2.3: Neural Network
All neural networks have an input layer, into which data is supplied before
passing through several layers and producing a final prediction at the output
layer. In a neural network, there are numerous hidden layers between the input
and output layers; thus, the term Deep in Deep Learning and Deep Neural
Networks refers to the large number of hidden layers, typically more than
three, at the core of these neural networks.
Neural networks enable computers to learn multi-step programs, in which each
network layer is analogous to the state of the computer's memory after running
another set of instructions in parallel.
Figure 2.4 illustrates the process of a deep learning model recognizing an image of
a person. For a computer, an image is a set of pixels, and mapping a collection of
pixels to an object’s identity is an extremely complex process. Therefore,
attempting to learn or evaluate this mapping directly appears overwhelming.
Instead, deep learning overcomes this challenge by decomposing the intended
complex mapping into a series of layered simple mappings, each of which is
defined by a distinct model layer. Each layer of the network represents
features at a different level, from low-level features (edges, corners,
contours) to higher-level ones.

Figure 2.4: Illustration of a deep learning model [2]

2.3.1.2 Perceptron

In ML, the perceptron is one of the most commonly encountered terms. It is a
building block of a neural network. Invented by Frank Rosenblatt in the mid-20th
century, the perceptron is a linear ML algorithm used for supervised learning
of binary classification.
The perceptron consists of three parts:
• Input nodes (or one input layer): this is the fundamental component of the
perceptron, which accepts the initial data for subsequent processing. Each
input node carries a real-valued input.
• Weights and bias: a weight shows the strength of the particular node, and the
bias is a value that shifts the activation function curve up or down.
• Activation function: this component maps the weighted input to the required
range of values, such as (0, 1) or (-1, 1).

Figure 2.5: Perceptron
The perceptron works in these simple steps:
1. All the inputs x are multiplied by their weights w
2. All the multiplied values are added up to form the weighted sum
3. The weighted sum is applied to the activation function
The perceptron is also one of the most straightforward representations of a
neural network neuron; a minimal sketch follows.
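
The sketch below follows the three steps above, assuming a step activation and
a made-up training set (the logical AND function); the weight update is the
classic perceptron learning rule.

    import numpy as np

    def predict(x, w, b):
        weighted_sum = np.dot(x, w) + b      # steps 1 and 2: weight and sum the inputs
        return 1 if weighted_sum > 0 else 0  # step 3: step activation function

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1])  # target outputs: logical AND
    w, b, lr = np.zeros(2), 0.0, 0.1

    # Perceptron learning rule: adjust weights and bias by the prediction error.
    for epoch in range(20):
        for xi, yi in zip(X, y):
            error = yi - predict(xi, w, b)
            w += lr * error * xi
            b += lr * error

    print([predict(xi, w, b) for xi in X])  # -> [0, 0, 0, 1]
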
2.3.1.3 Feed forward

First appearing in the 1950s, the feedforward neural network was the first and
simplest type of artificial neural network. In this network, information is
processed in only one direction, forward, from the input nodes, through the
hidden nodes, to the output nodes. A minimal sketch of a forward pass is given
below.