Artificial Neural Networks and Deep Learning
Christian Borgelt
Bioinformatics and Information Mining
Dept. of Computer and Information Science
University of Konstanz, Universitätsstraße 10, 78457 Konstanz, Germany


eng.html
Lecture in ZEuS:
Exercises in ZEuS:
Lecture in ILIAS:


Schedule and Exercises

              Day        Time           Room   Start
Lecture       Monday     15:15 – 16:45  A703   16.04.2018
Exercises 1   Tuesday    10:00 – 11:30  M631   24.04.2018
Exercises 2   Thursday   10:00 – 11:30  M701   26.04.2018

The slides for the lecture (beware of updates!) are available here:
eng.html

The lecture is accompanied by weekly exercise sheets, which you should work on
and which will be discussed afterwards in the exercise lesson (conducted by Christoph Doell).
The exercise sheets can be downloaded as PDF files:
eng.html

The first sheet is already available and is to be prepared for the first exercise lesson!



Exam Admission and Exam
Exam admission is obtained via the exercise sheets.
At the beginning of each exercise lesson, a sheet will be passed round
on which you can “vote” for exercises of the current exercise sheet.
Voting for an exercise means declaring yourself willing to present something about it.
A full solution is ideal, but partial solutions or an attempted approach are also acceptable;
it should merely become clear that a genuine attempt was made to solve the exercise.
In order to be admitted to the exam, you have to:
• Vote for at least 50% of the exercises.
• Actually present something at least twice.
First exam (written):
Tuesday, 24.07.2018, 11:00 to 13:00 hours, in Room R511.
Second exam (written):
to be determined.


Textbooks

This lecture follows fairly closely the first parts of these books, which treat artificial neural networks.

Textbook, 2nd ed., Springer-Verlag, Heidelberg, DE 2015 (in German)
Textbook, 2nd ed., Springer-Verlag, Heidelberg, DE 2016 (in English)


Contents
• Introduction: Motivation, Biological Background

• Threshold Logic Units: Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training

• General Neural Networks: Structure, Operation, Training

• Multi-layer Perceptrons: Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis

• Deep Learning: Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis

• Radial Basis Function Networks: Definition, Function Approximation, Initialization, Training, Generalized Version

• Self-Organizing Maps: Definition, Learning Vector Quantization, Neighborhood of Output Neurons

• Hopfield Networks and Boltzmann Machines: Definition, Convergence, Associative Memory, Solving Optimization Problems, Probabilistic Models

• Recurrent Neural Networks: Differential Equations, Vector Networks, Backpropagation through Time




Motivation: Why (Artificial) Neural Networks?

• (Neuro-)Biology / (Neuro-)Physiology / Psychology:
◦ Exploit similarity to real (biological) neural networks.
◦ Build models to understand nerve and brain operation by simulation.
• Computer Science / Engineering / Economics
◦ Mimic certain cognitive capabilities of human beings.
◦ Solve learning/adaptation, prediction, and optimization problems.
• Physics / Chemistry
◦ Use neural network models to describe physical phenomena.
◦ Special case: spin glasses (alloys of magnetic and non-magnetic metals).



Motivation: Why Neural Networks in AI?
Physical-Symbol System Hypothesis [Newell and Simon 1976]:
A physical-symbol system has the necessary and sufficient means for general intelligent action.
Neural networks process simple signals, not symbols.
So why study neural networks in Artificial Intelligence?
• Symbol-based representations work well for inference tasks, but are fairly bad for perception tasks.
• Symbol-based expert systems tend to get slower with growing knowledge; human experts tend to get faster.
• Neural networks allow for highly parallel information processing.
• There are several successful applications in industry and finance.


Biological Background

Diagram of a typical myelinated vertebrate motoneuron (source: Wikipedia, Ruiz-Villarreal 2007),
showing the main parts involved in its signaling activity, such as the dendrites, the axon, and the synapses.


Biological Background

Structure of a prototypical biological neuron (simplified)
[Figure: labeled parts are the dendrites, nucleus, cell body (soma), axon, myelin sheath, terminal buttons, and synapses.]



Biological Background

(Very) simplified description of neural information processing
• Axon terminal releases chemicals, called neurotransmitters.
• These act on the membrane of the receptor dendrite to change its polarization.
(The inside is usually 70mV more negative than the outside.)
• Decrease in potential difference: excitatory synapse
Increase in potential difference: inhibitory synapse
• If there is enough net excitatory input, the axon is depolarized.
• The resulting action potential travels along the axon.
(Speed depends on the degree to which the axon is covered with myelin.)
• When the action potential reaches the terminal buttons,
it triggers the release of neurotransmitters.



Recording the Electrical Impulses (Spikes)

pictures not available in online version



Signal Filtering and Spike Sorting

picture not available in online version

picture not available in online version

An actual recording of the electrical potential also contains the so-called local field potential (LFP),
which is dominated by the electrical current flowing from all nearby dendritic synaptic activity within
a volume of tissue. The LFP is removed in a preprocessing step (high-pass filtering, ∼300 Hz).

Spikes are detected in the filtered signal with a simple threshold approach. Aligning all detected spikes
allows us to distinguish multiple neurons based on the shape of their spikes. This process is called spike sorting.
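The preprocessing described above can be sketched in a few lines of Python. The following snippet is only an illustration (the function, its parameter choices, and the threshold rule are my own assumptions, not taken from the slides); it uses SciPy's Butterworth filter for the ~300 Hz high-pass step and a simple amplitude threshold for spike detection:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_spikes(signal, fs, cutoff_hz=300.0, thresh_factor=4.0):
    """Remove the LFP with a high-pass filter, then detect spikes by thresholding.

    signal        : 1-D array of extracellular voltage samples (hypothetical input)
    fs            : sampling rate in Hz
    cutoff_hz     : high-pass cutoff, roughly 300 Hz as mentioned above
    thresh_factor : detection threshold as a multiple of a robust noise estimate
    """
    # 4th-order Butterworth high-pass filter, applied forward and backward
    b, a = butter(4, cutoff_hz, btype="highpass", fs=fs)
    filtered = filtfilt(b, a, signal)

    # Robust noise estimate via the median absolute deviation
    noise = np.median(np.abs(filtered)) / 0.6745
    threshold = thresh_factor * noise

    # Indices where |filtered| first rises above the threshold (one index per spike)
    above = np.abs(filtered) > threshold
    spike_idx = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return filtered, spike_idx
```

Aligning short windows of the filtered signal around the returned indices would then give the spike shapes used for sorting.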



(Personal) Computers versus the Human Brain

                  Personal Computer                        Human Brain
processing units  1 CPU, 2–10 cores, 10^10 transistors;    10^11 neurons
                  1–2 graphics cards/GPUs,
                  10^3 cores/shaders, 10^10 transistors
storage capacity  10^10 bytes main memory (RAM),           10^11 neurons,
                  10^12 bytes external memory              10^14 synapses
processing speed  10^-9 seconds,                           > 10^-3 seconds,
                  10^9 operations per second               < 1000 per second
bandwidth         10^12 bits/second                        10^14 bits/second
neural updates    10^6 per second                          10^14 per second



(Personal) Computers versus the Human Brain
• The processing/switching time of a neuron is relatively large (> 10^-3 seconds),
but updates are computed in parallel.
• A serial simulation on a computer takes several hundred clock cycles per update.
Advantages of Neural Networks:
• High processing speed due to massive parallelism.
• Fault Tolerance:
Remain functional even if (larger) parts of a network get damaged.
• “Graceful Degradation”:
gradual degradation of performance if an increasing number of neurons fail.
• Well suited for inductive learning
(learning from examples, generalization from instances).
It appears to be reasonable to try to mimic or to recreate these advantages
by constructing artificial neural networks.


Threshold Logic Units




Threshold Logic Units
A Threshold Logic Unit (TLU) is a processing unit for numbers with n inputs
x1, . . . , xn and one output y. The unit has a threshold θ and each input xi is
associated with a weight wi. A threshold logic unit computes the function





y = 1, if w1x1 + … + wnxn ≥ θ,
y = 0, otherwise.

[Diagram: the inputs x1, …, xn enter the unit with weights w1, …, wn; the unit holds the threshold θ and emits the output y.]
TLUs mimic the thresholding behavior of biological neurons in a (very) simple fashion.
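The computation of a threshold logic unit is simple enough to write down directly. The following minimal sketch (function and parameter names are my own, not from the slides) evaluates y for given weights, threshold, and inputs:

```python
def tlu(weights, theta, inputs):
    """Threshold logic unit: y = 1 if the weighted input sum reaches the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= theta else 0
```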


Threshold Logic Units: Examples
Threshold logic unit for the conjunction x1 ∧ x2.

The unit has the weights w1 = 3, w2 = 2 and the threshold θ = 4:

x1  x2  3x1 + 2x2  y
 0   0      0      0
 1   0      3      0
 0   1      2      0
 1   1      5      1

Threshold logic unit for the implication x2 → x1.

The unit has the weights w1 = 2, w2 = −2 and the threshold θ = −1:

x1  x2  2x1 − 2x2  y
 0   0      0      1
 1   0      2      1
 0   1     −2      0
 1   1      0      1
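The two truth tables above can be reproduced by evaluating the tlu sketch from the previous snippet on all four input combinations (again purely illustrative):

```python
from itertools import product

# Conjunction x1 ∧ x2: weights (3, 2), threshold 4
# Implication x2 → x1: weights (2, -2), threshold -1
for name, w, theta in [("x1 AND x2", (3, 2), 4), ("x2 -> x1", (2, -2), -1)]:
    print(name, {x: tlu(w, theta, x) for x in product((0, 1), repeat=2)})
# x1 AND x2 {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
# x2 -> x1 {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1}
```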



Threshold Logic Units: Examples
Threshold logic unit for (x1 ∧ ¬x2) ∨ (x1 ∧ x3) ∨ (¬x2 ∧ x3).

The unit has the weights w1 = 2, w2 = −2, w3 = 2 and the threshold θ = 1:

x1  x2  x3  Σi wixi  y
 0   0   0      0    0
 1   0   0      2    1
 0   1   0     −2    0
 1   1   0      0    0
 0   0   1      2    1
 1   0   1      4    1
 0   1   1      0    0
 1   1   1      2    1

Rough Intuition:
• Positive weights are analogous to excitatory synapses.

• Negative weights are analogous to inhibitory synapses.
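Continuing the snippets above, the three-input unit can be checked the same way; the run also illustrates the inhibitory role of the negative weight, since setting x2 = 1 always lowers the weighted sum and can only switch the output from 1 to 0:

```python
# Three-input unit: weights (2, -2, 2), threshold 1
for x in product((0, 1), repeat=3):
    print(x, tlu((2, -2, 2), 1, x))
```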


Threshold Logic Units: Geometric Interpretation
Review of line representations
Straight lines are usually represented in one of the following forms:
Explicit Form:         g ≡ x2 = bx1 + c
Implicit Form:         g ≡ a1x1 + a2x2 + d = 0
Point-Direction Form:  g ≡ x = p + kr
Normal Form:           g ≡ (x − p)⊤n = 0

with the parameters:
b:  Gradient of the line
c:  Section of the x2 axis (intercept)
p:  Vector of a point of the line (base vector)
r:  Direction vector of the line
n:  Normal vector of the line


Threshold Logic Units: Geometric Interpretation

A straight line and its defining parameters:

[Figure: a line g in the (x1, x2) plane with slope b = r2/r1, intercept c, direction vector r, normal vector n = (a1, a2), a point p on the line, the value d = −p⊤n, and q = (p⊤n / |n|²) n, the point of g closest to the origin O.]



Threshold Logic Units: Geometric Interpretation

How to determine the side on which a point y lies:

[Figure: the point y is projected onto the direction of the normal vector n = (a1, a2), giving z = (y⊤n / |n|²) n; comparing z with q = (p⊤n / |n|²) n shows on which side of the line g the point y lies: y⊤n ≥ p⊤n on the side the normal vector points to, y⊤n < p⊤n on the other side.]
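In code, this test boils down to the sign of (y − p)⊤n, where p is any point of the line. The helper below is my own illustration (not from the slides); it is applied to the line 3x1 + 2x2 = 4 that realizes the conjunction from the earlier example:

```python
import numpy as np

def side_of_line(y, p, n):
    """Return +1, -1, or 0 for a point y relative to the line through p with normal n.

    +1: y lies on the side the normal vector points to,
    -1: y lies on the opposite side,
     0: y lies on the line.
    """
    y, p, n = (np.asarray(v, dtype=float) for v in (y, p, n))
    return int(np.sign(np.dot(y - p, n)))

n = (3, 2)   # normal vector = the TLU weights
p = (0, 2)   # a point on the line, since 3*0 + 2*2 = 4
print(side_of_line((1, 1), p, n))   # +1: (1, 1) lies on the positive side (y = 1)
print(side_of_line((0, 0), p, n))   # -1: (0, 0) lies on the negative side (y = 0)
```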




Threshold Logic Units: Geometric Interpretation
Threshold logic unit for x1 ∧ x2.

[Figure: the unit with weights 3 and 2 and threshold 4, and the corresponding line 3x1 + 2x2 = 4 in the unit square; it separates the input (1, 1), which yields y = 1, from the other three corners, which yield y = 0.]


Threshold logic unit for x2 → x1.

[Figure: the unit with weights 2 and −2 and threshold −1, and the corresponding line 2x1 − 2x2 = −1 in the unit square; it separates the input (0, 1), which yields y = 0, from the other three corners, which yield y = 1.]



Threshold Logic Units: Geometric Interpretation

Visualization of 3-dimensional Boolean functions:
[Figure: the unit cube spanned by the inputs x1, x2, x3, with corners from (0, 0, 0) to (1, 1, 1).]

Threshold logic unit for (x1 ∧ ¬x2) ∨ (x1 ∧ x3) ∨ (¬x2 ∧ x3).

[Figure: the unit with weights 2, −2, 2 and threshold 1; the plane 2x1 − 2x2 + 2x3 = 1 separates the corners of the unit cube with output 1 from those with output 0.]


Threshold Logic Units: Limitations
The biimplication problem x1 ↔ x2: There is no separating line.

x1  x2  y
 0   0  1
 1   0  0
 0   1  0
 1   1  1

[Figure: in the unit square, the points (0, 0) and (1, 1) with y = 1 and the points (1, 0) and (0, 1) with y = 0 cannot be separated by a straight line.]

Formal proof by reductio ad absurdum:

since (0, 0) → 1:    0 ≥ θ,           (1)
since (1, 0) → 0:    w1 < θ,          (2)
since (0, 1) → 0:    w2 < θ,          (3)
since (1, 1) → 1:    w1 + w2 ≥ θ.     (4)

(2) and (3): w1 + w2 < 2θ. With (4): 2θ > θ, or θ > 0. Contradiction to (1).


Linear Separability
Definition: Two sets of points in a Euclidean space are called linearly separable,
iff there exists at least one point, line, plane or hyperplane (depending on the dimension
of the Euclidean space), such that all points of the one set lie on one side and all points
of the other set lie on the other side of this point, line, plane or hyperplane (or on it).
That is, the point sets can be separated by a linear decision function. Formally:
Two sets X, Y ⊂ IR^m are linearly separable iff w ∈ IR^m and θ ∈ IR exist such that

∀x ∈ X : w⊤x < θ     and     ∀y ∈ Y : w⊤y ≥ θ.

• Boolean functions define two point sets, namely the set of points that are
mapped to the function value 0 and the set of points that are mapped to 1.
⇒ The term “linearly separable” can be transferred to Boolean functions.
• As we have seen, conjunction and implication are linearly separable
(as are disjunction, NAND, NOR etc.).
• The biimplication is not linearly separable
(and neither is the exclusive or (XOR)); see the brute-force check sketched below.
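For two-input Boolean functions, linear separability can also be verified mechanically: a brute-force search over a small range of integer weights and thresholds (a range like [−2, 2] already suffices for n = 2) either finds a realizing threshold logic unit or shows that none exists. The snippet below is an illustration of the definition, not part of the slides:

```python
from itertools import product

def linearly_separable_2d(f, values=range(-2, 3)):
    """Search for integer weights w1, w2 and a threshold theta such that a
    threshold logic unit with these parameters computes the Boolean function f."""
    points = list(product((0, 1), repeat=2))
    for w1, w2, theta in product(values, repeat=3):
        if all((w1 * x1 + w2 * x2 >= theta) == bool(f(x1, x2)) for x1, x2 in points):
            return (w1, w2, theta)
    return None  # nothing found in the search range (sufficient for n = 2)

print(linearly_separable_2d(lambda x1, x2: x1 and x2))   # (1, 1, 2): conjunction is separable
print(linearly_separable_2d(lambda x1, x2: x1 == x2))    # None: biimplication is not
```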

