
RuleExtractionfromArtificial
NeuralNetworks
RudySetiono
SchoolofComputing
NationalUniversityofSingapore
1
Outline
1. The train problem
2. Motivations
3. Feedforward neural networks for classification
4. Rule extraction from neural networks
5. Examples
6. Different types of classification rules
7. Regression rules
8. Hierarchical rules: The Re-RX algorithm
9. Conclusions
1.Thetrainproblem
Westbound trains Eastbound trains
scribe which trains are east/westbound?
Attributes of a train:
 lon
g
cars can onl
y

be rectan
g
ular
,
and if closed then their roofs are
gyg,
either jagged or flat
 if a short car is rectangular then it is also double side
d
 a short closed rectan
g
ular car can have either a flat or
p
eaked roof
3
gp
Thetrainproblem
Westbound trains Eastbound trains
Att ib t f t i
Att
r
ib
u
t
es o
f
a
t
ra
i

n:
 a long car can have either two or three axels
 a car can be either open or closed
 a train has 2,3 or 4 cars, each can be either short or long

4

…….
Thetrainproblem
Westbound trains
Eastbound trains
Answers
:
Answers
:
 if a train has short closed car, then it is westbound, otherwise eastbound
 if a train has two cars, or has a car with a jagged roof, then it is eastbound,
otherwise westbound.
 and many others
5
All the above rules can be obtained by neural networks!
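The first answer rule above is a simple predicate on a train's cars, and can be written out directly. The encoding below is a sketch: trains as lists of cars, each car a dict whose attribute names ("length", "closed") are chosen here for illustration, not taken from the original train data format.

```python
def is_westbound(train):
    # "if a train has a short closed car, then it is westbound, otherwise eastbound"
    return any(car["length"] == "short" and car["closed"] for car in train)

# Two toy trains under this hypothetical encoding:
west = [{"length": "short", "closed": True}, {"length": "long", "closed": False}]
east = [{"length": "long", "closed": True}, {"length": "short", "closed": False}]

print(is_westbound(west))  # True  -> westbound
print(is_westbound(east))  # False -> eastbound
```

Rule extraction aims to recover exactly this kind of human-readable predicate from a trained network.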
2. Motivations
• Neural networks have been applied to solve many application problems involving
  ‐ pattern classification
  ‐ function approximation / data fitting
  ‐ data clustering
• They often give better predictive accuracy than other methods such as regression or decision trees.
• Data mining using neural networks: if we can extract rules from a trained network, a better understanding about the data and the problem can be gained.
• How to extract such rules?
3. Feedforward neural networks for pattern classification
• Data is fed into the network input units.
• Pattern classification is determined by the output unit with the largest output value.
• Units in the hidden layer allow the network to separate any number of disjoint sets.
Networkhiddenunits
Foreachunit:
I
N

is usually
fi d I
1
fi
xe
d
,
I
N
=
1
• Sumoftheweightedinputs iscomputed:
net=I
t
w
• Anonlinearfunctionisappliedtoobtainedtheunit’sactivation
l
va
l
ue:
o=f(net)

This acti ation fnctionis sall the logistic sigmoid fnction
8

This

acti
v
ation


f
u
nction

is
u
s
u
all
y
the

logistic

sigmoid

f
u
nction

(unipolar)orthehyperbolictangentfunction(bipolar).
Hyperbolictangentfunction
• Thefunctionisusedtoapproximatetheon‐offfunction.
• Thesumoftheweight edinputslarge outputiscloseto1(on).
• Thesumoftheweight edinputssmall outputiscloseto‐1(off).
• Differentiable:
f'(net)=(1‐o
2
)/2

whereo=f(net)

Derivative is largest when o = 0 that is when net = 0
9

Derivative

is

largest

when

o

=

0
,
that

is

when

net

=

0

• andapproaches0as|net|becomeslarge.
Neuralnetworktraining
• Givenasetofdata,minimisethetotalerrors:
Σ
i
(
target

predicted
)
2
Σ
i
(
target
i

predicted
i
)

• Supervisedlearning.
• Nonlinearoptimisationproblem:findasetofneuralnetworkweightsthat
minimisesthetotalerrors.
• Optimisationmethodsused:backpropagation/gradientdescent,quasi‐Newton
method,conjugategradientmethod,etc.
• Apenaltytermisusuallyaddedtotheerrorfunctionsothatredundantconnections
havesmall/zeroweights.
• Anexampleofanaugmentederrorfunction:
Σ

i
(target
i

predict
i
)
2
+CΣ
j
w
j
2
• N = # of samples
• K = # of weights
10
i
j
j
• C is a penalty parameter
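A minimal sketch of this training setup: a single tanh unit fitted by plain gradient descent on the augmented error (squared error plus the C Σ w² penalty). Gradients are taken by finite differences for clarity rather than backpropagation, and the toy task (learning the sign of x) is invented for the demo.

```python
import math

def predict(x, w):
    # one tanh unit: o = tanh(w'x / 2); the last component of x is the bias input
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) / 2.0)

def augmented_error(data, w, C):
    # sum-of-squares error plus the quadratic weight penalty C * sum_j w_j^2
    sse = sum((t - predict(x, w)) ** 2 for x, t in data)
    return sse + C * sum(wi ** 2 for wi in w)

def train(data, w, C=0.01, lr=0.1, steps=1000, h=1e-6):
    # plain gradient descent; gradients estimated by central finite differences
    for _ in range(steps):
        grad = []
        for j in range(len(w)):
            wp, wm = list(w), list(w)
            wp[j] += h
            wm[j] -= h
            grad.append((augmented_error(data, wp, C)
                         - augmented_error(data, wm, C)) / (2.0 * h))
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# toy supervised task: learn the sign of x (targets are -1 or 1)
data = [([x, 1.0], 1.0 if x > 0 else -1.0) for x in (-2.0, -1.0, 1.0, 2.0)]
w0 = [0.1, 0.1]
w = train(data, w0)
print(w, augmented_error(data, w, 0.01))
```

The penalty term keeps the bias weight small here, which is exactly the effect the slide describes: redundant connections are driven toward zero, making them candidates for pruning.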
Neuralnetworkpruning
• Afteranetworkhasbeentrained,redundantconnectionsandunitsare
removedbypruning.
• Prunednetworksgeneralisebetter:theycanpredictnewpatternsbetter
thanfullyconnectednetworks.
Si l lifii l b d f kl l d

Si
mp
l

ec
l
ass
ifi
cat
i
onru
l
escan
b
eextracte
d

f
roms
k
e
l
eta
l
prune
d

networks.
• Variousmethodsfornetworkpruningcanbefoundintheliterature.
11
Neuralnetworkpruning
A Simple Pruning Algorithm:
1 St t ith t i d f ll t d t k
1

.
St
ar
t
w
ith
a
t
ra
i
ne
d

f
u
ll
y connec
t
e
d
ne
t
wor
k
.
2. Identify potential connection for pruning (for example, one with small
magnitude).
3 Set the weight of this connection to 0
3
.

Set

the

weight

of

this

connection

to

0
.
4. Retrain the network (if necessary).
5. If the network still meets the required accuracy, go to step 2.
6. Otherwise, restore the removed connection and its corresponding weight.
Stop.
12
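The six steps above can be sketched as a loop. The `evaluate` and `retrain` callables stand in for a real network's accuracy measurement and retraining; the mock versions in the usage example are invented purely to exercise the loop.

```python
def prune(weights, evaluate, retrain, required_accuracy):
    """Sketch of the simple pruning algorithm; weights is a dict
    mapping connection id -> weight value."""
    while True:
        # step 2: pick the nonzero connection with the smallest magnitude
        nonzero = [c for c, w in weights.items() if w != 0.0]
        if not nonzero:
            return weights
        c = min(nonzero, key=lambda k: abs(weights[k]))
        saved = weights[c]
        weights[c] = 0.0                    # step 3: remove the connection
        weights = retrain(weights)          # step 4: retrain if necessary
        if evaluate(weights) < required_accuracy:
            weights[c] = saved              # step 6: restore and stop
            return weights
        # step 5: accuracy still acceptable -> prune another connection

# mock network: accuracy stays at 1.0 while at least two connections survive
evaluate = lambda w: 1.0 if sum(1 for v in w.values() if v != 0.0) >= 2 else 0.5
retrain = lambda w: w                       # no-op retraining for the demo
pruned = prune({"a": 0.1, "b": 0.5, "c": 1.0, "d": 2.0}, evaluate, retrain, 0.9)
print(pruned)  # only the two largest-magnitude connections remain nonzero
```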
4.Ruleextractionfromneuralnetworks
1. Trainandpruneanetworkwithasinglehiddenlayer
2.
Cluster the hidden unit activation values:
2.
Cluster

the


hidden

unit

activation

values:
‐ Originalactivationvaluesliesin[‐1,1]

Clustering implies dividing this interval into subintervals
Clustering

implies

dividing

this

interval

into

subintervals
,
forexample[‐1,‐0.8),[‐0.8,0.5),[0.5,1]
An algorithm is needed to ensure the network does not lose its accuracy

An

algorithm


is

needed

to

ensure

the

network

does

not

lose

its

accuracy
3. Generateclassificationrulesintermsofclusteredactivationvalues
4
Gtl hi h li th lt d ti ti l i t f th
4
.
G
enera
t

eru
l
esw
hi
c
h
exp
l
a
i
n
th
ec
l
us
t
ere
d
ac
ti
va
ti
onva
l
ues
i
n
t
ermso
f


th
e
inputdataattributes
5
M th t t f l
5
.
M
erge
th
e
t
wose
t
so
f
ru
l
es
Decompositional approach!
13
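Step 2, mapping each continuous activation to its subinterval, is simple to sketch. The boundary values below are the example from the slide; a real extractor would choose them so that discretising the activations does not reduce the network's accuracy.

```python
def discretise(activation, boundaries=(-0.8, 0.5)):
    """Map an activation value in [-1, 1] to a subinterval (cluster) index.

    With the default boundaries the subintervals are the slide's example:
    [-1, -0.8) -> 0, [-0.8, 0.5) -> 1, [0.5, 1] -> 2.
    """
    for i, b in enumerate(boundaries):
        if activation < b:
            return i
    return len(boundaries)

print(discretise(-0.9))  # 0, i.e. [-1, -0.8)
print(discretise(0.0))   # 1, i.e. [-0.8, 0.5)
print(discretise(0.7))   # 2, i.e. [0.5, 1]
```

Rules in steps 3 and 4 are then stated over these few cluster indices instead of the continuous activations, which is what makes them compact and readable.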
Ruleextractionbydecompositional approach
Outputlayer
Hiddenlayer
Inputlayer
14
5.Example:Irisclassificationproblem
• 150instances.


4 continuous attributes: sepal length, sepal width,
4

continuous

attributes:

sepal

length,

sepal

width,

petallength,petalwidth.

Three different iris flowers:

Three

different

iris

flowers:

setosa versicolor virginica
15
Anetworkwith2hiddenunits

setosa versicolor vir
g
inica
• Threeclassproblem:3outputunits.
• Fourinputattributes:4inputunits+1forbias.
• Thenetworkhasonly2hiddenunitsand10connectionsafterpruning.
• Itcorrectlyclassifiesallbutonetrainingpattern.
16
Scatter plot of hidden unit activations
• 2-dimensional plot of the activation values.

(Figure: scatter plot of H1 vs. H2 activations, showing separate setosa, versicolor, and virginica clusters)

Rule in terms of the hidden unit activations:
If H1 ≤ -0.7: Iris setosa
Else if H2 ≤ -0.55: Iris versicolor
Else: Iris virginica
Irisclassificationrules
Ifpetallength 2.23,thenIrissetosa.

El if
H
1
> - 0.7
El
se
if

3.57petallength+3.56petalwidth‐ sepallength‐ 1.57sepalwidth
12.63,thenIrisversicolor.
ElseIrisvirginica.
setosa
versicolor
virginica
H
<=
055
H
2
<=
-
0
.
55
18
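The extracted rules above translate directly into code. The measurements in the usage lines are typical values for each species, chosen here for illustration rather than taken from the slide.

```python
def classify_iris(sepal_length, sepal_width, petal_length, petal_width):
    """The iris rules from the slide, transcribed as plain conditionals."""
    if petal_length <= 2.23:
        return "setosa"
    if (3.57 * petal_length + 3.56 * petal_width
            - sepal_length - 1.57 * sepal_width) <= 12.63:
        return "versicolor"
    return "virginica"

print(classify_iris(5.0, 3.4, 1.5, 0.2))   # setosa
print(classify_iris(5.9, 3.0, 4.2, 1.3))   # versicolor
print(classify_iris(6.5, 3.0, 5.8, 2.2))   # virginica
```

Note that the versicolor condition is a linear combination of the inputs: it is the hidden unit's weighted sum, with the activation threshold H2 ≤ -0.55 mapped back through the weights onto the original attributes.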
Example:Breastcancerdiagnosis
• Ninemeasurementstakenfromfineneedleaspiratesofhumanbreast
tissues:clumpthickness,uniformityofcellsize,uniformityofcell
shape, etc.
shape,


etc.
• Eachmeasurementintegervalued0to10.

458 benign samples and 241 malignant samples from 699 patients

458

benign

samples

and

241

malignant

samples

from

699

patients
.
• Dataissplitinto350trainingsamplesand349testsamples.
100 l tk ti d

100

neura
l
ne
t
wor
k
swere
t
ra
i
ne
d
:
‐ Original numberofhiddenunits:5
‐ Original numberofconnections:460
• Afterpruning:
‐ Averagenumberofconnections:10.70
‐ Averagepredictiveaccuracy:92.70%.
19
BreastCancerDiagnosis:Example1
o Extractedrules:
If
uniformity of cell size

4
If

uniformity

of


cell

size


4

andbarenuclei≤ 5, then
benign
benign
.
Elsemalignant.
benign malignant
o Predictiveaccuracy:93.98%.
bias
20
cell size ≤ 4 bare nuclei ≤ 5
BreastCancerDiagnosis:Example2
o If
clump thickness

6
benign malignant
clump

thickness


6

,
blandchromatin≤ 3,and
normal nucleoli

9
then benign
normal

nucleoli


9
,
then

benign
.
Elsemalignant.
o Predictiveaccuracy:93.12%.
clump thickness ≤ 6
bland chromatin ≤ 3
normal nucleoli ≤ 9
21
Example:Applicationtohepatobiliary disorders
• Datacollectedfrom536patientsinaJapanesehospitals.
• Ninereal‐valuedmeasurementsobtainedfrombiomedicaltests.
• Patientsarediagnosedashaving oneofthe4liverdisorders(ALD,
PH,LC,orC).
• Accuracyfromdifferentmethods:
Li

Fl
Nltk

Li
near
discriminant
analysis
F
uzzy neura
l

networks
N
eura
l
ne
t
wor
k

extracted rules
ALD 57.6% 69.7% 87.9%
PH 64.7% 82.4% 92.2%
LC
65.7%
71.4%
80.0%
LC
65.7%
71.4%


80.0%
C 63.6% 81.8% 90.9%
Ttl
63 2%
77 3%
88 3%
22
T
o
t
a
l
63
.
2%
77
.
3%

88
.
3%


Example:LEDdisplayrecognition
AnLED(LightEmittingDiode)deviceanddigits0,1, 9:
z
1
1

z
2
z
3
z
4
z
z
z
7
z
5
z
6
23
Example:LEDdisplayrecognition
= 0
=

0
Mustbeon
Mustbeoff
Doesn

t matter
24
Doesn t

matter
Example:LEDdisplayrecognition

= 1
=

1
Mustbeon
Mustbeoff
Doesn

t matter
25
Doesn t

matter
