Finding minimal Neural Network for Business


Finding Minimal Neural Networks for
Business Intelligence Applications
Rudy Setiono
School of Computing
National University of Singapore
www.comp.nus.edu.sg/~rudys
Outline
• Introduction
• Feed-forward neural networks
• Neural network training and pruning
• Rule extraction
• Business intelligence applications
• Conclusion
• References
• For discussion: Time-series data mining using neural network rule extraction
Introduction
• BusinessIntelligence(BI):Asetofmathematicalmodelsandanalysis
methodologiesthatexploitavailabledatatogenerateinformationand
knowledgeusefulforcomplexdecision‐makingprocess.
•
Mathematical models and analysis methodologies for BI include various
•
Mathematical

models

and

analysis

methodologies

for

BI


include

various

inductivelearningmodelsfordataminingsuchasdecisiontrees,artificial
neuralnetworks,fuzzylogic,geneticalgorithms,supportvectormachines,
andintelligentagents.
3
Introduction
BI Analytical Applications include:
• Customersegmentation:Whatmarketsegmentsdomycustomersfallinto,
andwhataretheircharacteristics?
• Propensitytobuy:Whatcustomersaremostlikelytorespondtomy
promotion?
• Frauddetection:HowcanItellwhichtransactionsarelikelytobefraudulent?
Ct tt iti
Whi h t i t ik f li?
•
C
us
t
omera
tt
r
iti
on:
Whi
c
h
cus

t
omer
i
sa
t
r
i
s
k
o
f

l
eav
i
ng
?
• Creditscoring:Whichcustomerwillsuccessfullyrepayhisloan,willnot
defaultonhiscreditcardpayment?
•
Time
series prediction
4
•
Time
‐
series

prediction
.

Feed-forward neural networks
A feed-forward neural network with one hidden layer:
• Input variable values are given to the input units.
• The hidden units compute the activation values using input values and connection weight values W.
• The hidden unit activations are given to the output units.
• Decision is made at the output layer according to the activation values of the output units.
Feed-forward neural networks
Hiddenunitactivation:
•
Compute the weighted input: w

1
x
1
+ w
2
x
2
+ …. + w
x
Compute

the

weighted

input:

w
1
x
1
+

w
2
x
2
+

….


+

w
n
x
n
• Applyanactivationfunctiontothisweightedinput,forexamplethelogistic
fif( ) 1/(1
)
f
unct
i
on
f(
x
)
=
1/(1
+e
‐x
)
:
6
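The two steps above can be sketched in a few lines of Python. This is only an illustration: the function names `logistic` and `hidden_activation` are chosen here and do not come from the slides.

```python
import math


def logistic(x: float) -> float:
    """Logistic activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_activation(weights, inputs):
    """Activation of one hidden unit.

    Computes the weighted input w_1*x_1 + ... + w_n*x_n and passes it
    through the logistic function.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_input = sum(w * x for w, x in zip(weights, inputs))
    return logistic(weighted_input)
```

For example, `hidden_activation([0.5, -0.5], [1.0, 1.0])` has a weighted input of 0 and therefore an activation of 0.5.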
Neural network training and pruning
Neuralnetworktraining:
• Findanoptimalweight(W,V).
• Minimizeafunctionthatmeasureshowwellthenetworkpredictsthedesired
outputs (class label)
outputs


(class

label)
• Errorinpredictionfori‐th sample:
e
= (desired output)
–
(predicted output)
e
i
=

(desire d

output)
i
–
(predicted

output)
i
• Sumofsquarederrorfunction:
∑
E(W,V)=
∑
e
i
2
• Cross‐entropyerrorfunction:

E(W,V)=‐ Σ d
i
logp
i
+(1‐ d
i
)log(1–p
i
)
d
is the desired output either 0 or 1
7
d
i
is

the

desired

output
,
either

0

or

1
.
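As a rough illustration, the two error functions can be written as follows. The function names are chosen for this sketch, and the clipping constant `eps` is an added safeguard against log(0) that the slides do not mention.

```python
import math


def sum_squared_error(desired, predicted):
    """E = sum_i e_i^2 with e_i = desired_i - predicted_i."""
    return sum((d - p) ** 2 for d, p in zip(desired, predicted))


def cross_entropy_error(desired, predicted, eps=1e-12):
    """E = -sum_i [d_i log p_i + (1 - d_i) log(1 - p_i)], with d_i in {0, 1}."""
    total = 0.0
    for d, p in zip(desired, predicted):
        # Clip p away from 0 and 1 (assumption of this sketch, not in the slides).
        p = min(max(p, eps), 1.0 - eps)
        total += d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return -total
```

With desired outputs `[1, 0]` and predicted outputs `[0.5, 0.5]`, the sum of squared errors is 0.5 and the cross-entropy error is 2 ln 2.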

Neural network training and pruning
Neuralnetworktraining:
•
Many optimization methods can be applied to find an optimal (W,V):
Many

optimization

methods

can

be

applied

to

find

an

optimal

(W,V):
o Gradientdescent/errorbackpropagation
o Conjugategradient
o QuasiNewtonmethod
o Geneticalgorithm
Nt ki id d ll ti dif it di t tii dt d

•
N
e
t
wor
k

i
scons
id
ere
d
we
ll

t
ra
i
ne
d

if

it
canpre
di
c
t

t

ra
i
n
i
ng
d
a
t
aan
d
cross‐
validationdata withacceptableaccuracy.
8
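To make the first option concrete, here is a minimal sketch of gradient descent on the cross-entropy error. For brevity it trains a single logistic unit rather than a network with a hidden layer, so no backpropagation through hidden units is needed. The toy data (logical OR), the learning rate and the step count are assumptions chosen for this illustration.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(weights, samples, targets):
    """Cross-entropy error of a single logistic unit on the data."""
    total = 0.0
    for x, d in zip(samples, targets):
        p = logistic(sum(w * xi for w, xi in zip(weights, x)))
        total -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return total


def train(samples, targets, learning_rate=0.5, steps=2000):
    """Plain batch gradient descent; the gradient is sum_i (p_i - d_i) * x_i."""
    weights = [0.0] * len(samples[0])
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, d in zip(samples, targets):
            p = logistic(sum(w * xi for w, xi in zip(weights, x)))
            for k, xi in enumerate(x):
                gradient[k] += (p - d) * xi
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Toy data: logical OR, with a constant 1 appended as the bias input.
SAMPLES = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
TARGETS = [0, 1, 1, 1]
```

Calling `train(SAMPLES, TARGETS)` returns weights whose cross-entropy error is lower than that of the all-zero starting point.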
Neural network training and pruning
Neuralnetworkpruning:Removeirrelevant/redundantnetworkconnections
1. Initialization.
(a)LetWbethesetofnetworkconnectionsthatarestillpresentinthenetworkand
(b)letCbethesetofconnectionsthathavebeencheckedforpossibleremoval
(c) W corresponds to all the connections in the fully connected trained network and C is the empty set.
(c)

W

corresponds

to

all

the


connections

in

the

fully

connected

trained

network

and

C

is

the

empty

set.
2.Saveacopyoftheweightvaluesofallconnectionsinthenetwork.
3.Findw∈ Wandw– Csuchthatwhenitsweightvalueissetto0,theaccuracyofthenetworkisleastaffected.
4.Settheweightfornetworkconnectionw to0andretrainthenetwork.
5.Iftheaccuracyofthenetworkisstillsatisfactory,then

(a)Removew,i.e.setW:=W−{w}.
(b)ResetC:=∅.
(c) Go to Step 2.
(c)

Go

to

Step

2.
6.Otherwise,
(a)SetC:=C∪ {w}.
9
(b)RestorethenetworkweightswiththevaluessavedinStep2above.
(c)IfC≠W, gotoStep2.Otherwise,Stop.
Neural network training and pruning
PrunedneuralnetworkforLEDrecognition(1)
z
1
z
2
z
3
z
4
2
3
z

7
z
5
z
6
Howmanyhiddenunitsandnetworkconnectionsareneededtorecognizeall
d l?
7
ten
d
igitscorrect
l
y
?

10
Neural network training and pruning
PrunedneuralnetworkforLEDrecognition(2)
z
1
Rawdata
A neural network
z
1
z
2
z
3
z
4

z
5
z
6
z
7
Digit
1110111 0
0010010 1
1
0
1
1
1
0
1
2
A

neural

network

fordataanalysis
Processed
data
1
0
1
1

1
0
1
2
1011011 3
0111010 4
1
1
0
1
0
1
1
5
1
1
0
1
0
1
1
5
1101111 6
1010010 7
1
1
1
1
1
1

1
8
11
1
1
1
1
1
1
1
8
1111011 9
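As a side illustration of why some inputs can be pruned, the sketch below encodes the ten segment patterns from the table and searches for the smallest sets of segments that still tell all ten digits apart. This brute-force search is not the pruning method of the slides; it only shows that not every segment is needed to separate the digits. All names are chosen for this sketch.

```python
from itertools import combinations

# Segment patterns z1..z7 for digits 0-9, copied from the table above.
LED = {
    0: "1110111", 1: "0010010", 2: "1011101", 3: "1011011", 4: "0111010",
    5: "1101011", 6: "1101111", 7: "1010010", 8: "1111111", 9: "1111011",
}


def distinguishes(segments):
    """True if the chosen segment positions give every digit a unique code."""
    codes = {tuple(pattern[s] for s in segments) for pattern in LED.values()}
    return len(codes) == len(LED)


def smallest_distinguishing_subsets():
    """Return (size, subsets) for the smallest segment sets that separate all digits."""
    for size in range(1, 8):
        found = [c for c in combinations(range(7), size) if distinguishes(c)]
        if found:
            return size, found
    return None


size, subsets = smallest_distinguishing_subsets()
```

Segment positions are zero-based here, so position 0 corresponds to z1.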
Neural network training and pruning
PrunedneuralnetworkforLEDrecognition(3)
diff d l k
Many
diff
erentprune
d
neura
l
networ
k
s
canrecognizedall10digitscorr ectly.
12
Part2.Noveltechniquesfordataanalysis
Neural network training and pruning
PrunedneuralnetworkforLEDrecognition(4):Whatdowelearn?
0

1
2
=
0
=
1
=
2
Mustbeon
Mustbeoff
Classificationrulescanbe
ttdf d tk
Doesn’tmatter
ex
t
rac
t
e
d

f
romprune
d
ne
t
wor
k
s.
13
Part2.Noveltechniquesfordataanalysis

Rule extraction
Re-RX: an algorithm for rule extraction from neural networks
• New pedagogical rule extraction algorithm: Re-RX (Recursive Rule Extraction)
• Handles a mix of discrete/continuous variables without need for discretization of continuous variables
– Discrete variables: propositional rule tree structure
– Continuous variables: hyperplane rules at leaf nodes
• Example rule:
If Years Clients < 5 and Purpose ≠ Private Loan, then
If Number of applicants ≥ 2 and Owns real estate = yes, then
If Savings amount + 1.11 Income − 38249 Insurance − 0.46 Debt > −1939300, then
Customer = good payer
Else …
• Combines comprehensibility and accuracy

Rule extraction
AlgorithmRe‐RX(
S
,D,C):
Input:AsetofsamplesS havingdiscreteattributesD andcontinuousattributesC
Output:Asetofclassificationrules
1. TrainandpruneaneuralnetworkusingthedatasetS andallitsattributesD andC.
2
Lt
D'
d
C'
b th t f di t d ti tt ib t till t i th tk
2
.
L
e
t

D'
an
d

C'
b
e
th
ese
t
so

f

di
scre
t
ean
d
con
ti
nuousa
tt
r
ib
u
t
ess
till
presen
t

i
n
th
ene
t
wor
k
,
respectively.LetS'bethesetofdatasamplesthatarecorrectlyclassifiedbythepruned
network.

f
'

h
hl
li h l i
'
di h l f hi
3. I
f
D
'
=

,t
h
engenera t ea
h
yperp
l
ane tosp
li
tt
h
esamp
l
es
i
nS
'

accor
di
ngtot
h
eva
l
ueso
f
t
h
e
i
r
continuousattributesC' andstop.Otherwise,usingonlydiscreteattributesD',generat etheset
ofclassificationrulesR forthedatasetS'.
4. ForeachruleR
i
generated:
Ifsupport(R
i
) >
1
anderror(R
i
)>
2
,then:
Let
S
be the set of data samples that satisfy the condition of rule

R
and
D
be the set of
–
Let

S
i
be

the

set

of

data

samples

that

satisfy

the

condition

of


rule

R
i
,
and

D
i
be

the

set

of

discreteattributesthatdonotappearintheruleconditionofR
i
– IfD
i
=,thengener a t eahyperplane tosplitthesamplesinS
i
accordingtothevaluesof
th i ti tt ib t
C
d t
15
th

e
i
rcon
ti
nuousa
tt
r
ib
u
t
es
C
i
an
d
s
t
op
Otherwise,callRe‐RX(S
i
,D
i
,C
i
)
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
• Oneofthekeydecisionsfinancialinstitutionshavetomake
istodecidewhetheror notto
g

rantcredittoacustomerwhoa
pp
liesforaloan.
g pp
• Theaimofcreditscoringistodevelopclassificationmodelsthatareableto
distinguish good from bad payers, based on the repayment behaviour of past
distinguish

good

from

bad

payers,

based

on

the

repayment

behaviour

of

past


applicants.
•
These models usually summarize all available information of an applicant in a score:
These

models

usually

summarize

all

available

information

of

an

applicant

in

a

score:
• P(applicantisgoodpayer | age,maritalstatus,savingsamount, …).
• Application scoring:ifthisscoreisaboveapredeterminedthreshold,creditisgranted;

otherwisecreditisdenied.
• Similarscoringmodelsarenowalso usedtoestimatethecreditriskofentireloan 
portfoliosinthecontextofBasel II.
16
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
• BaselIIcapitalaccord:frameworkregulatingminimum
capitalrequirementsforbanks.
Ct dt
dit ik
h h it l t
•
C
us
t
omer
d
a
t
a cre
dit
r
i
s
k
score
h
owmuc
h
cap

it
a
l

t
o
setasideforaportfolioofloans.
• Datacollectedfromvariousoperationalsystemsinthebank,
bd hi h idi ll dtd
b
ase
d
onw
hi
c
h
scoresareper
i
o
di
ca
ll
yup
d
a
t
e
d
.
• Banksarere

q
uiredtodemonstrateand
p
eriodicall
y
validate
q py
theirscoringmodels,andreporttothenationalregulator.
17
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
•
The 3 CARD datasets:
•
The

3

CARD

datasets:
Dataset Training set Testset Total
Class0Class1 Class 0 Class 1 Class0 Class1
CARD1 291 227 92 80 383 307
CARD1 284 234 99 73 383 307
CARD3 290 228 93 79 383 307
• Originalinput:6continuousattributesand9discreteattributes
• In
p

utaftercodin
g
:C
4
,
C
6
,
C
41
,
C
44
,
C
49
,
andC
51
p
lusbinar
y
‐valued
p g
4
,
6
,
41
,

44
,
49
,
51
p y
attributesD
1
,D
2
,D
3
,D
5
,D
7
,…,D
40
,D
42
,D
43
,D
45
,D
46
,D
47
,D
48

,andD
50
18
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
l k f h f h d d
• 30neura
l
networ
k
s
f
oreac
h
o
f
t
h
e
d
atasetsweretraine
d
• Neuralnetworkstartshasonehiddenneuron.
• Thenumberofinputneurons,includingonebiasinputwas52
• Theinitialweightsofthenetworkswererandomlyand
uniformly generated in the interval [
−
1 1]
uniformly


generated

in

the

interval

[ 1
,
1]
• Inadditiontotheaccuracyrates,theAreaundertheReceiver
OperatingCharacteristic(ROC)Curve(AUC)isalsocomputed.
19
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
•
Where
α
are the predicted outputs for Class 1 samples i 12
•
Where

α
i
are

the


predicted

outputs

for

Class

1

samples

i
=
1
,
2
,
…mandβ
j
arepredictedoutputforClass0samples,j=1,2,…n.
• AUCisamoreappropriateperformancemeasurethanACC
when the class distribution is
skewed
20
when

the


class

distribution

is

skewed
.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
Dataset #connections ACC(θ
1
)AUC
d
(θ
1
) ACC(θ
2
)AUC
d
(θ
2
)
CARD1(TR) 9.13±0.94 88.38±0.56 87.98±0.32 86.80±0.90 86.03±1.04
CARD1(TS) 87.79±0.57 87.75±0.43 88.35±0.56 88.16±0.48
CARD2(TR) 7.17±0.38 88.73±0.56 88.72±0.57 86.06±1.77 85.15±2.04
CARD2(TS) 81.76±1.28 82.09±0.88 85.17±0.37 84.25±0.55
CARD3(TR)
757

±
063
88 02
±
051
88 02
±
069
86 48
±
107
87 07
±
060
CARD3(TR)
7
.
57

±
0
.
63

88
.
02

±
0

.
51
88
.
02

±
0
.
69

86
.
48

±
1
.
07

87
.
07

±
0
.
60
CARD3(TS) 84.67±2.45 84.28±2.48 87.15±0.88 87.15±0.85
• θ isthecut‐offpointforneuralnetworkclassification:ifoutputisgreaterthanθ,thanpredict

Class1,elsepredictClass0.
• θ
1
andθ
2
arecut‐offpointsselectedtomaximizetheaccuracyonthetrainingdataandthetest
datasets,respectively.
21
• AUC
d
=AUCforthediscreteclassifier=(1
–
fp +tp)/2
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
• Oneprunedneuralnetworkwasselectedforruleextractionforeachofthe3CARDdatasets:
i
C ()
C
(S)
d
i
Dataset #connect
i
ons AU
C

(
TR

)
AU
C
(
T
S)
Unprune
d
i
nputs
CARD1 8 93.13% 92.75% D
12
,D
13
,D
42
,D
43
,C
49
,C
51
CARD2
9
93 16%
89 36%
D
D
D
D

D
C
C
CARD2
9
93
.
16%
89
.
36%
D
7
,
D
8
,
D
29
,
D
42
,
D
44
,
C
49
,
C

51
CARD3 7 93.20% 89.11% D
42
,D
43
,D
47
,C
49
,C
51
• Errorratecomparisonversusothermethods:
Methods CARD1 CARD2 CARD3
GeneticAlgorithm 12.56 17.85 14.65
NN(other) 13.95 18.02 18.02
NeuralWorks
14 07
18 37
15 13
NeuralWorks
14
.
07
18
.
37
15
.
13
NeuroShell 12.73 18.72 15.81

PrunedNN
(
θ
1
)
12.21 18.24 15.33
22
(
1
)
PrunedNN(θ
2
) 11.65 14.83 12.85
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
• Neuralnetworkswithjustonehiddenunitandveryfewconnectionsoutperformmorecomplex
l tk!
neura
l
ne
t
wor
k
s
!
• Rulecanbeextractedtoprovidemoreunderstandingabouttheclassification.
•
Rules for CARD1 from Re
‐

RX:
•
Rules

for

CARD1

from

Re
‐
RX:
 RuleR
1
:IfD
12
=1andD
42
=0,thenpredictClass0,

Rule R
: else if D
= 1 and D
= 0 then predict Class 0

Rule

R
2

:

else

if

D
13
=

1

and

D
42
=

0
,
then

predict

Class

0
,
 RuleR
3

:elseifD
42
=1andD
43
=1,thenpredictClass1,
 RuleR
4
:elseifD
12
=1andD
42
=1,thenClass0,
o RuleR
4a
:IfR
49
−0.503R
51
>0.0596,thenpredictClass0,else
o RuleR
4b
:predictClass1,
 RuleR
5
:elseifD
12
=0andD
13
=0,thenpredictClass1,
 RuleR

6
:elseifR
51
=0.496,thenpredictClass1,

Rule R
: else predict Class 0
23

Rule

R
7
:

else

predict

Class

0
.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
• RulesforCARD2:
 RuleR
1
:IfD

7
=1andD
42
=0,thenpredictClass0,
 RuleR
2
:elseifD
8
=1andD
42
=0,thenpredictClass0,
2
8
42
 RuleR
3
:elseifD
7
=1andD
42
=1,thenClass1
 RuleR
3a
:ifI
29
=0,thenClass1
 RuleR
3a−i
:ifC
49

−0.583C
51
<0.061,thenpredictClass1,
 RuleR
3a−ii
:elsepredictClass0,
 RuleR
3b
:elseClass0
 RuleR
3b−i
:ifC
49
−0.583C
51
<−0.274,thenpredictClass1,
 RuleR
3b−ii
:elsepredictClass 0.
 RuleR
4
:elseifD
7
=0andD
8
=0,thenpredictClass0,

Rl R
l di t Cl 0
24


R
u
l
e
R
5
:e
l
sepre
di
c
t

Cl
ass
0
.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment1:CARDdatasets.
• RulesforCARD3:
 RuleR
1
:IfD
42
=0,thenClass1

Rule R
1

: if C
51
> 1.000, then predict Class 1,

Rule

R
1
a
:

if

C
51
>

1.000,

then

predict

Class

1,
 RuleR
1b
:elsepredictClass0,


Rule R
: else Class 1

Rule

R
2
:

else

Class

1
 RuleR
2a
:ifD
43
=0,thenClass1

l
f
h d l

Ru
l
eR
2a−i
:i
f

C
49
−0.496C
51
<0.0551,t
h
enpre
d
ictC
l
ass1,
 RuleR
2a−ii
:elsepredictClass0,
 RuleR
2b
:elseClass0
 RuleR
2b−i
:ifC
49
−0.496C
51
<2.6525,thenpredictClass1,
25
 RuleR
2b−ii
:elsepredictClass0,
