
Finding Minimal Neural Networks for
Business Intelligence Applications
Rudy Setiono
School of Computing
National University of Singapore
www.comp.nus.edu.sg/~rudys
Outline
• Introduction
• Feed-forward neural networks
• Neural network training and pruning
• Rule extraction
• Business intelligence applications
• Conclusion
• References
• For discussion: Time-series data mining using neural network rule extraction
Introduction
• Business Intelligence (BI): A set of mathematical models and analysis methodologies that exploit available data to generate information and knowledge useful for complex decision-making processes.
• Mathematical models and analysis methodologies for BI include various inductive learning models for data mining such as decision trees, artificial neural networks, fuzzy logic, genetic algorithms, support vector machines, and intelligent agents.
Introduction
BI analytical applications include:
• Customer segmentation: What market segments do my customers fall into, and what are their characteristics?
• Propensity to buy: Which customers are most likely to respond to my promotion?
• Fraud detection: How can I tell which transactions are likely to be fraudulent?
• Customer attrition: Which customer is at risk of leaving?
• Credit scoring: Which customer will successfully repay his loan and will not default on his credit card payment?
• Time series prediction.

Feed-forward neural networks
A feed-forward neural network with one hidden layer:
• Input variable values are given to the input units.
• The hidden units compute the activation values using the input values and connection weight values W.
• The hidden unit activations are given to the output units.
• Decision is made at the output layer according to the activation values of the output units.
Feed-forward neural networks
Hidden unit activation:
• Compute the weighted input: w1x1 + w2x2 + … + wnxn
• Apply an activation function to this weighted input, for example the logistic function f(x) = 1/(1 + e^−x):
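The two steps above can be sketched directly in Python (a minimal illustration; the weight and input values are made up, not taken from the slides):

```python
import math

def hidden_activation(weights, inputs):
    """Logistic activation of one hidden unit: f(w1*x1 + ... + wn*xn)."""
    weighted_input = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-weighted_input))

# Example with made-up values; the weighted input here is 0.5.
a = hidden_activation([0.5, -0.25, 1.0], [1.0, 2.0, 0.5])  # ≈ 0.622
```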
Neural network training and pruning
Neural network training:
• Find an optimal weight (W, V).
• Minimize a function that measures how well the network predicts the desired outputs (class labels).
• Error in prediction for the i-th sample: e_i = (desired output)_i − (predicted output)_i
• Sum of squared errors function: E(W, V) = Σ e_i^2
• Cross-entropy error function: E(W, V) = −Σ [d_i log p_i + (1 − d_i) log(1 − p_i)], where d_i is the desired output, either 0 or 1.
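Both error functions follow directly from the definitions above. A small sketch (the desired/predicted values below are invented for illustration):

```python
import math

def sum_of_squared_errors(desired, predicted):
    """E = sum_i (d_i - p_i)^2 over all samples."""
    return sum((d - p) ** 2 for d, p in zip(desired, predicted))

def cross_entropy_error(desired, predicted):
    """E = -sum_i [d_i*log(p_i) + (1-d_i)*log(1-p_i)], with d_i in {0, 1}."""
    return -sum(d * math.log(p) + (1 - d) * math.log(1 - p)
                for d, p in zip(desired, predicted))

d = [1, 0, 1]          # desired outputs (class labels)
p = [0.9, 0.2, 0.6]    # predicted outputs in (0, 1)
sse = sum_of_squared_errors(d, p)   # 0.01 + 0.04 + 0.16 = 0.21
ce = cross_entropy_error(d, p)
```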

Neural network training and pruning
Neural network training:
• Many optimization methods can be applied to find an optimal (W, V):
o Gradient descent / error back-propagation
o Conjugate gradient
o Quasi-Newton method
o Genetic algorithm
• A network is considered well trained if it can predict training data and cross-validation data with acceptable accuracy.
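As a concrete illustration of the first option, gradient descent on the sum-of-squared-errors of a single logistic unit (a toy stand-in for the full (W, V) network; the AND data set, learning rate and epoch count are invented for the example):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, lr=1.0, epochs=5000):
    """Batch gradient descent on E = sum (d - p)^2 for one logistic unit.

    samples: list of (inputs, desired) pairs; a bias input of 1.0 is appended.
    Returns the learned weight vector (last entry is the bias weight).
    """
    n = len(samples[0][0]) + 1                 # +1 for the bias weight
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, d in samples:
            xb = list(x) + [1.0]
            p = logistic(sum(wi * xi for wi, xi in zip(w, xb)))
            # dE/dw_i for E = (d - p)^2 is -2 (d - p) p (1 - p) x_i
            for i in range(n):
                grad[i] += -2.0 * (d - p) * p * (1.0 - p) * xb[i]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Toy data: logical AND, which a single unit can learn.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train(data)
```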
Neural network training and pruning
Neural network pruning: remove irrelevant/redundant network connections.
1. Initialization.
(a) Let W be the set of network connections that are still present in the network, and
(b) let C be the set of connections that have been checked for possible removal.
(c) W corresponds to all the connections in the fully connected trained network and C is the empty set.
2. Save a copy of the weight values of all connections in the network.
3. Find w ∈ W, w ∉ C, such that when its weight value is set to 0, the accuracy of the network is least affected.
4. Set the weight for network connection w to 0 and retrain the network.
5. If the accuracy of the network is still satisfactory, then
(a) Remove w, i.e. set W := W − {w}.
(b) Reset C := ∅.
(c) Go to Step 2.
6. Otherwise,
(a) Set C := C ∪ {w}.
(b) Restore the network weights with the values saved in Step 2 above.
(c) If C ≠ W, go to Step 2. Otherwise, stop.
Neural network training and pruning
Pruned neural network for LED recognition (1)
[Figure: a seven-segment LED display with segments z1–z7 feeding a neural network]
How many hidden units and network connections are needed to recognize all ten digits correctly?
Neural network training and pruning
Pruned neural network for LED recognition (2)
[Figure: a neural network for data analysis, mapping raw data to processed data]
Raw data (segments z1 z2 z3 z4 z5 z6 z7 → digit):
1110111 → 0
0010010 → 1
1011101 → 2
1011011 → 3
0111010 → 4
1101011 → 5
1101111 → 6
1010010 → 7
1111111 → 8
1111011 → 9
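Transcribed as bit strings (segments z1 through z7, left to right), the table's ten patterns are pairwise distinct, which is what makes the recognition task solvable by a small network:

```python
# Seven-segment patterns (z1...z7) transcribed from the raw-data table above.
DIGITS = {
    0: "1110111", 1: "0010010", 2: "1011101", 3: "1011011", 4: "0111010",
    5: "1101011", 6: "1101111", 7: "1010010", 8: "1111111", 9: "1111011",
}

# All ten patterns are distinct, so every digit is uniquely identifiable.
distinct = len(set(DIGITS.values())) == len(DIGITS)
```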
Neural network training and pruning
Pruned neural network for LED recognition (3)
Many different pruned neural networks can recognize all 10 digits correctly.
Part2.Noveltechniquesfordataanalysis
Neural network training and pruning
Pruned neural network for LED recognition (4): What do we learn?
[Figure: pruned networks recognizing the digits 0, 1 and 2; legend: segment must be on, must be off, or doesn't matter]
Classification rules can be extracted from pruned networks.
Part2.Noveltechniquesfordataanalysis

Rule extraction
Re-RX: an algorithm for rule extraction from neural networks
• New pedagogical rule extraction algorithm: Re-RX (Recursive Rule Extraction)
• Handles a mix of discrete/continuous variables without the need for discretization of continuous variables:
– Discrete variables: propositional rule tree structure
– Continuous variables: hyperplane rules at leaf nodes
• Example rule:
If Years Clients < 5 and Purpose ≠ Private Loan, then
If Number of applicants ≥ 2 and Owns real estate = yes, then
If Savings amount + 1.11 Income − 38249 Insurance − 0.46 Debt > −1939300, then
Customer = good payer
Else …
• Combines comprehensibility and accuracy.
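The example rule reads naturally as nested conditionals. In this sketch the function name and argument names are illustrative, and the slide's unspecified "Else …" branches are left as None:

```python
def classify(years_client, purpose, num_applicants, owns_real_estate,
             savings, income, insurance, debt):
    """Nested-conditional rendering of the example Re-RX rule."""
    if years_client < 5 and purpose != "Private Loan":
        if num_applicants >= 2 and owns_real_estate == "yes":
            # Hyperplane rule on continuous attributes at the leaf node:
            if savings + 1.11 * income - 38249 * insurance - 0.46 * debt > -1939300:
                return "good payer"
    return None  # the slide's "Else ..." branches are not specified

verdict = classify(years_client=3, purpose="Car", num_applicants=2,
                   owns_real_estate="yes", savings=20000, income=50000,
                   insurance=1.0, debt=5000)
```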
Part2.Noveltechniquesfordataanalysis

Rule extraction
Algorithm Re-RX(S, D, C):
Input: A set of samples S having discrete attributes D and continuous attributes C.
Output: A set of classification rules.
1. Train and prune a neural network using the data set S and all its attributes D and C.
2. Let D' and C' be the sets of discrete and continuous attributes still present in the network, respectively. Let S' be the set of data samples that are correctly classified by the pruned network.
3. If D' = ∅, then generate a hyperplane to split the samples in S' according to the values of their continuous attributes C' and stop. Otherwise, using only the discrete attributes D', generate the set of classification rules R for the data set S'.
4. For each rule R_i generated:
If support(R_i) > δ1 and error(R_i) > δ2 (two user-set thresholds), then:
– Let S_i be the set of data samples that satisfy the condition of rule R_i, and D_i be the set of discrete attributes that do not appear in the rule condition of R_i.
– If D_i = ∅, then generate a hyperplane to split the samples in S_i according to the values of their continuous attributes C_i and stop.
– Otherwise, call Re-RX(S_i, D_i, C_i).
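The recursion in Step 4 can be shown as a control-flow skeleton. Every helper here (training/pruning, rule generation, hyperplane fitting, support/error measurement, rule matching) is a caller-supplied stand-in, not the algorithm's actual subroutine:

```python
def re_rx(samples, discrete, continuous, helpers, sup_t, err_t):
    """Control-flow skeleton of Re-RX(S, D, C); `helpers` supplies every
    subroutine as a callable."""
    # Steps 1-2: train and prune; keep the surviving attributes and the
    # samples that the pruned network classifies correctly.
    d_kept, c_kept, s_correct = helpers["train_and_prune"](samples, discrete, continuous)
    # Step 3: no discrete attributes left -> a single hyperplane rule.
    if not d_kept:
        return [helpers["hyperplane_rule"](s_correct, c_kept)]
    rules = helpers["make_rules"](s_correct, d_kept)
    result = []
    for rule in rules:
        # Step 4: recurse into rules with enough support but too much error.
        if helpers["support"](rule) > sup_t and helpers["error"](rule) > err_t:
            s_i = [s for s in s_correct if helpers["satisfies"](rule, s)]
            d_i = [a for a in d_kept if a not in helpers["attrs_of"](rule)]
            if not d_i:
                result.append(helpers["hyperplane_rule"](s_i, c_kept))
            else:
                result.extend(re_rx(s_i, d_i, c_kept, helpers, sup_t, err_t))
        else:
            result.append(rule)
    return result
```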
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
• One of the key decisions financial institutions have to make is whether or not to grant credit to a customer who applies for a loan.
• The aim of credit scoring is to develop classification models that are able to distinguish good from bad payers, based on the repayment behaviour of past applicants.
• These models usually summarize all available information about an applicant in a score: P(applicant is good payer | age, marital status, savings amount, …).
• Application scoring: if this score is above a predetermined threshold, credit is granted; otherwise credit is denied.
• Similar scoring models are now also used to estimate the credit risk of entire loan portfolios in the context of Basel II.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
• Basel II capital accord: a framework regulating minimum capital requirements for banks.
• Customer data → credit risk score → how much capital to set aside for a portfolio of loans.
• Data are collected from various operational systems in the bank, based on which the scores are periodically updated.
• Banks are required to demonstrate and periodically validate their scoring models, and report to the national regulator.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• The 3 CARD datasets:

Dataset | Training set (Class 0 / Class 1) | Test set (Class 0 / Class 1) | Total (Class 0 / Class 1)
CARD1 | 291 / 227 | 92 / 80 | 383 / 307
CARD2 | 284 / 234 | 99 / 73 | 383 / 307
CARD3 | 290 / 228 | 93 / 79 | 383 / 307

• Original input: 6 continuous attributes and 9 discrete attributes.
• Input after coding: continuous attributes C4, C6, C41, C44, C49, and C51, plus binary-valued attributes D1, D2, D3, D5, D7, …, D40, D42, D43, D45, D46, D47, D48, and D50.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• 30 neural networks were trained for each of the datasets.
• Each neural network starts with one hidden neuron.
• The number of input neurons, including one bias input, was 52.
• The initial weights of the networks were randomly and uniformly generated in the interval [−1, 1].
• In addition to the accuracy rates, the Area under the Receiver Operating Characteristic (ROC) Curve (AUC) is also computed.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
[Figure: the AUC formula]
• α_i are the predicted outputs for the Class 1 samples, i = 1, 2, …, m, and β_j are the predicted outputs for the Class 0 samples, j = 1, 2, …, n.
• AUC is a more appropriate performance measure than ACC when the class distribution is skewed.
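The formula image on this slide did not survive extraction, but the α/β description matches the standard Wilcoxon-Mann-Whitney estimate of the AUC: the fraction of (Class 1, Class 0) pairs whose predicted outputs are ordered correctly. A sketch under that assumption:

```python
def auc(alphas, betas):
    """Wilcoxon-Mann-Whitney AUC estimate from predicted outputs.

    alphas: predicted outputs for the m Class 1 samples
    betas:  predicted outputs for the n Class 0 samples
    Each correctly ordered (alpha, beta) pair counts 1, each tie 1/2.
    """
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in alphas for b in betas)
    return wins / (len(alphas) * len(betas))
```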
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.

Dataset | #connections | ACC(θ1) | AUC_d(θ1) | ACC(θ2) | AUC_d(θ2)
CARD1 (TR) | 9.13 ± 0.94 | 88.38 ± 0.56 | 87.98 ± 0.32 | 86.80 ± 0.90 | 86.03 ± 1.04
CARD1 (TS) | | 87.79 ± 0.57 | 87.75 ± 0.43 | 88.35 ± 0.56 | 88.16 ± 0.48
CARD2 (TR) | 7.17 ± 0.38 | 88.73 ± 0.56 | 88.72 ± 0.57 | 86.06 ± 1.77 | 85.15 ± 2.04
CARD2 (TS) | | 81.76 ± 1.28 | 82.09 ± 0.88 | 85.17 ± 0.37 | 84.25 ± 0.55
CARD3 (TR) | 7.57 ± 0.63 | 88.02 ± 0.51 | 88.02 ± 0.69 | 86.48 ± 1.07 | 87.07 ± 0.60
CARD3 (TS) | | 84.67 ± 2.45 | 84.28 ± 2.48 | 87.15 ± 0.88 | 87.15 ± 0.85

• θ is the cut-off point for neural network classification: if the output is greater than θ, then predict Class 1, else predict Class 0.
• θ1 and θ2 are cut-off points selected to maximize the accuracy on the training data and the test data sets, respectively.
• AUC_d = AUC for the discrete classifier = (1 − fp + tp)/2.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• One pruned neural network was selected for rule extraction for each of the 3 CARD datasets:

Dataset | #connections | AUC (TR) | AUC (TS) | Unpruned inputs
CARD1 | 8 | 93.13% | 92.75% | D12, D13, D42, D43, C49, C51
CARD2 | 9 | 93.16% | 89.36% | D7, D8, D29, D42, D44, C49, C51
CARD3 | 7 | 93.20% | 89.11% | D42, D43, D47, C49, C51

• Error rate comparison versus other methods:

Methods | CARD1 | CARD2 | CARD3
Genetic Algorithm | 12.56 | 17.85 | 14.65
NN (other) | 13.95 | 18.02 | 18.02
NeuralWorks | 14.07 | 18.37 | 15.13
NeuroShell | 12.73 | 18.72 | 15.81
Pruned NN (θ1) | 12.21 | 18.24 | 15.33
Pruned NN (θ2) | 11.65 | 14.83 | 12.85
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• Neural networks with just one hidden unit and very few connections outperform more complex neural networks!
• Rules can be extracted to provide more understanding about the classification.
• Rules for CARD1 from Re-RX:
 Rule R1: If D12 = 1 and D42 = 0, then predict Class 0,
 Rule R2: else if D13 = 1 and D42 = 0, then predict Class 0,
 Rule R3: else if D42 = 1 and D43 = 1, then predict Class 1,
 Rule R4: else if D12 = 1 and D42 = 1, then Class 0:
o Rule R4a: if C49 − 0.503 C51 > 0.0596, then predict Class 0, else
o Rule R4b: predict Class 1,
 Rule R5: else if D12 = 0 and D13 = 0, then predict Class 1,
 Rule R6: else if C51 = 0.496, then predict Class 1,
 Rule R7: else predict Class 0.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• Rules for CARD2:
 Rule R1: If D7 = 1 and D42 = 0, then predict Class 0,
 Rule R2: else if D8 = 1 and D42 = 0, then predict Class 0,
 Rule R3: else if D7 = 1 and D42 = 1, then Class 1:
 Rule R3a: if D29 = 0, then Class 1:
 Rule R3a−i: if C49 − 0.583 C51 < 0.061, then predict Class 1,
 Rule R3a−ii: else predict Class 0,
 Rule R3b: else Class 0:
 Rule R3b−i: if C49 − 0.583 C51 < −0.274, then predict Class 1,
 Rule R3b−ii: else predict Class 0.
 Rule R4: else if D7 = 0 and D8 = 0, then predict Class 0,
 Rule R5: else predict Class 0.
Part2.Noveltechniquesfordataanalysis
Business intelligence applications
Experiment 1: CARD datasets.
• Rules for CARD3:
 Rule R1: If D42 = 0, then Class 1:
 Rule R1a: if C51 > 1.000, then predict Class 1,
 Rule R1b: else predict Class 0,
 Rule R2: else Class 1:
 Rule R2a: if D43 = 0, then Class 1:
 Rule R2a−i: if C49 − 0.496 C51 < 0.0551, then predict Class 1,
 Rule R2a−ii: else predict Class 0,
 Rule R2b: else Class 0:
 Rule R2b−i: if C49 − 0.496 C51 < 2.6525, then predict Class 1,
 Rule R2b−ii: else predict Class 0.