Immunity-based Method for Anti-Spam Model
1


Jin Yang
Department of Computer Science
LeShan Normal University
LeShan 614004, China


Yi Liu
Department of Computer Science
LeShan Normal University
LeShan 614004, China


Qin Li
Department of Computer Science
LeShan Normal University
LeShan 614004, China

Abstract—The widespread use of information technology has made
e-mail one of the largest networked applications in cyberspace.
Traditional anti-spam solutions, however, are mostly static, and
adaptive, real-time analysis of mail is seldom considered.
Inspired by the theory of artificial immune systems (AIS), this
paper presents a novel distributed anti-spam model that leverages
the topological properties of e-mail networks. The concepts and
formal definitions of immune cells are given, dynamic evolution
equations for self, antigen, immune tolerance, and the
mature-lymphocyte lifecycle are presented, and the hierarchical,
distributed management framework of the proposed model is built.
Experimental results show that the proposed model processes mail
in real time and is more efficient than client-server-based
solutions, making it a promising approach for anti-spam systems.
Keywords-spam; artificial immune systems; anti-spam system
I. INTRODUCTION
The amount of unsolicited e-mail has increased dramatically in
the past few years. Spam has become a serious problem because it
causes large losses to organizations: it wastes bandwidth, costs
users time dealing with irrelevant mail, increases the load on
mail servers, and can cause mail servers to crash [1]. Anti-spam
work applies data investigation and analysis techniques,
currently mainly through blocking and filtering procedures [2].
Current techniques classify a message as either spam or
legitimate by identifying keywords, phrases, the sending address,
and so on. Keeping a blacklist of addresses to be blocked, or a
whitelist of addresses to be allowed, is also widely used. These
list-based techniques have several disadvantages. Because
spammers can forge many sender addresses, it is difficult to keep
a blacklist up to date with the correct addresses to block [3].
Message filtering is straightforward and does not require any
modification to existing e-mail protocols. However, message
filtering often relies on humans to create detectors based on the
spam they have received. A dedicated spam sender can use the
frequently publicly available information about such heuristics
and their weightings to evade detection [4]. Several other
approaches have been proposed. Neural networks have been used for
detecting spam [5], and data mining methods have been described
as well. However, methods that adaptively capture potentially
sensitive traffic and analyze mail in real time are seldom
considered. Traditional technology therefore lacks self-learning,
self-adaptation, and parallel distributed processing, which calls
for an effective and adaptive analysis system for anti-spam.

1
This work was supported by the Scientific Research Fund of
Sichuan Provincial Education Department (No. 08ZA130) and the
Scientific Research Fund of LeShan Normal University (No.
Z0863).

Researchers have gradually turned their attention to the
biological immune system as a source of new ideas for bionic
computation. The artificial immune system (AIS) is now receiving
more attention and is regarded as a new research hotspot in
biologically inspired computational intelligence, following
genetic algorithms, neural networks, and evolutionary computation
in intelligent-systems research. Burnet proposed the clonal
selection theory in 1958 [6]. The negative selection algorithm
and the concept of computer immunity were proposed by Forrest in
1994 [7]. Artificial immune systems have many appealing features
[8-9], such as diversity, dynamics, parallel management,
self-organization, and self-adaptation, and have been widely used
in fields such as data mining, network security, pattern
recognition, learning, and optimization [10-11]. In this paper,
we propose a new spam detection technique based on artificial
immunity theory.
II. SPAM SURVEILLANCE MODEL BASED ON AIS
The aim of this paper is to establish an immune-based model for
dynamic spam detection. The model is composed of three processes:
the Process of Email Character Distilling (feature extraction),
the Process of Email Surveillance, and the Process of Training.
The Process of Email Character Distilling uses the vector space
model and represents each received mail as discrete words. The
Process of Training generates immature detectors from the gene
library to distinguish Self from Non-self. According to the
immune principle, some of these new immature detectors are false
detectors, and they are removed by the negative selection
process, which matches them against the training mails. If the
match strength between an immature detector and one of the
training mails exceeds the predefined threshold, the immature
detector is considered a false detector. The Process of Email
Surveillance matches received mails against the mature detectors.
If the match strength between a received mail and one of the
detectors exceeds the threshold, the mail is considered spam. The
training phases are detailed below.
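The training and surveillance steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: mails and detectors are taken to be equal-length bit strings, the match strength is taken to be the number of agreeing positions, and the names `match_strength`, `train_detectors`, and `is_spam` are hypothetical and do not come from the paper.

```python
import random


def match_strength(detector: str, mail: str) -> int:
    """Count the positions at which two equal-length bit strings agree."""
    return sum(a == b for a, b in zip(detector, mail))


def train_detectors(training_mails, n_detectors, length, threshold,
                    rng=None, max_tries=100_000):
    """Negative selection: keep random candidates that match no training mail."""
    rng = rng or random.Random()
    mature = []
    for _ in range(max_tries):
        if len(mature) == n_detectors:
            break
        # Immature detector drawn at random (stand-in for the gene library).
        candidate = "".join(rng.choice("01") for _ in range(length))
        # A candidate whose match strength with any training mail is over
        # the threshold is a false detector and is discarded.
        if all(match_strength(candidate, m) <= threshold for m in training_mails):
            mature.append(candidate)
    return mature


def is_spam(mail, detectors, threshold):
    """Surveillance: flag the mail if any mature detector matches it."""
    return any(match_strength(d, mail) > threshold for d in detectors)
```

With legitimate training mails `["00000000", "00001111"]` and a threshold of 5, every surviving detector agrees with each training mail in at most five positions, so neither training mail is flagged by `is_spam`.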
A. Self and Non-self
A biological immune system produces antibodies to resist
pathogens through B cells distributed all over the human body,
and T cells regulate the antibody concentration.

2009 International Conference on Networks Security, Wireless Communications and Trusted Computing
978-0-7695-3610-1/09 $25.00 © 2009 IEEE
DOI 10.1109/NSWCTC.2009.328

An immune system distinguishes between self and non-self in order
to detect potentially dangerous elements. These non-self elements
include pathogens such as viruses. In a spam immune system, we
distinguish legitimate messages from spam. We consider the text
of an e-mail, including the headers and the body, as the antigen
of a spam message. In the model, we define antigens (Ag) to be
the features of the e-mail service and the e-mail information,
given by:
$$Ag = \{\, ag \mid ag \in D \,\}, \qquad D = \{0,1\}^{l}.$$
Antigens are binary strings extracted from the e-mail information
received in the network environment. An antigen is built from the
gene libraries of e-mails, which include the sender, sending
organization, e-mail service provider, receiving organization,
recipient fields, etc.
The structure of an antibody is the same as that of an antigen.
For spam detection, the nonself set (Nonself) represents abnormal
information from a malicious e-mail service, while the self set
(Self) represents normal e-mail service.
The set $Ag$ contains two subsets [12], $Self \subseteq Ag$ and
$Nonself \subseteq Ag$, such that
$$Self \cup Nonself = Ag, \qquad Self \cap Nonself = \Phi \tag{1}$$
For convenience in using the fields of an antigen $x$, a
subscript operator "." is used to extract a specified field of
$x$, where $x.fieldname$ is the value of the field $fieldname$ of
$x$.
In the model, all the detectors form a detector set called $SD$:
$$SD = \{\langle d, age, count\rangle \mid d \in D,\ age \in N,\ count \in N\} \tag{2}$$
where $d$ is the antibody gene that is used to match an antigen,
$age$ is the age of detector $d$, $count$ (affinity) is the
number of antigens matched by antibody $d$, and $N$ is the set of
natural numbers. $SD$ contains two subsets, mature and memory,
denoted respectively by the set $M$ and the set $T$. A mature
$SD$ is an $SD$ that is tolerant to self but is not activated by
antigens. A memory $SD$ evolves from a mature one that matches
enough antigens in its lifecycle. Therefore, $SD = M \cup T$ and
$M \cap T = \phi$.
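$$M = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y\rangle \notin Match \wedge x.count < \beta) \,\} \tag{3}$$
$$T = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y\rangle \notin Match \wedge x.count \ge \beta) \,\} \tag{4}$$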
where $\beta\ (>0)$ represents the activation threshold. $Match$
is a match relation defined by
$$Match = \{\langle x, y\rangle \mid x, y \in D,\ f_{match}(x, y) = 1\}. \tag{5}$$
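As a rough illustration of how the detector set splits into the mature set M and the memory set T, the Python sketch below stores each detector as a small record and applies the self-tolerance and count conditions. The `Detector` class, the `split_detectors` function, and the exact-equality placeholder used for the match function are assumptions made for this example, not part of the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    d: str          # antibody gene, a bit string
    age: int = 0    # age of the detector
    count: int = 0  # number of antigens matched so far


def exact_match(x: str, y: str) -> bool:
    """Placeholder match function: true only for identical strings."""
    return x == y


def split_detectors(sd, self_set, beta, f_match=exact_match):
    """Return (mature, memory) for the detectors that tolerate every self element."""
    tolerant = [x for x in sd
                if not any(f_match(x.d, y) for y in self_set)]
    mature = [x for x in tolerant if x.count < beta]    # not yet activated
    memory = [x for x in tolerant if x.count >= beta]   # activated
    return mature, memory
```

A detector that matches any self element lands in neither list, which mirrors the requirement that both M and T contain only self-tolerant detectors.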

Here, $\beta$ is the affinity threshold for activated detectors.
The affinity function $f_{match}(x, y)$ may be any of the
Hamming, Manhattan, Euclidean, or r-contiguous matching rules,
etc. In this model, we use the r-contiguous matching algorithm to
compute the affinity of mature detectors.
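The r-continuous rule (usually called r-contiguous matching in the literature) declares two equal-length strings a match when they agree in at least r consecutive positions. A minimal Python sketch follows; the function name `r_contiguous_match` is chosen here for illustration.

```python
def r_contiguous_match(x: str, y: str, r: int) -> bool:
    """Return True if x and y agree in at least r consecutive positions."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

For example, `"10110100"` and `"10110011"` agree in their first five positions only, so they match for r = 5 but not for r = 6.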
B. The Dynamic Model of Self
The anti-spam immune system faces the same situation as the
biological immune system: the self changes over time. Legitimate
mail changes over time as the environment and personal behavior
change, for example when the user's contact list grows, or the
user develops new interests, discusses new issues, or writes
e-mail in a new language. To prevent an antibody from matching
self, a newly formed antibody must pass a self-tolerance test
before it is matched against antigens. We use the following
formulation to describe the self-tolerance of new antibodies:
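$$Self(t) = Self(0) = \{x_1, x_2, \ldots, x_n\}, \quad t = 0 \tag{6}$$
$$Self(t + \Delta t_1) = Self(t), \quad t \ge 1 \wedge \Delta t_1 \bmod \delta_1 \ne 0 \tag{7}$$
$$Self(t + \Delta t_2) = Self(t) + Self_{new}(\Delta t_2) - \frac{\partial Self_{variation}}{\partial x} \cdot \Delta t_2, \quad t \ge 1 \wedge \Delta t_2 \bmod \delta_1 \ne 0 \tag{8}$$
$$Self_{variation}(t) = \{\, x \mid x \text{ is the self antigen forbidden at time } t \,\} \tag{9}$$
$$Self_{new}(t) = \{\, x \mid x \text{ is the self antigen permitted at time } t \,\} \tag{10}$$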
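One discrete reading of the self update is plain set arithmetic: newly permitted self antigens are added and newly forbidden ones are removed. The sketch below assumes that reading and that self elements are hashable values such as bit strings; the function name `update_self` is hypothetical, and the timing conditions of the equations are deliberately left out.

```python
def update_self(self_set, newly_permitted, newly_forbidden):
    """Add newly permitted self antigens, then drop the forbidden ones."""
    return (set(self_set) | set(newly_permitted)) - set(newly_forbidden)
```

For instance, starting from `{"0001", "0010"}`, permitting `"0100"` and forbidding `"0001"` leaves `{"0010", "0100"}`.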
C. The Dynamic Mature Detector Model
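$$M(t) = M(0) = 0, \quad t = 0 \tag{11}$$
$$M(t + \Delta t) = M(t) + M_{new}(\Delta t) + M_{from\_other}(\Delta t) - M_{dead}(\Delta t), \quad \text{when } f_{match}(M(t), Ag(t)) \ne 1 \tag{12}$$
$$M_{clone}(t) = \left( \frac{\partial M_{clone}}{\partial x_{clone}} \, \frac{\partial M_{active}}{\partial x_{active}} \right) \cdot \Delta(t-1), \quad \text{when } f_{match}(M(t), Ag(t)) = 1 \tag{13}$$
$$M.\rho(t + \Delta t) = M.\rho(t) + V_{\rho} \cdot \Delta t, \qquad M.count(t + \Delta t) = M.count(t) + 1 \tag{14}$$
$$M_{new}(t) = \frac{\partial M_{new}}{\partial x_{new}} \cdot \Delta t = \frac{\partial T_{active}}{\partial x_{active}} \cdot \Delta(t-1) \tag{15}$$
$$M_{dead}(t) = \frac{\partial M_{death}}{\partial x_{death}} \cdot \Delta t, \quad \text{when } f_{match}(M(t-1), Self(t-1)) = 1 \tag{16}$$
$$M_{from\_other}(t) = \sum_{i=1}^{k} \frac{\partial M^{i}_{from\_other}}{\partial x^{i}_{from\_other}} \cdot \Delta t \tag{17}$$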
Equation (12) depicts the lifecycle of the mature detectors,
simulating the process by which mature detectors evolve into the
next generation. All mature detectors have a fixed lifecycle
($\lambda$). If a mature detector matches enough antigens
($\beta$) in its lifecycle, it evolves into a memory detector.
However, a detector is eliminated and replaced by a newly
generated mature detector if it does not match enough antigens in
its lifecycle. $M_{new}(t)$ is the generation of new mature SD.
$M_{dead}(t)$ is the set of SD that have not matched enough
antigens ($\beta$) in their lifecycle or that classified self
antigens as nonself at time $t$. $M_{active}(t)$ is the set of
the least recently used mature SD which degrade into memory SD
and are given a new age $T > 0$ and count $\beta > 1$.
When the same antigens arrive again, they are detected
immediately by the memory SD. During the mature detector
lifecycle, detectors that are inefficient at classifying antigens
are killed through the process of clonal selection. The method
can therefore improve detection efficiency when the same abnormal
behaviors intrude into the e-mail system again.
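The lifecycle rules described above can be sketched as a single update step. This is a simplified reading, not the paper's procedure: detectors that match self are dropped, the rest age by one step and count their antigen matches, detectors reaching the activation threshold move to the memory set, and detectors whose lifecycle has expired are removed. The `Detector` record, the `lifecycle_step` function, and the `f_match` callback are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    d: str          # antibody gene
    age: int = 0
    count: int = 0  # antigens matched so far


def lifecycle_step(mature, memory, antigens, self_set, beta, lifecycle, f_match):
    """Advance the mature set by one step; return (surviving mature, memory)."""
    survivors = []
    for det in mature:
        # Detectors that classify self as nonself are eliminated.
        if any(f_match(det.d, s) for s in self_set):
            continue
        det.age += 1
        det.count += sum(1 for ag in antigens if f_match(det.d, ag))
        if det.count >= beta:
            memory.append(det)        # activated: becomes a memory detector
        elif det.age < lifecycle:
            survivors.append(det)     # still within its lifecycle
        # Otherwise the lifecycle expired without enough matches: dropped.
    return survivors, memory
```

Removed detectors would be replaced by newly generated ones in a full system; that replenishment step is omitted here to keep the sketch short.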
As Figure 1 shows, the system first randomly creates immature
detectors and then computes the affinity between each immature
detector and every element of the training examples. If the
affinity of an immature detector is over the threshold, it
becomes a mature detector and is added to the mature detector
set. The system repeats this procedure until the mature detectors
have been created.

Figure 1. The Dynamic Mature Detector Model
D. The Dynamic Evolution of Antibody Density
The antibody density of the memory detectors expresses the
quantity and categories of spam and malicious intrusions,
reflecting the security level of the current system. The antibody
density changes in two major ways.
1) Increase: When a memory detector captures a particular
antigen, we simulate the function of the human immune system and
increase the antibody density, which represents an increase in
the quantity of spam and malicious intrusions. We use $V_{\rho}$
to denote the rate of increase of the antibody density; then at
moment $t$ the density $\rho(t)$ of the antibodies $Mem_{SD}(t)$
is:
$$\rho(t) = \rho(t-1) + V_{\rho} \cdot \Delta t \tag{18}$$
$$V_{\rho}(x) = \frac{A}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{[(x-h)\cdot u]^{2}}{2}}, \quad u > 0,\ 0 < x < +\infty \tag{19}$$
The more intensive the antigen invasion, the faster the antibody
density increases. Conversely, if the memory detectors match
relatively few invading antigens, the antibody density increases
slowly. Because each invading antigen (spam) causes a different
degree of damage to the host or network, we introduce the
parameter $u$ to reflect the degree of damage caused, which is
determined by experiment. To prevent memory detectors from
cloning without limit, we set $A$ as the upper limit on the
growth of the antibody density.
2) Decrease: If a memory detector fails to clone for one cycle,
we let the antibody density decay according to equation (20):
$$\rho(t) = \frac{1}{2}\,\rho(t-\tau) \tag{20}$$
Here $\tau$ is the half-life of the antibody density. When the
antibody density falls to 0.05, that is, when $\rho(t) = 0.05$,
we stop the attenuation. At this point the alarm corresponding to
the antibody is cleared.
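The two density changes can be illustrated with a short Python sketch. It assumes a linear increase at a given rate and a continuous half-life decay that stops at the floor of 0.05; the continuous form is an assumption chosen so that the density halves once per half-life, and the names `increase_density` and `decay_density` are hypothetical.

```python
def increase_density(rho: float, v_rho: float, dt: float = 1.0) -> float:
    """Raise the antibody density by rate v_rho over a time step dt."""
    return rho + v_rho * dt


def decay_density(rho: float, elapsed: float, half_life: float,
                  floor: float = 0.05) -> float:
    """Halve the density once per half-life, never decaying below the floor."""
    if rho <= floor:
        return rho  # attenuation has already stopped
    return max(floor, rho * 0.5 ** (elapsed / half_life))
```

A density of 0.8 falls to 0.4 after one half-life, while a density of 0.06 is held at the floor instead of dropping to 0.03.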
E. The Antibody Variation
To prevent the algorithm from converging prematurely, we apply a
variation (mutation) operation to the gene set
$G_1 = \{g_1, g_2, \ldots, g_i, \ldots, g_n\}$ after the
crossover process. Variation points are selected randomly and
varied with variation probability $p_m$ to generate the new
generation $G_{new} = \{g_1, g_2, \ldots, g_i', \ldots, g_n\}$.
The variation points are selected according to the Poisson
distribution
$$P\{X = k\} = \frac{\lambda^{k}}{k!}\, e^{-\lambda}, \quad k = 0, 1, 2, \ldots \tag{21}$$
with $E(X) = D(X) = \lambda > 0$, where $X$ is the number of
variation points. $G_1$ then turns into the offspring $G_{new}$
through the variation process.
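A minimal Python sketch of this variation step follows. It assumes genes are single bits in a string, draws the number of variation points from a Poisson distribution using Knuth's multiplication method (the standard library has no Poisson sampler), and flips the chosen bits. The names `poisson_sample` and `mutate` are illustrative, and bit flipping is only one possible way to vary a gene.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mutate(genes: str, lam: float, rng: random.Random) -> str:
    """Flip a Poisson-distributed number of randomly chosen bits."""
    n_points = min(poisson_sample(lam, rng), len(genes))
    child = list(genes)
    for i in rng.sample(range(len(genes)), n_points):
        child[i] = "1" if child[i] == "0" else "0"
    return "".join(child)
```

The number of variation points is capped at the gene length, so a large draw simply flips every bit.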
F. The Process of Email Surveillance
Our model uses detector state conversion in the dynamic evolution
of mature detectors and memory detectors, erasing detectors that
match self. As Figure 2 shows, undetected e-mails are first
compared with the memory detectors. If an e-mail matches any
element of the memory detector set, it is classified as spam and
an alarm is sent to the user. The remaining e-mails, which pass
the memory detectors, are then compared with the mature
detectors. A mature detector must be stimulated before it
classifies an e-mail as junk, so a match is taken to mean that
the first stimulatory signal has already occurred. Feedback from
the administrator is then interpreted as a co-stimulation signal.
If the system receives affirmative co-stimulation within a fixed
period, the matched e-mail is classified as spam; otherwise it is
considered normal e-mail and delivered to the user's client in
the normal way. During the filtering phase, when a mature
detector matches an e-mail, the count field of the mature
detector is incremented. If the value of the count field exceeds
the threshold, the detector is activated and becomes a memory
detector. Meanwhile, if a memory detector cannot match any e-mail
within a fixed period, it degenerates into a mature detector.
When unsolicited e-mails and malicious intrusions increase, we
simulate immune system functions to increase the density of the
corresponding antibody; when they decrease, we simulate immune
feedback functions and reduce the density of the corresponding
antibody, restoring it to its normal level.
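The two-stage check described in this section can be sketched as follows. The sketch makes several simplifying assumptions: the match function and the administrator's co-stimulation are passed in as callbacks, a detector is promoted once its count reaches the threshold, and the degeneration of idle memory detectors is omitted. `Detector` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    d: str
    count: int = 0


def classify(mail, memory, mature, beta, f_match, co_stimulated):
    """Return "spam" or "normal" for one e-mail, updating detector state."""
    # Stage 1: memory detectors classify immediately.
    if any(f_match(det.d, mail) for det in memory):
        return "spam"
    # Stage 2: the first matching mature detector needs co-stimulation.
    for det in mature:
        if f_match(det.d, mail):
            det.count += 1
            if det.count >= beta:
                # Activated: the detector moves to the memory set.
                mature.remove(det)
                memory.append(det)
            return "spam" if co_stimulated(mail) else "normal"
    return "normal"
```

Here `co_stimulated` stands in for the administrator's feedback within the fixed period; returning `False` models the case where no affirmative signal arrives.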

Figure 2. The Process of Email Surveillance
III. EXPERIMENTAL RESULTS AND ANALYSIS
Simulation experiments were carried out in our laboratory. The
main aim was to test the feasibility of applying the AIS-based
anti-spam model to spam detection, and we developed several
series of experiments. The coefficients for the model are shown
in Table I.
TABLE I. COEFFICIENTS FOR THE MODEL
Parameter                                 Value
r-contiguous bits matching rule           8
The size of initial self set n            40
The Initial Scale of Detectors            100
Match Threshold β                         40~60
Activable Threshold λ                     50~150
Clone Scale                               20
Mutation Scale                            19
The Life Cycle of the Mature Detectors    120 s
The first series of experiments was carried out to verify the
feasibility of our anti-spam solution, as follows.
We prepared the Ling-Spam dataset for analysis and experiments.
It is a mixture of 481 spam messages and 2412 messages sent via
the Linguist list, a moderated list about the profession and
science of linguistics. Attachments, HTML tags, and duplicate
spam messages received on the same day are not included. The
whole experiment is divided into two phases: a training phase and
an application phase. The main difference between the two phases
is that the former does not use the filtering module and only
generates detectors for the system. We partitioned the e-mails
randomly into ten parts and chose one part at random as the
training set; the remaining nine parts were used for testing,
giving nine groups of recall and precision ratios. The average of
these nine groups is taken as the model's recall and precision.
Figure 3 shows the average performance of the Bayesian method and
our model in the comparison experiment. The experiments indicate
that artificial immune-based spam detection can be a useful
technique.
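For reference, the averaging of recall and precision over the test groups can be sketched as below. The sketch uses the standard definitions (precision is the fraction of mails flagged as spam that really are spam, recall is the fraction of spam that is flagged); the function names and the boolean label encoding are assumptions, and no figures from the experiment are reproduced.

```python
def precision_recall(actual, predicted):
    """Compute (precision, recall) from boolean spam labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_scores(groups):
    """Average precision and recall over (actual, predicted) test groups."""
    scores = [precision_recall(a, p) for a, p in groups]
    n = len(scores)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n)
```

Each element of `groups` would hold the true and predicted labels for one of the nine test parts.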

Figure 3. Results of Comparison Experiments

IV. CONCLUSIONS
Traditional spam filtering systems and technologies mostly adopt
static measures and lack self-adaptation and parallel distributed
processing, so they cannot adjust to the current network security
situation. In this paper, we have presented a spam detection
model based on the theory of artificial immune systems and
illustrated its advantages over traditional models. The concepts
and formal definitions of immune cells are given, and we have
quantitatively described the dynamic evolution of self, antigens,
immune tolerance, and immune memory. In addition, the model uses
a distributed, multi-level framework to provide an effective
solution to spam. Finally, the experimental results show that the
proposed model is a good solution for anti-spam systems.
REFERENCES
[1] D. D'Ambra, "Killer spam: clawing at your door", Inf. Prof. 4, vol. 28,
no. 4, 2007.
[2] Le Zhang, Jingbo Zhu, Tianshun Yao, "An Evaluation of Statistical
Spam Filtering Techniques", ACM Transactions on Asian Language
Information Processing (TALIP), vol. 3, 2004, pp. 243-269.
[3] M.N. Marsono, M. Watheq, and F. Gebali, "Binary LNS-based naïve
Bayes inference engine for spam control: noise analysis and FPGA
implementation", IET Comput. Digit. Tech, vol. 56, no. 2, 2008.
[4] A.T. Mizrak, S. Savage, "Detecting compromised routers via packet
forwarding behavior", IEEE Network, vol. 22, no. 2, 2008, pp. 34-39.
[5] O. Villa, F. Petrini, "Accelerating real-time string searching with
multicore processors", Computer, vol. 41, no. 4, 2008, pp. 42-44.
[6] F.M. Burnet, "The Clonal Selection Theory of Acquired Immunity",
Cambridge: Cambridge University Press, 1959.
[7] T.B. Kepler, "Somatic hyper mutation in B cells: An optimal control
treatment", Theoret Biol, 1993, pp. 37-64.
[8] S. Forrest, A.S. Perelson, L. Allen, and R. Cherukuri, "Self-Nonself
Discrimination in a Computer", Proceedings of IEEE Symposium on
Research in Security and Privacy, Oakland, 1994.
[9] J. Kim, P. Bentley, "The Artificial Immune Model for Network
Intrusion Detection", 7th European Congress on Intelligent
Techniques and Soft Computing, 1999.
[10] G. Martin-Herran, O. Rubel, G. Zaccour, "Competing for consumer's
attention", Automatica, vol. 44, 2008, pp. 361-370.
[11] M. Hanke, "On the effects of stock spam e-mails", Journal of
Financial Markets, vol. 11, 2008, pp. 57-83.
[12] T. Li, "An Introduction to Computer Network Security", 1st edition,
Publishing House of Electronics Industry, Beijing, 2004.
