MINISTRY OF EDUCATION
AND TRAINING
VIETNAM ACADEMY
OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN VAN TRUONG
IMPROVING SOME ARTIFICIAL IMMUNE ALGORITHMS
FOR NETWORK INTRUSION DETECTION
Major: Mathematical foundations for Informatics
Code: 62 46 01 10
SUMMARY OF THE DOCTORAL THESIS OF
MATHEMATICS
Hanoi – 2019
Thesis is completed at: Graduate University of Science and Technology Vietnam Academy of Science and Technology.
Supervisors:
1. Assoc. Prof., Dr. Nguyen Xuan Hoai
2. Assoc. Prof., Dr. Luong Chi Mai
Review 1:
Review 2:
Review 3:
The thesis will be defended, meeting at: Graduate University of Science
and Technology - Vietnam Academy of Science and Technology.
At:
Thesis can be found at the library:
- National Library of Vietnam
- Library of Graduate University Of Science And Technology
1
INTRODUCTION
Motivation
Internet users and computer networks are suffering from rapid increase in number of attacks.
In order to keep them safe, there is a need for effective security monitoring systems, such
as Intrusion Detection Systems (IDS). However, intrusion detection has to face a number of
different problems such as huge network traffic volumes, highly imbalanced data distribution,
the difficulty to realize decision boundaries between normal and abnormal behavior, and a
requirement for continuous adaptation to a constantly changing environment. As a result,
many researchers have attempted to use different type of approaches to build reliable intrusion
detection system.
One of the promising computational intelligence methods for intrusion detection that have
emerged recently are artificial immune systems (AIS) inspired by the biological immune system.
Negative selection algorithm (NSA) of AIS, is widely used for intrusion detection systems
(IDS). Despite its successful application, NSA has some weaknesses: 1-High false positive rate
and/or false negative rate, 2-High training and/or testing time, 3-Exponential relationship
between the size of the training data and the number of detectors possibly generated for
testing, 4-Changeable definitions of ”normal data” and ”abnormal data” in dynamic network
environment. To overcome these limitations, trends of recent works are to concentrate on
complex structure of immune detectors, matching methods and hybrid NSAs.
Objectives
Since data representation is one of the factors that affect the training and testing time,
a compact and complete detector generation algorithm is investigated.
The thesis investigates optimal algorithms to generate detector set in AIS. They help to
reduce both training time and detecting time of AIS-based IDSs.
Also, it is regarded to propose and investigate an AIS-based IDS that can promptly detect
attacks, either if they are known or never seen before. The proposed system makes use of AIS
with statistics as analysis methods and flow-based network traffic as data source.
Problem statements
Since the NSA has four main limitations as listed in the first section, this thesis concentrates on three problems:
1. The first problem is to find compact representations of data. Objectives of this problem’s solution is not only to minimize memory storage but also to reduce testing time.
2. The second problem is to propose algorithms that can reduce training time and testing
time in compared with all existing related algorithms.
3. The third problem is to improve detection performance with respect to reducing false
2
alarm rates while keeping detection rate and accuracy rate as high as possible.
It is impossible to find an optimal algorithm that can reduce time and memory complexities
with best detection performance. These aspects are always in conflict with each other. Thus,
in each chapter, we will propose algorithms to solve each problem quite independently.
The intrusion detection problem mentioned in this thesis can be informally stated as:
Given a finite set S of network flows which labeled with self (normal) or nonself (abnormal).
The objective is to build classifying models on S that can label an unlabeled network flow s.
Outline of thesis
Chapter 1 introduces the background knowledge necessary to discuss the algorithms
proposed in following chapters.
In Chapter 2, a combination of selection algorithms is presented. The technique reduces
detectors storage generated in training phase. Testing time, an important measurement in IDS,
will also be reduced as a direct consequence of a smaller memory complexity. Tree structure
is used to improve time and memory complexities.
A complete and nonredundant detector set is necessary to archive acceptable self and
nonself coverage of classifiers. A selection algorithm to generate this type of detectors set is
investigated in Chapter 3.
Chapter 4 includes two selection algorithms for fast training phase. The optimal algorithms can generate a detectors set in linear time with respect to size of training data. In term
of detection time, the first algorithm and the second one is linear and polynomial, respectively.
Chapter 5 mainly introduces a hybrid approach of positive selection algorithm with statistics. Frequencies of self and nonself data contained in leaves of tree-based detectors play an
important role in improving performance of the proposed algorithms. The hybrid approach
came as a new positive selection algorithm for two-class classification that can be trained with
samples from both self and nonself data types.
3
Chapter 1
BACKGROUND
1.1 Human immune system
Our human immune system has a multi-layered protection architecture, including physical
barriers, physiological barriers, an innate immune system, and an adaptive immune system.
The adaptive immune system is capable of adaptively recognizing specific types of pathogens,
and memorizing them for accelerated future responses. It is the main inspiration for AISs.
1.2 Selection algorithms
Negative selection approaches are based on self-nonself discrimination in biology system. This
property makes it attractive for computer and network security researchers. A survey show
that in six year 2008-2013, NSA predominate all the other models of AIS in term of published
papers in both network security and anomaly detection.
1.2.1 Negative Selection Algorithms
A typical NSA comprises of two phases: detector generation and detection. In the detector
generation phase, the detector candidates are generated by some random processes and censored by matching them against given self samples taken from a set S (representing the system
components). The candidates that match any element of S are eliminated and the rest are
kept and stored in the detector set D.
1.2.2 Positive Selection Algorithms
A PSA contains two phases: detector generation and detection. In the detector generation
phase, the detector candidates are generated by some random processes and matched against
the given self sample set S. The candidates that do not match any element in S are eliminated
and the rest are kept and stored in the detector set D. In the detection phase, the collection
of detectors are used to distinguish self from non-self. If incoming data instance matches any
detector, it is claimed as self.
1.3 Basic terms and definitions
1.3.1 Strings, substrings and languages
An alphabet Σ is nonempty and finite set of symbols. A string s ∈ Σ∗ is a sequence of symbols
from Σ, and its length is denoted by |s|. A string is called empty string if its length equals 0.
4
Given an index i ∈ {1, . . . , |s|}, then s[i] is the symbol at position i in s. Given two indices i
and j, whenever j ≥ i, then s[i . . . j] is the substring of s with length j − i + 1 that starts at
position i and whenever j < i, then s[i . . . j] is the empty string. If i = 1, then s[i . . . j] is a
prefix of s and, if j = |s|, then s[i . . . j] is a suffix of s. For a proper prefix or suffix s of s, we
have in addition |s | < |s|. Given a string s ∈ Σ , another string d ∈ Σr with 1 ≤ r ≤ , and
an index i ∈ {1, . . . , − r + 1}, we say that d occurs in s at position i if s[i . . . i + r − 1] = d.
Moreover, concatenation of two strings s and s is s + s .
A set of strings S ⊆ Σ∗ is called a language. For two indices i and j, we define S[i . . . j] =
{s[i . . . j]|s ∈ S}.
1.3.2 Prefix trees, prefix DAGs
A prefix tree T is a rooted directed tree with edge labels from Σ where for all c ∈ Σ, every
node has at most one outgoing edge labeled with c. For a string s, we write s ∈ T if there is
a path from the root of T to a leaf such that s is the concatenation of the labels on this path.
The language L(T ) described by T is defined as the set of all strings that have a nonempty
prefix s ∈ T . Trees for self dataset and nonself dataset are called positive trees and negative
trees, respectively.
A prefix DAG D is a directed acyclic graph with edge labels from Σ, where again for all
c ∈ Σ, every node has at most one outgoing edge labeled with c. Similar to prefix trees, the
terms root and leaf used to refer to a node without incoming and outgoing edges, respectively.
We write s ∈ D if there is a path from a root node to a leaf node in D that is labeled by
s. Given n ∈ D, the language L(D, n) contains all strings that have a nonempty prefix that
labels a path from n to some leaf. Moreover, we define L(D) = ∪n
is a root of D L(D, n).
A prefix DAG can be turned into a finite automaton to decide the membership of strings
in languages.
1.3.3 Detectors
Given an alphabet Σ is nonempty and finite set of symbols, positive and negative r-chunk
detectors, r-contiguous detectors, rcbvl-detectors could be defined as follows:
Definition 1.1. Given a self set S ⊆ Σ , a tuple (d, i) of a string d ∈ Σr , where r ≤ , and
an index i ∈ {1, ..., − r + 1} is a positive r-chunk detector if there exists a s ∈ S such that d
occurs in s.
Definition 1.2. Given a self set S ⊆ Σ , a tuple (d, i) of a string d ∈ Σr , r ≤ , and an index
i ∈ {1, ..., − r + 1} is a negative r-chunk detector if d does not occurs any s ∈ S.
Example 1.1. Let
= 6, r = 3. Given a set of five self strings S = {s1 = 010101, s2 =
111010, s3 = 101101, s4 = 100011, s5 = 010111}. The set of some positive r-chunk detectors
is {(010,1), (111,1), (101,2), (110,2), (010,3), (101,3), (101,4), (010,4), (111,4))}. The set
5
of some negative r-chunk detectors is {(000,1), (001,1), (011,1), (001,2), (010,2), (100,2),
(000,3), (100,3), (000,4), (001,4), (100,4)}.
Definition 1.3. Given a self set S ⊆ Σ , a string d ∈ Σ is a r-contiguous detector if d[i, . . . , i+
r − 1] does not occurs any s ∈ S for all i ∈ {1, ..., − r + 1}.
Example 1.2. Let
= 5, r = 3. Given a set of 7 self strings S = {s1 = 01111, s2 = 00111,
s3 = 10000, s4 = 10001, s5 = 10010, s6 = 10110, s7 = 11111}. The set of all 3-contiguous
detectors is {01011, 11011}.
We also use the following notations:
• Dpi = {(d, i)|(d, i) is a positive r-chunk detector} is set of all positive r-chunk detectors
at position i, i = 1, . . . , − r + 1.
• Dni = {(d, i)|(d, i) is a negative r-chunk detector} is set of all negative r-chunk detectors
at position i, i = 1, . . . , − r + 1.
−r+1
• CHU N Kp (S, r) = ∪i=1
Dpi is set of all positive r-chunk detectors.
−r+1
• CHU N K(S, r) = ∪i=1
Dni is set of all negative r-chunk detectors.
• CONT(S, r) is the set of all r-contiguous detectors that do not match any string in S.
• For a given detectors set X, L(X) is the set of all nonself strings detected by X. We also
say that Σ \ L(X) is the set of all self strings detected by X.
Definition 1.4. Given a self set S ⊆ Σ . A triple (d, i, j) of a string d ∈ Σk , 1 ≤ k ≤ , an
index i ∈ {1, ..., − r + 1} and an index j ∈ {i, ..., − r + 1} is called a negative detector under
rcbvl matching rule if d does not occur in any s, s ∈ S.
To combine PSA and NSA in a unified framework, we have to change the original semantic
of positive selection in the detection phase as follows.
Definition 1.5. If new instance matches
− r + 1 positive r-chunk detectors (dij , i), i =
1, . . . , − r + 1, it is claimed as self, otherwise it is claimed as nonself.
With this new detection semantic, the following proposition on the equivalence of detection coverage of r-chunk type PSA and NSA could be stated.
Theorem 1.1 (Detection Coverage). The detection coverage of positive and negative selection
algorithms coincide.
L(CHU N Kp (S, r)) = L(CHU N K(S, r))
6
1.3.4 Ring representation of data
With reference to string-based detectors set, a simple technique for this approach is to concatenate each string representing a detector with its fist k symbols. Each new linear string is a
ring representation of its original binary one. Given a set of strings S ⊂ Σ , a set Sr ⊂ Σ
+r−1
includes ring representations of all strings in S by concatenating each string s ∈ S with its fist
r − 1 symbols. We can easily apply the idea of ring strings for other data representations in
AIS. One way to do this, for instance, is to create ring representations of other structures such
as trees, and automata, from set Sr instead of S as usual.
1.3.5 Frequency trees
Given a set D of length-equaled strings, a tree T on D, noted TD , is a rooted directed tree
with edge labels from Σ where for all c ∈ Σ, every node has at most one outgoing edge labeled
with c. For a string s, we write s ∈ T if there is a path from the root of T to a leaf such that s
is the concatenation of the labels on this path. Each leaf is associated with an integer number,
that is frequency of string s ∈ D and s is the concatenation of the labels on the path ending
by this leaf. We also use two concepts: self trees and nonself trees to present r-chunk detectors
set for normal dataset and abnormal dataset, respectively.
1.4 Datasets
We only concentrate on flow-based NIDSs because: 1 - It can detect some special attacks more
efficient and faster than payload based one, since lesser information is needed to be analyzed;
2 - Flow-based anomaly detection methods process only packet headers and reduce data and
processing time for high-speed detection on large networks. It can solve the scalability problem in condition of increasing network usage and load. 3 - Flow-based NIDSs decrease privacy
issues in comparison with packet-based ones because of the absence of payload.
The DARPA-Lincoln datasets: The DARPA-Lincoln datasets were collected by MITs Lincoln laboratory with the purpose of evaluating the performance of different intrusion detection
methodologies.
UT datasets: This data set was captured by monitoring a honeypot hosted in the University
of Twente network. The dataset has three categories: malicious traffic, unknown traffic and
side-effect traffic. Each flow in the datasets has 12 fields.
Netflow datasets: This dataset focuses only on flows to a specific port and a IP address
which receives the most number of attacks. It contains all 129,571 traffics (including attacks)
to and from victims.
7
Chapter 2
COMBINATION OF NEGATIVE
SELECTION AND POSITIVE
SELECTION
2.1 New Positive-Negative Selection Algorithm
Our algorithm first constructs
− r + 1 binary trees (called positive trees) corresponding to
− r + 1 positive r-chunk detector set Dpi , i = 1, . . . , − r + 1. Then, all complete subtrees of
these trees are removed to achieve a compact representation of the positive r-chunk detector
set while maintaining the detection coverage. Finally, for every ith positive trees, we decide
whether or not it should be converted to the negative tree, which covers the negative r-chunk
detector set Dni . The decision depends on which tree is more compact. When this process is
done, we have
− r + 1 compact binary trees that some of them represent positive r-chunk
detectors and the others represent negative ones.
The r-chunk matching rule on binary trees is implemented as follows: a given sample
s matches the positive (negative) tree ith if s[i . . . i + k] is a path from the root to a leaf,
i = 1, . . . , − r + 1, k < r. The detection phase can be conducted by traveling the compact
binary trees iteratively one by one: a sample s is claimed as non-self if it matches a negative
tree or it does not match all positive trees, otherwise it is considered as self.
From the
description of DetectorGeneration, it is trivial to show that it takes |S|( − r + 1).r steps to
generate all necessary trees (detector generation time complexity) and ( − r + 1).r steps to
verify a cell string as self or non-self in the worst case (worse-case detection time complexity).
These time complexities are similar to the state-of-the-art NSAs (PSAs).
Theorem 2.1. Given a self set S and an integer , procedure DetectorGeneration produces
the detector (binary) tree set that have at most total ( − r + 1).2r−2 less number of nodes in
comparison to the detector tree set created by a PSA or NSA only, where r ∈ {2, . . . , − r + 1}.
8
Algorithm 2.1 Detector Generation Algorithm.
1: procedure DetectorGeneration(S, r, T )
Input: A set of self strings S ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A set T of − r + 1 prefix trees presenting all r-chunk detectors.
2:
T =∅
3:
for i = 1, ..., − r + 1 do
4:
create an empty prefix tree Ti
5:
for all s ∈ S do
6:
insert every s[i . . . i + r − 1] into Ti
7:
for all internal node n ∈ Ti do
8:
if n is root of complete binary subtree then
9:
delete this subtree
10:
if (number of leaves of Ti ) > (number of nodes of Ti that have only one child) then
11:
for all internal node ∈ Ti do
12:
if it has only one child then
13:
if the child is a leaf then
14:
delete the child
15:
create the other child for it
16:
mark Ti as a negative tree
17:
T = T ∪ {Ti }
Algorithm 2.2 Positive-Negative Selection Algorithm.
1: procedure PNSA(T , r, s)
Input: A set T of − r + 1 prefix trees presenting all r-chunk detectors, a matching
threshold r ∈ {1, . . . , }, an unlabeled string s ∈ Σ .
Output: A label of s (as self or nonself).
2:
f lag = true
A temporary boolean variable
3:
i=1
4:
while (i ≤ − r + 1) and (f lag = true) do
5:
if (Ti is positive tree) and (s ∈
/ Ti ) then
6:
f lag = false
7:
if (Ti is negative tree) and (s ∈ Ti ) then
8:
f lag = false
9:
i=i+1
10:
if f lag = false then
11:
output s is nonself
12:
else
13:
output s is self
2.2 Experiments
Table 2.1 shows the results on detector memory storage and detection time of PNSA compared
to one of the popular single NSAs on some combinations of S,
and r. We have conducted
another experiment by choosing = 40, |S| = 20,000 (S is the set of randomly generated binary
strings of length ) and varying r (from 15 to 40). Then, −r +1 trees were created using single
NSA and other
− r + 1 compact trees were created using PNSA. Next, both detector sets
were used to detect every s ∈ S. Next experiment is conducted on Netflow dataset. Table 2.2
9
Table 2.1: Comparison of memory and detection time reductions.
S
1,000
2,000
2,000
2,000
50
30
40
50
r Memory (%)
12
0
15
2.5
17
25.9
20
36.3
Time (%)
0
5
42.7
50
shows some of the experiment steps. The percentage of node reduction is in the final column.
Table 2.2: Comparison of nodes generation on Netflow dataset.
r
5
10
15
20
25
30
35
40
45
NSA
727
33,461
1,342,517
9,428,132
18,997,102
29,668,240
42,596,987
58,546,497
79,034,135
PNSA
Reduction(%)
706
2.89
31,609
5.53
1,154,427
14.01
6,157,766
34.68
11,298,739
40.52
17,080,784
42.42
24,072,401
43.48
32,841,794
43.90
44,194,012
44.08
10
Chapter 3
GENERATION OF COMPACT
DETECTOR SET
3.1 New negative selection algorithm
Given a non-empty set S of self strings of length , and an integer r ∈ {1, . . . , − r + 1}, this
section presents a new NSA bases on rcbvl matching rule. Some prefix trees are first used to
generate perfect detectors set from S and then this set is used to distinguish if a new sample
as self or non-self.
Algorithm 1 summarizes the first phase of new NSA. From the description of the algorithm, it takes |S|.( − r + 1).r steps to generate ( − r + 1) prefix trees and |D|.( − r + 1).2r
steps to generate perfect detector set D.
Example 3.1. Given , r and the set of self strings S as in Example 1.1, S = {s1 = 010101, s2
= 111010, s3 = 101101, s4 = 100011, s5 = 010111}. Some steps in the Algorithm 1 generating
a perfect detector set {(0001,1,2), (00100,1,4), (100,4,4), (011110,1,4), (11000,1,3)} are:
Set D is first created as (00,1,1), (011,1,1), (110,1,1). Then the for loop (lines 13-29) calculates
D and D1 as following:
For i = 2: D = (0001,1,2); (0010,1,2); (0111,1,2); (1100,1,2) and D1 = ∅.
For i = 3: D = (00100,1,3); (01111,1,3); (11000,1,3) and D1 = (0001,1,2).
For i = 4: D = (00100,1,4); (011110,1,4) and D1 = (0001,1,2); (11000,1,3); (100,4,4).
The final step, D = D
D1 in line 30, generates the perfect detector set {(0001,1,2), (00100,1,4),
(100,4,4), (011110,1,4), (11000,1,3)}.
To detect if a given string s is self or non-self, we simply check our rcbvl matching rule
on s against every detector in D. If it is the case, output s is nonself, otherwise s is self.
3.2 Experiments
The flow-based NetFlow is used for experiment 1. A randomly created dataset is used for
experiment 2. Flows in NetFlow are converted into binary strings by two steps. The first
step is to map all features to binary string features. After this step, a total string features
are constructed for both normal data and anomalous one. The second step is to concatenate
11
Algorithm 3.1 Algorithm to generate perfect rcbvl detector set
1: procedure GenerationDetectors(S, , r, D)
2:
for i = 1, ..., − r + 1 do Create an empty prefix tree Ti
3:
for all s ∈ S do
4:
for i = 1, ..., − r + 1 do
5:
insert every s[i . . . i + r − 1] into Ti
6:
for i = 1, ..., − r + 1 do
7:
for all non-leaf node n ∈ Ti and all σ ∈ Σ do
8:
if no edge with label σ starts at n then
9:
create a new leaf n and an edge (n, n ) labeled with σ.
10:
delete every node n ∈ Ti from which none of the newly created leaves is reachable.
11:
D1 = ∅
12:
D = { (s, 1, 1)|s ∈ T1 }
13:
for i = 2, ..., − r + 1 do
14:
D2 = ∅
15:
for all (s, k, j) ∈ D do
16:
if there exists a s ∈ Ti where s[i − k + 1...|s|] is prefix of it then
17:
D2 = D2 {(s + s [|s| − j + k...|s |], k, i)}
18:
delete every node n ∈ Ti from which only nodes in the s is reachable
19:
for all s ∈ Ti where s[i − k + 1...|s|] is prefix of it do
20:
if |s| − i + k < r then
21:
D2 = D2 {(s[|s|] + s , i − 1, i)}
22:
else
23:
D2 = D2 {(s , i, i)}
24:
delete every node n ∈ Ti from which only nodes in the s is reachable
25:
else
26:
D1 = D1 {(s, k, j)}
27:
for all s ∈ Ti do
28:
D2 = D2 {(s , i, i)}
29:
D = D2
30:
D = D D1
12
the binary string features for every flows. After this step, dataset contains binary strings with
their length of 49. The distributions of training and testing datasets as well as parameters r,
for 4 experiments are described in Table 3.1.
Table 3.1: Data and parameters distribution for experiments and results comparison.
r
Train
Test
49 10 119,571 10,000
49 8 79,571 50,000
30 12
30 14
25,000
40,000
25,000
10,000
Size (in
r-chunk
Case 1
206,810
31,672
Case 2
367,092
2,324,056
bits)
rcbvl
Time (in minutes)
r-chunk
rcbvl
42,704
8,096
1
0.8
0.2
0.18
79,222
392,815
4.1
8.65
0.75
1.4
Results in Table 3.1 show that our proposed algorithm reduce both size (bits) of detectors
and time to classify testing dataset in comparison with that of best algorithm proposed in 2004.
13
Chapter 4
FAST SELECTION ALGORITHMS
4.1 A fast negative selection algorithm based on r-chunk detector.
Table 4.1 shows the comparison of our results with the runtimes of previously published
r-chunk detector-based algorithms. All runtimes for training phase and classification phase
are given for a binary alphabet (|Σ| = 2) since not all algorithms are applicable to arbitrary
alphabets. The parameter K = min{|S[i..i+r−1]|, i = 1, . . . , −r+1}. The parameter |D|, the
desired number of detectors, is only applicable to algorithms that generate detectors explicitly.
Our algorithm and algorithm in by Textor produce the results that would be obtained with
the maximal number of generated detectors. We proposed a NSA that produces a substantial
performance improvement in training phase compared to the most effective published NSA by
Textor.
Table 4.1: Comparison of our results with the runtimes of previously published algorithms.
Matching rules
r-chunk
r-contiguous
Algorithms
Training
r
Stibor et al.
(2 + |S|)( − r + 1)
Elberfeld, Textor
r2 |S|( − r)
Elberfeld, Textor
|S| r
Present thesis
|S|
D’haeseleer et al. (linear)
(2r + |S|)( − r)
D’haeseleer et al. (greedy)
2r |S|( − r)
Wierzcho´
n
2r (|D|( − r) + |S|)
Elberfeld, Textor (2009)
|S|3 3 r3
Elberfeld, Textor (2011)
|S| r
Present thesis
|S| + (2r − K).
Classification
|D|
|S| 2 r
|D|
|D|
|D|
|S|2 3 r3
r
We use the following notations: An array Q, where Q[s][c] is a pointer used for creating
new nodes in the tree, s ∈ Σr−1 , and c ∈ Σ; An array P, where P [i] is a structure of two fields,
a pointer P[i].end and a string P[i].str ∈ Σr−1 .
Algorithm 4.3 summarizes the overall of proposed Chunk-NSA. In the algorithm, we use
a self set S, an integer r ∈ {1, . . . , − r + 1}, a cell string s to be detected, a prefix DAG G.
14
Algorithm 4.1 To generate positive r-chunk detectors set
1: procedure PositiveR-chunkDetector(S, r, G)
Input: A set of self strings S ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A prefix DAG G presenting positive r-chunk detectors set.
2:
G=∅
3:
for i = 1, ..., |S| do
4:
insert si [1 . . . r] into G and assign P [i].end to the leaf node in path s[1 . . . r]
5:
P [i].str = s[2 . . . r]
6:
for i = r + 1, ..., do
7:
for j = 1, ..., |S| do
8:
if Q[P [j].str][sj [i]] = N U LL then
9:
Q[P [j].str][sj [i]] = new()
10:
for j = 1, ..., |S| do
11:
p = P [j].end
12:
for c ∈ Σ do
13:
if (Q[P [j].str][c] = N U LL)&&(edge starts from p with label c does
not exist) then
14:
create an edge starts from p with label c to Q[P [j].str][c]
15:
P [j].end = end node of the edge starts from p with label sj [i]
16:
for j = 1, ..., |S| do
17:
Q[P [j].str][sj [i]] = N U LL
18:
P [j].str = P [j].str[2...r − 1] + sj [i]
Chunk-NSA will detect s as self or nonself.
Theorem 4.1. Given any S ⊆ Σ , s ∈ Σ and r ∈ {1, . . . , }, algorithm Chunk-NSA constructs
an automaton M with L(M ) ∩ Σ = L(CHU N K(S, r)) in time O(|S|. .|Σ|) and checks if
s ∈ L(M ) in time O( ).
4.2 A fast negative selection algorithm based on r-contiguous detector
Two arrays P and Q are used in this section as the same meaning in the previous section.
Besides, we use two set P1 and P2 of pointers for chunk detectors. They swap their roles at
the end of each step.
Theorem 4.2. There exists an algorithm that, given any S ⊆ Σ and r ∈ {1, . . . , }, constructs
a finite automaton M with L(M ) ∩ Σ = CON T (S, r) in time O(|S|. .|Σ| + (Σr − K). ), where
K= min{|S[i..i + r − 1]|, i = 1, . . . , − r + 1}.
4.3 Experiments
In our experiments, we use the NetFlow and a random datasets for experiment 1 and experiment 2, respectively. In Table 4.2, the runtime of NSA by Textor (2011) on experiment 1 and
experiment 2 are in columns a and c, respectively. Also, the runtime of proposed Chunk-NSA
on experiment 1 and experiment 2 are in columns b and d, respectively.
15
Algorithm 4.2 To generate negative r-chunk detectors set
1: procedure NegativeR-chunkDetector(S, r, G)
Input: A set of self strings S ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A prefix DAG G presenting negative r-chunk detectors set.
2:
PositiveR-chunkDetector(S, r, G)
3:
create a special node n
4:
for each non-leaf node n ∈ G do
5:
for c ∈ Σ do
6:
if no edge with label c starts at n then
7:
create new edge (n, n ) labeled with c
8:
for each node n ∈ G do
9:
if n is not reachable to n then
10:
delete n
Algorithm 4.3 A fast r-chunk detector-based NSA
1: procedure Chunk-NSA(s, S, r, G, M )
Input: A set of self strings S ⊆ Σ , an unlabeled string s ∈ Σ , a matching threshold
r ∈ {1, . . . , }.
Output: A prefix DAG G presenting negative r-chunk detectors set, an automaton M , a
label of s (self or nonself).
2:
NegativeR-chunkDetector(S, r, G)
3:
turn G into an automaton M
4:
if s ∈ L(M ) then
5:
output s is nonself
6:
else
7:
output s is self
Table 4.2: Comparison of Chunk-NSA with r-chunk detector-based NSA.
r
10
11
12
13
14
15
16
17
18
19
20
Experiment 1
a
b a/b
1,330 454 2.9
1,395 439 3.2
1,564 454 3.4
1,767 435 4.1
1,771 418 4.2
2,092 486 4.3
1,985 437 4.5
2,071 391 5.3
2,249 410 5.5
2,345 375 6.3
2,859 359 7.0
Experiment 2
c
d
c/d
1,490 482 3.1
1,633 472 3.5
637 360 4.5
2,134 453 4.7
2,276 451 5.0
2,793 450 6.2
3,086 365 8.5
4,079 427 9.6
4,509 422 10.7
5,312 470 11.3
6,796 437 15.6
16
Algorithm 4.4 Algorithm to generate negative r-contiguous detectors set
1: procedure Cont-NSA(S, r, G)
Input: A set of self strings S ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: An automaton G presenting negative r-contiguous detectors set.
2:
G = P1 = ∅
3:
for s ∈ Σr \ S[1 . . . r] do
4:
insert s into G and create p.end that points to the leaf node in path s
5:
p.str = s[2 . . . r]
6:
P1 = P1 ∪ {p}
7:
for i = r + 1, ..., do
8:
for j = 1, ..., |S| do
9:
if Q[P [j].str][sj [i]] = N U LL then
10:
Q[P [j].str][sj [i]] = new()
11:
P2 = ∅
12:
for each p ∈ P1 do
13:
for c ∈ Σ do
14:
if (Q[p.str][c] = N U LL) then
15:
Q[p.str][c] = new()
16:
p.str = p.str[2...r − 1] + c
17:
p.end = Q[p.str][c]
18:
P2 = P2 ∪ {p}
19:
else
20:
if Q[p.str][c] is newly created in this inner for loop then
21:
p.str = p.str[2...r − 1] + c
22:
p.end = Q[p.str][c]
23:
P2 = P2 ∪ {p}
24:
for j = 1, ..., |S| do
25:
Q[P [j].str][sj [i]] = N U LL
26:
P [j].str = P [j].str[2...r − 1] + sj [i]
27:
for each p ∈ P1 do
28:
for c ∈ Σ do
29:
Q[p.str][c] = N U LL
30:
P1 = P2
31:
for each node n ∈ G do
32:
if n is not reachable to a leaf node ∈ P1 then
33:
delete n
34:
turn G into an automaton
17
Chapter 5. APPLYING HYBRID ARTIFICIAL IMMUNE
SYSTEM FOR NETWORK SECURITY
5.1 Hybrid Positive Selection Algorithm With Chunk Detectors
Given , r, a normal data set N ⊂ Σ , an abnormal data set A ⊂ Σ . Algorithm 4.5 and
Algorithm 4.6 create positive trees and negative trees, respectively. A new data instance
s ∈ Σ is detected as self or nonself by Algorithm 4.7.
Algorithm 4.5 Algorithm to generate positive trees
1: procedure SelfTreesGeneration(N , r, TN )
Input: A set of self strings N ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A set TN of − r + 1 prefix trees presenting positive trees.
2:
for i = 1, ..., do
3:
Create an empty tree TNi
4:
for all s ∈ Nr do
5:
for i = 1, ..., do
6:
insert every s[i . . . i + r − 1] into TNi
Algorithm 4.6 Algorithm to generate negative trees
1: procedure NonselfTreesGeneration(A, r, TA )
Input: A set of nonself strings A ⊆ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A set TA of − r + 1 prefix trees presenting negative trees.
2:
for i = 1, ..., do
3:
Create an empty tree TAi
4:
for all s ∈ Ar do
5:
for i = 1, ..., do
6:
insert every s[i . . . i + r − 1] into TAi
In Algorithm 4.7, Leaf(s, T ) is a function to return frequency value corresponding with a
string s ∈ Σr , this value is contained in the leaf of the path labeled by s in tree T . Parameters
d1 and d2 sum up frequencies of s in positive trees TN and negative trees TA , respectively.
Four other parameters t1 , t2 , t3 , t4 are also used in PSA2 to control its performance.
18
Algorithm 4.7 Algorithm PSA2 to detect if a new data instance s ∈ Σ is self or nonself
1: procedure PSA2(N , A, s, r, TN , TA )
Input: A set of nonself strings N ⊆ Σ , a set of self strings A ⊆ Σ , an unlabeled string
s ∈ Σ , a matching threshold r ∈ {1, . . . , }.
Output: A set TA of − r + 1 prefix trees presenting negative trees, a set TN of − r + 1
prefix trees presenting positive trees, a label of s (self or nonself).
2:
SelfTreesGeneration(N , r, TN )
3:
NonselfTreesGeneration(A, r, TA )
4:
d1 = d2 = d3 = 0
5:
Create a string sr as ring representation of the string s
6:
for i = 1, ..., do
7:
s = sr [i . . . i + r − 1]
8:
if s ∈
/ TNi then
9:
d1 = d1 + 1
10:
if s ∈
/ TAi then
11:
d2 = d2 + 1
12:
if Leaf(s , TNi ) < Leaf(s , TAi ).t1 then
13:
d3 = d3 + 1
14:
if d1 > t2 then output s is nonself
15:
else if d2 > t3 then output s is self
16:
else if d3 .t4 > then output s is nonself
17:
else output s is self
5.2 Experiment
5.2.1 Datasets
We use two popular flow-based datasets: NetFlow and TU. Similar to the previous studies, we
select the same 4 features from the NetFlow dataset as the input of experiments 1, 5, 6, and
7; 7 features from the NetFlow dataset as the input of experiments 2 and 3; 4 features from
the UT dataset as the input of experiment 4.
Table 4.3: Features for NIDS.
Experiment Dataset Feature
1
NetFlow Packets, Octets, Duration, Flags
2, 3
NetFlow Packets, Octets, Duration, Scr port,
Dst port, Flags, IP protocol
4
UT
Packets, Octets, Duration, Flags
5, 6, 7
NetFlow Packets, Octets, Duration, Flags
5.2.2 Data preprocessing
The preprocessing for the features of training dataset is composed of two steps. The first step
is to map all features to binary string features. After this step, a total string features are
constructed for both normal data and anomalous one. The second step is to concatenate the
binary string features for every flows. After this step, training dataset contains equaled-length
binary strings. For testing dataset, each feature will be convert to a binary string using the
19
map of training data. If a feature value is not in range of corresponding training feature range,
its binary form will be selected randomly.
Example 4.1. Given a training dataset contain numerical two-feature flows S = {N, A},
where set of normal data N = {n1 = (1; 8); n2 = (0.5; 6)} and anomalous data A = {a1 =
(1; 9); a2 = (1; 6), a3 = (0.5; 9)}.
The ranges of the first feature and the second one are {0.5; 1} and {6; 8; 9}, respectively.
Therefore, we can use one bit (two bits) to encode the first (the second) feature. A map of the
numeric ranges {0.5; 1} and {6; 8; 9} can be binary string ranges {0; 1} and {00; 01; 10},
respectively. Finally, the training dataset contains strings of length 3, S = {N, A}, where N
= {n1 = 101; n2 = 000} and A = {a1 = 110; a2 = 100, a3 = 010}.
Assuming a flow s = (2; 8) is in testing dataset. The first feature (2 ∈
/ {0.5; 1}) is
randomly encoded by 0 or 1 while the second feature (8 ∈ {6; 8; 9}) is encoded by 01. The
binary form of s is 001 or 101.
5.2.3 Performance metrics
We used a 10-fold cross-validation technique to evaluate our approach in Experiments 1 and
4. In Experiments 2, 3 and 5-7, a hold-out validation is used to evaluate our approach with
subsets of Netflow dataset as described in Table 4.4.
Table 4.4: Distribution of flows and parameters for experiments.
Experiment Normal flows
1
105,033
2
59,980 for training,
45,053 for testing
3
59,980 for training,
45,053 for testing
4
5,968
5
70,022 for training,
35,011 for testing
6
52,516 for training,
52,517 for testing
7
35,011 for training,
70,022 for testing
Attack flows
24,538
5,952 for training,
18,586 for testing
5,952 for training,
18,586 for testing
10,000
16,359 for training,
8,179 for testing
12,269 for training,
12,269 for testing
8,179 for training,
16,359 for testing
49
63
r t1
10 5
10 7
t2
7
9
t3
8
14
t4
9
7
53
9
7
12 16
8
41
49
7
14
1
3
7 2
37 11
4
4
48
12
4
19
7
5
46
11
2
30
8
3
Algorithm PSA2 requires the 5 integer-valued parameters r, t1 , t2 , t3 , t4 , which control
the detection performance of NIDS. To obtain optimal values of parameters for a given dataset,
we first predict the possible value ranges for them. Then we test with every quintuple (r, t1 , t2 ,
t3 , t4 ) in these ranges on a randomly sampled subset, usually one-tenth, of the testing dataset.
Finally, we retest the system on the whole testing dataset in localities of parameters in the
chosen quintuple. Regarding to 10-fold cross-validation, the chosen quintuple in the fold 1 is
20
used to estimate the others folds to reduce training time. Additionally, value of is determined
by encoding technique in data preprocessing. Distribution of parameter values and a number
of flows for all four experiments are showed in Table 4.4.
Since it is challenging to decide which objective function is the best to choose an optimal
quintuple. More specifically, optimization problem in this research is a multi-objective optimization problem: we aim to increase the DR while reducing the FAR at the same time. We
select our objective function to reach the highest DR while keeping FAR as low as possible.
5.2.4. Performance
Table 4.5 illustrates the results of all experiments. We compare PSA2 performance with some
other algorithms in most recently published works and some ones run with the help of WEKA
tool (Version 3.6).
21
Table 4.5: Comparison between PSA2 and other algorithms.
Algorithms
ACC
DR
FAR
0.9879
0.9682
0.9273
0.9344
0.9468
0.9614
0.6777
0.9943
0.9941
0.9977
0.9992
0.9992
0.9975
0.9974
0.9974
0.9975
0.9976
0.9977
0.0146
0.0514
0.1223
0.1071
0.0839
0.0871
0.5116
0.0064
0.0067
0.9896
0.9812
0.9796
0.9763
0.9672
0.9655
0.9309
0.7596
0.9676
0.9645
0.9622
0.9564
0.9369
Experiment 1
PSA2
BBNN
SVM (linear)
SVM (polynomial)
SVM (RBF)
SVM (sigmoid)
Naive Bayes
Random Forest
Random Tree
Experiment 2
PSA2
PSOGSA-based MLP
Cuckoo-based MLP
GSA-based MLP
PSO-based MLP
EBP-based MLP
Random Forest
Random Tree
0.0013
0.0119
0.0133
0.0155
0.0203
0.0315
0.7634
0
0.1771 0.0001
Experiment 3
PSA2 (10% labelled data)
S4VM (10% labelled, 90% unlabelled data)
Random Forest (10% labelled data)
Random Tree (10% labelled data)
Experiment 4
PSA2
Naive Bayes
SVM (linear)
SVM (polynomial)
SVM (RBF)
SVM (sigmoid)
Random Forest
Random Tree
Experiment 5
PSA2
Random Forest
Experiment 6
PSA2
Random Forest
Experiment 7
PSA2
Random Forest
0.9781 0.9626 0.0157
0.9196
0.0384
0.7157 0.0421 0.0063
0.7151 0.0400 0.0065
0.9757
0.6832
0.7315
0.7143
0.6263
0.6263
0.9777
0.9766
0.9634
0.9972
1.0000
0.8788
1.0000
1.0000
0.9675
0.9664
0.0035
0.8668
0.7185
0.5613
0.9998
1.0000
0.0052
0.0062
0.9845 0.9979 0.0186
0.9818 0.9997 0.0868
0.9961 0.9915 0.0028
0.9870 0.9998 0.0636
0.9847 0.9979 0.0183
0.9901 0.9999 0.0490
The results of Experiment 1 show that PSA2 reaches best performance when we set
quintuple values (10, 5, 7, 8, 9) for parameters quintuple (r, t1 , t2 , t3 , t4 ), in which the
22
remarkable FAR is 1.46%, and ACC is 98.79%. Moreover, DR is ranked the second with
99.77%. BBNN and SVM (linear) have the highest DR, but their FAR are not comparable
with that of PSA2. Both Random Forest and Random Tree perform excellently with highest
ACC, admirable DR and genuine FAR.
Results of Experiment 2 show that PSA2 outperforms four algorithms proposed by Jadidi
(2013) in terms of three metrics DR, ACC and FAR. Also, PSA2 is better than both Random
Forest and Random Tree in terms of ACC and DR.
PSA2 can not train with unlabelled data, so we only use 10% labelled dataset, 5998 attack
flows and 595 normal flows, for training phase in Experiment 3. Meanwhile algorithm S4VM
uses 100% training dataset for training, in which 90% unlabelled dataset and 10% labelled
dataset. Results from the experiment strongly confirm the efficiency of PSA2 in comparison
with methods proposed by Jadidi (2015). Both Random Forest and Random Tree perform
badly with very low ACC and DR.
In Experiment 4, PSA2 is competitive with both Random Forest and Random Tree.
Some SVMs achieved admirable accuracies with 100%, but their FARs are very high with
approximately or even 100%. Poor FARs of these algorithms mean that they can not verify
the nature of benign traffic in the dataset, while PSA2, Random Forest and Random Tree can.
In experiments 5-7, ACCs and DRs of two algorithms PSA2 and Random Forest are similar.
However, FARs of PSA2 are better (far lower) than that of Random Forest.
In all experiments, FAR of PSA2 is always the lowest among that of compared algorithms
except for Random Forest and Random Tree in experiments 1-3. This result partly proves the
consistency of PSA2 for NIDS.
Two drawbacks of PSA2 are listed: 1- It must store a map to encode each new sample
into a binary string in detection phase. The size of the map depend on the distribution of flow
features’ domains. However, all four experiments show that map storage does not exceed 1%
size of training dataset. 2- Time to tune parameters is an expensive factor in PSA2. Depending
on training data, it takes about 2-5 hours to choose optimal parameters in the experiments.
23
CONCLUSIONS
Contributions of this thesis
1. Propose a ring representation of data instead of linear one for better performance in
terms of both detection rate and accuracy rate.
2. Propose an algorithm PNSA that combine two selection algorithms in a uniform for
compact representation of data. Performance of the algorithm is highly guaranteed by
experiment result and theoretical proof.
3. Propose a negative selection algorithm with variable length of detectors to generate a
complete a and non-redundant detector sets to reduce detectors storage and classification
time.
4. Propose a algorithm Chunk-NSA and experimentally and theoretically prove that it is r
times faster in data training compared with a recently published algorithm.
5. Propose an algorithm, called PSA2, to apply a hybrid algorithm that combines PSA
and some statistical approaches to achieve better performance of intrusion detection in
compared with some recently published work.
Future work
• Combine our algorithms with some machine learning methods to have better detection
performance.
• Further develop technique that can choose optimal parameters and integrate them in
new objective functions for hybrid NIDS.
• Improve proposed algorithms to apply them on other data types with different data
representations, matching rule is also a future research direction.
• Further optimize Cont-NSA for better detection time and optimal training time.
Published work
A1. N. V. Truong and P. D. Lam, “Improving negative selection algorithm in artificial immune
systems for computer virus detection,” Journal of Science and Technology, Thai Nguyen
University, 72(06):53–58, 2010.
A2. N. V. Truong, V. D. Quang and T. V. Ha, “A fast r-chunk detector-based negative selection
algorithm,” Journal of Science and Technology, Thai Nguyen University,90(02):55–58, 2012.