MINISTRY OF EDUCATION
AND TRAINING

VIETNAMESE ACADEMY
OF SCIENCE AND TECHNOLOGY

GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
————————————

NGUYEN VAN TRUONG

IMPROVING SOME ARTIFICIAL IMMUNE
ALGORITHMS FOR NETWORK INTRUSION
DETECTION

THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
IN MATHEMATICS

Hanoi - 2019


Major: Mathematical foundations for Informatics
Code: 62 46 01 10
Scientific supervisors:
1. Assoc. Prof., Dr. Nguyen Xuan Hoai
2. Assoc. Prof., Dr. Luong Chi Mai

Hanoi - 2019


Acknowledgments
First of all, I would like to thank my principal supervisor, Assoc. Prof.,
Dr. Nguyen Xuan Hoai, for introducing me to the field of Artificial Immune Systems.
He guided me step by step through research activities such as seminar presentations
and paper writing. His genius has been a constant source of help, and I have been
inspired by his constructive criticism throughout my PhD journey. I also wish to thank my
co-supervisor, Assoc. Prof., Dr. Luong Chi Mai. She was always very enthusiastic in
our discussions of promising research questions. It has been a pleasure and a luxury for me to work
with her. This thesis would not have been possible without my supervisors’ support.
I gratefully acknowledge the support of the Institute of Information Technology,
Vietnamese Academy of Science and Technology, and of Thai Nguyen University
of Education. I am grateful for the financial support from the National Foundation for Science
and Technology Development (NAFOSTED) and the ASEAN-European Academic University
Network (ASEA-UNINET).

I thank M.Sc. Vu Duc Quang, M.Sc. Trinh Van Ha and M.Sc. Pham Dinh
Lam, the co-authors of my published papers. I thank Assoc. Prof., Dr. Tran Quang
Anh and Dr. Nguyen Quang Uy for many helpful insights into my research. I thank
my colleagues at the IT Research & Development Center, Hanoi University,
especially my cool labmate Mr. Nguyen Tran Dinh Long.
Finally, I thank my family for their endless love and steady support.


Certificate of Originality
I hereby declare that this submission is my own work, carried out under my scientific supervisors, Assoc. Prof., Dr. Nguyen Xuan Hoai and Assoc. Prof., Dr. Luong Chi Mai. I
declare that it contains no material previously published or written by another person,
except where due reference is made in the text of the thesis. In addition, I certify that
all my co-authors allow me to present our work in this thesis.

Hanoi, 2019
PhD student

Nguyen Van Truong


Contents

List of Figures
List of Tables
Notation and Abbreviation

INTRODUCTION
    Motivation
    Objectives
    Problem statements
    Outline of thesis

1 BACKGROUND
    1.1 Detection of Network Anomalies
        1.1.1 Host-Based IDS
        1.1.2 Network-Based IDS
        1.1.3 Methods
        1.1.4 Tools
    1.2 A brief overview of human immune system
    1.3 AIS for IDS
        1.3.1 AIS model for IDS
        1.3.2 AIS features for IDS
    1.4 Selection algorithms
        1.4.1 Negative Selection Algorithms
        1.4.2 Positive Selection Algorithms
    1.5 Basic terms and definitions
        1.5.1 Strings, substrings and languages
        1.5.2 Prefix trees, prefix DAGs and automata
        1.5.3 Detectors
        1.5.4 Detection in r-chunk detector-based positive selection
        1.5.5 Holes
        1.5.6 Performance metrics
        1.5.7 Ring representation of data
        1.5.8 Frequency trees
    1.6 Datasets
        1.6.1 The DARPA-Lincoln datasets
        1.6.2 UT dataset
        1.6.3 Netflow dataset
        1.6.4 Discussions
    1.7 Summary

2 COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION
    2.1 Introduction
    2.2 Related works
    2.3 New Positive-Negative Selection Algorithm
    2.4 Experiments
    2.5 Summary

3 GENERATION OF COMPACT DETECTOR SET
    3.1 Introduction
    3.2 Related works
    3.3 New negative selection algorithm
        3.3.1 Detectors set generation under rcbvl matching rule
        3.3.2 Detection under rcbvl matching rule
    3.4 Experiments
    3.5 Summary

4 FAST SELECTION ALGORITHMS
    4.1 Introduction
    4.2 Related works
    4.3 A fast negative selection algorithm based on r-chunk detector
    4.4 A fast negative selection algorithm based on r-contiguous detector
    4.5 Experiments
    4.6 Summary

5 APPLYING HYBRID ARTIFICIAL IMMUNE SYSTEM FOR NETWORK SECURITY
    5.1 Introduction
    5.2 Related works
    5.3 Hybrid positive selection algorithm with chunk detectors
    5.4 Experiments
        5.4.1 Datasets
        5.4.2 Data preprocessing
        5.4.3 Performance metrics and parameters
        5.4.4 Performance
    5.5 Summary

CONCLUSIONS
    Contributions of this thesis
    Future works
    Published works

BIBLIOGRAPHY

List of Figures
1.1   Classification of anomaly-based intrusion detection methods
1.2   Multi-layered protection and elimination architecture
1.3   Multi-layer AIS model for IDS
1.4   Outline of a typical negative selection algorithm.
1.5   Outline of a typical positive selection algorithm.
1.6   Example of a prefix tree and a prefix DAG.
1.7   Existence of holes.
1.8   Negative selections with 3-chunk and 3-contiguous detectors.
1.9   A simple ring-based representation (b) of a string (a).
1.10  Frequency trees for all 3-chunk detectors.
2.1   Binary tree representation of the detectors set generated from S.
2.2   Conversion of a positive tree to a negative one.
2.3   Diagram of the Detector Generation Algorithm.
2.4   Diagram of the Positive-Negative Selection Algorithm.
2.5   One node is reduced in a tree: a compact positive tree has 4 nodes (a)
      and its conversion (a negative tree) has 3 nodes (b).
2.6   Detection time of NSA and PNSA.
2.7   Nodes reduction on trees created by PNSA on Netflow dataset.
2.8   Comparison of nodes reduction on Spambase dataset.
3.1   Diagram of an algorithm to generate perfect rcbvl detectors set.
4.1   Diagram of the algorithm to generate positive r-chunk detectors set.
4.2   A prefix DAG G and an automaton M
4.3   Diagram of the algorithm to generate negative r-contiguous detectors set.
4.4   An automaton represents 3-contiguous detectors set.
4.5   Comparison of ratios of runtime of r-chunk detector-based NSA to runtime of Chunk-NSA
4.6   Comparison of ratios of runtime of r-contiguous detector-based NSA to
      runtime of Cont-NSA

List of Tables
1.1   Performance comparison of NSAs on linear strings and ring strings.
2.1   Comparison of memory and detection time reductions.
2.2   Comparison of nodes generation on Netflow dataset.
3.1   Data and parameters distribution for experiments and results comparison.
4.1   Comparison of our results with the runtimes of previously published
      algorithms.
4.2   Comparison of Chunk-NSA with r-chunk detector-based NSA.
4.3   Comparison of proposed Cont-NSA with r-contiguous detector-based NSA.
5.1   Features for NIDS.
5.2   Distribution of flows and parameters for experiments.
5.3   Comparison between PSA2 and other algorithms.
5.4   Comparison between ring string-based PSA2 and linear string-based PSA2.

Notation and Abbreviation

Notation
ℓ                Length of data samples
S_r              Set of ring representations of all strings in S
|X|              Cardinality of set X
Σ                An alphabet, a nonempty and finite set of symbols
Σ^k              Set of all strings of length k on alphabet Σ, where k is a
                 positive integer.
Σ^*              Set of all strings on alphabet Σ, including an empty string.
r                Matching threshold
D_pi             Set of all positive r-chunk detectors at position i.
D_ni             Set of all negative r-chunk detectors at position i.
CHUNK_p(S, r)    Set of all positive r-chunk detectors.
CHUNK(S, r)      Set of all negative r-chunk detectors.
CONT(S, r)       Set of all r-contiguous detectors.
L(X)             Set of all nonself strings detected by X.
rcbvl            r-contiguous bit with variable length.

Abbreviation
AIS              Artificial Immune System
ACC              Accuracy Rate
ACO              Ant Colony Optimization
ANIDS            Anomaly Network Intrusion Detection System
BBNN             Block-Based Neural Network
Chunk-NSA        Chunk Detector-Based Negative Selection Algorithm
Cont-NSA         Contiguous Detector-Based Negative Selection Algorithm
DR               Detection Rate
DAG              Directed Acyclic Graph
FAR              False Alarm Rate
GA               Genetic Algorithm
HIS              Human Immune System
HIDS             Host Intrusion Detection System
IDS              Intrusion Detection System
ML               Machine Learning
MLP              Multilayer Perceptron
NIDS             Network Intrusion Detection System
NS               Negative Selection
NSA              Negative Selection Algorithm
NSM              Negative Selection Mutation
PNSA             Positive-Negative Selection Algorithm
PSA              Positive Selection Algorithm
PSA2             Two-class Positive Selection Algorithm
PSO              Particle Swarm Optimization
PSOGSA           Particle Swarm Optimization-Gravitational Search Algorithm
RNSA             Real-valued NSA
SVM              Support Vector Machines
TCP              Transmission Control Protocol
VNSA             Variable length detector-based NSA


INTRODUCTION
Motivation
Internet users and computer networks are suffering from a rapidly increasing number of attacks. To keep them safe, effective security monitoring systems, such as Intrusion Detection Systems (IDS), are needed. However, intrusion detection
faces a number of problems, such as large network traffic volumes, imbalanced data distribution, difficulty in determining decision boundaries between normal
and abnormal actions, and the need for continuous adaptation to a constantly
changing environment. As a result, many researchers have attempted to use different
types of approaches to build reliable intrusion detection systems.
Computational intelligence techniques, known for their ability to adapt and to
exhibit fault tolerance, high computational speed and resilience against noisy information, are promising alternative approaches to the problem.
One of the promising computational intelligence methods for intrusion detection
that has emerged recently is the artificial immune system (AIS), inspired by the biological immune system. The negative selection algorithm (NSA), a dominant model of AIS,
is widely used for intrusion detection systems (IDS) [55, 52]. Despite its successful
application, NSA has some weaknesses: (1) a high false positive rate (false alarm rate)
and false negative rate; (2) high training and testing time; (3) an exponential relationship
between the size of the training data and the number of detectors possibly generated for
testing; and (4) changeable definitions of "normal data" and "abnormal data" in a dynamic
network environment [55, 79, 92]. To overcome these limitations, recent work has
concentrated on complex structures of immune detectors, matching methods and
hybrid NSAs [11, 94, 52].
Following the trends mentioned above, in this thesis we investigate the ability of
NSA to combine with other classification methods and propose more effective data
representations to address some of NSA’s weaknesses.
Scientific significance of the thesis: to provide further background for improving the performance of AIS-based computer security in particular and IDS in general.
Practical significance of the thesis: to assist computer security practitioners and experts
in implementing their IDS with new features originating from AIS.
The major contributions of this research are: a new representation of
data for better IDS performance; a combination of existing algorithms and
some statistical approaches in a uniform framework; and a complete and
non-redundant detector representation to achieve optimal time and memory complexities.

Objectives
Since data representation is one of the factors that affect training and testing
time, a compact and complete detector generation algorithm is investigated.
The thesis investigates optimal algorithms for generating detector sets in AIS. They
help to reduce both the training time and the detection time of AIS-based IDSs.
The thesis also aims to propose and investigate an AIS-based IDS that can
promptly detect attacks, whether they are known or never seen before. The proposed
system uses AIS combined with statistics as analysis methods and flow-based network
traffic as experimental data.

Problem statements
Since the NSA has some limitations, as listed in the Motivation section, this thesis
concentrates on three problems:
1. The first problem is to find compact representations of data. The objective of
solving this problem is not only to minimize memory storage but also to reduce
testing time.
2. The second problem is to propose algorithms that reduce training time
and testing time compared with all existing related algorithms.
3. The third problem is to improve detection performance by reducing false alarm rates while keeping the detection rate and accuracy rate as high as
possible.
Solutions to these problems can partly remedy the first three weaknesses listed in the
Motivation section. Regarding the last weakness of NSAs, the changeable definitions of
"normal data" and "abnormal data" in a dynamic network environment, we consider it
a risk for our proposed algorithms and leave it for future work.
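The three rates named in the third problem can be made concrete with a small sketch. It assumes the usual confusion-matrix definitions: the detection rate (DR) is the fraction of nonself samples that are flagged, the false alarm rate (FAR) is the fraction of self samples that are flagged, and the accuracy rate (ACC) is the fraction of all samples labeled correctly. The function name and the label strings are illustrative assumptions, not taken from the thesis, which treats performance metrics in Section 1.5.6.

```python
def detection_metrics(true_labels, predicted_labels, positive="nonself"):
    """Return (DR, FAR, ACC) for a binary self/nonself labelling.

    Assumed definitions (standard, not quoted from the thesis):
      DR  = TP / (TP + FN)   nonself samples correctly flagged
      FAR = FP / (FP + TN)   self samples wrongly flagged
      ACC = (TP + TN) / all  samples labelled correctly
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1

    total = tp + fp + tn + fn
    # Guard against empty classes so the sketch never divides by zero.
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    acc = (tp + tn) / total if total else 0.0
    return dr, far, acc


truth = ["nonself", "nonself", "self", "self", "self"]
guess = ["nonself", "self", "self", "nonself", "self"]
dr, far, acc = detection_metrics(truth, guess)
print(dr, far, acc)
```

In this toy run one of the two attacks is caught and one of the three normal flows raises a false alarm, which illustrates why the third problem asks for a lower FAR without giving up DR or ACC.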
Logically, it is impossible to find an optimal algorithm that can both reduce time
and memory complexities and obtain the best detection performance. These aspects are
always in conflict with each other. Thus, in each chapter, we propose algorithms
that address each problem largely independently.

The intrusion detection problem considered in this thesis can be informally
stated as follows:
Given a finite set S of network flows, each labeled as self (normal) or nonself
(abnormal), the objective is to build classification models on S that can correctly label an
unlabeled network flow s.

Outline of thesis
The first chapter introduces the background knowledge necessary to discuss the
algorithms proposed in the following chapters. First, detection of network anomalies is
briefly introduced. Following that, the human immune system, artificial immune systems, machine learning and their relevance are reviewed and discussed. Then, popular
datasets used for experiments in the thesis are examined.
In Chapter 2, a method for combining selection algorithms is presented. The
proposed technique helps to reduce the storage required for detectors generated in the training phase. Testing time, an important measurement in IDS, is also reduced as a direct consequence
of the smaller memory complexity. A tree structure is used in this chapter (and in Chapter
5) to improve time and memory complexities.
A complete and nonredundant detector set, also called a perfect detectors set,
is necessary to achieve acceptable self and nonself coverage of classifiers. A selection
algorithm to generate a perfect detectors set is investigated in Chapter 3. Each detector
in the set is a string concatenated from overlapping classical ones. Unlike the
approaches in the other chapters, the discrete structure of the string-based detectors in this
chapter is suitable for detection in a distributed environment.
Chapter 4 presents two selection algorithms with a fast training phase. The optimal
algorithms can generate a detectors set in time linear in the size of the training
data. The experimental results and theoretical proofs show that the proposed algorithms
outperform all existing ones in terms of training time. In terms of detection time, the
first algorithm is linear and the second is polynomial.

Chapter 5 introduces a hybrid approach that combines a positive selection algorithm
with statistics for a more effective NIDS. Frequencies of self and nonself data (strings) are
stored in the leaves of the trees representing detectors. This information plays an important
role in improving the performance of the proposed algorithms. The hybrid approach results
in a new positive selection algorithm for two-class classification that can be trained
with samples from both self and nonself data types.



Chapter 1
BACKGROUND
The human immune system (HIS) has successfully protected our bodies against
attacks from various harmful pathogens, such as bacteria, viruses, and parasites. It
distinguishes pathogens from self-tissue, and further eliminates these pathogens. This
provides a rich source of inspiration for computer security systems, especially intrusion
detection systems [92]. Hence, applying theoretical immunology and observed immune
functions, its principles, and its models to IDS has gradually developed into a new
research field, called artificial immune system (AIS).
How to apply the remarkable features of the HIS to achieve a scalable and robust IDS
is considered a research gap in the field of computer security. In this chapter, we
introduce the background knowledge necessary to discuss the algorithms proposed in
the following chapters, which partly fill this gap.
Firstly, a brief introduction to network anomaly detection is presented. We
then give an overview of the HIS. Next, immune selection algorithms, detectors, performance metrics
and their relevance are reviewed and discussed. Finally, some popular datasets are
examined.

1.1 Detection of Network Anomalies
The idea of intrusion detection is predicated on the belief that an intruder’s
behavior is noticeably different from that of a legitimate user and that many unauthorized actions are detectable [65]. Intrusion detection systems (IDSs) are deployed as a
second line of defense along with other preventive security mechanisms, such as user
authentication and access control. Based on its deployment, an IDS can act either as
a host-based or as a network-based IDS.

1.1.1 Host-Based IDS
A Host-Based IDS (HIDS) monitors and analyzes the internals of a computing
system. A HIDS may detect internal activity such as which program accesses what
resources and attempts illegitimate access, for example, an activity that modifies the
system password database. Similarly, a HIDS may look at the state of a system and
its stored information whether it is in RAM or in the file system or in log files or
elsewhere. Thus, one can think of a HIDS as an agent that monitors whether anything
or anyone internal or external has circumvented the security policy that the operating
system tries to enforce [12].

1.1.2 Network-Based IDS
A Network-Based IDS (NIDS) detects intrusions in network data. Intrusions
typically occur as anomalous patterns. Most techniques model the data in a sequential
fashion and detect anomalous subsequences. The primary cause of these anomalies
is attacks launched by outside attackers who want to gain unauthorized access to
the network in order to steal information or to disrupt the network. In a typical setting, a
network is connected to the rest of the world through the Internet. The NIDS reads
all incoming packets or flows, trying to find suspicious patterns. For example, if a
large number of TCP connection requests to a very large number of different ports are
observed within a short time, one could assume that someone is performing a
port scan against some of the computers in the network. A NIDS also tries to detect
incoming shellcode in the same manner as an ordinary intrusion detection system.
In addition to inspecting incoming traffic, a NIDS also provides valuable
information about intrusions from outgoing or local traffic. Some attacks might even be
staged from inside the monitored network or network segment and are therefore not
regarded as incoming traffic at all. The data available to intrusion detection systems
can be at different levels of granularity, such as packet-level traces or Cisco NetFlow data.
The data is typically high dimensional, with a mix of categorical and continuous
numeric attributes. Misuse-based NIDSs search for known intrusive patterns,
while anomaly-based NIDSs search for unusual patterns. Today,
intrusion detection research concentrates mostly on anomaly-based network intrusion
detection because it can detect both known and unknown attacks [12].
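The port-scan example in this subsection can be sketched as a simple sliding-window heuristic. This is only an illustration of the idea of "many different ports within a short time"; the flow tuple layout, the function name, the 10-second window and the threshold of 100 ports are all assumptions made for the sketch and are not taken from the thesis or from any particular NIDS.

```python
from collections import defaultdict


def find_port_scanners(flows, window_seconds=10.0, port_threshold=100):
    """Return the source IPs that touched many distinct ports in a short time.

    flows: iterable of (timestamp, src_ip, dst_ip, dst_port) tuples
           (a hypothetical, simplified flow record).
    A source is flagged when at least `port_threshold` distinct destination
    ports are seen from it inside any window of `window_seconds`.
    """
    by_source = defaultdict(list)
    for timestamp, src_ip, _dst_ip, dst_port in flows:
        by_source[src_ip].append((timestamp, dst_port))

    scanners = set()
    for src_ip, events in by_source.items():
        events.sort()                    # order by time
        ports_in_window = defaultdict(int)
        start = 0                        # left edge of the sliding window
        for timestamp, dst_port in events:
            ports_in_window[dst_port] += 1
            # Drop events that fell out of the time window.
            while timestamp - events[start][0] > window_seconds:
                old_port = events[start][1]
                ports_in_window[old_port] -= 1
                if ports_in_window[old_port] == 0:
                    del ports_in_window[old_port]
                start += 1
            if len(ports_in_window) >= port_threshold:
                scanners.add(src_ip)
                break
    return scanners


# A fast scan over 150 ports versus a client that keeps using port 80.
flows = [(0.01 * i, "10.0.0.9", "10.0.0.1", 1000 + i) for i in range(150)]
flows += [(float(i), "10.0.0.5", "10.0.0.1", 80) for i in range(150)]
print(find_port_scanners(flows))
```

A fixed rule like this is closer to misuse detection; the anomaly-based methods studied in the thesis instead learn what normal flows look like and flag deviations.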

1.1.3 Methods
On the basis of the availability of prior knowledge, the detection mechanism
used, the mode of performance and the ability to detect attacks, existing anomaly
detection methods are categorized into six broad categories [41] as shown in Fig. 1.1.
This figure is adapted from [12].
[Figure 1.1 diagram: anomaly detection methods divided into supervised learning, unsupervised learning, probabilistic learning, soft computing, knowledge-based methods, and combination learners. Listed methods: parametric, non-parametric, clustering, association mining, outlier mining, ANN-based, rough set-based, fuzzy logic, GA-based and ant colony, artificial immune system, rule-based and expert system-based, ontology and logic-based, ensemble-based, fusion-based, hybrid.]
Figure 1.1: Classification of anomaly-based intrusion detection methods
AIS is a fairly new research subfield of computational intelligence. It is
regarded as a system that acts intelligently: what it does is appropriate for its
circumstances and its goal; it is flexible to changing environments and changing goals;
it learns from experience; and it makes appropriate choices given perceptual limitations
and finite computation [68].


1.1.4 Tools
IDS tools are used for purposes such as information gathering, victim identification,
packet capture, network traffic analysis and visualization of traffic behavior.
Examples of such tools, both commercial and free, include Snort,
Suricata, Bro, OSSEC, Samhain, Cisco Secure IDS, CyberCop, and RealSecure.
Immune-related IDS tools include LISYS [10], which is based on TCP packets, and
MILA [26], a multilevel immune learning algorithm proposed for novel pattern recognition.
However, despite their initially promising and influential properties, immune-based IDSs never made it beyond the prototype stage [83]. Two main issues that
impeded the progress of immune algorithms were identified: the large computational cost
of achieving acceptable coverage of the potentially anomalous region [54], and the failure
of these algorithms to generalize properly beyond the training set [79].

1.2 A brief overview of human immune system
Inspired mainly by the human immune system, researchers have developed
AISs in a variety of innovative ways. The human immune system is a multi-layered
protection architecture whose main layers are physical barriers, physiological barriers, the
innate immune system, and the adaptive immune system. Among these,
the adaptive immune system is capable of adaptively recognizing specific types of
pathogens and memorizing them for accelerated future responses; it is a complex of a
variety of molecules, cells, and organs spread all over the body [46]. Pathogens are foreign substances, such as viruses, parasites and bacteria, which attack the body. Figure 1.2,
adapted from [77], presents a multi-layered protection and elimination architecture.

Figure 1.2: Multi-layered protection and elimination architecture
T cells and B cells cooperate to distinguish self from nonself. On the one hand,
T cells recognize antigens with the help of major histocompatibility complex (MHC)
molecules. Antigen presenting cells ingest and fragment antigens to peptides. MHC
molecules transport these peptides to the surface of antigen presenting cells. T cells,
whose receptors bind with these peptide-MHC combinations, are said to recognize
antigens. On the other hand, B cells recognize antigens by binding their receptors
directly to antigens. The bindings actually are chemical bonds between receptors and
epitopes. The more complementary the structure and the charge between receptors and
epitopes are, the more likely binding will occur. The strength of the bond is termed
affinity. To avoid autoimmunity, T cells and B cells must pass a negative selection
stage, where lymphocytes matching self cells are killed.
Prior to negative selection, T cells undergo positive selection. This is because, in
order to bind to the peptide-MHC combinations, they must recognize self MHC first.
Thus, positive selection eliminates T cells with weak bonds to self MHC. T cells
and B cells that survive negative selection become mature and enter the
bloodstream to perform the detection task. Since these mature lymphocytes have never
encountered antigens, they are naive. Naive T cells and B cells can possibly auto-react
with self cells, because some peripheral self proteins are never presented during the
negative selection stage. To prevent self-attack, naive cells need two signals in order
to be activated: one occurs when they bind to antigens, and the other comes from other
sources as a confirmation. Naive T helper cells receive the second signal from innate
system cells. Once activated, T cells begin to clone. Some of
the clones send out signals to stimulate macrophages or cytotoxic T cells to kill
antigens, or send out signals to activate B cells. Others form memory T cells. The
activated B cells migrate to a lymph node, where each B cell clones itself.
Meanwhile, somatic hypermutation is triggered, whose rate is 10 times higher than
that of the germ line mutation, and is inversely proportional to the affinity. Mutation
changes the receptor structures of offspring; hence offspring have to bind to pathogenic
epitopes captured within the lymph nodes. If they do not bind, they simply die
after a short time. If they succeed in binding, they leave the lymph
node and differentiate into plasma or memory B cells.
In summary, the HIS is a distributed, self-organizing and lightweight defense
system for the body. These remarkable features match the design goals of
an intrusion detection system and can thus result in a scalable and robust system [53].

1.3 AIS for IDS
1.3.1 AIS model for IDS
Figure 1.3 illustrates the steps necessary to obtain an AIS solution for a security
problem, as first envisioned by de Castro and Timmis [27] and later adopted
by Fernandes et al. [35]. Firstly, the security domain of the system to be modeled needs
to be identified. Secondly, the immune entities that best fit the needs of the system
should be picked from the immunological theories. That should make it easier to choose a
representation for the entities. In the affinity-measure step, one should define
a matching rule that determines whether two elements bind.

Figure 1.3: Multi-layer AIS model for IDS
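To make the notion of a matching rule concrete, here is a minimal sketch of one classical choice from the negative selection literature, the r-contiguous rule: two equal-length strings bind if they agree on at least r consecutive positions, where r is the matching threshold. The function name is an illustrative assumption, and the sketch is not the thesis's own formulation, which is developed in Section 1.5.

```python
def r_contiguous_match(detector, sample, r):
    """Return True if detector and sample agree on >= r contiguous positions.

    Both arguments are equal-length strings over the same alphabet;
    r is the matching threshold (a positive integer).
    """
    if len(detector) != len(sample):
        raise ValueError("detector and sample must have the same length")
    if r < 1:
        raise ValueError("r must be a positive integer")

    run = 0  # length of the current run of agreeing positions
    for a, b in zip(detector, sample):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


# Positions 2 to 4 agree ("011"), so the strings bind for r = 3 but not r = 4.
print(r_contiguous_match("10110", "00111", 3))
print(r_contiguous_match("10110", "00111", 4))
```

A larger r makes each detector more specific, so more detectors are needed to cover the same space; a smaller r makes detectors more general at the cost of possible false alarms.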


1.3.2 AIS features for IDS
According to Kim et al. [55], AIS features can be illustrated and summarized
as follows.

Firstly, a distributed IDS supports robustness, configurability, extensibility and
scalability. It is robust since the failure of one local intrusion detection process does
not cripple the overall IDS. It is also easy to configure, since each intrusion
detection process can be simply tailored to the local requirements of a specific host.
The addition of new intrusion detection processes running on different operating systems does not require modification of existing processes, and hence it is extensible. It
can also scale better, since the high volume of audit data is distributed amongst many
local hosts and is analyzed by those hosts.
Secondly, a self-organizing IDS provides adaptability and global analysis. Without external management or maintenance, a self-organizing IDS automatically detects
intrusion signatures which are previously unknown and/or distributed, and eliminates
and/or repairs compromised components. Such a system is highly adaptive because
there is no need for manual updates of its intrusion signatures as network environments
change. Global analysis emerges from the interactions among a large number of varied
intrusion detection processes.
Next, a lightweight IDS supports efficiency and dynamic features. A lightweight
IDS does not impose a large overhead on a system or place a heavy burden on CPU
and I/O. It places minimal work on each component of the IDS. The primary functions
of hosts and networks are not adversely affected by the monitoring. It also dynamically covers intrusion and non-intrusion pattern spaces at any given time rather than
maintaining entire intrusion and non-intrusion patterns.
One more important feature is a multi-layered IDS, which increases robustness.
The failure of one layer of defense does not necessarily allow an entire system to be
compromised. While a distributed IDS allocates intrusion detection processes across
several hosts, a multi-layered IDS places different levels of sensors at one monitoring
place.
Additionally, a diverse IDS provides robustness. A variety of different intrusion
detection processes spread across hosts will slow an attack that has successfully compromised one or more hosts. This is because an understanding of the intrusion process
at one site provides limited or no information on intrusion processes at other sites.
Finally, a disposable IDS increases robustness, extensibility and configurability. A disposable IDS does not depend on any single component. Any component
can be easily and automatically replaced with other components. These properties are
important in an effective IDS, as well as being established properties of the HIS.

1.4 Selection algorithms
The main developments within AIS have focused on three immunological theories:
clonal selection, immune networks and negative selection. Negative selection
approaches are based on self-nonself discrimination in biological systems. This property
makes them attractive to computer and network security researchers. A survey by G. C.
Silva and D. Dasgupta [71] showed that in the five-year period 2008-2013, NSA predominated over all other AIS models in terms of published papers relating to both network
security and anomaly detection. This trend motivates much of the research work in
this thesis.
Another AIS model, the positive selection algorithm (PSA), is also investigated. We
will prove in a following section that, under some conditions, PSA is equivalent to NSA in terms
of anomaly detection performance.
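The relationship between the two selection schemes can be illustrated with a deliberately naive sketch based on r-chunk detectors, where a detector is a length-r string tied to a position i and matches a sample whose substring at position i equals it. The sketch enumerates every possible chunk, which is only feasible for tiny r; the function names and the example self set are assumptions made for illustration, and the thesis's own algorithms use trees and automata instead of explicit sets.

```python
from itertools import product


def positive_chunks(self_set, r):
    """For each position i, the set of length-r substrings seen in self_set."""
    length = len(next(iter(self_set)))
    return [{s[i:i + r] for s in self_set} for i in range(length - r + 1)]


def negative_chunks(self_set, r, alphabet="01"):
    """For each position i, all length-r strings NOT seen in self_set there."""
    every_chunk = {"".join(p) for p in product(alphabet, repeat=r)}
    return [every_chunk - seen for seen in positive_chunks(self_set, r)]


def is_nonself_negative(sample, neg, r):
    """Negative selection: flag the sample if any negative detector matches."""
    return any(sample[i:i + r] in chunks for i, chunks in enumerate(neg))


def is_nonself_positive(sample, pos, r):
    """Positive selection: flag the sample if some window matches no positive detector."""
    return any(sample[i:i + r] not in chunks for i, chunks in enumerate(pos))


self_set = {"0010", "1011", "0110"}   # hypothetical training (self) strings
r = 2
pos = positive_chunks(self_set, r)
neg = negative_chunks(self_set, r)

for sample in ("1110", "0011"):
    print(sample,
          is_nonself_negative(sample, neg, r),
          is_nonself_positive(sample, pos, r))
```

In this simple setting both schemes give the same verdict on every sample, because a window fails the positive test exactly when it is one of the negative chunks at that position. The sample "0011" is accepted as self although it is not in the training set, which is the kind of undetectable string that Section 1.5.5 refers to as a hole.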

1.4.1 Negative Selection Algorithms
Negative selection is a mechanism employed to protect the body against self-reactive
lymphocytes. Such lymphocytes can occur because the building blocks of
antibodies are different gene segments that are randomly composed and undergo a further somatic hypermutation process. Therefore, this process can produce lymphocytes
which are able to recognise self-antigens [85].
NSAs are among the most popular and extensively studied techniques in artificial immune systems that simulate the negative selection process of the biological
immune system. Stephanie Forrest et al. [38] proposed an algorithmic model of this


