MINISTRY OF EDUCATION AND TRAINING MINISTRY OF NATIONAL DEFENSE

MILITARY TECHNICAL ACADEMY







NGUYEN THI HIEN






A STUDY OF GENERALIZATION
IN GENETIC PROGRAMMING








THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
IN MATHEMATICS










Hanoi – 2014



MINISTRY OF EDUCATION AND TRAINING MINISTRY OF NATIONAL DEFENSE

MILITARY TECHNICAL ACADEMY











A STUDY OF GENERALIZATION
IN GENETIC PROGRAMMING




Specialized in: Applied Mathematics and Computer Science
Code: 60 46 01 10



THE THESIS IS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS




ACADEMIC SUPERVISORS:
1. Assoc. Prof. Dr. Nguyen Xuan Hoai
2. Prof. Dr. R.I. (Bob) McKay






Hanoi - 2014
Abstract
Genetic Programming (GP) is a meta-heuristic technique that simulates biological evolution. GP differs from other Evolutionary Algorithms (EAs) in that it generates solutions to problems in the form of computer programs. In EAs, the quality criterion for each individual is usually referred to as fitness, and it is the basis on which individuals are selected. GP induces a population of computer programs whose fitness improves automatically as they experience the data on which they are trained. In this sense, GP can be seen as a machine learning method. Machine learning is a process that trains a system on some training data in order to capture the succinct causal relationships in the data. The key parts of this process are the "learning domain", the "training set", the "learning system", and the "testing" (validation) of the learnt results. The quality of a learnt solution is its ability to predict the outputs for future unseen data (often simulated by a test set), which is referred to as "generalization". The goal of this thesis is to investigate the aspects that affect the generalization ability of GP, as well as techniques from the more traditional machine learning literature that help to improve the generalization and learning efficiency of GP.
In particular, a number of early stopping criteria are proposed and tested. Over-fitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations; a model which has been over-fit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. The early stopping techniques (with the proposed stopping criteria) helped to avoid over-fitting in GP, producing much simpler solutions in much shorter training time. The thesis also proposes a framework that combines progressive sampling, a traditional machine learning technique, with early stopping for GP. The experimental results showed that progressive sampling in combination with early stopping helps to further improve the learning efficiency of GP (i.e., to avoid over-fitting and to produce simpler solutions in much shorter training time).
Acknowledgements
The first person I would like to thank is my supervisor, for directly guiding me through the PhD process. His enthusiasm has been the source of motivation for this work, and his guidance has inspired much of the research in this thesis.
I also wish to thank my co-supervisor. Although he was far away, we regularly had online research discussions, and working with him I have learnt how to do research systematically. The leaders of the Department of Software Engineering and the Faculty of Information Technology, Military Technical Academy, have frequently supported my research with facilities and materials for running the experiments. Dr. Nguyen Quang Uy has discussed a number of issues related to this research with me; I would like to thank him for his thorough comments and suggestions on my research.
I also would like to acknowledge the support of my family, especially my parents, Nguyen Van Hoan and Nguyen Thi Quy, who have worked hard and always believed strongly in their children, and of my husband, Nguyen Hai Trieu, for sharing the happiness and difficulties of life with me. To my beloved daughter, Nguyen Thi Hai Phuong, who was born before this thesis was completed, I would like to express my thanks for being such a good girl and always cheering me up.
Contents
Abstract ii
List of Figures v
List of Tables vii
Abbreviations ix
0.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.2 Research Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.3 Contributions of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.4 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1 Backgrounds 6
1.1 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1 Representation, Initialization and Operators in GP . . . . . . . . . 7
1.1.2 Some Variants of GP . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 An Example of Problem Domain . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 GP and Machine Learning Issues . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.4.2 Bloat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.3 Generalization and Complexity . . . . . . . . . . . . . . . . . . . . 22
1.5 Generalization in GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.1 Overview of Studies on GP Generalization Ability . . . . . . . . . . 24
1.5.2 Methods for Improving GP Generalization Ability . . . . . . . . . . 25
1.5.3 Problem of Improving Training Efficiency . . . . . . . . . . . . . . . 33
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2 Early Stopping, Progressive Sampling, and Layered Learning 36
2.1 Over-training, Early Stopping and Regularization . . . . . . . . . . . . . . 36
2.1.1 Over-training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1.2 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1.3 Regularization Learning . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2 Progressive Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2.1 Types of Progressive Sampling Schedule . . . . . . . . . . . . . . . 44
2.2.2 Detection of Convergence . . . . . . . . . . . . . . . . . . . . . . . 47
2.3 Layered Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3 An Empirical Study of Early Stopping for GP 51
3.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.1.1 Proposed Classes of Stopping Criteria . . . . . . . . . . . . . . . . . 53
3.2 Experimental Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.1 Comparison with GP . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.2 Comparison with Tarpeian . . . . . . . . . . . . . . . . . . . . . . . 76
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4 A Progressive Sampling Framework for GP - An Empirical Approach 79
4.1 The Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1.1 The Sampling Schedule . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.1.2 The Initial Sample Set Size . . . . . . . . . . . . . . . . . . . . . . 84
4.1.3 The Number of Learning Layers . . . . . . . . . . . . . . . . . . . . 86
4.1.4 The Stopping Criterion for Each Layer . . . . . . . . . . . . . . . . 86
4.2 Experimental Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2.2 GP Systems Configurations . . . . . . . . . . . . . . . . . . . . . . 87
4.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.1 Learning Effectiveness Comparison . . . . . . . . . . . . . . . . . . 89
4.3.2 Learning Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.3 Solution Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3.4 Distribution of Best Solutions by Layers . . . . . . . . . . . . . . . 93
4.3.5 Synergies between System Components . . . . . . . . . . . . . . . . 94
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5 Conclusions and Future Work 99
5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Bibliography 104
List of Figures
1.1 GP’s main loop [57] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 GP expression tree representing max(x*x,x+3y). [57] . . . . . . . . . . . . . 8
1.3 Creation of a full tree having maximum depth 2 (and therefore a total of
seven nodes) using the Full initialisation method (t=time). [57] . . . . . . . 10
1.4 Creation of a five node tree using the Grow initialisation method with a
maximum depth of 2 (t=time). A terminal is chosen at t = 2, causing
the left branch of the root to be terminated at that point even though the
maximum depth has not been reached. [57] . . . . . . . . . . . . . . . . . . 12
1.5 Example of subtree crossover. . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Example of subtree mutation. [57] . . . . . . . . . . . . . . . . . . . . . . . 16
2.1 Idealized training and validation error curves. Vertical axis: errors; horizontal
axis: time [88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 A real validation error curve. Vertical: validation set error; horizontal: time
(in training epochs) [88]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3 Learning curves and progressive sample. . . . . . . . . . . . . . . . . . . . 44
2.4 Regions of the Efficiency of Arithmetic and Geometric Schedule Sampling [89]. 46
3.1 Distribution of stop time (last generation of the run) of the OF criterion
and GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.2 Distribution of stop time (last generation of the run) of the VF criterion
and GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.3 Distribution of stop time (last generation of the run) of the GL criterion
and GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.4 Distribution of stop time (last generation of the run) of the PQ criterion
and GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Distribution of stop time (last generation of the run) of the UP criterion
and GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1 PS framework for GP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 GP Evolution of each layer . . . . . . . . . . . . . . . . . . . . . . . . . . 87
List of Tables
1.1 Test Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 The real-world data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Parameter settings for the GP systems . . . . . . . . . . . . . . . . . . . . 58
3.2 Data Sets for Problems. For the synthetic problems, the notation [min, max]
defines the range from which the data points are sampled. . . . . . . . . . . 59
3.3 Generalization error and run time of VF, OF, True Adap stopping criterion
and GP, Tar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Size of best individual of VF, OF and True Adap stopping criterion and GP,
Tar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.5 p-value of run time and GE of VF, OF, True Adap stopping criterion and

Tar vs GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.6 p-value of complexity of VF, OF, True Adap stopping criterion and Tar vs
GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.7 Relative Generalization Errors and Run Time of OF, VF and True Adap
stopping criterion (vs GP) . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.8 Generalization error and run time of GL, PQ and UP stopping criterion . . 66
3.9 Size of the best individual of GL, PQ and UP stopping criterion . . . . . . 67
3.10 p-value of GL, PQ and UP stopping criterion vs GP . . . . . . . . . . . . . 68
3.11 p-value of size of best individual of comparison GL, PQ and UP stopping
criterion with GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.12 Relative Generalization Errors and Run Time of GL, PQ and UP stopping
criterion (vs GP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.13 p-value of GE, run time and size of best solution of True Adap, PQ vs Tarpeian 77
4.1 Data Sets for the Test Functions. . . . . . . . . . . . . . . . . . . . . . . . 88
4.2 Evolutionary Parameter Settings for the Genetic Programming Systems . . 89
4.3 Mean and SD of Generalization Error, GPLL and GP (14 Problems) . . . . 90
4.4 Relative Average Generalization Errors and p-values (GPLL vs GP) (14
Problems) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.5 Mean Run Time, GPLL and GP (14 Problems) . . . . . . . . . . . . . . . 91
4.6 Relative Run Times and p-values (GPLLs vs GP) (14 Problems) . . . . . . 92
4.7 Mean Solution Size and p-values, GPLL vs GP (14 Problems) . . . . . . . 93
4.8 Number of End-of-run Best Solutions First Found by GPLL in Each Layer 94
4.9 Average Generalization Error on 14 Problems . . . . . . . . . . . . . . . . 95
4.10 Average Run Time of All Systems . . . . . . . . . . . . . . . . . . . . . . . 96
4.11 Average Size of Solutions Found by all Systems . . . . . . . . . . . . . . . 97
Abbreviations
Abbreviation Meaning
ARMA Autoregressive Moving-Average

EA Evolutionary Algorithm
DSS Dynamic Subset Selection
GA Genetic Algorithms
GGGP Grammar Guided Genetic Programming
GP Genetic Programming
GPLL Layered Learning Genetic Programming
ILP Inductive Logic Programming
KDD Knowledge Discovery and Data Mining
LL Layered Learning
ML Machine Learning
MSE Mean Squared Error
PDGP Parallel Distributed GP
PS Progressive Sampling
RST Random Sampling Technique
Introduction
The genetic programming (GP) paradigm, first proposed by Koza [53], can be viewed as the discovery of computer programs which produce desired outputs for particular inputs. A computer program could be an expression, a formula, a plan, a control strategy, a decision tree, or a model, depending on the type of problem. GP is a simple and powerful technique that has been applied to a wide range of problems in combinatorial optimization, automatic programming, and model induction. The direct encoding of solutions as variable-length computer programs allows GP to provide solutions that can be evaluated and also examined to understand their internal workings. In the field of evolutionary and natural computation, GP plays a special role. In Koza's seminal book, the ambition of GP was to evolve a population of programs that can learn automatically from the training data. Since its birth, GP has developed rapidly, with various fruitful applications. Some of these developments concern novel mechanisms in the GP evolutionary process; others concern finding new areas to which GP can be successfully applied. Despite these advances in GP research, when GP is applied to learning tasks, the issue of its generalization, as a learning machine, has not received the attention it deserves. This thesis focuses on the generalization aspect of GP and proposes mechanisms to improve GP's generalization ability.
0.1 Motivations
GP is one of the evolutionary algorithms (EAs). The common underlying idea behind all EA techniques is the same: given a population of individuals, environmental pressure causes natural selection (survival of the fittest), and this causes a rise in the fitness of the population. Given a quality function to be maximised, we can randomly create a set of candidate solutions, i.e., elements of the function's domain, and apply the quality function as an abstract fitness measure: the higher the better. Based on this fitness, some of the better candidates are chosen to seed the next generation by applying recombination and/or mutation to them. Recombination is an operator applied to two or more selected candidates (the so-called parents) and results in one or more new candidates (the children). Mutation is applied to one candidate and results in one new candidate. Executing recombination and mutation leads to a set of new candidates (the offspring) that compete, based on their fitness (and possibly age), with the old ones for a place in the next generation. This process can be iterated until a candidate with sufficient quality (a solution) is found or a previously set computational limit is reached. A minimal code sketch of this generic loop is given at the end of this section.
GP has been proposed as a machine learning (ML) method [53], in which solutions to various learning problems are sought by means of an evolutionary process of program discovery. Since GP often tackles ML problems, generalization, which is closely related to robustness, is the desirable property at the end of the GP evolutionary process. The main goal when using GP or other ML techniques is not only to create a program that exactly covers the training examples (as in many GP applications so far), but also to obtain good generalization ability: for unseen and future samples of data, the program should output values that closely match those of the underlying function. Although the issue of generalization in GP has recently received more attention, the use of more traditional machine learning techniques for this purpose has been rather limited. This is surprising, since generalization has long been studied in machine learning, with many practical techniques proposed for improving learning machines (such as decision trees, neural networks, etc.). It is hoped that adapting these techniques to GP will help to improve GP's generalization ability.
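To make the loop just described concrete, the following is a minimal, illustrative Python sketch of the abstract EA cycle. The function names, parameters, and the mutation probability are placeholders for exposition only and do not correspond to any particular system used in this thesis.

import random

def evolve(init_population, fitness, select, recombine, mutate,
           good_enough, max_generations):
    """Abstract EA loop: evaluate, select parents, apply variation, replace."""
    population = init_population()
    for _ in range(max_generations):
        scored = [(fitness(c), c) for c in population]
        best_fit, best = max(scored, key=lambda fc: fc[0])
        if good_enough(best_fit):                      # sufficient quality reached
            return best
        offspring = []
        while len(offspring) < len(population):
            p1, p2 = select(scored), select(scored)    # fitness-based parent selection
            child = recombine(p1, p2)                  # recombination of two parents
            if random.random() < 0.1:                  # occasional mutation
                child = mutate(child)
            offspring.append(child)
        population = offspring                         # next generation
    scored = [(fitness(c), c) for c in population]
    return max(scored, key=lambda fc: fc[0])[1]

Any concrete EA (including GP) is obtained by supplying a representation together with the corresponding initialization, fitness, selection, and variation operators.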
0.2 Research Perspectives
The focus of this thesis is on understanding the generalization of GP, the ways it can be improved, and the role it plays in learning. Specifically, the generalization of GP is analysed to uncover its key features. Based on this analysis, the thesis proposes two new learning mechanisms for GP: early stopping and progressive sampling. In the process of developing better learning frameworks for GP, a clearer picture of the scalability of the framework emerges, which facilitates and motivates future enhancements. The current theoretical models of GP are limited in use and applicability due to their complexity; therefore, the majority of results in the field have been derived from experimentation. The approach taken in this thesis is likewise based on carefully designed experiments and the analysis of experimental results.
0.3 Contributions of the Thesis
This thesis makes the following contributions:
1. An early stopping approach for GP based on an estimate of the generalization error, together with several new stopping criteria for early stopping in GP. The GP training process is stopped early at the point that promises to yield the solution with the smallest generalization error (a schematic sketch of one classic criterion of this kind follows this list).
2. A progressive sampling framework for GP, which divides the GP learning process into several layers, with the training set size starting small and gradually growing at each layer. The termination of training in each layer is based on certain early stopping criteria.
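To illustrate the kind of stopping rule involved, the fragment below sketches the classic generalization-loss (GL) criterion from the early-stopping literature [88], which is representative of the criteria studied in Chapter 3. This is an illustrative Python sketch; the exact criteria and thresholds proposed in the thesis are defined in Chapter 3.

def generalization_loss(val_errors):
    # GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_opt(t) is the lowest
    # validation error observed so far and E_va(t) is the current one.
    e_opt = min(val_errors)
    return 100.0 * (val_errors[-1] / e_opt - 1.0)

def should_stop(val_errors, alpha=5.0):
    # Stop when the validation error has risen more than alpha percent
    # above its best value so far (the GL_alpha rule).
    return generalization_loss(val_errors) > alpha

# Example: the validation error starts rising after the fourth generation.
history = [0.90, 0.70, 0.55, 0.50, 0.56, 0.60]
print(should_stop(history))   # True: 0.60 is 20% above the best value 0.50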
0.4 Organization of the Thesis
The organization of the rest of the thesis is as follows:
Chapter 1 first introduces the basic components of GP (the algorithm, representation, and operators) as well as some benchmark problem domains (some of which are used for the experiments in this thesis). Two important research issues and a metaphor of GP search are then discussed. Next, the major issues of GP are discussed, especially when GP is considered as a machine learning system. The chapter first overviews the approaches proposed in the literature that help to improve the generalization ability of GP. Then, solution complexity (code bloat), in the particular context of GP, and its relation to GP learning performance (generalization, training time, solution comprehensibility) are discussed.
Chapter 2 provides background on a number of concepts and techniques subsequently used in the thesis to improve GP learning performance. These include progressive sampling, layered learning, and early stopping in machine learning.
Chapter 3 presents one of the main contributions of the thesis. It proposes some criteria used to determine when to stop the training of GP, with the aim of avoiding over-fitting. Experiments are conducted to assess the effectiveness of the proposed early stopping criteria. The results show that GP with early stopping can find solutions with generalization errors at least similar to those of standard GP, but in much shorter training time and with lower solution complexity.
Chapter 4 develops a learning framework for GP that is based on layered learning and progressive sampling. The results show that the solutions of GP with progressive sampling are significantly smaller and the training time much shorter compared with standard GP, while the generalization error of the obtained solutions is often smaller.
Chapter 5 concludes the thesis, summarizing the main results and proposing suggestions for future work that extends the research in this thesis.
Chapter 1
Backgrounds
GP [53] is an evolutionary algorithm (EA) that uses either a procedural or a functional representation. This chapter describes the representation and the specific algorithm components used in the canonical version of GP. The basics of GP are presented first, followed by a discussion of the algorithm and a description of some variants of GP. Then, a survey of ML-related issues in GP is given. The chapter ends with a comprehensive overview of the literature on approaches used to improve GP's generalization ability.
1.1 Genetic Programming
GP is an evolutionary paradigm that is inspired by biological evolution to find solutions, in the form of computer programs, to a problem. An early form of GP used a tree representation of programs in conjunction with subtree crossover to evolve a multiplication function. The evolution through each generation is depicted in Figure 1.1. GP adopts a search strategy similar to that of a genetic algorithm (GA), but uses a program representation and specialized operators. It is this representation that makes GP unique. Algorithm 1 shows the basic steps of running GP.
Fig. 1.1: GP’s main loop [57]
Algorithm 1: Abstract GP Algorithm [57].
Randomly create an initial population of programs from the available primitives (see
Section 1.1.1.2).
repeat
Execute each program and ascertain its fitness.
Select one or two program(s) from the population with a probability based on
fitness to participate in genetic operations (see Section 1.1.1.3).
Create new individual program(s) by applying genetic operations with specified
probabilities (see Section 1.1.1.4).
until an acceptable solution is found or some other stopping condition is met (e.g., reaching a maximum number of generations).
return the best-so-far individual.
1.1.1 Representation, Initialization and Operators in GP
This section details the basic steps and terminology in the process of running GP. In particular, it defines how solutions are represented in most GP systems (subsection 1.1.1.1), how the first population is initialized (subsection 1.1.1.2), and how selection (subsection 1.1.1.3) as well as recombination and mutation (subsection 1.1.1.4) are used to create new individuals in standard GP (as proposed in [53]).
1.1.1.1 Representation
Traditionally, GP programs are expressed as expression trees. Figure 1.2 shows, for example, the tree representation of the program max(x*x,x+3*y). The variables and constants in the program (x, y, and 3) are called terminals in GP; they are the leaves of the tree. The arithmetic operations (+, *, and max) are internal nodes (called functions in the GP literature). The sets of allowed functions and terminals together form the primitive symbol set of a GP system.
Fig. 1.2: GP expression tree representing max(x*x,x+3y). [57]
The terminal set. The terminal set consists of all inputs to GP programs (individuals), constants supplied to GP programs, and the zero-argument functions with side-effects executed by GP programs.
Inputs, constants, and other zero-argument nodes are called terminals or leaves, as they appear at the frontier of expression-tree-based programs. In this view, GP is no different from any other ML system: the user has to decide on the set of features (inputs), and each of these inputs becomes part of the GP training and test sets as a GP terminal.
The function set. The function set comprises the statements, operators, and functions available to the GP system.
The function set is usually application-specific; it is selected to fit the problem domain. Some examples of function sets that have been used in GP are as follows:
• Boolean Functions
For example: AND, OR, NOT, XOR.
• Arithmetic Functions
For example: +, -, *, /

• Transcendental Functions
For example: trigonometric and logarithmic functions.
• Variable Assignment Functions
• Indexed Memory Functions
• Conditional Statements
For example: IF, THEN, ELSE; CASE or SWITCH statements.
• Control Transfer Statements
For example: GO TO, CALL, JUMP.
• Loop Statements
For example: WHILE DO, REPEAT UNTIL, FOR DO.
• Subroutines
Programs can contain multiple components (e.g., subroutines) in more advanced GP
systems. In this case the representation used in GP is a set of trees (one for each
component) grouped together under a special root node that acts as the glue.
The traditional languages used in AI research (e.g., Lisp and Prolog), a number of modern programming languages (e.g., Ruby and Python), and the languages associated with scientific programming tools (e.g., MATLAB and Mathematica) have been widely used to implement GP systems. They provide automatic garbage collection and dynamic lists, which make it easy to implement expression trees and the necessary GP operations. In this thesis, the GP system is implemented in C++.
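Although the GP system used in this thesis is written in C++, the idea of storing and evaluating a program as an expression tree can be illustrated with a short Python sketch. The tuple-based node layout and the small primitive set below are illustrative choices only, not the thesis's actual data structures.

# Each node is a tuple (symbol, children); terminals have no children.
# Functions: 'max', '+', '*'; terminals: the variables x, y and the constant 3.
TREE = ('max', [('*', [('x', []), ('x', [])]),
                ('+', [('x', []), ('*', [(3, []), ('y', [])])])])

FUNCTIONS = {'max': max, '+': lambda a, b: a + b, '*': lambda a, b: a * b}

def evaluate(node, env):
    """Recursively evaluate an expression tree for a given variable binding."""
    symbol, children = node
    if not children:                       # terminal: variable or constant
        return env.get(symbol, symbol)     # look up variables, pass constants through
    args = [evaluate(child, env) for child in children]
    return FUNCTIONS[symbol](*args)

print(evaluate(TREE, {'x': 2, 'y': 5}))    # max(2*2, 2+3*5) = max(4, 17) = 17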
1.1.1.2 Initializing the Population
In GP, each individual in the initial population is usually randomly generated. There are many possible methods for tree initialisation; two common ones are the Full and Grow methods. In the Full generation method, only functions are selected as arguments to other functions until a certain depth is reached, whereupon only terminals are selected. This method ensures that all branches of the tree will be of the same depth. The Grow method allows both functions and terminals to be selected, with program branches allowed to be of any depth up to some given maximum; when this depth is reached, only terminals are selected. This method allows trees with branches of different depths to be generated, but ensures that the trees never exceed a certain depth. Pseudo code for a recursive implementation of both the Full and Grow methods is given in Algorithm 2.
Fig. 1.3: Creation of a full tree having maximum depth 2 (and therefore a total of seven
nodes) using the Full initialisation method (t=time). [57]
Algorithm 2: Pseudo code for recursive generation of trees with the "Full" and "Grow" methods [57].
Procedure: gen_rnd_expr(func_set, term_set, max_d, method)
if max_d = 0 or (method = grow and rand() < |term_set| / (|term_set| + |func_set|)) then
  expr = choose_random_element(term_set)
else
  func = choose_random_element(func_set)
  for i ← 1 to arity(func) do
    arg_i = gen_rnd_expr(func_set, term_set, max_d − 1, method)
  expr = (func, arg_1, arg_2, ...)
return expr
Notes: func_set is the function set, term_set is the terminal set, max_d is the maximum allowed depth for expression trees, method is either "Full" or "Grow", and expr is the generated expression in prefix notation.
The Full and Grow tree generation methods can be combined in the commonly used Ramped half-and-half method. Here, half the initial population is constructed using the Full method and the other half using the Grow method. This is done over a range of depth limits (hence the term "ramped") to ensure that the generated expression trees have a variety of sizes and shapes. While these methods are easy to implement and use, they often make it difficult to control the statistical distributions of important properties such as the sizes and shapes of the generated trees. Other initialisation mechanisms have therefore been developed (e.g., [62]) to allow better control of these properties in instances where such control is important.
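For illustration, the Grow and Full procedures of Algorithm 2, together with ramped half-and-half, could be implemented along the following lines. This is a hedged Python sketch using the tuple-based tree representation from the earlier sketch; the primitive sets, depth range, and even split between Full and Grow are illustrative assumptions rather than the settings used in the thesis.

import random

FUNC_SET = {'+': 2, '-': 2, '*': 2, 'max': 2}   # function -> arity (illustrative)
TERM_SET = ['x', 'y', 3]                         # variables and a constant

def gen_rnd_expr(func_set, term_set, max_d, method):
    """Recursive tree generation with the Full or Grow method (cf. Algorithm 2)."""
    n_terms, n_funcs = len(term_set), len(func_set)
    if max_d == 0 or (method == 'grow' and
                      random.random() < n_terms / (n_terms + n_funcs)):
        return (random.choice(term_set), [])               # pick a terminal
    func = random.choice(list(func_set))                   # pick a function
    children = [gen_rnd_expr(func_set, term_set, max_d - 1, method)
                for _ in range(func_set[func])]
    return (func, children)

def ramped_half_and_half(pop_size, min_d=2, max_d=6):
    """Half the trees via Full, half via Grow, over a range ('ramp') of depths."""
    pop = []
    for i in range(pop_size):
        depth = random.randint(min_d, max_d)
        method = 'full' if i % 2 == 0 else 'grow'
        pop.append(gen_rnd_expr(FUNC_SET, TERM_SET, depth, method))
    return pop

population = ramped_half_and_half(10)
print(population[0])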