Chapter 2

     2.1    Introduction

     2.2    Defining an Application
            2.2.1     Closure
            2.2.2     Examples

     2.3    Size of Individuals

     2.4    Fitness

     2.5    Population Initialization

     2.6    Selection

     2.7    Operators
          2.7.1     Crossover
          2.7.2     Reproduction
          2.7.3     Mutation


     2.8    Automatically Defined Functions (ADFs)
          2.8.1     ADFs in lil-gp

Background

2.1 Introduction

The term Genetic Programming is associated with the work of John Koza [3,4]. A genetic program (or GP) is an adaptive learning system based on many of the principles of genetic algorithms (GA) as described by Holland's Adaptation in Natural and Artificial Systems [2] and Goldberg's Genetic Algorithms [1].

There are many similarities between a GA and a GP. Both maintain a multitude of independent solutions, represented as individuals in a population. Each runs in cycles called generations, in which the members of the present population are moved forward, deleted, or modified into a new population. Each uses a set of genetic operators to modify individuals of the population. Each uses a selection operation to determine which individuals will be moved to the next generation based on the fitness of those individuals, typically as measured by an external evaluation function.

However, there are some important, basic differences as well:

  1. The structure of representation for a GP is a tree, while most GA applications use a string for representing an individual in the population.
  2. The nodes of a GP tree are typically functions or terminals, allowing each tree to be interpreted as a program. This is almost always true of a GP; it could be true of a GA, but typically is not.
  3. While the length of a GA string is often fixed, the size of a GP individual is intrinsically variable.

In the rest of this chapter we will lay some of the groundwork for what a GP is and how to work with it. It is by no means a complete exposition on genetic programming. For such a description, we refer you to [3,4].

2.2 Defining an Application

To define a GP application, one must provide both the function and terminal nodes from which the GP tree is constructed. The function nodes constitute the internal nodes of the tree, each representing a function whose arguments are the subtrees below that node. The terminal nodes constitute the leaf nodes of the tree, representing functions that take no arguments, or atoms.

Thus a GP constructs a parse tree of a function, and the functions contained in that tree take arguments from the evaluation of their subtrees. The GP system generally runs either for a prespecified number of generations or until a satisfactory solution has been found.

2.2.1 Closure

Because a GP structure represents a program, and because the program can be constructed in ways not necessarily foreseen, the functions must be designed such that they can take any arguments that could possibly be returned by an atom or the evaluation of another GP function. A classic example is a division operator. A division used in a GP application that has numbers as atoms must be designed so that, when given a divisor of 0, division has some default behavior that allows the program to continue, rather than signaling an error condition and stopping. Koza calls this the closure property. One must take some care in designing the functions for an application such that they are indeed closed under the other functions and atoms.
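As a minimal sketch of a closed division operator, the C function below never signals an error. The name protected_div is our own, and returning 1 for a zero divisor is one common convention (it matches the protected-divide of the regression example in Section 2.2.2), not the only possible choice.

```c
/* Protected division: defined for every pair of doubles, so the value of
 * any subtree is a legal argument. Returning 1.0 on a zero divisor is an
 * assumed convention; an application may choose another default. */
double protected_div(double numerator, double divisor)
{
    if (divisor == 0.0)
        return 1.0;
    return numerator / divisor;
}
```

With this definition, protected_div(6.0, 3.0) yields 2.0 and protected_div(5.0, 0.0) yields 1.0, so evaluation of the GP-program can always continue.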

2.2.2 Examples

There are 5 example programs, taken from GP I [3] and GP II [4], distributed with lil-gp. We define them below to give you a feel for GP applications.

Artificial Ant

In the artificial ant problem a grid is provided with a "trail" of food pellets distributed over the grid. Two examples are provided, the Los Altos trail (100x100 with 105 food pellets) and the Santa Fe trail (32x32 with 89 food pellets). The GP-program generates a path by walking through this map. It is allowed to run for some number of time steps t (400 in Santa Fe, 3000 in Los Altos), after which fitness is measured by the number of food pellets "run over" by the ant. Each terminal costs one time step to evaluate; each function takes no time.

The function set has 4 members. The first is if-food-ahead, which has two arguments: one to be performed if there is food in front of the ant, the other otherwise. It has the form

      (if-food-ahead  (arg1-true-subtree)  (arg2-false-subtree))

The other 3 functions are progn2 (2 args), progn3 (3 args), and progn4 (4 args). Each of these functions simply executes its children from left to right.

The terminal set has 3 members: move, which moves the ant one step forward; left, which turns the ant left; and right, which turns the ant right.

Boolean 11-multiplexer

The Boolean 11-multiplexer problem generates GP-programs/individuals that show the same behavior as a multiplexer with 3 address lines and 8 data lines. That is, the GP-program is run on every possible input of an 11-multiplexer (2^11 possibilities), and its fitness is determined by how its output compares with the multiplexer's on all 2048 cases.

The function set has 4 members, which are: and (2 args), or (2 args), not (1 arg), and an if function (3 args) that evaluates one or the other subtree associated with it based on a condition, as follows:

      (if (arg1-cond) (arg2-true-subtree) (arg3-false-subtree))

The terminal set has 11 members, which are: the data lines {d0, ..., d7} and the address lines {a0, a1, a2}.
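The target behavior and the counting of correct cases can be sketched in C as follows. This is an illustration under stated assumptions, not lil-gp code: we assume a fitness case is packed into an unsigned integer with a0..a2 in bits 0..2 (a0 least significant) and d0..d7 in bits 3..10, and the function pointer candidate stands in for an evolved GP-program.

```c
#define MUX_ADDRESS_LINES 3
#define MUX_DATA_LINES    8
#define MUX_CASES         (1u << (MUX_ADDRESS_LINES + MUX_DATA_LINES)) /* 2048 */

/* Output of an ideal multiplexer: the data line picked by the address.
 * Assumed packing: bits 0..2 hold a0..a2, bits 3..10 hold d0..d7. */
int mux_target(unsigned fitness_case)
{
    unsigned address = fitness_case & ((1u << MUX_ADDRESS_LINES) - 1u);
    unsigned data    = fitness_case >> MUX_ADDRESS_LINES;
    return (int)((data >> address) & 1u);
}

/* Number of cases (out of 2048) on which the candidate agrees with the
 * ideal multiplexer. */
unsigned mux_count_correct(int (*candidate)(unsigned fitness_case))
{
    unsigned correct = 0;
    for (unsigned c = 0; c < MUX_CASES; c++)
        if ((candidate(c) != 0) == (mux_target(c) != 0))
            correct++;
    return correct;
}

/* A deliberately poor candidate: always answers with data line d0. */
int always_d0(unsigned fitness_case)
{
    return (int)((fitness_case >> MUX_ADDRESS_LINES) & 1u);
}
```

Under these assumptions a perfect program scores 2048, while always_d0 scores 1152: it is right on all 256 cases with address 0 and on half of the remaining 1792 cases.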

Regression

The GP-programs/individuals are designed to generate a function which matches a target curve. The fitness cases are a number of known (x,y) pairs on the target curve. The GP-program is evaluated at each of these x-coordinates, and the absolute difference between the y value and the value returned by the GP-program, summed over all the cases, is the fitness.

The function set has 8 members, which are: multiply, protected-divide (returns 1 when dividing by 0), add, subtract, sin, cos, exponentiation, and protected-log (log of 0 is 0; otherwise it is the log of the absolute value).

The terminal set has one or two members: the input value x, and (optionally) ephemeral random constants or ERCs. An ERC is a special terminal whose value, once generated, is fixed. When an ERC terminal is generated, either during the filling of the initial population or by mutation later in the run, a value is attached to that terminal and is unchanged by subsequent operations. In our example, ERCs were generated in the range [-1, 1).
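The behavior of an ERC can be sketched in C as below. The terminal structure and the function names are hypothetical and are not lil-gp's own data structures; the use of rand() is likewise only an assumption made to keep the sketch short.

```c
#include <stdlib.h>

/* Hypothetical terminal node: an ERC carries its own constant. */
typedef struct {
    int    is_erc;   /* nonzero if this terminal is an ERC */
    double value;    /* assigned once, when the terminal is created */
} terminal;

/* Uniform draw from the half-open range [-1, 1). */
double erc_generate(void)
{
    return 2.0 * (rand() / ((double)RAND_MAX + 1.0)) - 1.0;
}

/* Called when an ERC terminal is generated (initial population or mutation). */
terminal erc_create(void)
{
    terminal t = { 1, erc_generate() };
    return t;
}

/* Evaluation never redraws the constant; it just reads the stored value. */
double terminal_eval(const terminal *t)
{
    return t->value;
}
```

The point of the design is that the random draw happens at creation time only; copying the terminal during crossover or reproduction copies its value along with it.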

Two-Boxes

Here, the GP attempts to evolve the function l0 h0 w0 - l1 h1 w1 using the four elementary arithmetic functions and six terminals, one for each variable. This problem is included because it provides a simple introduction to the use of ADFs.

Lawnmower

This problem evolves a control program for a lawnmower, with the goal being to mow the grass on every square of a toroidal grid. The mower can move forward (and mow) one square, turn left, or jump any distance (relative to its own position and facing direction). For more information, the reader is referred to [4].

2.3 Size of Individuals

The trees evolved by GP can grow very large. To avoid wasting time evaluating a few very large trees, the user can place limits on the number of nodes and/or the depth of an individual. In problems where individuals are composed of multiple trees (see Section 2.8.1), separate limits can be set for each component tree, as well as for the individual as a whole.

Koza's experiments, in both books [3,4], place a maximum depth limit of 17 (with no restriction on the number of nodes).

2.4 Fitness

The fitness of each individual in the population is determined by use of an evaluation function. Based on the results of the evaluation function, decisions are made with regard to the propagation and recombination of an individual into the next generation. There are a number of types of "fitness" that can be used by a GP, which we note below:

Raw Fitness, fr This is a direct measure, based on the application itself, of progress made in solving the problem. More often than not, such a measure is based on some comparison to fitness cases. Much like a training set used in other applications such as a neural net, one applies each of the fitness cases to the GP-program being evaluated, and the summed performance of that GP-program on all the cases is the raw fitness.
For the five lil-gp example problems, raw fitness is measured as follows:

Regression Raw fitness is based on measuring the GP-generated curve against 20 test points on the target curve. For each fitness case, if the generated point is within some tolerance (0.01 by default) of the target value, then it counts as a "hit." The raw fitness is the sum of the distances between the ideal and GP-calculated values over all the fitness cases.
Multiplexer For the "11-multiplexer" (3 address lines, 8 data lines), each of the 2048 cases (2^11 possible values of the 11 variables) is tested. Raw fitness is the number of cases that the GP gets correct.
Artificial Ant A number of "food" pellets are placed on a map. The GP evolves a control program for the ant, and simulates the repeated execution of it until all the food is consumed or the maximum time limit is reached. Raw fitness is the number of food pellets that the ant hits.
Two-boxes Raw fitness is equal to the absolute difference between the correct answer and the GP-answer, summed over 10 fitness cases. A hit is defined as a fitness case where the GP is within 0.01 of the correct answer.
Lawnmower Raw fitness is equal to the number of squares mowed during one execution of the GP-program.
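For the regression and two-boxes problems, the raw fitness and hit count described above can be sketched in C as follows. The names regression_score and score_cases are our own, the function pointer program stands in for an evolved GP-program of one variable, and counting an error exactly equal to the tolerance as a hit is an assumption.

```c
typedef struct {
    double raw;    /* sum of absolute errors over all fitness cases */
    int    hits;   /* number of cases within the hit tolerance */
} regression_score;

/* Scores a candidate against n known (x, y) pairs. */
regression_score score_cases(double (*program)(double x),
                             const double *xs, const double *ys,
                             int n, double hit_tolerance)
{
    regression_score score = { 0.0, 0 };
    for (int i = 0; i < n; i++) {
        double error = program(xs[i]) - ys[i];
        if (error < 0.0)
            error = -error;
        score.raw += error;
        if (error <= hit_tolerance)   /* assumed: boundary counts as a hit */
            score.hits++;
    }
    return score;
}

/* Example candidate used only for illustration: f(x) = x * x. */
double example_square(double x)
{
    return x * x;
}
```

For the cases (0,0), (1,1), (2,5) and a tolerance of 0.01, example_square has errors 0, 0, and 1, giving a raw fitness of 1 and 2 hits.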

Standardized Fitness, fs Standardized fitness simply reverses and/or translates the raw fitness so that all fitnesses are nonnegative, with 0 being for the best possible fitness.

Adjusted Fitness, fa This value is calculated from the standardized fitness. It is defined as follows for each individual i:

    $$ f_a(i) = \frac{1}{1 + f_s(i)} $$

fa varies from 0 to 1, with 1 being best. It has the advantage of exaggerating small differences in standardized fitness as the fitness of the individuals improves. Thus, as the solution nears completion, better feedback is given to the GP so that the best solutions can be pursued.

Koza and others also refer to a "normalized fitness." This is simply adjusted fitness divided by the sum of all adjusted fitnesses in the population. lil-gp does not use normalized fitness explicitly, but instead does implicit normalization of adjusted fitness where necessary (for instance, in fitness-proportionate selection).
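The conversions between these fitness measures are small enough to sketch directly in C. The function names are ours, not lil-gp's, and standardized_from_hits assumes a problem (such as the multiplexer) in which raw fitness counts successes and the best attainable raw score is known.

```c
/* Standardized fitness for a "higher raw is better" problem: reverse the
 * raw score against the best attainable one, so that 0 is best. */
double standardized_from_hits(double raw, double max_raw)
{
    return max_raw - raw;
}

/* Adjusted fitness: 1 / (1 + standardized fitness), in the range (0, 1]. */
double adjusted_fitness(double standardized)
{
    return 1.0 / (1.0 + standardized);
}

/* Normalized fitness: each adjusted fitness divided by the population sum. */
void normalize_fitness(const double *adjusted, double *normalized, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += adjusted[i];
    for (int i = 0; i < n; i++)
        normalized[i] = adjusted[i] / total;
}
```

The exaggeration effect is easy to see numerically: standardized fitnesses of 10 and 9 give adjusted fitnesses of 1/11 and 1/10, which differ by about 0.009, while standardized fitnesses of 1 and 0 give 0.5 and 1.0.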

2.5 Population Initialization

The GP-programs are created from random selections from the function and terminal sets. However, even though the selections are random, there are some parameters which control the initial population. See page 33 for a listing of lil-gp parameter names.

There are three methods available for creating these initial random structures:

Full The full method generates only full trees, that is, trees which have all terminal nodes in the same level of the generated program tree. Another way to say this is that the tree path length from any terminal node to the root of the tree is the same.

Grow The grow method chooses any node (function or terminal) for the root, then recursively calls itself to generate child trees for any nodes which need them. It is restricted so that each tree has a maximum depth (if the tree reaches the maximum depth, all further nodes are restricted to be terminals, so growth will cease).

Half-and-half This method merely chooses the full method 50% of the time and the grow method the other 50%.

All of the generation methods can be specified with a "ramp" of initial depth values instead of a single value. For instance, if the ramp is 2-5, then 25% of the trees will be generated with depth 2, 25% will be generated with depth 3, and so on. Note that the grow method (and consequently, the half-and-half method), when called to generate a tree of depth n, can produce a tree with actual depth less than n.

Half-and-half with a depth ramp is typically the method of choice for initialization since it produces a wide variety of tree shapes and sizes.

lil-gp checks each individual generated against node and/or depth limits (if any have been set) before inserting it into the initial population. It also ensures that no duplicates are present in the initial population.
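The full and grow methods, and a depth ramp, can be sketched in C as follows. Everything here is illustrative rather than lil-gp's implementation: the node structure, the tiny primitive set (two binary functions, two terminals), and the convention that depth is counted in edges (so a lone terminal has depth 0) are all assumptions.

```c
#include <stdlib.h>

#define MAX_ARITY 2

typedef struct node {
    int          symbol;            /* index into the symbol table below */
    int          arity;             /* 0 for terminals */
    struct node *child[MAX_ARITY];
} node;

typedef enum { GEN_FULL, GEN_GROW } gen_method;

/* Hypothetical primitive set: symbols 0-1 are binary functions,
 * symbols 2-3 are terminals. */
enum { NUM_FUNCTIONS = 2, NUM_SYMBOLS = 4 };
static const int symbol_arity[NUM_SYMBOLS] = { 2, 2, 0, 0 };

/* Recursively builds a random tree with at most depth_left levels below
 * the current node. */
node *generate_tree(gen_method method, int depth_left)
{
    int symbol;
    if (depth_left <= 0)            /* depth limit reached: terminals only */
        symbol = NUM_FUNCTIONS + rand() % (NUM_SYMBOLS - NUM_FUNCTIONS);
    else if (method == GEN_FULL)    /* full: functions only until the limit */
        symbol = rand() % NUM_FUNCTIONS;
    else                            /* grow: any function or terminal */
        symbol = rand() % NUM_SYMBOLS;

    node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->symbol = symbol;
    n->arity = symbol_arity[symbol];
    for (int i = 0; i < n->arity; i++)
        n->child[i] = generate_tree(method, depth_left - 1);
    return n;
}

/* Longest root-to-leaf path, counted in edges. */
int tree_depth(const node *n)
{
    int deepest = -1;
    for (int i = 0; i < n->arity; i++) {
        int d = tree_depth(n->child[i]);
        if (d > deepest)
            deepest = d;
    }
    return deepest + 1;
}

/* Total number of nodes in the tree. */
int tree_size(const node *n)
{
    int size = 1;
    for (int i = 0; i < n->arity; i++)
        size += tree_size(n->child[i]);
    return size;
}

void free_tree(node *n)
{
    for (int i = 0; i < n->arity; i++)
        free_tree(n->child[i]);
    free(n);
}

/* Depth ramp: cycles through min_depth..max_depth so that each depth
 * receives an equal share of the population. */
int ramp_depth(int index, int min_depth, int max_depth)
{
    return min_depth + index % (max_depth - min_depth + 1);
}

/* Half-and-half with a ramp: the method is chosen with probability 1/2. */
node *generate_half_and_half(int index, int min_depth, int max_depth)
{
    gen_method method = (rand() % 2) ? GEN_FULL : GEN_GROW;
    return generate_tree(method, ramp_depth(index, min_depth, max_depth));
}
```

With this primitive set, a full tree of depth d always has 2^(d+1) - 1 nodes, while a grow tree of nominal depth d may stop early whenever a terminal is drawn. A complete initializer would additionally apply the node and depth limits and reject duplicates.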

2.6 Selection

A selection method is a routine used to select an individual from a population. Currently, selection is used for two purposes in lil-gp: selecting the input individuals for a genetic operator (such as crossover) to work on, and selecting individuals to undergo subpopulation exchange in multiple-population problems.

Three commonly used selection methods are:

Fitness-proportionate selection This selects an individual based on the proportion of that individual's adjusted fitness in comparison to the total adjusted fitness of the population. When there are n individuals in the population, an individual i will be chosen with probability


    $$ p_i = \frac{f_a(i)}{\sum_{j=1}^{n} f_a(j)} $$

This is also known as the "roulette wheel" algorithm [1]. That is, each individual gets a portion of a roulette wheel based on the above formula (the entire wheel being equal to 1). The wheel is then "spun" to determine which individual is next selected. Individuals with a large portion of the overall fitness have an increased chance of being selected, but every individual has some chance.

(Greedy) Overselection Though fitness-proportionate selection is considered good for most applications, it is sometimes desirable to speed up the process. For large populations (in [3] Koza defines large as over 500), you might use overselection. Overselection partitions the population into two groups. In Koza's standard formulation, 80% of the time, individuals are selected from Group I (based on fitness-proportionate selection within the group) and 20% of the time from Group II. The partition into two groups can be arbitrary, but Koza has defined this partition based on fitness. For a population of 1000, the top individuals accounting for 32% of the total adjusted fitness go into the first group, the rest into the second. For populations of 2000, the split occurs at 16% of the adjusted fitness, for populations of 4000, the split occurs at 8%, and so on. The particular percentages are parameters in lil-gp and can be altered.

Overselection results in much higher selection pressure on the population than fitness-proportionate selection. While such pressure often causes GAs to converge on local optima, Koza does not report such results in GPs.

Tournament Selection Tournament selection originated in GAs to avoid overselection pressure on a population that could cause premature convergence. In tournament selection n individuals are chosen in a uniform random manner, then the best (by fitness) of those n individuals is selected. n is the tournament size. Koza uses this type of selection, with a tournament size of 7, for most of the runs given in GP II [4].
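Fitness-proportionate and tournament selection can each be sketched in a few lines of C. These are illustrative versions operating on an array of adjusted fitnesses, not lil-gp's selection code; the function names are ours, rand() is assumed as the random source, and the tournament samples with replacement (an assumption, since the text does not say).

```c
#include <stdlib.h>

/* Uniform double in [0, 1). */
static double uniform01(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Roulette wheel: returns index i with probability
 * adjusted[i] / (sum of all adjusted fitnesses). */
int select_roulette(const double *adjusted, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += adjusted[i];

    double spin = uniform01() * total;   /* where the wheel stops */
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
        cumulative += adjusted[i];
        if (spin < cumulative)
            return i;
    }
    return n - 1;   /* guards against floating-point round-off */
}

/* Tournament: draws tournament_size individuals uniformly at random
 * (with replacement) and returns the index of the fittest one drawn. */
int select_tournament(const double *adjusted, int n, int tournament_size)
{
    int best = rand() % n;
    for (int k = 1; k < tournament_size; k++) {
        int challenger = rand() % n;
        if (adjusted[challenger] > adjusted[best])
            best = challenger;
    }
    return best;
}
```

Note that the roulette wheel needs the population-wide sum, whereas the tournament only compares a handful of individuals, which is one reason it is cheap for large populations. Overselection could be layered on top of select_roulette by first choosing a group and then spinning within it.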

lil-gp provides these and 4 other selection methods. See page 33 for more information on these methods in lil-gp.

2.7 Operators

A genetic operator is a method for creating the individuals in each generation, usually by recombining pieces of individuals in the current generation. Crossover, reproduction, and mutation are the three operators implemented in lil-gp. For information on their use in lil-gp, see page 36.

2.7.1 Crossover

Crossover is the main operator for recombining old solutions into new and potentially better solutions in both GAs and GPs. In a GP, crossover occurs on trees. Two individuals are selected (using whatever selection method is in force) for crossover. Let's call them A and B. If the individuals in the current problem are composed of multiple trees (see Section 2.8.1), then one tree is randomly selected from each individual, subject to the restriction that both trees must share the same function set. A node is randomly selected on each tree, nA and nB. Crossover occurs by moving nA and the subtree which has nA as its root to tree B at the position of nB, and at the same time moving nB and the subtree which has nB as its root to tree A at the position of nA.
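The subtree exchange itself can be sketched in C by swapping two pointers. This is a simplified illustration, not lil-gp's implementation: the node structure is hypothetical, crossover points are chosen uniformly over all nodes (ignoring any internal/external bias), and no node or depth limits are enforced.

```c
#include <stdlib.h>

#define MAX_ARITY 4

typedef struct node {
    int          symbol;
    int          arity;             /* 0 for terminals */
    struct node *child[MAX_ARITY];
} node;

/* Total number of nodes in the tree. */
int tree_size(const node *n)
{
    int size = 1;
    for (int i = 0; i < n->arity; i++)
        size += tree_size(n->child[i]);
    return size;
}

/* Returns the address of the pointer that refers to the node with the
 * given preorder index (0 is the root). *index is consumed as the
 * traversal proceeds; NULL means the index lies beyond this subtree. */
node **nth_slot(node **slot, int *index)
{
    if (*index == 0)
        return slot;
    (*index)--;
    for (int i = 0; i < (*slot)->arity; i++) {
        node **found = nth_slot(&(*slot)->child[i], index);
        if (found != NULL)
            return found;
    }
    return NULL;
}

/* Picks a random node in each tree and exchanges the subtrees rooted
 * there. Both trees are modified in place. */
void crossover(node **tree_a, node **tree_b)
{
    int index_a = rand() % tree_size(*tree_a);
    int index_b = rand() % tree_size(*tree_b);
    node **slot_a = nth_slot(tree_a, &index_a);
    node **slot_b = nth_slot(tree_b, &index_b);

    node *moved = *slot_a;
    *slot_a = *slot_b;
    *slot_b = moved;
}
```

Working with the address of the parent's child pointer, rather than with the node itself, means the root needs no special case: when index 0 is drawn, the whole tree is exchanged. The total number of nodes across the two trees is unchanged by the operation, although either offspring may be larger than its parent.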

lil-gp allows mixed selection operations. That is, certain operators such as crossover can use one selection method, while other operators use another. This allows for experiments with mixed strategies. lil-gp also allows a different selection method to be used for each parent in crossover, and the probability that each crossover point is at an internal node versus an external node can be set by the user. If the individuals are composed of multiple trees, the probability that a given tree within an individual is chosen as the crossover tree can be set as well.

It is possible for crossover to create a tree that violates some of the node and/or depth maximums, if any are set by the user. In such cases, Koza just reproduces one of the parents into the new population in place of the too-large offspring. lil-gp supports this behavior, but can be set to continue picking random crossover points until two legal offspring are produced.

2.7.2 Reproduction

The simplest operator, reproduction simply chooses an individual in the current population and copies it verbatim into the new population. Apart from the choice of selection method, no options are available (or needed).

2.7.3 Mutation

Mutation in GP is typically a point mutation. An individual is selected, and a mutation point is picked. The subtree with the mutation point as its root is deleted and replaced with a randomly generated subtree.

The mutation options in lil-gp are similar to those of crossover: the method for selecting the individual, the probabilities governing the location of the mutation point, and what to do when mutation produces a tree that violates node and/or depth limits. In addition, the user can specify the method and depth ramp for creating the new, random subtree.

2.8 Automatically Defined Functions (ADFs)

Koza's second book on genetic programming [4] was devoted mainly to exploring the use of automatically defined functions (ADFs). This technique places constraints on the tree: usually the nodes around the root have a constant structure for all individuals, and this constant structure has two or more "slots" where the evolved portions of the individual hang off.

The running example in this section will be the "two-boxes" problem presented in [4]. This problem attempts to evolve a program to compute l0 w0 h0 - l1 w1 h1 using the four basic arithmetic functions and six terminals representing the six variables. Koza's experiments with this problem use a single three-argument ADF. All individuals in the LISP representation of this problem fit the general framework depicted in Figure 2.1.

                      progn
                     /     \
                  defun     \
                 /  |  \     \
             ADF0   |   \     \
                    |    \     \
       (ARG0 ARG1 ARG2)   \     \
                           |     \
   - - - - - - - - - - - - | - - -\- - - - - - - - - - - -
                           |       \
               <body of ADF0>     <body of main program>

    Figure 2.1: LISP representation of individual in two-boxes
    problem.  (After figures in GP II.)

The portion of the individual above the dotted line is just setup for the evaluation. It serves to define a new three-argument function ADF0 with the given body, and bind the arguments of the function to the symbols ARG0, ARG1, and ARG2 within the function body. It then evaluates the right, "result-producing" branch of the individual.

Each of the two evolved sections of the tree has its own set of functions and terminals. The left branch (ADF0) has the four arithmetic functions available, but its only terminals are the three arguments ARG0, ARG1, and ARG2. The right branch (the result-producing branch, or RPB), though, has the four arithmetic functions, the six terminals representing variables, and an additional three-argument function ADF0 which causes evaluation of the left branch. Genetic operators, such as crossover, must ensure that new individuals fit this scheme. In this case, crossover must not occur within the nodes above the dotted line, and it must ensure that the operation preserves the separate function sets: ADF0s can only cross over with other ADF0s and RPBs can only cross over with other RPBs. Similar restrictions apply for mutation.

2.8.1 ADFs in lil-gp

Representation of ADFs in lil-gp is different but essentially equivalent. lil-gp stores only the "guts" of the individual: the portions below the dotted line. lil-gp defines an "individual" as being a set of trees rather than a single tree. The number of trees per individual is fixed (set by the application code), say at K. Each tree 0...(K - 1) within an individual can have a different function set. The function set for a particular tree number k is the same for all individuals.

lil-gp replaces the LISP bookkeeping stuff explicitly stored in the individual (the portions above the dotted line) with C bookkeeping stuff represented (1) implicitly in the user-defined function sets, and (2) explicitly in the structure of the user-written individual evaluation function (app_eval_fitness()).

In the two-boxes example, the application code would specify that individuals have two trees each. Let us suppose tree 0 is the RPB and tree 1 is ADF0. The corresponding function sets would look like:

    tree 0:     {+, -, *, /, l0, w0, h0, l1, w1, h1, EVAL1}
    tree 1:     {+, -, *, /, ARG0, ARG1, ARG2}

EVAL1 is a special type of function. When the evaluation routine hits an EVAL1, it will evaluate tree 1 of the individual and take that value as the value of EVAL1. ARG0, ARG1, and ARG2 are special terminals that will take on the appropriate values (the arguments to the EVAL1 function) each time an EVAL1 is hit and tree 1 is evaluated.

EVALn functions may be of any arity, including zero. The arity of these functions is determined when the function set is initialized, by counting the number of ARGn terminals in the target tree.

To determine the fitness of the individual in this example, the application code would just evaluate tree 0:

      value = evaluate_tree ( ind->tr[0].data, 0 );

Any evaluation of tree 1 of the individual will be done via the EVAL1 functions in tree 0.
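To make the interplay between EVAL1 and the ARGn terminals concrete, here is a self-contained toy evaluator in C for the two-boxes function sets. It is only a sketch of the idea: the node and individual structures, the opcode names, and eval_node are our own inventions and do not correspond to lil-gp's data structures or to evaluate_tree(). We also assume that the arguments of EVAL1 are evaluated before tree 1 is entered, and that division is protected (returning 1 on a zero divisor).

```c
#define MAX_ARITY 3

typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_VAR, OP_ARG, OP_EVAL1 } opcode;

typedef struct node {
    opcode             op;
    int                index;   /* variable number (OP_VAR) or argument number (OP_ARG) */
    int                arity;
    const struct node *child[MAX_ARITY];
} node;

typedef struct {
    const node *tree[2];        /* tree[0] is the RPB, tree[1] is ADF0 */
} individual;

/* vars holds l0, w0, h0, l1, w1, h1; args holds the values bound to
 * ARG0..ARG2 while tree 1 is being evaluated (unused inside tree 0). */
double eval_node(const individual *ind, const node *n,
                 const double *vars, const double *args)
{
    double v[MAX_ARITY];
    for (int i = 0; i < n->arity; i++)
        v[i] = eval_node(ind, n->child[i], vars, args);

    switch (n->op) {
    case OP_ADD:   return v[0] + v[1];
    case OP_SUB:   return v[0] - v[1];
    case OP_MUL:   return v[0] * v[1];
    case OP_DIV:   return v[1] == 0.0 ? 1.0 : v[0] / v[1];
    case OP_VAR:   return vars[n->index];
    case OP_ARG:   return args[n->index];
    case OP_EVAL1: /* the child values become ARG0..ARG2 of tree 1 */
        return eval_node(ind, ind->tree[1], vars, v);
    }
    return 0.0;
}
```

If tree 1 is (* (* ARG0 ARG1) ARG2) and tree 0 is (- (EVAL1 l0 w0 h0) (EVAL1 l1 w1 h1)), then evaluating tree 0 with the dimensions 2, 3, 4 and 1, 2, 3 gives 24 - 6 = 18. As in the text, the caller only ever evaluates tree 0; tree 1 is reached solely through EVAL1 nodes.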