Chapter 6

     6.1    Basic Definitions

     6.2    Functions and Terminals
          6.2.1     Ephemeral Random Constants
          6.2.2     Evaluation and Argument Functions

     6.3    User Callbacks
          6.3.1     Defining the Function Set(s)
          6.3.2     Fitness Evaluation Function
          6.3.3     Custom Output
          6.3.4     Application Initialization
          6.3.5     Output Streams
          6.3.6     Checkpoint Files

     6.4    Order of Processing

     6.5    Kernel Considerations
          6.5.1     Memory Allocation
          6.5.2     Using Parameters

Implementing Problems

This chapter documents how to implement a new problem in lil-gp. There are five files that the user must write. A set of skeleton user files is provided in the distribution; it is suggested that you copy these files and modify them to create a new problem.

Throughout this chapter, the term "function" refers to functions in the GP sense. "C function" refers to a function in the C language.

User-written code can be divided into two categories: C functions implementing functions and terminals, and user callbacks. The user callbacks, usually placed in the app.c file, do application-specific tasks such as function set initialization and fitness calculation. The other group of C functions, usually placed in function.c, is the code that the kernel calls during tree evaluation.

6.1 Basic Definitions

There are two defined constants that the kernel of lil-gp needs in appdef.h. They are:

constant	value
MAXARGS	the maximum number of arguments (children) for any function
DATATYPE	the C data type returned by all functions and terminals

This is also a good place to put any application-specific #defines that you may need. It is suggested that all application defines be prefixed with APP_ so as not to conflict with any current or future kernel defines.
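As a rough sketch, an appdef.h for a problem whose nodes return double and take at most two children might look like the following. APP_FITNESS_CASES and the include guard name are hypothetical and appear only to illustrate the APP_ prefix; they are not part of lil-gp.

```c
/* appdef.h: sketch for a hypothetical double-valued problem */
#ifndef APPDEF_H
#define APPDEF_H

/* largest number of children taken by any function in the problem */
#define MAXARGS 2

/* C type returned by every function and terminal */
#define DATATYPE double

/* application-specific define (hypothetical), prefixed with APP_ */
#define APP_FITNESS_CASES 20

#endif
```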

If your problem requires a more complex data type than the ones available in C, you can use typedef to create a new type. For instance, the lawnmower problem uses an ordered pair of integers as its datatype. Its appdef.h file contains:

typedef struct
{
     short x;
     short y;
} vector;
#define DATATYPE vector

6.2 Functions and Terminals

For every ordinary function and terminal in your problem, you write a C function to implement the action of that node. These C functions are placed in the file function.c, and prototypes for them should be placed in function.h.

Each C function is passed two arguments, an int and a (farg *). What it does with these arguments depends on whether it is implementing a function or a terminal, and if it is a function, what type of function. All these C functions should return the user-defined type DATATYPE.

There are two types of functions, referred to in lil-gp as types "DATA" and "EXPR". If the function is of type DATA, then when it is found in a tree, all its children will be evaluated and their return values passed to the user code implementing the function. The LISP equivalent of this is to implement the function with a defun. If the lil-gp function is of type EXPR, then the user code is passed pointers to its children, which it can then ask the kernel to evaluate if needed. It can evaluate each child as many times as appropriate, or not at all. The LISP equivalent of this type would be to implement the function with a defmacro. Use of the correct type in lil-gp is important, especially when the evaluation of functions and terminals has global side effects (for instance, where the evolved program is controlling a simulation).

If the function is of type DATA, it can ignore the int passed to it. The (farg *) argument will be an array of arguments, one element for each child. The C function should reference the d field of each element to get that child's value. For instance, consider the two-argument addition function from the regression problem:

DATATYPE f_add ( int tree, farg *args )
{
     return args[0].d + args[1].d;
}

When this function occurs in evaluating a tree, the lil-gp kernel will evaluate the children, store their values in the args array, and call this C function.
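Another type DATA function, sketched under stated assumptions: f_pdiv is a hypothetical protected division, and the farg typedef below is only a stand-in so that the fragment compiles without the kernel headers (it models just the d and t fields described in this chapter).

```c
#define DATATYPE double

/* Stand-in for the kernel's argument type (assumption, not the real
 * declaration): only the d and t fields are modelled. */
typedef struct
{
     DATATYPE d;     /* value of an already-evaluated child (type DATA) */
     void *t;        /* pointer to an unevaluated child (type EXPR) */
} farg;

/* Hypothetical protected division: returns 1.0 when the divisor is
 * zero, so an evolved program can never divide by zero. */
DATATYPE f_pdiv ( int tree, farg *args )
{
     (void)tree;     /* type DATA functions can ignore the tree number */

     if ( args[1].d == 0.0 )
          return 1.0;
     return args[0].d / args[1].d;
}
```

Because the function is of type DATA, both children have already been evaluated by the time f_pdiv is entered; it only combines their values.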

Now consider another example: the IF_FOOD_AHEAD function from the artificial ant problem. It has two arguments: the first should be evaluated if there is food in front of the ant, the second otherwise. If type DATA were to be used for this function, then both would be evaluated and only their return values passed to the function (which would be doubly useless in this case, since all the functions and terminals in the ant problem ignore the return value). We want to let the function itself choose which child to evaluate. This function must be of type EXPR:

DATATYPE f_if_food_ahead ( int tree, farg *args )
{
     if ( ... ) /* determine if there is food ahead */
          return evaluate_tree ( args[0].t, tree );
     else
          return evaluate_tree ( args[1].t, tree );
}

For type EXPR functions, the t field of each array element should be accessed; it is a pointer to the corresponding child. This pointer can be passed to the evaluate_tree() C function to actually do the evaluation. evaluate_tree() also needs to be passed the integer argument (called tree in this case).
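The effect of type EXPR can be seen in a small self-contained sketch. Everything here is a stand-in and an assumption: the lnode and farg types, the evaluate_tree() body (which just counts visits and returns a stored value), the app_condition flag, and the f_if_cond function are hypothetical and exist only so the fragment compiles without the kernel.

```c
#define DATATYPE int

/* Stand-ins (assumptions) so the sketch compiles without the kernel. */
typedef struct { DATATYPE value; int visits; } lnode;
typedef struct { DATATYPE d; lnode *t; } farg;

/* Stand-in for the kernel's evaluate_tree(): records that the child
 * was evaluated and returns its stored value. */
DATATYPE evaluate_tree ( lnode *node, int tree )
{
     (void)tree;
     node->visits++;
     return node->value;
}

int app_condition = 0;   /* hypothetical state, e.g. "food ahead" */

/* Type EXPR: evaluates exactly one of its two children. */
DATATYPE f_if_cond ( int tree, farg *args )
{
     if ( app_condition )
          return evaluate_tree ( args[0].t, tree );
     else
          return evaluate_tree ( args[1].t, tree );
}
```

With app_condition set, only the first child's visit counter is incremented; the second child, and any side effects it would have, is never touched.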

C functions implementing terminals should ignore both arguments passed to them. A simple example is the independent variable terminal X from the symbolic regression problem:

DATATYPE f_indepvar ( int tree, farg *args )
{
     return g.x;
}

This function just returns the value of the independent variable for the current fitness case, which has previously been stored in a global variable by the application fitness evaluation function.

6.2.1 Ephemeral Random Constants

To create a terminal that acts as an ephemeral random constant, you need to write two C functions. One will generate a new constant, and one will print its value to a string. The first is passed a pointer to a DATATYPE; it should generate a new value and place it in the pointer.

void f_erc_generate ( DATATYPE *r )
{
     *r = random_double() * 10.0;
}

This function generates a random real number in the interval [0; 10) (assuming that DATATYPE is defined to be double or some compatible type).

The second function is used when printing out individuals. It is passed a DATATYPE value. It should create a string representing that value and return it. Typically this will print the value into a buffer and return the buffer's address. The buffer should be declared static; it should not be dynamically allocated (as there is no code to free it). An example:

char *f_erc_print ( DATATYPE v )
{
     static char buffer[20];

     sprintf ( buffer, "%.5f", v );
     return buffer;
}

Assuming again that DATATYPE is double or something compatible, this will print the value to five decimal places.
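When DATATYPE is a structure, the same pair of functions applies. The following is a sketch only, assuming the vector type from appdef.h above: f_vec_generate, f_vec_print, app_random_int, and APP_LAWN_SIZE are hypothetical names, and app_random_int stands in for whatever random source the application uses. As a precaution, the printed form avoids the characters that are not permitted in node names.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
     short x;
     short y;
} vector;
#define DATATYPE vector

#define APP_LAWN_SIZE 8   /* hypothetical range of each component */

/* Stand-in random source (assumption): integer in [0, n). */
int app_random_int ( int n )
{
     return rand() % n;
}

/* Generate a new random constant and store it through the pointer. */
void f_vec_generate ( DATATYPE *r )
{
     r->x = (short)app_random_int ( APP_LAWN_SIZE );
     r->y = (short)app_random_int ( APP_LAWN_SIZE );
}

/* Print the constant into a static buffer and return its address. */
char *f_vec_print ( DATATYPE v )
{
     static char buffer[32];

     snprintf ( buffer, sizeof buffer, "%d,%d", v.x, v.y );
     return buffer;
}
```

Using snprintf() with the buffer size guards against overrunning the static buffer if the format is later changed.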

6.2.2 Evaluation and Argument Functions

No user code needs to be written to support the ADF functions or corresponding argument terminals. Special entries are made in the function table for them, and the kernel handles the evaluation internally.

Evaluation functions with arguments have type DATA or EXPR, just like ordinary functions. If the type is DATA, when the evaluation function is hit, each child is evaluated once, and the return values are made available via the argument terminals in the evaluated tree. If the type is EXPR, then the children are evaluated only when the evaluation of the target tree hits the appropriate argument terminal (and if the same argument terminal is hit multiple times, the child is reevaluated each time).
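The difference between the two evaluation types can be illustrated without the kernel. The sketch below is not lil-gp code: app_child, app_thunk, and the two call_adf_* helpers are hypothetical stand-ins that mimic an ADF whose body reads ARG0 twice, with a child subtree that has a visible side effect (a call counter).

```c
int app_child_calls = 0;   /* counts evaluations of the child subtree */

/* Stand-in for a child subtree with a side effect. */
double app_child ( void )
{
     ++app_child_calls;
     return 2.0;
}

typedef double (*app_thunk) ( void );

/* DATA-style call: the child is evaluated once, up front, and the
 * ADF body (ARG0 + ARG0) reads the stored value twice. */
double call_adf_data ( app_thunk child )
{
     double arg0 = child();
     return arg0 + arg0;
}

/* EXPR-style call: the child is re-evaluated every time the ADF body
 * hits the argument terminal, here twice. */
double call_adf_expr ( app_thunk child )
{
     return child() + child();
}
```

Both calls return the same value here, but the child runs once in the first case and twice in the second, which matters whenever evaluation has side effects.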

6.3 User Callbacks

Only two of the user callbacks listed here are required to do anything (app_build_function_sets() to create the function set(s) and app_eval_fitness() to evaluate individuals). All the others must be present, but they can be just stubs if you don't want to make use of them.

6.3.1 Defining the Function Set(s)

The first user callback required is app_build_function_sets(). This C function creates tables for each function set. There may be more than one function set when individuals are represented by multiple trees, since each tree can have its own function set. Each function set is an array of type function. The following tables show, for each type of node, what the eight fields of the corresponding function structure should be. Some general rules apply:

The code, ephem_gen, and ephem_str fields are C function pointers, not strings. You put the name of the function you are referencing here, but don't quote it.
The string field is the name of the function as a string. It is what gets printed to represent the node when trees are printed to output files. Names may not contain whitespace or any of the characters `:', `(', `)', `[', `]'.
The index field should always be zero.

ordinary function

code	The C function implementing the function.
ephem_gen	NULL
ephem_str	NULL
arity	The arity of the function (greater than zero).
string	The name of the function.
type	FUNC_DATA or FUNC_EXPR, as appropriate.
evaltree	-1
index	0

ordinary terminal

code	The C function implementing the terminal.
ephem_gen	NULL
ephem_str	NULL
arity	0
string	The name of the terminal.
type	TERM_NORM
evaltree	-1
index	0

ephemeral random constant terminal

code	NULL
ephem_gen	The C function to generate new random values.
ephem_str	The C function to print values to a string.
arity	0
string	The generic name of the terminal. (Printed trees will almost always have the string representing the value of the terminal, rather than this name.)
type	TERM_ERC
evaltree	-1
index	0

evaluation function/terminal

code	NULL
ephem_gen	NULL
ephem_str	NULL
arity	-1. (The kernel will determine the arity by looking at the argument terminals in the target tree.)
string	The name of this function/terminal.
type	EVAL_DATA or EVAL_EXPR, as appropriate.
evaltree	The number of the tree to evaluate when this function is hit.
index	0

argument terminal

code	NULL
ephem_gen	NULL
ephem_str	NULL
arity	0
string	The name of this terminal.
type	TERM_ARG
evaltree	The argument number (which child of the corresponding evaluation function this terminal represents).
index	0

The function sets for the lawnmower problem contain examples of all five types of node:

function sets[3][10] =
{
     /*** RPB ***/
     {
          { f_left, NULL, NULL, 0, "left", TERM_NORM, -1, 0 },
          { f_mow, NULL, NULL, 0, "mow", TERM_NORM, -1, 0 },
          { NULL, f_vecgen, f_vecstr, 0, "Rvm", TERM_ERC, -1, 0 },
          { f_frog, NULL, NULL, 1, "frog", FUNC_DATA, -1, 0 },
          { f_vma, NULL, NULL, 2, "vma", FUNC_DATA, -1, 0 },
          { f_prog2, NULL, NULL, 2, "prog2", FUNC_DATA, -1, 0 },
          { NULL, NULL, NULL, -1, "ADF0", EVAL_DATA, 1, 0 },
          { NULL, NULL, NULL, -1, "ADF1", EVAL_DATA, 2, 0 }
     },

     /*** ADF0 ***/
     {
          { f_vma, NULL, NULL, 2, "vma", FUNC_DATA, -1, 0 },
          { f_prog2, NULL, NULL, 2, "prog2", FUNC_DATA, -1, 0 },
          { f_left, NULL, NULL, 0, "left", TERM_NORM, -1, 0 },
          { f_mow, NULL, NULL, 0, "mow", TERM_NORM, -1, 0 },
          { NULL, f_vecgen, f_vecstr, 0, "Rvm", TERM_ERC, -1, 0 }
     },

     /*** ADF1 ***/
     {
          { f_left, NULL, NULL, 0, "left", TERM_NORM, -1, 0 },
          { f_mow, NULL, NULL, 0, "mow", TERM_NORM, -1, 0 },
          { NULL, f_vecgen, f_vecstr, 0, "Rvm", TERM_ERC, -1, 0 },
          { NULL, NULL, NULL, 0, "ARG0", TERM_ARG, 0, 0 },
          { f_frog, NULL, NULL, 1, "frog", FUNC_DATA, -1, 0 },
          { f_vma, NULL, NULL, 2, "vma", FUNC_DATA, -1, 0 },
          { f_prog2, NULL, NULL, 2, "prog2", FUNC_DATA, -1, 0 },
          { NULL, NULL, NULL, -1, "ADF0", EVAL_DATA, 1, 0 }
     }
};

This problem uses two ADFs: the zero-argument ADF0 and the one-argument ADF1. Both ADFs are available to the result-producing branch. In addition, ADF0 can be called from within ADF1.

Note that the functions and terminals can appear in the table in any order. Previous versions of lil-gp required all functions to appear first in the table, followed by the terminals, but this is no longer the case.

Once the function table is created, a list of function sets needs to be created that references it. You should create an array of type function_set with one member for each function set. The size field should be set to the number of functions and terminals in it, and the cset field should point to the function table. The lawnmower problem uses:

function_set *fset;
. . . .
fset = (function_set *)MALLOC ( 3 * sizeof ( function_set ));
fset[0].size = 8;
fset[0].cset = sets[0];
fset[1].size = 5;
fset[1].cset = sets[1];
fset[2].size = 8;
fset[2].cset = sets[2];

Next you must build a tree map, indicating which trees use which function sets. This is just an array of ints, where the nth element indicates the number of the function set of the nth tree. In the case of the lawnmower problem, there is just one tree per function set:

tree_map = (int *)MALLOC ( 3 * sizeof ( int ) );
tree_map[0] = 0;
tree_map[1] = 1;
tree_map[2] = 2;

If two trees use the same function set, then crossover may exchange genetic material between these trees on different individuals. If this is not desired, you can make a copy of the function set, and have one tree use the copy. This would be accomplished with something like:

fset[2].size = 8;
fset[2].cset = sets[2];
fset[3].size = 8;
fset[3].cset = sets[2]; /* note they refer to the same functions*/

. . .

tree_map[2] = 2;
tree_map[3] = 3;

Now trees 2 and 3 will not crossover with each other, even though their function sets are identical.

One last thing to build is a list of tree names--these will be used to label the separate trees when individuals are printed out:

char *tree_name[3];
. . .
tree_name[0] = "RPB";
tree_name[1] = "ADF0";
tree_name[2] = "ADF1";

Now that all the data structures are built, you must pass them as arguments to the kernel function function_sets_init(). This function will do some validity checking and make internal copies of everything. After this function returns, you may destroy your copies. You should also save the return value of this function (an int) and return it to the kernel.

int ret;
. . .
ret = function_sets_init ( fset, 3, tree_map, tree_name, 3);


FREE ( tree_map );
FREE ( fset ) ;


return ret;

The second argument to function_sets_init() is the number of function sets; the fifth argument is the number of trees per individual.

6.3.2 Fitness Evaluation Function

The user function app_eval_fitness() is called whenever an individual is to be evaluated. It is passed a pointer to an individual structure. It should fill in these fields:

r_fitness The raw fitness.

s_fitness The standardized fitness (all values nonnegative, a perfect individual is zero).

a_fitness The adjusted fitness (lies in the interval [0; 1], a perfect individual is one).

hits The auxiliary hits measure.

evald Always set this to EVAL_CACHE_VALID to indicate that the fitness fields are valid.

The function should call set_current_individual() with the pointer passed to it before doing any evaluations. The function can evaluate trees of the individual by calling evaluate_tree(), passing it a pointer to the tree data and the tree number.

Typically the function will iterate over all the fitness cases. The global variable g, which is a user-defined structure, is used to pass information between app_eval_fitness() and the functions and terminals. For example, in the symbolic regression problem, g.x is set to the x value for the current fitness case, then the tree is evaluated. When the evaluation reaches the independent variable terminal, the C function implementing it simply reads this value and returns it.

A typical evaluation function will have this general structure:

void app_eval_fitness ( individual *ind )
{
     set_current_individual ( ind );
     . . .
     for ( <loop over fitness cases> )
     {
          <set up global structure for current fitness case>


          /* here we evaluate tree 0, but you can evaluate any tree of
          * the individual as many times as you like.
          */
          value = evaluate_tree ( ind->tr[0].data, 0 );
          . . .
     }


     ind->hits = <whatever>;
     ind->r_fitness = <whatever>;
     ind->s_fitness = <whatever>;
     ind->a_fitness = <whatever>;


     /* indicate that the fitness fields are correct.*/
     ind->evald = EVAL_CACHE_VALID;
}
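To make the skeleton concrete, here is a minimal compilable sketch for a hypothetical regression problem. It rests on several assumptions that are not lil-gp code: the individual type is a stand-in holding only the fields listed above, EVAL_CACHE_VALID is given an arbitrary value, set_current_individual() is an empty stub, app_fake_tree() stands in for evaluate_tree(), and the hit threshold of 0.01 is arbitrary. The adjusted fitness uses the conventional 1/(1+s) mapping, which satisfies the ranges stated above.

```c
#define DATATYPE double
#define APP_FITNESS_CASES 5     /* hypothetical */
#define EVAL_CACHE_VALID 1      /* stand-in value (assumption) */

/* Stand-in for the kernel's individual type (assumption). */
typedef struct
{
     double r_fitness, s_fitness, a_fitness;
     int hits;
     int evald;
} individual;

/* Global used to pass the current fitness case to the terminals. */
struct { DATATYPE x; } g;

/* Empty stub for the kernel call of the same name. */
void set_current_individual ( individual *ind ) { (void)ind; }

/* Stand-in for evaluate_tree(): pretends the tree computes x*x + x. */
DATATYPE app_fake_tree ( void ) { return g.x * g.x + g.x; }

/* Target function of the hypothetical regression problem. */
double app_target ( double x ) { return x * x + x; }

void app_eval_fitness ( individual *ind )
{
     int i;
     double err, total = 0.0;

     set_current_individual ( ind );
     ind->hits = 0;
     for ( i = 0; i < APP_FITNESS_CASES; ++i )
     {
          g.x = -1.0 + 0.5 * i;         /* set up this fitness case */
          err = app_fake_tree() - app_target ( g.x );
          if ( err < 0.0 )
               err = -err;
          if ( err < 0.01 )             /* arbitrary hit threshold */
               ind->hits++;
          total += err;
     }

     ind->r_fitness = total;            /* raw: summed absolute error */
     ind->s_fitness = total;            /* zero is perfect */
     ind->a_fitness = 1.0 / ( 1.0 + ind->s_fitness );
     ind->evald = EVAL_CACHE_VALID;
}
```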

More complex problems that require a simulation store the entire state of the simulation in g. app_eval_fitness() resets the simulation before evaluating the tree. For instance, in the artificial ant problem the tree is evaluated repeatedly until the time expires or all the food has been collected.

The functions and terminals read and modify the global state information in order to simulate the ant's senses and movements.

6.3.3 Custom Output

After the evaluation of each generation, lil-gp calls the function app_end_of_evaluation(). It is passed the generation number, a pointer to the entire population, statistics for the run and generation, and a flag indicating whether a new best-of-run individual has been found or not. It should return 1 or 0, indicating whether the user termination criterion has been met and the run should stop.

Suppose that the function is declared with the following argument names:

int app_end_of_evaluation ( int gen, multipop *mpop, int newbest,
popstats *gen_stats, popstats *run_stats )

The population is passed as the pointer to a structure of type (multipop *). Everything within this structure should be treated as read-only. This table gives some useful items of information stored in this structure:

mpop->size	number of subpopulations
mpop->pop[p]->size	size of population p
mpop->pop[p]->ind[i]	the i'th individual of population p
mpop->pop[p]->ind[i].r_fitness	raw fitness of individual
mpop->pop[p]->ind[i].s_fitness	standardized fitness of individual
mpop->pop[p]->ind[i].a_fitness	adjusted fitness of individual
mpop->pop[p]->ind[i].hits	hits of individual
mpop->pop[p]->ind[i].tr[n].data	tree n data pointer

The tree data pointer(s) can be passed to evaluate_tree() to evaluate the tree just as in the evaluation function. To print the entire individual, pass its address to print_individual() or pretty_print_individual().
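A user termination criterion built on these fields might be sketched as follows. The struct definitions are simplified stand-ins (assumptions) that model only the members in the table above, popstats is left opaque, APP_FITNESS_CASES is hypothetical, and the sketch assumes that a return value of 1 means the run should stop.

```c
#define APP_FITNESS_CASES 20   /* hypothetical */

/* Simplified stand-ins for the kernel types (assumptions). */
typedef struct { double r_fitness, s_fitness, a_fitness; int hits; } individual;
typedef struct { int size; individual *ind; } population;
typedef struct { int size; population **pop; } multipop;
typedef struct popstats popstats;       /* opaque in this sketch */

/* Stop the run as soon as any individual in any subpopulation scores
 * a hit on every fitness case. The population is only read. */
int app_end_of_evaluation ( int gen, multipop *mpop, int newbest,
                            popstats *gen_stats, popstats *run_stats )
{
     int p, i;

     (void)gen; (void)newbest; (void)gen_stats; (void)run_stats;

     for ( p = 0; p < mpop->size; ++p )
          for ( i = 0; i < mpop->pop[p]->size; ++i )
               if ( mpop->pop[p]->ind[i].hits == APP_FITNESS_CASES )
                    return 1;
     return 0;
}
```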

The content of the statistics structure should be discernible to the interested reader from the declaration in types.h. gen_stats[0] is statistics for the whole population in the current generation, while gen_stats[i] gives the same just for subpopulation i. The run_stats array is similar, but accumulates information over the whole run.

In many problems it is useful to access the best-of-run or best-of-generation individual for printing or doing extra evaluations. For instance, the symbolic regression problem produces an extra output file with the best-of-run individual evaluated at 200 points over the interval of interest, for easy plotting. A copy of the best-of-run individual is pointed to by run_stats[0].best[0]->ind, and the best-of-generation individual by gen_stats[0].best[0]->ind.

In versions of lil-gp prior to 0.99b, it was an undocumented feature that by modifying the parameter database, the breeding parameters could be altered dynamically during the run. If you took advantage of this, you must now call rebuild_breeding_table() after modifying the parameters, and pass it the multipop pointer passed to you. If you do not, your changes to the parameter database will have no effect. This ability is now considered a bona fide feature of lil-gp, and will be supported in future releases.

The subpopulation exchange topology parameters underwent a similar change. If you change them during the run, you should call rebuild_exchange_topology() afterward in order for the changes to have any effect.

Some kernel operations (for instance, restarting from a checkpoint file) imply rebuilding the breeding and topology tables from the parameter database. You should only make changes to these parameters when you intend to immediately call the appropriate rebuilding functions; otherwise unpredictable things will occur.

Another user callback, app_end_of_breeding(), is called after the new population is created each generation. It is passed the generation number and the population structure, just as in the end-of-evaluation callback, but no statistics information.

6.3.4 Application Initialization

There are two functions provided for application-specific initialization: app_initialize() and app_uninitialize(). app_initialize() is passed an integer flag indicating whether the run is starting from a checkpoint or not. It should return 0 to indicate success, or anything else to abort the run.

Initialization such as memory allocation and reading parameters should go in app_initialize(). app_uninitialize() is called at the end of the run, and may be used to do things like free memory.
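A minimal sketch of such a pair follows. The MALLOC and FREE macros are defined here as plain stand-ins so that the fragment compiles on its own (in a real application they come from the kernel); app_fitness_x and APP_FITNESS_CASES are hypothetical, and the void signature of app_uninitialize() is an assumption.

```c
#include <stdlib.h>

/* Stand-ins (assumption): in lil-gp these are the tracking allocators. */
#define MALLOC(n) malloc(n)
#define FREE(p)   free(p)

#define APP_FITNESS_CASES 20    /* hypothetical */

double *app_fitness_x = NULL;   /* hypothetical table of sample points */

int app_initialize ( int startfromcheckpoint )
{
     int i;

     (void)startfromcheckpoint;

     app_fitness_x = (double *)MALLOC ( APP_FITNESS_CASES * sizeof ( double ) );
     if ( app_fitness_x == NULL )
          return 1;             /* nonzero aborts the run */

     /* evenly spaced sample points over [-1, 1] */
     for ( i = 0; i < APP_FITNESS_CASES; ++i )
          app_fitness_x[i] = -1.0 + 2.0 * i / ( APP_FITNESS_CASES - 1 );
     return 0;
}

void app_uninitialize ( void )
{
     FREE ( app_fitness_x );
     app_fitness_x = NULL;
}
```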

6.3.5 Output Streams

An output stream is a simple abstraction of an output file. This mechanism handles the naming of the actual file and uses the detail level (the output.detail parameter) to filter the output. Some functions are provided for writing to output streams:

oputs ( int streamid, int detail, char *string ) Prints the string to the given output stream, if the value of detail is less than or equal to the current detail level.

oprintf ( int streamid, int detail, char *format, ... ) Processes the format and succeeding arguments as in printf(), and prints the resulting string to the stream if the detail is less than or equal to the current detail level.

test_detail_level ( int detail ) Returns true if the argument is less than or equal to the current detail level.

output_filehandle ( int streamid ) Returns the filehandle (FILE *) for the given stream. Useful for passing to print_tree() and the like.

The standard output files (.sys, .gen, etc.) can be printed to with the stream ids OUT_SYS, OUT_GEN, etc. For instance:

oprintf ( OUT_SYS, 30, "Tree %d is:\n", tree_num );
if ( test_detail_level ( 30 ) )
     print_tree ( tree[tree_num], output_filehandle ( OUT_SYS ));
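The detail-level rule can be mimicked in a few lines. This is a stand-alone illustration, not the kernel's implementation: app_detail_level stands in for the output.detail parameter, test_detail_level() is re-implemented locally, and app_oprintf is a hypothetical variant that writes to a FILE * instead of a stream id and returns the number of characters written (0 when the message is filtered out).

```c
#include <stdarg.h>
#include <stdio.h>

int app_detail_level = 50;    /* stand-in for the output.detail parameter */

/* Local stand-in: true if a message at this detail should be shown. */
int test_detail_level ( int detail )
{
     return detail <= app_detail_level;
}

/* Hypothetical oprintf-like helper: prints only when the message's
 * detail is less than or equal to the current detail level. */
int app_oprintf ( FILE *out, int detail, const char *format, ... )
{
     va_list ap;
     int n;

     if ( !test_detail_level ( detail ) )
          return 0;

     va_start ( ap, format );
     n = vfprintf ( out, format, ap );
     va_end ( ap );
     return n;
}
```

With the level at 50, a message tagged with detail 30 is printed and one tagged 70 is suppressed.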

An application can define custom output streams (for instance, the .fn output file of the regression problem). This is done in the application function app_create_output_streams(). This function should be used only to create user output streams. In it, you call create_output_stream() with five arguments:

id The id for the stream (an integer). User-defined output streams should have ids OUT_USER, OUT_USER+1, etc.

ext The extension for the filename. This string is appended to a basename (the parameter output.basename) to create the filename.

reset A flag indicating whether the stream can be closed and reopened (using the functions output_stream_close() and output_stream_open()). Reopening a stream overwrites the old file (like the .bst file).

mode The mode string to pass to fopen() when opening the file. Typically will be "w" or "wb".

autoflush Flag indicating whether the file should be flushed after each call to oputs() and oprintf().

app_create_output_streams() is called before any parameters have been loaded, so you should not attempt to read the parameter database in this function.

6.3.6 Checkpoint Files

Two functions are provided for saving user state to checkpoint files, app_write_checkpoint() and app_read_checkpoint(). Each is passed a file handle (FILE *) opened in text mode for writing or reading, respectively. Each function should leave the file pointer at the end of the user section.
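One way to keep the file pointer exactly at the end of the user section is to write whole lines and read back the same number of lines. The sketch below assumes a void return type for both callbacks; app_best_gen, app_best_fitness, and the line labels are hypothetical application state.

```c
#include <stdio.h>

/* Hypothetical user state to be preserved across a restart. */
int app_best_gen = 0;
double app_best_fitness = 0.0;

/* Write the user section as two labelled text lines. */
void app_write_checkpoint ( FILE *f )
{
     fprintf ( f, "app-best-gen: %d\n", app_best_gen );
     fprintf ( f, "app-best-fitness: %.17g\n", app_best_fitness );
}

/* Read back exactly the two lines written above, so that the file
 * pointer is left immediately after the user section. */
void app_read_checkpoint ( FILE *f )
{
     char line[128];

     if ( fgets ( line, sizeof line, f ) != NULL )
          sscanf ( line, "app-best-gen: %d", &app_best_gen );
     if ( fgets ( line, sizeof line, f ) != NULL )
          sscanf ( line, "app-best-fitness: %lf", &app_best_fitness );
}
```

Reading line by line with fgets() avoids the tendency of a trailing whitespace directive in fscanf() to consume characters belonging to whatever follows the user section.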

6.4 Order of Processing

Here is the order in which things happen during a run.

print startup message
initialize parameter database
initialize ERCs
initialize generation space
app_create_output_streams()
initialize output streams
pre_parameter_defaults()
process command-line arguments in order, possibly including loading of checkpoint file
if not starting from checkpoint, post_parameter_defaults()
open output files
if not already done (during loading of checkpoint), app_build_function_sets()
read tree node/depth limits from parameters
if not starting from checkpoint, seed random number generator
app_initialize()
if not starting from checkpoint, create initial random population
initialize subpopulation exchange topology
initialize breeding table
run the GP: until termination
     evaluate the population, unless this is first generation after loading checkpoint
     compute population statistics
     app_end_of_evaluation()
     write checkpoint file, if necessary
     if this is not the last generation
          do subpopulation exchange, if necessary
          breed new population
          app_end_of_breeding()
app_uninitialize()
free breeding table
free subpopulation exchange topology
free population
free parameter database
free ERCs
free generation spaces
free function sets
print system statistics
close output streams

6.5 Kernel Considerations

6.5.1 Memory Allocation

lil-gp has a system for tracking memory usage.[1] This is helpful in tracking down memory leaks, among other things. To use it, just use MALLOC(), REALLOC(), and FREE() instead of malloc(), realloc(), and free(). The uppercased versions should work exactly like their lowercased counterparts. You may use the lowercase versions if you do not wish to have the memory included in the statistics, but do not mix pointers returned by the two different sets of functions: don't FREE memory that you've malloc'ed, and so on.

6.5.2 Using Parameters

User code may read and write the parameter database, using the functions get_parameter() and add_parameter(). The implementation of the database is not terribly efficient,[2] so you shouldn't, for instance, read a parameter inside the code for a function or terminal. Reading a given parameter once per generation should be considered a maximum. If you need the value more often than that, you should buffer it in a C variable.

get_parameter() takes the name of the parameter (the string) and returns a character pointer to its value, or NULL if the parameter is not present in the database. You should not modify the string returned; make a copy if you need to use it in a destructive manner. add_parameter() takes the parameter name, value, and a flag indicating whether the name or the value should be copied, or both. Adding a parameter that is already present overwrites the old value.
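Buffering a parameter in a C variable might look like the following sketch. The database here is a tiny stand-in (an assumption, not the kernel's data structure), get_parameter() is re-implemented locally with const-qualified pointers, and the parameter name app.fitness_cases, the app_lookups counter, and app_cache_parameters() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in parameter database (assumption): a small fixed table. */
typedef struct { const char *name; const char *value; } app_param;

app_param app_db[] =
{
     { "output.basename", "run1" },
     { "app.fitness_cases", "20" }      /* hypothetical parameter */
};

int app_lookups = 0;    /* counts calls, to show how rarely we search */

/* Local stand-in for get_parameter(): linear search, NULL if absent. */
const char *get_parameter ( const char *name )
{
     size_t i;

     ++app_lookups;
     for ( i = 0; i < sizeof app_db / sizeof app_db[0]; ++i )
          if ( strcmp ( app_db[i].name, name ) == 0 )
               return app_db[i].value;
     return NULL;
}

int app_fitness_cases = 10;     /* cached copy, with a default value */

/* Read the parameter once (e.g. from app_initialize()) and keep the
 * converted value; functions and terminals then use the C variable. */
void app_cache_parameters ( void )
{
     const char *v = get_parameter ( "app.fitness_cases" );

     if ( v != NULL )
          app_fitness_cases = atoi ( v );
}
```

Node code that needs the value reads app_fitness_cases directly, so the database is searched once rather than on every tree evaluation.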

[1] It can be disabled completely by removing or commenting out the line "#define TRACK_MEMORY" from protos.h.

[2] Read "linear search."
