Chapter 6
6.1 Basic Definitions
6.2 Functions and Terminals
    6.2.1 Ephemeral Random Constants
    6.2.2 Evaluation and Argument Functions
6.3 User Callbacks
    6.3.1 Defining the Function Set(s)
    6.3.2 Fitness Evaluation Function
    6.3.3 Custom Output
    6.3.4 Application Initialization
    6.3.5 Output Streams
    6.3.6 Checkpoint Files
6.4 Order of Processing
6.5 Kernel Considerations
    6.5.1 Memory Allocation
    6.5.2 Using Parameters
Implementing Problems
This chapter documents how to implement a new problem in lil-gp. There are five files that the user must write. A set of skeleton user files is provided in the distribution; it is suggested that you copy these files and modify them to create a new problem.
Throughout this chapter, the term "function" refers to functions in the GP sense. "C function" refers to a function in the C language.
User-written code can be divided into two categories: C functions
implementing functions and terminals, and user callbacks. The
user callbacks, usually placed in the app.c file, do application-
specific tasks like function set initialization, calculation of
fitness, etc. The other group of C functions, usually placed in
function.c, are the code that is called by the kernel during
tree evaluation.
6.1 Basic Definitions
There are two defined constants that the kernel of lil-gp needs
in appdef.h. They are:
constant | value |
MAXARGS | the maximum number of arguments (children) for any function |
DATATYPE | the C data type returned by all functions and terminals |
This is also a good place to put any application-specific #defines that you may need. It is suggested that all application defines be prefixed with APP_ so as not to conflict with any current or future kernel defines.
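As a minimal sketch, an appdef.h for a regression-style problem might look like the following. The values shown and the name APP_FITNESS_CASES are hypothetical, chosen only to illustrate the two required constants and the APP_ prefix convention.

```c
/* appdef.h: hypothetical sketch for a regression-style problem. */
#ifndef APPDEF_H
#define APPDEF_H

/* Largest number of arguments (children) of any function in the problem. */
#define MAXARGS 2

/* The C type returned by every function and terminal. */
#define DATATYPE double

/* Application-specific define, prefixed with APP_ so it cannot clash
 * with kernel defines (name and value are made up for this sketch). */
#define APP_FITNESS_CASES 20

#endif
```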
If your problem requires a more complex data type than the ones available in C, you can use typedef to create a new type. For instance, the lawnmower problem uses an ordered pair of integers as its datatype. Its appdef.h file contains:
typedef struct { short x; short y; } vector;
#define DATATYPE vector
6.2 Functions and Terminals
For every ordinary function and terminal in your problem, you write a C function to implement the action of that node. These C functions are placed in the file function.c, and prototypes for them should be placed in function.h.
Each C function is passed two arguments, an int and a (farg *). What it does with these arguments depends on whether it is implementing a function or a terminal, and if it is a function, what type of function. All these C functions should return the user-defined type DATATYPE.
There are two types of functions, referred to in lil-gp as types "DATA" and "EXPR". If the function is of type DATA, then when it is found in a tree, all its children will be evaluated and their return values passed to the user code implementing the function. The LISP equivalent of this is to implement the function with a defun. If the lil-gp function is of type EXPR, then the user code is passed pointers to its children, which it can then ask the kernel to evaluate if needed. It can evaluate each child as many times as appropriate, or not at all. The LISP equivalent of this type would be to implement the function with a defmacro. Use of the correct type in lil-gp is important, especially when the evaluation of functions and terminals has global side effects (for instance, where the evolved program is controlling a simulation).
If the function is of type DATA, it can ignore the int passed to it. The (farg *) argument will be an array of arguments, one element for each child. The C function should reference the d field of each element to get that child's value. For instance, consider the two-argument addition function from the regression problem:
DATATYPE f_add ( int tree, farg *args )
{
    return args[0].d + args[1].d;
}
When this function occurs in evaluating a tree, the lil-gp kernel will evaluate the children, store their values in the args array, and call this C function.
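The same pattern applies to any type DATA function. The sketch below shows a hypothetical protected-division function (it is not taken from the lil-gp distribution) together with a small helper that mimics what the kernel does before the call. The farg definition here is a stand-in assumed for illustration; the real one comes from the kernel headers.

```c
#define DATATYPE double

/* Stand-in for the kernel's farg type.  For type DATA functions only
 * the d field (a child's already-computed value) is used. */
typedef union {
    DATATYPE d;
    void *t;
} farg;

/* Hypothetical protected division: returns 1.0 when the divisor is zero. */
DATATYPE f_div(int tree, farg *args)
{
    (void)tree;  /* type DATA functions can ignore the int argument */
    if (args[1].d == 0.0)
        return 1.0;
    return args[0].d / args[1].d;
}

/* Mimics the kernel for a DATA node: the children have already been
 * evaluated, so their values are stored in the args array. */
DATATYPE call_div(DATATYPE left, DATATYPE right)
{
    farg args[2];
    args[0].d = left;
    args[1].d = right;
    return f_div(0, args);
}
```

Because both children are evaluated before f_div() runs, any side effects they have occur exactly once, regardless of what the function does with the values.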
Now consider another example: the IF_FOOD_AHEAD function
from the artificial ant problem. It has two arguments: the first
should be evaluated if there is food in front of the ant, the
second otherwise. If type DATA were to be used for this function,
then both would be evaluated and only their return values passed
to the function (which would be doubly useless in this case, since
all the functions and terminals in the ant problem ignore the
return value). We want to let the function itself choose which
child to evaluate. This function must be of type EXPR:
DATATYPE f_if_food_ahead ( int tree, farg *args )
{
    if ( ... ) /* determine if there is food ahead */
        evaluate_tree ( args[0].t, tree );
    else
        evaluate_tree ( args[1].t, tree );
}
For type EXPR functions, the t field of each array element should be accessed; it is a pointer to the corresponding child. This pointer can be passed to the evaluate_tree() C function to actually do the evaluation. evaluate_tree() also needs to be passed the integer argument (called tree in this case).
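To make the lazy behaviour of type EXPR concrete, here is a self-contained sketch that can be compiled without the kernel. The struct node type and the evaluate_tree() shown are mocks written for this example, not the kernel's own tree representation or evaluator, and the food_ahead flag stands in for the real simulation state.

```c
#define DATATYPE int

struct node;                       /* mock tree node, not the kernel's type */

typedef union {
    DATATYPE d;
    struct node *t;
} farg;

typedef DATATYPE (*node_code)(int tree, farg *args);

struct node {
    node_code code;                /* C function implementing the node */
    struct node *child[2];         /* at most two children in this mock */
};

static int food_ahead;             /* mock simulation state */
static int moves, turns;           /* side effects that can be counted */

/* Mock of evaluate_tree(): hands the children over unevaluated, which
 * is how the single EXPR function in this sketch expects them. */
DATATYPE evaluate_tree(struct node *n, int tree)
{
    farg args[2];
    args[0].t = n->child[0];
    args[1].t = n->child[1];
    return n->code(tree, args);
}

DATATYPE f_move(int tree, farg *args) { (void)tree; (void)args; moves++; return 0; }
DATATYPE f_left(int tree, farg *args) { (void)tree; (void)args; turns++; return 0; }

/* Type EXPR: only the chosen child is evaluated. */
DATATYPE f_if_food_ahead(int tree, farg *args)
{
    if (food_ahead)
        return evaluate_tree(args[0].t, tree);
    return evaluate_tree(args[1].t, tree);
}

/* Builds (if_food_ahead move left), evaluates it once, and reports
 * how many times each branch ran. */
void run_once(int food, int *moves_out, int *turns_out)
{
    struct node move = { f_move, { 0, 0 } };
    struct node left = { f_left, { 0, 0 } };
    struct node root = { f_if_food_ahead, { &move, &left } };

    food_ahead = food;
    moves = turns = 0;
    evaluate_tree(&root, 0);
    *moves_out = moves;
    *turns_out = turns;
}
```

With food ahead only the move branch runs, and without it only the turn branch runs. Had the function been of type DATA, both branches would have run on every evaluation.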
C functions implementing terminals should ignore both arguments
passed to them. A simple example is the independent variable terminal
X from the symbolic regression problem:
DATATYPE f_indepvar ( int tree, farg *args ) { return g.x; }
This function just returns the value of the independent variable
for the current fitness case, which has previously been stored
in a global variable by the application fitness evaluation function.
6.2.1 Ephemeral Random Constants
To create a terminal that acts as an ephemeral random constant,
you need to write two C functions. One will generate a new constant,
and one will print its value to a string. The first is passed
a pointer to a DATATYPE; it should generate a new value
and place it in the pointer.
void f_erc_generate ( DATATYPE *r ) { *r = random_double() * 10.0; }
This function generates a random real number in the interval [0; 10) (assuming that DATATYPE is defined to be double or some compatible type).
The second function is used when printing out individuals. It
is passed a DATATYPE value. It should create a string representing
that value and return it. Typically this will print the value
into a buffer and return the buffer's address. The buffer should
be declared static; it should not be dynamically allocated (as
there is no code to free it). An example:
char *f_erc_print ( DATATYPE v )
{
    static char buffer[20];

    sprintf ( buffer, "%.5f", v );
    return buffer;
}
Assuming again that DATATYPE is double or something compatible,
this will print the value to five decimal places.
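The pair of ERC functions can be tried outside the kernel with a sketch like the one below. The random_double() here is a stand-in built on rand() and is assumed to return a value in [0, 1); the range [-1, 1) and the use of snprintf() (which truncates rather than overflowing the buffer) are choices made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>

#define DATATYPE double

/* Stand-in for the kernel's random_double(); assumed to return a
 * value in [0, 1). */
static double random_double(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Generate a new constant in [-1, 1) and store it through the pointer. */
void f_erc_generate(DATATYPE *r)
{
    *r = random_double() * 2.0 - 1.0;
}

/* Print the value into a static buffer and return the buffer's address.
 * The buffer is overwritten by the next call. */
char *f_erc_print(DATATYPE v)
{
    static char buffer[32];

    snprintf(buffer, sizeof buffer, "%.5f", v);
    return buffer;
}
```

Since the buffer is static, a caller that needs to keep the string past the next call to f_erc_print() has to copy it first.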
6.2.2 Evaluation and Argument Functions
No user code needs to be written to support the ADF functions or corresponding argument terminals. Special entries are made in the function table for them, and the kernel handles the evaluation internally.
Evaluation functions with arguments have type DATA or EXPR, just
like ordinary functions. If the type is DATA, when the evaluation
function is hit, each child is evaluated once, and the return
values are made available via the argument terminals in the evaluated
tree. If the type is EXPR, then the children are evaluated only
when the evaluation of the target tree hits the appropriate argument
terminal (and if the same argument terminal is hit multiple times,
the child is reevaluated each time).
6.3 User Callbacks
Only two of the user callbacks listed here are required to do
anything (app_build_function_sets() to create the function
set(s) and app_eval_fitness() to evaluate individuals).
All the others must be present, but they can be just stubs if
you don't want to make use of them.
6.3.1 Defining the Function Set(s)
The first user callback required is app_build_function_sets().
This C function creates tables for each function set. There may
be more than one function set when individuals are represented
by multiple trees, since each tree can have its own function set.
Each function set is an array of type function. The following
tables show, for each type of node, what the eight fields of the
corresponding function structure should be. Some general rules
apply:
ordinary function
code | The C function implementing the function. |
ephem_gen | NULL |
ephem_str | NULL |
arity | The arity of the function (greater than zero). |
string | The name of the function. |
type | FUNC_DATA or FUNC_EXPR, as appropriate. |
evaltree | -1 |
index | 0 |
ordinary terminal
code | The C function implementing the terminal. |
ephem_gen | NULL |
ephem_str | NULL |
arity | 0 |
string | The name of the terminal. |
type | TERM_NORM |
evaltree | -1 |
index | 0 |
ephemeral random constant terminal
code | NULL |
ephem_gen | The C function to generate new random values. |
ephem_str | The C function to print values to a string. |
arity | 0 |
string | The generic name of the terminal. (Printed trees will almost always have the string representing the value of the terminal, rather than this name.) |
type | TERM_ERC |
evaltree | -1 |
index | 0 |
evaluation function/terminal
code | NULL |
ephem_gen | NULL |
ephem_str | NULL |
arity | -1. (The kernel will determine the arity by looking at the argument terminals in the target tree.) |
string | The name of this function/terminal. |
type | EVAL_DATA or EVAL_EXPR, as appropriate. |
evaltree | The number of the tree to evaluate when this function is hit. |
index | 0 |
argument terminal
code | NULL |
ephem_gen | NULL |
ephem_str | NULL |
arity | 0 |
string | The name of this terminal. |
type | TERM_ARG |
evaltree | The argument number (which child of the corresponding evaluation function this terminal represents). |
index | 0 |
The function sets for the lawnmower problem contain examples of
all five types of node:
function sets[3][10] =
{
    /*** RPB ***/
    { { f_left,  NULL,     NULL,     0,  "left",  TERM_NORM, -1, 0 },
      { f_mow,   NULL,     NULL,     0,  "mow",   TERM_NORM, -1, 0 },
      { NULL,    f_vecgen, f_vecstr, 0,  "Rvm",   TERM_ERC,  -1, 0 },
      { f_frog,  NULL,     NULL,     1,  "frog",  FUNC_DATA, -1, 0 },
      { f_vma,   NULL,     NULL,     2,  "vma",   FUNC_DATA, -1, 0 },
      { f_prog2, NULL,     NULL,     2,  "prog2", FUNC_DATA, -1, 0 },
      { NULL,    NULL,     NULL,     -1, "ADF0",  EVAL_DATA,  1, 0 },
      { NULL,    NULL,     NULL,     -1, "ADF1",  EVAL_DATA,  2, 0 } },

    /*** ADF0 ***/
    { { f_vma,   NULL,     NULL,     2,  "vma",   FUNC_DATA, -1, 0 },
      { f_prog2, NULL,     NULL,     2,  "prog2", FUNC_DATA, -1, 0 },
      { f_left,  NULL,     NULL,     0,  "left",  TERM_NORM, -1, 0 },
      { f_mow,   NULL,     NULL,     0,  "mow",   TERM_NORM, -1, 0 },
      { NULL,    f_vecgen, f_vecstr, 0,  "Rvm",   TERM_ERC,  -1, 0 } },

    /*** ADF1 ***/
    { { f_left,  NULL,     NULL,     0,  "left",  TERM_NORM, -1, 0 },
      { f_mow,   NULL,     NULL,     0,  "mow",   TERM_NORM, -1, 0 },
      { NULL,    f_vecgen, f_vecstr, 0,  "Rvm",   TERM_ERC,  -1, 0 },
      { NULL,    NULL,     NULL,     0,  "ARG0",  TERM_ARG,   0, 0 },
      { f_frog,  NULL,     NULL,     1,  "frog",  FUNC_DATA, -1, 0 },
      { f_vma,   NULL,     NULL,     2,  "vma",   FUNC_DATA, -1, 0 },
      { f_prog2, NULL,     NULL,     2,  "prog2", FUNC_DATA, -1, 0 },
      { NULL,    NULL,     NULL,     -1, "ADF0",  EVAL_DATA,  1, 0 } }
};
This problem uses two ADFs: the zero-argument ADF0 and the one-argument ADF1. Both ADFs are available to the result-producing branch. In addition, ADF0 can be called from within ADF1.
Note that the functions and terminals can appear in the table in any order. Previous versions of lil-gp required all functions to appear first in the table, followed by the terminals, but this is no longer the case.
Once the function table is created, a list of function sets needs
to be created that references it. You should create an array of
type function_set with one member for each function set.
The size field should be set to the number of functions and terminals
in it, and the cset field should point to the function
table. The lawnmower problem uses:
function_set *fset;
. . . .
fset = (function_set *)MALLOC ( 3 * sizeof ( function_set ) );
fset[0].size = 8;
fset[0].cset = sets[0];
fset[1].size = 5;
fset[1].cset = sets[1];
fset[2].size = 8;
fset[2].cset = sets[2];
Next you must build a tree map, indicating which trees use which function sets. This is just an array of ints, where the nth element indicates the number of the function set of the nth tree. In the case of the lawnmower problem, there is just one tree per function set:
tree_map = (int *)MALLOC ( 3 * sizeof ( int ) );
tree_map[0] = 0;
tree_map[1] = 1;
tree_map[2] = 2;
If two trees use the same function set, then crossover may exchange genetic material between these trees on different individuals. If this is not desired, you can make a copy of the function set, and have one tree use the copy. This would be accomplished with something like:
fset[2].size = 8;
fset[2].cset = sets[2];
fset[3].size = 8;
fset[3].cset = sets[2];   /* note they refer to the same functions */
. . .
tree_map[2] = 2;
tree_map[3] = 3;
Now trees 2 and 3 will not cross over with each other, even though their function sets are identical.
One last thing to build is a list of tree names; these will be used to label the separate trees when individuals are printed out:
char *tree_name[3];
. . .
tree_name[0] = "RPB";
tree_name[1] = "ADF0";
tree_name[2] = "ADF1";
Now that all the data structures are built, you must pass them as arguments to the kernel function function_sets_init(). This function will do some validity checking and make internal copies of everything. After this function returns, you may destroy your copies. You should also save the return value of this function (an int) and return it to the kernel.
int ret;
. . .
ret = function_sets_init ( fset, 3, tree_map, tree_name, 3 );

FREE ( tree_map );
FREE ( fset );

return ret;
The second argument to function_sets_init() is the number
of function sets; the fifth argument is the number of trees per
individual.
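Putting the steps of this section together for a single-tree problem gives something like the sketch below. It compiles without the kernel because it supplies mock versions of the function and function_set types, the node type constants, and function_sets_init(). All of these mocks are assumptions made for illustration: the real declarations come from the kernel headers, the enum values are placeholders, and the mock initializer returns 0 when its simple checks pass, which says nothing about the real return values.

```c
#include <stdlib.h>

#define DATATYPE double
#define MALLOC(n) malloc(n)   /* stand-ins for the kernel's tracked versions */
#define FREE(p)   free(p)

typedef union { DATATYPE d; void *t; } farg;

/* Placeholder values; the kernel defines the real constants. */
enum { FUNC_DATA, FUNC_EXPR, TERM_NORM, TERM_ERC, TERM_ARG, EVAL_DATA, EVAL_EXPR };

/* Mock of the eight-field function structure described in the text. */
typedef struct {
    DATATYPE (*code)(int, farg *);
    void (*ephem_gen)(DATATYPE *);
    char *(*ephem_str)(DATATYPE);
    int arity;
    char *string;
    int type;
    int evaltree;
    int index;
} function;

typedef struct { int size; function *cset; } function_set;

/* Mock: checks only that each tree maps to an existing, non-empty set. */
static int function_sets_init(function_set *fset, int fcount, int *tree_map,
                              char **tree_name, int tcount)
{
    int i;
    for (i = 0; i < tcount; i++) {
        if (tree_map[i] < 0 || tree_map[i] >= fcount) return 1;
        if (fset[tree_map[i]].size <= 0 || tree_name[i] == NULL) return 1;
    }
    return 0;
}

static DATATYPE x_value;   /* set by the fitness function in a real problem */

DATATYPE f_add(int tree, farg *a) { (void)tree; return a[0].d + a[1].d; }
DATATYPE f_mul(int tree, farg *a) { (void)tree; return a[0].d * a[1].d; }
DATATYPE f_indepvar(int tree, farg *a) { (void)tree; (void)a; return x_value; }

int app_build_function_sets(void)
{
    /* One function set: two ordinary functions and one ordinary terminal. */
    static function set0[3] = {
        { f_add,      NULL, NULL, 2, "+", FUNC_DATA, -1, 0 },
        { f_mul,      NULL, NULL, 2, "*", FUNC_DATA, -1, 0 },
        { f_indepvar, NULL, NULL, 0, "X", TERM_NORM, -1, 0 }
    };
    function_set *fset;
    int *tree_map;
    char *tree_name[1];
    int ret;

    fset = (function_set *)MALLOC(sizeof(function_set));
    tree_map = (int *)MALLOC(sizeof(int));
    if (fset == NULL || tree_map == NULL) {
        FREE(fset);
        FREE(tree_map);
        return 1;
    }

    fset[0].size = 3;          /* number of entries in set0 */
    fset[0].cset = set0;
    tree_map[0] = 0;           /* tree 0 uses function set 0 */
    tree_name[0] = "TREE";

    ret = function_sets_init(fset, 1, tree_map, tree_name, 1);

    /* The kernel is described as keeping its own copies, so ours can go. */
    FREE(tree_map);
    FREE(fset);
    return ret;
}
```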
6.3.2 Fitness Evaluation Function
The user function app_eval_fitness() is called whenever an individual is to be evaluated. It is passed a pointer to an individual structure. It should fill in these fields:
r_fitness The raw fitness.
s_fitness The standardized fitness (all values nonnegative,
a perfect individual is zero).
a_fitness The adjusted fitness (lies in the interval [0;
1], a perfect individual is one).
hits The auxiliary hits measure.
evald Always set this to EVAL_CACHE_VALID to indicate
that the fitness fields are valid.
The function should call set_current_individual() with the pointer passed to it before doing any evaluations. The function can evaluate trees of the individual by calling evaluate_tree(), passing it a pointer to the tree data and the tree number.
Typically the function will iterate over all the fitness cases. The global variable g, which is a user-defined structure, is used to pass information between app_eval_fitness() and the functions and terminals. For example, in the symbolic regression problem, g.x is set to the x value for the current fitness case, then the tree is evaluated. When the evaluation reaches the independent variable terminal, the C function implementing it simply reads this value and returns it.
A typical evaluation function will have this general structure:
void app_eval_fitness ( individual *ind )
{
    set_current_individual ( ind );
    . . .
    for ( <loop over fitness cases> )
    {
        <set up global structure for current fitness case>

        /* here we evaluate tree 0, but you can evaluate any tree of
         * the individual as many times as you like. */
        value = evaluate_tree ( ind->tr[0].data, 0 );
        . . .
    }

    ind->hits = <whatever>;
    ind->r_fitness = <whatever>;
    ind->s_fitness = <whatever>;
    ind->a_fitness = <whatever>;

    /* indicate that the fitness fields are correct. */
    ind->evald = EVAL_CACHE_VALID;
}
More complex problems which require a simulation store the entire state of the simulation in g. app_eval_fitness() resets the simulation before evaluating the tree. For instance, in the artificial ant problem the tree is evaluated repeatedly until the time expires or all the food has been collected.
The functions and terminals read and modify the global state information
in order to simulate the ant's senses and movements.
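For a regression-style problem, the fitness fields might be filled in as in the sketch below. It is a stand-alone mock: the individual structure holds only the fields discussed here plus a function pointer that stands in for the tree, set_current_individual() is a no-op, and the value of EVAL_CACHE_VALID is a placeholder. The target function, the hit threshold of 0.01, and the adjusted fitness formula 1/(1+s) are assumptions for this example; the formula is a common convention that satisfies the stated requirements (range [0; 1], perfect individual is one).

```c
#define DATATYPE double
#define EVAL_CACHE_VALID 1          /* placeholder value for this sketch */
#define APP_FITNESS_CASES 5         /* hypothetical application define */

/* Mock individual: only the fields discussed in this section. */
typedef struct {
    double r_fitness, s_fitness, a_fitness;
    int hits;
    int evald;
    DATATYPE (*program)(void);      /* mock: stands in for ind->tr[0].data */
} individual;

static struct { DATATYPE x; } g;    /* shared with the terminals */

static DATATYPE target(DATATYPE x) { return x * x + x; }

/* Mocks of the kernel calls used by the evaluation function. */
static void set_current_individual(individual *ind) { (void)ind; }
static DATATYPE evaluate_program(individual *ind) { return ind->program(); }

void app_eval_fitness(individual *ind)
{
    double err, sum = 0.0;
    int i, hits = 0;

    set_current_individual(ind);

    for (i = 0; i < APP_FITNESS_CASES; i++) {
        g.x = (double)i;                       /* current fitness case */
        err = evaluate_program(ind) - target(g.x);
        if (err < 0.0)
            err = -err;
        sum += err;
        if (err < 0.01)
            hits++;
    }

    ind->hits = hits;
    ind->r_fitness = sum;                      /* raw: total absolute error */
    ind->s_fitness = sum;                      /* already zero-is-best */
    ind->a_fitness = 1.0 / (1.0 + sum);        /* assumed convention */
    ind->evald = EVAL_CACHE_VALID;
}

/* Two hand-written "programs" that read g.x as the X terminal would. */
DATATYPE prog_exact(void)  { return g.x * g.x + g.x; }
DATATYPE prog_square(void) { return g.x * g.x; }
```

Here prog_exact matches the target on every case, while prog_square is off by x on each case, giving a total error of 0 + 1 + 2 + 3 + 4 = 10 and a single hit at x = 0.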
6.3.3 Custom Output
After the evaluation of each generation, lil-gp calls the function app_end_of_evaluation(). It is passed the generation number, a pointer to the entire population, statistics for the run and generation, and a flag indicating whether a new best-of-run individual has been found or not. It should return a 1 or 0, indicating whether the user termination criterion has been met and the run should stop.
Suppose that the function is declared with the following argument names:
int app_end_of_evaluation ( int gen, multipop *mpop, int newbest, popstats *gen_stats, popstats *run_stats )
The population is passed as the pointer to a structure of type (multipop *). Everything within this structure should be treated as read-only. This table gives some useful items of information stored in this structure:
mpop->size | number of subpopulations |
mpop->pop[p]->size | size of population p |
mpop->pop[p]->ind[i] | the i'th individual of population p |
mpop->pop[p]->ind[i].r_fitness | raw fitness of individual |
mpop->pop[p]->ind[i].s_fitness | standardized fitness of individual |
mpop->pop[p]->ind[i].a_fitness | adjusted fitness of individual |
mpop->pop[p]->ind[i].hits | hits of individual |
mpop->pop[p]->ind[i].tr[n].data | tree n data pointer |
The tree data pointer(s) can be passed to evaluate_tree() to evaluate the tree just as in the evaluation function. To print the entire individual, pass its address to print_individual() or pretty_print_individual().
The content of the statistics structure should be discernible to the interested reader from the declaration in types.h. gen_stats[0] is statistics for the whole population in the current generation, while gen_stats[i] gives the same just for subpopulation i. The run_stats array is similar, but accumulates information over the whole run.
In many problems it is useful to access the best-of-run or best-of-generation individual for printing or doing extra evaluations. For instance, the symbolic regression problem produces an extra output file with the best-of-run individual evaluated at 200 points over the interval of interest, for easy plotting. A copy of the best-of-run individual is pointed to by run_stats[0].best[0]->ind, and the best-of-generation individual by gen_stats[0].best[0]->ind.
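A user termination criterion based on the best-of-run individual could be sketched as follows. The structure definitions are minimal mocks containing only the members named in the text (run_stats[0].best[0]->ind and hits); the real ones are declared in types.h. APP_TARGET_HITS and the helper stop_for_hits() are hypothetical names introduced for this example.

```c
#include <stddef.h>

#define APP_TARGET_HITS 20          /* hypothetical: number of fitness cases */

/* Minimal mocks of the kernel structures used below. */
typedef struct { double s_fitness; int hits; } individual;
typedef struct { individual *ind; } saved_ind;
typedef struct { saved_ind **best; } popstats;
typedef struct multipop multipop;   /* left opaque in this sketch */

/* Return 1 to stop the run once the best-of-run individual scores a
 * hit on every fitness case, 0 to continue. */
int app_end_of_evaluation(int gen, multipop *mpop, int newbest,
                          popstats *gen_stats, popstats *run_stats)
{
    (void)gen; (void)mpop; (void)newbest; (void)gen_stats;
    return run_stats[0].best[0]->ind->hits >= APP_TARGET_HITS ? 1 : 0;
}

/* Helper for trying the callback without a kernel: builds a statistics
 * structure whose best individual has the given number of hits. */
int stop_for_hits(int hits)
{
    individual ind;
    saved_ind saved;
    saved_ind *best[1];
    popstats stats;

    ind.s_fitness = 0.0;
    ind.hits = hits;
    saved.ind = &ind;
    best[0] = &saved;
    stats.best = best;
    return app_end_of_evaluation(0, NULL, 0, &stats, &stats);
}
```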
In versions of lil-gp prior to 0.99b, it was an undocumented feature that by modifying the parameter database, the breeding parameters could be altered dynamically during the run. If you took advantage of this, you must now call rebuild_breeding_table() after modifying the parameters, and pass it the multipop pointer passed to you. If you do not, your changes to the parameter database will have no effect. This ability is now considered a bona fide feature of lil-gp, and will be supported in future releases.
The subpopulation exchange topology parameters underwent a similar change. If you change these parameters during the run, you should call rebuild_exchange_topology() after making changes in order for them to have any effect.
Some kernel operations (for instance, restarting from a checkpoint file) imply rebuilding the breeding and topology tables from the parameter database. You should only make changes to these parameters when you intend to immediately call the appropriate rebuilding functions; otherwise unpredictable things will occur.
Another user callback app_end_of_breeding() is called after
the new population is created each generation. This is passed
the generation number and the population structure, just as in
the end of evaluation callback, but no statistics information.
6.3.4 Application Initialization
There are two functions provided for application-specific initialization: app_initialize() and app_uninitialize(). app_initialize() is passed an integer flag indicating whether the run is starting from a checkpoint or not. It should return 0 to indicate success, or anything else to abort the run.
Initialization such as memory allocation and reading parameters
should go in app_initialize(). The latter function, app_uninitialize(),
is called at the end of the run, and may be used to do things like free memory.
6.3.5 Output Streams
An output stream is a simple abstraction of an output file. This mechanism handles the naming of the actual file and uses the detail level (the output.detail parameter) to filter the output. Some functions are provided for writing to output streams:
oputs ( int streamid, int detail, char *string ) Prints
the string to the given output stream, if the value of
detail is less than or equal to the current detail level.
oprintf ( int streamid, int detail, char *format, ... )
Processes the format and succeeding arguments as in
printf(), and prints the resulting string to the stream if the
detail is less than or equal to the current detail level.
test_detail_level ( int detail ) Returns true if the argument
is less than or equal to the current detail level.
output_filehandle ( int streamid ) Returns the filehandle
(FILE *) for the given stream. Useful for passing to
print_tree() and the like.
The standard output files (.sys, .gen, etc.) can be printed to with the stream ids OUT_SYS, OUT_GEN, etc. For instance:
oprintf ( OUT_SYS, 30, "Tree %d is:\n", tree_num );
if ( test_detail_level ( 30 ) )
    print_tree ( tree[tree_num], output_filehandle ( OUT_SYS ) );
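The filtering rule (output appears only when the detail value passed is less than or equal to the current detail level) can be illustrated with a small mock. The oprintf() and test_detail_level() below are toy re-creations written for this example, not the kernel's implementations: text is collected in a buffer instead of a file, the stream id is ignored, and the OUT_SYS value is a placeholder.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define OUT_SYS 0                   /* placeholder id for this sketch */

static int current_detail = 50;     /* stands in for output.detail */
static char captured[256];          /* mock stream: text is collected here */

int test_detail_level(int detail)
{
    return detail <= current_detail;
}

/* Mock oprintf(): appends to the buffer only if the detail test passes. */
void oprintf(int streamid, int detail, char *format, ...)
{
    va_list ap;
    size_t used = strlen(captured);

    (void)streamid;
    if (!test_detail_level(detail))
        return;
    va_start(ap, format);
    vsnprintf(captured + used, sizeof captured - used, format, ap);
    va_end(ap);
}

/* Emits one low-detail and one high-detail message at the given level
 * and returns whatever got through the filter. */
const char *report(int level)
{
    current_detail = level;
    captured[0] = '\0';
    oprintf(OUT_SYS, 10, "best hits: %d\n", 17);
    oprintf(OUT_SYS, 90, "debug detail\n");
    return captured;
}
```

At level 50 only the first message passes the filter; at level 95 both do, and at level 5 neither does.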
An application can define custom output streams (for instance, the .fn output file of the regression problem). This is done in the application function app_create_output_streams(). This function should be used only to create user output streams. In it, you call create_output_stream() with five arguments:
id The id for the stream (an integer). User-defined output
streams should have ids OUT_USER,
OUT_USER+1, etc.
ext The extension for the filename. This string is appended
to a basename (the parameter output.basename)
to create the filename.
reset A flag indicating whether the stream can be closed
and reopened (using the functions
output_stream_close() and output_stream_open()).
Reopening a stream overwrites the old file (like
the .bst file).
mode The mode string to pass to fopen() when opening
the file. Typically will be "w" or "wb".
autoflush Flag indicating whether the file should be flushed
after each call to oputs() and oprintf().
app_create_output_streams() is called before any parameters
have been loaded, so you should not attempt to read the parameter
database in this function.
6.3.6 Checkpoint Files
Two functions are provided for saving user state to checkpoint
files, app_write_checkpoint() and app_read_checkpoint().
Each is passed a file handle (FILE *) opened in text mode
for writing or reading, respectively. Each function should leave
the file pointer at the end of the user section.
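A minimal sketch of the two checkpoint callbacks, assuming the user state is a small structure that fits on one text line. The void return types, the app_state structure, and the roundtrip_ok() helper are assumptions made for this example. The helper writes the user section followed by a line standing in for kernel data, then checks that reading restores the state and leaves the file pointer exactly at the end of the user section.

```c
#include <stdio.h>

/* Hypothetical user state that must survive a restart. */
static struct { int cases_seen; double best_error; } app_state;

void app_write_checkpoint(FILE *f)
{
    /* One text line; %.17g keeps the full precision of the double. */
    fprintf(f, "%d %.17g\n", app_state.cases_seen, app_state.best_error);
}

void app_read_checkpoint(FILE *f)
{
    int c;

    if (fscanf(f, "%d %lf", &app_state.cases_seen, &app_state.best_error) == 2) {
        /* Consume the rest of the line so the file pointer ends up
         * just past the user section. */
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }
}

/* Round trip through a temporary file; returns 1 on success. */
int roundtrip_ok(void)
{
    FILE *f = tmpfile();
    int ok;

    if (f == NULL)
        return 0;

    app_state.cases_seen = 42;
    app_state.best_error = 0.125;
    app_write_checkpoint(f);
    fputs("kernel data\n", f);      /* whatever follows the user section */
    rewind(f);

    app_state.cases_seen = 0;
    app_state.best_error = 0.0;
    app_read_checkpoint(f);

    ok = app_state.cases_seen == 42
         && app_state.best_error == 0.125
         && fgetc(f) == 'k';        /* next byte belongs to the kernel */
    fclose(f);
    return ok;
}
```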
6.4 Order of Processing
Here is the order in which things happen during a run.
print startup message
initialize parameter database
initialize ERCs
initialize generation space
app_create_output_streams()
initialize output streams
pre_parameter_defaults()
process command-line arguments in order, possibly including loading of checkpoint file
if not starting from checkpoint, post_parameter_defaults()
open output files
if not already done (during loading of checkpoint), app_build_function_sets()
read tree node/depth limits from parameters
if not starting from checkpoint, seed random number generator
app_initialize()
if not starting from checkpoint, create initial random population
initialize subpopulation exchange topology
initialize breeding table
run the GP: until termination
    evaluate the population, unless this is first generation after loading checkpoint
    compute population statistics
    app_end_of_evaluation()
    write checkpoint file, if necessary
    if this is not the last generation
        do subpopulation exchange, if necessary
        breed new population
        app_end_of_breeding()
app_uninitialize()
free breeding table
free subpopulation exchange topology
free population
free parameter database
free ERCs
free generation spaces
free function sets
print system statistics
close output streams
6.5 Kernel Considerations
6.5.1 Memory Allocation
lil-gp has a system for tracking memory usage [1]. This is
helpful in tracking down memory leaks, among other things. To
use it, just use MALLOC(), REALLOC(), and FREE()
instead of malloc(), realloc(), and free(). The
uppercased versions should work exactly like their lowercased
counterparts. You may use the lowercase versions if you do not
wish to have the memory included in the statistics, but do
not mix pointers returned by the two different sets of
functions: don't FREE() memory that you've malloc()'ed,
and so on.
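The reason for not mixing the two families can be seen in a toy tracker. The counted_malloc() and counted_free() wrappers and the MALLOC/FREE macros below are written for this example only and are not how the kernel implements its tracking. In this toy the only damage from mixing is a wrong count; the kernel's real bookkeeping may react differently.

```c
#include <stdlib.h>

static long live_blocks;   /* blocks handed out and not yet released */

static void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Toy stand-ins for the kernel's tracked allocation macros. */
#define MALLOC(n) counted_malloc(n)
#define FREE(p)   counted_free(p)

/* Allocate and release through the same family: the count is balanced. */
long matched_pair(void)
{
    double *cases = (double *)MALLOC(20 * sizeof(double));
    FREE(cases);
    return live_blocks;
}

/* Release a MALLOC'ed block with plain free(): the tracker never sees
 * the release, so the block looks like a leak.  Returns the change in
 * the live block count. */
long mixed_pair(void)
{
    long before = live_blocks;
    double *cases = (double *)MALLOC(20 * sizeof(double));
    free(cases);
    return live_blocks - before;
}
```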
6.5.2 Using Parameters
User code may read and write the parameter database, using the functions get_parameter() and add_parameter(). The implementation of the database is not terribly efficient [2], so you shouldn't, for instance, read a parameter inside the code for a function or terminal. Reading a given parameter once per generation should be considered a maximum. If you need the value more often than that, you should buffer it in a C variable.
get_parameter() takes the name of the parameter (the string)
and returns a character pointer to its value, or NULL if
the parameter is not present in the database. You should not modify
the string returned; make a copy if you need to use it in a destructive
manner. add_parameter() takes the parameter name, value,
and a flag indicating whether the name or the value should be
copied, or both. Adding a parameter that is already present overwrites
the old value.
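Buffering a parameter in a C variable might look like the sketch below, which reads the value once in app_initialize() and serves it from a static variable afterwards. The get_parameter() here is a mock with an assumed signature and a tiny built-in table, and the parameter name app.fitness_cases is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Mock parameter database, searched linearly. */
static struct { const char *name; char *value; } params[] = {
    { "output.basename",   "regress" },
    { "app.fitness_cases", "20" }
};

/* Mock of get_parameter(): returns the value string, or NULL if the
 * parameter is not present. */
char *get_parameter(char *name)
{
    size_t i;

    for (i = 0; i < sizeof params / sizeof params[0]; i++)
        if (strcmp(params[i].name, name) == 0)
            return params[i].value;
    return NULL;
}

static int fitness_cases = 10;   /* cached copy, with a default */

/* Read the parameter once at startup; return 0 on success. */
int app_initialize(int startfromcheckpoint)
{
    char *value;

    (void)startfromcheckpoint;
    value = get_parameter("app.fitness_cases");
    if (value != NULL)
        fitness_cases = atoi(value);   /* parse without modifying the string */
    return fitness_cases > 0 ? 0 : 1;
}

/* Functions and terminals use the cached value, never the database. */
int cached_fitness_cases(void)
{
    return fitness_cases;
}
```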
[1] It can be disabled completely by removing or commenting out the line "#define TRACK_MEMORY" from protos.h.
[2] Read "linear search."