380 lines
14 KiB
HTML
380 lines
14 KiB
HTML
|
<HTML>
|
||
|
<HEAD>
|
||
|
<TITLE></TITLE>
|
||
|
</HEAD>
|
||
|
<BODY BGCOLOR="#FFFFFF">
|
||
|
<P><B><FONT SIZE="5">Chapter 7</FONT></B></P>
|
||
|
<PRE> 7.1 <A HREF="#7.1">Tree Representation</A>
|
||
|
|
||
|
7.2 <A HREF="#7.2">Selection Methods</A>
|
||
|
7.3 <A HREF="#7.3">Operators</A>
|
||
|
7.4 <A HREF="#7.4">Miscellany</A>
|
||
|
7.4.1 <A HREF="#7.4.1">Tree Generation Spaces</A>
7.4.2 <A HREF="#7.4.2">Saved Individuals</A>
7.4.3 <A HREF="#7.4.3">Ephemeral Random Constants</A></PRE>
|
||
|
<P>
|
||
|
|
||
|
<HR>
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><FONT SIZE="5">Extending the Kernel<BR>
|
||
|
</FONT></B>
|
||
|
</P>
|
||
|
<P>
|
||
|
Internally, lil-gp is fairly simple. I have attempted to keep
|
||
|
the structure reasonably clean and modular, without going too
|
||
|
overboard about avoiding global variables and such. 24 C files
|
||
|
comprise the kernel:<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>main.c</B> Initialization and cleanup.</P>
|
||
|
<P>
|
||
|
<B>gp.c</B> The main evaluate-and-breed cycle, and population
|
||
|
statistics calculation.</P>
|
||
|
<P>
|
||
|
<B>eval.c</B> The tree evaluator.</P>
|
||
|
<P>
|
||
|
<B>tree.c </B>Utility routines dealing with trees--counting nodes
|
||
|
and depth, printing, generating random trees, finding
|
||
|
subtrees, copying trees.</P>
|
||
|
<P>
|
||
|
<B>change.c</B> Breeding of the new population each generation.
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>crossovr.c </B>The crossover operator.</P>
|
||
|
<P>
|
||
|
<B>reproduc.c </B>The reproduction operator.</P>
|
||
|
<P>
|
||
|
<B>mutate.c</B> The mutation operator.</P>
|
||
|
<P>
|
||
|
<B>select.c</B> Utility routines for selection methods. </P>
|
||
|
<P>
|
||
|
<B>tournmnt.c</B> The <B>tournament</B> selection method.</P>
|
||
|
<P>
|
||
|
<B>bstworst.c</B> The <B>best,</B> <B>worst</B>, and <B>random</B>
|
||
|
selection methods.</P>
|
||
|
<P>
|
||
|
<B>fitness.c</B> The <B>fitness, fitness_overselect</B>, and <B>inverse_fitness</B>
|
||
|
selection methods.</P>
|
||
|
<P>
|
||
|
<B>genspace.c</B> Utility routines for allocating space to grow
|
||
|
new trees in.</P>
|
||
|
<P>
|
||
|
<B>exch.c</B> The subpopulation exchange system.</P>
|
||
|
<P>
|
||
|
<B>populate.c </B>Utility routines for population--copying, freeing,
|
||
|
random generation.</P>
|
||
|
<P>
|
||
|
<B>ephem.c</B> Utility routines for ephemeral random constants.
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>ckpoint.c</B> Reading and writing checkpoint files. </P>
|
||
|
<P>
|
||
|
<B>event.c </B>System-dependent module for tracking execution
|
||
|
time.</P>
|
||
|
<P>
|
||
|
<B>pretty.c</B> The tree pretty-printer.</P>
|
||
|
<P>
|
||
|
<B>individ.c</B> Utility routines for whole individuals--printing
|
||
|
and calculating size.</P>
|
||
|
<P>
|
||
|
<B>params.c</B> The parameter database.</P>
|
||
|
<P>
|
||
|
<B>random.c</B> The portable pseudorandom number generator, adapted
|
||
|
from <I>Numerical Recipes in FORTRAN</I>.</P>
|
||
|
<P>
|
||
|
<B>memory.c</B> The implementations of <B>MALLOC(), FREE(),</B>
|
||
|
and <B>REALLOC()</B> that track memory usage.</P>
|
||
|
<P>
|
||
|
<B>output.c</B> The output subsystem (<B>oprintf()</B> and <B>oputs(),</B>
|
||
|
among others).</P>
|
||
|
<P> </P>
|
||
|
<P>
|
||
|
<B><A NAME="7.1">7.1 Tree Representation</A></B></P>
|
||
|
<P>
|
||
|
A tree is stored as an array of type <B>lnode.</B> An lnode is
|
||
|
a union which can can a pointer to a function structure, a pointer
|
||
|
to an ephemeral random constant (ERC) structure, or an integer.
|
||
|
The tree is stored in prefix order. The first lnode is always
|
||
|
a pointer to a function. If the function is an ERC terminal, then
|
||
|
the next lnode in the array has the pointer to the ERC structure.
|
||
|
If a function is of type EXPR (the user code controls evaluation
|
||
|
of the function's arguments) then there is an extra lnode just
|
||
|
before the start of each child_it contains an integer, the number
|
||
|
of lnodes used in representing the child subtree. The evaluation
|
||
|
code uses this value to skip the child during evaluation.</P>
|
||
|
<P>
|
||
|
Consider this expression in the symbolic regression problem:</P>
|
||
|
<PRE>
|
||
|
<B>(+ X (iflte X (* .34 X) .56))<BR></B></PRE>
|
||
|
<P>
|
||
|
This would be represented in lil-gp as the array:<FONT SIZE="2"></FONT></P>
|
||
|
<P><FONT SIZE="2">
|
||
|
<BR>
|
||
|
</FONT>
|
||
|
</P>
|
||
|
<TABLE BORDER="0" ALIGN="DEFAULT" WIDTH="100%">
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">+ </TD>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP">pointer to structure for function +
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">X </TD>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP">pointer to structure for function X
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">iflte</TD>
|
||
|
<TD>pointer to structure for function iflte
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">1</TD>
|
||
|
<TD>
first argument to iflte takes 1 lnode to store
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">X</TD>
|
||
|
<TD>
|
||
|
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">4 </TD>
|
||
|
<TD>second argument to iflte takes 4 lnodes to store
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">* </TD>
|
||
|
<TD>pointer to structure for function *
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">R</TD>
|
||
|
<TD>pointer to structure for ERC function
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">.34</TD>
|
||
|
<TD>pointer to structure containing ERC value
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">X</TD>
|
||
|
<TD></TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">2</TD>
|
||
|
<TD> third argument to iflte takes 2 lnodes to store
</TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">R </TD>
|
||
|
<TD></TD>
|
||
|
</TR>
|
||
|
<TR>
|
||
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="10%">.56 </TD>
|
||
|
<TD></TD>
|
||
|
</TR>
|
||
|
</TABLE>
|
||
|
|
||
|
<P>
|
||
|
<FONT SIZE="2"><BR>
|
||
|
</FONT>
|
||
|
</P>
|
||
|
<P>
|
||
|
This representation, while it may seem cumbersome, has two major
|
||
|
advantages over a more traditional C representation (with individual
|
||
|
structs for each node, linked by pointers). First, it uses much
|
||
|
less memory--approximately 1-2 words per node versus the 4-5 it
|
||
|
would take otherwise. This is because the structure of the tree
|
||
|
is represented implicitly in the ordering of the nodes rather
|
||
|
than explicitly via pointers. Second, it results in much faster
|
||
|
tree evaluation. For instance, consider the crossover operator.
|
||
|
In the traditional representation crossover is performed by just
|
||
|
swapping two pointers. While this is very fast and easy, over
|
||
|
time it means that the nodes of a given tree become spread out
|
||
|
over the process's address space. On a system with virtual memory,
|
||
|
this slows evaluation (or any traversal of the tree) to a crawl
|
||
|
as the tree is spread across dozens of pages which must constantly
|
||
|
be swapped in and out. lil-gp's representation complicates crossover
|
||
|
somewhat, but leaves each offspring tree as a single continuous
|
||
|
block of memory, able to fit on just one or two pages.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><FONT SIZE="4"><A NAME="7.2">7.2 Selection Methods</A></FONT></B></P>
|
||
|
<P>
|
||
|
A selection method is implemented with two functions: one to perform
|
||
|
initialization and cleanup, and another to do the actual selection.
|
||
|
The first function, when called for initialization, creates and
|
||
|
returns a data structure called a selection context. This contains
|
||
|
any state information needed for the selection method. It should
|
||
|
also store a pointer to the population that the selection is being
|
||
|
done on. This structure will be passed to the second function,
|
||
|
which should return an index of an individual within the population.
|
||
|
</P>
|
||
|
<P>
|
||
|
To create a new selection method, it is suggested that you copy
|
||
|
and modify an existing one (the <B>random</B> method in <B>bstworst.c</B>
|
||
|
is an especially simple one). If your of selection can be expressed
|
||
|
as randomly selecting an individual from a set where each individual
|
||
|
has a fixed probability of selection, then there is already an
|
||
|
efficient implementation of the second function. See the code
|
||
|
for the <B>fitness, fitness_overselect,</B> and <B>inverse_fitness</B>
|
||
|
methods for examples</P>
|
||
|
<P>
|
||
|
Finally, you must add a record describing your selection method
|
||
|
to the array <B>select_method_table</B> at the top of <B>select.c.</B>
|
||
|
This lists the names and initialization functions of each selection
|
||
|
method available.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><FONT SIZE="4"><A NAME="7.3">7.3 Operator</A></FONT></B></P>
|
||
|
<P>
|
||
|
The breeding of the population is controlled by a table of breeding
|
||
|
phases. This is built from the parameter database at the start
|
||
|
of the run (and whenever <B>rebuild_breeding_table()</B> is called
|
||
|
from user code. Each subpopulation has its own breeding table.
|
||
|
For each phase, there is a record in the table. Each record has
|
||
|
pointers to the four methods for the operator, the rate for that
|
||
|
phase, and a pointer to an operator-specific structure</P>
|
||
|
<P>
|
||
|
To implement a new operator, it is suggested that you copy and
|
||
|
modify the code of an existing operator. The reproduction operator
|
||
|
is the simplest of the three included. Suppose that your new operator
|
||
|
is called "foo". You would need to provide five functions
|
||
|
(the following naming scheme is strongly recommended):<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>operator_foo_init</B> Parses the operator's options string
|
||
|
and builds the operator table entry. The kernel
|
||
|
functions <B>parse_o_rama()</B> and <B>free_o_rama()</B> are available
|
||
|
to parse the options string just like the built-in operators do.
|
||
|
<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>operator_foo_free</B> Frees the operator-specific part of the
|
||
|
operator table entry.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>operator_foo_start</B> This is called at the start of breeding.
|
||
|
Selection methods should be initialized here.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>operator_foo_end</B> This is called at the end of breeding.
|
||
|
Selection contexts should be freed here.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B>operator_foo</B> This performs the actual operation.<FONT SIZE="2">
|
||
|
<BR>
|
||
|
</FONT>
|
||
|
</P>
|
||
|
<P>
|
||
|
You then add a record to the array <B>operator_table</B> at the
|
||
|
top of <B>change.c</B>, listing the name of the operator (a string)
|
||
|
and the operator's initialization method (a function pointer),
|
||
|
for example, after adding the foo operator the table would look
|
||
|
like:</P>
|
||
|
<PRE>
|
||
|
<B>operator operator_table[] =</B>
|
||
|
<B>{ { "crossover", operator_crossover_init },</B>
|
||
|
<B> { "reproduction", operator_reproduce_init },</B>
|
||
|
<B> { "mutation", operator_mutate_init },</B>
|
||
|
<B> { "foo", operator_foo_init },</B>
|
||
|
<B> { NULL, NULL } };<BR>
|
||
|
</B>
|
||
|
</PRE>
|
||
|
<P>
|
||
|
The <B>next</B> field of the new population structure gives the
|
||
|
index at which the operator should place the new individual. After
|
||
|
adding an individual, increment the <B>next</B> field. If your
|
||
|
operator produces multiple offspring with a single call then you
|
||
|
must make sure that you don't overfill the population (the <B>next</B>
|
||
|
field should not exceed the <B>size</B> field). The crossover
|
||
|
operator, for instance, normally produces two offspring on each
|
||
|
call. If it is called when there is only one more space in the
|
||
|
population, it fills the space and throws the other offspring
|
||
|
away.</P>
|
||
|
<P>
|
||
|
Your operator should always add at least one individual to the
|
||
|
new population per call. If it does not, infinite loops may occur
|
||
|
when the <B>probabilistic_operators</B> parameter is off.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><FONT SIZE="4"><A NAME="7.4">7.4 Miscellany</A></FONT></B></P>
|
||
|
<P>
|
||
|
This section gives a general overview of how some of the nonobvious
|
||
|
parts of lil-gp work and how they fit together.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><A NAME="7.4.1">7.4.1 Tree Generation Spaces</A></B></P>
|
||
|
<P>
|
||
|
When building a new tree, lilgp needs a continuous block of memory
|
||
|
to put the tree in, but can't allocate the final location of the
|
||
|
tree because the size isn't known ahead of time. Therefore, special
|
||
|
blocks of memory are allocated to grow the trees in. Every time
|
||
|
an lnode is added to a tree-in- progress, the function <B>gensp_next</B>
|
||
|
(or <B>gensp_next_int</B>) is called to enlarge the memory block
|
||
|
if needed. Once the tree is finished and its size is known, its
|
||
|
final location is allocated and the tree is copied from the generation
|
||
|
space</P>
|
||
|
<P>
|
||
|
Currently there are two generation spaces needed. If you are implementing
|
||
|
an operator or some- thing that requires three or more trees to
|
||
|
be grown simultaneously, increase <B>GENSPACE_COUNT</B> in <B>defines.h.
|
||
|
</B></P>
|
||
|
<P>
|
||
|
Generation spaces are initially allocated to hold <B>GENSPACE_START</B>
|
||
|
lnodes, and grow in steps of <B>GENSPACE_GROW</B> lnodes. These
|
||
|
constants are in <B>defines.h</B>.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<B><A NAME="7.4.2">7.4.2 Saved Individuals</A></B></P>
|
||
|
<P>
|
||
|
lilgp tracks the best <I>n</I> individuals of each population,
|
||
|
where <I>n</I> is a user-settable parameter. Pointers to these
|
||
|
best individuals are passed to the user function <B>app_end_of_evaluation()</B>.
|
||
|
To ensure that these pointers are always good, the kernel makes
|
||
|
a copy of each individual in the top n and passes the address
|
||
|
of the copy (since the original individual may not survive the
|
||
|
breeding process)</P>
|
||
|
<P>
|
||
|
An individual can be referenced in multiple top-n lists (best-of-gen,
|
||
|
best-of-run, etc.). To avoid making multiple copies, a reference
|
||
|
count is kept for each individual. All these saved individuals
|
||
|
are kept in a linked list, and once per generation a garbage collection
|
||
|
procedure traverses the list and frees any individuals which have
|
||
|
no references to them.<BR>
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<A NAME="7.4.3"><B>7.4.3 Ephemeral Random Constants</B></A><B></B></P>
|
||
|
<P>
|
||
|
Whenever a new ephemeral random constant terminal is inserted
|
||
|
into a tree (during the generation of the initial population,
|
||
|
or during mutation operations) the function <B>new_ephemeral_const()</B>
|
||
|
is called to create the constant. It calls the user-supplied generation
|
||
|
function to create a new value. Each ERC record stores the value,
|
||
|
along with a reference count of how many tree nodes point to that
|
||
|
value. The ERC records are maintained in a linked list. Once per
|
||
|
generation, a garbage collection routine traverses the linked
|
||
|
list and removes any ERCs which are no longer needed (that have
|
||
|
a reference count of zero)</P>
|
||
|
<P>
|
||
|
The ERC records are not allocated individually but in large blocks.
|
||
|
This ensures that all the ERCs are kept on a few memory pages
|
||
|
at most, which reduces the need for paging during evaluation (when
|
||
|
there are many scattered references to the ERCs). When an ERC
|
||
|
is freed by the garbage collection routine, it is added to the
|
||
|
end of the free list. If the free list ever becomes empty, a new
|
||
|
large block of ERC records is allocated and all of them added
|
||
|
on to the end of the free list</P>
|
||
|
<P>
|
||
|
Pointers to the blocks themselves are kept in an array so that
|
||
|
the blocks can be freed at the conclusion of the run. Since the
|
||
|
number of blocks can increase, this array is reallocated as necessary.
|
||
|
<BR>
|
||
|
|
||
|
</P>
|
||
|
</BODY>
|
||
|
</HTML>
|