Chapter 4
4.1 Invoking lil-gp
4.1.1 Parameter Files
4.2 Output Files
Running lil-gp
This chapter describes how to use lil-gp once you have built the executable program, including the various command-line options available.
The default executable name produced by the makefile is "gp".
Once you have built this, you can run it from the command line.
There are six command-line arguments you can use:
-f <parameter_filename>
This loads a parameter file into lil-gp's parameter database.
The format of this file is explained in section 4.1.1. You can
have multiple -f options.
-p <parametername=value>
This sets the value of an individual parameter. You
can have multiple -p options.
-c <checkpoint_filename>
This restarts lil-gp from the specified checkpoint file. You may
only have one -c option.
-d <defined_symbol>
This defines the given parameter file symbol. The space between
the "d" and the symbol is optional.
-u <defined_symbol>
This undefines the given parameter file symbol. The space between
the "u" and the symbol is optional.
-q
Causes lil-gp to run in quiet mode, producing nothing on standard
output. Without this option, everything written to the .sys
output file is also printed to the terminal.
These options can appear in any order, but the order matters. Options that modify the parameter database (all except -q) are processed in the order they appear on the command line. If a parameter is set more than once, the last occurrence takes effect. All available parameters are listed and explained in chapter 5. The order of the -d and -u options matters as well. For instance,
% gp -dBLAH -f file1 -uBLAH -f file2
will cause the symbol "BLAH" to be defined during the
processing of file1 but not file2.
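The ordering rule can be illustrated with a short Python sketch. This is a toy model, not lil-gp's own code: the function name `apply_options` is hypothetical, parameter files are stood in for by in-memory lists of (guard symbol, name, value) entries, and the `pop_size` setting is only an illustrative value.

```python
def apply_options(options, files):
    """Toy model of left-to-right option processing.

    options: list of (flag, argument) pairs in command-line order.
    files:   hypothetical stand-in for parameter files, mapping a file name
             to a list of (guard_symbol_or_None, name, value) entries.
    """
    params = {}      # the parameter database
    symbols = set()  # currently defined parameter-file symbols
    for flag, arg in options:
        if flag == "-d":
            symbols.add(arg)
        elif flag == "-u":
            symbols.discard(arg)
        elif flag == "-p":
            name, _, value = arg.partition("=")
            params[name.strip()] = value.strip()  # later settings overwrite earlier ones
        elif flag == "-f":
            for guard, name, value in files[arg]:
                # a guarded entry applies only while its symbol is defined
                if guard is None or guard in symbols:
                    params[name] = value
    return params


files = {
    "file1": [("BLAH", "pop_size", "500")],
    "file2": [("BLAH", "max_generations", "50")],
}
# Models: gp -dBLAH -f file1 -uBLAH -f file2 -p pop_size=1000
result = apply_options(
    [("-d", "BLAH"), ("-f", "file1"), ("-u", "BLAH"), ("-f", "file2"),
     ("-p", "pop_size=1000")],
    files,
)
print(result)  # {'pop_size': '1000'}
```

The entry in file2 is ignored because BLAH has been undefined by the time file2 is read, and the final -p overrides the value that file1 supplied.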
Note that all parameter settings are saved in the checkpoint files,
so if you are just restarting an interrupted run you need only
the -c option. You can, however, use -f and -p
to modify parameter
settings--they should, of course, come after the -c option
or they will have no effect.
Parameter files have a simple syntax. Comments can begin with
a '#' or a ';' and continue to the end of a line.
The file is a series of "name=value" pairs. If the
last nonwhitespace, noncomment character on a line is a backslash
then the next nonblank line is considered a continuation of the
line. Blank lines and lines that contain only comments are completely
ignored, even in the middle of a continuation. For instance:
sample = here is \
a single parameter that is continued \ # this comment is followed by a blank line

# this comment is preceded by a blank line
right through a comment onto this line.
Whitespace (spaces and tabs) is ignored before the name, on either side of the equals sign, and after the value. The following are all equivalent:
max_generations = 100
max_generations= 100 ; here is a comment
max_generations=100 # this is a comment too.
Both the name and the value of a parameter are just strings.
In many cases, the names have been chosen to convey the impression
of a hierarchical structure, but as far as lil-gp is concerned,
all parameter names are plain strings and no interpretation is
applied. So, for example, the parameter names breed[1].rate,
breed[01].rate, and breed[1.0].rate are all different.
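The syntax rules above can be sketched as a small Python parser. This is an illustration only, not lil-gp's actual parser. The names `strip_comment` and `parse_parameters` are hypothetical, and two details are assumptions the text does not settle: continuation pieces are joined with a single space, and lines without an equals sign are silently skipped. The rate values in the example are made up.

```python
def strip_comment(line):
    """Cut the line at the first '#' or ';' (assumes values never contain them)."""
    for i, ch in enumerate(line):
        if ch in "#;":
            return line[:i]
    return line


def parse_parameters(text):
    """Parse 'name = value' lines into a dict of strings."""
    params = {}
    pending = None  # text of a logical line still waiting for its continuation
    for raw in text.splitlines():
        body = strip_comment(raw).strip()
        if not body:
            continue  # blank and comment-only lines are skipped, even mid-continuation
        continued = body.endswith("\\")
        if continued:
            body = body[:-1].rstrip()
        # Assumption: continuation pieces are joined with a single space.
        pending = body if pending is None else pending + " " + body
        if continued:
            continue
        name, sep, value = pending.partition("=")
        pending = None
        if sep:  # assumption: lines without '=' are skipped rather than reported
            params[name.strip()] = value.strip()
    return params


example = """\
max_generations = 100
max_generations= 100 ; here is a comment
max_generations=100   # this is a comment too.
sample = here is \\
   a single parameter that is continued \\ # this comment is followed by a blank line

   # this comment is preceded by a blank line
   right through a comment onto this line.
breed[1].rate = 0.8
breed[01].rate = 0.2
"""
params = parse_parameters(example)
print(params["max_generations"])  # 100
print(params["sample"])
print(len(params))  # 4
```

Because names are compared as plain strings, breed[1].rate and breed[01].rate end up as two separate entries in the dictionary.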
"Preprocessor" Directives
Parameter files also support a simple set of C-preprocessor-like
directives to allow parts of the file to be processed optionally.
They all begin with a "%" character, which must appear
in the first column of the line. They are:
%define symbol
Defines the given symbol. Equivalent to the -d
command line option.
%undefine symbol
Undefines the given symbol. Equivalent to the -u
command line option.
%ifdef symbol
If the given symbol is defined, has no effect. Otherwise skips
all lines up until the next %ifdef, %ifndef, %endif,
or end-of-file.
%ifndef symbol
Same as %ifdef, but reverses the sense of the
test.
%endif
Cancels a previous %ifdef or %ifndef.
Symbols are case-sensitive. Leading and trailing whitespace is ignored, but internal whitespace is allowed. It is important to remember that these may look like C preprocessor directives, but they do not work the same way. In particular, you cannot nest ifs, and there are no Boolean operations (%ifdef FIRST && SECOND is legal, but it tests a single symbol named "FIRST && SECOND"). Also, %define does not assign a value to the symbol, and no text substitution occurs. The directives are meant only as a simple mechanism for optionally setting groups of parameters, without having to fiddle with lots of different parameter files.
The idea is that you can keep multiple sets of parameters in one parameter file, and switch among them using directives on the command line. You can have a parameter file that looks like:
<some common parameters>
%ifdef FIRST
<one set of parameters>
%endif
%ifdef SECOND
<another set of parameters>
%endif
You can then use one set or the other with a command line like:
gp -d FIRST -f parameter.file
to select the first set.
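The directive rules can be modelled in a few lines of Python. This is a sketch under stated assumptions, not lil-gp's implementation: the function name `select_lines` is hypothetical, %define and %undefine lines inside a skipped region are assumed to be skipped like any other line, and the parameter values in the example are invented.

```python
def select_lines(text, defined=()):
    """Return the parameter lines that survive directive processing.

    `defined` plays the role of symbols given with -d on the command line.
    """
    symbols = set(defined)
    skipping = False  # a single flag suffices because ifs cannot nest
    kept = []
    for line in text.splitlines():
        if line.startswith("%"):  # directives must start in the first column
            parts = line[1:].split(None, 1)
            keyword = parts[0] if parts else ""
            # internal whitespace stays part of the symbol name
            symbol = parts[1].strip() if len(parts) > 1 else ""
            if keyword == "ifdef":
                skipping = symbol not in symbols
            elif keyword == "ifndef":
                skipping = symbol in symbols
            elif keyword == "endif":
                skipping = False
            elif not skipping and keyword == "define":
                symbols.add(symbol)
            elif not skipping and keyword == "undefine":
                symbols.discard(symbol)
            continue  # directive lines never reach the parameter database
        if not skipping:
            kept.append(line)
    return kept


example = """\
pop_size = 500
%ifdef FIRST
max_generations = 50
%endif
%ifdef SECOND
max_generations = 200
%endif
"""
print(select_lines(example, defined=["FIRST"]))
# ['pop_size = 500', 'max_generations = 50']
```

Since a new %ifdef or %ifndef simply resets the skip flag, the sketch reproduces the no-nesting behaviour: an inner test replaces the outer one instead of combining with it.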
lil-gp produces a number of output files, with statistics on tree size and fitness for each generation. The filenames are produced by appending a three-character extension to the value of the output.basename parameter. They are:
| extension | contents |
|---|---|
| .sys | general information about the run |
| .gen | statistics on tree size and depth |
| .prg | statistics on fitness and hits |
| .bst | information about the current best-of-run individual(s) |
| .his | history of the .bst file |
| .stt | condensed version of all statistics |
The .bst file is rewritten every generation, and all the other files are flushed to disk at the completion of every generation. This lets you see current information on backgrounded runs.
Each message printed to an output file has an integer priority associated with it, ranging from 0 to 100. A given message is produced only if its priority is less than or equal to the setting of the output.detail parameter. You can experiment to find a level of output detail that you like.
The .gen and .prg files can grow very large, and are basically just a human-readable form of the information in the .stt file. This information is not written to them unless you explicitly ask for it (by setting the output.detail parameter to at least 90).
The .stt file has one line per subpopulation per generation, each line consisting of 20 space-separated numbers. The parameter output.stt_interval can be set to modify how often the .stt file is written to. The default is 1, meaning every generation. For multiple-population problems, a line is written for each subpopulation, plus one more line for the population as a whole. For single-population problems, only one line (the overall population line) is written, since the overall population is just the single subpopulation. The meaning of each column is listed in Table 4.1.
The aux directory in the distribution contains a short Perl script called splitstt. This will split an .stt file into separate files for each subpopulation. When invoked as
% splitstt myfile.stt
it will produce files named myfile.stt.pop0, myfile.stt.pop1,
etc.
| column | contents |
|---|---|
| 1 | generation number |
| 2 | subpopulation number (0 indicates the overall population) |
| 3 | mean standardized fitness of generation |
| 4 | standardized fitness of best-of-generation individual |
| 5 | standardized fitness of worst-of-generation individual |
| 6 | mean tree size of generation |
| 7 | mean tree depth of generation |
| 8 | tree size of best-of-generation individual |
| 9 | tree depth of best-of-generation individual |
| 10 | tree size of worst-of-generation individual |
| 11 | tree depth of worst-of-generation individual |
| 12 | mean standardized fitness of run |
| 13 | standardized fitness of best-of-run individual |
| 14 | standardized fitness of worst-of-run individual |
| 15 | mean tree size of run |
| 16 | mean tree depth of run |
| 17 | tree size of best-of-run individual |
| 18 | tree depth of best-of-run individual |
| 19 | tree size of worst-of-run individual |
| 20 | tree depth of worst-of-run individual |
Table 4.1: Columns of the .stt output file.
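For post-processing, the column layout of Table 4.1 can be read into named fields with a short Python sketch. The column labels in `COLUMNS` and the function name `parse_stt` are hypothetical names chosen for this illustration, and the sample data is synthetic, not real lil-gp output.

```python
from collections import defaultdict

# Hypothetical labels for the 20 columns of Table 4.1, in order.
COLUMNS = [
    "generation", "subpop",
    "gen_mean_fitness", "gen_best_fitness", "gen_worst_fitness",
    "gen_mean_size", "gen_mean_depth",
    "gen_best_size", "gen_best_depth", "gen_worst_size", "gen_worst_depth",
    "run_mean_fitness", "run_best_fitness", "run_worst_fitness",
    "run_mean_size", "run_mean_depth",
    "run_best_size", "run_best_depth", "run_worst_size", "run_worst_depth",
]


def parse_stt(text):
    """Group the rows of an .stt file by subpopulation number."""
    by_subpop = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, map(float, fields)))
        row["generation"] = int(row["generation"])
        row["subpop"] = int(row["subpop"])
        by_subpop[row["subpop"]].append(row)
    return dict(by_subpop)


# Two synthetic lines (made-up numbers) for subpopulation 0.
sample = "\n".join([
    " ".join(["0", "0"] + ["1.5"] * 18),
    " ".join(["1", "0"] + ["1.0"] * 18),
])
rows = parse_stt(sample)[0]
print([r["generation"] for r in rows])  # [0, 1]
print(rows[1]["gen_best_fitness"])      # 1.0
```

Grouping by the second column gives roughly the same separation that splitstt produces as files; for a real run, the text would come from reading the .stt file instead of the synthetic string.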