lil-gp 1

Chapter 4

Running lil-gp

This chapter describes how to use lil-gp once you have built the executable program, including the various command-line options available.

4.1 Invoking lil-gp

The default executable name produced by the makefile is "gp". Once you have built this, you can run it from the command line. There are six command-line arguments you can use:

-f <parameter_filename>

This loads a parameter file into lil-gp's parameter database. The format of this file is explained in section 4.1.1. You can have multiple -f options.

-p <parametername=value>

This sets the value of an individual parameter. You can have multiple -p options.

-c <checkpoint_filename>

This restarts lil-gp from the specified checkpoint file. You may only have one -c option.

-d <defined_symbol>

This defines the given parameter file symbol. The space between the "d" and the symbol is optional.

-u <defined_symbol>

This undefines the given parameter file symbol. The space between the "u" and the symbol is optional.

-q

This causes lil-gp to run in quiet mode, producing nothing on standard output. Without this option, everything written to the .sys output file is also printed to the terminal.

These options can be given in any order, but the order is significant: options that modify the parameter database (all except -q) are processed in the order they appear on the command line. If a parameter is defined more than once, the last occurrence takes effect. A listing and explanation of all available parameters is given in chapter 5. The order of directives matters too, so for instance

% gp -dBLAH -f file1 -uBLAH -f file2

will cause the symbol "BLAH" to be defined during the processing of file1 but not file2.

Note that all parameter settings are saved in the checkpoint files, so if you are just restarting an interrupted run you need only the -c option. You can, however, use -f and -p to modify parameter settings. They must come after the -c option, or they will have no effect.
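The ordering rules above can be illustrated with a small Python sketch. It is not lil-gp's code: the function apply_options and the two dictionaries standing in for parameter files and checkpoint files are hypothetical, and the sketch assumes that -c simply replaces whatever is in the parameter database with the saved settings, which is one way to read "they will have no effect".

```python
def apply_options(options, files, checkpoints):
    """Apply (flag, argument) pairs in command-line order.

    files:       hypothetical mapping filename -> list of (name, value) pairs,
                 standing in for already-parsed parameter files.
    checkpoints: hypothetical mapping filename -> dict of saved parameters.
    """
    params = {}
    for flag, arg in options:
        if flag == "-p":
            # -p name=value sets a single parameter.
            name, _, value = arg.partition("=")
            params[name] = value
        elif flag == "-f":
            # -f loads every pair from the file; later pairs overwrite earlier ones.
            for name, value in files[arg]:
                params[name] = value
        elif flag == "-c":
            # Assumption: restoring a checkpoint discards earlier settings.
            params = dict(checkpoints[arg])
    return params
```

With this model, a -p that follows a -f overrides the file's value, while the same -p placed before the -f (or before a -c) is overwritten.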

4.1.1 Parameter Files

Parameter files have a simple syntax. Comments can begin with a '#' or a ';' and continue to the end of a line.

The file is a series of "name=value" pairs. If the last nonwhitespace, noncomment character on a line is a backslash then the next nonblank line is considered a continuation of the line. Blank lines and lines that contain only comments are completely ignored, even in the middle of a continuation. For instance:

sample = here is \
     a single parameter that is continued \
# this comment is followed by a blank line

# this comment is preceded by a blank line
   right through a comment onto this line.

Whitespace (spaces and tabs) is ignored before the name, on either side of the equals sign, and after the value. The following are all equivalent:

max_generations  =  100

      max_generations=   100            ;  here  is  a  comment

max_generations=100                     #  this  is  a  comment too.

Both the name and the value of a parameter are just strings. In many cases, the names have been chosen to convey the impression of a hierarchical structure, but as far as lil-gp is concerned, all parameter names are plain strings and no interpretation is applied. So, for example, the parameter names breed[1].rate, breed[01].rate, and breed[1.0].rate are all different.
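As a rough illustration of the syntax rules in this section, here is a minimal Python sketch of a parameter-file reader. It is not taken from lil-gp: the function names are hypothetical, it assumes a '#' or ';' always starts a comment (no quoting), it assumes continued fragments are joined with a single space, and it does not handle the "%" directives described in the next subsection.

```python
def strip_comment(line):
    # Cut the line at the first '#' or ';' (assumption: no quoting or escaping).
    cut = len(line)
    for marker in "#;":
        pos = line.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return line[:cut].strip()


def parse_parameters(text):
    """Return a dict mapping parameter names to values; both are plain strings."""
    params = {}
    pending = ""  # text accumulated from lines that ended in a backslash
    for raw in text.splitlines():
        content = strip_comment(raw)
        if not content:
            # Blank and comment-only lines are ignored, even mid-continuation.
            continue
        if content.endswith("\\"):
            # Assumption: fragments are joined with one space.
            pending += content[:-1].strip() + " "
            continue
        logical = pending + content
        pending = ""
        name, sep, value = logical.partition("=")
        if sep:
            # Names are stored verbatim, so breed[1].rate and breed[01].rate differ.
            params[name.strip()] = value.strip()
    return params
```

Because names are compared as raw strings, nothing in such a reader would treat breed[1].rate and breed[01].rate as the same key.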

"Preprocessor" Directives

Parameter files also support a simple set of C-preprocessor-like directives to allow parts of the file to be processed optionally. They all begin with a "%" character, which must occur on the first column of the line. They are:

%define symbol

Defines the given symbol. Equivalent to the -d command line option.

%undefine symbol

Undefines the given symbol. Equivalent to the -u command line option.

%ifdef symbol

If the given symbol is defined, this directive has no effect. Otherwise, it skips all lines up to the next %ifdef, %ifndef, %endif, or end-of-file.

%ifndef symbol

Same as %ifdef, but reverses the sense of the test.

%endif

Cancels a previous %ifdef or %ifndef.

Symbols are case-sensitive. Leading and trailing whitespace is ignored, but internal whitespace is allowed.¹ It is important to remember that these may look like C preprocessor directives, but they don't work the same way. In particular, you can't nest ifs and there are no Boolean operations (for example, %ifdef FIRST && SECOND is legal, but it tests a single symbol named "FIRST && SECOND"). Also, %define does not assign a value to the symbol, and no text substitution occurs. This is merely meant as a simple mechanism for optionally setting groups of parameters, without having to fiddle with lots of different parameter files.

The idea is that you can keep multiple sets of parameters in one parameter file, and switch among them using directives on the command line. You can have a parameter file that looks like:

<some common parameters>

%ifdef FIRST

<one set of parameters>

%endif

%ifdef SECOND

<another set of parameters>

%endif

You can then use one set or the other with a command line like:

% gp -d FIRST -f parameter.file

to select the first set.
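The directive rules can be sketched in Python as follows. This is only a model of the behaviour described above, not lil-gp's implementation; the function name active_lines is hypothetical, and the sketch assumes that %define and %undefine lines inside a skipped region are skipped like any other line.

```python
def active_lines(text, defined=()):
    """Return the non-directive lines that survive %ifdef/%ifndef filtering."""
    symbols = set(defined)   # symbols from -d options or earlier %define lines
    skipping = False         # a single flag is enough: ifs cannot nest
    kept = []
    for line in text.splitlines():
        if line.startswith("%"):   # '%' must be in the first column
            parts = line[1:].split(None, 1)
            directive = parts[0] if parts else ""
            # The rest of the line is one symbol; internal whitespace is kept.
            symbol = parts[1].strip() if len(parts) > 1 else ""
            if directive == "ifdef":
                skipping = symbol not in symbols
            elif directive == "ifndef":
                skipping = symbol in symbols
            elif directive == "endif":
                skipping = False
            elif not skipping and directive == "define":
                symbols.add(symbol)
            elif not skipping and directive == "undefine":
                symbols.discard(symbol)
            continue
        if not skipping:
            kept.append(line)
    return kept
```

Calling active_lines on the file layout shown above with defined={"FIRST"} would keep the common parameters and the first set only. Because the whole remainder of a directive line is treated as one symbol, "%ifdef FIRST && SECOND" tests the single symbol "FIRST && SECOND" in this model too.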

4.2 Output Files

lil-gp produces a number of output files, with statistics on tree size and fitness for each generation. The filenames are produced by appending a three-character extension to the value of the output.basename parameter. They are:

.sys	general information about the run
.gen	statistics on tree size and depth
.prg	statistics on fitness and hits
.bst	information about the current best-of-run individual(s)
.his	history of the .bst file
.stt	condensed version of all statistics

The .bst file is rewritten every generation, and all the other files are flushed to disk at the completion of every generation. This lets you see current information on backgrounded runs.

Each message printed to an output file has an integer priority associated with it, ranging from 0 to 100. A given message is produced only if its priority is less than or equal to the setting of the output.detail parameter. You can experiment to find a level of output detail that you like.

The .gen and .prg files can grow very large, and are basically just a human-readable form of the information in the .stt file. This information is not written to them unless you really ask for it (by setting the output.detail parameter to at least 90).

The .stt file has one line per subpopulation per generation, each line consisting of 20 space-separated numbers. The parameter output.stt_interval can be set to modify how often the .stt file is written to. The default is 1, meaning every generation. For multiple-population problems, a line is written for each subpopulation, plus one more line for the population as a whole. For single-population problems, only one line (the overall population line) is written, since the overall population is just the single subpopulation. The meaning of each column is listed in Table 4.1.

The aux directory in the distribution contains a short Perl script called splitstt. This will split an .stt file into separate files for each subpopulation. When invoked as

% splitstt myfile.stt

it will produce files named myfile.stt.pop0, myfile.stt.pop1, etc.

column	contents
1	generation number
2	subpopulation number (0 indicates the overall population)
3	mean standardized fitness of generation
4	standardized fitness of best-of-generation individual
5	standardized fitness of worst-of-generation individual
6	mean tree size of generation
7	mean tree depth of generation
8	tree size of best-of-generation individual
9	tree depth of best-of-generation individual
10	tree size of worst-of-generation individual
11	tree depth of worst-of-generation individual
12	mean standardized fitness of run
13	standardized fitness of best-of-run individual
14	standardized fitness of worst-of-run individual
15	mean tree size of run
16	mean tree depth of run
17	tree size of best-of-run individual
18	tree depth of best-of-run individual
19	tree size of worst-of-run individual
20	tree depth of worst-of-run individual

Table 4.1: Columns of the .stt output file.
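For post-processing, the column layout in Table 4.1 can be turned into named fields. The Python sketch below is illustrative only: the field names in STT_FIELDS and both function names are made up for this example, and it assumes every data line holds exactly 20 whitespace-separated numbers as described above. The grouping function does in memory roughly what the text says splitstt does with files.

```python
from collections import defaultdict

# Hypothetical field names, one per column of Table 4.1 (in order).
STT_FIELDS = [
    "generation", "subpop",
    "gen_mean_fitness", "gen_best_fitness", "gen_worst_fitness",
    "gen_mean_size", "gen_mean_depth",
    "gen_best_size", "gen_best_depth", "gen_worst_size", "gen_worst_depth",
    "run_mean_fitness", "run_best_fitness", "run_worst_fitness",
    "run_mean_size", "run_mean_depth",
    "run_best_size", "run_best_depth", "run_worst_size", "run_worst_depth",
]


def parse_stt_line(line):
    """Convert one .stt line into a dict keyed by the names in STT_FIELDS."""
    values = line.split()
    if len(values) != len(STT_FIELDS):
        raise ValueError(f"expected {len(STT_FIELDS)} columns, got {len(values)}")
    record = {name: float(value) for name, value in zip(STT_FIELDS, values)}
    # Columns 1 and 2 are counters, so store them as integers.
    record["generation"] = int(values[0])
    record["subpop"] = int(values[1])
    return record


def group_by_subpop(lines):
    """Group parsed records by subpopulation number (0 = overall population)."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            record = parse_stt_line(line)
            groups[record["subpop"]].append(record)
    return dict(groups)
```

A typical use would be to read a .stt file with open(path) and pass the file object to group_by_subpop, then plot a field such as gen_best_fitness against generation for subpopulation 0.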