\documentclass[]{report}

\usepackage{graphicx}
\usepackage{float}
\usepackage{enumitem}
\usepackage{tabularx}
\usepackage{hyperref}
\usepackage[normalem]{ulem}
\usepackage{listings}
\usepackage[most]{tcolorbox}

\definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464}
\definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.635294}
\definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316}

\lstset{
	backgroundcolor=\color{white},   % background color
	basicstyle=\small\ttfamily,      % font size and family used for the code
	breakatwhitespace=false,         % allow automatic breaks anywhere, not only at whitespace
	breaklines=true,                 % enable automatic line breaking
	captionpos=b,                    % place the caption at the bottom
	columns=flexible,                % flexible column width
	commentstyle=\color{commentsColor}\itshape, % comment style
	deletekeywords={},               % keywords to remove from the chosen language
	escapeinside={\%*}{*)},          % delimiters for embedding LaTeX inside code
	extendedchars=true,              % non-ASCII characters; 8-bit encodings only, not UTF-8
	%frame=tb,                       % adds a frame around the code
	keepspaces=true,                 % keep spaces, preserving code indentation
	keywordstyle=\color{keywordsColor}\bfseries, % keyword style
	language=C++,                    % default language (can be overridden per listing)
	otherkeywords={rank\_t, customerID\_t, distance\_t, fitness\_t}, % additional keywords
	numbers=left,                    % line-number position: none, left, or right
	numbersep=5pt,                   % distance between line numbers and code
	numberstyle=\tiny\color{commentsColor}, % line-number style
	rulecolor=\color{black},         % keeps the frame color from changing on line breaks inside colored text
	showspaces=false,                % do not mark spaces with underscores (overrides showstringspaces)
	showstringspaces=false,          % do not underline spaces within strings
	showtabs=false,                  % do not mark tabs within strings
	stepnumber=1,                    % number every line
	stringstyle=\color{stringColor}, % string literal style
	tabsize=4,                       % tab width of 4 spaces
	title=\lstname                   % show the filename of files included with \lstinputlisting
}

\lstMakeShortInline|

\newtcolorbox{answerbox}[2][]{%
	attach boxed title to top center = {yshift=-8pt},
	colback = black!5!white,
	colframe = black!75!black,
	fonttitle = \bfseries,
	colbacktitle = gray!85!black,
	title = #2,#1,
	enhanced,
}

\renewcommand{\thesection}{\arabic{section}}

% Title Page
\title{\textbf{COSC 4P82 Assignment 1}}
\author{\textbf{Brett Terpstra}\\
bt19ex@brocku.ca - 692021}

\begin{document}

\maketitle
\tableofcontents

\section{Introduction}

\section{Symbolic Regression}

\subsection{Introduction}

\subsection{Parameter Table}

\begin{center}
\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
\hline
Parameter & Value \\ [0.25ex]
\hline\hline
Runs & 10 \\
\hline
Population Size & 5000 \\
\hline
Generations & 50 \\
\hline
Training Set & N/A \\
\hline
Testing Set & N/A \\
\hline
Crossover Operator & Subtree Crossover \\
\hline
Mutation Operator & Grow Tree, Max Depth 4 \\
\hline
Crossover Rate & 0.9 or 1.0* \\
\hline
Mutation Rate & 0.1 or 1.0* \\
\hline
Elitism & Best 2 or 0 individuals survive* \\
\hline
Selection & Fitness Proportionate \\
\hline
Function Set & *, /, +, -, exp, log, sin, cos \\
\hline
Terminal Set & X, Ephemeral Value \\
\hline
Tree Initialization & Half and Half, Max Depth 2-6 \\
\hline
Max Tree Depth & 17 \\
\hline
Raw Fitness & See Fitness Evaluation \\
\hline
Standardized Fitness & = Raw Fitness \\
\hline
\end{tabularx}
\end{center}

*Four configurations were tested: 0.9 crossover with 0.1 mutation, and 1.0 crossover with 1.0 mutation, each run once with 0 elites and once with 2 elites.
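Fitness-proportionate selection needs weights where larger is better, whereas lower fitness is better here. This report does not say how lilgp weights the roulette wheel; the C++ sketch below assumes Koza's adjusted fitness, $a = 1/(1+s)$ for standardized fitness $s \geq 0$, and spins a wheel over the adjusted values. |adjustedFitness| and |rouletteSelect| are hypothetical names, not lilgp functions.

\begin{lstlisting}
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Koza-style adjusted fitness (assumed): maps a lower-is-better
// standardized fitness s >= 0 into (0, 1], where larger is better.
double adjustedFitness(double standardized) {
    return 1.0 / (1.0 + standardized);
}

// Hypothetical roulette-wheel selection: returns the index of an individual
// chosen with probability proportional to its adjusted fitness.
std::size_t rouletteSelect(const std::vector<double>& standardized,
                           std::mt19937& rng) {
    if (standardized.empty())
        throw std::invalid_argument("empty population");

    double total = 0.0;
    for (double s : standardized)
        total += adjustedFitness(s);

    // Pick a point on the wheel, then walk the slices until it is passed.
    std::uniform_real_distribution<double> spin(0.0, total);
    double point = spin(rng);
    for (std::size_t i = 0; i < standardized.size(); ++i) {
        point -= adjustedFitness(standardized[i]);
        if (point <= 0.0)
            return i;
    }
    // Guard against floating-point round-off leaving a tiny remainder.
    return standardized.size() - 1;
}
\end{lstlisting}

With this mapping, an individual with standardized fitness 0 receives twice the wheel share of one with standardized fitness 1, and very poor individuals are almost never chosen.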

\subsection{Fitness Evaluation}

For each fitness case, the error is the absolute difference between the predicted and actual $y$ values, $\vert \hat{y} - y \vert$. If this error is below a user-provided cutoff (default $10^{15}$), it is added to the fitness value; if it is also below the |float| epsilon (i.e.\ effectively zero), the number of hits is incremented. The raw fitness is therefore the sum of the absolute errors, $\sum_i \vert \hat{y}_i - y_i \vert$, over the fitness cases whose error is below the cutoff. Lower fitness values are better.
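The evaluation described above can be sketched in C++ as follows. The names are hypothetical rather than taken from the assignment code. The report does not say what happens to an error at or above the cutoff; this sketch assumes it contributes the cutoff itself, which also prevents NaN or infinite predictions from receiving a perfect score.

\begin{lstlisting}
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Result of evaluating one program on all fitness cases.
struct RegressionFitness {
    double fitness = 0.0; // sum of absolute errors; lower is better
    int hits = 0;         // cases whose error is effectively zero
};

// Hypothetical evaluation: compares a program's predictions with the targets.
RegressionFitness evaluateRegression(const std::vector<double>& predicted,
                                     const std::vector<double>& actual,
                                     double cutoff = 1e15) {
    RegressionFitness result;
    const double epsilon = std::numeric_limits<float>::epsilon();
    const std::size_t n = std::min(predicted.size(), actual.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double error = std::fabs(predicted[i] - actual[i]);
        if (error < cutoff) {
            result.fitness += error;
            if (error < epsilon)
                ++result.hits; // prediction is effectively exact
        } else {
            // Assumption: an error at or above the cutoff (including NaN,
            // where the comparison above is false) contributes the cutoff.
            result.fitness += cutoff;
        }
    }
    return result;
}
\end{lstlisting}

For example, predictions of 1, 2, 3 against targets 1, 2, 5 give a fitness of 2 and two hits.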

\subsection{Fitness Plots}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp5}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp4}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp3}
	\caption{0 Elites, 10 Runs Averaged}
	\label{fig:fp2}
\end{figure}

\subsection{Analysis and Conclusion}

The best average fitness across all tests was 0.19384, obtained with 0.9 crossover and 0.1 mutation.

\section{Rice Classification}

\subsection{Introduction}

\subsection{Parameter Table}

\begin{center}
\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
\hline
Parameter & Value \\ [0.25ex]
\hline\hline
Runs & 10 \\
\hline
Population Size & 5000 \\
\hline
Generations & 51 \\
\hline
Training Set & Rice Classification (Cammeo and Osmancik) \\
\hline
Testing Set & Rice Classification (Cammeo and Osmancik) \\
\hline
Crossover Operator & Subtree Crossover \\
\hline
Mutation Operator & Grow Tree, Max Depth 4 \\
\hline
Crossover Rate & 0.9 \\
\hline
Mutation Rate & 0.1 or 0.9* \\
\hline
Elitism & Best 2 individuals survive \\
\hline
Selection & Fitness Proportionate \\
\hline
Function Set & *, /, +, -, exp, log \\
\hline
Terminal Set & area, perimeter, major, minor, eccentricity, convex, extent, Ephemeral Value \\
\hline
Tree Initialization & Half and Half, Max Depth 2-6 \\
\hline
Max Tree Depth & 17 \\
\hline
Raw Fitness & See Fitness Evaluation \\
\hline
Standardized Fitness & = Raw Fitness \\
\hline
\end{tabularx}
\end{center}

*Two configurations were tested, both with 2 elites: 0.9 crossover with 0.1 mutation, and 0.9 crossover with 0.9 mutation.
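The report does not state how |/| and |log| in the function set handle invalid inputs. In GP they are commonly implemented as protected operators; the C++ sketch below follows Koza's conventions (division by zero returns 1, and the logarithm is taken of the absolute value with $\log 0$ defined as 0). These conventions and the names |protectedDiv| and |protectedLog| are assumptions for illustration, not confirmed details of this assignment.

\begin{lstlisting}
#include <cmath>

// Protected division (assumed convention): returns 1 when the denominator
// is exactly zero instead of producing inf or NaN.
double protectedDiv(double numerator, double denominator) {
    if (denominator == 0.0)
        return 1.0;
    return numerator / denominator;
}

// Protected logarithm (assumed convention): log of the absolute value,
// with log(0) defined as 0, so every finite input gives a finite result.
double protectedLog(double x) {
    if (x == 0.0)
        return 0.0;
    return std::log(std::fabs(x));
}
\end{lstlisting}

Protecting the operators keeps every randomly generated tree evaluable, at the cost of letting evolution exploit the special cases.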

\subsection{Fitness Evaluation}

Each program is evaluated on the input terminal values of every fitness case and produces a real-valued output, which is interpreted as Cammeo if positive and Osmancik if negative. Raw fitness is the number of hits, i.e.\ the number of correct classifications. Since more hits is better but the GP system requires lower fitness to be better, the adjusted fitness is calculated and subtracted from 1 to invert it.
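A C++ sketch of this scheme follows. Two details are assumptions rather than statements from the report: an output of exactly zero is treated as Osmancik, and the adjusted fitness is taken to be the fraction of correct classifications, so the final value |1 - hits / N| is the misclassification rate. All names are hypothetical.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Rice { Cammeo, Osmancik };

// Positive output means Cammeo; anything else (negative, or zero by
// assumption) means Osmancik.
Rice classify(double output) {
    return output > 0.0 ? Rice::Cammeo : Rice::Osmancik;
}

struct ClassificationFitness {
    int hits = 0;         // raw fitness: number of correct classifications
    double fitness = 1.0; // 1 - hits / N, so lower is better
};

// Hypothetical evaluation of one program's outputs over all fitness cases.
ClassificationFitness evaluateClassification(const std::vector<double>& outputs,
                                             const std::vector<Rice>& labels) {
    ClassificationFitness result;
    const std::size_t n = std::min(outputs.size(), labels.size());
    if (n == 0)
        return result;
    for (std::size_t i = 0; i < n; ++i)
        if (classify(outputs[i]) == labels[i])
            ++result.hits;
    // Assumption: adjusted fitness is the fraction correct, inverted by 1 - x.
    result.fitness = 1.0 - static_cast<double>(result.hits) / static_cast<double>(n);
    return result;
}
\end{lstlisting}

Under these assumptions, a program that classifies 3 of 4 cases correctly scores 0.25.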

\subsection{Fitness Plots}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp6}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp6}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp7}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp7}
\end{figure}

\subsection{Confusion Matrix}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp10}
	\caption{0.9 Crossover 0.1 Mutation 2 Elites Best Program Results}
	\label{fig:fp10}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp11}
	\caption{0.9 Crossover 0.1 Mutation 2 Elites 10 Run Average Results}
	\label{fig:fp11}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp12}
	\caption{0.9 Crossover 0.9 Mutation 2 Elites Best Program Results}
	\label{fig:fp8}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp13}
	\caption{0.9 Crossover 0.9 Mutation 2 Elites 10 Run Average Results}
	\label{fig:fp9}
\end{figure}
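Each confusion matrix tallies predicted against actual classes, and the correct classification rate is the diagonal total divided by the number of cases. The C++ sketch below shows how such a $2 \times 2$ matrix can be built, with classes encoded as |0| (Cammeo) and |1| (Osmancik); it is an illustration with hypothetical names, not the code that produced the figures.

\begin{lstlisting}
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// counts[actual][predicted]; class indices must be 0 (Cammeo) or 1 (Osmancik).
using ConfusionMatrix = std::array<std::array<int, 2>, 2>;

ConfusionMatrix buildConfusion(const std::vector<int>& actual,
                               const std::vector<int>& predicted) {
    ConfusionMatrix m{}; // value-initialized: every count starts at zero
    const std::size_t n = std::min(actual.size(), predicted.size());
    for (std::size_t i = 0; i < n; ++i)
        ++m[actual[i]][predicted[i]];
    return m;
}

// Correct classification rate: diagonal (correct) counts over all counts.
double accuracy(const ConfusionMatrix& m) {
    const int correct = m[0][0] + m[1][1];
    const int total = m[0][0] + m[0][1] + m[1][0] + m[1][1];
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
\end{lstlisting}

The off-diagonal cells separate the two kinds of error (Cammeo labelled as Osmancik and vice versa), which a single accuracy figure hides.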

\subsection{Analysis and Conclusion}

The best result found was a correct classification rate of 91.9\%. On average, 0.9 crossover with 0.1 mutation produced the best results, while the best program found with 0.9 crossover and 0.9 mutation was almost as accurate.

\section{Compiling / Executing}

This assignment was developed on Linux using GCC 13.2.0; however, any C++17-compliant compiler should work. The minimum GCC version appears to be 8.5, so the assignment can be built on sandcastle.

\begin{lstlisting}[language=bash]
cd your_path_to_this_source/
mkdir build
cd build
cmake ../
make -j 32
\end{lstlisting}

The assignment executable is called |Assignment_1|, and the automated run system is called |Assignment_1_RUNNER|. |Assignment_1_RUNNER| has a help menu listing its options, but the defaults work if you run it from the build directory and are using Part B only. To build for Part A, run |cmake -DPART_B=OFF| and then run |Assignment_1_RUNNER| with |-b|.

\section{Conclusion}

I made a few changes to lilgp, mostly memory fixes, along with elitism that keeps a fixed number of individuals instead of a proportion of the population. There appears to be some kind of issue in the GP, which will not matter much since Assignment 2 will likely use my own GP system. I might look into it, but I was not aware of the issue until compiling the statistics for this report. My results have been generally positive; however, while collecting data I noticed that at some point the Part A results stopped being consistently good, whereas the Part B results remained unchanged. This may have happened when I changed my custom random number seeder to avoid division-by-zero errors during testing, but the cause could be anything. I don't like writing reports, so I procrastinated on this one and instead spent the last couple of weeks experimenting with the GP. Fun fact: a number of additions to my standard library were made for this assignment. Hopefully next time will be better.
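Elitism by a fixed count, as opposed to a proportion, amounts to copying the $k$ best individuals unchanged into the next generation. The C++ sketch below illustrates that idea under the assumption that lower fitness is better; it is not the actual change made to lilgp, and |selectElites| is a hypothetical name.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the `count` best individuals,
// best first, assuming lower fitness is better. These individuals would be
// copied unchanged into the next population.
std::vector<std::size_t> selectElites(const std::vector<double>& fitness,
                                      std::size_t count) {
    std::vector<std::size_t> order(fitness.size());
    std::iota(order.begin(), order.end(), std::size_t{0}); // 0, 1, 2, ...
    count = std::min(count, order.size());                 // cannot keep more than exist
    // Only the first `count` positions need to end up sorted.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(count),
                      order.end(),
                      [&fitness](std::size_t a, std::size_t b) {
                          return fitness[a] < fitness[b];
                      });
    order.resize(count);
    return order;
}
\end{lstlisting}

A fixed count keeps the number of preserved individuals independent of population size, so "Best 2 individuals survive" means exactly two whether the population has 500 or 5000 members.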

\section{References}

These will be properly formatted in the next assignment; setting up a bibliography in LaTeX proved troublesome this time.\\\\
\url{https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik}\\
\url{http://garage.cse.msu.edu/software/lil-gp/}

\end{document}