\documentclass[]{report}
\usepackage{graphicx}
\usepackage{float}
\usepackage{enumitem}
\usepackage{tabularx}
\usepackage{hyperref}
\usepackage[normalem]{ulem}
\usepackage{listings}
\usepackage[most]{tcolorbox}
\definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464}
\definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.635294}
\definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316}
\lstset{
columns=flexible,
breaklines=true,
backgroundcolor=\color{white}, % choose the background color
basicstyle=\small\ttfamily, % the size of the fonts that are used for the code
breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
breaklines=true, % sets automatic line breaking
captionpos=b, % sets the caption-position to bottom
commentstyle=\color{commentsColor}\textit, % comment style
deletekeywords={}, % if you want to delete keywords from the given language
escapeinside={\%*}{*)}, % if you want to add LaTeX within your code
extendedchars=true, % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8
%frame=tb, % adds a frame around the code
keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
keywordstyle=\color{keywordsColor}\bfseries, % keyword style
language=C++, % the language of the code (can be overridden per snippet)
otherkeywords={rank\_t, customerID\_t, distance\_t, fitness\_t}, % if you want to add more keywords to the set
numbers=left, % where to put the line-numbers; possible values are (none, left, right)
numbersep=5pt, % how far the line-numbers are from the code
numberstyle=\tiny\color{commentsColor}, % the style that is used for the line-numbers
rulecolor=\color{black}, % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here))
showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
keepspaces=true,
showstringspaces=false, % underline spaces within strings only
showtabs=false, % show tabs within strings adding particular underscores
stepnumber=1, % the step between two line-numbers. If it's 1, each line will be numbered
stringstyle=\color{stringColor}, % string literal style
tabsize=4, % sets default tabsize to 4 spaces
title=\lstname, % show the filename of files included with \lstinputlisting; also try caption instead of title
columns=fixed % Using fixed column width (for e.g. nice alignment)
}
\lstMakeShortInline|
\newtcolorbox{answerbox}[2][]{%
attach boxed title to top center
= {yshift=-8pt},
colback = black!5!white,
colframe = black!75!black,
fonttitle = \bfseries,
colbacktitle = gray!85!black,
title = #2,#1,
enhanced,
}
\lstset{
basicstyle=\small\ttfamily,
columns=flexible,
breaklines=true
}
\renewcommand{\thesection}{\arabic{section}}
% Title Page
\title{\textbf{COSC 4P82 Assignment 1}}
\author{\textbf{Brett Terpstra}\\
bt19ex@brocku.ca - 692021}
\begin{document}
\maketitle
\tableofcontents
\section{Introduction}
\section{Symbolic regression}
\subsection{Introduction}
\subsection{Parameter Table}
\begin{center}
\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
\hline
Parameter & Value \\ [0.25ex]
\hline\hline
Runs & 10 \\
\hline
Population Size & 5000 \\
\hline
Generations & 50 \\
\hline
Training Set & N/A \\
\hline
Testing Set & N/A \\
\hline
Crossover Operator & Subtree Crossover\\
\hline
Mutation Operator & Grow Tree, Max Depth 4 \\
\hline
Crossover Rate & 0.9 or 1.0* \\
\hline
Mutation Rate & 0.1 or 1.0* \\
\hline
Elitism & Best 2 or 0 individuals Survive* \\
\hline
Selection & Fitness Proportionate \\
\hline
Function Set & *, /, +, -, exp, log, sin, cos \\
\hline
Terminal Set & X, Ephemeral Value \\
\hline
Tree Initialization & Half and Half, Max Depth 2-6 \\
\hline
Max Tree Depth & 17 \\
\hline
Raw Fitness & See Fitness Evaluation \\
\hline
Standardized Fitness & = Raw Fitness \\
\hline
\end{tabularx}
\end{center}
*Four tests were run: 0.9 crossover and 0.1 mutation with 0 and 2 elites, and 1.0 crossover and 1.0 mutation with 0 and 2 elites.
\subsection{Fitness Evaluation}
Fitness is evaluated by taking the absolute value of the difference between the predicted $y$ value and the actual $y$ value.
If the difference is less than a user-provided cutoff (default 1e15), it is added to the fitness value. If the difference is less than the float epsilon value ($\approx 0$), the number of hits is incremented. Lower fitness values are preferred.
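
As a sketch of the description above (the symbols are notation introduced here for illustration, not names taken from the implementation): let $\hat{y}_i$ be the predicted value and $y_i$ the actual value for fitness case $i$, let $c$ be the cutoff, and let $\epsilon$ be the float epsilon. Then
\[
e_i = |\hat{y}_i - y_i|, \qquad
F = \sum_{i \,:\, e_i < c} e_i, \qquad
H = \left|\{\, i : e_i < \epsilon \,\}\right|
\]
where $F$ is the fitness (lower is better) and $H$ is the number of hits. For example, errors of $0.5$, $0$ and $0.25$ on three fitness cases would give $F = 0.75$ and $H = 1$.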
\subsection{Fitness Plots}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp5}
\caption{2 Elites, 10 Runs Averaged}
\label{fig:fp4}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp3}
\caption[]{0 Elites, 10 Runs Averaged}
\label{fig:fp2}
\end{figure}
\subsection{Analysis and Conclusion}
The best average fitness of all the tests was 0.19384 using 0.9 crossover and 0.1 mutation.
\section{Rice Classification}
\subsection{Introduction}
\subsection{Parameter Table}
\begin{center}
\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
\hline
Parameter & Value \\ [0.25ex]
\hline\hline
Runs & 10 \\
\hline
Population Size & 5000 \\
\hline
Generations & 51 \\
\hline
Training Set & Rice Classification (Cammeo and Osmancik) \\
\hline
Testing Set & Rice Classification (Cammeo and Osmancik) \\
\hline
Crossover Operator & Subtree Crossover\\
\hline
Mutation Operator & Grow Tree, Max Depth 4 \\
\hline
Crossover Rate & 0.9 or 0.9* \\
\hline
Mutation Rate & 0.1 or 0.9* \\
\hline
Elitism & Best 2 individuals Survive \\
\hline
Selection & Fitness Proportionate \\
\hline
Function Set & *, /, +, -, exp, log \\
\hline
Terminal Set & area, perimeter, major, minor, eccentricity, convex, extent, Ephemeral Value \\
\hline
Tree Initialization & Half and Half, Max Depth 2-6 \\
\hline
Max Tree Depth & 17 \\
\hline
Raw Fitness & See Fitness Evaluation \\
\hline
Standardized Fitness & = Raw Fitness \\
\hline
\end{tabularx}
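\end{center}
*Two tests were run, both with 2 elites: 0.9 crossover with 0.1 mutation, and 0.9 crossover with 0.9 mutation.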
\subsection{Fitness Evaluation}
Each program is evaluated on the input terminal values and produces a positive or negative value, which is interpreted as either Cammeo (+) or Osmancik (-). Raw fitness is the number of hits, that is, the number of correct classifications. The adjusted fitness is then calculated and subtracted from 1, inverting it so that lower fitness is better, as required.
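
In symbols (again, notation introduced here for illustration only): if $g(\mathbf{x}_i)$ is the program's output on the terminal values of grain $i$ and $y_i$ is its true class, then
\[
\hat{y}_i = \left\{ \begin{array}{ll}
\mathrm{Cammeo} & \mathrm{if}\ g(\mathbf{x}_i) > 0 \\
\mathrm{Osmancik} & \mathrm{if}\ g(\mathbf{x}_i) < 0
\end{array} \right.
\qquad
H = \left|\{\, i : \hat{y}_i = y_i \,\}\right|
\]
with the raw fitness equal to $H$. Writing $a$ for the adjusted fitness, the value that is minimized is $1 - a$. How an output of exactly $0$ is classified is not covered by this sketch.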
\subsection{Fitness Plots}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp6}
\caption{2 Elites, 10 Runs Averaged}
\label{fig:fp6}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp7}
\caption{2 Elites, 10 Runs Averaged}
\label{fig:fp7}
\end{figure}
\subsection{Confusion Matrix}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp10}
\caption{0.9 Crossover 0.1 Mutation 2 Elites Best Program Results}
\label{fig:fp10}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp11}
\caption{0.9 Crossover 0.1 Mutation 2 Elites 10 Run Average Results}
\label{fig:fp11}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp12}
\caption{0.9 Crossover 0.9 Mutation 2 Elites Best Program Results}
\label{fig:fp8}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{fp13}
\caption{0.9 Crossover 0.9 Mutation 2 Elites 10 Run Average Results}
\label{fig:fp9}
\end{figure}
\subsection{Analysis and Conclusion}
The best result found was a correct classification rate of 91.9\%. On average, 0.9 crossover with 0.1 mutation produced the best results, although the best 0.9/0.9 result was almost equal.
\section{Compiling / Executing}
This assignment was made for Linux using GCC 13.2.0; however, any C++17-compliant compiler should work.
The minimum GCC version appears to be 8.5, meaning this assignment can be built on sandcastle.
\begin{lstlisting}
cd your_path_to_this_source/
mkdir build
cd build
cmake ../
make -j 32
\end{lstlisting}
The actual assignment executable is called |Assignment_1|, while the automatic run system is called |Assignment_1_RUNNER|. |Assignment_1_RUNNER| has a help menu with options, but the defaults will work assuming you run from the build directory and are using Part B only. To build for Part A, run |cmake -DPART_B=OFF| and run |Assignment_1_RUNNER| with |-b|.
\section{Conclusion}
I made a few changes to lilgp, mostly memory fixes, along with elitism specified as a number of individuals instead of a proportion. There appears to be some kind of issue in the GP, which won't matter as assignment two will likely use my own GP system. I might look into it, but I was not aware there was an issue until compiling the stats here. My results have been generally positive; however, I did notice while collecting data that at some point the Part A results stopped being consistently good, while the Part B results have remained unchanged. This might have happened when I changed my custom random number seeder to avoid div by zero errors during testing, but it could be anything. I don't like writing reports and have procrastinated on writing, instead spending the last couple of weeks messing around with the GP. Fun fact: a bunch of additions to my standard lib were made for this assignment. Next time will be better, hopefully.
\section{References}
Next assignment these will be proper. LaTeX is being annoying to set up for bib.\\\\
\url{https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik}\\
\url{http://garage.cse.msu.edu/software/lil-gp/}
\end{document}