\documentclass[]{report}

\usepackage{graphicx}
\usepackage{float}
\usepackage{enumitem}
\usepackage{tabularx}
\usepackage{hyperref}
\usepackage[normalem]{ulem}
\usepackage{listings}
\usepackage[most]{tcolorbox}

\definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464}
\definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.635294}
\definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316}

\lstset{
	columns=flexible,
	breaklines=true,
	backgroundcolor=\color{white},   % choose the background color
	basicstyle=\small\ttfamily,        % the size of the fonts that are used for the code
	breakatwhitespace=false,         % sets if automatic breaks should only happen at whitespace
	breaklines=true,                 % sets automatic line breaking
	captionpos=b,                    % sets the caption-position to bottom
	commentstyle=\color{commentsColor}\textit,    % comment style
	deletekeywords={},            % if you want to delete keywords from the given language
	escapeinside={\%*}{*)},          % if you want to add LaTeX within your code
	extendedchars=true,              % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8
	%frame=tb,	                   	   % adds a frame around the code
	keepspaces=true,                 % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
	keywordstyle=\color{keywordsColor}\bfseries,       % keyword style
	language=C++,                 % the language of the code (can be overridden per snippet)
	otherkeywords={rank\_t, customerID\_t, distance\_t, fitness\_t},           % if you want to add more keywords to the set
	numbers=left,                    % where to put the line-numbers; possible values are (none, left, right)
	numbersep=5pt,                   % how far the line-numbers are from the code
	numberstyle=\tiny\color{commentsColor}, % the style that is used for the line-numbers
	rulecolor=\color{black},         % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here))
	showspaces=false,                % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
	keepspaces=true,
	showstringspaces=false,          % underline spaces within strings only
	showtabs=false,                  % show tabs within strings adding particular underscores
	stepnumber=1,                    % the step between two line-numbers. If it's 1, each line will be numbered
	stringstyle=\color{stringColor}, % string literal style
	tabsize=4,	                   % sets default tabsize to 4 spaces
	title=\lstname,                  % show the filename of files included with \lstinputlisting; also try caption instead of title
	columns=fixed                    % Using fixed column width (for e.g. nice alignment)
}
\lstMakeShortInline|

\newtcolorbox{answerbox}[2][]{%
	attach boxed title to top center
	= {yshift=-8pt},
	colback      = black!5!white,
	colframe     = black!75!black,
	fonttitle    = \bfseries,
	colbacktitle = gray!85!black,
	title        = #2,#1,
	enhanced,
}

\lstset{
	basicstyle=\small\ttfamily,
	columns=flexible,
	breaklines=true
}

\usepackage{float}

\renewcommand{\thesection}{\arabic{section}}

% Title Page
\title{\textbf{COSC 4P82 Assignment 1}}
\author{\textbf{Brett Terpstra}\\
	bt19ex@brocku.ca - 692021}

\begin{document}
\maketitle
\tableofcontents

\section{Introduction}

\section{Symbolic Regression}
\subsection{Introduction}
\subsection{Parameter Table}
\begin{center}
\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
	\hline
	Parameter & Value \\ [0.25ex]
	\hline\hline
	Runs & 10 \\
	\hline
	Population Size & 5000 \\
	\hline
	Generations & 50 \\
	\hline
	Training Set & N/A \\
	\hline
	Testing Set & N/A \\
	\hline
	Crossover Operator & Subtree Crossover\\
	\hline
	Mutation Operator & Grow Tree, Max Depth 4 \\
	\hline
	Crossover Rate & 0.9 or 1.0* \\
	\hline
	Mutation Rate & 0.1 or 1.0* \\
	\hline
	Elitism & Best 2 or 0 individuals Survive* \\
	\hline
	Selection & Fitness Proportionate \\
	\hline
	Function Set & *, /, +, -, exp, log, sin, cos \\
	\hline
	Terminal Set & X, Ephemeral Value \\
	\hline
	Tree Initialization & Half and Half, Max Depth 2-6 \\
	\hline
	Max Tree Depth & 17 \\
	\hline
	Raw Fitness & See Fitness Evaluation \\
	\hline
	Standardized Fitness & = Raw Fitness \\
	\hline
\end{tabularx}
\end{center}
*Four tests were run: 0.9 crossover with 0.1 mutation, and 1.0 crossover with 1.0 mutation, each with 0 elites and with 2 elites.
\subsection{Fitness Evaluation}
For each fitness case, the error is the absolute difference between the predicted and actual $y$ values.
If the error is below a user-provided cutoff (default 1.e15), it is added to the fitness value. If the error is below the float epsilon ($\approx 0$), the number of hits is incremented. Lower fitness values are better.
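
As a compact restatement of the rule above (the symbols $N$, $c$ and $\varepsilon$ are notation introduced here, not names taken from the implementation), let $\hat{y}_i$ be the program output and $y_i$ the target for fitness case $i = 1, \dots, N$:
\[
	e_i = \left\vert \hat{y}_i - y_i \right\vert, \qquad
	f = \sum_{i \,:\, e_i < c} e_i, \qquad
	h = \#\{\, i : e_i < \varepsilon \,\}
\]
where $c$ is the cutoff, $\varepsilon$ is the float epsilon, $f$ is the fitness and $h$ is the number of hits.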
\subsection{Fitness Plots}
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp5}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp4}
\end{figure}
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp3}
	\caption[]{0 Elites, 10 Runs Averaged}
	\label{fig:fp2}
\end{figure}
\subsection{Analysis and Conclusion}
The best average fitness of all the tests was 0.19384 using 0.9 crossover and 0.1 mutation. 

\section{Rice Classification}
\subsection{Introduction}
\subsection{Parameter Table}
\begin{center}
	\begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | }
		\hline
		Parameter & Value \\ [0.25ex]
		\hline\hline
		Runs & 10 \\
		\hline
		Population Size & 5000 \\
		\hline
		Generations & 51 \\
		\hline
		Training Set & Rice Classification (Cammeo and Osmancik) \\
		\hline
		Testing Set & Rice Classification (Cammeo and Osmancik) \\
		\hline
		Crossover Operator & Subtree Crossover\\
		\hline
		Mutation Operator & Grow Tree, Max Depth 4 \\
		\hline
		Crossover Rate & 0.9 \\
		\hline
		Mutation Rate & 0.1 or 0.9 \\
		\hline
		Elitism & Best 2 individuals Survive \\
		\hline
		Selection & Fitness Proportionate \\
		\hline
		Function Set & *, /, +, -, exp, log \\
		\hline
		Terminal Set & area, perimeter, major, minor, eccentricity, convex, extent, Ephemeral Value \\
		\hline
		Tree Initialization & Half and Half, Max Depth 2-6 \\
		\hline
		Max Tree Depth & 17 \\
		\hline
		Raw Fitness & See Fitness Evaluation \\
		\hline
		Standardized Fitness & = Raw Fitness \\
		\hline
	\end{tabularx}
\end{center}
\subsection{Fitness Evaluation}
Evaluated on the input terminal values, the GP program produces a positive or negative value, which is interpreted as Cammeo (+) or Osmancik ($-$). Raw fitness equals the number of hits, i.e. the number of correct classifications. The adjusted fitness is then calculated and subtracted from 1, so that lower fitness is better, as required.
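
The decision rule and hit count can be sketched as follows; the symbols $g$, $x_i$, $c_i$, $\hat{c}_i$, $N$ and $h$ are notation assumed here for illustration only. With $g(x_i)$ the program output for sample $i$ and $c_i$ its true class:
\[
	\hat{c}_i = \left\{
	\begin{array}{ll}
		\mbox{Cammeo} & \mbox{if } g(x_i) > 0 \\
		\mbox{Osmancik} & \mbox{if } g(x_i) < 0
	\end{array}
	\right.
	\qquad
	h = \#\{\, i : \hat{c}_i = c_i \,\}
\]
The correct classification rate over $N$ samples is then $h / N$. The exact form of the adjusted fitness is not restated here; any form that increases with $h$ yields, once subtracted from 1, a value that decreases as more samples are classified correctly.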
\subsection{Fitness Plots}
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp6}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp6}
\end{figure}
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp7}
	\caption{2 Elites, 10 Runs Averaged}
	\label{fig:fp7}
\end{figure}


\subsection{Confusion Matrix}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp10}
	\caption{0.9 Crossover 0.1 Mutation 2 Elites Best Program Results}
	\label{fig:fp10}
\end{figure}

\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp11}
	\caption{0.9 Crossover 0.1 Mutation 2 Elites 10 Run Average Results}
	\label{fig:fp11}
\end{figure}


\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp12}
	\caption{0.9 Crossover 0.9 Mutation 2 Elites Best Program Results}
	\label{fig:fp8}
\end{figure}
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{fp13}
	\caption{0.9 Crossover 0.9 Mutation 2 Elites 10 Run Average Results}
	\label{fig:fp9}
\end{figure}

\subsection{Analysis and Conclusion}
The best result found was a correct classification rate of 91.9\%. On average, 0.9 crossover with 0.1 mutation produced the best results, with the best 0.9/0.9 result being almost equal.


\section{Compiling / Executing}
This assignment was developed on Linux using GCC 13.2.0; however, any C++17-compliant compiler should work. 
The minimum GCC version appears to be 8.5, meaning this assignment can be built on sandcastle.
\begin{lstlisting}[language=bash]
cd your_path_to_this_source/
mkdir build
cd build
cmake ../
make -j 32
\end{lstlisting}
The assignment executable is called |Assignment_1|, while the automatic run system is called |Assignment_1_RUNNER|. |Assignment_1_RUNNER| has a help menu listing its options, but the defaults will work provided you run it from the build directory and are using Part B only. To build for Part A, run |cmake -DPART_B=OFF| and run |Assignment_1_RUNNER| with |-b|.

\section{Conclusion}
I made a few changes to lil-gp, mostly memory fixes, along with elitism specified as a number of individuals instead of a proportion. There appears to be some kind of issue in the GP, which won't matter as assignment two will likely use my own GP system. I might look into it, but I was not aware there was an issue until compiling the stats here. My results have been generally positive; however, I did notice while collecting data that at some point the Part A results stopped being consistently good, while the Part B results remained unchanged. This might have happened when I changed my custom random number seeder to not produce division-by-zero errors during testing, but it could be anything. I don't like writing reports, so I procrastinated on the writing and instead spent the last couple of weeks messing around with the GP. Fun fact: a bunch of additions to my standard library were made for this assignment. Hopefully next time will be better.

\section{References}
Next assignment these will be proper. LaTeX is being annoying to set up for bib.\\\\
\url{https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik}\\
\url{http://garage.cse.msu.edu/software/lil-gp/}

\end{document}