\documentclass[]{report} \usepackage{graphicx} \usepackage{float} \usepackage{enumitem} \usepackage{tabularx} \usepackage{hyperref} \usepackage[normalem]{ulem} \usepackage{listings} \usepackage[most]{tcolorbox} \definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464} \definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.635294} \definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316} \lstset{ columns=flexible, breaklines=true, backgroundcolor=\color{white}, % choose the background color basicstyle=\small\ttfamily, % the size of the fonts that are used for the code breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace breaklines=true, % sets automatic line breaking captionpos=b, % sets the caption-position to bottom commentstyle=\color{commentsColor}\textit, % comment style deletekeywords={}, % if you want to delete keywords from the given language escapeinside={\%*}{*)}, % if you want to add LaTeX within your code extendedchars=true, % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8 %frame=tb, % adds a frame around the code keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible) keywordstyle=\color{keywordsColor}\bfseries, % keyword style language=C++, % the language of the code (can be overrided per snippet) otherkeywords={rank\_t, customerID\_t, distance\_t, fitness\_t}, % if you want to add more keywords to the set numbers=left, % where to put the line-numbers; possible values are (none, left, right) numbersep=5pt, % how far the line-numbers are from the code numberstyle=\tiny\color{commentsColor}, % the style that is used for the line-numbers rulecolor=\color{black}, % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here)) showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces' keepspaces=true, showstringspaces=false, % underline spaces within strings only showtabs=false, % show tabs within strings adding particular underscores stepnumber=1, % the step between two line-numbers. If it's 1, each line will be numbered stringstyle=\color{stringColor}, % string literal style tabsize=4, % sets default tabsize to 2 spaces title=\lstname, % show the filename of files included with \lstinputlisting; also try caption instead of title columns=fixed % Using fixed column width (for e.g. nice alignment) } \lstMakeShortInline| \newtcolorbox{answerbox}[2][]{% attach boxed title to top center = {yshift=-8pt}, colback = black!5!white, colframe = black!75!black, fonttitle = \bfseries, colbacktitle = gray!85!black, title = #2,#1, enhanced, } \lstset{ basicstyle=\small\ttfamily, columns=flexible, breaklines=true } \usepackage{float} \renewcommand{\thesection}{\arabic{section}} % Title Page \title{\textbf{COSC 4P82 Assignment 1}} \author{\textbf{Brett Terpstra}\\ bt19ex@brocku.ca - 692021} \begin{document} \maketitle \tableofcontents \section{Introduction} \section{Symbolic regression} \subsection{Introduction} \subsection{Parameter Table} \begin{center} \begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | } \hline Parameter & Value \\ [0.25ex] \hline\hline Runs & 10 \\ \hline Population Size & 5000 \\ \hline Generations & 50 \\ \hline Training Set & N/A \\ \hline Testing Set & N/A \\ \hline Crossover Operator & Subtree Crossover\\ \hline Mutation Operator & Grow Tree, Max Depth 4 \\ \hline Crossover Rate & 0.9 or 1.0* \\ \hline Mutation Rate & 0.1 or 1.0* \\ \hline Elitism & Best 2 or 0 individuals Survive* \\ \hline Selection & Fitness Proportionate \\ \hline Function Set & *, /, +, -, exp, log, sin, cos \\ \hline Terminal Set & X, Ephemeral Value \\ \hline Tree Initialization & Half and Half, Max Depth 2-6 \\ \hline Max Tree Depth & 17 \\ \hline Raw Fitness & See Fitness Evaluation \\ \hline Standardized Fitness & = Raw Fitness \\ \hline \end{tabularx} \end{center} *4 Tests were run, 0.9 crossover, 0.9 mutation with 0 elitism and 2 elitism, and 1.0 crossover, 1.0 mutation with 0 elitism and 2 elitism. \subsection{Fitness Evaluation} Fitness is evaluated by taking the absolute value of the predicted y value minus the actual y value. If the difference is less than a user provided (default 1.e15) value cutoff it is added to the fitness value. If the difference value is less than the float epsilon value (\~= 0) the number of hits is incremented. Lower fitness values are preferred. \subsection{Fitness Plots} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp5} \caption{2 Elites, 10 Runs Averaged} \label{fig:fp4} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp3} \caption[]{0 Elites, 10 Runs Averaged} \label{fig:fp2} \end{figure} \subsection{Analysis and Conclusion} The best average fitness of all the tests was 0.19384 using 0.9 crossover and 0.1 mutation. \section{Rice Classification} \subsection{Introduction} \subsection{Parameter Table} \begin{center} \begin{tabularx}{0.8\textwidth}{ | >{\centering\arraybackslash}X | >{\centering\arraybackslash}X | } \hline Parameter & Value \\ [0.25ex] \hline\hline Runs & 10 \\ \hline Population Size & 5000 \\ \hline Generations & 51 \\ \hline Training Set & Rice Classification (Cammeo and Osmancik) \\ \hline Testing Set & Rice Classification (Cammeo and Osmancik) \\ \hline Crossover Operator & Subtree Crossover\\ \hline Mutation Operator & Grow Tree, Max Depth 4 \\ \hline Crossover Rate & 0.9 or 0.9* \\ \hline Mutation Rate & 0.1 or 0.9* \\ \hline Elitism & Best 2 individuals Survive \\ \hline Selection & Fitness Proportionate \\ \hline Function Set & *, /, +, -, exp, log \\ \hline Terminal Set & area, perimeter, major, minor, eccentricity, convex, extent, Ephemeral Value \\ \hline Tree Initialization & Half and Half, Max Depth 2-6 \\ \hline Max Tree Depth & 17 \\ \hline Raw Fitness & See Fitness Evaluation \\ \hline Standardized Fitness & = Raw Fitness \\ \hline \end{tabularx} \end{center} \subsection{Fitness Evaluation} Tested on the input terminal values the GP produces a positive or negative value which is interpreted as either Cammeo (+) or Osmancik (-). Raw fitness is equal to the number of hits which is the number of correct identifications. The adjusted fitness is then calculated and subtracted from 1 in order to invert and produce the required lowest fitness better. \subsection{Fitness Plots} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp6} \caption{2 Elites, 10 Runs Averaged} \label{fig:fp6} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp7} \caption{2 Elites, 10 Runs Averaged} \label{fig:fp7} \end{figure} \subsection{Confusion Matrix} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp10} \caption{0.9 Crossover 0.1 Mutation 2 Elites Best Program Results} \label{fig:fp10} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp11} \caption{0.9 Crossover 0.1 Mutation 2 Elites 10 Run Average Results} \label{fig:fp11} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp12} \caption{0.9 Crossover 0.9 Mutation 2 Elites Best Program Results} \label{fig:fp8} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\linewidth]{fp13} \caption{0.9 Crossover 0.9 Mutation 2 Elites 10 Run Average Results} \label{fig:fp9} \end{figure} \subsection{Analysis and Conclusion} The best results found was a correct classification rate of 91.9\%. On average the 0.9 crossover with 0.1 mutation produced the best results with the 0.9/0.9 best result almost being equal. \section{Compiling / Executing} This assignment was made for linux using GCC 13.2.0, however any C++17 compliant compiler should work. The minimum GCC version appears to be 8.5, meaning this assignment can be built on sandcastle. \begin{lstlisting} cd your_path_to_this_source/ mkdir build cd build cmake ../ make -j 32 \end{lstlisting} The actual assignment executable is called |Assignment_1| while the automatic run system is called |Assignment_1_RUNNER|. |Assignment_1_RUNNER| has a help menu with options but the defaults will work assuming you run from the build directory and are using part b only. If you want to build for Part A run |cmake -DPART_B=OFF| and run |Assignment_1_RUNNER| with |-b| \section{Conclusion} I made a few changes to lilgp, mostly memory fixes along with elitism with a number of individuals instead of a proportion. There appear to be some kind of issue in the GP, of which won't matter as assignment two will likely use my own gp system. I might look into it, but I was not aware there was an issue until compiling the stats here. My results have been generally positive, however, I did notice in the course of collecting data that at some point the Part A results stopped being consistently good however part B results have remained unchanged. Might have happened when I changed my custom random number seeder to not produce div by zero errors during testing. Could be anything. I don't like writing reports and have procrastinated on writing and instead have spent the last couple of weeks messing around with the GP. Fun fact a bunch of additions to my standard lib were made for this assignment. Next time will be better hopefully \section{References} Next assignment these will be proper. Latex is being annoying to setup for bib.\\\\ https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik\\ http://garage.cse.msu.edu/software/lil-gp/ \end{document}