COSC_3P95_Assignment_1/template_Report.tex

\documentclass[]{report}


\usepackage{graphicx}
\usepackage{float}
\usepackage{enumitem}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage[normalem]{ulem}
\usepackage{listings}

\definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464}
\definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.635294}
\definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316}

\lstset{
	columns=flexible,
	breaklines=true,
	backgroundcolor=\color{white},   % choose the background color
	basicstyle=\small\ttfamily,        % the size of the fonts that are used for the code
	breakatwhitespace=false,         % sets if automatic breaks should only happen at whitespace
	breaklines=true,                 % sets automatic line breaking
	captionpos=b,                    % sets the caption-position to bottom
	commentstyle=\color{commentsColor}\textit,    % comment style
	deletekeywords={...},            % if you want to delete keywords from the given language
	escapeinside={\%*}{*)},          % if you want to add LaTeX within your code
	extendedchars=true,              % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8
	frame=tb,	                   	   % adds a frame around the code
	keepspaces=true,                 % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
	keywordstyle=\color{keywordsColor}\bfseries,       % keyword style
	language=C++,                 % the language of the code (can be overrided per snippet)
	otherkeywords={*,...},           % if you want to add more keywords to the set
	numbers=left,                    % where to put the line-numbers; possible values are (none, left, right)
	numbersep=5pt,                   % how far the line-numbers are from the code
	numberstyle=\tiny\color{commentsColor}, % the style that is used for the line-numbers
	rulecolor=\color{black},         % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here))
	showspaces=false,                % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
	keepspaces=true,
	showstringspaces=false,          % underline spaces within strings only
	showtabs=false,                  % show tabs within strings adding particular underscores
	stepnumber=1,                    % the step between two line-numbers. If it's 1, each line will be numbered
	stringstyle=\color{stringColor}, % string literal style
	tabsize=2,	                   % sets default tabsize to 2 spaces
	title=\lstname,                  % show the filename of files included with \lstinputlisting; also try caption instead of title
	columns=fixed                    % Using fixed column width (for e.g. nice alignment)
}
\lstMakeShortInline|


\usepackage{float}

\renewcommand{\thesection}{\arabic{section}}
\renewcommand{\lstlistingname}{Algorithm}% Listing -> Algorithm
\renewcommand{\lstlistlistingname}{List of \lstlistingname s}% List of Listings -> List of Algorithms

% Title Page
\title{\textbf{COSC 3P95 Assignment 1}}
\author{\textbf{Brett Terpstra}\\
	bt19ex@brocku.ca - 692021}


\begin{document}
\maketitle
\tableofcontents

\section{Question 1}
Explain the difference between "sound" and "complete" analysis in software analysis. Then,
define what true positive, true negative, false positive, and false negative mean. How would
these terms change if the goal of the analysis changes, particularly when "positive" means
finding a bug, and then when "positive" means not finding a bug.\\
\noindent\rule{\textwidth}{1pt}
\subsection{Soundness vs Completeness}
\subsubsection{Soundness}
Soundness is the ability for the analysis to not report false positives. So having a high soundness means that you can be sure that the bugs that were found are actually bugs in the program.
\subsubsection{Completeness}
Completeness refers to the ability to find all the bugs in the software by an analysis. So if you can be sure that your analysis is complete, you can be sure that if all bugs are fixed there will be no more bugs. (unless fixing it adds more bugs.) Of course in pratice it's nearly impossible for any tool to be fully complete.
\subsection{Positive = Find bug}
\subsubsection{True Positive}
When the program reports a bug and the bug actually exists
\subsubsection{True Negative}
When a program doesn't report a bug and there is no bug to be found.
\subsubsection{False Positive}
When a bug is said to be found but none actually exists.
\subsubsection{False Negative}
When a bug isn't found but does actually exist. That is to say, the program fails to report the bug.
\subsection{Positive = Not find bug}
You literally invert the logic. Why would you ever do this? Take the logical not of each sentence and that's the answer. This results in confusing statements and shouldn't be used in practice:
\subsubsection{True Positive}
When the program does not report a bug, there is not a bug present.
\subsubsection{True Negative}
When a program reports a bug, there is a bug to be found.
\subsubsection{False Positive}
When a bug is not found but one actually exists.
\subsubsection{False Negative}
When a bug is found, but does doesn't actually exist.

\section{Question 2}
\subsection{A}
Using your preferred programming language, implement a random test case generator for a
sorting algorithm program that sorts integers in ascending order. The test case generator
should be designed to produce arrays of integers with random lengths, and values for each
sorting method.\\
\noindent\rule{\textwidth}{1pt}
\subsubsection{Source}
Please find the source code in the ZIP attachment for the submission of this assignment. Compiling instructions are found within the source file. Any C++20 compiler with support for |.contains| and |std::find\_if()| will work.
\subsubsection{Explanation}
Since we are only testing random int32s of random sizes I took a very simplistic approach. For our convenience there are two global config variables you can change:
\begin{itemize}
	\item |DEFAULT_TEST_COUNT|: Number of tests to run, defaults 10, and can be provided at runtime via the optional command line argument:\\ |./main.run [TEST\_COUNT]|
	\item |MAX_RAM_USAGE|: The Max number of bytes a vector used in the test can consume, this is a rough estimate used to put an upper bound on the number of integers in the test. Each test invocation vector is independent and memory is cleared between cycles. This was not designed with buffer reuse or any amount of efficiency so keep that in mind when setting this value too high. The default of 1MiB is fine for this assignment (fails about 15\% of the time).
\end{itemize}
To cover all possible test cases I generate values in the range |[INTEGER_MIN, INTEGER_MAX]| which populate a vector of size |[0, MAX_SIZE]|
\newpage
\begin{lstlisting}[caption={Test case generator code}]
	std::vector<std::int32_t> generateRandomData() {
		// setup random numbers
		static std::random_device dev;
		static std::mt19937_64 engine(dev());
		// we will generate numbers from integer min to integer max to cover the full possible test cases
		std::uniform_int_distribution dist(std::numeric_limits<std::int32_t>::min(), std::numeric_limits<std::int32_t>::max());
		// put a upper limit on the size of the array
		std::uniform_int_distribution dist_size(0ul, maxSize);
		// pick a random size
		size_t size = dist_size(engine);
		// populate the array
		std::vector<std::int32_t> ret;
		for (size_t i = 0; i < size; i++)
			ret.push_back(dist(engine));
		return ret;
	}
\end{lstlisting}
To make sure there are demonstrable bugs, the sorting code intentionally creates errors within the sorted array. To actually sort the values I used the C++ standard algorithm |std::sort| which won't fail. I did this because the standard sort is \underline{\textit{fast}} and creating a sorting algorithm which fails some of the time but not others is easier said than done.
\begin{lstlisting}[caption={Sort algorithm fail generator}]
	[...]
	std::uniform_int_distribution<int> dist(0, (int) v.size() - 1);
	std::uniform_real_distribution realDist(0.0, 1.0);
	int p1 = dist(engine);
	int p2 = dist(engine);
	if (!v.empty() && realDist(engine) > 0.85)
		v[p1] = v[p2];
	[...]
\end{lstlisting}
This method of creating a failure also has the benefit of sometimes not failing, so it could execute, but the array may still be valid. It also shows that out of order pairs which are not neighbours will cause a failure condition that is detected. It should be noted that the validation will only report the first instance of an out of order number. This was done purely for performance. The report is generated as a tuple |{i, j}| where |i| is the failed number index and |j| is the first value index which causes the failure.
\subsubsection{Sample Output}
I took the liberty of automatically formatting the output because I like when things look pretty :3
\begin{figure}[H]
	\centering
	\includegraphics[width=1.0\linewidth]{screenshot001}
	\caption{Sample output of the random test generator.}
	\label{fig:screenshot001}
\end{figure}
\subsection{B}
Provide a context-free grammar to generate all the possible test-cases.\\
\noindent\rule{\textwidth}{1pt}
\begin{center}
	\begin{enumerate}[label=(\arabic*)]
		\item $<$Array$>$ $\rightarrow$ "\{" $<$Array\_Expression$>$ "\}" $\vert$ "\{\}"
		\item $<$Array\_Expression$>$ $\rightarrow$ $<$Integral\_t$>$ $\vert$ $<$Integral\_t$>$ "," $<$Array\_Expression$>$
		\item $<$Integral\_t$>$ $\rightarrow$ $<$Value$>$ $\vert$ -$<$Value$>$
		\item $<$Value$>$ $\rightarrow$ $<$Integer$>$ $\vert$  $<$Value$>$$<$Integer$>$
		\item $<$Integer$>$ $\rightarrow$ "0" $\vert$ "1" $\vert$ "2" $\vert$ ... $\vert$ "9"
	\end{enumerate}
\end{center}
\newpage
\subsubsection{(2.B) Explanation}
I tried to be as explicit as possible based on my understanding of formal context free grammars (very limited, mostly with regard to simple compilers). I'm sure it could be simplified down, but the derivation of the grammar should generate a valid C++ initializer list.\\
My reasoning on each:
\begin{enumerate}[label=(\arabic*)]
	\item Construct the actual array object itself. Can have any number of values in it or be an empty array.
	\item The contents of an array can either be a single integral type (termination / base case) or an integral type with more array content after it, which recursively expands to any number of integers.
	\item Needed this to make sure the negative sign is placed first and only once. Alternative would be: $<$Value$>$ $\rightarrow$ -$<$Value$>$ $\vert$ $<$Integer$>$ $\vert$  $<$Value$>$$<$Integer$>$, but that could (in theory) result in |-------1| which is why I included the integral\_t step.
	\item Integers can be one of 10 characters so we recursively concat any number of $<$Integer$>$ together to create our value. The examples in the slides don't do this, but it would not make sense without this step.
	\item The base characters for creating integer values.
\end{enumerate}
\section{Question 3}
\subsection{A}
\begin{figure}[H]
	\centering
	\includegraphics[width=0.80\linewidth]{Q3A1COSC3P95.drawio.png}
	\caption{Direct translation of the code, logic errors and all.}
	\label{fig:Q3A1COSC3P95}
\end{figure}

\subsection{B}
Explain and provide detailed steps for “random testing” the above code. No need to run any
code, just present the coding strategy or describe your testing method in detail.\\
\noindent\rule{\textwidth}{1pt}
I don't like the lack of static types in python. So, to remedy this I will assume items are numbers which can be trivially concatenated with strings. First start by generating a list of N random integers where N is also selected randomly from the whole numbers set. Second, pick a random subset of those integers, these will be your exception types. Randomly select an integer to act as the limit. Run the function and check the output to make sure it is what is expected. Repeat this until the number of tests you want have completed.
\section{Question 4}
\subsection{A}
Develop 4 distinct test cases to test the above code, with code coverage ranging from 30%
to 100\%. For each test-case calculate and mention its code coverage.\\
\noindent\rule{\textwidth}{1pt}
Using statement coverage since it was not specified which type of code coverage to use.
\begin{figure}[H]
	\centering
	\includegraphics[width=0.8\linewidth]{screenshot003}
	\caption{12 Statements in the function}
	\label{fig:screenshot003}
\end{figure}
\subsubsection{Test 1}
To get 30\% code coverage we can input with empty data. This will run statements, 1,2,3, and 12 for a total of 33.3\% coverage. The values of limit and exceptions doesn't matter in this case.
\subsubsection{Test 2}
|data = {55, 12, 66}|\\
|limit = 18|\\
|exceptions = {55}|\\\\
As statements 1,2,3 and 12 all run no matter what we add 4 to the base coverage. Since we have items in data statement 4 and 5 get executed. Since 55 is in exceptions statement 6 also gets executed. value 12 is less than limit so statement 7 and 9 gets executed. Since 66 is above the limit so does statement 8 also runs. 10 and 11 will run for all 3 values resulting in 12/12 coverage or 100\%.
\subsubsection{Test 3}
|data = {12}|\\
|limit = 18|\\
|exceptions = {}|\\\\
\begin{figure}[H]
	\centering
	\includegraphics[width=0.7\linewidth]{screenshot004}
	\caption{Test 3 Graphical Results}
	\label{fig:screenshot004}
\end{figure}
Thus this test produces 10/12 or 83\% code coverage.
\newpage
\subsubsection{Test 4}
|data = {69}|\\
|limit = 420|\\
|exceptions = {69}|\\
\begin{figure}[H]
	\centering
	\includegraphics[width=0.7\linewidth]{screenshot005}
	\caption{Test 4 Graphical Results}
	\label{fig:screenshot005}
\end{figure}
This test produces 66\% code coverage.
\subsection{B}
Generate 6 modified (mutated) versions of the above code.\\
\noindent\rule{\textwidth}{1pt}

Method call mutations not possible due to the lack of external function calls by the given function.
(existing calls not possible to modify due to lack of viable alternatives)
\begin{lstlisting}[language=python,caption={Mutation 1}]
	def filterData1(data, limit, exceptions):
		filtered_data = []
		index = 0
		while index < len(data):
			item = data[index]
			if item in exceptions:
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item + 2	# Arithmetic mutation
			else:
				modified_item = item / limit
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\begin{lstlisting}[language=python,caption={Mutation 2}]
	def filterData2(data, limit, exceptions):
		filtered_data = []
		index = 0
		while index < len(data):
			item = data[index]
			if item not in exceptions:      # Boolean mutation
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item * 2
			else:
				modified_item = item / limit
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\begin{lstlisting}[language=python,caption={Mutation 3}]
	def filterData3(data, limit, exceptions):
		filtered_data = []
		index = 1       # statement mutation
		while index < len(data):
			item = data[index]
			if item in exceptions:
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item * 2
			else:
				modified_item = item / limit
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\begin{lstlisting}[language=python,caption={Mutation 4}]
	def filterData4(data, limit, exceptions):
		filtered_data = []
		index = 0
		while index < len(data):
			item = data[index]
			if item in data:        # Variable mutation
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item * 2
			else:
				modified_item = item / limit
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\begin{lstlisting}[language=python,caption={Mutation 5}]
	def filterData5(data, limit, exceptions):
		filtered_data = []
		index = 0
		while index < len(data):
			item = data[index]
			if item in exceptions:
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item * 2
			else:
				modified_item = item / 2        # Statement mutation
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\begin{lstlisting}[language=python,caption={Mutation 6}]
	def filterData6(data, limit, exceptions):
		filtered_data = []
		index = 0
		while index < len(data):        # Boolean mutation
			item = data[index]
			if item in exceptions:
				modified_item = str(item) + "_EXCEPTION"
			elif item > limit:
				modified_item = item * 2
			else:
				modified_item = item / limit
			filtered_data.append(modified_item)
			index += 1
		return filtered_data
\end{lstlisting}
\subsubsection{C}
Assess the effectiveness of the test cases from part A by using mutation analysis in
conjunction with the mutated codes from part B. Rank the test-cases and explain your answer.\\
\noindent\rule{\textwidth}{1pt}
I made a script to all \textbf{24} cases for me :/
Ranked results are at the bottom.
\lstinputlisting{results.txt}
The first test case always passes because it's a trivial example. Most mutations will not cause it to fail. Other than that the results stand for themselves.
\subsubsection{D}
Discuss how you would use path, branch, and statement static analysis to evaluate/analyse
the above code.\\
\noindent\rule{\textwidth}{1pt}
\paragraph{Branch Static Analysis}
Easy sweep and mark the code for branches, identifying the test cases in a type of tree (the tree would include code which modifies any variables used in the conditionals otherwise we don't care!) then preorder traverse the tree resulting in every test case being covered, in the correct order of execution. Print a warning to the user if any syntactic or common logical errors are found (since this is static we are not actually executing the code but looking at it for errors). Note: the left subtree would but the true case, the right subtree would be the false case, assuming |else| is present.
\paragraph{Path Static Analysis}
I put this second because you could do the exact same thing as branch analysis but instead of doing preorder, which covers all code paths, you pick a path you want to follow and execute through it. You would also want to record every statement (in order of function call from init/main) instead of just what is relevant to the conditions. As with the branch analysis we don't actually execute the path but rather look at the code for errors. (Clang tidy ftw)
\paragraph{Statement Static Analysis}
Parse through the file line by line looking at all the statements in the code printing errors as we find them. I'm not sure if you are looking for how we would actually detect that, other than syntax errors that question has a complicated answer. Even finding syntax errors can be hard, although you would only have to parse the AST and if you find an unexpected token it's pretty clear the programmer made an error.

\section{Question 5}
The code snippet below aims to switch uppercase characters to their lowercase counterparts
and vice versa. Numeric characters are supposed to remain unchanged. The function contains
at least one known bug that results in incorrect output for specific inputs.\\
\begin{lstlisting}[language=python]
	def processString(input_str):
		output_str = ""
		for char in input_str:
			if char.isupper():
				output_str += char.lower()
			elif char.isnumeric():
				output_str += char * 2
			else:
				output_str += char.upper()
		return output_str
\end{lstlisting}
\subsection{A}
Identify the bug(s) in the code. You can either manually review the code (a form of
static analysis) or run it with diverse input values (a form of manual random testing).
If you are unable to pinpoint the bug using these methods, you may utilize a random
testing tool or implement random test case generator in code. Provide a detailed
explanation of the bug, identify the line of code causing it, and describe your
strategy for finding it.\\
\noindent\rule{\textwidth}{1pt}
\begin{itemize}
	\item Bug on line 7: numbers should remain the same. This code not only is trying to double the value but it won't even do that! |char * 2| in python will add 2 characters of that number to the output string. Found via inspection and by running the program; at first I thought char * 2 would double the integer version of the char but I was wrong. It is a good thing I tested it in python as well.
	\item This is the only bug determined by visual analysis and python tests.
\end{itemize}
\subsection{B}
Implement Delta Debugging, in your preferred programming language to minimize
the input string that reveals the bug. Test your Delta Debugging code for the
following input values provided.
\lstinputlisting[language=python]{delta_debugger.py}
(This file is included in the zip for this assignment. I'm not completely sure why I pasted this here but not the C++. Something Something tired Something Something Python being cleaner than C++)\\
And some test results running on my computer:
\lstinputlisting{delta_debugging_results.txt}
I would've preferred C++ but trying to run the python function from C++ in a cross-platform way isn't something I think you or I want to deal with. This python code is a simple but clean implementation of delta debugging\footnote{Not technically full delta debugging, but I don't have any more time to work on this. Other assignments / research projects / the midterm have to be done.} which finds a set of minimal strings which cause errors within the provided function. This delta debugging implementation uses a binary search method where the input string is recursively split in half until either a minimal error string is found or the string can no longer be split. A string is known to error if the result of the incorrect function does not match a known correct output which is generated at runtime. Since there is no hard way to pre-generate valid and invalid strings we must assume the output of my 'correct' function is in fact valid.\footnote{The function matches the requirements} The actual implementation of this concept uses a while loop with a queue to do the recursion; a queue was used because using a stack is unnecessary. If I had implemented this with C++ I would have used a stack like structure as |std::vector| has a |pop_back| function.

\newpage
\section{Question 6}
I've create a git repo on my semi-private git server, which can be accessed via the link:\\
\href{https://git.tpgc.me/tri11paragon/COSC_3P95_Assignment_1}{https://git.tpgc.me/tri11paragon/COSC\_3P95\_Assignment\_1}\\\\
\subsection{Note}
By proving this repo as a public repository it technically violates the university's academic integrity policies. Specifically:
\begin{itemize}
	\item 4.e. Ensuring that a student’s work is not used inappropriately by others; \begin{itemize}
		\item I am unable to insure the correct use of this repository as it is publicly available.
	\end{itemize}
	\item Appendix 2.C.4. "Allowing one's essay, thesis or assignment to be copied by someone else" \begin{itemize}
		\item As this is required to be a public git repository I cannot prevent someone from copying this assignment.
	\end{itemize}
\end{itemize}
Thus, in an attempt to better comply with university policy and the requirements of this assignment (and avoid the Microsoft monopoly) I have elected to host the assignment on my self-hosted git server.

\end{document}