From 926433c7c1f726a76f6e6fee57a6be41214afcdb Mon Sep 17 00:00:00 2001
From: Francesco Mecca <me@francescomecca.eu>
Date: Thu, 12 Mar 2020 12:20:07 +0100
Subject: [PATCH] maggiore introduzione

---
 tesi/Makefile         |   2 +-
 tesi/tesi_unicode.org | 171 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 159 insertions(+), 14 deletions(-)

diff --git a/tesi/Makefile b/tesi/Makefile
index 930f6e1..7e4f052 100644
--- a/tesi/Makefile
+++ b/tesi/Makefile
@@ -9,7 +9,7 @@ tesi:
 	rm -f $(DEL)
 	@echo
 	@echo "=== Building from scratch ==="
-	emacs tesi_unicode.org  -f org-latex-export-to-latex --kill
+	emacs -batch tesi_unicode.org  -f org-latex-export-to-latex --kill
 	python3 conv.py tesi_unicode.tex tesi.tex
 	pdflatex $(SRC)
 	bibtex $(AUX)
diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org
index f4d024b..5182657 100644
--- a/tesi/tesi_unicode.org
+++ b/tesi/tesi_unicode.org
@@ -38,25 +38,170 @@
 #+LaTeX_HEADER: \usepackage{graphicx} 
 #+LaTeX_HEADER: \usepackage{listings} 
 #+LaTeX_HEADER: \usepackage{color} 
+#+LaTeX_HEADER: \usepackage{stmaryrd}
+#+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
 #+EXPORT_SELECT_TAGS: export
 #+EXPORT_EXCLUDE_TAGS: noexport
 #+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
 #+STARTUP: showall
-\begin{abstract}
+\section{Introduction}
+
+This dissertation presents an algorithm for the translation validation of the OCaml pattern
+matching compiler. Given a source program and its compiled version the
+algorithm checks wheter the two are equivalent or produce a counter
+example in case of a mismatch.
+
+Our equivalence algorithm works with decision trees. Source patterns are
+converted into a decision tree using a matrix decomposition algorithm.
+Target programs, described in the Lambda intermediate
+representation language of the OCaml compiler, are turned into decision trees
+by applying symbolic execution.
+
+\subsection{Translation validation}
+A pattern matching compiler turns a series of pattern matching clauses
+into simple control flow structures such as \texttt{if, switch}, for example:
+\begin{lstlisting}
+  match x with
+  | [] -> (0, None)
+  | x::[] -> (1, Some x)
+  | _::y::_ -> (2, Some y)
+\end{lstlisting}
+\begin{lstlisting}
+(if scrutinee
+    (let (field_1 =a (field 1 scrutinee))
+        (if field_1
+            (let
+                (field_1_1 =a (field 1 field_1)
+                 x =a (field 0 field_1))
+                (makeblock 0 2 (makeblock 0 x)))
+            (let (y =a (field 0 scrutinee))
+                (makeblock 0 1 (makeblock 0 y)))))
+    [0: 0 0a])
+\end{lstlisting}
+%% TODO: side by side
+The code in the right is in the Lambda intermediate representation of
+the OCaml compiler. The Lambda representation of a program is shown by
+calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
+
+The OCaml pattern matching compiler is a critical part of the OCaml compiler
+in terms of correctness because any bug would result in wrong code
+production rather than triggering compilation failures.
+Such bugs also are hard to catch by testing because they arise in
+corner cases of complex patterns which are typically not in the
+compiler test suite.
+
+The OCaml core developers group considered evolving the pattern matching compiler, either by
+using a new algorithm or by incremental refactorings of its codebase.
+For this reason we want to verify that new implementations of the
+compiler avoid the introduction of new bugs and that such
+modifications don't result in a different behaviour than the current one. 
+
+One possible approach is to formally verify the pattern matching compiler
+implementation using a machine checked proof.
+Another possibility, albeit with a weaker result, is to verify that
+each source program and target program pair are semantically correct.
+We chose the latter technique, translation validation because is easier to adopt in
+the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a
+blackbox and proof only depends on our equivalence algorithm.
+
+\subsection{Our approach}
+%% replace common TODO
+Our algorithm translates both source and target programs into a common
+representation, decision trees. Decision trees where choosen because
+they model the space of possible values at a given branch of
+execution.
+Here is the decision tree for the source example program.
+\begin{verbatim}
+       Node(Root)
+       /        \
+     (= [])    (= ::)
+     /             \
+   Leaf         Node(Root.1)
+(0, None)       /         \
+             (= [])      (= ::)
+             /               \
+          Leaf              Leaf
+   (1, Some(Root.0))   (2, Some(Root.1.0))
+\end{verbatim}
+\texttt{(Root.0)} is called an \emph{accessor}, that represents the
+access path to a value that can be reached by deconstructing the
+scrutinee. In this example \texttt{Root.0} is the first subvalue of
+the scrutinee.
+
+Target decision trees have a similar shape but the tests on the
+branches are related to the low level representation of values in
+Lambda code. For example, cons cells \texttt{x::xs} are blocks with
+tag 0.
+
+To check the equivalence of a source and a target decision tree,
+we proceed by case analysis.
+If we have two terminals, such as leaves in the previous example,
+we check that the two right-hand-sides are equivalent.
+If we have a node $N$ and another tree $T$ we check equivalence for
+each child of $N$, which is a pair of a branch condition $\pi_i$ and a
+subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
+the branches that are incompatible with $\pi_i$ and check that the
+reduced tree is equivalent to $C_i$.
+
+For the internship we have choosen a simple subset of the OCaml
+language and implemented a prototype equivalence checker along with a
+formal statement of correctness and proof sketches.
+The prototype is to be included in the OCaml compiler infrastructure
+and will aid the development. 
+
+\subsection{From source programs to decision trees}
+Our source language supports integers, lists, tuples and all algebraic
+datatypes. Patterns support wildcards, constructors and literals, or
+patterns $(p_1|p_2)$ and pattern variables.
+We also support \texttt{when} guards.
+Decision trees have nodes of the form:
+\begin{lstlisting}
+ type decision_tree =
+  | Unreachable
+  | Failure
+  | Leaf of source_expr
+  | Guard of source_expr * decision_tree * decision_tree
+  | Switch of accessor * (constructor * decision_tree) list * decision_tree
+\end{lstlisting}
+In the \texttt{Switch} node we have one subtree for every head constructor
+that appears in the pattern matching clauses and a fallback case for
+other values. The branch condition $\pi_i$ expresses that the value at the
+switch accessor starts with the given constructor.
+\texttt{Failure} nodes express match failures for values that are not
+matched by the source clauses.
+\texttt{Unreachable} is used when we statically know that no value
+can flow to that subtree.
+
+We write $\sem{t_S}_S$ for the decision tree of the source program
+$t_S$, computed by a matrix decomposition algorithm (each column
+decomposition step gives a \texttt{Switch} node).
+It satisfies the following correctness statement:
+\[
+\forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
+\]
+Running any source values $v_S$ against the source program gives the
+same result as running it against the decision tree.
+
+\subsection{From target programs to decision trees}
+The target programs include the following Lambda constructs:
+\texttt{let, if, switch, Match\_failure, catch, exit, field} and
+various comparation operations, guards. The symbolic execution engine
+traverses the target program and builds an environment that maps
+variables to accessors. It branches at every control flow statement
+and emits a Switch node. The branch condition $\pi_i$ is expressed as
+an interval set of possible values at that point.
+Guards result in branching. In comparison with the source decision
+trees, \texttt{Unreachable} nodes are never emitted.
+
+We write $\sem{t_T}_T$ for the decision tree of the target program
+$t_T$, satisfying the following correctness statement:
+\[
+\forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T)
+\]
+
+\subsection{Equivalence checking}
 
-This dissertation presents an algorithm for the translation validation of the 
-pattern matching compiler. Given the source representation of the target program and the
-target program compiled in untyped lambda form, the algoritmhm is capable of modelling
-the source program in terms of symbolic constraints on it's branches and apply symbolic
-execution on the untyped lambda representation in order to validate wheter the compilation
-produced a valid result.
-In this context a valid result means that for every input in the domain of the source
-program the untyped lambda translation produces the same output as the source program.
-The input of the program is modelled in terms of symbolic constraints closely related to
-the runtime representation of  objects and the output consists of OCaml code
-blackboxes that are not evaluated in the context of the verification.
 
-\end{abstract}
 
 * Background