maggiore introduzione

2020-03-12 12:20:07 +01:00 · 2020-03-12 12:20:07 +01:00 · 926433c7c1
commit 926433c7c1
parent e4fc6b92aa
2 changed files with 159 additions and 14 deletions
--- a/tesi/Makefile
+++ b/tesi/Makefile
@ -9,7 +9,7 @@ tesi:
 	rm -f $(DEL)
 	@echo
 	@echo "=== Building from scratch ==="
-	emacs tesi_unicode.org  -f org-latex-export-to-latex --kill
+	emacs -batch tesi_unicode.org  -f org-latex-export-to-latex --kill
 	python3 conv.py tesi_unicode.tex tesi.tex
 	pdflatex $(SRC)
 	bibtex $(AUX)
--- a/tesi/tesi_unicode.org
+++ b/tesi/tesi_unicode.org
@ -38,25 +38,170 @@
 #+LaTeX_HEADER: \usepackage{graphicx} 
 #+LaTeX_HEADER: \usepackage{listings} 
 #+LaTeX_HEADER: \usepackage{color} 
 #+LaTeX_HEADER: \usepackage{stmaryrd}
 #+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
 #+EXPORT_SELECT_TAGS: export
 #+EXPORT_EXCLUDE_TAGS: noexport
 #+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
 #+STARTUP: showall
-\begin{abstract}
+\section{Introduction}
 This dissertation presents an algorithm for the translation validation of the OCaml pattern
 matching compiler. Given a source program and its compiled version the
 algorithm checks wheter the two are equivalent or produce a counter
 example in case of a mismatch.
 Our equivalence algorithm works with decision trees. Source patterns are
 converted into a decision tree using a matrix decomposition algorithm.
 Target programs, described in the Lambda intermediate
 representation language of the OCaml compiler, are turned into decision trees
 by applying symbolic execution.
 \subsection{Translation validation}
 A pattern matching compiler turns a series of pattern matching clauses
 into simple control flow structures such as \texttt{if, switch}, for example:
 \begin{lstlisting}
  match x with
  | [] -> (0, None)
  | x::[] -> (1, Some x)
  | _::y::_ -> (2, Some y)
 \end{lstlisting}
 \begin{lstlisting}
 (if scrutinee
    (let (field_1 =a (field 1 scrutinee))
        (if field_1
            (let
                (field_1_1 =a (field 1 field_1)
                 x =a (field 0 field_1))
                (makeblock 0 2 (makeblock 0 x)))
            (let (y =a (field 0 scrutinee))
                (makeblock 0 1 (makeblock 0 y)))))
    [0: 0 0a])
 \end{lstlisting}
 %% TODO: side by side
 The code in the right is in the Lambda intermediate representation of
 the OCaml compiler. The Lambda representation of a program is shown by
 calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
 The OCaml pattern matching compiler is a critical part of the OCaml compiler
 in terms of correctness because any bug would result in wrong code
 production rather than triggering compilation failures.
 Such bugs also are hard to catch by testing because they arise in
 corner cases of complex patterns which are typically not in the
 compiler test suite.
 The OCaml core developers group considered evolving the pattern matching compiler, either by
 using a new algorithm or by incremental refactorings of its codebase.
 For this reason we want to verify that new implementations of the
 compiler avoid the introduction of new bugs and that such
 modifications don't result in a different behaviour than the current one. 
 One possible approach is to formally verify the pattern matching compiler
 implementation using a machine checked proof.
 Another possibility, albeit with a weaker result, is to verify that
 each source program and target program pair are semantically correct.
 We chose the latter technique, translation validation because is easier to adopt in
 the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a
 blackbox and proof only depends on our equivalence algorithm.
 \subsection{Our approach}
 %% replace common TODO
 Our algorithm translates both source and target programs into a common
 representation, decision trees. Decision trees where choosen because
 they model the space of possible values at a given branch of
 execution.
 Here is the decision tree for the source example program.
 \begin{verbatim}
       Node(Root)
       /        \
     (= [])    (= ::)
     /             \
   Leaf         Node(Root.1)
 (0, None)       /         \
             (= [])      (= ::)
             /               \
          Leaf              Leaf
   (1, Some(Root.0))   (2, Some(Root.1.0))
 \end{verbatim}
 \texttt{(Root.0)} is called an \emph{accessor}, that represents the
 access path to a value that can be reached by deconstructing the
 scrutinee. In this example \texttt{Root.0} is the first subvalue of
 the scrutinee.
 Target decision trees have a similar shape but the tests on the
 branches are related to the low level representation of values in
 Lambda code. For example, cons cells \texttt{x::xs} are blocks with
 tag 0.
 To check the equivalence of a source and a target decision tree,
 we proceed by case analysis.
 If we have two terminals, such as leaves in the previous example,
 we check that the two right-hand-sides are equivalent.
 If we have a node $N$ and another tree $T$ we check equivalence for
 each child of $N$, which is a pair of a branch condition $\pi_i$ and a
 subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
 the branches that are incompatible with $\pi_i$ and check that the
 reduced tree is equivalent to $C_i$.
 For the internship we have choosen a simple subset of the OCaml
 language and implemented a prototype equivalence checker along with a
 formal statement of correctness and proof sketches.
 The prototype is to be included in the OCaml compiler infrastructure
 and will aid the development. 
 \subsection{From source programs to decision trees}
 Our source language supports integers, lists, tuples and all algebraic
 datatypes. Patterns support wildcards, constructors and literals, or
 patterns $(p_1|p_2)$ and pattern variables.
 We also support \texttt{when} guards.
 Decision trees have nodes of the form:
 \begin{lstlisting}
 type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_expr * decision_tree * decision_tree
  | Switch of accessor * (constructor * decision_tree) list * decision_tree
 \end{lstlisting}
 In the \texttt{Switch} node we have one subtree for every head constructor
 that appears in the pattern matching clauses and a fallback case for
 other values. The branch condition $\pi_i$ expresses that the value at the
 switch accessor starts with the given constructor.
 \texttt{Failure} nodes express match failures for values that are not
 matched by the source clauses.
 \texttt{Unreachable} is used when we statically know that no value
 can flow to that subtree.
 We write $\sem{t_S}_S$ for the decision tree of the source program
 $t_S$, computed by a matrix decomposition algorithm (each column
 decomposition step gives a \texttt{Switch} node).
 It satisfies the following correctness statement:
 \[
 \forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
 \]
 Running any source values $v_S$ against the source program gives the
 same result as running it against the decision tree.
 \subsection{From target programs to decision trees}
 The target programs include the following Lambda constructs:
 \texttt{let, if, switch, Match\_failure, catch, exit, field} and
 various comparation operations, guards. The symbolic execution engine
 traverses the target program and builds an environment that maps
 variables to accessors. It branches at every control flow statement
 and emits a Switch node. The branch condition $\pi_i$ is expressed as
 an interval set of possible values at that point.
 Guards result in branching. In comparison with the source decision
 trees, \texttt{Unreachable} nodes are never emitted.
 We write $\sem{t_T}_T$ for the decision tree of the target program
 $t_T$, satisfying the following correctness statement:
 \[
 \forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T)
 \]
 \subsection{Equivalence checking}
 This dissertation presents an algorithm for the translation validation of the 
 pattern matching compiler. Given the source representation of the target program and the
 target program compiled in untyped lambda form, the algoritmhm is capable of modelling
 the source program in terms of symbolic constraints on it's branches and apply symbolic
 execution on the untyped lambda representation in order to validate wheter the compilation
 produced a valid result.
 In this context a valid result means that for every input in the domain of the source
 program the untyped lambda translation produces the same output as the source program.
 The input of the program is modelled in terms of symbolic constraints closely related to
 the runtime representation of  objects and the output consists of OCaml code
 blackboxes that are not evaluated in the context of the verification.
 \end{abstract}
 * Background