From 926433c7c1f726a76f6e6fee57a6be41214afcdb Mon Sep 17 00:00:00 2001 From: Francesco Mecca Date: Thu, 12 Mar 2020 12:20:07 +0100 Subject: [PATCH] maggiore introduzione --- tesi/Makefile | 2 +- tesi/tesi_unicode.org | 171 ++++++++++++++++++++++++++++++++++++++---- 2 files changed, 159 insertions(+), 14 deletions(-) diff --git a/tesi/Makefile b/tesi/Makefile index 930f6e1..7e4f052 100644 --- a/tesi/Makefile +++ b/tesi/Makefile @@ -9,7 +9,7 @@ tesi: rm -f $(DEL) @echo @echo "=== Building from scratch ===" - emacs tesi_unicode.org -f org-latex-export-to-latex --kill + emacs -batch tesi_unicode.org -f org-latex-export-to-latex --kill python3 conv.py tesi_unicode.tex tesi.tex pdflatex $(SRC) bibtex $(AUX) diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index f4d024b..5182657 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -38,25 +38,170 @@ #+LaTeX_HEADER: \usepackage{graphicx} #+LaTeX_HEADER: \usepackage{listings} #+LaTeX_HEADER: \usepackage{color} +#+LaTeX_HEADER: \usepackage{stmaryrd} +#+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}} #+EXPORT_SELECT_TAGS: export #+EXPORT_EXCLUDE_TAGS: noexport #+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t #+STARTUP: showall -\begin{abstract} +\section{Introduction} + +This dissertation presents an algorithm for the translation validation of the OCaml pattern +matching compiler. Given a source program and its compiled version the +algorithm checks wheter the two are equivalent or produce a counter +example in case of a mismatch. + +Our equivalence algorithm works with decision trees. Source patterns are +converted into a decision tree using a matrix decomposition algorithm. +Target programs, described in the Lambda intermediate +representation language of the OCaml compiler, are turned into decision trees +by applying symbolic execution. + +\subsection{Translation validation} +A pattern matching compiler turns a series of pattern matching clauses +into simple control flow structures such as \texttt{if, switch}, for example: +\begin{lstlisting} + match x with + | [] -> (0, None) + | x::[] -> (1, Some x) + | _::y::_ -> (2, Some y) +\end{lstlisting} +\begin{lstlisting} +(if scrutinee + (let (field_1 =a (field 1 scrutinee)) + (if field_1 + (let + (field_1_1 =a (field 1 field_1) + x =a (field 0 field_1)) + (makeblock 0 2 (makeblock 0 x))) + (let (y =a (field 0 scrutinee)) + (makeblock 0 1 (makeblock 0 y))))) + [0: 0 0a]) +\end{lstlisting} +%% TODO: side by side +The code in the right is in the Lambda intermediate representation of +the OCaml compiler. The Lambda representation of a program is shown by +calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. + +The OCaml pattern matching compiler is a critical part of the OCaml compiler +in terms of correctness because any bug would result in wrong code +production rather than triggering compilation failures. +Such bugs also are hard to catch by testing because they arise in +corner cases of complex patterns which are typically not in the +compiler test suite. + +The OCaml core developers group considered evolving the pattern matching compiler, either by +using a new algorithm or by incremental refactorings of its codebase. +For this reason we want to verify that new implementations of the +compiler avoid the introduction of new bugs and that such +modifications don't result in a different behaviour than the current one. + +One possible approach is to formally verify the pattern matching compiler +implementation using a machine checked proof. +Another possibility, albeit with a weaker result, is to verify that +each source program and target program pair are semantically correct. +We chose the latter technique, translation validation because is easier to adopt in +the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a +blackbox and proof only depends on our equivalence algorithm. + +\subsection{Our approach} +%% replace common TODO +Our algorithm translates both source and target programs into a common +representation, decision trees. Decision trees where choosen because +they model the space of possible values at a given branch of +execution. +Here is the decision tree for the source example program. +\begin{verbatim} + Node(Root) + / \ + (= []) (= ::) + / \ + Leaf Node(Root.1) +(0, None) / \ + (= []) (= ::) + / \ + Leaf Leaf + (1, Some(Root.0)) (2, Some(Root.1.0)) +\end{verbatim} +\texttt{(Root.0)} is called an \emph{accessor}, that represents the +access path to a value that can be reached by deconstructing the +scrutinee. In this example \texttt{Root.0} is the first subvalue of +the scrutinee. + +Target decision trees have a similar shape but the tests on the +branches are related to the low level representation of values in +Lambda code. For example, cons cells \texttt{x::xs} are blocks with +tag 0. + +To check the equivalence of a source and a target decision tree, +we proceed by case analysis. +If we have two terminals, such as leaves in the previous example, +we check that the two right-hand-sides are equivalent. +If we have a node $N$ and another tree $T$ we check equivalence for +each child of $N$, which is a pair of a branch condition $\pi_i$ and a +subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all +the branches that are incompatible with $\pi_i$ and check that the +reduced tree is equivalent to $C_i$. + +For the internship we have choosen a simple subset of the OCaml +language and implemented a prototype equivalence checker along with a +formal statement of correctness and proof sketches. +The prototype is to be included in the OCaml compiler infrastructure +and will aid the development. + +\subsection{From source programs to decision trees} +Our source language supports integers, lists, tuples and all algebraic +datatypes. Patterns support wildcards, constructors and literals, or +patterns $(p_1|p_2)$ and pattern variables. +We also support \texttt{when} guards. +Decision trees have nodes of the form: +\begin{lstlisting} + type decision_tree = + | Unreachable + | Failure + | Leaf of source_expr + | Guard of source_expr * decision_tree * decision_tree + | Switch of accessor * (constructor * decision_tree) list * decision_tree +\end{lstlisting} +In the \texttt{Switch} node we have one subtree for every head constructor +that appears in the pattern matching clauses and a fallback case for +other values. The branch condition $\pi_i$ expresses that the value at the +switch accessor starts with the given constructor. +\texttt{Failure} nodes express match failures for values that are not +matched by the source clauses. +\texttt{Unreachable} is used when we statically know that no value +can flow to that subtree. + +We write $\sem{t_S}_S$ for the decision tree of the source program +$t_S$, computed by a matrix decomposition algorithm (each column +decomposition step gives a \texttt{Switch} node). +It satisfies the following correctness statement: +\[ +\forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S) +\] +Running any source values $v_S$ against the source program gives the +same result as running it against the decision tree. + +\subsection{From target programs to decision trees} +The target programs include the following Lambda constructs: +\texttt{let, if, switch, Match\_failure, catch, exit, field} and +various comparation operations, guards. The symbolic execution engine +traverses the target program and builds an environment that maps +variables to accessors. It branches at every control flow statement +and emits a Switch node. The branch condition $\pi_i$ is expressed as +an interval set of possible values at that point. +Guards result in branching. In comparison with the source decision +trees, \texttt{Unreachable} nodes are never emitted. + +We write $\sem{t_T}_T$ for the decision tree of the target program +$t_T$, satisfying the following correctness statement: +\[ +\forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T) +\] + +\subsection{Equivalence checking} -This dissertation presents an algorithm for the translation validation of the -pattern matching compiler. Given the source representation of the target program and the -target program compiled in untyped lambda form, the algoritmhm is capable of modelling -the source program in terms of symbolic constraints on it's branches and apply symbolic -execution on the untyped lambda representation in order to validate wheter the compilation -produced a valid result. -In this context a valid result means that for every input in the domain of the source -program the untyped lambda translation produces the same output as the source program. -The input of the program is modelled in terms of symbolic constraints closely related to -the runtime representation of objects and the output consists of OCaml code -blackboxes that are not evaluated in the context of the verification. -\end{abstract} * Background