maggiore introduzione
This commit is contained in:
parent
e4fc6b92aa
commit
926433c7c1
2 changed files with 159 additions and 14 deletions
|
@ -9,7 +9,7 @@ tesi:
|
||||||
rm -f $(DEL)
|
rm -f $(DEL)
|
||||||
@echo
|
@echo
|
||||||
@echo "=== Building from scratch ==="
|
@echo "=== Building from scratch ==="
|
||||||
emacs tesi_unicode.org -f org-latex-export-to-latex --kill
|
emacs -batch tesi_unicode.org -f org-latex-export-to-latex --kill
|
||||||
python3 conv.py tesi_unicode.tex tesi.tex
|
python3 conv.py tesi_unicode.tex tesi.tex
|
||||||
pdflatex $(SRC)
|
pdflatex $(SRC)
|
||||||
bibtex $(AUX)
|
bibtex $(AUX)
|
||||||
|
|
|
@ -38,25 +38,170 @@
|
||||||
#+LaTeX_HEADER: \usepackage{graphicx}
|
#+LaTeX_HEADER: \usepackage{graphicx}
|
||||||
#+LaTeX_HEADER: \usepackage{listings}
|
#+LaTeX_HEADER: \usepackage{listings}
|
||||||
#+LaTeX_HEADER: \usepackage{color}
|
#+LaTeX_HEADER: \usepackage{color}
|
||||||
|
#+LaTeX_HEADER: \usepackage{stmaryrd}
|
||||||
|
#+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
|
||||||
#+EXPORT_SELECT_TAGS: export
|
#+EXPORT_SELECT_TAGS: export
|
||||||
#+EXPORT_EXCLUDE_TAGS: noexport
|
#+EXPORT_EXCLUDE_TAGS: noexport
|
||||||
#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
|
#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
|
||||||
#+STARTUP: showall
|
#+STARTUP: showall
|
||||||
\begin{abstract}
|
\section{Introduction}
|
||||||
|
|
||||||
|
This dissertation presents an algorithm for the translation validation of the OCaml pattern
|
||||||
|
matching compiler. Given a source program and its compiled version the
|
||||||
|
algorithm checks wheter the two are equivalent or produce a counter
|
||||||
|
example in case of a mismatch.
|
||||||
|
|
||||||
|
Our equivalence algorithm works with decision trees. Source patterns are
|
||||||
|
converted into a decision tree using a matrix decomposition algorithm.
|
||||||
|
Target programs, described in the Lambda intermediate
|
||||||
|
representation language of the OCaml compiler, are turned into decision trees
|
||||||
|
by applying symbolic execution.
|
||||||
|
|
||||||
|
\subsection{Translation validation}
|
||||||
|
A pattern matching compiler turns a series of pattern matching clauses
|
||||||
|
into simple control flow structures such as \texttt{if, switch}, for example:
|
||||||
|
\begin{lstlisting}
|
||||||
|
match x with
|
||||||
|
| [] -> (0, None)
|
||||||
|
| x::[] -> (1, Some x)
|
||||||
|
| _::y::_ -> (2, Some y)
|
||||||
|
\end{lstlisting}
|
||||||
|
\begin{lstlisting}
|
||||||
|
(if scrutinee
|
||||||
|
(let (field_1 =a (field 1 scrutinee))
|
||||||
|
(if field_1
|
||||||
|
(let
|
||||||
|
(field_1_1 =a (field 1 field_1)
|
||||||
|
x =a (field 0 field_1))
|
||||||
|
(makeblock 0 2 (makeblock 0 x)))
|
||||||
|
(let (y =a (field 0 scrutinee))
|
||||||
|
(makeblock 0 1 (makeblock 0 y)))))
|
||||||
|
[0: 0 0a])
|
||||||
|
\end{lstlisting}
|
||||||
|
%% TODO: side by side
|
||||||
|
The code in the right is in the Lambda intermediate representation of
|
||||||
|
the OCaml compiler. The Lambda representation of a program is shown by
|
||||||
|
calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
|
||||||
|
|
||||||
|
The OCaml pattern matching compiler is a critical part of the OCaml compiler
|
||||||
|
in terms of correctness because any bug would result in wrong code
|
||||||
|
production rather than triggering compilation failures.
|
||||||
|
Such bugs also are hard to catch by testing because they arise in
|
||||||
|
corner cases of complex patterns which are typically not in the
|
||||||
|
compiler test suite.
|
||||||
|
|
||||||
|
The OCaml core developers group considered evolving the pattern matching compiler, either by
|
||||||
|
using a new algorithm or by incremental refactorings of its codebase.
|
||||||
|
For this reason we want to verify that new implementations of the
|
||||||
|
compiler avoid the introduction of new bugs and that such
|
||||||
|
modifications don't result in a different behaviour than the current one.
|
||||||
|
|
||||||
|
One possible approach is to formally verify the pattern matching compiler
|
||||||
|
implementation using a machine checked proof.
|
||||||
|
Another possibility, albeit with a weaker result, is to verify that
|
||||||
|
each source program and target program pair are semantically correct.
|
||||||
|
We chose the latter technique, translation validation because is easier to adopt in
|
||||||
|
the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a
|
||||||
|
blackbox and proof only depends on our equivalence algorithm.
|
||||||
|
|
||||||
|
\subsection{Our approach}
|
||||||
|
%% replace common TODO
|
||||||
|
Our algorithm translates both source and target programs into a common
|
||||||
|
representation, decision trees. Decision trees where choosen because
|
||||||
|
they model the space of possible values at a given branch of
|
||||||
|
execution.
|
||||||
|
Here is the decision tree for the source example program.
|
||||||
|
\begin{verbatim}
|
||||||
|
Node(Root)
|
||||||
|
/ \
|
||||||
|
(= []) (= ::)
|
||||||
|
/ \
|
||||||
|
Leaf Node(Root.1)
|
||||||
|
(0, None) / \
|
||||||
|
(= []) (= ::)
|
||||||
|
/ \
|
||||||
|
Leaf Leaf
|
||||||
|
(1, Some(Root.0)) (2, Some(Root.1.0))
|
||||||
|
\end{verbatim}
|
||||||
|
\texttt{(Root.0)} is called an \emph{accessor}, that represents the
|
||||||
|
access path to a value that can be reached by deconstructing the
|
||||||
|
scrutinee. In this example \texttt{Root.0} is the first subvalue of
|
||||||
|
the scrutinee.
|
||||||
|
|
||||||
|
Target decision trees have a similar shape but the tests on the
|
||||||
|
branches are related to the low level representation of values in
|
||||||
|
Lambda code. For example, cons cells \texttt{x::xs} are blocks with
|
||||||
|
tag 0.
|
||||||
|
|
||||||
|
To check the equivalence of a source and a target decision tree,
|
||||||
|
we proceed by case analysis.
|
||||||
|
If we have two terminals, such as leaves in the previous example,
|
||||||
|
we check that the two right-hand-sides are equivalent.
|
||||||
|
If we have a node $N$ and another tree $T$ we check equivalence for
|
||||||
|
each child of $N$, which is a pair of a branch condition $\pi_i$ and a
|
||||||
|
subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
|
||||||
|
the branches that are incompatible with $\pi_i$ and check that the
|
||||||
|
reduced tree is equivalent to $C_i$.
|
||||||
|
|
||||||
|
For the internship we have choosen a simple subset of the OCaml
|
||||||
|
language and implemented a prototype equivalence checker along with a
|
||||||
|
formal statement of correctness and proof sketches.
|
||||||
|
The prototype is to be included in the OCaml compiler infrastructure
|
||||||
|
and will aid the development.
|
||||||
|
|
||||||
|
\subsection{From source programs to decision trees}
|
||||||
|
Our source language supports integers, lists, tuples and all algebraic
|
||||||
|
datatypes. Patterns support wildcards, constructors and literals, or
|
||||||
|
patterns $(p_1|p_2)$ and pattern variables.
|
||||||
|
We also support \texttt{when} guards.
|
||||||
|
Decision trees have nodes of the form:
|
||||||
|
\begin{lstlisting}
|
||||||
|
type decision_tree =
|
||||||
|
| Unreachable
|
||||||
|
| Failure
|
||||||
|
| Leaf of source_expr
|
||||||
|
| Guard of source_expr * decision_tree * decision_tree
|
||||||
|
| Switch of accessor * (constructor * decision_tree) list * decision_tree
|
||||||
|
\end{lstlisting}
|
||||||
|
In the \texttt{Switch} node we have one subtree for every head constructor
|
||||||
|
that appears in the pattern matching clauses and a fallback case for
|
||||||
|
other values. The branch condition $\pi_i$ expresses that the value at the
|
||||||
|
switch accessor starts with the given constructor.
|
||||||
|
\texttt{Failure} nodes express match failures for values that are not
|
||||||
|
matched by the source clauses.
|
||||||
|
\texttt{Unreachable} is used when we statically know that no value
|
||||||
|
can flow to that subtree.
|
||||||
|
|
||||||
|
We write $\sem{t_S}_S$ for the decision tree of the source program
|
||||||
|
$t_S$, computed by a matrix decomposition algorithm (each column
|
||||||
|
decomposition step gives a \texttt{Switch} node).
|
||||||
|
It satisfies the following correctness statement:
|
||||||
|
\[
|
||||||
|
\forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
|
||||||
|
\]
|
||||||
|
Running any source values $v_S$ against the source program gives the
|
||||||
|
same result as running it against the decision tree.
|
||||||
|
|
||||||
|
\subsection{From target programs to decision trees}
|
||||||
|
The target programs include the following Lambda constructs:
|
||||||
|
\texttt{let, if, switch, Match\_failure, catch, exit, field} and
|
||||||
|
various comparation operations, guards. The symbolic execution engine
|
||||||
|
traverses the target program and builds an environment that maps
|
||||||
|
variables to accessors. It branches at every control flow statement
|
||||||
|
and emits a Switch node. The branch condition $\pi_i$ is expressed as
|
||||||
|
an interval set of possible values at that point.
|
||||||
|
Guards result in branching. In comparison with the source decision
|
||||||
|
trees, \texttt{Unreachable} nodes are never emitted.
|
||||||
|
|
||||||
|
We write $\sem{t_T}_T$ for the decision tree of the target program
|
||||||
|
$t_T$, satisfying the following correctness statement:
|
||||||
|
\[
|
||||||
|
\forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T)
|
||||||
|
\]
|
||||||
|
|
||||||
|
\subsection{Equivalence checking}
|
||||||
|
|
||||||
This dissertation presents an algorithm for the translation validation of the
|
|
||||||
pattern matching compiler. Given the source representation of the target program and the
|
|
||||||
target program compiled in untyped lambda form, the algoritmhm is capable of modelling
|
|
||||||
the source program in terms of symbolic constraints on it's branches and apply symbolic
|
|
||||||
execution on the untyped lambda representation in order to validate wheter the compilation
|
|
||||||
produced a valid result.
|
|
||||||
In this context a valid result means that for every input in the domain of the source
|
|
||||||
program the untyped lambda translation produces the same output as the source program.
|
|
||||||
The input of the program is modelled in terms of symbolic constraints closely related to
|
|
||||||
the runtime representation of objects and the output consists of OCaml code
|
|
||||||
blackboxes that are not evaluated in the context of the verification.
|
|
||||||
|
|
||||||
\end{abstract}
|
|
||||||
|
|
||||||
* Background
|
* Background
|
||||||
|
|
||||||
|
|
Loading…
Reference in a new issue