coppo #2
parent 7ae47c2e45
commit d35546fe8b
2 changed files with 28 additions and 22 deletions
3
tesi/.gitignore
vendored
@@ -1,6 +1,9 @@
tesi.tex
.#tesi.tex

tesi_unicode.tex


## Core latex/pdflatex auxiliary files:
*.aux
*.lof
@@ -49,8 +49,13 @@ TODO: talk about compiler stuff

 This dissertation presents an algorithm for the translation validation of the OCaml pattern
 matching compiler. Given a source program and its compiled version the
-algorithm checks wheter the two are equivalent or produce a counter
+algorithm checks whether the two are equivalent or produce a counter
 example in case of a mismatch.
+For the prototype of this algorithm we have chosen a subset of the OCaml
+language and implemented a prototype equivalence checker along with a
+formal statement of correctness and its proof.
+The prototype is to be included in the OCaml compiler infrastructure
+and will aid the development.

 Our equivalence algorithm works with decision trees. Source patterns are
 converted into a decision tree using a matrix decomposition algorithm.
@@ -58,7 +63,9 @@ Target programs, described in the Lambda intermediate
 representation language of the OCaml compiler, are turned into decision trees
 by applying symbolic execution.

+\begin{comment}
 \subsection{Translation validation}
+\end{comment}
 A pattern matching compiler turns a series of pattern matching clauses
 into simple control flow structures such as \texttt{if, switch}, for example:
 \begin{lstlisting}
@@ -79,7 +86,9 @@ into simple control flow structures such as \texttt{if, switch}, for example:
 (makeblock 0 1 (makeblock 0 y)))))
 [0: 0 0a])
 \end{lstlisting}
+\begin{comment}
 %% TODO: side by side
+\end{comment}
 The code on the right is in the Lambda intermediate representation of
 the OCaml compiler. The Lambda representation of a program is shown by
 calling the \texttt{ocamlc} compiler with the \texttt{-drawlambda} flag.
@@ -92,23 +101,23 @@ corner cases of complex patterns which are typically not in the
 compiler test suite.

 The OCaml core developers group considered evolving the pattern matching compiler, either by
-using a new algorithm or by incremental refactorings of its codebase.
+using a new algorithm or by incremental refactoring of its code base.
 For this reason we want to verify that new implementations of the
 compiler avoid the introduction of new bugs and that such
-modifications don't result in a different behaviour than the current one.
+modifications don't result in a different behavior than the current one.

 One possible approach is to formally verify the pattern matching compiler
 implementation using a machine checked proof.
 Another possibility, albeit with a weaker result, is to verify that
 each source program and target program pair are semantically correct.
 We chose the latter technique, translation validation, because it is easier to adopt in
-the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a
-blackbox and proof only depends on our equivalence algorithm.
+the case of a production compiler and to integrate with an existing code base. The compiler is treated as a
+black box and the proof depends only on our equivalence algorithm.

 \subsection{Our approach}
 %% replace common TODO
 Our algorithm translates both source and target programs into a common
-representation, decision trees. Decision trees where choosen because
+representation, decision trees. Decision trees were chosen because
 they model the space of possible values at a given branch of
 execution.
 Here is the decision tree for the source example program.
@@ -144,12 +153,6 @@ subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
 the branches that are incompatible with $\pi_i$ and check that the
 reduced tree is equivalent to $C_i$.
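
The reduction step described above can be illustrated with a minimal OCaml sketch. It is not the prototype's code: the \texttt{tree} type, the use of integer positions in place of accessors, the representation of a condition $\pi_i$ as a list of accepted tags, and the names \texttt{trim} and \texttt{equiv} are all assumptions made for illustration. The sketch also assumes that every switch is exhaustive over the tags of interest.

```ocaml
(* Hypothetical, simplified model: a switch inspects a numbered position and
   each branch condition is the list of tags that the branch accepts. *)
type tree =
  | Failure
  | Leaf of string
  | Switch of int * (int list * tree) list (* position, (accepted tags, child) *)

(* [trim pos tags t] kills every branch of [t] that tests position [pos] and
   cannot be taken once the tag at [pos] is known to belong to [tags]. *)
let rec trim pos tags t =
  match t with
  | Failure | Leaf _ -> t
  | Switch (p, children) ->
    let keep (cond, child) =
      let cond' =
        if p = pos then List.filter (fun tag -> List.mem tag tags) cond
        else cond
      in
      if cond' = [] then None else Some (cond', trim pos tags child)
    in
    Switch (p, List.filter_map keep children)

(* [equiv source target]: for every child (tags, c) of a target switch,
   reduce the source tree and compare the reduced tree with [c]. *)
let rec equiv source target =
  match source, target with
  | _, Switch (pos, children) ->
    List.for_all
      (fun (tags, child) -> equiv (trim pos tags source) child)
      children
  | Switch (_, children), _ ->
    (* The target is terminal: every surviving source branch must agree. *)
    List.for_all (fun (_, child) -> equiv child target) children
  | Failure, Failure -> true
  | Leaf a, Leaf b -> a = b
  | _ -> false

(* Usage: tag 0 stands for the empty list, tag 1 for a cons cell. *)
let source = Switch (0, [ ([ 0 ], Leaf "nil"); ([ 1 ], Leaf "cons") ])
let swapped = Switch (0, [ ([ 0 ], Leaf "cons"); ([ 1 ], Leaf "nil") ])
let () = assert (equiv source source && not (equiv source swapped))
```

In this simplified model a source switch whose branches have all been killed is vacuously equivalent to any terminal node, because no value can reach it.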

-For the prototype we have choosen a simple subset of the OCaml
-language and implemented a prototype equivalence checker along with a
-formal statement of correctness and proof sketches.
-The prototype is to be included in the OCaml compiler infrastructure
-and will aid the development.
-
 \subsection{From source programs to decision trees}
 Our source language supports integers, lists, tuples and all algebraic
 datatypes. Patterns support wildcards, constructors and literals, or
@@ -186,7 +189,7 @@ same result as running it against the decision tree.
 \subsection{From target programs to decision trees}
 The target programs include the following Lambda constructs:
 \texttt{let, if, switch, Match\_failure, catch, exit, field} and
-various comparation operations, guards. The symbolic execution engine
+various comparison operations, guards. The symbolic execution engine
 traverses the target program and builds an environment that maps
 variables to accessors. It branches at every control flow statement
 and emits a Switch node. The branch condition $\pi_i$ is expressed as
@@ -214,7 +217,7 @@ The main features of ML languages are the use of the Hindley-Milner type system
 provides many advantages with respect to static type systems of traditional imperative and object
 oriented language such as C, C++ and Java, such as:
 - Polymorphism: in certain scenarios a function can accept more than one
-type for the input parameters. For example a function that computes the lenght of a
+type for the input parameters. For example a function that computes the length of a
 list doesn't need to inspect the type of the elements of the list and for this reason
 a List.length function can accept lists of integers, lists of strings and in general
 lists of any type. Such languages offer polymorphic functions through subtyping at
@@ -231,7 +234,7 @@ oriented language such as C, C++ and Java, such as:
 programmer is not allowed to operate on data by ignoring or promoting its type.
 - Type Inference: the principal type of a well formed term can be inferred without any
 annotation or declaration.
-- Algebraic data types: types that are modelled by the use of two
+- Algebraic data types: types that are modeled by the use of two
 algebraic operations, sum and product.
 A sum type is a type that can hold one of many different types of
 objects, but only one at a time. For example the sum type defined
@@ -248,14 +251,14 @@ although mutable statements and imperative constructs are permitted.
 In addition to that it features an object system, that provides
 inheritance, subtyping and dynamic binding, and modules, that
 provide a way to encapsulate definitions. Modules are checked
-statically and can be reificated through functors.
+statically and can be reified through functors.

 ** Lambda form compilation
 \begin{comment}
 https://dev.realworld.org/compiler-backend.html
 \end{comment}

-provides compilation in form of a byecode executable with an
+provides compilation in form of a bytecode executable with an
 optionally embeddable interpreter and a native executable that could
 be statically linked to provide a single file executable.

@@ -263,7 +266,7 @@ After the typechecker has proven that the program is type safe,
 the compiler lowers the code to /Lambda/, an s-expression based
 language that assumes that its input has already been proved safe.
 On the /Lambda/ representation of the source program, the compiler
-performes a series of optimization passes before translating the
+performs a series of optimization passes before translating the
 lambda form to assembly code.

 *** OCaml Native Datatypes
@@ -298,7 +301,7 @@ There are several numeric types:
 - floats: that use IEEE754 double-precision (64-bit) arithmetic with
 the addition of the literals /infinity/, /neg_infinity/ and /nan/.

-The are varios numeric operations defined:
+There are various numeric operations defined:

 - Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
 - Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
@@ -349,7 +352,7 @@ match color with
 #+END_SRC

 provides tokens to express data destructuring.
-For example we can examine the content of a list with patten matching
+For example we can examine the content of a list with pattern matching

 #+BEGIN_SRC

@@ -432,7 +435,7 @@ We distinguish
 - Unreachable: statically it is known that no value can go there
 - Failure: a value matching this part results in an error
 - Leaf: a value matching this part results into the evaluation of a
-source blackbox of code
+source black box of code
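
The three kinds of terminal node listed above, together with a switch node, could be modelled along the following lines. This is a minimal sketch rather than the prototype's definition: the \texttt{accessor} and \texttt{value} types, the fallback child of \texttt{Switch}, and the function names are hypothetical, and a branch condition is simplified to a list of accepted constructor tags.

```ocaml
(* Hypothetical types: a sketch, not the definitions used by the prototype. *)

(* A path into the scrutinee: the value itself, or the i-th field of the
   sub-value reached by another accessor. *)
type accessor = Here | Field of int * accessor

type 'leaf tree =
  | Unreachable   (* statically, no value can reach this node *)
  | Failure       (* a value reaching this node results in an error *)
  | Leaf of 'leaf (* evaluate an opaque piece of source code *)
  | Switch of accessor * (int list * 'leaf tree) list * 'leaf tree
      (* inspected position, (accepted tags, child) branches, fallback *)

(* Values are modelled as tagged blocks, loosely mirroring Lambda blocks. *)
type value = Block of int * value list

let rec project acc v =
  match acc with
  | Here -> Some v
  | Field (i, rest) ->
    (match project rest v with
     | Some (Block (_, fields)) -> List.nth_opt fields i
     | None -> None)

type 'leaf outcome = Reaches of 'leaf | Fails | Impossible

(* Run a value against a tree: take the first branch accepting the tag found
   at the inspected position, or the fallback when no branch accepts it. *)
let rec run tree v =
  match tree with
  | Unreachable -> Impossible
  | Failure -> Fails
  | Leaf l -> Reaches l
  | Switch (acc, children, fallback) ->
    (match project acc v with
     | None -> Impossible
     | Some (Block (tag, _)) ->
       (match List.find_opt (fun (tags, _) -> List.mem tag tags) children with
        | Some (_, child) -> run child v
        | None -> run fallback v))

(* Usage: tag 0 is the empty list, tag 1 is a cons cell (head, tail). *)
let classify =
  Switch (Here, [ ([ 0 ], Leaf "empty") ],
          Switch (Field (1, Here), [ ([ 0 ], Leaf "singleton") ], Leaf "longer"))
let () = assert (run classify (Block (0, [])) = Reaches "empty")
```

Keeping \texttt{Unreachable} distinct from \texttt{Failure} lets a checker tell branches that no value can reach apart from branches that raise a match failure.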

 The algorithm doesn't support type-declaration-based analysis
 to know the list of constructors at a given type.
@@ -758,7 +761,7 @@ following four rules:
 \end{equation*}
 4) Mixture rule:
 When none of the previous rules applies, the clause matrix P → L is
-splitted into two clause matrices, the first P₁ → L₁ that is the
+split into two clause matrices, the first P₁ → L₁ that is the
 largest prefix matrix for which one of the three previous rules
 applies, and P₂ → L₂ containing the remaining rows. The algorithm is
 applied to both matrices.
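
The split performed by the mixture rule can be sketched in OCaml as follows. The three previous rules are not visible in this excerpt, so the sketch uses an assumed stand-in: a prefix counts as homogeneous when the first pattern of every row is of the same kind (all wildcards or all constructors) as the first row. The \texttt{pattern} type and the names \texttt{same\_kind} and \texttt{split\_mixture} are hypothetical.

```ocaml
(* Hypothetical, simplified patterns: only the head of each row matters. *)
type pattern =
  | Wildcard
  | Constructor of string * pattern list

(* A row of the clause matrix P -> L: a list of patterns and an action. *)
type 'action row = pattern list * 'action

(* Assumed stand-in for "one of the previous rules applies": two rows are
   compatible when their first patterns are of the same kind. *)
let same_kind (r1 : 'a row) (r2 : 'a row) =
  match fst r1, fst r2 with
  | Wildcard :: _, Wildcard :: _ -> true
  | Constructor _ :: _, Constructor _ :: _ -> true
  | _ -> false

(* Split the matrix into the largest prefix P1 -> L1 whose rows are all
   compatible with the first row, and the remaining rows P2 -> L2. *)
let split_mixture (rows : 'a row list) : 'a row list * 'a row list =
  match rows with
  | [] -> [], []
  | first :: _ ->
    let rec go prefix = function
      | r :: rest when same_kind first r -> go (r :: prefix) rest
      | rest -> List.rev prefix, rest
    in
    go [] rows

(* Usage: two wildcard rows, then a constructor row, then a wildcard row. *)
let matrix =
  [ ([ Wildcard ], 1); ([ Wildcard ], 2);
    ([ Constructor ("Some", [ Wildcard ]) ], 3); ([ Wildcard ], 4) ]
let p1, p2 = split_mixture matrix
let () = assert (List.length p1 = 2 && List.length p2 = 2)
```

A complete implementation would then apply the compilation algorithm recursively to both resulting matrices, as the rule states.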