This commit is contained in:
Francesco Mecca 2020-03-12 19:37:38 +01:00
parent 7ae47c2e45
commit d35546fe8b
2 changed files with 28 additions and 22 deletions

3
tesi/.gitignore vendored
View file

@ -1,6 +1,9 @@
tesi.tex tesi.tex
.#tesi.tex .#tesi.tex
tesi_unicode.tex
## Core latex/pdflatex auxiliary files: ## Core latex/pdflatex auxiliary files:
*.aux *.aux
*.lof *.lof

View file

@ -49,8 +49,13 @@ TODO: talk about compiler stuff
This dissertation presents an algorithm for the translation validation of the OCaml pattern This dissertation presents an algorithm for the translation validation of the OCaml pattern
matching compiler. Given a source program and its compiled version the matching compiler. Given a source program and its compiled version the
algorithm checks wheter the two are equivalent or produce a counter algorithm checks whether the two are equivalent or produce a counter
example in case of a mismatch. example in case of a mismatch.
For the prototype of this algorithm we have chosen a subset of the OCaml
language and implemented a prototype equivalence checker along with a
formal statement of correctness and its proof.
The prototype is to be included in the OCaml compiler infrastructure
and will aid the development.
Our equivalence algorithm works with decision trees. Source patterns are Our equivalence algorithm works with decision trees. Source patterns are
converted into a decision tree using a matrix decomposition algorithm. converted into a decision tree using a matrix decomposition algorithm.
@ -58,7 +63,9 @@ Target programs, described in the Lambda intermediate
representation language of the OCaml compiler, are turned into decision trees representation language of the OCaml compiler, are turned into decision trees
by applying symbolic execution. by applying symbolic execution.
\begin{comment}
\subsection{Translation validation} \subsection{Translation validation}
\end{comment}
A pattern matching compiler turns a series of pattern matching clauses A pattern matching compiler turns a series of pattern matching clauses
into simple control flow structures such as \texttt{if, switch}, for example: into simple control flow structures such as \texttt{if, switch}, for example:
\begin{lstlisting} \begin{lstlisting}
@ -79,7 +86,9 @@ into simple control flow structures such as \texttt{if, switch}, for example:
(makeblock 0 1 (makeblock 0 y))))) (makeblock 0 1 (makeblock 0 y)))))
[0: 0 0a]) [0: 0 0a])
\end{lstlisting} \end{lstlisting}
\begin{comment}
%% TODO: side by side %% TODO: side by side
\end{comment}
The code in the right is in the Lambda intermediate representation of The code in the right is in the Lambda intermediate representation of
the OCaml compiler. The Lambda representation of a program is shown by the OCaml compiler. The Lambda representation of a program is shown by
calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
@ -92,23 +101,23 @@ corner cases of complex patterns which are typically not in the
compiler test suite. compiler test suite.
The OCaml core developers group considered evolving the pattern matching compiler, either by The OCaml core developers group considered evolving the pattern matching compiler, either by
using a new algorithm or by incremental refactorings of its codebase. using a new algorithm or by incremental refactoring of its code base.
For this reason we want to verify that new implementations of the For this reason we want to verify that new implementations of the
compiler avoid the introduction of new bugs and that such compiler avoid the introduction of new bugs and that such
modifications don't result in a different behaviour than the current one. modifications don't result in a different behavior than the current one.
One possible approach is to formally verify the pattern matching compiler One possible approach is to formally verify the pattern matching compiler
implementation using a machine checked proof. implementation using a machine checked proof.
Another possibility, albeit with a weaker result, is to verify that Another possibility, albeit with a weaker result, is to verify that
each source program and target program pair are semantically correct. each source program and target program pair are semantically correct.
We chose the latter technique, translation validation because is easier to adopt in We chose the latter technique, translation validation because is easier to adopt in
the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a the case of a production compiler and to integrate with an existing code base. The compiler is treated as a
blackbox and proof only depends on our equivalence algorithm. black-box and proof only depends on our equivalence algorithm.
\subsection{Our approach} \subsection{Our approach}
%% replace common TODO %% replace common TODO
Our algorithm translates both source and target programs into a common Our algorithm translates both source and target programs into a common
representation, decision trees. Decision trees where choosen because representation, decision trees. Decision trees where chosen because
they model the space of possible values at a given branch of they model the space of possible values at a given branch of
execution. execution.
Here is the decision tree for the source example program. Here is the decision tree for the source example program.
@ -144,12 +153,6 @@ subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
the branches that are incompatible with $\pi_i$ and check that the the branches that are incompatible with $\pi_i$ and check that the
reduced tree is equivalent to $C_i$. reduced tree is equivalent to $C_i$.
For the prototype we have choosen a simple subset of the OCaml
language and implemented a prototype equivalence checker along with a
formal statement of correctness and proof sketches.
The prototype is to be included in the OCaml compiler infrastructure
and will aid the development.
\subsection{From source programs to decision trees} \subsection{From source programs to decision trees}
Our source language supports integers, lists, tuples and all algebraic Our source language supports integers, lists, tuples and all algebraic
datatypes. Patterns support wildcards, constructors and literals, or datatypes. Patterns support wildcards, constructors and literals, or
@ -186,7 +189,7 @@ same result as running it against the decision tree.
\subsection{From target programs to decision trees} \subsection{From target programs to decision trees}
The target programs include the following Lambda constructs: The target programs include the following Lambda constructs:
\texttt{let, if, switch, Match\_failure, catch, exit, field} and \texttt{let, if, switch, Match\_failure, catch, exit, field} and
various comparation operations, guards. The symbolic execution engine various comparison operations, guards. The symbolic execution engine
traverses the target program and builds an environment that maps traverses the target program and builds an environment that maps
variables to accessors. It branches at every control flow statement variables to accessors. It branches at every control flow statement
and emits a Switch node. The branch condition $\pi_i$ is expressed as and emits a Switch node. The branch condition $\pi_i$ is expressed as
@ -214,7 +217,7 @@ The main features of ML languages are the use of the Hindley-Milner type system
provides many advantages with respect to static type systems of traditional imperative and object provides many advantages with respect to static type systems of traditional imperative and object
oriented language such as C, C++ and Java, such as: oriented language such as C, C++ and Java, such as:
- Polymorphism: in certain scenarios a function can accept more than one - Polymorphism: in certain scenarios a function can accept more than one
type for the input parameters. For example a function that computes the lenght of a type for the input parameters. For example a function that computes the length of a
list doesn't need to inspect the type of the elements of the list and for this reason list doesn't need to inspect the type of the elements of the list and for this reason
a List.length function can accept lists of integers, lists of strings and in general a List.length function can accept lists of integers, lists of strings and in general
lists of any type. Such languages offer polymorphic functions through subtyping at lists of any type. Such languages offer polymorphic functions through subtyping at
@ -231,7 +234,7 @@ oriented language such as C, C++ and Java, such as:
programmer is not allowed to operate on data by ignoring or promoting its type. programmer is not allowed to operate on data by ignoring or promoting its type.
- Type Inference: the principal type of a well formed term can be inferred without any - Type Inference: the principal type of a well formed term can be inferred without any
annotation or declaration. annotation or declaration.
- Algebraic data types: types that are modelled by the use of two - Algebraic data types: types that are modeled by the use of two
algebraic operations, sum and product. algebraic operations, sum and product.
A sum type is a type that can hold of many different types of A sum type is a type that can hold of many different types of
objects, but only one at a time. For example the sum type defined objects, but only one at a time. For example the sum type defined
@ -248,14 +251,14 @@ although mutable statements and imperative constructs are permitted.
In addition to that features an object system, that provides In addition to that features an object system, that provides
inheritance, subtyping and dynamic binding, and modules, that inheritance, subtyping and dynamic binding, and modules, that
provide a way to encapsulate definitions. Modules are checked provide a way to encapsulate definitions. Modules are checked
statically and can be reificated through functors. statically and can be reifycated through functors.
** Lambda form compilation ** Lambda form compilation
\begin{comment} \begin{comment}
https://dev.realworld.org/compiler-backend.html https://dev.realworld.org/compiler-backend.html
\end{comment} \end{comment}
provides compilation in form of a byecode executable with an provides compilation in form of a bytecode executable with an
optionally embeddable interpreter and a native executable that could optionally embeddable interpreter and a native executable that could
be statically linked to provide a single file executable. be statically linked to provide a single file executable.
@ -263,7 +266,7 @@ After the typechecker has proven that the program is type safe,
the compiler lower the code to /Lambda/, an s-expression based the compiler lower the code to /Lambda/, an s-expression based
language that assumes that its input has already been proved safe. language that assumes that its input has already been proved safe.
On the /Lambda/ representation of the source program, the compiler On the /Lambda/ representation of the source program, the compiler
performes a series of optimization passes before translating the performs a series of optimization passes before translating the
lambda form to assembly code. lambda form to assembly code.
*** OCaml Native Datatypes *** OCaml Native Datatypes
@ -298,7 +301,7 @@ There are several numeric types:
- floats: that use IEEE754 double-precision (64-bit) arithmetic with - floats: that use IEEE754 double-precision (64-bit) arithmetic with
the addition of the literals /infinity/, /neg_infinity/ and /nan/. the addition of the literals /infinity/, /neg_infinity/ and /nan/.
The are varios numeric operations defined: The are various numeric operations defined:
- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation) - Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending) - Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
@ -349,7 +352,7 @@ match color with
#+END_SRC #+END_SRC
provides tokens to express data destructoring. provides tokens to express data destructoring.
For example we can examine the content of a list with patten matching For example we can examine the content of a list with pattern matching
#+BEGIN_SRC #+BEGIN_SRC
@ -432,7 +435,7 @@ We distinguish
- Unreachable: statically it is known that no value can go there - Unreachable: statically it is known that no value can go there
- Failure: a value matching this part results in an error - Failure: a value matching this part results in an error
- Leaf: a value matching this part results into the evaluation of a - Leaf: a value matching this part results into the evaluation of a
source blackbox of code source black box of code
The algorithm doesn't support type-declaration-based analysis The algorithm doesn't support type-declaration-based analysis
to know the list of constructors at a given type. to know the list of constructors at a given type.
@ -758,7 +761,7 @@ following four rules:
\end{equation*} \end{equation*}
4) Mixture rule: 4) Mixture rule:
When none of the previous rules apply the clause matrix P → L is When none of the previous rules apply the clause matrix P → L is
splitted into two clause matrices, the first P₁ → L₁ that is the split into two clause matrices, the first P₁ → L₁ that is the
largest prefix matrix for which one of the three previous rules largest prefix matrix for which one of the three previous rules
apply, and P₂ → L₂ containing the remaining rows. The algorithm is apply, and P₂ → L₂ containing the remaining rows. The algorithm is
applied to both matrices. applied to both matrices.