diff --git a/tesi/.gitignore b/tesi/.gitignore index 5ff4fc3..2ea43bc 100644 --- a/tesi/.gitignore +++ b/tesi/.gitignore @@ -1,6 +1,9 @@ tesi.tex .#tesi.tex +tesi_unicode.tex + + ## Core latex/pdflatex auxiliary files: *.aux *.lof diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index 168583b..b80f603 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -49,8 +49,13 @@ TODO: talk about compiler stuff This dissertation presents an algorithm for the translation validation of the OCaml pattern matching compiler. Given a source program and its compiled version the -algorithm checks wheter the two are equivalent or produce a counter +algorithm checks whether the two are equivalent or produce a counter example in case of a mismatch. +For the prototype of this algorithm we have chosen a subset of the OCaml +language and implemented a prototype equivalence checker along with a +formal statement of correctness and its proof. +The prototype is to be included in the OCaml compiler infrastructure +and will aid the development. Our equivalence algorithm works with decision trees. Source patterns are converted into a decision tree using a matrix decomposition algorithm. @@ -58,7 +63,9 @@ Target programs, described in the Lambda intermediate representation language of the OCaml compiler, are turned into decision trees by applying symbolic execution. +\begin{comment} \subsection{Translation validation} +\end{comment} A pattern matching compiler turns a series of pattern matching clauses into simple control flow structures such as \texttt{if, switch}, for example: \begin{lstlisting} @@ -79,7 +86,9 @@ into simple control flow structures such as \texttt{if, switch}, for example: (makeblock 0 1 (makeblock 0 y))))) [0: 0 0a]) \end{lstlisting} +\begin{comment} %% TODO: side by side +\end{comment} The code in the right is in the Lambda intermediate representation of the OCaml compiler. The Lambda representation of a program is shown by calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. @@ -92,23 +101,23 @@ corner cases of complex patterns which are typically not in the compiler test suite. The OCaml core developers group considered evolving the pattern matching compiler, either by -using a new algorithm or by incremental refactorings of its codebase. +using a new algorithm or by incremental refactoring of its code base. For this reason we want to verify that new implementations of the compiler avoid the introduction of new bugs and that such -modifications don't result in a different behaviour than the current one. +modifications don't result in a different behavior than the current one. One possible approach is to formally verify the pattern matching compiler implementation using a machine checked proof. Another possibility, albeit with a weaker result, is to verify that each source program and target program pair are semantically correct. We chose the latter technique, translation validation because is easier to adopt in -the case of a production compiler and to integrate with an existing codebase. The compiler is treated as a -blackbox and proof only depends on our equivalence algorithm. +the case of a production compiler and to integrate with an existing code base. The compiler is treated as a +black-box and proof only depends on our equivalence algorithm. \subsection{Our approach} %% replace common TODO Our algorithm translates both source and target programs into a common -representation, decision trees. Decision trees where choosen because +representation, decision trees. Decision trees where chosen because they model the space of possible values at a given branch of execution. Here is the decision tree for the source example program. @@ -144,12 +153,6 @@ subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all the branches that are incompatible with $\pi_i$ and check that the reduced tree is equivalent to $C_i$. -For the prototype we have choosen a simple subset of the OCaml -language and implemented a prototype equivalence checker along with a -formal statement of correctness and proof sketches. -The prototype is to be included in the OCaml compiler infrastructure -and will aid the development. - \subsection{From source programs to decision trees} Our source language supports integers, lists, tuples and all algebraic datatypes. Patterns support wildcards, constructors and literals, or @@ -186,7 +189,7 @@ same result as running it against the decision tree. \subsection{From target programs to decision trees} The target programs include the following Lambda constructs: \texttt{let, if, switch, Match\_failure, catch, exit, field} and -various comparation operations, guards. The symbolic execution engine +various comparison operations, guards. The symbolic execution engine traverses the target program and builds an environment that maps variables to accessors. It branches at every control flow statement and emits a Switch node. The branch condition $\pi_i$ is expressed as @@ -214,7 +217,7 @@ The main features of ML languages are the use of the Hindley-Milner type system provides many advantages with respect to static type systems of traditional imperative and object oriented language such as C, C++ and Java, such as: - Polymorphism: in certain scenarios a function can accept more than one - type for the input parameters. For example a function that computes the lenght of a + type for the input parameters. For example a function that computes the length of a list doesn't need to inspect the type of the elements of the list and for this reason a List.length function can accept lists of integers, lists of strings and in general lists of any type. Such languages offer polymorphic functions through subtyping at @@ -231,7 +234,7 @@ oriented language such as C, C++ and Java, such as: programmer is not allowed to operate on data by ignoring or promoting its type. - Type Inference: the principal type of a well formed term can be inferred without any annotation or declaration. - - Algebraic data types: types that are modelled by the use of two + - Algebraic data types: types that are modeled by the use of two algebraic operations, sum and product. A sum type is a type that can hold of many different types of objects, but only one at a time. For example the sum type defined @@ -248,14 +251,14 @@ although mutable statements and imperative constructs are permitted. In addition to that features an object system, that provides inheritance, subtyping and dynamic binding, and modules, that provide a way to encapsulate definitions. Modules are checked -statically and can be reificated through functors. +statically and can be reifycated through functors. ** Lambda form compilation \begin{comment} https://dev.realworld.org/compiler-backend.html \end{comment} - provides compilation in form of a byecode executable with an + provides compilation in form of a bytecode executable with an optionally embeddable interpreter and a native executable that could be statically linked to provide a single file executable. @@ -263,7 +266,7 @@ After the typechecker has proven that the program is type safe, the compiler lower the code to /Lambda/, an s-expression based language that assumes that its input has already been proved safe. On the /Lambda/ representation of the source program, the compiler -performes a series of optimization passes before translating the +performs a series of optimization passes before translating the lambda form to assembly code. *** OCaml Native Datatypes @@ -298,7 +301,7 @@ There are several numeric types: - floats: that use IEEE754 double-precision (64-bit) arithmetic with the addition of the literals /infinity/, /neg_infinity/ and /nan/. -The are varios numeric operations defined: +The are various numeric operations defined: - Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation) - Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending) @@ -349,7 +352,7 @@ match color with #+END_SRC provides tokens to express data destructoring. -For example we can examine the content of a list with patten matching +For example we can examine the content of a list with pattern matching #+BEGIN_SRC @@ -432,7 +435,7 @@ We distinguish - Unreachable: statically it is known that no value can go there - Failure: a value matching this part results in an error - Leaf: a value matching this part results into the evaluation of a - source blackbox of code + source black box of code The algorithm doesn't support type-declaration-based analysis to know the list of constructors at a given type. @@ -758,7 +761,7 @@ following four rules: \end{equation*} 4) Mixture rule: When none of the previous rules apply the clause matrix P → L is - splitted into two clause matrices, the first P₁ → L₁ that is the + split into two clause matrices, the first P₁ → L₁ that is the largest prefix matrix for which one of the three previous rules apply, and P₂ → L₂ containing the remaining rows. The algorithm is applied to both matrices.