
\begin{comment}
* TODO Outline [1/6]
  - [X] Introduction
  - [-] Background [60%]
    - [X] Low level representation
    - [X] Lambda code [0%]
    - [X] Pattern matching 
    - [ ] Symbolic execution
    - [ ] Translation Validation
  - [ ] Translation validation of the Pattern Matching Compiler
    - [ ] Source translation
      - [ ] Formal Grammar
      - [ ] Compilation of source patterns
      - [ ] Rest?
    - [ ] Target translation
      - [ ] Formal Grammar
      - [ ] Symbolic execution of the Lambda target
    - [ ] Equivalence between source and target
  - [ ] Statement of correctness
  - [ ] Proof of correctness
  - [ ] Practical results 

Maybe pattern matching first, then the compiler?
  
\end{comment}

#+TITLE:   Translation Verification of the pattern matching compiler
#+AUTHOR:   Francesco Mecca
#+EMAIL:    me@francescomecca.eu
#+DATE:

#+LANGUAGE: en
#+LaTeX_CLASS: article
#+LaTeX_HEADER: \usepackage{algorithm} 
#+LaTeX_HEADER: \usepackage{comment} 
#+LaTeX_HEADER: \usepackage{algpseudocode} 
#+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm}
#+LaTeX_HEADER: \newtheorem{definition}{Definition}
#+LaTeX_HEADER: \usepackage{mathpartir}
#+LaTeX_HEADER: \usepackage{graphicx} 
#+LaTeX_HEADER: \usepackage{listings} 
#+LaTeX_HEADER: \usepackage{color} 
#+LaTeX_HEADER: \usepackage{stmaryrd}
#+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
#+LaTeX_HEADER: \newcommand{\Equiv}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken
#+LaTeX_HEADER: \newcommand{\covers}[2]{#1 \mathrel{\mathsf{covers}} #2}
#+LaTeX_HEADER: \newcommand{\Yes}{\mathsf{Yes}}
#+LaTeX_HEADER: \newcommand{\No}[2]{\mathsf{No}(#1, #2)}

#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
#+STARTUP: showall
\section{Introduction}

This dissertation presents an algorithm for the translation validation of the OCaml pattern
matching compiler. Given a source program and its compiled version, the
algorithm checks whether the two are equivalent, or produces a counterexample
in case of a mismatch.
For the prototype of this algorithm we have chosen a subset of the OCaml
language and implemented a prototype equivalence checker, along with a
formal statement of correctness and its proof.
The prototype is intended to be included in the OCaml compiler infrastructure,
where it will aid the development of the pattern matching compiler.

Our equivalence algorithm works with decision trees. Source patterns are
converted into a decision tree using a matrix decomposition algorithm.
Target programs, described in the Lambda intermediate
representation language of the OCaml compiler, are turned into decision trees
by applying symbolic execution.

\begin{comment}
\subsection{Translation validation}
\end{comment}
A pattern matching compiler turns a series of pattern matching clauses
into simple control flow structures such as \texttt{if, switch}, for example:
\begin{lstlisting}
  match x with
  | [] -> (0, None)
  | x::[] -> (1, Some x)
  | _::y::_ -> (2, Some y)
\end{lstlisting}
\begin{lstlisting}
(if scrutinee
    (let (field_1 =a (field 1 scrutinee))
        (if field_1
            (let
                (field_1_1 =a (field 1 field_1)
                 y =a (field 0 field_1))
                (makeblock 0 2 (makeblock 0 y)))
            (let (x =a (field 0 scrutinee))
                (makeblock 0 1 (makeblock 0 x)))))
    [0: 0 0a])
\end{lstlisting}
\begin{comment}
%% TODO: side by side
\end{comment}
The second listing is the same program in the Lambda intermediate representation of
the OCaml compiler. The Lambda representation of a program can be printed by
calling the \texttt{ocamlc} compiler with the \texttt{-drawlambda} flag.

The OCaml pattern matching compiler is a critical part of the OCaml compiler
in terms of correctness, because its bugs typically result in the production
of wrong code rather than in compilation failures.
Such bugs are also hard to catch by testing, because they arise in
corner cases of complex patterns which typically appear neither in the
compiler test suite nor in most user programs.

The OCaml core developers have considered evolving the pattern matching compiler, either by
using a new algorithm or by incrementally refactoring its code base.
For this reason we want to verify that new implementations of the
compiler do not introduce new bugs, and that such
modifications do not change the behavior of the compiled code.

One possible approach is to formally verify the pattern matching compiler
implementation using a machine-checked proof.
Another possibility, albeit with a weaker result, is to verify that
each pair of source and target programs is semantically equivalent.
We chose the latter technique, translation validation, because it is easier to adopt in
the case of a production compiler and to integrate with an existing code base. The compiler is treated as a
black box and the proof only depends on our equivalence algorithm.

\subsection{Our approach}
%% replace common TODO
Our algorithm translates both source and target programs into a common
representation, decision trees. Decision trees were chosen because
they model the space of possible values at a given branch of
execution.
Here are the decision trees for the source and target programs of the example.

\begin{minipage}{0.5\linewidth}
\begin{verbatim}
       Switch(Root)
       /        \
     (= [])    (= ::)
     /             \
   Leaf         Switch(Root.1)
(0, None)       /         \
             (= [])      (= ::)
             /               \
          Leaf              Leaf
   [x = Root.0]         [y = Root.1.0]
   (1, Some x)          (2, Some y)
\end{verbatim}
\end{minipage}
\hfill
\begin{minipage}{0.5\linewidth}
\begin{verbatim}
       Switch(Root)
       /        \
     (= int 0)  (!= int 0)
     /             \
   Leaf         Switch(Root.1)
(makeblock 0     /       \
  0 0a)         /         \
             (= int 0)    (!= int 0)
             /               \
          Leaf              Leaf
[x = Root.0]            [y = Root.1.0]
(makeblock 0            (makeblock 0
  1 (makeblock 0 x))      2 (makeblock 0 y))
\end{verbatim}
\end{minipage}

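\texttt{Root.0} is called an \emph{accessor}: it represents the
access path to a value that can be reached by deconstructing the
scrutinee. In this example \texttt{Root.0} is the first subvalue of
the scrutinee (the head of the list), and \texttt{Root.1.0} is the first
subvalue of its second subvalue (the second element of the list).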

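Target decision trees have a similar shape, but the tests on the
branches are related to the low-level representation of values in
Lambda code. For example, cons cells \texttt{x::xs} and tuples
\texttt{(x,y)} are blocks with tag 0, while the empty list \texttt{[]} is
the immediate integer 0, hence the tests \texttt{(= int 0)} and \texttt{(!= int 0)}.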
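As a minimal sketch of accessors, assuming a hypothetical model of Lambda-level values in which a value is either an immediate integer or a block carrying a tag and a list of fields (the names \texttt{value}, \texttt{Imm}, \texttt{Block}, \texttt{get}, \texttt{of\_list} and \texttt{second} are ours, not the prototype's), the following OCaml code encodes a list as nested tag-0 blocks ending in the immediate 0 and follows the accessor \texttt{Root.1.0}:

#+BEGIN_SRC ocaml
(* Hypothetical model of Lambda-level values: immediates and tagged blocks. *)
type value =
  | Imm of int                  (* immediate: integers, [], constant constructors *)
  | Block of int * value list   (* block tag and fields *)

(* An accessor is an access path from the scrutinee (Root) to a subvalue. *)
type accessor =
  | Root
  | Field of accessor * int     (* i-th field of the value reached by the inner path *)

let rec to_string = function
  | Root -> "Root"
  | Field (a, i) -> to_string a ^ "." ^ string_of_int i

(* Follow an accessor inside a value; fails if the path goes through an immediate. *)
let rec get (v : value) (a : accessor) : value =
  match a with
  | Root -> v
  | Field (a', i) ->
    (match get v a' with
     | Block (_, fields) -> List.nth fields i
     | Imm _ -> invalid_arg "get: not a block")

(* Cons cells are blocks with tag 0; the empty list is the immediate 0. *)
let rec of_list = function
  | [] -> Imm 0
  | x :: xs -> Block (0, [ Imm x; of_list xs ])

(* Root.1.0: the head of the tail, i.e. the second element of a list. *)
let second = Field (Field (Root, 1), 0)
#+END_SRC

Here \texttt{get (of\_list [10; 20; 30]) second} evaluates to \texttt{Imm 20}, and \texttt{to\_string second} yields the text Root.1.0.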

To check the equivalence of a source and a target decision tree,
we proceed by case analysis.
If we have two terminals, such as leaves in the previous example,
we check that the two right-hand sides are equivalent.
If we have a node $N$ and another tree $T$ we check equivalence for
each child of $N$, which is a pair of a branch condition $\pi_i$ and a
subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
the branches that are incompatible with $\pi_i$ and check that the
reduced tree is equivalent to $C_i$.

\subsection{From source programs to decision trees}
Our source language supports integers, lists, tuples and all algebraic
datatypes. Patterns support wildcards, constructors and literals,
or-patterns $(p_1 | p_2)$ and pattern variables.  We also support
\texttt{when} guards, which are interesting as they introduce the
evaluation of expressions during matching.  Decision trees have nodes
of the form:
\begin{lstlisting}
 type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_expr * decision_tree * decision_tree
  | Switch of accessor * (constructor * decision_tree) list * decision_tree
\end{lstlisting}
In the \texttt{Switch} node we have one subtree for every head constructor
that appears in the pattern matching clauses and a fallback case for
other values. The branch condition $\pi_i$ expresses that the value at the
switch accessor starts with the given constructor.
\texttt{Failure} nodes express match failures for values that are not
matched by the source clauses.
\texttt{Unreachable} is used when we statically know that no value
can flow to that subtree.

We write $\sem{t_S}_S$ for the decision tree of the source program
$t_S$, computed by a matrix decomposition algorithm (each column
decomposition step gives a \texttt{Switch} node).
It satisfies the following correctness statement:
\[
\forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
\]
Running any source value $v_S$ against the source program gives the
same result as running it against the decision tree.
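The correctness statement can be read operationally: running a value against a decision tree follows the accessor of each \texttt{Switch} node, picks the subtree of the head constructor found there (or the fallback) and returns the right-hand side of the leaf it reaches. The OCaml sketch below illustrates this on the source tree of the introductory example. It assumes a hypothetical, simplified model (values are a constructor name with its arguments, right-hand sides are plain strings, \texttt{Guard} nodes are omitted); \texttt{eval}, \texttt{get}, \texttt{outcome} and \texttt{of\_list} are illustrative names.

#+BEGIN_SRC ocaml
(* Hypothetical model of source values: a head constructor and its arguments. *)
type value = C of string * value list

type accessor = Root | Field of accessor * int

type decision_tree =
  | Unreachable
  | Failure
  | Leaf of string   (* right-hand side, kept abstract as text *)
  | Switch of accessor * (string * decision_tree) list * decision_tree

type outcome = Matched of string | No_match

let rec get v = function
  | Root -> v
  | Field (a, i) -> let (C (_, args)) = get v a in List.nth args i

(* Run a value against a decision tree (guards are omitted in this sketch). *)
let rec eval (v : value) (t : decision_tree) : outcome =
  match t with
  | Unreachable -> invalid_arg "eval: an Unreachable node was reached"
  | Failure -> No_match
  | Leaf rhs -> Matched rhs
  | Switch (a, cases, fallback) ->
    let (C (head, _)) = get v a in
    (match List.assoc_opt head cases with
     | Some subtree -> eval v subtree
     | None -> eval v fallback)

(* The source decision tree of the introductory example. *)
let example =
  Switch (Root,
    [ ("[]", Leaf "(0, None)");
      ("::", Switch (Field (Root, 1),
                     [ ("[]", Leaf "(1, Some x)");
                       ("::", Leaf "(2, Some y)") ],
                     Failure)) ],
    Failure)

let rec of_list = function
  | [] -> C ("[]", [])
  | x :: xs -> C ("::", [ C (string_of_int x, []); of_list xs ])
#+END_SRC

For instance, a one-element list reaches the leaf (1, Some x), exactly as the first clause that matches it in the source program.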

\subsection{From target programs to decision trees}
The target programs include the following Lambda constructs:
\texttt{let, if, switch, Match\_failure, catch, exit, field},
various comparison operations, and guards. The symbolic execution engine
traverses the target program and builds an environment that maps
variables to accessors. It branches at every control flow statement
and emits a \texttt{Switch} node. The branch condition $\pi_i$ is expressed as
an interval set of possible values at that point.
Guards also result in branching. In contrast with the source decision
trees, \texttt{Unreachable} nodes are never emitted.
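Interval sets such as the branch conditions of the target tree can be represented, for instance, as sorted lists of disjoint closed intervals. The following OCaml sketch is a hypothetical illustration rather than the prototype's implementation: it defines intersection and complement, which suffice to express the conditions \texttt{(= int 0)} and \texttt{(!= int 0)} of the introductory example as \texttt{eq0} and \texttt{neq0}.

#+BEGIN_SRC ocaml
(* A set of integers as a sorted list of disjoint, closed intervals (lo, hi). *)
type iset = (int * int) list

let full : iset = [ (min_int, max_int) ]
let singleton (n : int) : iset = [ (n, n) ]

(* Intersection of two sorted interval lists, by a merge-like traversal. *)
let rec inter (a : iset) (b : iset) : iset =
  match a, b with
  | [], _ | _, [] -> []
  | (l1, h1) :: r1, (l2, h2) :: r2 ->
    let lo = max l1 l2 and hi = min h1 h2 in
    (* drop the interval that ends first, keep the other for later overlaps *)
    let rest = if h1 < h2 then inter r1 b else inter a r2 in
    if lo <= hi then (lo, hi) :: rest else rest

(* Complement with respect to [min_int, max_int], avoiding overflow at max_int. *)
let complement (a : iset) : iset =
  (* [next] is the smallest value not yet covered, None once max_int is covered *)
  let rec go next = function
    | [] -> (match next with Some n -> [ (n, max_int) ] | None -> [])
    | (lo, hi) :: rest ->
      let gap = match next with
        | Some n when n < lo -> [ (n, lo - 1) ]
        | _ -> [] in
      let next' = if hi = max_int then None else Some (hi + 1) in
      gap @ go next' rest
  in
  go (Some min_int) a

(* Branch conditions of the target tree: (= int 0) and (!= int 0). *)
let eq0 = singleton 0
let neq0 = complement eq0
#+END_SRC

A condition such as $a \notin (K_i)^i$ then corresponds to the complement of the union of the $K_i$, and trimming a branch amounts to an intersection followed by a test for the empty list.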

We write $\sem{t_T}_T$ for the decision tree of the target program
$t_T$, satisfying the following correctness statement:
\[
\forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T)
\]

\subsection{Equivalence checking}
The equivalence checking algorithm takes as input a domain of
possible values $S$ and a pair of source and target decision trees;
in case the two trees are not equivalent, it returns a counterexample.
The algorithm respects the following correctness statement:
\begin{comment}
% TODO: we have to define what \covers mean for readers to understand the specifications
% (or we use a simplifying lie by hiding \covers in the statements).
\end{comment}

\begin{align*}
 \Equiv S {C_S} {C_T} = \Yes \;\land\; \covers {C_T} S
 & \implies
 \forall v_S \approx v_T \in S,\; C_S(v_S) = C_T(v_T)
\\
 \Equiv S {C_S} {C_T} = \No {v_S} {v_T} \;\land\; \covers {C_T} S
 & \implies
 v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
\end{align*}

The algorithm proceeds by case analysis; we show the corresponding inference rules.
If $S$ is empty the result is $\Yes$.

\begin{verbatim}
------------------------
equiv ∅ Cs Ct gs
\end{verbatim}

If the two decision trees are both terminal nodes the algorithm checks
for content equality.
\begin{verbatim}
--------------------------
equiv S Failure Failure []

equiv_BB BBs BBt
-------------------------------
equiv S (Leaf BBs) (Leaf BBt) []
\end{verbatim}

If the source decision tree (left hand side) is a terminal while the
target decision tree (right hand side) is not, the algorithm proceeds
by \emph{explosion} of the right hand side. Explosion means that every
child of the right hand side is tested for equivalence against the left
hand side.

\begin{verbatim}
(equiv S Cs Ci gs)^i
equiv S Cs Cf gs
-----------------------------------------
equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs
\end{verbatim}


\begin{comment}
% TODO: [Gabriel] in practice the $dom_S$ are constructors and the
% $dom_T$ are domains. Do we want to hide this fact, or be explicit
% about it? (Maybe we should introduce explicitly/clearly the idea of
% target domains at some point).
\end{comment}

When the left hand side is not a terminal, the algorithm explodes the
left hand side while trimming every right hand side subtree. Trimming
a right hand side tree on an interval set $dom_S$ computed from the left hand
side tree constructor means mapping every branch condition $dom_T$ (interval set of
possible values) on the right to the intersection of $dom_T$ and $dom_S$ when
the accessors on both sides are equal, and removing the branches that
result in an empty intersection. If the accessors are different,
$dom_T$ is left unchanged.

\begin{verbatim}
(equiv S Ci (trim Ct (a=Ki)) gs)^i
equiv S Cf (trim Ct (a ∉ (Ki)^i)) gs
-------------------------------------
equiv S (Node(a, (Ki,Ci)^i, Cf)) Ct gs
\end{verbatim}
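The terminal, explosion and trimming rules above can be sketched in OCaml as follows. This is a simplified, hypothetical model, not the prototype: both trees share one type, \texttt{Node} has no fallback branch (its branches are assumed to cover the domain), branch conditions and the domain $S$ are finite lists of integers instead of interval sets, right-hand sides are compared as strings, guards are not handled, and a mismatch is reported as the sub-domain where the trees differ rather than as a pair of values.

#+BEGIN_SRC ocaml
type tree =
  | Failure
  | Leaf of string                           (* right-hand side, compared syntactically *)
  | Node of string * (int list * tree) list  (* accessor, (values, subtree) branches *)

(* The domain S: possible values of each accessor; absent means unconstrained. *)
type domain = (string * int list) list

type answer = Yes | No of domain

let inter xs ys = List.filter (fun x -> List.mem x ys) xs

(* Restrict accessor [a] to [vs] in [s]; None when no value is left. *)
let restrict (s : domain) (a : string) (vs : int list) : domain option =
  let vs' = match List.assoc_opt a s with Some old -> inter old vs | None -> vs in
  if vs' = [] then None else Some ((a, vs') :: List.remove_assoc a s)

(* trim: intersect the conditions on accessor [a] with [vs], drop empty branches. *)
let rec trim (t : tree) (a : string) (vs : int list) : tree =
  match t with
  | Failure | Leaf _ -> t
  | Node (b, branches) ->
    let keep (dom, child) =
      let dom' = if b = a then inter dom vs else dom in
      if dom' = [] then None else Some (dom', trim child a vs) in
    Node (b, List.filter_map keep branches)

(* Check every branch, stopping at the first mismatch. *)
let rec all f = function
  | [] -> Yes
  | x :: xs -> (match f x with Yes -> all f xs | No _ as no -> no)

let rec equiv (s : domain) (cs : tree) (ct : tree) : answer =
  match cs, ct with
  | (Failure | Leaf _), (Failure | Leaf _) -> if cs = ct then Yes else No s
  | Node (a, branches), _ ->
    (* explode the source tree, trimming the target tree on each branch *)
    all (fun (dom, child) ->
        match restrict s a dom with
        | None -> Yes                    (* empty domain: nothing to check *)
        | Some s' -> equiv s' child (trim ct a dom))
      branches
  | _, Node (a, branches) ->
    (* the source is a terminal: explode the target tree *)
    all (fun (dom, child) ->
        match restrict s a dom with
        | None -> Yes
        | Some s' -> equiv s' cs child)
      branches

(* Usage: same source, one equivalent and one wrong target. *)
let src = Node ("Root", [ ([ 0 ], Leaf "A"); ([ 1 ], Leaf "B") ])
let tgt_ok = Node ("Root", [ ([ 1 ], Leaf "B"); ([ 0 ], Leaf "A") ])
let tgt_bad = Node ("Root", [ ([ 0 ], Leaf "A"); ([ 1 ], Leaf "A") ])
let s0 = [ ("Root", [ 0; 1 ]) ]
#+END_SRC

Here \texttt{tgt\_bad} returns \texttt{A} where the source returns \texttt{B}, and the checker reports the sub-domain in which \texttt{Root} is 1.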

The equivalence checking algorithm deals with guards by storing a
queue. A guard blackbox is pushed to the back of the queue whenever the algorithm
encounters a Guard node on the left hand side, while it pops a blackbox from
the front of the queue whenever a Guard node appears on the right hand side.
The algorithm stops with failure if the popped blackbox and the
blackbox of the right hand Guard node are different; otherwise it
continues by exploding to two subtrees, one in which the guard
condition evaluates to true, the other in which it evaluates to false.
Termination of the algorithm is successful only when the guards queue
is empty.
\begin{verbatim}
equiv S Ctrue Ct (gs++[condition])
equiv S Cfalse Ct (gs++[condition])
--------------------------------------------
equiv S (Guard condition Ctrue Cfalse) Ct gs

equiv S Cs Ctrue gs
equiv S Cs Cfalse gs
--------------------------------------------
equiv S Cs (Guard condition Ctrue Cfalse) ([condition]++gs)
\end{verbatim}
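The queue discipline of these two rules can be illustrated by the following OCaml sketch, which assumes that guard conditions are opaque blackboxes compared by equality (hypothetically represented as strings): the guards met along a source path are pushed to the back of the queue, each guard met along the target path must match the one at the front, and the check succeeds only if the queue ends up empty. The names \texttt{push}, \texttt{pop} and \texttt{guards\_agree} are illustrative.

#+BEGIN_SRC ocaml
(* Guard conditions are treated as opaque blackboxes, here strings. *)
type guard = string

(* Source-side Guard node: append the condition at the back (gs ++ [condition]). *)
let push (gs : guard list) (condition : guard) : guard list = gs @ [ condition ]

(* Target-side Guard node: the queue must start with the same condition. *)
let pop (gs : guard list) (condition : guard) : guard list option =
  match gs with
  | c :: rest when c = condition -> Some rest
  | _ -> None                      (* mismatch: the algorithm stops with failure *)

(* Replay the guards found along a source path and a target path. *)
let guards_agree (source : guard list) (target : guard list) : bool =
  let queue = List.fold_left push [] source in
  let final =
    List.fold_left
      (fun q c -> match q with Some q -> pop q c | None -> None)
      (Some queue) target in
  final = Some []                  (* terminals require an empty queue *)
#+END_SRC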

\begin{comment}
TODO: replace inference rules with good latex
\end{comment}

* Background
  
** OCaml
Objective Caml (OCaml) is a dialect of the ML (Meta-Language)
family of programming languages, which also includes other dialects
such as SML and Caml Light.
The main feature of ML languages is the Hindley-Milner type system, which
provides several advantages over the static type systems of traditional imperative and
object-oriented languages such as C, C++ and Java:
    - Polymorphism: in certain scenarios a function can accept more than one
      type for the input parameters. For example a function that computes the length of a
      list doesn't need to inspect the type of the elements of the list and for this reason
      a List.length function can accept lists of integers, lists of strings and in general
      lists of any type. Some of these languages offer polymorphic functions
      only through subtyping, resolved at runtime, while others, such as C++,
      also offer polymorphism through compile-time templates and function overloading.
      With the Hindley-Milner type system each well-typed function can have more than one
      type but always has a unique best type, called the /principal type/.
      For example the principal type of the List.length function is "For any /a/, function from
      list of /a/ to /int/", written 'a list -> int, and /a/ is called the /type parameter/.
    - Strong typing: languages such as C and C++ allow the programmer to operate on data
      without considering its type, mainly through pointers. Other languages, such as Java,
      erase generic type parameters, so at runtime the type arguments of generic data can't be queried.
      In the case of programming languages using a Hindley-Milner type system the
      programmer is not allowed to operate on data by ignoring or promoting its type.
    - Type Inference: the principal type of a well formed term can be inferred without any
      annotation or declaration.
    - Algebraic data types: types that are modeled by the use of two
      algebraic operations, sum and product.
      A sum type is a type that can hold values of many different types,
      but only one at a time. For example the sum type defined
      as /A + B/ can hold at any moment a value of type A or a value of
      type B. Sum types are also called tagged unions or variants.
      A product type is a type constructed as a direct product
      of multiple types and contains at any moment one instance for
      every type of its operands. Product types are also called tuples
      or records. Algebraic data types can be recursive
      in their definition and can be combined.
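A short OCaml example illustrates the features listed above: a sum type whose constructors carry products, a recursive polymorphic type, and functions whose principal types are inferred without any annotation (/length/, for instance, gets the type 'a list -> int). The type and function names are ours.

#+BEGIN_SRC ocaml
(* A sum type; each constructor carries a product of floats. *)
type shape =
  | Circle of float              (* radius *)
  | Rect of float * float        (* width and height *)

(* A recursive, polymorphic algebraic data type. *)
type 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* Inferred principal type: 'a list -> int. *)
let rec length = function
  | [] -> 0
  | _ :: tl -> 1 + length tl

(* Inferred type: shape -> float. *)
let area = function
  | Circle r -> Float.pi *. r *. r
  | Rect (w, h) -> w *. h

(* Inferred type: 'a tree -> int. *)
let rec size = function
  | Empty -> 0
  | Node (l, _, r) -> size l + 1 + size r
#+END_SRC

The same /length/ function is applied to lists of integers and to lists of strings, with no annotation and no runtime type information.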
Moreover, ML languages are functional, meaning that functions are
treated as first-class citizens and variables are immutable,
although mutable state and imperative constructs are permitted.
In addition, OCaml features an object system, which provides
inheritance, subtyping and dynamic binding, and modules, which
provide a way to encapsulate definitions. Modules are checked
statically and can be parameterized over other modules through functors.

** Compiling OCaml code

The OCaml compiler compiles source files either into a bytecode executable, with an
optionally embeddable interpreter, or into a native executable that can
be statically linked to provide a single-file executable.
Every source file is treated as a separate /compilation unit/ that
goes through several stages.
The first stage of compilation is the parsing of the input code, which
is transformed into an untyped syntax tree. Code with syntax errors is
rejected at this stage.
After that, the AST is processed by the type checker, which performs
three steps at once:
- type inference, using the classical /Algorithm W/;
- subtyping, and gathering of type information from the module system;
- checking that the code obeys the rules of the OCaml type system.
At this stage, incorrectly typed code is rejected. In case of success,
the untyped AST is transformed into a /Typed Tree/.
After the typechecker has proven that the program is type safe,
the compiler lowers the code to /Lambda/, an s-expression based
language that assumes that its input has already been proved safe.
After the Lambda pass, the Lambda code is either translated into
bytecode or, for native compilation, goes through a series of optimization steps
(performed by the /Flambda/ optimizer when it is enabled) before being translated into assembly.
\begin{comment}
TODO: Talk about flambda passes?
\end{comment}

This is an overview of the different compiler steps.
[[./files/ocamlcompilation.png]]

** Memory representation of OCaml values
A typical OCaml source program contains few or no explicit type
signatures.
This is possible because of type inference, which annotates the
AST with type information. However, since the OCaml typechecker guarantees that a program is well typed
before it is transformed into Lambda code, values at runtime contain
only the minimal subset of type information needed to distinguish
polymorphic values.
For runtime values, OCaml uses a uniform memory representation in
which every value is a single word that is either an immediate integer or a
pointer to a contiguous block of memory, called a /cell/ or /box/.
We can abstract the type of OCaml runtime values as follows:
#+BEGIN_SRC ocaml
type t =
  | Constant of int        (* immediate value *)
  | Cell of int * t array  (* pointer to a block: block tag and fields *)
#+END_SRC
where a one-bit tag is used to distinguish between Constant and Cell.
This bit of metadata is stored in the lowest bit of the word.

All the OCaml target architectures guarantee that pointers are
word-aligned, hence divisible by four, which means that their two lowest bits are
always 00. Storing this bit of metadata in the lowest bit therefore allows an
optimization: constant values in OCaml, such as integers, empty lists,
unit values and constructors of arity zero (/constant/ constructors),
are unboxed at runtime and have the lowest bit set to 1, while pointers
are recognized by the lowest bit set to 0.
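The tagging scheme can be sketched with plain integer arithmetic, assuming the standard encoding in which an immediate integer n is stored as the word 2n+1, so that its lowest bit is 1 while word-aligned pointers end in 0. The helper names /encode/, /decode/ and /is_immediate/ are hypothetical; the last lines use the real, but unsafe, /Obj/ module of the standard library to observe the same distinction on actual values.

#+BEGIN_SRC ocaml
(* Immediate integer n is stored as the machine word 2n + 1
   (meaningful only for n in half of the int range). *)
let encode (n : int) : int = (n lsl 1) lor 1

(* Arithmetic shift right recovers n, including for negative numbers. *)
let decode (w : int) : int = w asr 1

(* Lowest bit set: an immediate value; cleared: a pointer to a block. *)
let is_immediate (w : int) : bool = w land 1 = 1

(* The runtime exposes the same distinction through the Obj module. *)
let () =
  assert (Obj.is_int (Obj.repr 42));
  assert (Obj.is_int (Obj.repr []));           (* empty list: immediate 0 *)
  assert (not (Obj.is_int (Obj.repr [ 1 ])))   (* cons cell: a block *)
#+END_SRC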


** Lambda form compilation
\begin{comment}
https://dev.realworld.org/compiler-backend.html
CITE: realworldocaml
\end{comment}
A Lambda code target file is produced by the compiler and consists of a
single s-expression. Every s-expression consists of an opening /(/, a sequence of
elements separated by whitespace, and a closing /)/.
Elements of s-expressions are:
- Atoms: sequences of ASCII letters, digits or symbols
- Variables
- Strings: enclosed in double quotes and possibly escaped
- S-expressions: allowing arbitrary nesting
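The structure of s-expressions described above can be modelled with a small OCaml type. This is a hypothetical sketch (the constructor names and /to_string/ are ours, not the compiler's own printer), showing how the listed elements nest and print back as text.

#+BEGIN_SRC ocaml
(* Hypothetical model of the s-expressions described above. *)
type sexp =
  | Atom of string          (* ASCII letters, digits or symbols, e.g. makeblock *)
  | Var of string           (* variables, e.g. param/86 *)
  | Str of string           (* double-quoted, possibly escaped string *)
  | Nested of sexp list     (* "(" elements separated by whitespace ")" *)

let rec to_string (e : sexp) : string =
  match e with
  | Atom a | Var a -> a
  | Str s -> Printf.sprintf "%S" s      (* adds quotes and OCaml-style escapes *)
  | Nested es -> "(" ^ String.concat " " (List.map to_string es) ^ ")"

(* (field 0 param/90), as it appears in Lambda output *)
let example = Nested [ Atom "field"; Atom "0"; Var "param/90" ]
#+END_SRC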

The Lambda form is a key stage, where the compiler discards type
information and maps the original source code to the runtime memory
model described above.
In this stage of the compiler pipeline, pattern matching statements are
analyzed and compiled into an automaton.
\begin{comment}
highlight its centrality with respect to the thesis
\end{comment}
#+BEGIN_SRC 
type t = | Foo | Bar | Baz | Fred

let test = function
  | Foo -> "foo"
  | Bar -> "bar"
  | Baz -> "baz"
  | Fred -> "fred"
#+END_SRC
The Lambda output for this code can be obtained by running the
compiler with the /-dlambda/ flag:
#+BEGIN_SRC 
(setglobal Prova!
  (let
    (test/85 =
       (function param/86
         (switch* param/86
          case int 0: "foo"
          case int 1: "bar"
          case int 2: "baz"
          case int 3: "fred")))
    (makeblock 0 test/85)))
#+END_SRC
As outlined by the example, the /makeblock/ directive is responsible
for allocating low-level OCaml values, and every constant constructor
of the algebraic type /t/ is stored in memory as an integer.
The /setglobal/ directive declares a new binding in the global scope:
every concept of modules is lost at this stage of compilation.
The pattern matching compiler uses a jump table to map every pattern
matching clause to its target expression. Values are addressed by a
unique name, such as test/85 or param/86.
#+BEGIN_SRC 
type p = | Foo | Bar
type q = | Tata | Titi
type t = | English of p | French of q

let test = function
  | English Foo -> "foo"
  | English Bar -> "bar"
  | French Tata -> "baz"
  | French Titi -> "fred"
#+END_SRC
In the case of types with a small number of variants, the pattern
matching compiler may avoid the overhead of computing a jump table.
This example also highlights the fact that non-constant constructors
are mapped to blocks, which are discriminated using the /tag/ directive.
#+BEGIN_SRC 
(setglobal Prova!
  (let
    (test/89 =
       (function param/90
         (switch* param/90
          case tag 0: (if (!= (field 0 param/90) 0) "bar" "foo")
          case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz"))))
    (makeblock 0 test/89)))
#+END_SRC
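The representation claims above can be checked from OCaml itself with the /Obj/ module of the standard library, which exposes the runtime representation of values and is unsafe, so it is used here only as an illustration. The sketch reimplements by hand the switch of the Lambda output (switch on the block tag, then compare field 0 with 0) in a hypothetical function /test_lowered/, and checks it against /test/ on every value of type /t/: a small-scale form of translation validation by testing.

#+BEGIN_SRC ocaml
type p = | Foo | Bar
type q = | Tata | Titi
type t = | English of p | French of q

let test = function
  | English Foo -> "foo"
  | English Bar -> "bar"
  | French Tata -> "baz"
  | French Titi -> "fred"

(* Hand-written counterpart of the Lambda code: switch on the block tag,
   then compare field 0 (a constant constructor, i.e. an int) with 0. *)
let test_lowered (v : t) : string =
  let r = Obj.repr v in
  let inner : int = Obj.obj (Obj.field r 0) in
  match Obj.tag r with
  | 0 -> if inner <> 0 then "bar" else "foo"
  | _ -> if inner <> 0 then "fred" else "baz"

let all_values = [ English Foo; English Bar; French Tata; French Titi ]
#+END_SRC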
The Lambda language has several numeric types:
- integers: which use either 31- or 63-bit two's complement arithmetic,
  depending on the system word size, and wrap on overflow
- 32-bit and 64-bit integers: which use 32-bit and 64-bit two's complement arithmetic
  and wrap on overflow
- big integers: which offer arbitrary precision
- floats: which use IEEE 754 double-precision (64-bit) arithmetic, with
  the addition of the literals /infinity/, /neg_infinity/ and /nan/.

Various numeric operations are defined:

- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
- Numeric comparisons: <, >, <=, >=, ==

*** Functions

Functions are defined using the syntax (function ARG1 ARG2 ARG3 BODY),
as in the examples above, and close over all bindings in scope;
they are applied using the syntax (apply FUNC ARG ARG ARG).
Evaluation is eager.

*** Other atoms
The atom /let/ introduces a sequence of bindings at a smaller scope
than the global one:
(let BINDING BINDING BINDING ... BODY)

The Lambda form supports many other directives, such as /stringswitch/,
which constructs specialized jump tables for strings, integer range
comparisons and so on.
These constructs are deliberately left undocumented, because the Lambda
intermediate language can change across compiler releases.
\begin{comment}
Explain that the supported syntax is the one in the BNF
\end{comment}


** Pattern matching

Pattern matching is a widely adopted mechanism to interact with algebraic data types (ADTs).
C family languages provide branching on predicates through the use of
if statements and switch statements.
Pattern matching, on the other hand, expresses predicates through
syntactic templates that also allow binding variables to parts of data structures of
arbitrary shape. One common example of pattern matching is the use of regular
expressions on strings. OCaml provides pattern matching on ADTs and
primitive data types.
The result of a pattern matching operation is always one of:
- "this value does not match this pattern";
- "this value matches this pattern", resulting in the following bindings of
  names to values and in a jump to the expression pointed at by the
  pattern.

#+BEGIN_SRC 
type color = | Red | Blue | Green | Black | White

let print_color color =
  match color with
  | Red -> print_endline "red"
  | Blue -> print_endline "blue"
  | Green -> print_endline "green"
  | _ -> print_endline "white or black"
#+END_SRC

OCaml provides syntax to express data destructuring.
For example, we can examine the content of a list with pattern matching:

#+BEGIN_SRC 

let describe list =
  begin match list with
  | [ ] -> print_endline "empty list"
  | element1 :: [ ] -> print_endline "one element"
  | element1 :: (element2 :: [ ]) -> print_endline "two elements"
  | head :: tail -> print_endline "head followed by many elements"
  end
#+END_SRC

Parenthesized patterns, such as the third one in the previous example,
match the same values as the corresponding patterns without parentheses.

The same could be done with tuples:
#+BEGIN_SRC 

let describe_pair tuple =
  begin match tuple with
  | (Some _, Some _) -> print_endline "Pair of optional types"
  | (Some _, None) | (None, Some _) -> print_endline "Pair of optional types, one of which is None"
  | (None, None) -> print_endline "Pair of optional types, both None"
  end
#+END_SRC

The pattern pattern₁ | pattern₂ represents the logical "or" of the
two patterns pattern₁ and pattern₂. A value matches pattern₁ |
pattern₂ if it matches pattern₁ or pattern₂.

Pattern clauses can make use of /guards/ to test predicates, and
variables can be captured (bound in scope).

#+BEGIN_SRC 

begin match token_list with
| "switch"::var::"{"::rest -> ...
| "case"::":"::var::rest when is_int var -> ...
| "case"::":"::var::rest when is_string var -> ...
| "}"::[ ] -> ...
| "}"::rest -> error "syntax error: " rest

#+END_SRC

Moreover, the OCaml compiler emits a warning when a pattern matching is
not exhaustive or when some patterns are shadowed by preceding ones.

** Symbolic execution

Symbolic execution is a widely used technique in the field of
computer security.
It allows one to analyze different execution paths of a program
simultaneously, while tracking which inputs trigger the execution of
different parts of the program.
Inputs are modelled symbolically rather than taking "concrete" values.
A symbolic execution engine keeps track of expressions and variables
in terms of these symbols and attaches logical constraints to every
branch that is being followed.
Symbolic execution engines are used to track bugs by modelling the
domain of all possible inputs of a program, to detect infeasible
paths and dead code, and to prove that two code segments are equivalent.
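As a minimal illustration, assuming a hypothetical toy language with a single integer input x, conditionals of the form if x < c and string results, the following OCaml sketch explores every path of a program while keeping the path constraint on x as an interval. Paths whose constraint becomes empty are infeasible and are discarded, which is how such an engine detects dead code. The types and the /explore/ function are illustrative names.

#+BEGIN_SRC ocaml
(* A hypothetical toy program over one integer input x. *)
type prog =
  | Return of string
  | If_lt of int * prog * prog        (* if x < c then p1 else p2 *)

(* Path constraint on the symbolic input x: lo <= x <= hi. *)
type interval = { lo : int; hi : int }

let full = { lo = min_int; hi = max_int }

(* Explore all feasible paths, returning each path constraint with its result.
   Assumes c > min_int, so that c - 1 does not overflow. *)
let rec explore (pi : interval) (p : prog) : (interval * string) list =
  match p with
  | Return r -> [ (pi, r) ]
  | If_lt (c, p1, p2) ->
    let then_pi = { pi with hi = min pi.hi (c - 1) } in   (* x < c  *)
    let else_pi = { pi with lo = max pi.lo c } in         (* x >= c *)
    let feasible i = i.lo <= i.hi in
    (if feasible then_pi then explore then_pi p1 else [])
    @ (if feasible else_pi then explore else_pi p2 else [])

let example =
  If_lt (0, Return "negative", If_lt (10, Return "small", Return "large"))
#+END_SRC

In a program such as If_lt (0, If_lt (5, Return "a", Return "dead"), Return "b"), the branch returning "dead" requires both x < 0 and x >= 5, so it never appears among the explored paths.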

Let's take as an example this signedness bug, which was found in the
FreeBSD kernel and allowed reading portions of kernel memory when
calling the getpeername function.
#+BEGIN_SRC
int compat;
{
    struct file *fp;
    register struct socket *so;
    struct sockaddr *sa;
    int len, error;

    ...

    len = MIN(len, sa->sa_len);    /* [1] */
    error = copyout(sa, (caddr_t)uap->asa, (u_int)len);
    if (error)
        goto bad;

    ...

bad:
    if (sa)
        FREE(sa, M_SONAME);
    fdrop(fp, p);
    return (error);
}
#+END_SRC

The tree of the execution when the function is evaluated considering
/int len/ our symbolic variable α, sa->sa_len as symbolic variable β
and π as the set of constraints on a symbolic variable:

#+BEGIN_SRC
[1]              compat (...)        { π_{α}: -∞ < α < ∞ }
                   |
[2]              min (σ₁, σ₂)        { π_{σ}: -∞ < (σ₁,σ₂) < ∞ ; π_{α}: -∞ < α < β ; π_{β}: ...}
                   |
[3]             cast(u_int) (...)    { π_{σ}: 0 ≤ (σ) < ∞ ; π_{α}: -∞ < α < β ; π_{β}: ...}
                   |   
                  ... // rest of the execution
#+END_SRC
We can see that at step 3 the set of possible values of the symbolic variable
α is bigger than the set of possible values of the input σ to the
/cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ produces a wrong
result when /len/ (α) is a negative number, outside the domain π_{σ}: in C, converting a
negative int to an unsigned int wraps around to a very large value, so copyout copies
far more bytes than intended, which made the exploitation possible.

Every step of evaluation can be modelled as the following transition:
#+BEGIN_SRC
(π_{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ)
#+END_SRC
If we express the π constraints as logical formulas, we can model the
execution of the program in terms of Hoare logic.
The state of the computation is a Hoare triple {P}C{Q}, where P and Q are
respectively the /precondition/ and the /postcondition/ that
constitute the assertions of the program, and C is the directive being
executed.
The language of the assertions P is:
#+BEGIN_SRC
P ::= true | false | a < b | P₁ ∧ P₂ | P₁ ∨ P₂ | ~P
#+END_SRC
where a and b are numbers.
In the Hoare rules assertions could also take the form of
#+BEGIN_SRC
P ::= ∀i. P | ∃i. P | P₁ ⇒ P₂
#+END_SRC
where i is a logical variable, but assertions of these kinds increase
the complexity of the symbolic engine.
Execution follows the rules of Hoare logic:
- Empty statement:
\begin{verbatim}
————————————
{P}skip{P}
\end{verbatim}
- Assignment statement: P[a/x], the assertion P in which every
  occurrence of x is replaced by a, holds before the assignment exactly
  when P holds after it.
\begin{verbatim}
————————————
{P[a/x]}x:=a{P}
\end{verbatim}

- Composition: c₁ and c₂ are directives that are executed in order;
  {R} is the /midcondition/.
\begin{verbatim}
{P}c₁{R}, {R}c₂{Q}
——————————————————
   {P}c₁;c₂{Q}
\end{verbatim}

- Conditional:
\begin{verbatim}
 {P∧b}c₁{Q}, {P∧~b}c₂{Q}
————————————————————————
{P}if b then c₁ else c₂{Q}
\end{verbatim}

- Loop: {P} is the loop invariant. When the loop terminates, /P/ still
  holds and ~b caused the loop to end.
\begin{verbatim}
 {P∧b}c{P}
————————————————————————
{P}while b do c{P∧~b}
\end{verbatim}
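The skip, assignment, composition and conditional rules can be turned into a small weakest-precondition calculator. The following OCaml sketch is hypothetical: it extends the assertion language with integer terms over program variables (the grammar above only compares numbers), implements P[a/x] by substitution, and leaves out loops, whose rule needs a user-supplied invariant.

#+BEGIN_SRC ocaml
(* Integer terms: constants, program variables and sums. *)
type term = Num of int | Var of string | Add of term * term

(* Assertions, following the grammar given earlier, over terms. *)
type assertion =
  | True
  | False
  | Lt of term * term
  | And of assertion * assertion
  | Or of assertion * assertion
  | Not of assertion

type cmd =
  | Skip
  | Assign of string * term           (* x := a *)
  | Seq of cmd * cmd                  (* c1; c2 *)
  | If of assertion * cmd * cmd       (* if b then c1 else c2 *)

let rec subst_term x a = function
  | Var y when y = x -> a
  | (Num _ | Var _) as t -> t
  | Add (t1, t2) -> Add (subst_term x a t1, subst_term x a t2)

(* P[a/x]: replace every occurrence of variable x by term a. *)
let rec subst x a = function
  | (True | False) as p -> p
  | Lt (t1, t2) -> Lt (subst_term x a t1, subst_term x a t2)
  | And (p, q) -> And (subst x a p, subst x a q)
  | Or (p, q) -> Or (subst x a p, subst x a q)
  | Not p -> Not (subst x a p)

(* Weakest precondition: the weakest P such that {P} c {q} holds. *)
let rec wp (c : cmd) (q : assertion) : assertion =
  match c with
  | Skip -> q
  | Assign (x, a) -> subst x a q
  | Seq (c1, c2) -> wp c1 (wp c2 q)    (* wp c2 q is the midcondition *)
  | If (b, c1, c2) -> Or (And (b, wp c1 q), And (Not b, wp c2 q))
#+END_SRC

For example, for y := x + 1; z := y and the postcondition 0 < z, the calculator yields the precondition 0 < x + 1.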

Even if the semantics of symbolic execution engines are well defined,
the user may run into several complications when applying such
analyses to non-trivial codebases.
For example, depending on the domain, loop termination is not
guaranteed. Even when termination is guaranteed, looping causes
exponential branching that may lead to path explosion or state
explosion.
Reasoning about all possible executions of a program is not always
feasible, and in case of explosion symbolic execution engines usually
implement heuristics to reduce the size of the search space.

** Translation validation
Translators, such as compilers and code generators, are huge pieces of
software, usually consisting of multiple subsystems, and
constructing an actual specification of a translator implementation for
formal validation is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators that consists of taking the source and the target
(compiled) program and proving /a posteriori/ their semantic equivalence.

- [ ] Techniques for translation validation
- [ ] What does semantically equivalent mean
- [ ] What happens when there is no semantic equivalence
- [ ] Translation validation through symbolic execution

* Translation validation of the Pattern Matching Compiler

** Source program
The algorithm takes as its input a source program and translates it
into an algebraic data structure called /decision_tree/.

#+BEGIN_SRC 
type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_blackbox * decision_tree * decision_tree
  | Node of accessor * (constructor * decision_tree) list * decision_tree
#+END_SRC

Unreachable, Leaf of source_expr and Failure are the terminals of the tree.
We distinguish:
- Unreachable: it is statically known that no value can go there
- Failure: a value matching this part results in an error
- Leaf: a value matching this part results in the evaluation of a
  source blackbox of OCaml code

The algorithm doesn't support type-declaration-based analysis 
to know the list of constructors at a given type.
Let's consider some trivial examples:

#+BEGIN_SRC 
function true -> 1
#+END_SRC

[ ] Convert to diagrams

Is translated to
|Node (Root, [(true, Leaf 1)], Failure)
while
#+BEGIN_SRC 
function 
true -> 1 
| false -> 2
#+END_SRC
will give
|Node (Root, [(true, Leaf 1); (false, Leaf 2)], Failure)

It is possible to produce Unreachable terminals by using
refutation clauses (a "dot" on the right-hand side):
#+BEGIN_SRC 
function 
true -> 1 
| false -> 2 
| _ -> .
#+END_SRC
which is translated into
|Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable)

We trust this annotation. This is reasonable because the type-checker
verifies that it indeed holds.
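To make the examples above concrete, the following OCaml sketch instantiates the abstract types of /decision_tree/ and encodes two of the examples as values. The string representations of source_expr, source_blackbox and constructor, the accessor type, and the use of Root as the accessor of the scrutinee are assumptions made for illustration, not the prototype's definitions. The helper terminals is hypothetical.

#+BEGIN_SRC ocaml
(* Hypothetical concrete choices for the types used by decision_tree. *)
type source_expr = string
type source_blackbox = string
type constructor = string

(* Access path into the scrutinee: Root is the scrutinee itself and
   Field (i, a) is the i-th field of the value reached through a. *)
type accessor = Root | Field of int * accessor

type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_blackbox * decision_tree * decision_tree
  | Node of accessor * (constructor * decision_tree) list * decision_tree

(* function true -> 1 *)
let partial = Node (Root, [ ("true", Leaf "1") ], Failure)

(* function true -> 1 | false -> 2 | _ -> . *)
let refuted =
  Node (Root, [ ("true", Leaf "1"); ("false", Leaf "2") ], Unreachable)

(* Collect the terminals (Unreachable, Failure, Leaf) from left to right. *)
let rec terminals = function
  | (Unreachable | Failure | Leaf _) as t -> [ t ]
  | Guard (_, if_true, if_false) -> terminals if_true @ terminals if_false
  | Node (_, children, fallback) ->
      List.concat_map (fun (_, t) -> terminals t) children @ terminals fallback
#+END_SRC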

Guard nodes are emitted whenever a guard is found. A Guard node
contains a black box of code that is never evaluated, and two
branches: one is taken when the guard evaluates to true, and
the other contains the path taken when the guard evaluates to
false.

[ ] Finish with Node
[ ] Explain the fallback
[ ] Revise the compilation to take your tree into account instead of Lambda
[ ] Specify that the same algorithm is used to compile to Lambda, plus optimizations

The source code of a pattern matching function has the
following form:

|match variable with
|\vert pattern₁ -> expr₁
|\vert pattern₂ when guard -> expr₂
|\vert pattern₃ as var -> expr₃ 
|⋮
|\vert pₙ -> exprₙ

and can include any expression that is legal for the OCaml compiler,
such as /when/ guards and /as/ assignments. Patterns may or may not
be exhaustive.

Pattern matching code can also be written using the more compact form:
|function
|\vert pattern₁ -> expr₁
|\vert pattern₂ when guard -> expr₂
|\vert pattern₃ as var -> expr₃ 
|⋮
|\vert pₙ -> exprₙ


The following BNF grammar formally describes the source program:

#+BEGIN_SRC bnf
start ::= "match" id "with" patterns | "function" patterns
patterns ::= (pattern0|pattern1) pattern1*
;; pattern0 and pattern1 are needed to distinguish the first clause, for which
;; the leading vertical bar is optional
pattern0 ::= clause 
pattern1 ::= "|" clause
clause ::= lexpr "->" rexpr

lexpr ::= rule (ε|condition)
rexpr ::= _code ;; arbitrary code

rule ::= wildcard|variable|constructor_pattern|or_pattern

;; rules
wildcard ::= "_"
variable ::= identifier
constructor_pattern ::= constructor (rule|ε) (assignment|ε)

constructor ::= int|float|char|string|bool
                |unit|record|exn|objects|ref
                |list|tuple|array
                |variant|parameterized_variant ;;  data types

or_pattern ::= rule ("|" (wildcard|variable|constructor_pattern))+

condition ::= "when" bexpr 
assignment ::= "as" id
bexpr ::= _code ;; arbitrary code
#+END_SRC

\begin{comment}
Check that it is still consistent with this BNF
\end{comment}

Patterns take one of the following forms:
| pattern         | type of pattern     |
|-----------------+---------------------|
| _               | wildcard            |
| x               | variable            |
| c(p₁,p₂,...,pₙ) | constructor pattern |
| (p₁\vert p₂)    | or-pattern          |

During compilation, right-hand-side expressions are translated into
Lambda code and are referred to as lambda code actions lᵢ.

The entire pattern matching code is represented as a clause matrix
that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to
lambda code actions lᵢ:
\begin{equation*}
(P → L) =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & →  l₂ \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
\end{pmatrix}
\end{equation*}

The pattern /p/ matches a value /v/, written as p ≼ v, when one of the
following rules applies:

|--------------------+---+--------------------+-------------------------------------------|
| _                  | ≼ | v                  | ∀v                                        |
| x                  | ≼ | v                  | ∀v                                        |
| (p₁ \vert\ p₂)     | ≼ | v                  | iff p₁ ≼ v or p₂ ≼ v                      |
| c(p₁, p₂, ..., pₐ) | ≼ | c(v₁, v₂, ..., vₐ) | iff (p₁, p₂, ..., pₐ) ≼ (v₁, v₂, ..., vₐ) |
| (p₁, p₂, ..., pₐ)  | ≼ | (v₁, v₂, ..., vₐ)  | iff pᵢ ≼ vᵢ ∀i ∈ [1..a]                   |
|--------------------+---+--------------------+-------------------------------------------|

When a value /v/ matches pattern /p/ we say that /v/ is an /instance/ of /p/.
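The matching relation p ≼ v can be transcribed almost literally into OCaml. In this minimal sketch a value is modelled as a constructor name applied to a list of sub-values, so constants are constructors without arguments. The types and the function names matches and matches_tuple are illustrative assumptions.

#+BEGIN_SRC ocaml
type value = C of string * value list

type pattern =
  | Any                          (* _ *)
  | Var of string                (* x *)
  | Or of pattern * pattern      (* (p1 | p2) *)
  | Con of string * pattern list (* c(p1, ..., pa) *)

(* p ≼ v, one match arm per rule of the table. *)
let rec matches p v =
  match p, v with
  | (Any | Var _), _ -> true
  | Or (p1, p2), _ -> matches p1 v || matches p2 v
  | Con (c, ps), C (c', vs) -> c = c' && matches_tuple ps vs

(* (p1, ..., pa) ≼ (v1, ..., va) iff pi ≼ vi for every i. The length check
   runs first, so List.for_all2 never sees lists of different lengths. *)
and matches_tuple ps vs =
  List.length ps = List.length vs && List.for_all2 matches ps vs
#+END_SRC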

Considering the pattern matrix P, we say that the value vector
$\vec{v}$ = (v₁, v₂, ..., vₙ) matches row i of P if and only if the following two
conditions are satisfied:
- (p_{i,1}, p_{i,2}, \cdots, p_{i,n}) ≼ (v₁, v₂, ..., vₙ)
- ∀j < i, (p_{j,1}, p_{j,2}, \cdots, p_{j,n}) ⋠ (v₁, v₂, ..., vₙ)
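The row-selection condition amounts to a search for the first matching row. Because rows are scanned in order, returning the first match also guarantees that no earlier row matches. The value and pattern types are illustrative assumptions, and first_match is a hypothetical name.

#+BEGIN_SRC ocaml
type value = C of string * value list

type pattern =
  | Any
  | Var of string
  | Or of pattern * pattern
  | Con of string * pattern list

let rec matches p v =
  match p, v with
  | (Any | Var _), _ -> true
  | Or (p1, p2), _ -> matches p1 v || matches p2 v
  | Con (c, ps), C (c', vs) -> c = c' && matches_row ps vs

(* A row (p_{i,1}, ..., p_{i,n}) matches (v1, ..., vn) componentwise. *)
and matches_row ps vs =
  List.length ps = List.length vs && List.for_all2 matches ps vs

(* 1-based index of the first row matching the value vector, so that no
   row j < i matches; None when no row matches. *)
let first_match (rows : pattern list list) (vs : value list) : int option =
  let rec go i = function
    | [] -> None
    | row :: rest -> if matches_row row vs then Some i else go (i + 1) rest
  in
  go 1 rows
#+END_SRC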

We can define the following three relations between patterns:
- Pattern p is less precise than pattern q, written p ≼ q, when all
  instances of q are instances of p
- Patterns p and q are equivalent, written p ≡ q, when they have the
  same instances
- Patterns p and q are compatible when they share a common instance
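Compatibility can be decided syntactically on the pattern forms used here. The sketch below assumes patterns without type information and an illustrative pattern type. A wildcard or variable shares an instance with any pattern. An or-pattern is compatible with q when one of its branches is. Two constructor patterns are compatible when their heads agree and their arguments are pairwise compatible. The function name compatible is hypothetical.

#+BEGIN_SRC ocaml
type pattern =
  | Any
  | Var of string
  | Or of pattern * pattern
  | Con of string * pattern list

(* true when p and q share at least one instance; every pattern of this
   type has an instance, so wildcards and variables are compatible with
   anything. *)
let rec compatible p q =
  match p, q with
  | (Any | Var _), _ | _, (Any | Var _) -> true
  | Or (p1, p2), _ -> compatible p1 q || compatible p2 q
  | _, Or (q1, q2) -> compatible p q1 || compatible p q2
  | Con (c, ps), Con (c', qs) ->
      c = c'
      && List.length ps = List.length qs
      && List.for_all2 compatible ps qs
#+END_SRC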

\subsubsection{Parsing of the source program}

Source programs of the following general form are parsed using a parser
generated by Menhir, an LR(1) parser generator for the OCaml programming language.
Menhir compiles an LR(1) grammar specification, in this case the OCaml pattern matching
grammar, down to OCaml code.

|match variable with
|\vert pattern₁ -> e₁
|\vert pattern₂ -> e₂
|⋮
|\vert pₘ -> eₘ

Parsing, when successful, produces a list of clauses
and a list of type declarations.
Every clause consists of three objects: a left-hand side, which is the
pattern, an optional guard, and a right-hand-side expression.
Patterns are encoded in the following way:
| pattern         | type        |
|-----------------+-------------|
| _               | Wildcard    |
| p₁ as x         | Assignment  |
| c(p₁,p₂,...,pₙ) | Constructor |
| (p₁\vert p₂)    | Orpat       |

Guards and right-hand sides are treated as blackboxes of OCaml code.
A sound approach for handling these blackboxes would be to inspect the
OCaml compiler during the translation to Lambda code and extract the
blackboxes compiled into their Lambda representation.
This would allow us to test them for equality with the corresponding blackboxes at
the target level.
Given that this level of introspection is currently not possible, we
decided to restrict the structure of blackboxes to the following (valid) OCaml
code:

#+BEGIN_SRC 
external guard : 'a -> 'b = "guard"
external observe : 'a -> 'b = "observe"
#+END_SRC 

We assume that the two external functions /guard/ and /observe/ have the
type \texttt{'a -> 'b}, which lets the user pass any number of arguments to them.
All guards are of the form \texttt{guard <arg> <arg> ...}, where the
<arg>s are expressed using the OCaml pattern matching language.
Similarly, all right-hand-side expressions are of the form
\texttt{observe <arg> <arg> ...}, with the same constraints on arguments.

#+BEGIN_SRC 
type t = Z | S of t

let _ = function
  | Z -> observe 0
  | S Z -> observe 1
  | S x when guard x -> observe 2
  | S (S x) as y when guard x y -> observe 3
  | S _ -> observe 4
#+END_SRC

Once parsed, the type declarations and the list of clauses are encoded in the form of a matrix
that is later evaluated using a matrix decomposition algorithm.
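The following is a minimal sketch of how parsed clauses could be represented. It reuses the constructor names of the encoding table (Wildcard, Assignment, Constructor, Orpat). The record fields, the representation of a blackbox as a function name plus argument list, and the encoding of a variable x as (_ as x) are assumptions made for illustration, not the parser's actual output. The fourth clause of the example is encoded as a value, and the hypothetical helper bound_vars lists the variables it binds.

#+BEGIN_SRC ocaml
type pattern =
  | Wildcard
  | Assignment of pattern * string       (* p as x *)
  | Constructor of string * pattern list (* c(p1, ..., pn) *)
  | Orpat of pattern * pattern           (* (p1 | p2) *)

(* A blackbox: the external called (guard or observe) and its arguments. *)
type blackbox = { fn : string; args : string list }

type clause = { lhs : pattern; guard : blackbox option; rhs : blackbox }

(* Assumption: a variable pattern x is encoded as (_ as x). *)
let var x = Assignment (Wildcard, x)

(* | S (S x) as y when guard x y -> observe 3 *)
let clause4 =
  {
    lhs = Assignment (Constructor ("S", [ Constructor ("S", [ var "x" ]) ]), "y");
    guard = Some { fn = "guard"; args = [ "x"; "y" ] };
    rhs = { fn = "observe"; args = [ "3" ] };
  }

(* Variables bound by a pattern, from left to right. *)
let rec bound_vars = function
  | Wildcard -> []
  | Assignment (p, x) -> bound_vars p @ [ x ]
  | Constructor (_, ps) -> List.concat_map bound_vars ps
  | Orpat (p, _) -> bound_vars p (* OCaml requires both branches to bind the same variables *)
#+END_SRC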

\subsubsection{Matrix decomposition of pattern clauses}

The initial input of the decomposition algorithm C consists of a vector of variables
$\vec{x}$ = (x₁, x₂, ..., xₙ) of size /n/, where /n/ is the arity of
the type of /x/, and a clause matrix P → L of width n and height m.
That is:

\begin{equation*}
C\left(\vec{x} = (x₁, x₂, ..., xₙ),
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & → l₂ \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
\end{pmatrix}\right)
\end{equation*}

The base case C₀ of the algorithm is the case in which $\vec{x}$
is empty, that is $\vec{x}$ = (). The result of the compilation
C₀ is then l₁, the action of the first row:
\begin{equation*}
C₀\left((),
\begin{pmatrix}
→ l₁ \\
→ l₂ \\
→ \vdots \\
→ lₘ
\end{pmatrix}\right) = l₁
\end{equation*}

When $\vec{x}$ ≠ (), the compilation proceeds using one of the
following four rules:

1) Variable rule: if all patterns in the first column of P are wildcard patterns or
   bind the value to a variable, then 

    \begin{equation*}
    C(\vec{x}, P → L) = C((x₂, x₃, ..., xₙ), P' → L')
    \end{equation*}
    where
    \begin{equation*}
    P' → L' =
    \begin{pmatrix}
    p_{1,2} & \cdots & p_{1,n} & → (\mathtt{let}\ (y₁\ x₁)\ l₁) \\
    p_{2,2} & \cdots & p_{2,n} & → (\mathtt{let}\ (y₂\ x₁)\ l₂) \\
    \vdots & \ddots & \vdots & → \vdots \\
    p_{m,2} & \cdots & p_{m,n} & → (\mathtt{let}\ (yₘ\ x₁)\ lₘ)
    \end{pmatrix}
    \end{equation*}

    That is, every lambda action lᵢ is wrapped in a binding of x₁ to the
    variable yᵢ that appears in the pattern $p_{i,1}$. Bindings are omitted
    for wildcard patterns, and in that case the lambda action lᵢ remains unchanged.

2) Constructor rule: if all patterns in the first column of P are
   constructor patterns of the form k(q₁, q₂, ..., qₐ), where a is the
   arity of k, we define for every constructor c a new matrix, the
   specialized clause matrix S(c), by applying the following
   transformation to the rows of P:
    \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    for every c ∈ set of constructors in the first column
        S(c) ← empty matrix
        for i ← 1 .. m
            let kᵢ ← constructor_of($p_{i,1}$)
            if kᵢ = c then
                append $q_{i,1}$, ..., $q_{i,a}$, $p_{i,2}$, ..., $p_{i,n}$ → lᵢ to S(c)
    \end{lstlisting}
   Patterns of the form $q_{i,j}$ match the fields of the
   constructor, so we introduce fresh variables y₁, y₂, ..., yₐ bound to
   those fields, and the lambda action lᵢ becomes

\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    (let (y₁ (field 0 x₁))
         (y₂ (field 1 x₁))
         ...
         (yₐ (field (a-1) x₁))
         lᵢ)
\end{lstlisting}

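   and the result of the compilation for the set of constructors
   {c₁, c₂, ..., cₖ} is:

\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    switch x₁ with
    case c₁: C((y₁, ..., yₐ, x₂, ..., xₙ), S(c₁))
    case c₂: C((y₁, ..., yₐ, x₂, ..., xₙ), S(c₂))
    ...
    case cₖ: C((y₁, ..., yₐ, x₂, ..., xₙ), S(cₖ))
    default: exit
\end{lstlisting}

   where, in each case, a is the arity of the constructor being tested,
   and the default branch exits when x₁ matches none of c₁, ..., cₖ.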

3) Orpat rule: there are various strategies for dealing with
   or-patterns. The most naive one is to split them.
   For example, a row i containing an or-pattern in its first column:
   \begin{equation*}
   (p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,n} → lᵢ
   \end{equation*}
   is replaced, at the same position in the clause matrix, by three rows:
   \begin{gather*}
   p_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ \\
   q_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ \\
   r_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ
   \end{gather*}
   Keeping the new rows in place preserves the first-match semantics,
   at the cost of duplicating the action lᵢ.
4) Mixture rule:
   When none of the previous rules applies, the clause matrix P → L is
   split into two clause matrices. The first, P₁ → L₁, is the
   largest prefix matrix to which one of the three previous rules
   applies. The second, P₂ → L₂, contains the remaining rows. The algorithm is
   applied to both matrices, and an exit from the first one continues
   with the code compiled from the second.
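The rules above can be sketched in OCaml as a compiler from clause matrices to decision trees. The sketch makes several simplifying assumptions:
- it returns the index of the selected row instead of Lambda actions;
- it omits the let-bindings introduced by the variable and constructor rules;
- it replaces the mixture rule's split into prefix matrices with a default matrix (the rows whose first pattern is a wildcard or a variable), used as the fallback of every node, as in Maranget's compilation to decision trees.

All names are hypothetical.

#+BEGIN_SRC ocaml
type pattern = Any | Var of string | Or of pattern * pattern | Con of string * pattern list

(* Root is the scrutinee, Field (i, a) the i-th field of the value at a. *)
type accessor = Root | Field of int * accessor

(* Decision trees whose leaves carry the index of the selected row. *)
type tree = Failure | Leaf of int | Node of accessor * (string * tree) list * tree

type row = pattern list * int (* patterns of one row, action index *)

(* Or-pattern rule: one row per alternative, kept in place. *)
let rec split_or ((ps, l) : row) : row list =
  match ps with
  | Or (p, q) :: rest -> split_or (p :: rest, l) @ split_or (q :: rest, l)
  | _ -> [ (ps, l) ]

(* Head constructors of the first column, with their arity, in order. *)
let heads (rows : row list) : (string * int) list =
  List.fold_left
    (fun acc (ps, _) ->
      match ps with
      | Con (c, args) :: _ when not (List.mem_assoc c acc) ->
          acc @ [ (c, List.length args) ]
      | _ -> acc)
    [] rows

(* Specialized matrix S(c): rows that can match c, first pattern
   replaced by its a sub-patterns (a wildcard becomes a wildcards). *)
let specialize c a (rows : row list) : row list =
  List.filter_map
    (fun (ps, l) ->
      match ps with
      | Con (c', qs) :: rest when c = c' -> Some (qs @ rest, l)
      | (Any | Var _) :: rest -> Some (List.init a (fun _ -> Any) @ rest, l)
      | _ -> None)
    rows

(* Default matrix: rows whose first pattern matches any value. *)
let default (rows : row list) : row list =
  List.filter_map
    (fun (ps, l) ->
      match ps with (Any | Var _) :: rest -> Some (rest, l) | _ -> None)
    rows

let rec compile (xs : accessor list) (rows : row list) : tree =
  match xs, List.concat_map split_or rows with
  | _, [] -> Failure (* no row left: no match *)
  | [], (_, l) :: _ -> Leaf l (* base case: action of the first row *)
  | x :: xs', rows ->
      let fields a = List.init a (fun i -> Field (i, x)) in
      let cases =
        List.map
          (fun (c, a) -> (c, compile (fields a @ xs') (specialize c a rows)))
          (heads rows)
      in
      let fallback = compile xs' (default rows) in
      (* variable rule when the first column contains no constructor *)
      if cases = [] then fallback else Node (x, cases, fallback)
#+END_SRC

For instance, compile [Root] [([Con ("true", [])], 1)] yields Node (Root, [("true", Leaf 1)], Failure). This mirrors the first example of this section, with the accessor made explicit.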

* Correctness of the algorithm
Running a program tₛ or its translation 〚tₛ〛 against an input vₛ
produces a result /r/ as follows:
| ( 〚tₛ〛ₛ(vₛ) = Cₛ(vₛ) ) → r
| tₛ(vₛ) → r
Likewise
| ( 〚tₜ〛ₜ(vₜ) = Cₜ(vₜ) ) → r
| tₜ(vₜ) → r
where result r ::= guard list * (Match blackbox | NoMatch | Absurd)
and guard ::= blackbox.
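The shape of results can be sketched as OCaml types. The block also includes a hypothetical evaluator for a reduced decision tree that switches only on the head constructor of the scrutinee. It relies on two assumptions. First, guards are blackboxes, so their truth value comes from an oracle function. Second, the terminals Failure and Unreachable are mapped to NoMatch and Absurd respectively.

#+BEGIN_SRC ocaml
type blackbox = string
type guard = blackbox

(* r ::= guard list * (Match blackbox | NoMatch | Absurd) *)
type outcome = Match of blackbox | NoMatch | Absurd
type result = guard list * outcome

(* Reduced tree: switches test the head constructor of the scrutinee only. *)
type tree =
  | Unreachable
  | Failure
  | Leaf of blackbox
  | Guard of blackbox * tree * tree
  | Node of (string * tree) list * tree

(* Run [t] on a value whose head constructor is [head]. Every guard that
   is evaluated is appended to the queue, in evaluation order. *)
let run (oracle : guard -> bool) (head : string) (t : tree) : result =
  let rec go gs = function
    | Leaf bb -> (List.rev gs, Match bb)
    | Failure -> (List.rev gs, NoMatch)
    | Unreachable -> (List.rev gs, Absurd)
    | Guard (g, if_true, if_false) ->
        go (g :: gs) (if oracle g then if_true else if_false)
    | Node (children, fallback) ->
        go gs (Option.value (List.assoc_opt head children) ~default:fallback)
  in
  go [] t
#+END_SRC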

Given an equivalence relation between two inputs, one
expressed in the source language and the other in the target language,
  vₛ ≃ vₜ    (TODO define, this talks about the representation of source values in the target)

we can define the equivalence between a pair of programs or a pair
of decision trees:
| tₛ ≃ tₜ := ∀vₛ≃vₜ, tₛ(vₛ) = tₜ(vₜ)
| Cₛ ≃ Cₜ := ∀vₛ≃vₜ, Cₛ(vₛ) = Cₜ(vₜ)

The proposed equivalence algorithm works on a pair of
decision trees and returns either /Yes/ or /No(vₛ, vₜ)/, where vₛ and
vₜ are a pair of counterexample inputs for which the two decision
trees produce different results.

** Statements
Theorem. We say that the translation of a source program to a decision tree
is correct when, for every possible input, the source program and its
decision tree produce the same result:

| ∀vₛ, tₛ(vₛ) = 〚tₛ〛ₛ(vₛ)


Likewise, for the target language:

| ∀vₜ, tₜ(vₜ) = 〚tₜ〛ₜ(vₜ)

Definition: in the presence of guards, we say that two results are
equivalent modulo the guard queue /gs/, written /r₁ ≃gs r₂/, when:
| (gs₁, r₁) ≃gs (gs₂, r₂)  ⇔  (gs₁, r₁) = (gs₂ ++ gs, r₂)
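The following is a direct transcription of this definition. Guard queues are represented as OCaml lists, and ++ becomes list concatenation. The function name and the outcome type are illustrative.

#+BEGIN_SRC ocaml
type outcome = Match of string | NoMatch | Absurd

(* (gs1, r1) ≃gs (gs2, r2)  ⇔  gs1 = gs2 ++ gs and r1 = r2 *)
let equiv_modulo_guards gs (gs1, r1) (gs2, r2) =
  (gs1 = gs2 @ gs) && r1 = r2
#+END_SRC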

Definition: we say that Cₜ covers the input space /S/, written
/covers(Cₜ, S)/, when every value vₛ∈S is a valid input to the
decision tree Cₜ. (TODO: rephrase)

Theorem: given an input space /S/ and a pair of decision trees, where
the target decision tree Cₜ covers the input space /S/, we say that
the two decision trees are equivalent when:

| equiv(S, Cₛ, Cₜ, gs) = Yes ∧ covers(Cₜ, S) → ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) ≃gs Cₜ(vₜ)

Similarly, we say that a pair of decision trees, given an
input space /S/, are /not/ equivalent when:

| equiv(S, Cₛ, Cₜ, gs) = No(vₛ,vₜ) ∧ covers(Cₜ, S) → vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠gs Cₜ(vₜ)

Corollary: for a full input space /S/, that is, the universe of the
target program, we have:

| equiv(S, 〚tₛ〛ₛ, 〚tₜ〛ₜ, ∅) = Yes  ⇔  tₛ ≃ tₜ


*** Proof of the correctness of the translation from source programs to source decision trees

We define a source term tₛ as a collection of patterns pointing to blackboxes
| tₛ ::= (p → bb)^{i∈I}

A pattern is defined as either a constructor pattern, an or-pattern or
a constant pattern:
| p ::= K(pᵢ)^{i∈I} \vert (p \vert q) \vert n ∈ ℕ

A decision tree is defined as either a Leaf, a Failure terminal or
an intermediate node with different children sharing the same accessor /a/
and an optional fallback.
Failure is emitted only when the patterns do not cover the whole set of
possible input values /S/. The fallback is not needed when the user
does not use a wildcard pattern.

| Cₛ ::= Leaf bb \vert Failure \vert Node(a, (Kᵢ → Cᵢ)^{i∈I}, C?)
| a ::= Here \vert n.a
| vₛ ::= K(vᵢ)^{i∈I} \vert n ∈ ℕ

\begin{comment}
Are K and Here clear here?
\end{comment}

We define the decomposition matrix /mₛ/ as
| SMatrix mₛ := (aⱼ)^{j∈J}, ((pᵢⱼ)^{j∈J} → bbᵢ)^{i∈I}
\begin{comment}
Fix this by taking the accessor into account
\end{comment}

We define the decision tree of source programs
  〚tₛ〛
in terms of the decision tree of pattern matrices
  〚mₛ〛
as follows:
  〚(pᵢ → bbᵢ)^{i∈I}〛 := 〚(Root), (pᵢ → bbᵢ)^{i∈I}〛

Decision trees computed from pattern matrices respect the following invariant:
| ∀v (vᵢ)^{i∈I} = v(aᵢ)^{i∈I} → 〚m〛(v) = m(vᵢ)^{i∈I} for m = ((aᵢ)^{i∈I}, (rᵢ)^{i∈I})
where 
| v(Here) = v
| K(vᵢ)ⁱ(k.a) = vₖ(a) if k ∈ [0;n[
\begin{comment}
TODO: EXPLAIN
\end{comment}
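Accessor evaluation v(a) can be sketched as follows. Values are assumed to be either a constructor applied to sub-values or a natural number, as in the grammar of vₛ. Returning an option for paths that do not exist in v is a choice of this sketch. The names K, N, Dot and get are illustrative.

#+BEGIN_SRC ocaml
(* vₛ ::= K(vᵢ)^{i∈I} | n ∈ ℕ *)
type value = K of string * value list | N of int

(* a ::= Here | n.a *)
type accessor = Here | Dot of int * accessor

(* v(Here) = v and K(vᵢ)ⁱ(k.a) = vₖ(a) when k ∈ [0; n[ *)
let rec get (v : value) (a : accessor) : value option =
  match a, v with
  | Here, _ -> Some v
  | Dot (k, a'), K (_, vs) when k >= 0 && k < List.length vs ->
      get (List.nth vs k) a'
  | Dot _, _ -> None
#+END_SRC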

We proceed to show the correctness of the invariant by case
analysis.

Base cases:
1. 〚∅, (∅ → bbᵢ)ⁱ〛 := Leaf bbᵢ where i := min(I), that is, a
   decision tree 〚mₛ〛 defined by an empty accessor vector and empty
   patterns pointing to blackboxes /bbᵢ/. This respects the invariant
   because a decomposition matrix with empty rows returns
   the first expression, and we know that (Leaf bb)(v) := Match bb
2. 〚(aⱼ)ʲ, ∅〛 := Failure, that is, a matrix with no rows matches no input

For the non-base cases we define:
| let Idx(k) = [0; arity(k)[
| let First(∅) = ⊥
| let First((aⱼ)ʲ) = a_{min(J)}, where j ∈ J ≠ ∅


** Proof of equivalence checking

The equivalence checking algorithm takes as parameters an input space
/S/, a source decision tree /Cₛ/ and a target decision tree /Cₜ/:
| equiv(S, Cₛ, Cₜ) → Yes \vert No(vₛ, vₜ)

When the algorithm returns Yes and the input space is covered by /Cₜ/,
we can say that the two decision trees produce the same result for
every pair of equivalent source value /vₛ/ and target value /vₜ/.
\begin{comment}
Define "covered"
Is it correct to say the same? How to correctly distinguish in words ≃ and = ?
\end{comment}
| equiv(S, Cₛ, Cₜ) = Yes ∧ covers(Cₜ, S) → ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) = Cₜ(vₜ)
In the case where the algorithm returns No, we have at least one pair
of counterexample values vₛ and vₜ for which the two decision trees
output different results:
| equiv(S, Cₛ, Cₜ) = No(vₛ,vₜ) ∧ covers(Cₜ, S) → vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠ Cₜ(vₜ)

We define the following:
| Forall([]) = Yes
| Forall(Yes::l) = Forall(l)
| Forall(No(vₛ,vₜ)::_) = No(vₛ,vₜ)
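Forall returns the first counterexample found in a list of answers. It returns Yes when there is none, including for the empty list. The following is a minimal OCaml sketch with an illustrative answer type.

#+BEGIN_SRC ocaml
type ('vs, 'vt) answer = Yes | No of 'vs * 'vt

(* Stop at the first No; an empty list or a list of Yes gives Yes. *)
let rec forall = function
  | [] -> Yes
  | Yes :: l -> forall l
  | (No _ as counterexample) :: _ -> counterexample
#+END_SRC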
There exist the following injective functions:
|  int(k)∈ℕ (ar(k) = 0)
|  tag(k)∈ℕ (ar(k) > 0)
|  π(k) = {n \vert int(k) = n} × {n \vert tag(k) = n}
where k is a constructor and ar(k) is its arity.
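For ordinary OCaml variants, a concrete choice of int and tag follows the runtime representation. Constant constructors are immediate integers and non-constant constructors are blocks, each group numbered from 0 in declaration order, independently of the other. For instance, [] is the immediate 0 and (::) is a block with tag 0. The sketch below computes this numbering for a list of constructor declarations; its names are hypothetical. Both counters only increase, so int and tag are injective on their respective domains.

#+BEGIN_SRC ocaml
(* A constructor declaration: its name and its arity ar(k). *)
type decl = { name : string; arity : int }

type repr = Int of int | Tag of int

(* Constant constructors (arity 0) and non-constant ones (arity > 0)
   are numbered separately, in declaration order. *)
let representation (decls : decl list) : (string * repr) list =
  let rec go ints tags = function
    | [] -> []
    | { name; arity } :: rest ->
        if arity = 0 then (name, Int ints) :: go (ints + 1) tags rest
        else (name, Tag tags) :: go ints (tags + 1) rest
  in
  go 0 0 decls
#+END_SRC

For type t = A | B of int | C | D of string this gives A ↦ int 0, B ↦ tag 0, C ↦ int 1 and D ↦ tag 1.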

\begin{comment}
TODO: explain:
∀v∈a→π, C_{/a→π}(v) = C(v)
\end{comment}

We proceed by case analysis:
1. equiv(∅, Cₛ, Cₜ) := Yes
   \begin{comment}
   Should I explain this?
   \end{comment}
   In the other subcases S is always non-empty.
2. equiv(S, Failure, Failure) := Yes
   The statement holds because two Failure nodes produce the same
   result for every possible value /v/.
3. Consider the subcase where the source decision tree
   /Cₛ/ is either a Leaf terminal or a Failure terminal, and the
   target decision tree is a node defined by an accessor /a/ and a positive
   number of pairs, each made of a constraint πᵢ and a child node Cₜᵢ. The
   output of the algorithm is:
   | equiv(S, (Leaf bbₛ \vert Failure) as Cₛ, Node(a, (πᵢ → Cₜᵢ)ⁱ)) := Forall((equiv(S∩(a→πᵢ), Cₛ, Cₜᵢ))ⁱ)
   The statement holds because, letting Sᵢ := S∩(a→πᵢ),
   either the algorithm returns Yes for every sub-input space Sᵢ and
   subtree Cₜᵢ
   | equiv(Sᵢ, Cₛ, Cₜᵢ) = Yes ∀i
   in which case, since covers(Cₜ, S) places every vₜ∈S in some a→πᵢ,
   we get Cₛ(vₛ) = Cₜᵢ(vₜ) = Cₜ(vₜ); or we have a counterexample vₛ, vₜ for which
   | vₛ≃vₜ∈Sₖ ∧ Cₛ(vₛ) ≠ Cₜₖ(vₜ)
   Because
   | vₜ∈(a→πₖ) ⇒ Cₜ(vₜ) = Cₜₖ(vₜ)
   we then have
   | vₛ≃vₜ∈S ∧ Cₛ(vₛ)≠Cₜ(vₜ)
   and the result of the algorithm is No(vₛ, vₜ), where
   | equiv(Sₖ, Cₛ, Cₜₖ) = No(vₛ, vₜ) for the minimal such k∈I
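The three cases above can be exercised in a deliberately simplified model. The sketch rests on strong assumptions:
- the only accessor is the root, and a target value is a single integer;
- the input space S is a finite list of integers;
- source and target values are identified, so ≃ is equality;
- each constraint πᵢ is an OCaml predicate.

Terminal pairs are compared by kind and blackbox name, which goes beyond the cases listed above. All names are hypothetical, and this is not the prototype's implementation.

#+BEGIN_SRC ocaml
type answer = Yes | No of int * int

type source = Leaf of string | Failure

type target =
  | TLeaf of string
  | TFailure
  | TNode of ((int -> bool) * target) list (* (πᵢ, Cₜᵢ) on the root *)

let rec forall = function
  | [] -> Yes
  | Yes :: l -> forall l
  | (No _ as n) :: _ -> n

(* Two terminals agree when they have the same kind and blackbox. *)
let same_terminal cs ct =
  match cs, ct with
  | Leaf a, TLeaf b -> a = b
  | Failure, TFailure -> true
  | _ -> false

let rec equiv (s : int list) (cs : source) (ct : target) : answer =
  match s, ct with
  | [], _ -> Yes (* case 1: empty input space *)
  | _, TNode children ->
      (* case 3: restrict S by each constraint and recurse *)
      forall (List.map (fun (pi, ct_i) -> equiv (List.filter pi s) cs ct_i) children)
  | v :: _, (TLeaf _ | TFailure) ->
      (* case 2 and terminal pairs: any v in S is a witness on mismatch *)
      if same_terminal cs ct then Yes else No (v, v)
#+END_SRC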
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								\begin{comment}
-												lambda form

											
										
										
											2020-03-30 21:23:55 +02:00
+								* TODO Scaletta [1/6]
-												forse

											
										
										
											2020-03-12 19:12:23 +01:00
+								  - [X] Introduction
-												lambda form

											
										
										
											2020-03-30 21:23:55 +02:00
+								  - [-] Background [60%]
 								    - [X] Low level representation
 								    - [X] Lambda code [0%]
-												uff

											
										
										
											2020-03-02 14:46:37 +01:00
+								    - [X] Pattern matching
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
+								    - [ ] Symbolic execution
-												uff

											
										
										
											2020-03-02 14:46:37 +01:00
+								    - [ ] Translation Validation
 								  - [ ] Translation validation of the Pattern Matching Compiler
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
+								    - [ ] Source translation
 								      - [ ] Formal Grammar
 								      - [ ] Compilation of source patterns
-												uff

											
										
										
											2020-03-02 14:46:37 +01:00
+								      - [ ] Rest?
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
+								    - [ ] Target translation
 								      - [ ] Formal Grammar
-												forse

											
										
										
											2020-03-12 19:12:23 +01:00
+								      - [ ] Symbolic execution of the Lambda target
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
+								    - [ ] Equivalence between source and target
-												todo

											
										
										
											2020-03-12 12:21:08 +01:00
+								  - [ ] Statement of correctness
-												uff

											
										
										
											2020-03-02 14:46:37 +01:00
+								  - [ ] Proof of correctness
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								  - [ ] Practical results
-												ocaml runtime values

											
										
										
											2020-03-29 22:54:33 +02:00
 								Magari prima pattern matching poi compilatore?
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								\end{comment}
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
-												inizio source

											
										
										
											2020-03-03 17:18:40 +01:00
+								#+TITLE:   Translation Verification of the  pattern matching compiler
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
+								#+AUTHOR:   Francesco Mecca
 								#+EMAIL:    me@francescomecca.eu
 								#+DATE:
 								#+LANGUAGE: en
 								#+LaTeX_CLASS: article
 								#+LaTeX_HEADER: \usepackage{algorithm}
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								#+LaTeX_HEADER: \usepackage{comment}
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
+								#+LaTeX_HEADER: \usepackage{algpseudocode}
 								#+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm}
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								#+LaTeX_HEADER: \newtheorem{definition}{Definition}
 								#+LaTeX_HEADER: \usepackage{mathpartir}
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
+								#+LaTeX_HEADER: \usepackage{graphicx}
 								#+LaTeX_HEADER: \usepackage{listings}
 								#+LaTeX_HEADER: \usepackage{color}
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								#+LaTeX_HEADER: \usepackage{stmaryrd}
 								#+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								#+LaTeX_HEADER: \newcommand{\Equiv}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken
 								#+LaTeX_HEADER: \newcommand{\covers}[2]{#1 \mathrel{\mathsf{covers}} #2}
 								#+LaTeX_HEADER: \newcommand{\Yes}{\mathsf{Yes}}
 								#+LaTeX_HEADER: \newcommand{\No}[2]{\mathsf{No}(#1, #2)}
-												ocaml todo

											
										
										
											2020-02-17 17:31:11 +01:00
+								#+EXPORT_SELECT_TAGS: export
 								#+EXPORT_EXCLUDE_TAGS: noexport
 								#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
 								#+STARTUP: showall
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								\section{Introduction}
 								This dissertation presents an algorithm for the translation validation of the OCaml pattern
 								matching compiler. Given a source program and its compiled version the
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								algorithm checks whether the two are equivalent or produce a counter
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								example in case of a mismatch.
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								For the prototype of this algorithm we have chosen a subset of the OCaml
 								language and implemented a prototype equivalence checker along with a
 								formal statement of correctness and its proof.
 								The prototype is to be included in the OCaml compiler infrastructure
 								and will aid the development.
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
 								Our equivalence algorithm works with decision trees. Source patterns are
 								converted into a decision tree using a matrix decomposition algorithm.
 								Target programs, described in the Lambda intermediate
 								representation language of the OCaml compiler, are turned into decision trees
 								by applying symbolic execution.
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								\begin{comment}
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								\subsection{Translation validation}
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								\end{comment}
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								A pattern matching compiler turns a series of pattern matching clauses
 								into simple control flow structures such as \texttt{if, switch}, for example:
 								\begin{lstlisting}
 								  match x with
 								  | [] -> (0, None)
 								  | x::[] -> (1, Some x)
 								  | _::y::_ -> (2, Some y)
 								\end{lstlisting}
 								\begin{lstlisting}
 								(if scrutinee
 								    (let (field_1 =a (field 1 scrutinee))
 								        (if field_1
 								            (let
 								                (field_1_1 =a (field 1 field_1)
 								                 x =a (field 0 field_1))
 								                (makeblock 0 2 (makeblock 0 x)))
 								            (let (y =a (field 0 scrutinee))
 								                (makeblock 0 1 (makeblock 0 y)))))
 								    [0: 0 0a])
 								\end{lstlisting}
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								\begin{comment}
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								%% TODO: side by side
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								\end{comment}
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								The code on the right is in the Lambda intermediate representation of
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								the OCaml compiler. The Lambda representation of a program is shown by
 								calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
 								The OCaml pattern matching compiler is a critical part of the OCaml compiler
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								in terms of correctness because bugs typically would result in wrong code
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								production rather than triggering compilation failures.
 								Such bugs also are hard to catch by testing because they arise in
 								corner cases of complex patterns which are typically not in the
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								compiler test suite or most user programs.
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
 								The OCaml core developers group considered evolving the pattern matching compiler, either by
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								using a new algorithm or by incremental refactoring of its code base.
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								For this reason we want to verify that new implementations of the
 								compiler avoid the introduction of new bugs and that such
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								modifications don't result in a different behavior than the current one.
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
 								One possible approach is to formally verify the pattern matching compiler
 								implementation using a machine checked proof.
 								Another possibility, albeit with a weaker result, is to verify that
 								each source program and target program pair are semantically correct.
 								We chose the latter technique, translation validation because is easier to adopt in
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								the case of a production compiler and to integrate with an existing code base. The compiler is treated as a
 								black-box and proof only depends on our equivalence algorithm.
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
 								\subsection{Our approach}
 								%% replace common TODO
 								Our algorithm translates both source and target programs into a common
-												coppo #2

											
										
										
											2020-03-12 19:37:38 +01:00
+								representation, decision trees. Decision trees where chosen because
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								they model the space of possible values at a given branch of
 								execution.
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								Here are the decision trees for the source and target example program.
 								\begin{minipage}{0.5\linewidth}
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								\begin{verbatim}
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								       Switch(Root)
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								       /        \
 								     (= [])    (= ::)
 								     /             \
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								   Leaf         Switch(Root.1)
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								(0, None)       /         \
 								             (= [])      (= ::)
 								             /               \
 								          Leaf              Leaf
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								   [x = Root.0]         [y = Root.1.0]
 								   (1, Some x)          (2, Some y)
 								\end{verbatim}
 								\end{minipage}
 								\hfill
 								\begin{minipage}{0.5\linewidth}
 								\begin{verbatim}
 								       Switch(Root)
 								       /        \
 								     (= int 0)  (!= int 0)
 								     /             \
 								   Leaf         Switch(Root.1)
 								(makeblock 0     /       \
 0a)         /         \
 								             (= int 0)    (!= int 0)
 								             /               \
 								          Leaf              Leaf
 								[x = Root.0]            [y = Root.1.0]
 								(makeblock 0            (makeblock 0
 (makeblock 0 x))      2 (makeblock 0 y))
-												maggiore introduzione

											
										
										
											2020-03-12 12:20:07 +01:00
+								\end{verbatim}
-												results of mumble meeting

											
										
										
											2020-04-03 15:53:54 +02:00
+								\end{minipage}
\texttt{Root.0} is called an \emph{accessor}: it represents the
access path to a value that can be reached by deconstructing the
scrutinee. In this example \texttt{Root.0} is the first subvalue of
the scrutinee, and \texttt{Root.1.0} is the first subvalue of its
second subvalue.
Target decision trees have a similar shape, but the tests on the
branches refer to the low-level representation of values in
Lambda code. For example, cons cells \texttt{x::xs} and tuples
\texttt{(x,y)} are both blocks with tag 0, while the empty list
\texttt{[]} is the immediate integer 0.
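The following minimal sketch makes the notion of accessor concrete. It assumes a simplified model of runtime values in which a value is either an immediate integer or a block with a tag and a list of fields. An accessor such as \texttt{Root.1.0} is encoded as the path of field indices \texttt{[1; 0]}. The names \texttt{value}, \texttt{accessor} and \texttt{follow} are hypothetical and are not taken from the implementation.

#+BEGIN_SRC ocaml
(* Simplified runtime values: immediate integers or tagged blocks. *)
type value =
  | Int of int
  | Block of int * value list      (* tag, fields *)

(* Root.i.j is represented by the path [i; j]. *)
type accessor = int list

(* The sub-value reached by an accessor, if the path is valid. *)
let rec follow (v : value) (a : accessor) : value option =
  match a, v with
  | [], _ -> Some v
  | i :: rest, Block (_, fields) when i >= 0 && i < List.length fields ->
      follow (List.nth fields i) rest
  | _ :: _, _ -> None

(* The list [1; 2]: cons cells are blocks with tag 0, [] is Int 0. *)
let l12 = Block (0, [ Int 1; Block (0, [ Int 2; Int 0 ]) ])

let () =
  assert (follow l12 [ 0 ] = Some (Int 1));         (* Root.0 *)
  assert (follow l12 [ 1; 0 ] = Some (Int 2));      (* Root.1.0 *)
  assert (follow l12 [ 1; 1; 0 ] = None)            (* Root.1.1 is [] *)
#+END_SRC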
To check the equivalence of a source and a target decision tree,
we proceed by case analysis.
If we have two terminals, such as the leaves in the previous example,
we check that the two right-hand sides are equivalent.
If we have a node $N$ and another tree $T$, we check equivalence for
each child of $N$, which is a pair of a branch condition $\pi_i$ and a
subtree $C_i$. For every child $(\pi_i, C_i)$ we reduce $T$ by killing all
the branches that are incompatible with $\pi_i$ and check that the
reduced tree is equivalent to $C_i$.
\subsection{From source programs to decision trees}
Our source language supports integers, lists, tuples and all algebraic
datatypes. Patterns support wildcards, constructors and literals,
or-patterns $(p_1 | p_2)$ and pattern variables. We also support
\texttt{when} guards, which are interesting as they introduce the
evaluation of expressions during matching. Decision trees have nodes
of the form:
\begin{lstlisting}
type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_expr * decision_tree * decision_tree
  | Switch of accessor * (constructor * decision_tree) list * decision_tree
\end{lstlisting}
In the \texttt{Switch} node we have one subtree for every head constructor
that appears in the pattern matching clauses, and a fallback case for the
other values. The branch condition $\pi_i$ expresses that the value at the
switch accessor starts with the given constructor.
\texttt{Failure} nodes express match failures for values that are not
matched by the source clauses.
\texttt{Unreachable} is used when we statically know that no value
can flow to that subtree.
We write $\sem{t_S}_S$ for the decision tree of the source program
$t_S$, computed by a matrix decomposition algorithm (each column
decomposition step gives a \texttt{Switch} node).
It satisfies the following correctness statement:
\[
\forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
\]
Running any source value $v_S$ against the source program gives the
same result as running it against the decision tree.
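The correctness statement above can be read operationally through a small evaluator for decision trees. The following is a minimal sketch under simplifying assumptions that are not part of the original development:

- source values are a head constructor applied to a list of arguments;
- right-hand sides are opaque labels;
- guards are modelled as OCaml predicates on the scrutinee;
- pattern bindings are not tracked.

All names are hypothetical. The tree \texttt{list\_tree} has the shape of the source decision tree of the list example shown earlier.

#+BEGIN_SRC ocaml
(* Simplified source values: a head constructor applied to arguments. *)
type value = V of string * value list

(* Root.i.j is represented as [i; j]. *)
type accessor = int list

type decision_tree =
  | Unreachable
  | Failure
  | Leaf of string                   (* label of the right-hand side *)
  | Guard of (value -> bool) * decision_tree * decision_tree
  | Switch of accessor * (string * decision_tree) list * decision_tree

type outcome = Matched of string | No_match

(* Sub-value reached by an accessor; assumes a well-typed accessor. *)
let rec follow (v : value) (a : accessor) : value =
  match a with
  | [] -> v
  | i :: rest ->
      let (V (_, args)) = v in
      follow (List.nth args i) rest

let rec eval (t : decision_tree) (v : value) : outcome =
  match t with
  | Unreachable -> failwith "reached a subtree marked as Unreachable"
  | Failure -> No_match
  | Leaf rhs -> Matched rhs
  | Guard (g, if_true, if_false) -> eval (if g v then if_true else if_false) v
  | Switch (a, cases, fallback) -> (
      let (V (head, _)) = follow v a in
      match List.assoc_opt head cases with
      | Some subtree -> eval subtree v
      | None -> eval fallback v)

(* [] -> (0, None) | x :: [] -> (1, Some x) | _ :: y :: _ -> (2, Some y) *)
let list_tree =
  Switch ([], [ ("[]", Leaf "(0, None)");
                ("::", Switch ([ 1 ], [ ("[]", Leaf "(1, Some x)");
                                        ("::", Leaf "(2, Some y)") ],
                               Unreachable)) ],
          Unreachable)

let nil = V ("[]", [])
let cons x xs = V ("::", [ x; xs ])
let int n = V (string_of_int n, [])

let () =
  assert (eval list_tree nil = Matched "(0, None)");
  assert (eval list_tree (cons (int 1) nil) = Matched "(1, Some x)");
  assert (eval list_tree (cons (int 1) (cons (int 2) nil)) = Matched "(2, Some y)")
#+END_SRC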
\subsection{From target programs to decision trees}
The target programs include the following Lambda constructs:
\texttt{let}, \texttt{if}, \texttt{switch}, \texttt{Match\_failure},
\texttt{catch}, \texttt{exit}, \texttt{field}, various comparison
operations, and guards. The symbolic execution engine
traverses the target program and builds an environment that maps
variables to accessors. It branches at every control flow statement
and emits a \texttt{Switch} node. The branch condition $\pi_i$ is expressed as
an interval set of the possible values of the tested accessor at that point.
Guards also result in branching. In comparison with the source decision
trees, \texttt{Unreachable} nodes are never emitted.
We write $\sem{t_T}_T$ for the decision tree of the target program
$t_T$, satisfying the following correctness statement:
\[
\forall t_T, \forall v_T, \quad t_T(v_T) = \sem{t_T}_T(v_T)
\]
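A minimal sketch of interval sets follows. It assumes that an interval set is a sorted list of disjoint closed intervals over native integers. This representation and all the names are assumptions made for illustration, and the actual implementation may differ. The sketch shows that the two branches of a test such as \texttt{(!= int 0)} have an empty intersection, so a branch restricted to both conditions can be removed.

#+BEGIN_SRC ocaml
(* A sorted list of disjoint closed intervals [lo, hi]. *)
type interval_set = (int * int) list

let singleton (n : int) : interval_set = [ (n, n) ]

let is_empty (s : interval_set) : bool = s = []

(* Intersection of two interval sets, walking both lists in order. *)
let rec inter (s1 : interval_set) (s2 : interval_set) : interval_set =
  match s1, s2 with
  | [], _ | _, [] -> []
  | (lo1, hi1) :: r1, (lo2, hi2) :: r2 ->
      let lo = max lo1 lo2 and hi = min hi1 hi2 in
      (* drop the interval that ends first: it cannot meet anything else *)
      let rest = if hi1 < hi2 then inter r1 s2 else inter s1 r2 in
      if lo <= hi then (lo, hi) :: rest else rest

(* Complement with respect to the whole range [min_int, max_int]. *)
let complement (s : interval_set) : interval_set =
  let rec go start s =
    match start, s with
    | None, _ -> []                      (* already past max_int *)
    | Some lo, [] -> [ (lo, max_int) ]
    | Some lo, (a, b) :: rest ->
        let gap = if a > lo then [ (lo, a - 1) ] else [] in
        gap @ go (if b = max_int then None else Some (b + 1)) rest
  in
  go (Some min_int) s

let () =
  (* the two branches of a test (!= x 0) *)
  let nonzero = complement (singleton 0) and zero = singleton 0 in
  assert (is_empty (inter nonzero zero));
  assert (inter nonzero [ (0, 3) ] = [ (1, 3) ]);
  assert (complement (complement zero) = zero)
#+END_SRC

With this representation, restricting a branch amounts to intersecting its condition with the current one and discarding the branch when the result is empty.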
\subsection{Equivalence checking}
The equivalence checking algorithm takes as input a domain of
possible values \emph{S} and a pair of source and target decision trees.
In case the two trees are not equivalent, it returns a counterexample.
The algorithm respects the following correctness statement:
\begin{comment}
% TODO: we have to define what \covers mean for readers to understand the specifications
% (or we use a simplifying lie by hiding \covers in the statements).
\end{comment}
\begin{align*}
 \Equiv S {C_S} {C_T} = \Yes \;\land\; \covers {C_T} S
 & \implies
 \forall v_S \approx v_T \in S,\; C_S(v_S) = C_T(v_T)
\\
 \Equiv S {C_S} {C_T} = \No {v_S} {v_T} \;\land\; \covers {C_T} S
 & \implies
 v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
\end{align*}
The algorithm proceeds by case analysis, described below together with
the corresponding inference rules.
If $S$ is empty the result is $\Yes$.
\begin{verbatim}
------------------------
equiv \emptyset Cs Ct gs
\end{verbatim}
If the two decision trees are both terminal nodes, the algorithm checks
for content equality.
\begin{verbatim}
--------------------------
equiv S Failure Failure []

equiv_BB BBs BBt
-------------------------------
equiv S (Leaf BBs) (Leaf BBt) []
\end{verbatim}
If the source decision tree (left-hand side) is a terminal while the
target decision tree (right-hand side) is not, the algorithm proceeds
by \emph{explosion} of the right-hand side. Explosion means that every
child of the right-hand side is tested for equality against the
left-hand side.
\begin{verbatim}
(equiv S Cs Ci gs)^i
equiv S Cs Cf gs
-----------------------------------------
equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs
\end{verbatim}
\begin{comment}
% TODO: [Gabriel] in practice the $dom_S$ are constructors and the
% $dom_T$ are domains. Do we want to hide this fact, or be explicit
% about it? (Maybe we should introduce explicitly/clearly the idea of
% target domains at some point).
\end{comment}
When the left-hand side is not a terminal, the algorithm explodes the
left-hand side while trimming the right-hand side tree. Trimming
a right-hand side tree on an interval set $dom_S$, computed from the
constructor of a left-hand side branch, means mapping every branch
condition $dom_T$ (an interval set of possible values) of the right-hand
side to the intersection of $dom_T$ and $dom_S$ when the accessors on
both sides are equal, and removing the branches that result in an empty
intersection. If the accessors are different, $dom_T$ is left unchanged.
\begin{verbatim}
(equiv S Ci (trim Ct (a=Ki)) gs)^i
equiv S Cf (trim Ct (a\notin(Ki)^i)) gs
---------------------------------------
equiv S (Node(a, (Ki,Ci)^i, Cf)) Ct gs
\end{verbatim}
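The trimming operation can be sketched as follows. The sketch assumes a simplified target tree whose branch conditions are interval sets (sorted lists of disjoint closed intervals) and whose guards and right-hand sides are opaque strings. The type \texttt{target\_tree} and the functions \texttt{inter} and \texttt{trim} are hypothetical. Trimming is also applied to the subtrees, since the restriction on the accessor still holds below the trimmed node.

#+BEGIN_SRC ocaml
type interval_set = (int * int) list   (* sorted, disjoint, closed *)
type accessor = int list               (* Root.i.j as [i; j] *)

type target_tree =
  | Failure
  | Leaf of string
  | Guard of string * target_tree * target_tree
  | Switch of accessor * (interval_set * target_tree) list * target_tree

let rec inter (s1 : interval_set) (s2 : interval_set) : interval_set =
  match s1, s2 with
  | [], _ | _, [] -> []
  | (lo1, hi1) :: r1, (lo2, hi2) :: r2 ->
      let lo = max lo1 lo2 and hi = min hi1 hi2 in
      let rest = if hi1 < hi2 then inter r1 s2 else inter s1 r2 in
      if lo <= hi then (lo, hi) :: rest else rest

(* Restrict every Switch on accessor [a] to the values in [dom]. *)
let rec trim (t : target_tree) (a : accessor) (dom : interval_set) : target_tree =
  match t with
  | Failure | Leaf _ -> t
  | Guard (g, t_true, t_false) -> Guard (g, trim t_true a dom, trim t_false a dom)
  | Switch (a', branches, fallback) ->
      let restrict (dom_t, sub) =
        (* intersect only when both sides test the same accessor *)
        let dom_t = if a' = a then inter dom_t dom else dom_t in
        if dom_t = [] then None else Some (dom_t, trim sub a dom)
      in
      Switch (a', List.filter_map restrict branches, trim fallback a dom)

let () =
  let t = Switch ([], [ ([ (0, 0) ], Leaf "foo"); ([ (1, 1) ], Leaf "bar") ], Failure) in
  assert (trim t [] [ (1, 1) ] = Switch ([], [ ([ (1, 1) ], Leaf "bar") ], Failure));
  (* a different accessor leaves the branch conditions unchanged *)
  assert (trim t [ 0 ] [ (1, 1) ] = t)
#+END_SRC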
The equivalence checking algorithm deals with guards by storing a
queue. Whenever the algorithm encounters a Guard node on the left-hand
side, it explodes it into two subtrees, one in which the guard condition
evaluates to true and one in which it evaluates to false. For each
subtree it pushes the guard blackbox, together with the corresponding
outcome, to the queue.
Whenever a Guard node appears on the right-hand side, the algorithm pops
a blackbox from the queue. It stops with failure if the popped blackbox
and the blackbox of the right-hand Guard node are different. Otherwise it
continues with the subtree selected by the recorded outcome.
Termination of the algorithm is successful only when the guards queue
is empty.
\begin{verbatim}
equiv S Ctrue Ct (gs++[(condition, true)])
equiv S Cfalse Ct (gs++[(condition, false)])
--------------------------------------------
equiv S (Guard condition Ctrue Cfalse) Ct gs

equiv S Cs Cb gs    (Cb = Ctrue if b = true, Cfalse otherwise)
---------------------------------------------------------------
equiv S Cs (Guard condition Ctrue Cfalse) ([(condition, b)]++gs)
\end{verbatim}
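The rules above can be combined into a single recursive function. The following is a minimal sketch under several assumptions:

- both sides share one tree type whose branch conditions are interval sets, whereas source branches actually carry constructors;
- right-hand sides and guard conditions are opaque strings compared syntactically;
- the domain $S$ is an association list from accessors to interval sets;
- the function returns a boolean instead of a counterexample.

The guard queue records, for every source guard, the outcome that is being explored, so that the matching target guard is followed on the same branch. All names are hypothetical.

#+BEGIN_SRC ocaml
type domain = (int * int) list         (* sorted, disjoint, closed intervals *)
type accessor = int list               (* Root.i.j as [i; j] *)

type tree =
  | Failure
  | Leaf of string                     (* blackbox right-hand side *)
  | Guard of string * tree * tree      (* blackbox condition *)
  | Switch of accessor * (domain * tree) list * tree

let full : domain = [ (min_int, max_int) ]

let rec inter (d1 : domain) (d2 : domain) : domain =
  match d1, d2 with
  | [], _ | _, [] -> []
  | (l1, h1) :: r1, (l2, h2) :: r2 ->
      let lo = max l1 l2 and hi = min h1 h2 in
      let rest = if h1 < h2 then inter r1 d2 else inter d1 r2 in
      if lo <= hi then (lo, hi) :: rest else rest

let complement (d : domain) : domain =
  let rec go start d =
    match start, d with
    | None, _ -> []
    | Some lo, [] -> [ (lo, max_int) ]
    | Some lo, (a, b) :: rest ->
        (if a > lo then [ (lo, a - 1) ] else [])
        @ go (if b = max_int then None else Some (b + 1)) rest
  in
  go (Some min_int) d

let union d1 d2 = complement (inter (complement d1) (complement d2))

(* Domain of a fallback: the values not covered by any explicit branch. *)
let rest_of branches =
  complement (List.fold_left (fun acc (d, _) -> union acc d) [] branches)

(* S maps accessors to their possible values; absent means unconstrained. *)
let refine s a d =
  let current = Option.value (List.assoc_opt a s) ~default:full in
  (a, inter current d) :: List.remove_assoc a s

let rec trim t a dom =
  match t with
  | Failure | Leaf _ -> t
  | Guard (g, t1, t2) -> Guard (g, trim t1 a dom, trim t2 a dom)
  | Switch (a', branches, fallback) ->
      let restrict (d, c) =
        let d = if a' = a then inter d dom else d in
        if d = [] then None else Some (d, trim c a dom)
      in
      Switch (a', List.filter_map restrict branches, trim fallback a dom)

(* gs: queue of (source guard, outcome explored on the source side). *)
let rec equiv s cs ct gs =
  if List.exists (fun (_, d) -> d = []) s then true   (* S is empty *)
  else
    match cs, ct with
    | Failure, Failure -> gs = []
    | Leaf bs, Leaf bt -> gs = [] && bs = bt
    | Failure, Leaf _ | Leaf _, Failure -> false
    | Guard (c, c_true, c_false), _ ->
        equiv s c_true ct (gs @ [ (c, true) ])
        && equiv s c_false ct (gs @ [ (c, false) ])
    | (Failure | Leaf _), Guard (c, t_true, t_false) -> (
        match gs with
        | (c', b) :: rest when c' = c ->
            equiv s cs (if b then t_true else t_false) rest
        | _ -> false)
    | (Failure | Leaf _), Switch (a, branches, fallback) ->
        (* explosion of the right-hand side *)
        List.for_all (fun (d, c) -> equiv (refine s a d) cs c gs) branches
        && equiv (refine s a (rest_of branches)) cs fallback gs
    | Switch (a, branches, fallback), _ ->
        (* explosion of the left-hand side, trimming the right-hand side *)
        let r = rest_of branches in
        List.for_all (fun (k, c) -> equiv (refine s a k) c (trim ct a k) gs) branches
        && equiv (refine s a r) fallback (trim ct a r) gs

(* function 0 -> "foo" | 1 -> "bar" | _ -> "other", branches reordered *)
let src = Switch ([], [ ([ (0, 0) ], Leaf "foo"); ([ (1, 1) ], Leaf "bar") ], Leaf "other")
let tgt = Switch ([], [ ([ (1, 1) ], Leaf "bar"); ([ (0, 0) ], Leaf "foo") ], Leaf "other")
let bad = Switch ([], [ ([ (0, 1) ], Leaf "foo") ], Leaf "other")
let g = Guard ("x > 0", Leaf "pos", Leaf "other")

let () =
  assert (equiv [] src tgt []);
  assert (not (equiv [] src bad []));
  assert (equiv [] g g []);
  assert (not (equiv [] g (Guard ("x > 0", Leaf "other", Leaf "pos")) []))
#+END_SRC

Exploding the right-hand side also refines $S$. As a result, a fallback branch that cannot be reached under the current domain is accepted by the emptiness check instead of being compared against the left-hand side.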
\begin{comment}
TODO: replace inference rules with good latex
\end{comment}
* Background
** OCaml
Objective Caml (OCaml) is a dialect of the ML (Meta-Language)
family of programming languages and shares many features with other
dialects of ML, such as SML and Caml Light.
The main feature of ML languages is the use of the Hindley-Milner type
system, which provides many advantages with respect to the static type
systems of traditional imperative and object-oriented languages such as
C, C++ and Java, such as:
    - Polymorphism: in certain scenarios a function can accept more than one
      type for its input parameters. For example, a function that computes the
      length of a list doesn't need to inspect the type of the elements of the
      list, and for this reason a List.length function can accept lists of
      integers, lists of strings and, in general, lists of any type.
      Object-oriented languages traditionally offer polymorphic functions
      through subtyping, resolved at runtime, while other languages such as
      C++ also offer polymorphism through compile-time templates and
      function overloading.
      With the Hindley-Milner type system each well typed function can have
      more than one type but always has a unique best type, called the
      /principal type/.
      For example, the principal type of the List.length function is
      "For any /a/, function from list of /a/ to /int/",
      and /a/ is called the /type parameter/.
    - Strong typing: languages such as C and C++ allow the programmer to
      operate on data without considering its type, mainly through pointers.
      Other languages, such as Java, erase the type arguments of generic
      types during compilation, so at runtime they can't be queried.
      In the case of programming languages using a Hindley-Milner type
      system, the programmer is not allowed to operate on data by ignoring
      or promoting its type.
    - Type inference: the principal type of a well formed term can be
      inferred without any annotation or declaration.
    - Algebraic data types: types that are modeled by the use of two
      algebraic operations, sum and product.
      A sum type is a type that can hold values of many different types,
      but only one at a time. For example, the sum type defined
      as /A + B/ can hold at any moment a value of type A or a value of
      type B. Sum types are also called tagged unions or variants.
      A product type is a type constructed as a direct product
      of multiple types and contains at any moment one instance for
      every type of its operands. Product types are also called tuples
      or records. Algebraic data types can be recursive
      in their definition and can be combined.
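The following minimal OCaml sketch illustrates the features listed above. The definitions are hypothetical examples: the function /length/ is a re-implementation, not /List.length/ itself. The inferred types are reported in the comments.

#+BEGIN_SRC ocaml
(* Polymorphism: the principal type 'a list -> int is inferred. *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* Type inference: the inferred type is ('a -> 'a) -> 'a -> 'a. *)
let twice f x = f (f x)

(* A sum type whose second variant carries a product (a pair). *)
type shape =
  | Circle of float                (* radius *)
  | Rectangle of float * float     (* width and height *)

let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h

(* A recursive, polymorphic algebraic data type. *)
type 'a tree = Empty | Node of 'a tree * 'a * 'a tree

let rec size = function
  | Empty -> 0
  | Node (l, _, r) -> size l + 1 + size r

let () =
  assert (length [ 1; 2; 3 ] = 3 && length [ "a"; "b" ] = 2);
  assert (twice (fun n -> n * 2) 5 = 20);
  assert (area (Rectangle (2., 3.)) = 6.);
  assert (size (Node (Empty, 'x', Node (Empty, 'y', Empty))) = 2)
#+END_SRC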
Moreover, ML languages are functional, meaning that functions are
treated as first class citizens and variables are immutable,
although mutable statements and imperative constructs are permitted.
In addition to that, OCaml features an object system, which provides
inheritance, subtyping and dynamic binding, and modules, which
provide a way to encapsulate definitions. Modules are checked
statically and can be parameterized by other modules through functors.
** Compiling OCaml code
The OCaml compiler compiles source files either to a bytecode executable,
with an optionally embeddable interpreter, or to a native executable that
can be statically linked to provide a single-file executable.
Every source file is treated as a separate /compilation unit/ that goes
through different stages.
The first stage of compilation is the parsing of the input code, which
is transformed into an untyped syntax tree. Code with syntax errors is
rejected at this stage.
After that, the AST is processed by the type checker, which performs
three steps at once:
- type inference, using the classical /Algorithm W/
- subtyping, and gathering of type information from the module system
- checking that the code obeys the rules of the OCaml type system
At this stage, incorrectly typed code is rejected. In case of success,
the untyped AST is transformed into a /Typed Tree/.
After the typechecker has proven that the program is type safe,
the OCaml compiler lowers the code to /Lambda/, an s-expression based
language that assumes that its input has already been proved safe.
After the Lambda pass, the Lambda code is either translated into
bytecode or, for native code, goes through further optimization steps,
such as those performed by the optional /Flambda/ optimizer, before
being translated into assembly.
\begin{comment}
TODO: Talk about flambda passes?
\end{comment}
This is an overview of the different compiler steps.
[[./files/ocamlcompilation.png]]
** Memory representation of OCaml values
A typical OCaml source program contains few or no explicit type
signatures.
This is possible because of type inference, which annotates the
AST with type information. However, since the OCaml typechecker guarantees
that a program is well typed before it is transformed into Lambda code,
values at runtime contain only the minimal type information needed to
distinguish polymorphic values.
For runtime values, OCaml uses a uniform memory representation in
which every value is a single machine word that is either an immediate
integer or a pointer to a contiguous block of memory, called a /cell/
or /box/.
We can abstract the type of OCaml runtime values as follows:
#+BEGIN_SRC ocaml
(* an immediate value, or a cell made of a tag and a sequence of fields *)
type t = Constant of int | Cell of int * t list
#+END_SRC
where a one-bit tag is used to distinguish between Constant and Cell.
This bit of metadata is stored as the lowest bit of the word.
All the OCaml target architectures guarantee that pointers are
divisible by four, which means that their two lowest bits are always
00. Storing the tag in the lowest bit therefore allows an
optimization. Constant values in OCaml, such as integers, empty lists,
unit values and constructors of arity zero (/constant/ constructors),
are unboxed at runtime and have the lowest bit set to 1. Pointers
are recognized by the lowest bit set to 0.
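The representation described above can be observed from OCaml itself through the /Obj/ module of the standard library. This module exposes the low-level representation of values and is unsafe in general. The following sketch uses /Obj.repr/, /Obj.is_int/, /Obj.is_block/, /Obj.tag/, /Obj.size/ and /Obj.magic/ to inspect a few values. It also models the tagging of immediate integers (an integer n is stored as the word 2n + 1) with ordinary integer arithmetic. The type /t/ and the helpers /encode_int/, /decode_int/ and /is_immediate/ are hypothetical.

#+BEGIN_SRC ocaml
type t = Foo | Bar | English of int | French of int

(* Model of the tagging scheme on machine words (illustrative only). *)
let encode_int n = (n lsl 1) lor 1        (* n is stored as 2n + 1 *)
let decode_int w = w asr 1
let is_immediate w = w land 1 = 1         (* aligned pointers end with 0 *)

let () =
  (* integers, [], (), None and constant constructors are immediate *)
  assert (Obj.is_int (Obj.repr 42));
  assert (Obj.is_int (Obj.repr []));
  assert (Obj.is_int (Obj.repr ()));
  assert (Obj.is_int (Obj.repr None));
  assert ((Obj.magic Bar : int) = 1);
  (* cons cells, tuples and non-constant constructors are blocks *)
  assert (Obj.is_block (Obj.repr [ 1 ]));
  assert (Obj.tag (Obj.repr (1, 2)) = 0 && Obj.size (Obj.repr (1, 2)) = 2);
  assert (Obj.tag (Obj.repr (English 3)) = 0);
  assert (Obj.tag (Obj.repr (French 3)) = 1);
  (* the tagging arithmetic *)
  assert (encode_int 5 = 11 && decode_int 11 = 5);
  assert (is_immediate (encode_int 5) && not (is_immediate 8))
#+END_SRC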
** Lambda form compilation
\begin{comment}
https://dev.realworld.org/compiler-backend.html
CITE: realworldocaml
\end{comment}
A Lambda code target file is produced by the compiler and consists of a
single s-expression. Every s-expression consists of /(/, a sequence of
elements separated by whitespace, and a closing /)/.
Elements of s-expressions are:
- Atoms: sequences of ASCII letters, digits or symbols
- Variables
- Strings: enclosed in double quotes and possibly escaped
- S-expressions: allowing arbitrary nesting
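The grammar above can be captured by a small algebraic data type. The following is a minimal sketch of such a type together with a printer. The constructor names are hypothetical and do not correspond to the compiler's internal representation of Lambda code, which uses the s-expression syntax only as its printed form.

#+BEGIN_SRC ocaml
type sexp =
  | Atom of string          (* e.g. switch*, makeblock, field *)
  | Var of string           (* e.g. param/90 *)
  | Quoted of string        (* a string literal, printed with quotes *)
  | Nested of sexp list     (* arbitrary nesting *)

let rec to_string = function
  | Atom a | Var a -> a
  | Quoted s -> Printf.sprintf "%S" s   (* %S adds quotes and escapes *)
  | Nested l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

let () =
  let param = Var "param/90" in
  let e =
    Nested [ Atom "if";
             Nested [ Atom "!="; Nested [ Atom "field"; Atom "0"; param ]; Atom "0" ];
             Quoted "bar"; Quoted "foo" ]
  in
  assert (to_string e = {|(if (!= (field 0 param/90) 0) "bar" "foo")|})
#+END_SRC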
The Lambda form is a key stage where the compiler discards type
information and maps the original source code to the runtime memory
model described above.
In this stage of the compiler pipeline, pattern matching statements are
analyzed and compiled into an automaton.
\begin{comment}
highlight its centrality with respect to the thesis
\end{comment}
#+BEGIN_SRC ocaml
type t = | Foo | Bar | Baz | Fred
let test = function
  | Foo -> "foo"
  | Bar -> "bar"
  | Baz -> "baz"
  | Fred -> "fred"
#+END_SRC
The Lambda output for this code can be obtained by running the
compiler with the /-dlambda/ flag:
#+BEGIN_SRC
(setglobal Prova!
  (let
    (test/85 =
       (function param/86
         (switch* param/86
          case int 0: "foo"
          case int 1: "bar"
          case int 2: "baz"
          case int 3: "fred")))
    (makeblock 0 test/85)))
#+END_SRC
As the example shows, the /makeblock/ directive is responsible
for allocating low-level OCaml values, and every constant constructor
of the algebraic type /t/ is stored in memory as an integer.
The /setglobal/ directive declares a new binding in the global scope:
every concept of modules is lost at this stage of compilation.
The pattern matching compiler uses a jump table to map every pattern
matching clause to its target expression. Variables are addressed by
unique names, such as param/86.
#+BEGIN_SRC ocaml
type p = | Foo | Bar
type q = | Tata | Titi
type t = | English of p | French of q
let test = function
  | English Foo -> "foo"
  | English Bar -> "bar"
  | French Tata -> "baz"
  | French Titi -> "fred"
#+END_SRC
In the case of types with a smaller number of variants, the pattern
matching compiler may avoid the overhead of computing a jump table.
This example also highlights the fact that non-constant constructors
are mapped to blocks, which are discriminated by their /tag/ and whose
arguments are read with the /field/ directive.
#+BEGIN_SRC
(setglobal Prova!
  (let
    (test/89 =
       (function param/90
         (switch* param/90
          case tag 0: (if (!= (field 0 param/90) 0) "bar" "foo")
          case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz"))))
    (makeblock 0 test/89)))
#+END_SRC
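The following sketch makes the correspondence between the two programs explicit. It transcribes the Lambda output above back into OCaml by hand, using the /Obj/ module to read the tag and the first field of the block. The function /test_lowlevel/ is a hypothetical illustration, not something produced by the compiler. The final check compares it with the source-level /test/ on every value of type /t/.

#+BEGIN_SRC ocaml
type p = Foo | Bar
type q = Tata | Titi
type t = English of p | French of q

(* Source-level function from the example above. *)
let test = function
  | English Foo -> "foo"
  | English Bar -> "bar"
  | French Tata -> "baz"
  | French Titi -> "fred"

(* Hand transcription of the Lambda code: switch on the block tag, then
   compare field 0 (an immediate constant constructor) with 0. *)
let test_lowlevel (x : t) : string =
  let v = Obj.repr x in
  let field0_is_zero = (Obj.obj (Obj.field v 0) : int) = 0 in
  match Obj.tag v with
  | 0 -> if field0_is_zero then "foo" else "bar"
  | 1 -> if field0_is_zero then "baz" else "fred"
  | _ -> assert false

let () =
  List.iter
    (fun x -> assert (test x = test_lowlevel x))
    [ English Foo; English Bar; French Tata; French Titi ]
#+END_SRC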
The Lambda language has several numeric types:
- integers: which use either 31- or 63-bit two's complement arithmetic,
  depending on the system word size, and wrap on overflow
- 32-bit and 64-bit integers: which use 32-bit and 64-bit two's
  complement arithmetic, with wrap on overflow
- big integers: which offer integers with arbitrary precision
- floats: which use IEEE754 double-precision (64-bit) arithmetic, with
  the addition of the literals /infinity/, /neg_infinity/ and /nan/.
Various numeric operations are defined:
- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
- Numeric comparisons: <, >, <=, >=, ==, !=
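The following sketch checks some of these behaviours at the OCaml source level. It assumes that the source operators /land/, /lor/, /lxor/, /lsl/, /lsr/ and /asr/ correspond to the bitwise operations listed above. Arbitrary-precision integers are not part of the standard library and are therefore omitted.

#+BEGIN_SRC ocaml
let () =
  (* native integers wrap around on overflow *)
  assert (max_int + 1 = min_int);
  (* Int32 and Int64 use 32-bit and 64-bit two's complement arithmetic *)
  assert (Int32.add Int32.max_int 1l = Int32.min_int);
  assert (Int64.add Int64.max_int 1L = Int64.min_int);
  (* floats are IEEE 754 doubles, with infinity, neg_infinity and nan *)
  assert (1. /. 0. = infinity);
  assert (-1. /. 0. = neg_infinity);
  assert (Float.is_nan (0. /. 0.) && Float.is_nan nan);
  (* division truncates toward zero; mod takes the sign of the dividend *)
  assert (-7 / 2 = -3 && -7 mod 2 = -1);
  (* bitwise operations and the two right shifts *)
  assert (6 land 3 = 2 && 6 lor 3 = 7 && 6 lxor 3 = 5 && 1 lsl 3 = 8);
  assert (-8 asr 1 = -4);                   (* sign extending *)
  assert (-8 lsr 1 > 0)                     (* zero-shifting *)
#+END_SRC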
*** Functions
Functions are defined using the following syntax, and close over all
bindings in scope: (lambda (arg1 arg2 arg3) BODY)
and are applied using the following syntax: (apply FUNC ARG ARG ARG).
Evaluation is eager.
*** Other atoms
The atom /let/ introduces a sequence of bindings at a smaller scope
than the global one:
(let BINDING BINDING BINDING ... BODY)
The Lambda form supports many other directives, such as /stringswitch/,
which constructs specialized jump tables for strings, integer range
comparisons and so on.
These constructs are explicitly undocumented because the Lambda
intermediate language can change across compiler releases.
\begin{comment}
Explain that the supported syntax is the one in the BNF
\end{comment}
** Pattern matching
Pattern matching is a widely adopted mechanism to interact with ADTs.
C-family languages provide branching on predicates through the use of
if statements and switch statements.
Pattern matching, on the other hand, expresses predicates through
syntactic templates that also allow binding on data structures of
arbitrary shapes. One common example of pattern matching is the use of
regular expressions on strings. OCaml provides pattern matching on ADTs
and primitive data types.
The result of a pattern matching operation is always one of:
- this value does not match this pattern
- this value matches this pattern, resulting in the following bindings of
  names to values and in the jump to the expression pointed at by the
  pattern.
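These two outcomes can be captured by a function that returns an optional list of bindings. The following is a minimal sketch over a hypothetical representation of values and patterns (wildcards, variables, constructors and or-patterns). It interprets patterns directly, whereas the OCaml compiler compiles them, and it leaves out guards and literals.

#+BEGIN_SRC ocaml
(* Hypothetical source values: a constructor applied to arguments. *)
type value = V of string * value list

type pattern =
  | Any                            (* _ *)
  | Var of string                  (* x *)
  | Con of string * pattern list   (* K (p1, ..., pn) *)
  | Or of pattern * pattern        (* p1 | p2 *)

(* None: no match.  Some env: match, with the bindings in env. *)
let rec matches (p : pattern) (v : value) : (string * value) list option =
  match p, v with
  | Any, _ -> Some []
  | Var x, _ -> Some [ (x, v) ]
  | Or (p1, p2), _ -> (
      match matches p1 v with
      | Some env -> Some env
      | None -> matches p2 v)
  | Con (k, ps), V (k', vs) ->
      if k <> k' || List.length ps <> List.length vs then None
      else
        List.fold_left2
          (fun acc pi vi ->
            match acc, matches pi vi with
            | Some env, Some env' -> Some (env @ env')
            | _ -> None)
          (Some []) ps vs

let nil = V ("[]", [])
let cons x xs = V ("::", [ x; xs ])
let one = V ("1", [])

let () =
  (* x :: [] matches [1] and binds x *)
  assert (matches (Con ("::", [ Var "x"; Con ("[]", []) ])) (cons one nil)
          = Some [ ("x", one) ]);
  (* [] does not match [1] *)
  assert (matches (Con ("[]", [])) (cons one nil) = None);
  (* [] | l falls back to the second alternative *)
  assert (matches (Or (Con ("[]", []), Var "l")) (cons one nil)
          = Some [ ("l", cons one nil) ])
#+END_SRC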
#+BEGIN_SRC ocaml
type color = | Red | Blue | Green | Black | White

let print_color color =
  match color with
  | Red -> print_endline "red"
  | Blue -> print_endline "blue"
  | Green -> print_endline "green"
  | _ -> print_endline "white or black"
#+END_SRC
OCaml provides tokens to express data destructuring.
For example, we can examine the content of a list with pattern matching:
#+BEGIN_SRC ocaml
begin match list with
| [ ] -> print_endline "empty list"
| element1 :: [ ] -> print_endline "one element"
| element1 :: (element2 :: [ ]) -> print_endline "two elements"
| head :: tail -> print_endline "more than two elements"
end
#+END_SRC
Parenthesized patterns, such as the third one in the previous example,
match the same values as the patterns without parentheses.
The same could be done with tuples:
#+BEGIN_SRC ocaml
begin match tuple with
| (Some _, Some _) -> print_endline "Pair of optional values, both present"
| (Some _, None) | (None, Some _) -> print_endline "Pair of optional values, one of which is None"
| (None, None) -> print_endline "Pair of optional values, both None"
end
#+END_SRC
The pattern pattern₁ | pattern₂ represents the logical "or" of the
two patterns pattern₁ and pattern₂. A value matches pattern₁ |
pattern₂ if it matches pattern₁ or pattern₂.
+								Pattern clauses can make the use of /guards/ to test predicates and
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								variables can captured (binded in scope).
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
-												inizio source

											
										
										
											2020-03-03 17:18:40 +01:00
+								#+BEGIN_SRC
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
 								begin match token_list with
-												prima versione per coppo

											
										
										
											2020-02-24 19:46:00 +01:00
+								| "switch"::var::"{"::rest -> ...
 								| "case"::":"::var::rest when is_int var -> ...
 								| "case"::":"::var::rest when is_string var -> ...
 								| "}"::[ ] -> ...
-												logica che palle

											
										
										
											2020-02-21 11:29:04 +01:00
+								| "}"::rest -> error "syntax error: " rest
 								#+END_SRC
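The example above elides its right-hand sides. Here is a runnable variant built on stated assumptions:
- =is_int= and =is_string= are hypothetical helpers defined here.
- The =token_kind= result type is invented to stand in for the elided actions.

The variant shows that a clause whose guard evaluates to false is skipped, and matching continues with the following clauses.

#+BEGIN_SRC ocaml
(* Hypothetical helpers: the original example does not define them. *)
let is_int (s : string) : bool = int_of_string_opt s <> None

let is_string (s : string) : bool =
  String.length s >= 2 && s.[0] = '"' && s.[String.length s - 1] = '"'

(* Hypothetical result type standing in for the elided right-hand sides. *)
type token_kind =
  | Switch of string
  | Case_int of string
  | Case_string of string
  | Close
  | Syntax_error of string list

let classify (token_list : string list) : token_kind =
  match token_list with
  | "switch" :: var :: "{" :: _ -> Switch var
  | "case" :: ":" :: var :: _ when is_int var -> Case_int var
  | "case" :: ":" :: var :: _ when is_string var -> Case_string var
  | "}" :: [] -> Close
  (* When both guards above fail, matching falls through to this clause. *)
  | rest -> Syntax_error rest
#+END_SRC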
Moreover, the pattern matching compiler emits a warning when a
match is not exhaustive or when some patterns are shadowed by preceding ones.
** Symbolic execution
Symbolic execution is a widely used technique in the field of
computer security.
It makes it possible to analyze different execution paths of a program
simultaneously while tracking which inputs trigger the execution of
different parts of the program.
Inputs are modelled symbolically rather than taking "concrete" values.
A symbolic execution engine keeps track of expressions and variables
in terms of these symbolic values and attaches logical constraints to every
branch that is being followed.
Symbolic execution engines are used to find bugs by modelling the
domain of all possible inputs of a program, to detect infeasible
paths and dead code, and to prove that two code segments are equivalent.
Let's take as an example this signedness bug that was found in the
FreeBSD kernel: when calling the getpeername function, it allowed a user to
read portions of kernel memory.
 								#+BEGIN_SRC
 								int compat;
 								{
 								    struct file *fp;
 								    register struct socket *so;
 								    struct sockaddr *sa;
 								    int len, error;
 								    ...
 								    len = MIN(len, sa->sa_len);    /* [1] */
 								    error = copyout(sa, (caddr_t)uap->asa, (u_int)len);
 								    if (error)
 								        goto bad;
 								    ...
 								bad:
 								    if (sa)
 								        FREE(sa, M_SONAME);
 								    fdrop(fp, p);
 								    return (error);
 								}
 								#+END_SRC
The execution tree obtained when the function is evaluated is shown below. It takes
/int len/ as the symbolic variable α, sa->sa_len as the symbolic variable β,
and π as the set of constraints on a symbolic variable:
 								#+BEGIN_SRC
 								[1]              compat (...)        { π_{α}: -∞ < α < ∞ }
 								                   |
 								[2]              min (σ₁, σ₂)        { π_{σ}: -∞ < (σ₁,σ₂) < ∞ ; π_{α}: -∞ < α < β ; π_{β}: ...}
 								                   |
 								[3]             cast(u_int) (...)    { π_{σ}: 0 ≤ (σ) < ∞ ; π_{α}: -∞ < α < β ; π_{β}: ...}
 								                   |
 								                  ... // rest of the execution
 								#+END_SRC
We can see that at step 3 the set of possible values of the symbolic variable
α is larger than the set of possible values of the input σ to the
/cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ misbehaves when α (/len/) is a
negative number, outside the domain π_{σ}. In C the conversion of a negative int to u_int
does not fail. The value wraps around to a very large unsigned number, which is then passed to
copyout as the number of bytes to copy. This is what made the exploitation possible.
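To make the failing step concrete, the following OCaml sketch models the two operations involved. It is a toy and not FreeBSD code, and the function names are hypothetical. OCaml has no 32-bit unsigned type, so the cast to u_int is emulated by keeping the low 32 bits of the integer. This mirrors the two's complement conversion and assumes a 64-bit OCaml runtime.

#+BEGIN_SRC ocaml
(* Emulate C's conversion of an int to a 32-bit unsigned int:
   keep the low 32 bits (two's complement wrap-around).
   Assumes a 64-bit OCaml runtime, where ints are 63 bits wide. *)
let to_u_int (x : int) : int = x land 0xFFFF_FFFF

(* len = MIN(len, sa->sa_len), then (u_int)len as passed to copyout. *)
let copyout_len ~(len : int) ~(sa_len : int) : int = to_u_int (min len sa_len)

(* Candidate values of len for which the number of copied bytes
   exceeds the size of the socket address. *)
let leaking_inputs ~(sa_len : int) (candidates : int list) : int list =
  List.filter (fun len -> copyout_len ~len ~sa_len > sa_len) candidates

let () =
  Printf.printf "len = 8  -> %d bytes\n" (copyout_len ~len:8 ~sa_len:16);
  Printf.printf "len = -1 -> %d bytes\n" (copyout_len ~len:(-1) ~sa_len:16)
#+END_SRC

Here the bad inputs are found by enumerating candidates. A symbolic engine would instead ask whether the constraint α < 0 is satisfiable together with the path constraint on α.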
 								Every step of evaluation can be modelled as the following transition:
 								#+BEGIN_SRC
 								(π_{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ)
 								#+END_SRC
If we express the π constraints as logical formulas, we can model the
execution of the program in terms of Hoare logic.
The state of the computation is a Hoare triple {P}C{Q}, where P and Q are
respectively the /precondition/ and the /postcondition/ that
constitute the assertions of the program, and C is the directive being
executed.
 								The language of the assertions P is:
 								#+BEGIN_SRC
 								P ::= true | false | a < b | P₁ ∧ P₂ | P₁ ∨ P₂ | ~P
 								#+END_SRC
where a and b are arithmetic expressions.
In Hoare logic, assertions can also take the form of
 								#+BEGIN_SRC
 								P ::= ∀i. P | ∃i. P | P₁ ⇒ P₂
 								#+END_SRC
where i is a logical variable. Assertions of these kinds, however, increase
the complexity of the symbolic engine.
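The quantifier-free assertion language can be represented directly as an algebraic data type. The sketch below rests on our own assumption that arithmetic terms are built from integer constants, variables and addition. It decides an assertion for one concrete assignment of the variables, whereas a symbolic engine would ask a solver whether some assignment exists. The last two definitions encode the constraint on α from the example above and the domain of the cast.

#+BEGIN_SRC ocaml
(* Arithmetic terms: an assumption, the text only names them a and b. *)
type term =
  | Const of int
  | Var of string
  | Add of term * term

(* P ::= true | false | a < b | P₁ ∧ P₂ | P₁ ∨ P₂ | ~P *)
type assertion =
  | True
  | False
  | Lt of term * term
  | And of assertion * assertion
  | Or of assertion * assertion
  | Not of assertion

(* Value of a term under an environment; raises Not_found for unbound variables. *)
let rec eval_term (env : (string * int) list) : term -> int = function
  | Const n -> n
  | Var x -> List.assoc x env
  | Add (a, b) -> eval_term env a + eval_term env b

(* Truth of an assertion for one concrete assignment of the variables. *)
let rec holds (env : (string * int) list) : assertion -> bool = function
  | True -> true
  | False -> false
  | Lt (a, b) -> eval_term env a < eval_term env b
  | And (p, q) -> holds env p && holds env q
  | Or (p, q) -> holds env p || holds env q
  | Not p -> not (holds env p)

(* π_α after MIN(len, sa_len), and the cast domain 0 ≤ α written as ~(α < 0). *)
let pi_alpha = Lt (Var "alpha", Var "beta")
let in_cast_domain = Not (Lt (Var "alpha", Const 0))
#+END_SRC

The assignment α = -1, β = 16 satisfies π_{α} but not the cast domain. It is therefore a witness of π_{α} ⊈ π_{σ}.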
Execution follows the rules of Hoare logic:
- Empty statement:
\begin{verbatim}
————————————
{P}skip{P}
\end{verbatim}
- Assignment statement: P[a/x] holds before the assignment if and only if
  P holds after it.
\begin{verbatim}
————————————
{P[a/x]}x:=a{P}
\end{verbatim}
- Composition: c₁ and c₂ are directives that are executed in order;
  {R} is the /midcondition/.
\begin{verbatim}
{P}c₁{R}, {R}c₂{Q}
——————————————————
   {P}c₁;c₂{Q}
\end{verbatim}
- Conditional:
\begin{verbatim}
 {P∧b}c₁{Q}, {P∧~b}c₂{Q}
————————————————————————
{P}if b then c₁ else c₂{Q}
\end{verbatim}
- Loop: {P} is the loop invariant. After the loop is finished, /P/
  holds and ~b caused the loop to end.
\begin{verbatim}
 {P∧b}c{P}
————————————————————————
{P}while b do c{P∧~b}
\end{verbatim}
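A symbolic execution engine applies the composition and conditional rules mechanically. The following OCaml sketch makes several simplifying assumptions of our own:
- the language is loop-free;
- expressions are integers;
- conditions have the form a < b.

It executes a command symbolically and returns one pair (path constraint, symbolic store) per path. As in the conditional rule, an =if= forks the path: b is added to one branch and ~b to the other. Path feasibility is not checked, since that would require a solver.

#+BEGIN_SRC ocaml
(* Integer expressions over symbolic inputs and program variables. *)
type expr =
  | Int of int
  | Sym of string (* symbolic input, e.g. "alpha" for α *)
  | Var of string (* program variable *)
  | Add of expr * expr

type cond = Lt of expr * expr | Not of cond

type cmd =
  | Skip
  | Assign of string * expr
  | Seq of cmd * cmd
  | If of cond * cmd * cmd

(* Replace program variables by their current symbolic value. *)
let rec subst store = function
  | Int n -> Int n
  | Sym s -> Sym s
  | Var x -> (match List.assoc_opt x store with Some e -> e | None -> Var x)
  | Add (a, b) -> Add (subst store a, subst store b)

let rec subst_cond store = function
  | Lt (a, b) -> Lt (subst store a, subst store b)
  | Not c -> Not (subst_cond store c)

(* Every path as (path constraint, final store); the path constraint
   lists the branch conditions assumed along the path, most recent first. *)
let rec exec (pc, store) = function
  | Skip -> [ (pc, store) ]
  | Assign (x, e) -> [ (pc, (x, subst store e) :: List.remove_assoc x store) ]
  | Seq (c1, c2) -> List.concat_map (fun st -> exec st c2) (exec (pc, store) c1)
  | If (b, c1, c2) ->
      let b' = subst_cond store b in
      exec (b' :: pc, store) c1 @ exec (Not b' :: pc, store) c2

(* len := α; if len < β then skip else len := β   (len = MIN(α, β)) *)
let prog =
  Seq (Assign ("len", Sym "alpha"),
       If (Lt (Var "len", Sym "beta"), Skip, Assign ("len", Sym "beta")))

let paths = exec ([], []) prog
#+END_SRC

The program has two paths. On the first, the constraint α < β holds and len = α. On the second, ~(α < β) holds and len = β.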
Even if the semantics of symbolic execution engines are well defined,
the user may run into several complications when applying such
analyses to non-trivial codebases.
For example, depending on the domain, loop termination is not
guaranteed. Even when termination is guaranteed, looping causes
exponential branching that may lead to path explosion or state
explosion.
Reasoning about all possible executions of a program is not always
feasible, and in case of explosion symbolic execution engines usually
implement heuristics to reduce the size of the search space.
** Translation validation
Translators, such as compilers and code generators, are huge pieces of
software, usually consisting of multiple subsystems, and
constructing an actual specification of a translator implementation for
formal verification is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators. It consists of taking the source and the target
(compiled) program and proving /a posteriori/ their semantic equivalence.
 								- [ ] Techniques for translation validation
 								- [ ] What does semantically equivalent mean
 								- [ ] What happens when there is no semantic equivalence
 								- [ ] Translation validation through symbolic execution
* Translation validation of the Pattern Matching Compiler
** Source program
The algorithm takes as its input a source program and translates it
into an algebraic data structure called /decision_tree/.
#+BEGIN_SRC
type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_blackbox * decision_tree * decision_tree
  | Node of accessor * (constructor * decision_tree) list * decision_tree
#+END_SRC
Unreachable, Leaf of source_expr and Failure are the terminals of the tree.
We distinguish:
- Unreachable: it is statically known that no value can reach this node
- Failure: a value reaching this node results in an error
- Leaf: a value reaching this node results in the evaluation of a
  source black box of code
 								The algorithm doesn't support type-declaration-based analysis
 								to know the list of constructors at a given type.
 								Let's consider some trivial examples:
 								#+BEGIN_SRC
 								function true -> 1
 								#+END_SRC
[ ] Convert to diagrams
is translated to
 								|Node ([(true, Leaf 1)], Failure)
 								while
 								#+BEGIN_SRC
 								function
 								true -> 1
 								| false -> 2
 								#+END_SRC
 								will give
 								|Node ([(true, Leaf 1); (false, Leaf 2)])
It is possible to produce Unreachable nodes by using
refutation clauses (a "dot" on the right-hand side):
 								#+BEGIN_SRC
 								function
 								true -> 1
 								| false -> 2
 								| _ -> .
 								#+END_SRC
which gets translated into
|Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable)
We trust this annotation, which is reasonable as the type-checker
verifies that it indeed holds.
Guard nodes of the tree are emitted whenever a guard is found. A Guard
node contains a blackbox of code that is never evaluated and two
branches. One branch is taken in case the guard evaluates to true, and
the other contains the path taken when the guard evaluates to
false.
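The following OCaml sketch evaluates a decision tree on a value, to show how such a tree is read. It is a toy instance built on our own assumptions:
- values are booleans;
- the accessor is a single placeholder =Root=, since the examples above omit it;
- constructors are bool literals and source expressions are ints;
- guard blackboxes are named by strings.

The tree never evaluates guards, so an oracle decides their outcome. The guards met along the path are returned together with the outcome, mirroring the result shape guard list * (Match | NoMatch | Absurd) used in the correctness section.

#+BEGIN_SRC ocaml
(* Toy instantiation of the abstract types (our assumptions). *)
type accessor = Root
type constructor = bool
type source_expr = int
type source_blackbox = string

type decision_tree =
  | Unreachable
  | Failure
  | Leaf of source_expr
  | Guard of source_blackbox * decision_tree * decision_tree
  | Node of accessor * (constructor * decision_tree) list * decision_tree

type outcome = Match of source_expr | NoMatch | Absurd

(* Evaluate [tree] on [v]. Guards are never run: [guard_oracle] decides
   them, and the guards met on the path are returned in order. *)
let rec eval ~(guard_oracle : source_blackbox -> bool) (v : bool)
    (tree : decision_tree) : source_blackbox list * outcome =
  match tree with
  | Unreachable -> ([], Absurd)
  | Failure -> ([], NoMatch)
  | Leaf e -> ([], Match e)
  | Guard (g, if_true, if_false) ->
      let branch = if guard_oracle g then if_true else if_false in
      let gs, r = eval ~guard_oracle v branch in
      (g :: gs, r)
  | Node (Root, cases, fallback) -> (
      match List.assoc_opt v cases with
      | Some subtree -> eval ~guard_oracle v subtree
      | None -> eval ~guard_oracle v fallback)

(* function true -> 1   is represented as   Node ([(true, Leaf 1)], Failure) *)
let t1 = Node (Root, [ (true, Leaf 1) ], Failure)
let no_guards (_ : source_blackbox) = false
#+END_SRC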
[ ] Finish with Node
[ ] Explain the fallback
[ ] Revise the compilation to take into account your tree instead of the Lambda code
[ ] Specify that the same algorithm is used to compile to Lambda, plus optimizations
The source code of a pattern matching function has the
following form:
|match variable with
|\vert pattern₁ -> expr₁
|\vert pattern₂ when guard -> expr₂
|\vert pattern₃ as var -> expr₃
|⋮
|\vert pₙ -> exprₙ
and can include any expression that is legal for the OCaml compiler,
such as /when/ guards and assignments. Patterns may or may not
be exhaustive.
Pattern matching code can also be written using the more compact form:
|function
|\vert pattern₁ -> expr₁
|\vert pattern₂ when guard -> expr₂
|\vert pattern₃ as var -> expr₃
|⋮
|\vert pₙ -> exprₙ
This BNF grammar formally describes the grammar of the source program:
#+BEGIN_SRC bnf
start ::= "match" id "with" patterns | "function" patterns
patterns ::= (pattern0|pattern1) pattern1*
;; pattern0 and pattern1 are needed to distinguish the first case in which
;; we can avoid writing the optional vertical line
pattern0 ::= clause
pattern1 ::= "|" clause
clause ::= lexpr "->" rexpr
lexpr ::= rule (ε|condition)
rexpr ::= _code ;; arbitrary code
rule ::= wildcard|variable|constructor_pattern|or_pattern
;; rules
wildcard ::= "_"
variable ::= identifier
constructor_pattern ::= constructor (rule|ε) (assignment|ε)
constructor ::= int|float|char|string|bool
                |unit|record|exn|objects|ref
                |list|tuple|array
                |variant|parameterized_variant ;; data types
or_pattern ::= rule ("|" (wildcard|variable|constructor_pattern))+
condition ::= "when" bexpr
assignment ::= "as" id
bexpr ::= _code ;; arbitrary code
#+END_SRC
\begin{comment}
Check that it is still coherent with this BNF
\end{comment}
Patterns are of the following forms:
| pattern         | type of pattern     |
|-----------------+---------------------|
| _               | wildcard            |
| x               | variable            |
| c(p₁,p₂,...,pₙ) | constructor pattern |
| (p₁\vert p₂)    | or-pattern          |
During compilation, expressions are compiled into
Lambda code and are referred to as lambda code actions lᵢ.
The entire pattern matching code is represented as a clause matrix
that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to
lambda code actions lᵢ:
\begin{equation*}
(P → L) =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & → l₂ \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
\end{pmatrix}
\end{equation*}
The pattern /p/ matches a value /v/, written as p ≼ v, when one of the
following rules applies:
|--------------------+---+--------------------+-------------------------------------------|
| _                  | ≼ | v                  | ∀v                                        |
| x                  | ≼ | v                  | ∀v                                        |
| (p₁ \vert\ p₂)     | ≼ | v                  | iff p₁ ≼ v or p₂ ≼ v                      |
| c(p₁, p₂, ..., pₐ) | ≼ | c(v₁, v₂, ..., vₐ) | iff (p₁, p₂, ..., pₐ) ≼ (v₁, v₂, ..., vₐ) |
| (p₁, p₂, ..., pₐ)  | ≼ | (v₁, v₂, ..., vₐ)  | iff pᵢ ≼ vᵢ ∀i ∈ [1..a]                   |
|--------------------+---+--------------------+-------------------------------------------|
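These rules translate almost literally into a recursive function. The sketch below uses our own encoding, not the compiler's data structures:
- a value is a constructor name applied to a list of sub-values;
- tuples are treated as a constructor named "tuple";
- variables behave like wildcards, because binding does not affect whether a value matches.

#+BEGIN_SRC ocaml
(* Toy values: a constructor name applied to sub-values (tuples use "tuple"). *)
type value = V of string * value list

type pattern =
  | Wildcard
  | Variable of string
  | Or of pattern * pattern
  | Constr of string * pattern list

(* p ≼ v, one case per rule of the table. *)
let rec matches (p : pattern) (v : value) : bool =
  match p, v with
  | (Wildcard | Variable _), _ -> true
  | Or (p1, p2), _ -> matches p1 v || matches p2 v
  | Constr (c, ps), V (c', vs) ->
      (* same constructor, same arity, and component-wise matching *)
      c = c' && List.length ps = List.length vs && List.for_all2 matches ps vs
#+END_SRC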
When a value /v/ matches pattern /p/ we say that /v/ is an /instance/ of /p/.
Considering the pattern matrix P, we say that the value vector
$\vec{v}$ = (v₁, v₂, ..., vₙ) matches row number i in P if and only if the following two
conditions are satisfied:
- (p_{i,1}, p_{i,2}, \cdots, p_{i,n}) ≼ (v₁, v₂, ..., vₙ)
- ∀j < i, (p_{j,1}, p_{j,2}, \cdots, p_{j,n}) ⋠ (v₁, v₂, ..., vₙ)
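Read operationally, the two conditions say that a value vector selects the first row it matches. The following self-contained sketch reuses the toy encoding of values and patterns as constructor names with arguments, which is our assumption. It returns the 1-based index of that row, or None when no row matches.

#+BEGIN_SRC ocaml
type value = V of string * value list

type pattern =
  | Wildcard
  | Variable of string
  | Or of pattern * pattern
  | Constr of string * pattern list

let rec matches (p : pattern) (v : value) : bool =
  match p, v with
  | (Wildcard | Variable _), _ -> true
  | Or (p1, p2), _ -> matches p1 v || matches p2 v
  | Constr (c, ps), V (c', vs) ->
      c = c' && List.length ps = List.length vs && List.for_all2 matches ps vs

(* A row matches a value vector when each column matches. *)
let row_matches (row : pattern list) (vs : value list) : bool =
  List.length row = List.length vs && List.for_all2 matches row vs

(* 1-based index i of the first row matched by vs, so that every earlier
   row j < i fails to match; None if no row matches. *)
let first_match (matrix : pattern list list) (vs : value list) : int option =
  let rec go i = function
    | [] -> None
    | row :: rest -> if row_matches row vs then Some i else go (i + 1) rest
  in
  go 1 matrix
#+END_SRC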
We can define the following three relations with respect to patterns:
- Pattern p is less precise than pattern q, written p ≼ q, when all
  instances of q are instances of p
- Patterns p and q are equivalent, written p ≡ q, when their instances
  are the same
- Patterns p and q are compatible when they share a common instance
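Of the three relations, compatibility is the easiest to decide syntactically. The sketch below is ours and assumes untyped patterns in which every pattern has at least one instance. Under that assumption, two constructor patterns are compatible exactly when they use the same constructor with the same arity and their arguments are pairwise compatible. With type information, as in the real compiler, the answer can be more precise.

#+BEGIN_SRC ocaml
type pattern =
  | Wildcard
  | Variable of string
  | Or of pattern * pattern
  | Constr of string * pattern list

(* p and q share a common instance (untyped approximation). *)
let rec compatible (p : pattern) (q : pattern) : bool =
  match p, q with
  | (Wildcard | Variable _), _ | _, (Wildcard | Variable _) -> true
  | Or (p1, p2), _ -> compatible p1 q || compatible p2 q
  | _, Or (q1, q2) -> compatible p q1 || compatible p q2
  | Constr (c, ps), Constr (c', qs) ->
      c = c' && List.length ps = List.length qs && List.for_all2 compatible ps qs
#+END_SRC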
\subsubsection{Parsing of the source program}
The source program, of the following general form,
|match variable with
|\vert pattern₁ -> e₁
|\vert pattern₂ -> e₂
|⋮
|\vert pₘ -> eₘ
is parsed using a parser generated by Menhir, an LR(1) parser generator
for the OCaml programming language.
Menhir compiles an LR(1) grammar specification, in this case the OCaml pattern matching
grammar, down to OCaml code.
Parsing, when successful, results in a list of clauses
and a list of type declarations.
Every clause consists of three objects: a left-hand side that is the
kind of pattern expressed, an optional guard and a right-hand-side expression.
Patterns are encoded in the following way:
 								| pattern         | type        |
 								|-----------------+-------------|
 								| _               | Wildcard    |
 								| p₁ as x         | Assignment  |
 								| c(p₁,p₂,...,pₙ) | Constructor |
 								| (p₁\vert p₂)    | Orpat       |
Guards and right-hand sides are treated as blackboxes of OCaml code.
A sound approach for treating these blackboxes would be to inspect the
OCaml compiler during translation to Lambda code and extract the
blackboxes compiled in their Lambda representation.
This would allow us to test them for equality with the respective blackboxes at
the target level.
Given that this level of introspection is currently not possible, we
decided to restrict the structure of blackboxes to the following (valid) OCaml
code:
 								#+BEGIN_SRC
 								external guard : 'a -> 'b = "guard"
 								external observe : 'a -> 'b = "observe"
 								#+END_SRC
We assume that these two external functions /guard/ and /observe/ have a
type that lets the user pass any number of arguments to them.
All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
<arg> are expressed using the OCaml pattern matching language.
Similarly, all the right-hand-side expressions are of the form
\texttt{observe <arg> <arg> ...}, with the same constraints on the arguments.
 								#+BEGIN_SRC
 								type t = Z | S of t
 								let _ = function
 								  | Z -> observe 0
 								  | S Z -> observe 1
 								  | S x when guard x -> observe 2
 								  | S (S x) as y when guard x y -> observe 3
 								  | S _ -> observe 4
 								#+END_SRC
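The sketch below shows what the parser output could look like for this example. It encodes the five clauses with a hypothetical OCaml AST that mirrors the encoding table above (Wildcard, Assignment, Constructor, Orpat). Two assumptions are ours:
- a variable x is encoded as the assignment _ as x, since the table has no separate variable case;
- blackboxes are kept as plain strings.

The helper =bound_vars= lists the variables a pattern binds.

#+BEGIN_SRC ocaml
(* Hypothetical AST mirroring the encoding table. *)
type pattern =
  | Wildcard
  | Assignment of pattern * string
  | Constructor of string * pattern list
  | Orpat of pattern * pattern

type blackbox = string

type clause = { lhs : pattern; guard : blackbox option; rhs : blackbox }

(* Assumption: a variable x is encoded as (_ as x). *)
let variable x = Assignment (Wildcard, x)
let z = Constructor ("Z", [])
let s p = Constructor ("S", [ p ])

(* The clauses of the example over type t = Z | S of t *)
let clauses =
  [ { lhs = z; guard = None; rhs = "observe 0" };
    { lhs = s z; guard = None; rhs = "observe 1" };
    { lhs = s (variable "x"); guard = Some "guard x"; rhs = "observe 2" };
    { lhs = Assignment (s (s (variable "x")), "y"); guard = Some "guard x y"; rhs = "observe 3" };
    { lhs = s Wildcard; guard = None; rhs = "observe 4" } ]

(* Variables bound by a pattern, left to right; in OCaml both sides of
   an or-pattern bind the same variables, so the left one suffices. *)
let rec bound_vars = function
  | Wildcard -> []
  | Assignment (p, x) -> bound_vars p @ [ x ]
  | Constructor (_, ps) -> List.concat_map bound_vars ps
  | Orpat (p, _) -> bound_vars p
#+END_SRC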
 								Once parsed, the type declarations and the list of clauses are encoded in the form of a matrix
 								that is later evaluated using a matrix decomposition algorithm.
 								\subsubsection{Matrix decomposition of pattern clauses}
The initial input of the decomposition algorithm C consists of a vector of variables
$\vec{x}$ = (x₁, x₂, ..., xₙ) of size /n/, where /n/ is the arity of
the type of /x/, and a clause matrix P → L of width n and height m.
That is:
\begin{equation*}
C\left(\vec{x} = (x₁, x₂, ..., xₙ),
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & → l₂ \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
\end{pmatrix}\right)
\end{equation*}
The base case C₀ of the algorithm is the case in which $\vec{x}$
is empty, that is $\vec{x}$ = (). The matrix then has no columns, so every
row matches trivially and, by the first-match rule, the result of the compilation
C₀ is the first action l₁:
\begin{equation*}
C₀\left((),
\begin{pmatrix}
→ l₁ \\
→ l₂ \\
→ \vdots \\
→ lₘ
\end{pmatrix}\right) = l₁
\end{equation*}
 								When $\vec{x}$ ≠ () then the compilation advances using one of the
 								following four rules:
1) Variable rule: if all patterns of the first column of P are wildcard patterns or
   bind the value to a variable, then
    \begin{equation*}
    C(\vec{x}, P → L) = C((x₂, x₃, ..., xₙ), P' → L')
    \end{equation*}
    where
    \begin{equation*}
    P' → L' =
    \begin{pmatrix}
    p_{1,2} & \cdots & p_{1,n} & → (\mathtt{let}\ (y₁\ x₁)\ l₁) \\
    p_{2,2} & \cdots & p_{2,n} & → (\mathtt{let}\ (y₂\ x₁)\ l₂) \\
    \vdots & \ddots & \vdots & → \vdots \\
    p_{m,2} & \cdots & p_{m,n} & → (\mathtt{let}\ (yₘ\ x₁)\ lₘ)
    \end{pmatrix}
    \end{equation*}
    That means that every lambda action lᵢ contains a binding of x₁ to the
    variable yᵢ that appears in the pattern $p_{i,1}$. Bindings are omitted
    for wildcard patterns, and the lambda action lᵢ then remains unchanged.
2) Constructor rule: if all patterns in the first column of P are
   constructor patterns of the form k(q₁, q₂, ..., qₐ), we define a
   new matrix, the specialized clause matrix S, by applying the
   following transformation on every row /p/:
    \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    for every c ∈ Set of constructors
        for i ← 1 .. m
            let kᵢ ← constructor_of($p_{i,1}$)
            if kᵢ = c then
                p ← $q_{i,1}$, $q_{i,2}$, ..., $q_{i,a}$, $p_{i,2}$, $p_{i,3}$, ..., $p_{i,n}$
    \end{lstlisting}
   Patterns of the form $q_{i,j}$ match on the arguments of the
   constructor, and we define new fresh variables y₁, y₂, ..., yₐ so
   that the lambda action lᵢ becomes
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    (let (y₁ (field 0 x₁))
         (y₂ (field 1 x₁))
         ...
         (yₐ (field (a-1) x₁))
         lᵢ)
\end{lstlisting}
   and the result of the compilation for the set of constructors
   {c₁, c₂, ..., cₖ} is:
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
    switch x₁ with
    case c₁: l₁
    case c₂: l₂
    ...
    case cₖ: lₖ
    default: exit
\end{lstlisting}
   Here the action of each case cⱼ stands for the result of compiling the
   specialized clause matrix for cⱼ on the vector (y₁, ..., yₐ, x₂, ..., xₙ).
3) Orpat rule: there are various strategies for dealing with
   or-patterns. The most naive one is to split them.
   For example, consider a row containing an or-pattern:
   \begin{equation*}
   (p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,n} → lᵢ
   \end{equation*}
   This row is replaced by three rows, inserted at its position in the clause
   matrix so that the first-match order is preserved:
   \begin{gather*}
   p_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ \\
   q_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ \\
   r_{i,1}, p_{i,2}, ..., p_{i,n} → lᵢ
   \end{gather*}
4) Mixture rule:
   When none of the previous rules applies, the clause matrix P → L is
   split into two clause matrices. The first, P₁ → L₁, is the
   largest prefix matrix to which one of the three previous rules
   applies, and P₂ → L₂ contains the remaining rows. The algorithm is
   applied to both matrices.
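The four rules can be sketched as transformations on a list of rows. The OCaml code below is a toy built on our own assumptions and is not the compiler's implementation:
- patterns are wildcards, variables, named constructors with argument lists, or or-patterns;
- actions are plain strings, and let bindings are rendered textually.

The code covers four steps:
- the variable rule;
- the specialization step of the constructor rule;
- naive or-pattern splitting;
- the prefix split of the mixture rule.

The recursive driver and the construction of the switch are omitted.

#+BEGIN_SRC ocaml
type pattern =
  | Wild
  | Var of string
  | Con of string * pattern list
  | Or of pattern * pattern

(* One row p_{i,1} ... p_{i,n} → l_i; actions are plain strings here. *)
type row = pattern list * string

(* 1) Variable rule: drop the first column; a variable y in row i turns
   l_i into (let (y x1) l_i), a wildcard leaves l_i unchanged. *)
let variable_rule (x1 : string) (rows : row list) : row list =
  List.map
    (fun (ps, l) ->
      match ps with
      | Var y :: rest -> (rest, Printf.sprintf "(let (%s %s) %s)" y x1 l)
      | Wild :: rest -> (rest, l)
      | _ -> invalid_arg "variable_rule: expected a variable or a wildcard")
    rows

(* 2) Constructor rule, specialization step: keep the rows headed by
   constructor c of arity a, replacing c(q1..qa) by its arguments. *)
let specialize (c : string) (a : int) (rows : row list) : row list =
  List.filter_map
    (fun (ps, l) ->
      match ps with
      | Con (k, qs) :: rest when k = c && List.length qs = a -> Some (qs @ rest, l)
      | Con _ :: _ -> None
      | _ -> invalid_arg "specialize: expected a constructor pattern")
    rows

(* 3) Orpat rule (naive): replace a row headed by an or-pattern, in place,
   by one row per alternative, preserving the first-match order. *)
let rec alternatives = function
  | Or (p, q) -> alternatives p @ alternatives q
  | p -> [ p ]

let split_or (rows : row list) : row list =
  List.concat_map
    (fun (ps, l) ->
      match ps with
      | p :: rest -> List.map (fun alt -> (alt :: rest, l)) (alternatives p)
      | [] -> [ (ps, l) ])
    rows

(* 4) Mixture rule: longest prefix whose first column is uniformly
   variables/wildcards or uniformly constructors, plus the remaining rows.
   Or-patterns are assumed to have been split already. *)
let is_var_like = function Wild | Var _ -> true | Con _ | Or _ -> false

let split_mixture (rows : row list) : row list * row list =
  let head ((ps, _) : row) = match ps with p :: _ -> p | [] -> Wild in
  match rows with
  | [] -> ([], [])
  | first :: _ ->
      let same r = is_var_like (head r) = is_var_like (head first) in
      let rec go acc = function
        | r :: rest when same r -> go (r :: acc) rest
        | rest -> (List.rev acc, rest)
      in
      go [] rows
#+END_SRC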
* Correctness of the algorithm
Running a program tₛ or its translation 〚tₛ〛 against an input vₛ
produces a result /r/ in the following way:
| ( 〚tₛ〛ₛ(vₛ) = Cₛ(vₛ) ) → r
| tₛ(vₛ) → r
 								Likewise
 								| ( 〚tₜ〛ₜ(vₜ) = Cₜ(vₜ) ) → r
 								| tₜ(vₜ) → r
 								where result r ::= guard list * (Match blackbox | NoMatch | Absurd)
 								and guard ::= blackbox.
Having defined the equivalence between two inputs, one expressed in
the source language and the other in the target language,
  vₛ ≃ vₜ    (TODO define, this talks about the representation of source values in the target)
we can define the equivalence between a pair of programs or a pair
of decision trees:
| tₛ ≃ tₜ := ∀vₛ≃vₜ, tₛ(vₛ) = tₜ(vₜ)
| Cₛ ≃ Cₜ := ∀vₛ≃vₜ, Cₛ(vₛ) = Cₜ(vₜ)
The proposed equivalence algorithm, which works on a pair of
decision trees, returns either /Yes/ or /No(vₛ, vₜ)/, where vₛ and
vₜ are a pair of possible counterexamples for which the decision
trees produce different results.
** Statements
Theorem. We say that a translation of a source program to a decision tree
is correct when, for every possible input, the source program and its
respective decision tree produce the same result:
| ∀vₛ, tₛ(vₛ) = 〚tₛ〛ₛ(vₛ)
Likewise, for the target language:
 								| ∀vₜ, tₜ(vₜ) = 〚tₜ〛ₜ(vₜ)
 								Definition: in the presence of guards we can say that two results are
 								equivalent modulo the guards queue, written /r₁ ≃gs r₂/, when:
 								| (gs₁, r₁) ≃gs (gs₂, r₂)  ⇔  (gs₁, r₁) = (gs₂ ++ gs, r₂)
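The relation ≃gs is a plain structural check. The sketch below assumes, purely for illustration, that a result is a Python pair of a guard list and an outcome, that guards are opaque labels and that ++ is list concatenation; the function name is hypothetical.

```python
from typing import Any, List, Tuple

# Hypothetical representation: a result is (guard queue, outcome).
Result = Tuple[List[Any], Any]


def equiv_modulo_guards(res1: Result, res2: Result, gs: List[Any]) -> bool:
    """(gs1, r1) ≃gs (gs2, r2)  iff  (gs1, r1) = (gs2 ++ gs, r2)."""
    gs1, r1 = res1
    gs2, r2 = res2
    # the guard queue of the first result must extend the second one by exactly gs
    return gs1 == gs2 + gs and r1 == r2
```

With an empty queue gs the relation collapses to plain equality of results.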
Definition: we say that Cₜ covers the input space /S/, written
/covers(Cₜ, S)/, when every value vₛ∈S is a valid input to the
decision tree Cₜ. (TODO: rephrase)
Theorem: Given an input space /S/ and a couple of decision trees, where
the target decision tree Cₜ covers the input space /S/, we say that
the two decision trees are equivalent when:
| equiv(S, Cₛ, Cₜ, gs) = Yes ∧ covers(Cₜ, S) → ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) ≃gs Cₜ(vₜ)
Similarly, we say that a couple of decision trees in the presence of
an input space /S/ are /not/ equivalent when:
| equiv(S, Cₛ, Cₜ, gs) = No(vₛ,vₜ) ∧ covers(Cₜ, S) → vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠gs Cₜ(vₜ)
Corollary: For a full input space /S/, that is, the universe of the
target program, we say:
| equiv(S, 〚tₛ〛ₛ, 〚tₜ〛ₜ, ∅) = Yes  ⇔  tₛ ≃ tₜ
*** Proof of the correctness of the translation from source programs to source decision trees
We define a source term tₛ as a collection of patterns pointing to blackboxes
| tₛ ::= (pᵢ → bbᵢ)^{i∈I}
A pattern is either a constructor pattern, an or-pattern or
a constant pattern
| p ::= K(pᵢ)^{i∈I} | (p|q) | n ∈ ℕ
A decision tree is defined as either a Leaf, a Failure terminal or
an intermediate node with different children sharing the same accessor /a/
and an optional fallback.
Failure is emitted only when the patterns don't cover the whole set of
possible input values /S/. The fallback is not needed when the user
doesn't use a wildcard pattern.
| Cₛ ::= Leaf bb | Failure | Node(a, (Kᵢ → Cᵢ)^{i∈S}, C?)
| a ::= Here | n.a
| vₛ ::= K(vᵢ)^{i∈I} | n ∈ ℕ
 								\begin{comment}
 								Are K and Here clear here?
 								\end{comment}
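To make the grammar concrete, here is a hypothetical Python encoding of source values, accessors and decision trees, together with an evaluator for C(v). Accessors are encoded as tuples of child indices, with the empty tuple standing for Here; evaluating Failure to NoMatch and ignoring guards are our assumptions, not part of the definition above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Constr:
    """Source value K(vᵢ)^{i∈I}: a constructor name and its arguments."""
    name: str
    args: Tuple["Value", ...] = ()


Value = Union[Constr, int]  # vₛ ::= K(vᵢ)^{i∈I} | n ∈ ℕ
Accessor = Tuple[int, ...]  # a ::= Here | n.a, with () standing for Here


def access(v: Value, a: Accessor) -> Value:
    """v(Here) = v and K(vᵢ)ⁱ(k.a) = vₖ(a) when 0 <= k < arity(K)."""
    for k in a:
        if not isinstance(v, Constr) or not 0 <= k < len(v.args):
            raise ValueError(f"accessor {a} is undefined on this value")
        v = v.args[k]
    return v


@dataclass(frozen=True)
class Leaf:
    bb: str  # the blackbox is kept opaque


@dataclass(frozen=True)
class Failure:
    pass


@dataclass(frozen=True)
class Node:
    acc: Accessor
    cases: Tuple[Tuple[Union[str, int], "Tree"], ...]  # (Kᵢ → Cᵢ), keyed by name or constant
    fallback: Optional["Tree"] = None                  # the optional C?


Tree = Union[Leaf, Failure, Node]


def run(tree: Tree, v: Value) -> str:
    """Evaluate C(v), ignoring guards; Failure is assumed to yield NoMatch."""
    while isinstance(tree, Node):
        w = access(v, tree.acc)
        key = w.name if isinstance(w, Constr) else w
        # pick the matching child, else the fallback, else fail
        child = next((c for k, c in tree.cases if k == key), tree.fallback)
        tree = child if child is not None else Failure()
    return f"Match {tree.bb}" if isinstance(tree, Leaf) else "NoMatch"
```

For instance, a tree that first switches on the root constructor and then on the constant stored in its first argument is built by nesting Node values whose accessors are () and (0,).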
We define the decomposition matrix /mₛ/ as
| SMatrix mₛ := (aⱼ)^{j∈J}, ((pᵢⱼ)^{j∈J} → bbᵢ)^{i∈I}
\begin{comment}
Fix this by taking the accessor into account
\end{comment}
We define the decision tree of source programs
  〚tₛ〛
in terms of the decision tree of pattern matrices
  〚mₛ〛
by the following:
  〚(pᵢ → bbᵢ)^{i∈I}〛 := 〚(Root), (pᵢ → bbᵢ)^{i∈I}〛
The decision trees computed from pattern matrices respect the following invariant:
| ∀v, (vᵢ)^{i∈I} = v(aᵢ)^{i∈I} → 〚m〛(v) = m(vᵢ)^{i∈I} for m = ((aᵢ)^{i∈I}, (rᵢ)^{i∈I})
where
| v(Here) = v
| K(vᵢ)ⁱ(k.a) = vₖ(a) if k ∈ [0;n[, with n the arity of K
 								\begin{comment}
 								TODO: EXPLAIN
 								\end{comment}
We proceed to show the correctness of the invariant by case
analysis.
Base cases:
1. 〚∅, (∅ → bbᵢ)^{i∈I}〛 := Leaf bbᵢ where i := min(I), that is, a
   decision tree 〚mₛ〛 defined by an empty accessor list and empty
   patterns pointing to blackboxes /bbᵢ/. This respects the invariant
   because a decomposition matrix with empty rows returns
   the first expression, and we know that (Leaf bb)(v) := Match bb.
2. 〚(aⱼ)^{j∈J}, ∅〛 := Failure
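The two base cases translate directly into code. In this sketch the function name is hypothetical, patterns and accessors are left opaque, and every matrix that is not a base case is rejected, since the non-base cases are not spelled out here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    bb: str


@dataclass(frozen=True)
class Failure:
    pass


# A row ((pᵢⱼ)^{j∈J} → bbᵢ): a vector of opaque patterns and a blackbox name.
Row = Tuple[Tuple[object, ...], str]


def compile_base(accessors: Sequence[object], rows: List[Row]) -> Union[Leaf, Failure]:
    """Base cases of 〚(aⱼ)^{j∈J}, rows〛 only."""
    if not rows:
        # 〚(aⱼ)ʲ, ∅〛 := Failure: no row is left to match
        return Failure()
    if not accessors:
        # 〚∅, (∅ → bbᵢ)ⁱ〛 := Leaf bb_{min(I)}: the first row wins
        if any(len(patterns) != 0 for patterns, _ in rows):
            raise ValueError("with no accessors, every row must have an empty pattern vector")
        return Leaf(rows[0][1])
    raise NotImplementedError("non-base cases are not covered by this sketch")
```

When both the accessors and the rows are empty the sketch returns Failure, since min(∅) is undefined and only the second base case applies.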
For the non-base cases, we define
| let Idx(k) = [0; arity(k)[
| let First(∅) = ⊥
| let First((aⱼ)^{j∈J}) = a_{min(J)} ## where J ≠ ∅
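The helpers Idx and First are small enough to state directly. In the sketch below we pass the arity of the constructor instead of the constructor itself and represent ⊥ with None; both are assumptions made for illustration.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def idx(arity: int) -> range:
    """Idx(k) = [0; arity(k)[: the valid argument positions of a constructor."""
    return range(arity)


def first(accessors: Sequence[T]) -> Optional[T]:
    """First(∅) = ⊥ (None here); otherwise the accessor with the smallest index."""
    return accessors[0] if accessors else None
```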
** Proof of equivalence checking
The equivalence checking algorithm takes as parameters an input space
/S/, a source decision tree /Cₛ/ and a target decision tree /Cₜ/:
| equiv(S, Cₛ, Cₜ) → Yes | No(vₛ, vₜ)
When the algorithm returns Yes and the input space is covered by /Cₜ/,
we can say that the two decision trees produce the same result for
every couple of source value /vₛ/ and target value /vₜ/ that are equivalent.
\begin{comment}
Define "covered"
Is it correct to say the same? How to correctly distinguish in words ≃ and = ?
\end{comment}
| equiv(S, Cₛ, Cₜ) = Yes ∧ covers(Cₜ, S) → ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) = Cₜ(vₜ)
In the case where the algorithm returns No, we have at least a couple
of counterexample values vₛ and vₜ for which the two decision trees
output different results.
| equiv(S, Cₛ, Cₜ) = No(vₛ,vₜ) ∧ covers(Cₜ, S) → vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠ Cₜ(vₜ)
We define the following:
| Forall([]) = Yes
| Forall(Yes::l) = Forall(l)
| Forall(No(vₛ,vₜ)::_) = No(vₛ,vₜ)
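Forall scans a list of answers and stops at the first counterexample. A minimal Python rendering, with the base case taken on the empty list and hypothetical Yes/No classes, is:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Union


@dataclass(frozen=True)
class Yes:
    pass


@dataclass(frozen=True)
class No:
    vs: Any  # source counterexample
    vt: Any  # target counterexample


def forall(results: Iterable[Union[Yes, No]]) -> Union[Yes, No]:
    """Forall([]) = Yes; Forall(Yes::l) = Forall(l); Forall(No(vs, vt)::_) = No(vs, vt)."""
    for r in results:
        if isinstance(r, No):
            return r  # the tail is never inspected
    return Yes()
```

Because the loop returns on the first No, passing a generator keeps the remaining sub-checks from being evaluated, which matches the third equation.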
The following functions exist and are injective:
| int(k)∈ℕ (ar(k) = 0)
| tag(k)∈ℕ (ar(k) > 0)
| π(k) = {n|int(k) = n} × {n|tag(k) = n}
where k is a constructor.
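For ordinary OCaml variant types, the runtime representation gives one concrete choice of int and tag: constant constructors are immediate integers numbered 0, 1, 2 and so on in declaration order, while non-constant constructors are blocks whose tags are numbered the same way among themselves. The sketch below computes such a numbering from a hypothetical list of (name, arity) pairs; reading π(k) as the pair of its (possibly empty) integer and tag sets is our interpretation of the definition.

```python
from typing import Dict, FrozenSet, List, Tuple


def encode(decl: List[Tuple[str, int]]) -> Dict[str, Tuple[str, int]]:
    """Assign ("int", n) to constant constructors and ("tag", n) to the others,
    numbering each group separately in declaration order."""
    counters = {"int": 0, "tag": 0}
    enc: Dict[str, Tuple[str, int]] = {}
    for name, arity in decl:
        kind = "int" if arity == 0 else "tag"
        enc[name] = (kind, counters[kind])
        counters[kind] += 1
    return enc


def pi(enc: Dict[str, Tuple[str, int]], k: str) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """π(k) read as ({n | int(k) = n}, {n | tag(k) = n}); one of the two sets is empty."""
    kind, n = enc[k]
    return (frozenset({n}), frozenset()) if kind == "int" else (frozenset(), frozenset({n}))


# type t = A | B of int | C | D of int * int
enc = encode([("A", 0), ("B", 1), ("C", 0), ("D", 2)])
```

Since the kind ("int" or "tag") is part of each code, two constructors never share an encoding, which is the injectivity required above.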
 								\begin{comment}
 								TODO: explain:
 								∀v∈a→π, C_{/a→π}(v) = C(v)
 								\end{comment}
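The case analysis that follows can be mirrored by a small executable sketch. It rests on several assumptions of ours: the input space S is a finite list of related pairs (vₛ, vₜ) rather than a symbolic set; target values are immediate integers or tagged blocks; each constraint πᵢ is a set of ("int", n) or ("tag", t) heads; guards are ignored; and the comparison of two terminals by equality is our guess, since this section does not spell it out. Source nodes are not handled.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Block:
    """Target value allocated as a block: a tag and its fields."""
    tag: int
    fields: Tuple[Any, ...] = ()


@dataclass(frozen=True)
class Leaf:
    bb: str


@dataclass(frozen=True)
class Failure:
    pass


Head = Tuple[str, int]  # ("int", n) for immediates, ("tag", t) for blocks


@dataclass(frozen=True)
class Node:
    """Target node Node(a, (πᵢ → Cₜᵢ)ⁱ): πᵢ is a set of accepted heads."""
    acc: Tuple[int, ...]
    cases: Tuple[Tuple[FrozenSet[Head], Any], ...]


@dataclass(frozen=True)
class Yes:
    pass


@dataclass(frozen=True)
class No:
    vs: Any
    vt: Any


def head(vt: Any, acc: Tuple[int, ...]) -> Head:
    """Follow the accessor inside a target value and return its head."""
    for k in acc:
        vt = vt.fields[k]
    return ("tag", vt.tag) if isinstance(vt, Block) else ("int", vt)


def forall(results: Iterable[Union[Yes, No]]) -> Union[Yes, No]:
    for r in results:
        if isinstance(r, No):
            return r
    return Yes()


def equiv(S: List[Tuple[Any, Any]], cs: Any, ct: Any) -> Union[Yes, No]:
    if not S:  # equiv(∅, Cₛ, Cₜ) := Yes
        return Yes()
    if isinstance(cs, Failure) and isinstance(ct, Failure):
        return Yes()
    if isinstance(cs, (Leaf, Failure)) and isinstance(ct, Node):
        # Forall over the children, restricting S to S ∩ (a → πᵢ); evaluated lazily
        return forall(
            equiv([(vs, vt) for vs, vt in S if head(vt, ct.acc) in pi], cs, child)
            for pi, child in ct.cases
        )
    if isinstance(cs, (Leaf, Failure)) and isinstance(ct, (Leaf, Failure)):
        # ASSUMPTION: two terminals agree iff they are equal; S is non-empty here,
        # so any of its pairs is then a counterexample
        return Yes() if cs == ct else No(*S[0])
    raise NotImplementedError("source nodes are not covered by this sketch")
```

For example, with a target node switching on ("int", 0) and ("tag", 0), a source Leaf agrees with the branch reached by the immediate value and the first pair falling into a disagreeing branch is reported as the counterexample.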
We proceed by case analysis:
1. equiv(∅, Cₛ, Cₜ) := Yes
\begin{comment}
Should I explain this?
\end{comment}
In the other subcases, S is always non-empty.
2. equiv(S, Failure, Failure) := Yes
   The statement holds because the two Failure nodes produce the same
   result for every possible value /v/.
3. When the source decision tree /Cₛ/ is either a Leaf terminal or a
   Failure terminal and the target decision tree is a node defined by
   an accessor /a/ and a positive number of couples of constraints πᵢ
   and child nodes Cₜᵢ, the output of the algorithm is:
   | equiv(S, (Leaf bbₛ|Failure) as Cₛ, Node(a, (πᵢ → Cₜᵢ)^{i∈I})) := Forall((equiv(S∩(a→πᵢ), Cₛ, Cₜᵢ))^{i∈I})
   Let Sᵢ := S∩(a→πᵢ). The statement holds because either the
   algorithm returns Yes for every sub-input space Sᵢ and subtree Cₜᵢ
   | equiv(Sᵢ, Cₛ, Cₜᵢ) = Yes ∀i
   in which case Cₛ(vₛ) = Cₜᵢ(vₜ) = Cₜ(vₜ) for every vₛ≃vₜ ∈ Sᵢ, and
   since Cₜ covers S the sub-spaces Sᵢ exhaust S; or we have a
   counterexample vₛ, vₜ for which
   | vₛ≃vₜ∈Sₖ ∧ Cₛ(vₛ) ≠ Cₜₖ(vₜ)
   and then, because
   | vₜ∈(a→πₖ) ⇒ Cₜ(vₜ) = Cₜₖ(vₜ)
   we also have
   | vₛ≃vₜ∈S ∧ Cₛ(vₛ)≠Cₜ(vₜ)
   and the result of the algorithm is
   | equiv(Sₖ, Cₛ, Cₜₖ) = No(vₛ, vₜ) for the minimal such k∈I