UniTO/tesi/tesi_unicode.org
2020-02-24 14:36:26 +01:00

271 lines
11 KiB
Org Mode

* TODO Scaletta [1/2]
- [X] Abstract
- [-] Background [20%]
- [X] Ocaml
- [ ] Lambda code [0%]
- [ ] Untyped lambda form
- [ ] OCaml specific instructions
- [-] Pattern matching [50%]
- [X] Introduzione
- [ ] Compilation to lambda form
- [ ] Translation Verification
- [ ] Symbolic execution
- [ ] Translation verification of the Pattern Matching Compiler
- [ ] Source translation
- [ ] Formal Grammar
- [ ] Compilation of source patterns
- [ ] Target translation
- [ ] Formal Grammar
- [ ] Symbolic execution of the lambda target
- [ ] Equivalence between source and target
- [ ] Practical results
#+TITLE: Translation Verification of the OCaml pattern matching compiler
#+AUTHOR: Francesco Mecca
#+EMAIL: me@francescomecca.eu
#+DATE:
#+LANGUAGE: en
#+LaTeX_CLASS: article
#+LaTeX_HEADER: \usepackage[utf8]{inputenc}
#+LaTeX_HEADER: \usepackage{algorithm}
#+LaTeX_HEADER: \usepackage{algpseudocode}
#+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm}
#+Latex_HEADER: \newtheorem{definition}{Definition}
#+LaTeX_HEADER: \usepackage{graphicx}
#+LaTeX_HEADER: \usepackage{listings}
#+LaTeX_HEADER: \usepackage{color}
#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
#+STARTUP: showall
\begin{abstract}
This dissertation presents an algorithm for the translation validation of the OCaml
pattern matching compiler. Given the source representation of the target program and the
target program compiled in untyped lambda form, the algoritmhm is capable of modelling
the source program in terms of symbolic constraints on it's branches and apply symbolic
execution on the untyped lambda representation in order to validate wheter the compilation
produced a valid result.
In this context a valid result means that for every input in the domain of the source
program the untyped lambda translation produces the same output as the source program.
The input of the program is modelled in terms of symbolic constraints closely related to
the runtime representation of OCaml objects and the output consists of OCaml code
blackboxes that are not evaluated in the context of the verification.
\end{abstract}
* Background
** OCaml
Objective Caml (OCaml) is a dialect of the ML (Meta-Language) family of programming
languages.
OCaml shares many features with other dialects of ML, such as SML and Caml Light,
The main features of ML languages are the use of the Hindley-Milner type system that
provides many advantages with respect to static type systems of traditional imperative and object
oriented language such as C, C++ and Java, such as:
- Parametric polymorphism: in certain scenarios a function can accept more than one
type for the input parameters. For example a function that computes the lenght of a
list doesn't need to inspect the type of the elements of the list and for this reason
a List.length function can accept list of integers, list of strings and in general
list of any type. Such languages offer polymorphic functions through subtyping at
runtime only, while other languages such as C++ offer polymorphism through compile
time templates and function overloading.
With the Hindley-Milner type system each well typed function can have more than one
type but always has a unique best type, called the /principal type/.
For example the principal type of the List.length function is "For any /a/, function from
list of /a/ to /int/" and /a/ is called the /type parameter/.
- Strong typing: Languages such as C and C++ allow the programmer to operate on data
without considering its type, mainly through pointers. Other languages such as C#
and Go allow type erasure so at runtime the type of the data can't be queried.
In the case of programming languages using an Hindley-Milner type system the
programmer is not allowed to operate on data by ignoring or promoting its type.
- Type Inference: the principal type of a well formed term can be inferred without any
annotation or declaration.
- Algebraic data types: types that are modelled by the use of two
algebraic operations, sum and product.
A sum type is a type that can hold of many different types of
objects, but only one at a time. For example the sum type defined
as /A + B/ can hold at any moment a value of type A or a value of
type B. Sum types are also called tagged union or variants.
A product type is a type constructed as a direct product
of multiple types and contains at any moment one instance for
every type of its operands. Product types are also called tuples
or records. Algebraic data types can be recursive
in their definition and can be combined.
Moreover ML languages are functional, meaning that functions are
treated as first class citizens and variables are immutable,
although mutable statements and imperative constructs are permitted.
In addition to that OCaml features an object system, that provides
inheritance, subtyping and dynamic binding, and modules, that
provide a way to encapsulate definitions. Modules are checked
statically and can be reificated through functors.
*** TODO Pattern matching [37%]
- [ ] capisci come mettere gli esempi uno accanto all'altro
Pattern matching is a widely adopted mechanism to interact with ADT.
C family languages provide branching on predicates through the use of
if statements and switch statements.
Pattern matching is a mechanism for destructuring and analyzing data
structures for the presence of values simbolically represented as
tokens. One common example of pattern matching is the use of regular
expressions on strings. OCaml provides pattern matching on ADT,
primitive data types.
- [X] Esempio enum, C e Ocaml
#+BEGIN_SRC ocaml
type color = | Red | Blue | Green
begin match color with
| Red -> print "red"
| Blue -> print "red"
| Green -> print "red"
#+END_SRC
OCaml provides tokens to express data destructoring
- [X] Esempio destructor list
#+BEGIN_SRC ocaml
begin match list with
| [ ] -> print "empty list"
| element1 :: [ ] -> print "one element"
| element1 :: element2 :: [ ] -> print "two elements"
| head :: tail-> print "head followed by many elements"
#+END_SRC
- [X] Esempio destructor tuples
#+BEGIN_SRC ocaml
begin match tuple with
| (Some _, Some _) -> print "Pair of optional types"
| (Some _, None) -> print "Pair of optional types, last null"
| (None, Some _) -> print "Pair of optional types, first null"
| (None, None) -> print "Pair of optional types, both null"
#+END_SRC
Pattern clauses can make the use of /guards/ to test predicates and
variables can be binded in scope.
- [ ] Esempio binding e guards
#+BEGIN_SRC ocaml
begin match token_list with
| "switch"::var::"{"::rest ->
| "case"::":"::var::rest when is_int var ->
| "case"::":"::var::rest when is_string var ->
| "}"::[ ] -> stop ()
| "}"::rest -> error "syntax error: " rest
#+END_SRC
- [ ] Un altro esempio con destructors e tutto i lresto
In general pattern matching on primitive and algebraic data types takes the
following form.
- [ ] Esempio informale
It can be described more formally through a BNF grammar.
- [ ] BNF
- [ ] Come funziona il pattern matching?
*** TODO 1.2.1 Pattern matching compilation to lambda code
- [ ] Da tabella a matrice
Formally, pattern are defined as follows:
| pattern | Patterns |
|-----------------+---------------------|
| _ | wildcard |
| x | variable |
| c(p₁,p₂,...,pₙ) | constructor pattern |
| (p₁\vert p₂) | or-pattern |
Values are defined as follows:
| values | Values |
|---------------------+-------------------|
| c(v₁, v₂, ..., vₙ) | constructor value |
The entire pattern matching code can be represented as a clause matrix
that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to
lambda code action lⁱ
\begin{equation*}
(P → L) =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & → l₂ \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
\end{pmatrix}
\end{equation*}
Most native data types in OCaml, such as integers, tuples, lists,
records, can be seen as instances of the following definition
#+BEGIN_SRC ocaml
type t = Nil | One of int | Cons of int * t
#+END_SRC
that is a type /t/ with three constructors that define its complete
signature.
Every constructor has an arity. Nil, a constructor of arity 0, is
called a constant constructor.
The pattern /p/ matches a value /v/, written as p ≼ v, when one of the
following rules apply
| | | | |
|--------------------+---+--------------------+-------------------------------------------|
| _ | ≼ | v | ∀v |
| x | ≼ | v | ∀v |
| (p₁ \vert\ p₂) | ≼ | v | iff p₁ ≼ v or p₂ ≼ v |
| c(p₁, p₂, ..., pₐ) | ≼ | c(v₁, v₂, ..., vₐ) | iff (p₁, p₂, ..., pₐ) ≼ (v₁, v₂, ..., vₐ) |
| (p₁, p₂, ..., pₐ) | ≼ | (v₁, v₂, ..., vₐ) | iff pᵢ ≼ vᵢ ∀i ∈ [1..a] |
|--------------------+---+--------------------+-------------------------------------------|
We can also say that /v/ is an /instance/ of /p/.
When we consider the pattern matrix P we say that the value vector
$\vec{v}$ = (v₁, v₂, ..., vᵢ) matches the line number i in P if and only if the following two
conditions are satisfied:
- p_{i,1}, p_{i,2}, \cdots, p_{i,n} ≼ (v₁, v₂, ..., vᵢ)
- ∀j < i p_{j,1}, p_{j,2}, \cdots, p_{j,n} ⋠ (v₁, v₂, ..., vᵢ)
We can define the following three relations with respect to patterns:
- Patter p is less precise than pattern q, written p ≼ q, when all
instances of q are instances of p
- Pattern p and q are equivalent, written p ≡ q, when their instances
are the same
- Patterns p and q are compatible when they share a common instance
** 1.2.1.1 Initial state of the compilation
Given a source of the following form:
#+BEGIN_SRC ocaml
match x with
| p₁ -> e₁
| p₂ -> e₂
...
| pₘ -> eₘ
#+END_SRC ocaml
the initial input of the algorithm consists of a vector of variables
$\vec{x}$ = (x₁, x₂, ..., xₙ) of size n
and a clause matrix P → L of width n and height m.
\begin{equation*}
(P → L) =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\
\vdots & \vdots & \ddots \vdots → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ
\end{pmatrix}
\end{equation*}