inizio source

This commit is contained in:
Francesco Mecca 2020-03-03 17:18:40 +01:00
parent 2cd2b73451
commit e4fc6b92aa
4 changed files with 346 additions and 149 deletions

View file

@ -3,7 +3,7 @@ import re
from sys import argv from sys import argv
allsymbols = json.load(open('./unicode-latex.json')) allsymbols = json.load(open('./unicode-latex.json'))
mysymbols = ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] mysymbols = ['', '', '', '', '', '', '', '', '', '', 'ε','', '', '', '', '', '', '', '', '', '', '']
symbols = {s: allsymbols[s] for s in mysymbols} symbols = {s: allsymbols[s] for s in mysymbols}
mathsymbols = {s: '$'+allsymbols[s]+'$' for s in symbols} mathsymbols = {s: '$'+allsymbols[s]+'$' for s in symbols}

Binary file not shown.

View file

@ -2,7 +2,7 @@
* TODO Scaletta [1/5] * TODO Scaletta [1/5]
- [X] Abstract - [X] Abstract
- [-] Background [40%] - [-] Background [40%]
- [X] Ocaml - [X]
- [ ] Lambda code [0%] - [ ] Lambda code [0%]
- [ ] Compiler optimizations - [ ] Compiler optimizations
- [ ] other instructions - [ ] other instructions
@ -23,7 +23,7 @@
\end{comment} \end{comment}
#+TITLE: Translation Verification of the OCaml pattern matching compiler #+TITLE: Translation Verification of the pattern matching compiler
#+AUTHOR: Francesco Mecca #+AUTHOR: Francesco Mecca
#+EMAIL: me@francescomecca.eu #+EMAIL: me@francescomecca.eu
#+DATE: #+DATE:
@ -44,7 +44,7 @@
#+STARTUP: showall #+STARTUP: showall
\begin{abstract} \begin{abstract}
This dissertation presents an algorithm for the translation validation of the OCaml This dissertation presents an algorithm for the translation validation of the
pattern matching compiler. Given the source representation of the target program and the pattern matching compiler. Given the source representation of the target program and the
target program compiled in untyped lambda form, the algoritmhm is capable of modelling target program compiled in untyped lambda form, the algoritmhm is capable of modelling
the source program in terms of symbolic constraints on it's branches and apply symbolic the source program in terms of symbolic constraints on it's branches and apply symbolic
@ -53,17 +53,17 @@ produced a valid result.
In this context a valid result means that for every input in the domain of the source In this context a valid result means that for every input in the domain of the source
program the untyped lambda translation produces the same output as the source program. program the untyped lambda translation produces the same output as the source program.
The input of the program is modelled in terms of symbolic constraints closely related to The input of the program is modelled in terms of symbolic constraints closely related to
the runtime representation of OCaml objects and the output consists of OCaml code the runtime representation of objects and the output consists of OCaml code
blackboxes that are not evaluated in the context of the verification. blackboxes that are not evaluated in the context of the verification.
\end{abstract} \end{abstract}
* Background * Background
** OCaml **
Objective Caml (OCaml) is a dialect of the ML (Meta-Language) family of programming Objective Caml () is a dialect of the ML (Meta-Language) family of programming
languages. languages.
OCaml shares many features with other dialects of ML, such as SML and Caml Light, shares many features with other dialects of ML, such as SML and Caml Light,
The main features of ML languages are the use of the Hindley-Milner type system that The main features of ML languages are the use of the Hindley-Milner type system that
provides many advantages with respect to static type systems of traditional imperative and object provides many advantages with respect to static type systems of traditional imperative and object
oriented language such as C, C++ and Java, such as: oriented language such as C, C++ and Java, such as:
@ -99,33 +99,33 @@ oriented language such as C, C++ and Java, such as:
Moreover ML languages are functional, meaning that functions are Moreover ML languages are functional, meaning that functions are
treated as first class citizens and variables are immutable, treated as first class citizens and variables are immutable,
although mutable statements and imperative constructs are permitted. although mutable statements and imperative constructs are permitted.
In addition to that OCaml features an object system, that provides In addition to that features an object system, that provides
inheritance, subtyping and dynamic binding, and modules, that inheritance, subtyping and dynamic binding, and modules, that
provide a way to encapsulate definitions. Modules are checked provide a way to encapsulate definitions. Modules are checked
statically and can be reificated through functors. statically and can be reificated through functors.
** Lambda form compilation ** Lambda form compilation
\begin{comment} \begin{comment}
https://dev.realworldocaml.org/compiler-backend.html https://dev.realworld.org/compiler-backend.html
\end{comment} \end{comment}
OCaml provides compilation in form of a byecode executable with an provides compilation in form of a byecode executable with an
optionally embeddable interpreter and a native executable that could optionally embeddable interpreter and a native executable that could
be statically linked to provide a single file executable. be statically linked to provide a single file executable.
After the OCaml typechecker has proven that the program is type safe, After the typechecker has proven that the program is type safe,
the OCaml compiler lower the code to /Lambda/, an s-expression based the compiler lower the code to /Lambda/, an s-expression based
language that assumes that its input has already been proved safe. language that assumes that its input has already been proved safe.
On the /Lambda/ representation of the source program, the compiler On the /Lambda/ representation of the source program, the compiler
performes a series of optimization passes before translating the performes a series of optimization passes before translating the
lambda form to assembly code. lambda form to assembly code.
*** OCaml datatypes *** datatypes
Most native data types in OCaml, such as integers, tuples, lists, Most native data types in , such as integers, tuples, lists,
records, can be seen as instances of the following definition records, can be seen as instances of the following definition
#+BEGIN_SRC ocaml #+BEGIN_SRC
type t = Nil | One of int | Cons of int * t type t = Nil | One of int | Cons of int * t
#+END_SRC #+END_SRC
that is a type /t/ with three constructors that define its complete that is a type /t/ with three constructors that define its complete
@ -134,7 +134,7 @@ Every constructor has an arity. Nil, a constructor of arity 0, is
called a constant constructor. called a constant constructor.
*** Lambda form types *** Lambda form types
A lambda form target file produced by the ocaml compiler consists of a A lambda form target file produced by the compiler consists of a
single s-expression. Every s-expression consist of /(/, a sequence of single s-expression. Every s-expression consist of /(/, a sequence of
elements separated by a whitespace and a closing /)/. elements separated by a whitespace and a closing /)/.
Elements of s-expressions are: Elements of s-expressions are:
@ -182,7 +182,7 @@ if statements and switch statements.
Pattern matching on the other hands express predicates through Pattern matching on the other hands express predicates through
syntactic templates that also allow to bind on data structures of syntactic templates that also allow to bind on data structures of
arbitrary shapes. One common example of pattern matching is the use of regular arbitrary shapes. One common example of pattern matching is the use of regular
expressions on strings. OCaml provides pattern matching on ADT and expressions on strings. provides pattern matching on ADT and
primitive data types. primitive data types.
The result of a pattern matching operation is always one of: The result of a pattern matching operation is always one of:
- this value does not match this pattern” - this value does not match this pattern”
@ -190,7 +190,7 @@ The result of a pattern matching operation is always one of:
names to values and the jump to the expression pointed at the names to values and the jump to the expression pointed at the
pattern. pattern.
#+BEGIN_SRC ocaml #+BEGIN_SRC
type color = | Red | Blue | Green | Black | White type color = | Red | Blue | Green | Black | White
match color with match color with
@ -200,10 +200,10 @@ match color with
| _ -> print "white or black" | _ -> print "white or black"
#+END_SRC #+END_SRC
OCaml provides tokens to express data destructoring. provides tokens to express data destructoring.
For example we can examine the content of a list with patten matching For example we can examine the content of a list with patten matching
#+BEGIN_SRC ocaml #+BEGIN_SRC
begin match list with begin match list with
| [ ] -> print "empty list" | [ ] -> print "empty list"
@ -216,7 +216,7 @@ Parenthesized patterns, such as the third one in the previous example,
matches the same value as the pattern without parenthesis. matches the same value as the pattern without parenthesis.
The same could be done with tuples The same could be done with tuples
#+BEGIN_SRC ocaml #+BEGIN_SRC
begin match tuple with begin match tuple with
| (Some _, Some _) -> print "Pair of optional types" | (Some _, Some _) -> print "Pair of optional types"
@ -231,7 +231,7 @@ pattern₂ if it matches pattern₁ or pattern₂.
Pattern clauses can make the use of /guards/ to test predicates and Pattern clauses can make the use of /guards/ to test predicates and
variables can captured (binded in scope). variables can captured (binded in scope).
#+BEGIN_SRC ocaml #+BEGIN_SRC
begin match token_list with begin match token_list with
| "switch"::var::"{"::rest -> ... | "switch"::var::"{"::rest -> ...
@ -242,11 +242,97 @@ begin match token_list with
#+END_SRC #+END_SRC
Moreover, the OCaml pattern matching compiler emits a warning when a Moreover, the pattern matching compiler emits a warning when a
pattern is not exhaustive or some patterns are shadowed by precedent ones. pattern is not exhaustive or some patterns are shadowed by precedent ones.
In general pattern matching on primitive and algebraic data types takes the ** Symbolic execution
following form.
** Translation validation
Translators, such as translators and code generators, are huge pieces of
software usually consisting of multiple subsystem and
constructing an actual specification of a translator implementation for
formal validation is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators that consists of taking the source and the target
(compiled) program and proving /a posteriori/ their semantic equivalence.
- [ ] Techniques for translation validation
- [ ] What does semantically equivalent mean
- [ ] What happens when there is no semantic equivalence
- [ ] Translation validation through symbolic execution
** Translation validation of the Pattern Matching Compiler
*** Source program
The algorithm takes as its input a source program and translates it
into an algebraic data structure called /constraint_tree/.
#+BEGIN_SRC
type constraint_tree =
| Unreachable
| Failure
| Leaf of source_expr
| Guard of source_blackbox * constraint_tree * constraint_tree
| Node of accessor * (constructor * constraint_tree) list * constraint_tree
#+END_SRC
Unreachable, Leaf of source_expr and Failure are the terminals of the three.
We distinguish
- Unreachable: statically it is known that no value can go there
- Failure: a value matching this part results in an error
- Leaf: a value matching this part results into the evaluation of a
source blackbox of code
The algorithm doesn't support type-declaration-based analysis
to know the list of constructors at a given type.
Let's consider some trivial examples:
#+BEGIN_SRC
function true -> 1
#+END_SRC
[ ] Converti a disegni
Is translated to
|Node ([(true, Leaf 1)], Failure)
while
#+BEGIN_SRC
function
true -> 1
| false -> 2
#+END_SRC
will give
|Node ([(true, Leaf 1); (false, Leaf 2)])
It is possible to produce Unreachable examples by using
refutation clauses (a "dot" in the right-hand-side)
#+BEGIN_SRC
function
true -> 1
| false -> 2
| _ -> .
#+END_SRC
that gets translated into
Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable)
We trust this annotation, which is reasonable as the type-checker
verifies that it indeed holds.
Guard nodes of the tree are emitted whenever a guard is found. Guards
node contains a blackbox of code that is never evaluated and two
branches, one that is taken in case the guard evaluates to true and
the other one that contains the path taken when the guard evaluates to
true.
[ ] Finisci con Node
[ ] Spiega del fallback
[ ] rivedi compilazione per tenere in considerazione il tuo albero invece che le lambda
[ ] Specifica che stesso algoritmo usato per compilare a lambda, piu` optimizations
The source code of a pattern matching function in has the
following form:
|match variable with |match variable with
|\vert pattern₁ -> expr₁ |\vert pattern₁ -> expr₁
@ -255,30 +341,57 @@ following form.
|⋮ |⋮
|\vert pₙ -> exprₙ |\vert pₙ -> exprₙ
It can be described more formally through a BNF grammar. and can include any expression that is legal for the compiler,
such as "when" conditions and assignments. Patterns could or could not
be exhaustive.
Pattern matching code could also be written using the more compact form:
|function
|\vert pattern₁ -> expr₁
|\vert pattern₂ when guard -> expr₂
|\vert pattern₃ as var -> expr₃
|⋮
|\vert pₙ -> exprₙ
This BNF grammar describes formally the grammar of the source program:
#+BEGIN_SRC bnf #+BEGIN_SRC bnf
start ::= "match" id "with" patterns | "function" patterns
patterns ::= (pattern0|pattern1) pattern1+
;; pattern0 and pattern1 are needed to distinguish the first case in which
;; we can avoid writing the optional vertical line
pattern0 ::= clause
pattern1 ::= "|" clause
clause ::= lexpr "->" rexpr
pattern ::= value-name lexpr ::= rule (ε|condition)
| _ ;; wildcard pattern rexpr ::= _code ;; arbitrary code
| constant ;; matches a constant value
| pattern as value-name ;; binds to value-name
| ( pattern ) ;; parenthesized pattern
| pattern | pattern ;; or-pattern
| constr pattern ;; variant pattern
| [ pattern { ; pattern } [ ; ] ] ;; list patterns
| pattern :: pattern ;; lists patterns using cons operator (::)
| [| pattern { ; pattern } [ ; ] |] ;; array pattern
| char-literal .. char-literal ;; match on a range of characters
| { field [: typexpr] [= pattern] { ; field [: typexpr] [= pattern] } \
[; _ ] [ ; ] } ;; patterns that match on records
rule ::= wildcard|variable|constructor_pattern|or_pattern ;;
;; rules
wildcard ::= "_"
variable ::= identifier
constructor_pattern ::= constructor (rule|ε) (assignment|ε)
constructor ::= int|float|char|string|bool
|unit|record|exn|objects|ref
|list|tuple|array
|variant|parameterized_variant ;; data types
or_pattern ::= wildcard|variable|constructor_pattern ("|" wildcard|variable|constructor_pattern)+
condition ::= "when" bexpr
assignment ::= "as" id
bexpr ::= _code ;; arbitrary code
#+END_SRC #+END_SRC
\begin{comment} \begin{comment}
*** 1.2.1 Pattern matching compilation to lambda code Check that it is still this
\end{comment}
During compilation, patterns are in the form Patterns are of the form
| pattern | type of pattern | | pattern | type of pattern |
|-----------------+---------------------| |-----------------+---------------------|
| _ | wildcard | | _ | wildcard |
@ -286,10 +399,10 @@ During compilation, patterns are in the form
| c(p₁,p₂,...,pₙ) | constructor pattern | | c(p₁,p₂,...,pₙ) | constructor pattern |
| (p₁\vert p₂) | or-pattern | | (p₁\vert p₂) | or-pattern |
Expressions are compiled into lambda code and are referred as lambda During compilation by the translators expressions are compiled into
code actions. lambda code and are referred as lambda code actions lᵢ.
The entire pattern matching code can be represented as a clause matrix The entire pattern matching code is represented as a clause matrix
that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to
lambda code action lⁱ lambda code action lⁱ
\begin{equation*} \begin{equation*}
@ -449,34 +562,8 @@ following four rules:
apply, and P₂ → L₂ containing the remaining rows. The algorithm is apply, and P₂ → L₂ containing the remaining rows. The algorithm is
applied to both matrices. applied to both matrices.
\end{comment}
\begin{comment} \begin{comment}
#+BEGIN_COMMENT #+BEGIN_COMMENT
CITE paper? CITE paper?
#+END_COMMENT #+END_COMMENT
\end{comment} \end{comment}
** Symbolic execution
** Translation validation
Translators, such as translators and code generators, are huge pieces of
software usually consisting of multiple subsystem and
constructing an actual specification of a translator implementation for
formal validation is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators that consists of taking the source and the target
(compiled) program and proving /a posteriori/ their semantic equivalence.
- [ ] Techniques for translation validation
- [ ] What does semantically equivalent mean
- [ ] What happens when there is no semantic equivalence
- [ ] Translation validation through symbolic execution
** Translation validation of the Pattern Matching Compiler
- [ ] Source translation
- [ ] Formal Grammar
- [ ] Compilation of source patterns
- [ ] Rest?

View file

@ -1,4 +1,4 @@
% Created 2020-03-02 Mon 14:31 % Created 2020-03-03 Tue 17:18
% Intended LaTeX compiler: pdflatex % Intended LaTeX compiler: pdflatex
\documentclass[11pt]{article} \documentclass[11pt]{article}
\usepackage[utf8]{inputenc} \usepackage[utf8]{inputenc}
@ -24,10 +24,10 @@
\usepackage{color} \usepackage{color}
\author{Francesco Mecca} \author{Francesco Mecca}
\date{} \date{}
\title{Translation Verification of the OCaml pattern matching compiler} \title{Translation Verification of the pattern matching compiler}
\hypersetup{ \hypersetup{
pdfauthor={Francesco Mecca}, pdfauthor={Francesco Mecca},
pdftitle={Translation Verification of the OCaml pattern matching compiler}, pdftitle={Translation Verification of the pattern matching compiler},
pdfkeywords={}, pdfkeywords={},
pdfsubject={}, pdfsubject={},
pdfcreator={Emacs 26.3 (Org mode 9.1.9)}, pdfcreator={Emacs 26.3 (Org mode 9.1.9)},
@ -37,12 +37,13 @@
\maketitle \maketitle
\begin{comment} \begin{comment}
\section{{\bfseries\sffamily TODO} Scaletta [1/5]} \section{{\bfseries\sffamily TODO} Scaletta [1/5]}
\label{sec:org62901c2} \label{sec:org7578cff}
\begin{itemize} \begin{itemize}
\item[{$\boxtimes$}] Abstract \item[{$\boxtimes$}] Abstract
\item[{$\boxminus$}] Background [40\%] \item[{$\boxminus$}] Background [40\%]
\begin{itemize} \begin{itemize}
\item[{$\boxtimes$}] Ocaml \item[{$\boxtimes$}]
\item[{$\square$}] Lambda code [0\%] \item[{$\square$}] Lambda code [0\%]
\begin{itemize} \begin{itemize}
\item[{$\square$}] Compiler optimizations \item[{$\square$}] Compiler optimizations
@ -58,6 +59,7 @@
\begin{itemize} \begin{itemize}
\item[{$\square$}] Formal Grammar \item[{$\square$}] Formal Grammar
\item[{$\square$}] Compilation of source patterns \item[{$\square$}] Compilation of source patterns
\item[{$\square$}] Rest?
\end{itemize} \end{itemize}
\item[{$\square$}] Target translation \item[{$\square$}] Target translation
\begin{itemize} \begin{itemize}
@ -74,7 +76,7 @@
\begin{abstract} \begin{abstract}
This dissertation presents an algorithm for the translation validation of the OCaml This dissertation presents an algorithm for the translation validation of the
pattern matching compiler. Given the source representation of the target program and the pattern matching compiler. Given the source representation of the target program and the
target program compiled in untyped lambda form, the algoritmhm is capable of modelling target program compiled in untyped lambda form, the algoritmhm is capable of modelling
the source program in terms of symbolic constraints on it's branches and apply symbolic the source program in terms of symbolic constraints on it's branches and apply symbolic
@ -83,19 +85,19 @@ produced a valid result.
In this context a valid result means that for every input in the domain of the source In this context a valid result means that for every input in the domain of the source
program the untyped lambda translation produces the same output as the source program. program the untyped lambda translation produces the same output as the source program.
The input of the program is modelled in terms of symbolic constraints closely related to The input of the program is modelled in terms of symbolic constraints closely related to
the runtime representation of OCaml objects and the output consists of OCaml code the runtime representation of objects and the output consists of OCaml code
blackboxes that are not evaluated in the context of the verification. blackboxes that are not evaluated in the context of the verification.
\end{abstract} \end{abstract}
\section{Background} \section{Background}
\label{sec:orgbf4de70} \label{sec:org5b6accf}
\subsection{OCaml} \subsection{}
\label{sec:orga9a97c9} \label{sec:org3c9e604}
Objective Caml (OCaml) is a dialect of the ML (Meta-Language) family of programming Objective Caml () is a dialect of the ML (Meta-Language) family of programming
languages. languages.
OCaml shares many features with other dialects of ML, such as SML and Caml Light, shares many features with other dialects of ML, such as SML and Caml Light,
The main features of ML languages are the use of the Hindley-Milner type system that The main features of ML languages are the use of the Hindley-Milner type system that
provides many advantages with respect to static type systems of traditional imperative and object provides many advantages with respect to static type systems of traditional imperative and object
oriented language such as C, C++ and Java, such as: oriented language such as C, C++ and Java, such as:
@ -133,33 +135,33 @@ in their definition and can be combined.
Moreover ML languages are functional, meaning that functions are Moreover ML languages are functional, meaning that functions are
treated as first class citizens and variables are immutable, treated as first class citizens and variables are immutable,
although mutable statements and imperative constructs are permitted. although mutable statements and imperative constructs are permitted.
In addition to that OCaml features an object system, that provides In addition to that features an object system, that provides
inheritance, subtyping and dynamic binding, and modules, that inheritance, subtyping and dynamic binding, and modules, that
provide a way to encapsulate definitions. Modules are checked provide a way to encapsulate definitions. Modules are checked
statically and can be reificated through functors. statically and can be reificated through functors.
\subsection{Lambda form compilation} \subsection{Lambda form compilation}
\label{sec:org2d10d35} \label{sec:org6065c14}
\begin{comment} \begin{comment}
https://dev.realworldocaml.org/compiler-backend.html https://dev.realworld.org/compiler-backend.html
\end{comment} \end{comment}
OCaml provides compilation in form of a byecode executable with an provides compilation in form of a byecode executable with an
optionally embeddable interpreter and a native executable that could optionally embeddable interpreter and a native executable that could
be statically linked to provide a single file executable. be statically linked to provide a single file executable.
After the OCaml typechecker has proven that the program is type safe, After the typechecker has proven that the program is type safe,
the OCaml compiler lower the code to \emph{Lambda}, an s-expression based the compiler lower the code to \emph{Lambda}, an s-expression based
language that assumes that its input has already been proved safe. language that assumes that its input has already been proved safe.
On the \emph{Lambda} representation of the source program, the compiler On the \emph{Lambda} representation of the source program, the compiler
performes a series of optimization passes before translating the performes a series of optimization passes before translating the
lambda form to assembly code. lambda form to assembly code.
\begin{enumerate} \begin{enumerate}
\item OCaml datatypes \item datatypes
\label{sec:orgd605d09} \label{sec:org7b158eb}
Most native data types in OCaml, such as integers, tuples, lists, Most native data types in , such as integers, tuples, lists,
records, can be seen as instances of the following definition records, can be seen as instances of the following definition
\begin{verbatim} \begin{verbatim}
@ -171,8 +173,8 @@ Every constructor has an arity. Nil, a constructor of arity 0, is
called a constant constructor. called a constant constructor.
\item Lambda form types \item Lambda form types
\label{sec:org1ee20a6} \label{sec:org737fa2f}
A lambda form target file produced by the ocaml compiler consists of a A lambda form target file produced by the compiler consists of a
single s-expression. Every s-expression consist of \emph{(}, a sequence of single s-expression. Every s-expression consist of \emph{(}, a sequence of
elements separated by a whitespace and a closing \emph{)}. elements separated by a whitespace and a closing \emph{)}.
Elements of s-expressions are: Elements of s-expressions are:
@ -203,7 +205,7 @@ The are varios numeric operations defined:
\end{itemize} \end{itemize}
\item Functions \item Functions
\label{sec:org914d5eb} \label{sec:org369db83}
Functions are defined using the following syntax, and close over all Functions are defined using the following syntax, and close over all
bindings in scope: (lambda (arg1 arg2 arg3) BODY) bindings in scope: (lambda (arg1 arg2 arg3) BODY)
@ -211,19 +213,19 @@ and are applied using the following syntax: (apply FUNC ARG ARG ARG)
Evaluation is eager. Evaluation is eager.
\item Bindings \item Bindings
\label{sec:org055206b} \label{sec:org120bc74}
The atom \emph{let} introduces a sequence of bindings: The atom \emph{let} introduces a sequence of bindings:
(let BINDING BINDING BINDING \ldots{} BODY) (let BINDING BINDING BINDING \ldots{} BODY)
\item Other atoms \item Other atoms
\label{sec:org0f92182} \label{sec:org58bd28f}
TODO: if, switch, stringswitch\ldots{} TODO: if, switch, stringswitch\ldots{}
TODO: magari esempi TODO: magari esempi
\end{enumerate} \end{enumerate}
\subsection{Pattern matching} \subsection{Pattern matching}
\label{sec:org9876fb9} \label{sec:org5d3b2f5}
Pattern matching is a widely adopted mechanism to interact with ADT. Pattern matching is a widely adopted mechanism to interact with ADT.
C family languages provide branching on predicates through the use of C family languages provide branching on predicates through the use of
@ -231,7 +233,7 @@ if statements and switch statements.
Pattern matching on the other hands express predicates through Pattern matching on the other hands express predicates through
syntactic templates that also allow to bind on data structures of syntactic templates that also allow to bind on data structures of
arbitrary shapes. One common example of pattern matching is the use of regular arbitrary shapes. One common example of pattern matching is the use of regular
expressions on strings. OCaml provides pattern matching on ADT and expressions on strings. provides pattern matching on ADT and
primitive data types. primitive data types.
The result of a pattern matching operation is always one of: The result of a pattern matching operation is always one of:
\begin{itemize} \begin{itemize}
@ -251,7 +253,7 @@ match color with
| _ -> print "white or black" | _ -> print "white or black"
\end{verbatim} \end{verbatim}
OCaml provides tokens to express data destructoring. provides tokens to express data destructoring.
For example we can examine the content of a list with patten matching For example we can examine the content of a list with patten matching
\begin{verbatim} \begin{verbatim}
@ -293,11 +295,114 @@ begin match token_list with
\end{verbatim} \end{verbatim}
Moreover, the OCaml pattern matching compiler emits a warning when a Moreover, the pattern matching compiler emits a warning when a
pattern is not exhaustive or some patterns are shadowed by precedent ones. pattern is not exhaustive or some patterns are shadowed by precedent ones.
In general pattern matching on primitive and algebraic data types takes the \subsection{Symbolic execution}
following form. \label{sec:orge2e0205}
\subsection{Translation validation}
\label{sec:orgbafe7bc}
Translators, such as translators and code generators, are huge pieces of
software usually consisting of multiple subsystem and
constructing an actual specification of a translator implementation for
formal validation is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators that consists of taking the source and the target
(compiled) program and proving \emph{a posteriori} their semantic equivalence.
\begin{itemize}
\item[{$\square$}] Techniques for translation validation
\item[{$\square$}] What does semantically equivalent mean
\item[{$\square$}] What happens when there is no semantic equivalence
\item[{$\square$}] Translation validation through symbolic execution
\end{itemize}
\subsection{Translation validation of the Pattern Matching Compiler}
\label{sec:org24ee133}
\begin{enumerate}
\item Source program
\label{sec:org8c21912}
The algorithm takes as its input a source program and translates it
into an algebraic data structure called \emph{constraint\_tree}.
\begin{verbatim}
type constraint_tree =
| Unreachable
| Failure
| Leaf of source_expr
| Guard of source_blackbox * constraint_tree * constraint_tree
| Node of accessor * (constructor * constraint_tree) list * constraint_tree
\end{verbatim}
Unreachable, Leaf of source\_expr and Failure are the terminals of the three.
We distinguish
\begin{itemize}
\item Unreachable: statically it is known that no value can go there
\item Failure: a value matching this part results in an error
\item Leaf: a value matching this part results into the evaluation of a
source blackbox of code
\end{itemize}
The algorithm doesn't support type-declaration-based analysis
to know the list of constructors at a given type.
Let's consider some trivial examples:
\begin{verbatim}
function true -> 1
\end{verbatim}
[ ] Converti a disegni
Is translated to
\begin{center}
\begin{tabular}{l}
Node ([(true, Leaf 1)], Failure)\\
\end{tabular}
\end{center}
while
\begin{verbatim}
function
true -> 1
| false -> 2
\end{verbatim}
will give
\begin{center}
\begin{tabular}{l}
Node ([(true, Leaf 1); (false, Leaf 2)])\\
\end{tabular}
\end{center}
It is possible to produce Unreachable examples by using
refutation clauses (a "dot" in the right-hand-side)
\begin{verbatim}
function
true -> 1
| false -> 2
| _ -> .
\end{verbatim}
that gets translated into
Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable)
We trust this annotation, which is reasonable as the type-checker
verifies that it indeed holds.
Guard nodes of the tree are emitted whenever a guard is found. Guards
node contains a blackbox of code that is never evaluated and two
branches, one that is taken in case the guard evaluates to true and
the other one that contains the path taken when the guard evaluates to
true.
[ ] Finisci con Node
[ ] Spiega del fallback
[ ] rivedi compilazione per tenere in considerazione il tuo albero invece che le lambda
[ ] Specifica che stesso algoritmo usato per compilare a lambda, piu` optimizations
The source code of a pattern matching function in has the
following form:
\begin{center} \begin{center}
\begin{tabular}{l} \begin{tabular}{l}
@ -310,32 +415,61 @@ match variable with\\
\end{tabular} \end{tabular}
\end{center} \end{center}
It can be described more formally through a BNF grammar. and can include any expression that is legal for the compiler,
such as "when" conditions and assignments. Patterns could or could not
be exhaustive.
Pattern matching code could also be written using the more compact form:
\begin{center}
\begin{tabular}{l}
function\\
\(\vert{}\) pattern₁ -> expr₁\\
\(\vert{}\) pattern₂ when guard -> expr₂\\
\(\vert{}\) pattern₃ as var -> expr₃\\
\\
\(\vert{}\) pₙ -> exprₙ\\
\end{tabular}
\end{center}
This BNF grammar describes formally the grammar of the source program:
\begin{verbatim} \begin{verbatim}
start ::= "match" id "with" patterns | "function" patterns
patterns ::= (pattern0|pattern1) pattern1+
;; pattern0 and pattern1 are needed to distinguish the first case in which
;; we can avoid writing the optional vertical line
pattern0 ::= clause
pattern1 ::= "|" clause
clause ::= lexpr "->" rexpr
pattern ::= value-name lexpr ::= rule (ε|condition)
| _ ;; wildcard pattern rexpr ::= _code ;; arbitrary code
| constant ;; matches a constant value
| pattern as value-name ;; binds to value-name
| ( pattern ) ;; parenthesized pattern
| pattern | pattern ;; or-pattern
| constr pattern ;; variant pattern
| [ pattern { ; pattern } [ ; ] ] ;; list patterns
| pattern :: pattern ;; lists patterns using cons operator (::)
| [| pattern { ; pattern } [ ; ] |] ;; array pattern
| char-literal .. char-literal ;; match on a range of characters
| { field [: typexpr] [= pattern] { ; field [: typexpr] [= pattern] } \
[; _ ] [ ; ] } ;; patterns that match on records
rule ::= wildcard|variable|constructor_pattern|or_pattern ;;
;; rules
wildcard ::= "_"
variable ::= identifier
constructor_pattern ::= constructor (rule|ε) (assignment|ε)
constructor ::= int|float|char|string|bool
|unit|record|exn|objects|ref
|list|tuple|array
|variant|parameterized_variant ;; data types
or_pattern ::= wildcard|variable|constructor_pattern ("|" wildcard|variable|constructor_pattern)+
condition ::= "when" bexpr
assignment ::= "as" id
bexpr ::= _code ;; arbitrary code
\end{verbatim} \end{verbatim}
\begin{comment} \begin{comment}
\begin{enumerate} Check that it is still this
\item 1.2.1 Pattern matching compilation to lambda code \end{comment}
\label{sec:org4d27bd9}
During compilation, patterns are in the form Patterns are of the form
\begin{center} \begin{center}
\begin{tabular}{ll} \begin{tabular}{ll}
pattern & type of pattern\\ pattern & type of pattern\\
@ -347,10 +481,10 @@ c(p₁,p₂,\ldots{},pₙ) & constructor pattern\\
\end{tabular} \end{tabular}
\end{center} \end{center}
Expressions are compiled into lambda code and are referred as lambda During compilation by the translators expressions are compiled into
code actions. lambda code and are referred as lambda code actions lᵢ.
The entire pattern matching code can be represented as a clause matrix The entire pattern matching code is represented as a clause matrix
that associates rows of patterns (p\(_{\text{i,1}}\), p\(_{\text{i,2}}\), \ldots{}, p\(_{\text{i,n}}\)) to that associates rows of patterns (p\(_{\text{i,1}}\), p\(_{\text{i,2}}\), \ldots{}, p\(_{\text{i,n}}\)) to
lambda code action lⁱ lambda code action lⁱ
\begin{equation*} \begin{equation*}
@ -399,7 +533,7 @@ are the same
\begin{enumerate} \begin{enumerate}
\item Initial state of the compilation \item Initial state of the compilation
\label{sec:orge9c6bc4} \label{sec:org9a7b624}
Given a source of the following form: Given a source of the following form:
@ -528,8 +662,6 @@ apply, and P₂ → L₂ containing the remaining rows. The algorithm is
applied to both matrices. applied to both matrices.
\end{enumerate} \end{enumerate}
\end{comment}
\begin{comment} \begin{comment}
#+BEGIN_COMMENT #+BEGIN_COMMENT
CITE paper? CITE paper?
@ -537,26 +669,4 @@ CITE paper?
\end{comment} \end{comment}
\end{enumerate} \end{enumerate}
\end{enumerate} \end{enumerate}
\subsection{Symbolic execution}
\label{sec:orgdb60b84}
\subsection{Translation validation}
\label{sec:org096d047}
Translators, such as translators and code generators, are huge pieces of
software usually consisting of multiple subsystem and
constructing an actual specification of a translator implementation for
formal validation is a very long task. Moreover, different
translators implement different algorithms, so the correctness proof of
a translator cannot be generalized and reused to prove another translator.
Translation validation is an alternative to the verification of
existing translators that consists of taking the source and the target
(compiled) program and proving \emph{a posteriori} their semantic equivalence.
\begin{itemize}
\item[{$\square$}] Techniques for translation validation
\item[{$\square$}] What does semantically equivalent mean
\item[{$\square$}] What happens when there is no semantic equivalence
\item[{$\square$}] Translation validation through symbolic execution
\end{itemize}
\end{document} \end{document}