diff --git a/tesi/conv.py b/tesi/conv.py index b726503..120475a 100644 --- a/tesi/conv.py +++ b/tesi/conv.py @@ -3,7 +3,7 @@ import re from sys import argv allsymbols = json.load(open('./unicode-latex.json')) -mysymbols = ['≡', '≠', '≼', '→', '←', '⊀', '⋠', '≺', '∀', '∈', '₀', '₂', '₁', '₃', 'ₐ', 'ₖ', 'ₘ', 'ₙ', 'ᵢ', 'ⁱ', '⋮'] +mysymbols = ['≡', '≠', '≼', '→', '←', '⊀', '⋠', '≺', '∀', '∈', 'ε','₀', '₂', '₁', '₃', 'ₐ', 'ₖ', 'ₘ', 'ₙ', 'ᵢ', 'ⁱ', '⋮'] symbols = {s: allsymbols[s] for s in mysymbols} mathsymbols = {s: '$'+allsymbols[s]+'$' for s in symbols} diff --git a/tesi/tesi.pdf b/tesi/tesi.pdf index 7c086de..7c22a01 100644 Binary files a/tesi/tesi.pdf and b/tesi/tesi.pdf differ diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index 1807b99..f4d024b 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -2,7 +2,7 @@ * TODO Scaletta [1/5] - [X] Abstract - [-] Background [40%] - - [X] Ocaml + - [X] - [ ] Lambda code [0%] - [ ] Compiler optimizations - [ ] other instructions @@ -23,7 +23,7 @@ \end{comment} -#+TITLE: Translation Verification of the OCaml pattern matching compiler +#+TITLE: Translation Verification of the pattern matching compiler #+AUTHOR: Francesco Mecca #+EMAIL: me@francescomecca.eu #+DATE: @@ -44,7 +44,7 @@ #+STARTUP: showall \begin{abstract} -This dissertation presents an algorithm for the translation validation of the OCaml +This dissertation presents an algorithm for the translation validation of the pattern matching compiler. Given the source representation of the target program and the target program compiled in untyped lambda form, the algoritmhm is capable of modelling the source program in terms of symbolic constraints on it's branches and apply symbolic @@ -53,17 +53,17 @@ produced a valid result. In this context a valid result means that for every input in the domain of the source program the untyped lambda translation produces the same output as the source program. The input of the program is modelled in terms of symbolic constraints closely related to -the runtime representation of OCaml objects and the output consists of OCaml code +the runtime representation of objects and the output consists of OCaml code blackboxes that are not evaluated in the context of the verification. \end{abstract} * Background -** OCaml -Objective Caml (OCaml) is a dialect of the ML (Meta-Language) family of programming +** +Objective Caml () is a dialect of the ML (Meta-Language) family of programming languages. -OCaml shares many features with other dialects of ML, such as SML and Caml Light, + shares many features with other dialects of ML, such as SML and Caml Light, The main features of ML languages are the use of the Hindley-Milner type system that provides many advantages with respect to static type systems of traditional imperative and object oriented language such as C, C++ and Java, such as: @@ -99,33 +99,33 @@ oriented language such as C, C++ and Java, such as: Moreover ML languages are functional, meaning that functions are treated as first class citizens and variables are immutable, although mutable statements and imperative constructs are permitted. -In addition to that OCaml features an object system, that provides +In addition to that features an object system, that provides inheritance, subtyping and dynamic binding, and modules, that provide a way to encapsulate definitions. Modules are checked statically and can be reificated through functors. ** Lambda form compilation \begin{comment} -https://dev.realworldocaml.org/compiler-backend.html +https://dev.realworld.org/compiler-backend.html \end{comment} -OCaml provides compilation in form of a byecode executable with an + provides compilation in form of a byecode executable with an optionally embeddable interpreter and a native executable that could be statically linked to provide a single file executable. -After the OCaml typechecker has proven that the program is type safe, -the OCaml compiler lower the code to /Lambda/, an s-expression based +After the typechecker has proven that the program is type safe, +the compiler lower the code to /Lambda/, an s-expression based language that assumes that its input has already been proved safe. On the /Lambda/ representation of the source program, the compiler performes a series of optimization passes before translating the lambda form to assembly code. -*** OCaml datatypes +*** datatypes -Most native data types in OCaml, such as integers, tuples, lists, +Most native data types in , such as integers, tuples, lists, records, can be seen as instances of the following definition -#+BEGIN_SRC ocaml +#+BEGIN_SRC type t = Nil | One of int | Cons of int * t #+END_SRC that is a type /t/ with three constructors that define its complete @@ -134,7 +134,7 @@ Every constructor has an arity. Nil, a constructor of arity 0, is called a constant constructor. *** Lambda form types -A lambda form target file produced by the ocaml compiler consists of a +A lambda form target file produced by the compiler consists of a single s-expression. Every s-expression consist of /(/, a sequence of elements separated by a whitespace and a closing /)/. Elements of s-expressions are: @@ -182,7 +182,7 @@ if statements and switch statements. Pattern matching on the other hands express predicates through syntactic templates that also allow to bind on data structures of arbitrary shapes. One common example of pattern matching is the use of regular -expressions on strings. OCaml provides pattern matching on ADT and +expressions on strings. provides pattern matching on ADT and primitive data types. The result of a pattern matching operation is always one of: - this value does not match this pattern” @@ -190,7 +190,7 @@ The result of a pattern matching operation is always one of: names to values and the jump to the expression pointed at the pattern. -#+BEGIN_SRC ocaml +#+BEGIN_SRC type color = | Red | Blue | Green | Black | White match color with @@ -200,10 +200,10 @@ match color with | _ -> print "white or black" #+END_SRC -OCaml provides tokens to express data destructoring. + provides tokens to express data destructoring. For example we can examine the content of a list with patten matching -#+BEGIN_SRC ocaml +#+BEGIN_SRC begin match list with | [ ] -> print "empty list" @@ -216,7 +216,7 @@ Parenthesized patterns, such as the third one in the previous example, matches the same value as the pattern without parenthesis. The same could be done with tuples -#+BEGIN_SRC ocaml +#+BEGIN_SRC begin match tuple with | (Some _, Some _) -> print "Pair of optional types" @@ -231,7 +231,7 @@ pattern₂ if it matches pattern₁ or pattern₂. Pattern clauses can make the use of /guards/ to test predicates and variables can captured (binded in scope). -#+BEGIN_SRC ocaml +#+BEGIN_SRC begin match token_list with | "switch"::var::"{"::rest -> ... @@ -242,11 +242,97 @@ begin match token_list with #+END_SRC -Moreover, the OCaml pattern matching compiler emits a warning when a +Moreover, the pattern matching compiler emits a warning when a pattern is not exhaustive or some patterns are shadowed by precedent ones. -In general pattern matching on primitive and algebraic data types takes the -following form. +** Symbolic execution + +** Translation validation +Translators, such as translators and code generators, are huge pieces of +software usually consisting of multiple subsystem and +constructing an actual specification of a translator implementation for +formal validation is a very long task. Moreover, different +translators implement different algorithms, so the correctness proof of +a translator cannot be generalized and reused to prove another translator. +Translation validation is an alternative to the verification of +existing translators that consists of taking the source and the target +(compiled) program and proving /a posteriori/ their semantic equivalence. + +- [ ] Techniques for translation validation +- [ ] What does semantically equivalent mean +- [ ] What happens when there is no semantic equivalence +- [ ] Translation validation through symbolic execution + +** Translation validation of the Pattern Matching Compiler + +*** Source program +The algorithm takes as its input a source program and translates it +into an algebraic data structure called /constraint_tree/. + +#+BEGIN_SRC +type constraint_tree = + | Unreachable + | Failure + | Leaf of source_expr + | Guard of source_blackbox * constraint_tree * constraint_tree + | Node of accessor * (constructor * constraint_tree) list * constraint_tree +#+END_SRC + +Unreachable, Leaf of source_expr and Failure are the terminals of the three. +We distinguish +- Unreachable: statically it is known that no value can go there +- Failure: a value matching this part results in an error +- Leaf: a value matching this part results into the evaluation of a + source blackbox of code + +The algorithm doesn't support type-declaration-based analysis +to know the list of constructors at a given type. +Let's consider some trivial examples: + +#+BEGIN_SRC +function true -> 1 +#+END_SRC + +[ ] Converti a disegni + +Is translated to +|Node ([(true, Leaf 1)], Failure) +while +#+BEGIN_SRC +function +true -> 1 +| false -> 2 +#+END_SRC +will give +|Node ([(true, Leaf 1); (false, Leaf 2)]) + +It is possible to produce Unreachable examples by using + refutation clauses (a "dot" in the right-hand-side) +#+BEGIN_SRC +function +true -> 1 +| false -> 2 +| _ -> . +#+END_SRC +that gets translated into +Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable) + +We trust this annotation, which is reasonable as the type-checker +verifies that it indeed holds. + +Guard nodes of the tree are emitted whenever a guard is found. Guards +node contains a blackbox of code that is never evaluated and two +branches, one that is taken in case the guard evaluates to true and +the other one that contains the path taken when the guard evaluates to +true. + +[ ] Finisci con Node +[ ] Spiega del fallback +[ ] rivedi compilazione per tenere in considerazione il tuo albero invece che le lambda +[ ] Specifica che stesso algoritmo usato per compilare a lambda, piu` optimizations + +The source code of a pattern matching function in has the +following form: |match variable with |\vert pattern₁ -> expr₁ @@ -255,30 +341,57 @@ following form. |⋮ |\vert pₙ -> exprₙ -It can be described more formally through a BNF grammar. +and can include any expression that is legal for the compiler, +such as "when" conditions and assignments. Patterns could or could not +be exhaustive. + +Pattern matching code could also be written using the more compact form: +|function +|\vert pattern₁ -> expr₁ +|\vert pattern₂ when guard -> expr₂ +|\vert pattern₃ as var -> expr₃ +|⋮ +|\vert pₙ -> exprₙ + + +This BNF grammar describes formally the grammar of the source program: #+BEGIN_SRC bnf +start ::= "match" id "with" patterns | "function" patterns +patterns ::= (pattern0|pattern1) pattern1+ +;; pattern0 and pattern1 are needed to distinguish the first case in which +;; we can avoid writing the optional vertical line +pattern0 ::= clause +pattern1 ::= "|" clause +clause ::= lexpr "->" rexpr -pattern ::= value-name -| _ ;; wildcard pattern -| constant ;; matches a constant value -| pattern as value-name ;; binds to value-name -| ( pattern ) ;; parenthesized pattern -| pattern | pattern ;; or-pattern -| constr pattern ;; variant pattern -| [ pattern { ; pattern } [ ; ] ] ;; list patterns -| pattern :: pattern ;; lists patterns using cons operator (::) -| [| pattern { ; pattern } [ ; ] |] ;; array pattern -| char-literal .. char-literal ;; match on a range of characters -| { field [: typexpr] [= pattern] { ; field [: typexpr] [= pattern] } \ - [; _ ] [ ; ] } ;; patterns that match on records +lexpr ::= rule (ε|condition) +rexpr ::= _code ;; arbitrary code +rule ::= wildcard|variable|constructor_pattern|or_pattern ;; + +;; rules +wildcard ::= "_" +variable ::= identifier +constructor_pattern ::= constructor (rule|ε) (assignment|ε) + +constructor ::= int|float|char|string|bool + |unit|record|exn|objects|ref + |list|tuple|array + |variant|parameterized_variant ;; data types + +or_pattern ::= wildcard|variable|constructor_pattern ("|" wildcard|variable|constructor_pattern)+ + +condition ::= "when" bexpr +assignment ::= "as" id +bexpr ::= _code ;; arbitrary code #+END_SRC \begin{comment} -*** 1.2.1 Pattern matching compilation to lambda code +Check that it is still this +\end{comment} -During compilation, patterns are in the form +Patterns are of the form | pattern | type of pattern | |-----------------+---------------------| | _ | wildcard | @@ -286,10 +399,10 @@ During compilation, patterns are in the form | c(p₁,p₂,...,pₙ) | constructor pattern | | (p₁\vert p₂) | or-pattern | -Expressions are compiled into lambda code and are referred as lambda -code actions. +During compilation by the translators expressions are compiled into +lambda code and are referred as lambda code actions lᵢ. -The entire pattern matching code can be represented as a clause matrix +The entire pattern matching code is represented as a clause matrix that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to lambda code action lⁱ \begin{equation*} @@ -449,34 +562,8 @@ following four rules: apply, and P₂ → L₂ containing the remaining rows. The algorithm is applied to both matrices. -\end{comment} - \begin{comment} #+BEGIN_COMMENT CITE paper? #+END_COMMENT’ \end{comment} - -** Symbolic execution - -** Translation validation -Translators, such as translators and code generators, are huge pieces of -software usually consisting of multiple subsystem and -constructing an actual specification of a translator implementation for -formal validation is a very long task. Moreover, different -translators implement different algorithms, so the correctness proof of -a translator cannot be generalized and reused to prove another translator. -Translation validation is an alternative to the verification of -existing translators that consists of taking the source and the target -(compiled) program and proving /a posteriori/ their semantic equivalence. - -- [ ] Techniques for translation validation -- [ ] What does semantically equivalent mean -- [ ] What happens when there is no semantic equivalence -- [ ] Translation validation through symbolic execution - -** Translation validation of the Pattern Matching Compiler - - [ ] Source translation - - [ ] Formal Grammar - - [ ] Compilation of source patterns - - [ ] Rest? diff --git a/tesi/tesi_unicode.tex b/tesi/tesi_unicode.tex index e9319d1..3c3bea0 100644 --- a/tesi/tesi_unicode.tex +++ b/tesi/tesi_unicode.tex @@ -1,4 +1,4 @@ -% Created 2020-03-02 Mon 14:31 +% Created 2020-03-03 Tue 17:18 % Intended LaTeX compiler: pdflatex \documentclass[11pt]{article} \usepackage[utf8]{inputenc} @@ -24,10 +24,10 @@ \usepackage{color} \author{Francesco Mecca} \date{} -\title{Translation Verification of the OCaml pattern matching compiler} +\title{Translation Verification of the pattern matching compiler} \hypersetup{ pdfauthor={Francesco Mecca}, - pdftitle={Translation Verification of the OCaml pattern matching compiler}, + pdftitle={Translation Verification of the pattern matching compiler}, pdfkeywords={}, pdfsubject={}, pdfcreator={Emacs 26.3 (Org mode 9.1.9)}, @@ -37,12 +37,13 @@ \maketitle \begin{comment} \section{{\bfseries\sffamily TODO} Scaletta [1/5]} -\label{sec:org62901c2} +\label{sec:org7578cff} \begin{itemize} \item[{$\boxtimes$}] Abstract \item[{$\boxminus$}] Background [40\%] \begin{itemize} -\item[{$\boxtimes$}] Ocaml +\item[{$\boxtimes$}] + \item[{$\square$}] Lambda code [0\%] \begin{itemize} \item[{$\square$}] Compiler optimizations @@ -58,6 +59,7 @@ \begin{itemize} \item[{$\square$}] Formal Grammar \item[{$\square$}] Compilation of source patterns +\item[{$\square$}] Rest? \end{itemize} \item[{$\square$}] Target translation \begin{itemize} @@ -74,7 +76,7 @@ \begin{abstract} -This dissertation presents an algorithm for the translation validation of the OCaml +This dissertation presents an algorithm for the translation validation of the pattern matching compiler. Given the source representation of the target program and the target program compiled in untyped lambda form, the algoritmhm is capable of modelling the source program in terms of symbolic constraints on it's branches and apply symbolic @@ -83,19 +85,19 @@ produced a valid result. In this context a valid result means that for every input in the domain of the source program the untyped lambda translation produces the same output as the source program. The input of the program is modelled in terms of symbolic constraints closely related to -the runtime representation of OCaml objects and the output consists of OCaml code +the runtime representation of objects and the output consists of OCaml code blackboxes that are not evaluated in the context of the verification. \end{abstract} \section{Background} -\label{sec:orgbf4de70} +\label{sec:org5b6accf} -\subsection{OCaml} -\label{sec:orga9a97c9} -Objective Caml (OCaml) is a dialect of the ML (Meta-Language) family of programming +\subsection{} +\label{sec:org3c9e604} +Objective Caml () is a dialect of the ML (Meta-Language) family of programming languages. -OCaml shares many features with other dialects of ML, such as SML and Caml Light, + shares many features with other dialects of ML, such as SML and Caml Light, The main features of ML languages are the use of the Hindley-Milner type system that provides many advantages with respect to static type systems of traditional imperative and object oriented language such as C, C++ and Java, such as: @@ -133,33 +135,33 @@ in their definition and can be combined. Moreover ML languages are functional, meaning that functions are treated as first class citizens and variables are immutable, although mutable statements and imperative constructs are permitted. -In addition to that OCaml features an object system, that provides +In addition to that features an object system, that provides inheritance, subtyping and dynamic binding, and modules, that provide a way to encapsulate definitions. Modules are checked statically and can be reificated through functors. \subsection{Lambda form compilation} -\label{sec:org2d10d35} +\label{sec:org6065c14} \begin{comment} -https://dev.realworldocaml.org/compiler-backend.html +https://dev.realworld.org/compiler-backend.html \end{comment} -OCaml provides compilation in form of a byecode executable with an + provides compilation in form of a byecode executable with an optionally embeddable interpreter and a native executable that could be statically linked to provide a single file executable. -After the OCaml typechecker has proven that the program is type safe, -the OCaml compiler lower the code to \emph{Lambda}, an s-expression based +After the typechecker has proven that the program is type safe, +the compiler lower the code to \emph{Lambda}, an s-expression based language that assumes that its input has already been proved safe. On the \emph{Lambda} representation of the source program, the compiler performes a series of optimization passes before translating the lambda form to assembly code. \begin{enumerate} -\item OCaml datatypes -\label{sec:orgd605d09} +\item datatypes +\label{sec:org7b158eb} -Most native data types in OCaml, such as integers, tuples, lists, +Most native data types in , such as integers, tuples, lists, records, can be seen as instances of the following definition \begin{verbatim} @@ -171,8 +173,8 @@ Every constructor has an arity. Nil, a constructor of arity 0, is called a constant constructor. \item Lambda form types -\label{sec:org1ee20a6} -A lambda form target file produced by the ocaml compiler consists of a +\label{sec:org737fa2f} +A lambda form target file produced by the compiler consists of a single s-expression. Every s-expression consist of \emph{(}, a sequence of elements separated by a whitespace and a closing \emph{)}. Elements of s-expressions are: @@ -203,7 +205,7 @@ The are varios numeric operations defined: \end{itemize} \item Functions -\label{sec:org914d5eb} +\label{sec:org369db83} Functions are defined using the following syntax, and close over all bindings in scope: (lambda (arg1 arg2 arg3) BODY) @@ -211,19 +213,19 @@ and are applied using the following syntax: (apply FUNC ARG ARG ARG) Evaluation is eager. \item Bindings -\label{sec:org055206b} +\label{sec:org120bc74} The atom \emph{let} introduces a sequence of bindings: (let BINDING BINDING BINDING \ldots{} BODY) \item Other atoms -\label{sec:org0f92182} +\label{sec:org58bd28f} TODO: if, switch, stringswitch\ldots{} TODO: magari esempi \end{enumerate} \subsection{Pattern matching} -\label{sec:org9876fb9} +\label{sec:org5d3b2f5} Pattern matching is a widely adopted mechanism to interact with ADT. C family languages provide branching on predicates through the use of @@ -231,7 +233,7 @@ if statements and switch statements. Pattern matching on the other hands express predicates through syntactic templates that also allow to bind on data structures of arbitrary shapes. One common example of pattern matching is the use of regular -expressions on strings. OCaml provides pattern matching on ADT and +expressions on strings. provides pattern matching on ADT and primitive data types. The result of a pattern matching operation is always one of: \begin{itemize} @@ -251,7 +253,7 @@ match color with | _ -> print "white or black" \end{verbatim} -OCaml provides tokens to express data destructoring. + provides tokens to express data destructoring. For example we can examine the content of a list with patten matching \begin{verbatim} @@ -293,11 +295,114 @@ begin match token_list with \end{verbatim} -Moreover, the OCaml pattern matching compiler emits a warning when a +Moreover, the pattern matching compiler emits a warning when a pattern is not exhaustive or some patterns are shadowed by precedent ones. -In general pattern matching on primitive and algebraic data types takes the -following form. +\subsection{Symbolic execution} +\label{sec:orge2e0205} + +\subsection{Translation validation} +\label{sec:orgbafe7bc} +Translators, such as translators and code generators, are huge pieces of +software usually consisting of multiple subsystem and +constructing an actual specification of a translator implementation for +formal validation is a very long task. Moreover, different +translators implement different algorithms, so the correctness proof of +a translator cannot be generalized and reused to prove another translator. +Translation validation is an alternative to the verification of +existing translators that consists of taking the source and the target +(compiled) program and proving \emph{a posteriori} their semantic equivalence. + +\begin{itemize} +\item[{$\square$}] Techniques for translation validation +\item[{$\square$}] What does semantically equivalent mean +\item[{$\square$}] What happens when there is no semantic equivalence +\item[{$\square$}] Translation validation through symbolic execution +\end{itemize} + +\subsection{Translation validation of the Pattern Matching Compiler} +\label{sec:org24ee133} + +\begin{enumerate} +\item Source program +\label{sec:org8c21912} +The algorithm takes as its input a source program and translates it +into an algebraic data structure called \emph{constraint\_tree}. + +\begin{verbatim} +type constraint_tree = + | Unreachable + | Failure + | Leaf of source_expr + | Guard of source_blackbox * constraint_tree * constraint_tree + | Node of accessor * (constructor * constraint_tree) list * constraint_tree +\end{verbatim} + +Unreachable, Leaf of source\_expr and Failure are the terminals of the three. +We distinguish +\begin{itemize} +\item Unreachable: statically it is known that no value can go there +\item Failure: a value matching this part results in an error +\item Leaf: a value matching this part results into the evaluation of a +source blackbox of code +\end{itemize} + +The algorithm doesn't support type-declaration-based analysis +to know the list of constructors at a given type. +Let's consider some trivial examples: + +\begin{verbatim} +function true -> 1 +\end{verbatim} + +[ ] Converti a disegni + +Is translated to +\begin{center} +\begin{tabular}{l} +Node ([(true, Leaf 1)], Failure)\\ +\end{tabular} +\end{center} +while +\begin{verbatim} +function +true -> 1 +| false -> 2 +\end{verbatim} +will give +\begin{center} +\begin{tabular}{l} +Node ([(true, Leaf 1); (false, Leaf 2)])\\ +\end{tabular} +\end{center} + +It is possible to produce Unreachable examples by using + refutation clauses (a "dot" in the right-hand-side) +\begin{verbatim} +function +true -> 1 +| false -> 2 +| _ -> . +\end{verbatim} +that gets translated into +Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable) + +We trust this annotation, which is reasonable as the type-checker +verifies that it indeed holds. + +Guard nodes of the tree are emitted whenever a guard is found. Guards +node contains a blackbox of code that is never evaluated and two +branches, one that is taken in case the guard evaluates to true and +the other one that contains the path taken when the guard evaluates to +true. + +[ ] Finisci con Node +[ ] Spiega del fallback +[ ] rivedi compilazione per tenere in considerazione il tuo albero invece che le lambda +[ ] Specifica che stesso algoritmo usato per compilare a lambda, piu` optimizations + +The source code of a pattern matching function in has the +following form: \begin{center} \begin{tabular}{l} @@ -310,32 +415,61 @@ match variable with\\ \end{tabular} \end{center} -It can be described more formally through a BNF grammar. +and can include any expression that is legal for the compiler, +such as "when" conditions and assignments. Patterns could or could not +be exhaustive. + +Pattern matching code could also be written using the more compact form: +\begin{center} +\begin{tabular}{l} +function\\ +\(\vert{}\) pattern₁ -> expr₁\\ +\(\vert{}\) pattern₂ when guard -> expr₂\\ +\(\vert{}\) pattern₃ as var -> expr₃\\ +⋮\\ +\(\vert{}\) pₙ -> exprₙ\\ +\end{tabular} +\end{center} + + +This BNF grammar describes formally the grammar of the source program: \begin{verbatim} +start ::= "match" id "with" patterns | "function" patterns +patterns ::= (pattern0|pattern1) pattern1+ +;; pattern0 and pattern1 are needed to distinguish the first case in which +;; we can avoid writing the optional vertical line +pattern0 ::= clause +pattern1 ::= "|" clause +clause ::= lexpr "->" rexpr -pattern ::= value-name -| _ ;; wildcard pattern -| constant ;; matches a constant value -| pattern as value-name ;; binds to value-name -| ( pattern ) ;; parenthesized pattern -| pattern | pattern ;; or-pattern -| constr pattern ;; variant pattern -| [ pattern { ; pattern } [ ; ] ] ;; list patterns -| pattern :: pattern ;; lists patterns using cons operator (::) -| [| pattern { ; pattern } [ ; ] |] ;; array pattern -| char-literal .. char-literal ;; match on a range of characters -| { field [: typexpr] [= pattern] { ; field [: typexpr] [= pattern] } \ - [; _ ] [ ; ] } ;; patterns that match on records +lexpr ::= rule (ε|condition) +rexpr ::= _code ;; arbitrary code +rule ::= wildcard|variable|constructor_pattern|or_pattern ;; + +;; rules +wildcard ::= "_" +variable ::= identifier +constructor_pattern ::= constructor (rule|ε) (assignment|ε) + +constructor ::= int|float|char|string|bool + |unit|record|exn|objects|ref + |list|tuple|array + |variant|parameterized_variant ;; data types + +or_pattern ::= wildcard|variable|constructor_pattern ("|" wildcard|variable|constructor_pattern)+ + +condition ::= "when" bexpr +assignment ::= "as" id +bexpr ::= _code ;; arbitrary code \end{verbatim} \begin{comment} -\begin{enumerate} -\item 1.2.1 Pattern matching compilation to lambda code -\label{sec:org4d27bd9} +Check that it is still this +\end{comment} -During compilation, patterns are in the form +Patterns are of the form \begin{center} \begin{tabular}{ll} pattern & type of pattern\\ @@ -347,10 +481,10 @@ c(p₁,p₂,\ldots{},pₙ) & constructor pattern\\ \end{tabular} \end{center} -Expressions are compiled into lambda code and are referred as lambda -code actions. +During compilation by the translators expressions are compiled into +lambda code and are referred as lambda code actions lᵢ. -The entire pattern matching code can be represented as a clause matrix +The entire pattern matching code is represented as a clause matrix that associates rows of patterns (p\(_{\text{i,1}}\), p\(_{\text{i,2}}\), \ldots{}, p\(_{\text{i,n}}\)) to lambda code action lⁱ \begin{equation*} @@ -399,7 +533,7 @@ are the same \begin{enumerate} \item Initial state of the compilation -\label{sec:orge9c6bc4} +\label{sec:org9a7b624} Given a source of the following form: @@ -528,8 +662,6 @@ apply, and P₂ → L₂ containing the remaining rows. The algorithm is applied to both matrices. \end{enumerate} -\end{comment} - \begin{comment} #+BEGIN_COMMENT CITE paper? @@ -537,26 +669,4 @@ CITE paper? \end{comment} \end{enumerate} \end{enumerate} - -\subsection{Symbolic execution} -\label{sec:orgdb60b84} - -\subsection{Translation validation} -\label{sec:org096d047} -Translators, such as translators and code generators, are huge pieces of -software usually consisting of multiple subsystem and -constructing an actual specification of a translator implementation for -formal validation is a very long task. Moreover, different -translators implement different algorithms, so the correctness proof of -a translator cannot be generalized and reused to prove another translator. -Translation validation is an alternative to the verification of -existing translators that consists of taking the source and the target -(compiled) program and proving \emph{a posteriori} their semantic equivalence. - -\begin{itemize} -\item[{$\square$}] Techniques for translation validation -\item[{$\square$}] What does semantically equivalent mean -\item[{$\square$}] What happens when there is no semantic equivalence -\item[{$\square$}] Translation validation through symbolic execution -\end{itemize} \end{document}