diff --git a/tesi/tesi.pdf b/tesi/tesi.pdf index 51f413b..683ed05 100644 Binary files a/tesi/tesi.pdf and b/tesi/tesi.pdf differ diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index 5ddbbbc..462e4ba 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -1,5 +1,4 @@ \begin{comment} -TODO: neg is parsed incorrectly TODO: chiedi a Gabriel se T e S vanno bene, ma prima controlla che siano coerenti * TODO Scaletta [1/6] - [X] Introduction @@ -103,6 +102,10 @@ into simple control flow structures such as \texttt{if, switch}, for example: | x::[] -> (1, Some x) | _::y::_ -> (2, Some y) \end{lstlisting} +Given as input to the pattern matching compiler, this snippet of code +gets translated into the Lambda intermediate representation of +the OCaml compiler. The Lambda representation of a program is shown by +calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. \begin{lstlisting} (if scrutinee (let (field_1 =a (field 1 scrutinee)) @@ -115,12 +118,6 @@ into simple control flow structures such as \texttt{if, switch}, for example: (makeblock 0 1 (makeblock 0 y))))) [0: 0 0a]) \end{lstlisting} -\begin{comment} -%% TODO: side by side -\end{comment} -The code on the right is in the Lambda intermediate representation of -the OCaml compiler. The Lambda representation of a program is shown by -calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. The OCaml pattern matching compiler is a critical part of the OCaml compiler in terms of correctness because bugs typically would result in wrong code @@ -151,6 +148,7 @@ execution. Here are the decision trees for the source and target example program. \begin{minipage}{0.5\linewidth} + \scriptsize \begin{verbatim} Switch(Root) / \ @@ -166,21 +164,22 @@ Here are the decision trees for the source and target example program. \end{verbatim} \end{minipage} \hfill -\begin{minipage}{0.5\linewidth} +\begin{minipage}{0.4\linewidth} + \scriptsize \begin{verbatim} Switch(Root) / \ (= int 0) (!= int 0) / \ Leaf Switch(Root.1) -(makeblock 0 / \ +(mkblock 0 / \ 0 0a) / \ (= int 0) (!= int 0) / \ Leaf Leaf [x = Root.0] [y = Root.1.0] -(makeblock 0 (makeblock 0 - 1 (makeblock 0 x)) 2 (makeblock 0 y)) +(mkblock 0 (mkblock 0 + 1 (mkblock 0 x)) 2 (mkblock 0 y)) \end{verbatim} \end{minipage} @@ -207,7 +206,7 @@ reduced tree is equivalent to $C_i$. \subsection{From source programs to decision trees} Our source language supports integers, lists, tuples and all algebraic datatypes. Patterns support wildcards, constructors and literals, -or-patterns $(p_1 | p_2)$ and pattern variables. We also support +Or-patterns such as $(p_1 | p_2)$ and pattern variables. We also support \texttt{when} guards, which are interesting as they introduce the evaluation of expressions during matching. Decision trees have nodes of the form: @@ -742,21 +741,17 @@ respectively the /precondition/ and the /postcondition/ that constitute the assertions of the program. C is the directive being executed. The language of the assertions P is: -#+BEGIN_SRC -P ::= true | false | a < b | P_{1} \wedge P_{2} | P_{1} \lor P_{2} | \notP -#+END_SRC +| P ::= true \vert false \vert a < b \vert P_{1} \wedge P_{2} \vert P_{1} \lor P_{2} \vert \not P where a and b are numbers. In the Hoare rules assertions could also take the form of -#+BEGIN_SRC -P ::= \forall i. P | \exists i. P | P_{1} \Rightarrow P_{2} -#+END_SRC +| P ::= \forall i. P \vert \exists i. P \vert P_{1} \Rightarrow P_{2} where i is a logical variable, but assertions of these kinds increases the complexity of the symbolic engine. Execution follows the rules of Hoare logic: - Empty statement : \begin{mathpar} \infer{ } -{ \{P\} skip \{P\} } +{ \{P\}skip\{P\} } \end{mathpar} - Assignment statement : The truthness of P[a/x] is equivalent to the truth of {P} after the assignment. @@ -775,14 +770,14 @@ Execution follows the rules of Hoare logic: - Conditional : \begin{mathpar} \infer { \{P \wedge b \} c_1 \{Q\}, \{P\wedge\not b\}c_2\{Q\} } -{ \{P\}if b then c_1 else c_2\{Q\} } +{ \{P\}\text{if b then $c_1$ else $c_2$}\{Q\} } \end{mathpar} - Loop : {P} is the loop invariant. After the loop is finished /P/ holds and ¬̸b caused the loop to end. \begin{mathpar} \infer { \{P \wedge b \}c\{P\} } -{ \{P\} while b do c \{P \wedge \neg b\} } +{ \{P\}\text{while b do c} \{P \wedge \neg b\} } \end{mathpar} Even if the semantics of symbolic execution engines are well defined, @@ -816,7 +811,7 @@ existing translators that consists of taking the source and the target ** Source program The algorithm takes as its input a source program and translates it -into an algebraic data structure called /decision_tree/. +into an algebraic data structure which type we call /decision_tree/. #+BEGIN_SRC type decision_tree = @@ -824,7 +819,7 @@ type decision_tree = | Failure | Leaf of source_expr | Guard of source_blackbox * decision_tree * decision_tree - | Node of accessor * (constructor * decision_tree) list * decision_tree + | Switch of accessor * (constructor * decision_tree) list * decision_tree #+END_SRC Unreachable, Leaf of source_expr and Failure are the terminals of the three. @@ -842,18 +837,16 @@ Let's consider some trivial examples: function true -> 1 #+END_SRC -[ ] Converti a disegni - -Is translated to -|Node ([(true, Leaf 1)], Failure) +is translated to +|Switch ([(true, Leaf 1)], Failure) while #+BEGIN_SRC function true -> 1 | false -> 2 #+END_SRC -will give -|Node ([(true, Leaf 1); (false, Leaf 2)]) +will be translated to +|Switch ([(true, Leaf 1); (false, Leaf 2)]) It is possible to produce Unreachable examples by using refutation clauses (a "dot" in the right-hand-side) @@ -864,7 +857,7 @@ true -> 1 | _ -> . #+END_SRC that gets translated into -Node ([(true, Leaf 1); (false, Leaf 2)], Unreachable) +Switch ([(true, Leaf 1); (false, Leaf 2)], Unreachable) We trust this annotation, which is reasonable as the type-checker verifies that it indeed holds. @@ -875,20 +868,21 @@ branches, one that is taken in case the guard evaluates to true and the other one that contains the path taken when the guard evaluates to true. -[ ] Finisci con Node +\begin{comment} +[ ] Finisci con Switch [ ] Spiega del fallback [ ] rivedi compilazione per tenere in considerazione il tuo albero invece che le Lambda -[ ] Specifica che stesso algoritmo usato per compilare a lambda, piu` optimizations +\end{comment} The source code of a pattern matching function has the following form: |match variable with -|\vert pattern₁ -> expr₁ -|\vert pattern₂ when guard -> expr₂ -|\vert pattern₃ as var -> expr₃ +|\vert pattern₁ \to expr₁ +|\vert pattern₂ when guard \to expr₂ +|\vert pattern₃ as var \to expr₃ |⋮ -|\vert pₙ -> exprₙ +|\vert pₙ \to exprₙ and can include any expression that is legal for the OCaml compiler, such as /when/ guards and assignments. Patterns could or could not @@ -896,45 +890,34 @@ be exhaustive. Pattern matching code could also be written using the more compact form: |function -|\vert pattern₁ -> expr₁ -|\vert pattern₂ when guard -> expr₂ -|\vert pattern₃ as var -> expr₃ +|\vert pattern₁ \to expr₁ +|\vert pattern₂ when guard \to expr₂ +|\vert pattern₃ as var \to expr₃ |⋮ -|\vert pₙ -> exprₙ +|\vert pₙ \to exprₙ This BNF grammar describes formally the grammar of the source program: -#+BEGIN_SRC bnf -start ::= "match" id "with" patterns | "function" patterns -patterns ::= (pattern0|pattern1) pattern1+ -;; pattern0 and pattern1 are needed to distinguish the first case in which -;; we can avoid writing the optional vertical line -pattern0 ::= clause -pattern1 ::= "|" clause -clause ::= lexpr "->" rexpr - -lexpr ::= rule (ε|condition) -rexpr ::= _code ;; arbitrary code - -rule ::= wildcard|variable|constructor_pattern|or_pattern ;; - -;; rules -wildcard ::= "_" -variable ::= identifier -constructor_pattern ::= constructor (rule|ε) (assignment|ε) - -constructor ::= int|float|char|string|bool - |unit|record|exn|objects|ref - |list|tuple|array - |variant|parameterized_variant ;; data types - -or_pattern ::= rule ("|" wildcard|variable|constructor_pattern)+ - -condition ::= "when" bexpr -assignment ::= "as" id -bexpr ::= _code ;; arbitrary code -#+END_SRC +| start ::= "match" id "with" patterns \vert{} "function" patterns +| patterns ::= (pattern0\vert{}pattern1) pattern1+ +| ;; pattern0 and pattern1 are needed to distinguish the first case in which +| ;; we can avoid writing the optional vertical line +| pattern0 ::= clause +| pattern1 ::= "\vert" clause +| clause ::= lexpr "->" rexpr +| lexpr ::= rule (ε\vert{}condition) +| rexpr ::= _code ;; arbitrary code +| rule ::= wildcard\vert{}variable\vert{}constructor_pattern\vert{}or_pattern ;; +| ;; rules +| wildcard ::= "_" +| variable ::= identifier +| constructor_pattern ::= constructor (rule\vert{}ε) (assignment\vert{}ε) +| constructor ::= int\vert{}float\vert{}char\vert{}string\vert{}bool \vert{}unit\vert{}record\vert{}exn\vert{}objects\vert{}ref \vert{}list\vert{}tuple\vert{}array\vert{}variant\vert{}parameterized_variant ;; data types +| or_pattern ::= rule ("\vert{}" wildcard\vert{}variable\vert{}constructor_pattern)+ +| condition ::= "when" bexpr +| assignment ::= "as" id +| bexpr ::= _code ;; arbitrary code \begin{comment} Check that it is still coherent to this bnf @@ -948,7 +931,7 @@ Patterns are of the form | c(p₁,p₂,...,pₙ) | constructor pattern | | (p₁\vert p₂) | or-pattern | -During compilation by the translators expressions are compiled into +During compilation by the translators, expressions are compiled into Lambda code and are referred as lambda code actions lᵢ. The entire pattern matching code is represented as a clause matrix @@ -970,7 +953,7 @@ following rules apply |--------------------+---+--------------------+-------------------------------------------| | _ | ≼ | v | ∀v | | x | ≼ | v | ∀v | -| (p₁ \vert\ p₂) | ≼ | v | iff p₁ ≼ v or p₂ ≼ v | +| (p₁ \vert p₂) | ≼ | v | iff p₁ ≼ v or p₂ ≼ v | | c(p₁, p₂, ..., pₐ) | ≼ | c(v₁, v₂, ..., vₐ) | iff (p₁, p₂, ..., pₐ) ≼ (v₁, v₂, ..., vₐ) | | (p₁, p₂, ..., pₐ) | ≼ | (v₁, v₂, ..., vₐ) | iff pᵢ ≼ vᵢ ∀i ∈ [1..a] | |--------------------+---+--------------------+-------------------------------------------| @@ -1038,14 +1021,14 @@ Similarly, all the right-hand-side expressions are of the form \texttt{observe ...} with the same constraints on arguments. #+BEGIN_SRC -type t = Z | S of t +type t = K1 | K2 of t (* declaration of an algebraic and recursive datatype t *) let _ = function - | Z -> observe 0 - | S Z -> observe 1 - | S x when guard x -> observe 2 - | S (S x) as y when guard x y -> observe 3 - | S _ -> observe 4 + | K1 -> observe 0 + | K2 K1 -> observe 1 + | K2 x when guard x -> observe 2 + | K2 (K2 x) as y when guard x y -> observe 3 + | K2 _ -> observe 4 #+END_SRC Once parsed, the type declarations and the list of clauses are encoded in the form of a matrix @@ -1069,7 +1052,7 @@ p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ) \end{equation*} The base case C₀ of the algorithm is the case in which the $\vec{x}$ -is empty, that is $\vec{x}$ = (), then the result of the compilation +is empty and the result of the compilation C₀ is l₁ \begin{equation*} C₀((), @@ -1102,7 +1085,7 @@ following four rules: \end{equation*} That means in every lambda action lᵢ there is a binding of x₁ to the - variable that appears on the pattern $p_{i,1}. Bindings are omitted + variable that appears on the pattern p_{i,1}. Bindings are omitted for wildcard patterns and the lambda action lᵢ remains unchanged. 2) Constructor rule: if all patterns in the first column of P are @@ -1344,7 +1327,7 @@ possible input values /S/. The fallback is not needed when the user doesn't use a wildcard pattern. %%% Give example of thing -| Cₛ ::= Leaf bb | Node(a, (Kᵢ → Cᵢ)^{i∈S} , C?) +| Cₛ ::= Leaf bb | Switch(a, (Kᵢ → Cᵢ)^{i∈S} , C?) | a ::= Here | n.a | vₛ ::= K(vᵢ)^{i∈I} | n ∈ ℕ @@ -1492,11 +1475,11 @@ We prove this by induction on Cₜ: That means that the result of trimming on a Leaf is the Leaf itself b. The same applies to Failure terminal | Failure_{/a→π}(v) = Failure(v) - c. When Cₜ = Node(b, (π→Cᵢ)ⁱ)_{/a→π} then + c. When Cₜ = Switch(b, (π→Cᵢ)ⁱ)_{/a→π} then we look at the accessor /a/ of the subtree Cᵢ and we define πᵢ' = πᵢ if a≠b else πᵢ∩π Trimming a switch node yields the following result: - | Node(b, (π→Cᵢ)ⁱ)_{/a→π} := Node(b, (π'ᵢ→C_{i/a→π})ⁱ) + | Switch(b, (π→Cᵢ)ⁱ)_{/a→π} := Switch(b, (π'ᵢ→C_{i/a→π})ⁱ) \begin{comment} Actually in the proof.org file I transcribed: @@ -1507,10 +1490,10 @@ This is not correct because you don't have Unreachable nodes in target decision For the trimming lemma we have to prove that running the value vₜ against the decistion tree Cₜ is the same as running vₜ against the tree C_{trim} that is the result of the trimming operation on Cₜ -| Cₜ(vₜ) = C_{trim}(vₜ) = Node(b, (πᵢ'→C_{i/a→π})ⁱ)(vₜ) +| Cₜ(vₜ) = C_{trim}(vₜ) = Switch(b, (πᵢ'→C_{i/a→π})ⁱ)(vₜ) We can reason by first noting that when vₜ∉(b→πᵢ)ⁱ the node must be a Failure node. In the case where ∃k| vₜ∈(b→πₖ) then we can prove that -| C_{k/a→π}(vₜ) = Node(b, (πᵢ'→C_{i/a→π})ⁱ)(vₜ) +| C_{k/a→π}(vₜ) = Switch(b, (πᵢ'→C_{i/a→π})ⁱ)(vₜ) because when a ≠ b then πₖ'= πₖ and this means that vₜ∈πₖ' while when a = b then πₖ'=(πₖ∩π) and vt∈πₖ' because: - by the hypothesis, vₜ∈π @@ -1580,8 +1563,8 @@ I think the unreachable case should go at the end. |equiv(S, Failure, Failure) := Yes Given that ∀v, Failure(v) = Failure, the statement holds. 3. When we have a Leaf or a Failure at the left side: - | equiv(S, Failure as Cₛ, Node(a, (πᵢ → Cₜᵢ)ⁱ)) := Forall(equiv( S∩a→π(kᵢ)), Cₛ, Cₜᵢ)ⁱ) - | equiv(S, Leaf bbₛ as Cₛ, Node(a, (πᵢ → Cₜᵢ)ⁱ)) := Forall(equiv( S∩a→π(kᵢ)), Cₛ, Cₜᵢ)ⁱ) + | equiv(S, Failure as Cₛ, Switch(a, (πᵢ → Cₜᵢ)ⁱ)) := Forall(equiv( S∩a→π(kᵢ)), Cₛ, Cₜᵢ)ⁱ) + | equiv(S, Leaf bbₛ as Cₛ, Switch(a, (πᵢ → Cₜᵢ)ⁱ)) := Forall(equiv( S∩a→π(kᵢ)), Cₛ, Cₜᵢ)ⁱ) The algorithm either returns Yes for every sub-input space Sᵢ := S∩(a→π(kᵢ)) and subtree Cₜᵢ | equiv(Sᵢ, Cₛ, Cₜᵢ) = Yes ∀i @@ -1592,11 +1575,11 @@ I think the unreachable case should go at the end. | vₛ≃vₜ∈S ∧ Cₛ(vₛ)≠Cₜ(vₜ) we can say that | equiv(Sᵢ, Cₛ, Cₜᵢ) = No(vₛ, vₜ) for some minimal k∈I -4. When we have a Node on the right we define πₙ as the domain of +4. When we have a Switch on the right we define πₙ as the domain of values not covered but the union of the constructors kᵢ | πₙ = ¬(⋃π(kᵢ)ⁱ) The algorithm proceeds by trimming - | equiv(S, Node(a, (kᵢ → Cₛᵢ)ⁱ, C_{sf}), Cₜ) := + | equiv(S, Switch(a, (kᵢ → Cₛᵢ)ⁱ, C_{sf}), Cₜ) := | Forall(equiv( S∩(a→π(kᵢ)ⁱ), Cₛᵢ, C_{t/a→π(kᵢ)})ⁱ +++ equiv(S∩(a→π(kᵢ)), Cₛ, C_{a→πₙ})) The statement still holds and we show this by first analyzing the /Yes/ case: