diff --git a/tesi/mumble_meeting_gabriel b/tesi/mumble_meeting_gabriel new file mode 100644 index 0000000..3436c00 --- /dev/null +++ b/tesi/mumble_meeting_gabriel @@ -0,0 +1,121 @@ +result r ::= guard list * (Match bb | NoMatch | Absurd) +guard := bb + +(equiv S Cₛ Cₜ gs = Yes) ∧ covers(Cₜ, S) ⇒ ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) ≃gs Cₜ(vₜ) +(equiv S Cₛ Cₜ gs = No(vₛ,vₜ) ∧ covers(Cₜ, S) ⇒ vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠gs Cₜ(vₜ) + +(vs,vt) \in S +------------------------- +Test(S,false) = No(vs,vt) + +Test(S,true) = Yes + + +------------------------------- +equiv \emptyset Cs Ct gs ~> Yes + +----------------------------------------- +equiv S Failure (Leaf BBt) gs ~>No(vs, vt) + +----------------------------------------- +equiv S (Leaf BBs) Failure gs ~>No(vs, vt) + +---------------------------------------------- +equiv S Failure Failure gs ~> Test(S, gs = []) + +rBB = Test(S, equiv_BB BBs BBt) +rGuard = Test(s, gs = []) +------------------------------------------------------ +equiv S (Leaf BBs) (Leaf BBt) gs ~> Merge(rBB, rGuard) + + + +Cs \in (Leaf BBs, Failure) + +(equiv S Cs Ci gs)^i ~> ri)^i +equiv S Cs Cf gs ~> rf +----------------------------------------------------- +equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs ~> Merge((ri)^i, rf) + +(equiv S Ci (trim Ct a=Ki) gs ~> ri)^i +equiv S Cf (trim Ct (a\notin(K_i)^i)) gs ~> rf +---------------------------------------------------------- +equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> Merge((ri)^i, rf) + + + +equiv S Ctrue Ct (gs++[condition]) ~> rt +equiv S Cfalse Ct (gs++[condition]) ~> rf +------------------------------------------- +equiv S (Guard condition Ctrue Cfalse) Ct gs ~> (Merge rt rf) + +rCond = Test(S, cond = condition) +equiv S Cs Ctrue gs ~> rt +equiv S Cs Cfalse gs ~> rf +------------------------------------------- +equiv S Cs (Guard condition Ctrue Cfalse) ([cond]++gs) ~> (Merge rt rf) + +(vs,vt) \in S +------------------------------------------- +equiv S Cs (Guard condition Ctrue Cfalse) [] ~> No(vs,vt) + + +(Simplified rules) + +Idea: only talk about the valid equivalences (no No business) +so instead of proving facts of the form +equiv S Cs Ct gs ~> r +we just prove +equiv S Cs Ct gs + +------------------------ +equiv \emptyset Cs Ct gs + +-------------------------- +equiv S Failure Failure [] + +equiv_BB BBs BBt +------------------------------- +equiv S (Leaf BBs) (Leaf BBt) [] + +Cs \in (Leaf BBs, Failure) + +(equiv S Cs Ci gs)^i +equiv S Cs Cf gs +----------------------------------------- +equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs + +equiv S Ci (trim Ct a=Ki) gs +equiv S Cf (trim Ct (a\notin(K_i)^i) gs +------------------------------------- +equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs + +equiv S Ctrue Ct (gs++[condition]) +equiv S Cfalse Ct (gs++[condition]) +-------------------------------------------- +equiv S (Guard condition Ctrue Cfalse) Ct gs + +equiv S Cs Ctrue gs +equiv S Cs Cfalse gs +-------------------------------------------- +equiv S Cs (Guard condition Ctrue Cfalse) ([condition]++gs) + + +(Let's forget about those that are below) + +forall i, (equiv S Ci (Ct/a=Ki) = Yes) +equiv S Cf (Ct/a\notin(K_i)^i) = Yes + ------------------------------------------- + equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> Yes + + exists i, (equiv S Ci (Ct/a=Ki) = No(vs,vt)) + ------------------------------------------------- + equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> No(vs,vt) + + equiv S Cf (Ct/a\notin(K_i)^i) = No(vs,vt) + ------------------------------------------------- + equiv kS (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> No(vs,vt) + + forall S Cs Ct gs, exists (a unique) r such that + (equiv S Cs Ct gs = r) is provable + \ No newline at end of file diff --git a/tesi/tesi.pdf b/tesi/tesi.pdf index ac6cb94..19973f6 100644 Binary files a/tesi/tesi.pdf and b/tesi/tesi.pdf differ diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index d967388..6f70226 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -35,12 +35,18 @@ Magari prima pattern matching poi compilatore? #+LaTeX_HEADER: \usepackage{comment} #+LaTeX_HEADER: \usepackage{algpseudocode} #+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm} -#+Latex_HEADER: \newtheorem{definition}{Definition} +#+LaTeX_HEADER: \newtheorem{definition}{Definition} +#+LaTeX_HEADER: \usepackage{mathpartir} #+LaTeX_HEADER: \usepackage{graphicx} #+LaTeX_HEADER: \usepackage{listings} #+LaTeX_HEADER: \usepackage{color} #+LaTeX_HEADER: \usepackage{stmaryrd} #+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}} +#+LaTeX_HEADER: \newcommand{\Equiv}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken +#+LaTeX_HEADER: \newcommand{\covers}[2]{#1 \mathrel{\mathsf{covers}} #2} +#+LaTeX_HEADER: \newcommand{\Yes}{\mathsf{Yes}} +#+LaTeX_HEADER: \newcommand{\No}[2]{\mathsf{No}(#1, #2)} + #+EXPORT_SELECT_TAGS: export #+EXPORT_EXCLUDE_TAGS: noexport #+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t @@ -89,16 +95,16 @@ into simple control flow structures such as \texttt{if, switch}, for example: \begin{comment} %% TODO: side by side \end{comment} -The code in the right is in the Lambda intermediate representation of +The code on the right is in the Lambda intermediate representation of the OCaml compiler. The Lambda representation of a program is shown by calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. The OCaml pattern matching compiler is a critical part of the OCaml compiler -in terms of correctness because any bug would result in wrong code +in terms of correctness because bugs typically would result in wrong code production rather than triggering compilation failures. Such bugs also are hard to catch by testing because they arise in corner cases of complex patterns which are typically not in the -compiler test suite. +compiler test suite or most user programs. The OCaml core developers group considered evolving the pattern matching compiler, either by using a new algorithm or by incremental refactoring of its code base. @@ -120,19 +126,42 @@ Our algorithm translates both source and target programs into a common representation, decision trees. Decision trees where chosen because they model the space of possible values at a given branch of execution. -Here is the decision tree for the source example program. +Here are the decision trees for the source and target example program. + +\begin{minipage}{0.5\linewidth} \begin{verbatim} - Node(Root) + Switch(Root) / \ (= []) (= ::) / \ - Leaf Node(Root.1) + Leaf Switch(Root.1) (0, None) / \ (= []) (= ::) / \ Leaf Leaf - (1, Some(Root.0)) (2, Some(Root.1.0)) + [x = Root.0] [y = Root.1.0] + (1, Some x) (2, Some y) \end{verbatim} +\end{minipage} +\hfill +\begin{minipage}{0.5\linewidth} +\begin{verbatim} + Switch(Root) + / \ + (= int 0) (!= int 0) + / \ + Leaf Switch(Root.1) +(makeblock 0 / \ + 0 0a) / \ + (= int 0) (!= int 0) + / \ + Leaf Leaf +[x = Root.0] [y = Root.1.0] +(makeblock 0 (makeblock 0 + 1 (makeblock 0 x)) 2 (makeblock 0 y)) +\end{verbatim} +\end{minipage} + \texttt{(Root.0)} is called an \emph{accessor}, that represents the access path to a value that can be reached by deconstructing the scrutinee. In this example \texttt{Root.0} is the first subvalue of @@ -140,8 +169,8 @@ the scrutinee. Target decision trees have a similar shape but the tests on the branches are related to the low level representation of values in -Lambda code. For example, cons cells \texttt{x::xs} are blocks with -tag 0. +Lambda code. For example, cons cells \texttt{x::xs} or tuples +\texttt{(x,y)} are blocks with tag 0. To check the equivalence of a source and a target decision tree, we proceed by case analysis. @@ -155,10 +184,11 @@ reduced tree is equivalent to $C_i$. \subsection{From source programs to decision trees} Our source language supports integers, lists, tuples and all algebraic -datatypes. Patterns support wildcards, constructors and literals, or -patterns $(p_1|p_2)$ and pattern variables. -We also support \texttt{when} guards. -Decision trees have nodes of the form: +datatypes. Patterns support wildcards, constructors and literals, +or-patterns $(p_1 | p_2)$ and pattern variables. We also support +\texttt{when} guards, which are interesting as they introduce the +evaluation of expressions during matching. Decision trees have nodes +of the form: \begin{lstlisting} type decision_tree = | Unreachable @@ -183,7 +213,7 @@ It satisfies the following correctness statement: \[ \forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S) \] -Running any source values $v_S$ against the source program gives the +Running any source value $v_S$ against the source program gives the same result as running it against the decision tree. \subsection{From target programs to decision trees} @@ -192,8 +222,10 @@ The target programs include the following Lambda constructs: various comparison operations, guards. The symbolic execution engine traverses the target program and builds an environment that maps variables to accessors. It branches at every control flow statement -and emits a Switch node. The branch condition $\pi_i$ is expressed as +and emits a \texttt{Switch} node. The branch condition $\pi_i$ is expressed as an interval set of possible values at that point. +In comparison with the source decision trees, \texttt{Unreachable} +nodes are never emitted. Guards result in branching. In comparison with the source decision trees, \texttt{Unreachable} nodes are never emitted. @@ -204,8 +236,106 @@ $t_T$, satisfying the following correctness statement: \] \subsection{Equivalence checking} -TODO +The equivalence checking algorithm takes as input a domain of +possible values \emph{S} and a pair of source and target decision trees and +in case the two trees are not equivalent it returns a counter example. +The algorithm respects the following correctness statement: +\begin{comment} +% TODO: we have to define what \covers mean for readers to understand the specifications +% (or we use a simplifying lie by hiding \covers in the statements). +\end{comment} +\begin{align*} + \Equiv S {C_S} {C_T} = \Yes \;\land\; \covers {C_T} S + & \implies + \forall v_S \approx v_T \in S,\; C_S(v_S) = C_T(v_T) +\\ + \Equiv S {C_S} {C_T} = \No {v_S} {v_T} \;\land\; \covers {C_T} S + & \implies + v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T) +\end{align*} + +The algorithm proceeds by case analysis. Inference rules are shown. +If $S$ is empty the results is $\Yes$. + +\begin{verbatim} +------------------------ +equiv \emptyset Cs Ct gs +\end{verbatim} + +If the two decision trees are both terminal nodes the algorithm checks +for content equality. +\begin{verbatim} +-------------------------- +equiv S Failure Failure [] + +equiv_BB BBs BBt +------------------------------- +equiv S (Leaf BBs) (Leaf BBt) [] +\end{verbatim} + +If the source decision tree (left hand side) is a terminal while the +target decistion tree (right hand side) is not, the algorithm proceeds +by \emph{explosion} of the right hand side. Explosion means that every +child of the right hand side is tested for equality against the left +hand side. + +\begin{verbatim} +(equiv S Cs Ci gs)^i +equiv S Cs Cf gs +----------------------------------------- +equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs +\end{verbatim} + + +\begin{comment} +% TODO: [Gabriel] in practice the $dom_S$ are constructors and the +% $dom_T$ are domains. Do we want to hide this fact, or be explicit +% about it? (Maybe we should introduce explicitly/clearly the idea of +% target domains at some point). +\end{comment} + +When the left hand side is not a terminal, the algorithm explodes the +left hand side while trimming every right hand side subtree. Trimming +a left hand side tree on an interval set $dom_S$ computed from the right hand +side tree constructor means mapping every branch condition $dom_T$ (interval set of +possible values) on the left to the intersection of $dom_T$ and $dom_S$ when +the accessors on both side are equal, and removing the branches that +result in an empty intersection. If the accessors are different, +\emph{$dom_T$} is left unchanged. + +\begin{verbatim} +equiv S Ci (trim Ct a=Ki) gs +equiv S Cf (trim Ct (a\notin(K_i)^i) gs +------------------------------------- +equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs +\end{verbatim} + +The equivalence checking algorithm deals with guards by storing a +queue. A guard blackbox is pushed to the queue whenever the algorithm +encounters a Guard node on the right, while it pops a blackbox from +the queue whenever a Guard node appears on the left hand side. +The algorithm stops with failure if the popped blackbox and the and +blackbox on the left hand Guard node are different, otherwise in +continues by exploding to two subtrees, one in which the guard +condition evaluates to true, the other when it evaluates to false. +Termination of the algorithm is successful only when the guards queue +is empty. +\begin{verbatim} +equiv S Ctrue Ct (gs++[condition]) +equiv S Cfalse Ct (gs++[condition]) +-------------------------------------------- +equiv S (Guard condition Ctrue Cfalse) Ct gs + +equiv S Cs Ctrue gs +equiv S Cs Cfalse gs +-------------------------------------------- +equiv S Cs (Guard condition Ctrue Cfalse) ([condition]++gs) +\end{verbatim} + +\begin{comment} +TODO: replace inference rules with good latex +\end{comment} * Background @@ -503,7 +633,9 @@ Moreover, the pattern matching compiler emits a warning when a pattern is not exhaustive or some patterns are shadowed by precedent ones. ** Symbolic execution -TODO + +Symbolic execution is a widely used techniques in the field of +computer security. ** Translation validation Translators, such as translators and code generators, are huge pieces of