results of mumble meeting

2020-04-03 15:53:54 +02:00 · 2020-04-03 15:53:54 +02:00 · 42f531b3e3
commit 42f531b3e3
parent b433663bdd
3 changed files with 271 additions and 18 deletions
--- a/tesi/mumble_meeting_gabriel
+++ b/tesi/mumble_meeting_gabriel
@ -0,0 +1,121 @@
+result r ::= guard list * (Match bb | NoMatch | Absurd)
+guard := bb
+
+(equiv S Cₛ Cₜ gs = Yes)      ∧ covers(Cₜ, S) ⇒ ∀vₛ≃vₜ ∈ S, Cₛ(vₛ) ≃gs Cₜ(vₜ)
+(equiv S Cₛ Cₜ gs = No(vₛ,vₜ) ∧ covers(Cₜ, S) ⇒ vₛ≃vₜ ∈ S ∧ Cₛ(vₛ) ≠gs Cₜ(vₜ)
+
+(vs,vt) \in S
+-------------------------
+Test(S,false) = No(vs,vt)
+
+Test(S,true) = Yes
+
+
+-------------------------------
+equiv \emptyset Cs Ct gs ~> Yes
+
+-----------------------------------------
+equiv S Failure (Leaf BBt) gs ~>No(vs, vt)
+
+-----------------------------------------
+equiv S (Leaf BBs) Failure gs ~>No(vs, vt)
+
+----------------------------------------------
+equiv S Failure Failure gs ~> Test(S, gs = [])
+
+rBB = Test(S, equiv_BB BBs BBt)
+rGuard = Test(s, gs = [])
+------------------------------------------------------
+equiv S (Leaf BBs) (Leaf BBt) gs ~> Merge(rBB, rGuard)
+
+
+
+Cs \in (Leaf BBs, Failure)
+
+(equiv S Cs Ci gs)^i ~> ri)^i
+equiv S Cs Cf gs ~> rf
+-----------------------------------------------------
+equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs ~> Merge((ri)^i, rf)
+
+(equiv S Ci (trim Ct a=Ki) gs ~> ri)^i
+equiv S Cf (trim Ct (a\notin(K_i)^i)) gs ~> rf
+----------------------------------------------------------
+equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> Merge((ri)^i, rf)
+
+<guard here>
+
+equiv S Ctrue Ct (gs++[condition]) ~> rt
+equiv S Cfalse Ct (gs++[condition]) ~> rf
+-------------------------------------------
+equiv S (Guard condition Ctrue Cfalse) Ct gs ~> (Merge rt rf)
+
+rCond = Test(S, cond = condition)
+equiv S Cs Ctrue gs ~> rt
+equiv S Cs Cfalse gs ~> rf
+-------------------------------------------
+equiv S Cs (Guard condition Ctrue Cfalse) ([cond]++gs) ~> (Merge rt rf)
+
+(vs,vt) \in S
+-------------------------------------------
+equiv S Cs (Guard condition Ctrue Cfalse) [] ~> No(vs,vt)
+
+
+(Simplified rules)
+
+Idea: only talk about the valid equivalences (no No business)
+so instead of proving facts of the form
+equiv S Cs Ct gs ~> r
+we just prove
+equiv S Cs Ct gs
+
+------------------------
+equiv \emptyset Cs Ct gs
+
+--------------------------
+equiv S Failure Failure []
+
+equiv_BB BBs BBt
+-------------------------------
+equiv S (Leaf BBs) (Leaf BBt) []
+
+Cs \in (Leaf BBs, Failure)
+
+(equiv S Cs Ci gs)^i
+equiv S Cs Cf gs
+-----------------------------------------
+equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs
+
+equiv S Ci (trim Ct a=Ki) gs
+equiv S Cf (trim Ct (a\notin(K_i)^i) gs
+-------------------------------------
+equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs
+
+equiv S Ctrue Ct (gs++[condition])
+equiv S Cfalse Ct (gs++[condition])
+--------------------------------------------
+equiv S (Guard condition Ctrue Cfalse) Ct gs
+
+equiv S Cs Ctrue gs
+equiv S Cs Cfalse gs
+--------------------------------------------
+equiv S Cs (Guard condition Ctrue Cfalse) ([condition]++gs)
+
+
+(Let's forget about those that are below)
+
+forall i, (equiv S Ci (Ct/a=Ki) = Yes)
+equiv S Cf (Ct/a\notin(K_i)^i) = Yes
+            -------------------------------------------
+                    equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> Yes
+
+  exists i, (equiv S Ci (Ct/a=Ki) = No(vs,vt))
+      -------------------------------------------------
+            equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> No(vs,vt)
+
+  equiv S Cf (Ct/a\notin(K_i)^i) = No(vs,vt)
+      -------------------------------------------------
+            equiv kS (Node(a, (Ki,Ci)^i, Cf) Ct gs ~> No(vs,vt)
+
+  forall S Cs Ct gs, exists (a unique) r such that
+        (equiv S Cs Ct gs = r) is provable
+        
--- a/tesi/tesi.pdf
+++ b/tesi/tesi.pdf
--- a/tesi/tesi_unicode.org
+++ b/tesi/tesi_unicode.org
@ -35,12 +35,18 @@ Magari prima pattern matching poi compilatore?
 #+LaTeX_HEADER: \usepackage{comment} 
 #+LaTeX_HEADER: \usepackage{algpseudocode} 
 #+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm}
-#+Latex_HEADER: \newtheorem{definition}{Definition}
+#+LaTeX_HEADER: \newtheorem{definition}{Definition}
+#+LaTeX_HEADER: \usepackage{mathpartir}
 #+LaTeX_HEADER: \usepackage{graphicx} 
 #+LaTeX_HEADER: \usepackage{listings} 
 #+LaTeX_HEADER: \usepackage{color} 
 #+LaTeX_HEADER: \usepackage{stmaryrd}
 #+LaTeX_HEADER: \newcommand{\sem}[1]{{\llbracket{#1}\rrbracket}}
+#+LaTeX_HEADER: \newcommand{\Equiv}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken
+#+LaTeX_HEADER: \newcommand{\covers}[2]{#1 \mathrel{\mathsf{covers}} #2}
+#+LaTeX_HEADER: \newcommand{\Yes}{\mathsf{Yes}}
+#+LaTeX_HEADER: \newcommand{\No}[2]{\mathsf{No}(#1, #2)}
+
 #+EXPORT_SELECT_TAGS: export
 #+EXPORT_EXCLUDE_TAGS: noexport
 #+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
@ -89,16 +95,16 @@ into simple control flow structures such as \texttt{if, switch}, for example:
 \begin{comment}
 %% TODO: side by side
 \end{comment}
-The code in the right is in the Lambda intermediate representation of
+The code on the right is in the Lambda intermediate representation of
 the OCaml compiler. The Lambda representation of a program is shown by
 calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.

 The OCaml pattern matching compiler is a critical part of the OCaml compiler
-in terms of correctness because any bug would result in wrong code
+in terms of correctness because bugs typically would result in wrong code
 production rather than triggering compilation failures.
 Such bugs also are hard to catch by testing because they arise in
 corner cases of complex patterns which are typically not in the
-compiler test suite.
+compiler test suite or most user programs.

 The OCaml core developers group considered evolving the pattern matching compiler, either by
 using a new algorithm or by incremental refactoring of its code base.
@ -120,19 +126,42 @@ Our algorithm translates both source and target programs into a common
 representation, decision trees. Decision trees where chosen because
 they model the space of possible values at a given branch of
 execution.
-Here is the decision tree for the source example program.
+Here are the decision trees for the source and target example program.
+
+\begin{minipage}{0.5\linewidth}
 \begin{verbatim}
-       Node(Root)
+       Switch(Root)
       /        \
     (= [])    (= ::)
     /             \
-   Leaf         Node(Root.1)
+   Leaf         Switch(Root.1)
 (0, None)       /         \
             (= [])      (= ::)
             /               \
          Leaf              Leaf
-   (1, Some(Root.0))   (2, Some(Root.1.0))
+   [x = Root.0]         [y = Root.1.0]
+   (1, Some x)          (2, Some y)
 \end{verbatim}
+\end{minipage}
+\hfill
+\begin{minipage}{0.5\linewidth}
+\begin{verbatim}
+       Switch(Root)
+       /        \
+     (= int 0)  (!= int 0)
+     /             \
+   Leaf         Switch(Root.1)
+(makeblock 0     /       \
+  0 0a)         /         \
+             (= int 0)    (!= int 0)
+             /               \
+          Leaf              Leaf
+[x = Root.0]            [y = Root.1.0]
+(makeblock 0            (makeblock 0
+  1 (makeblock 0 x))      2 (makeblock 0 y))
+\end{verbatim}
+\end{minipage}
+
 \texttt{(Root.0)} is called an \emph{accessor}, that represents the
 access path to a value that can be reached by deconstructing the
 scrutinee. In this example \texttt{Root.0} is the first subvalue of
@ -140,8 +169,8 @@ the scrutinee.

 Target decision trees have a similar shape but the tests on the
 branches are related to the low level representation of values in
-Lambda code. For example, cons cells \texttt{x::xs} are blocks with
-tag 0.
+Lambda code. For example, cons cells \texttt{x::xs} or tuples
+\texttt{(x,y)} are blocks with tag 0.

 To check the equivalence of a source and a target decision tree,
 we proceed by case analysis.
@ -155,10 +184,11 @@ reduced tree is equivalent to $C_i$.

 \subsection{From source programs to decision trees}
 Our source language supports integers, lists, tuples and all algebraic
-datatypes. Patterns support wildcards, constructors and literals, or
-patterns $(p_1|p_2)$ and pattern variables.
-We also support \texttt{when} guards.
-Decision trees have nodes of the form:
+datatypes. Patterns support wildcards, constructors and literals,
+or-patterns $(p_1 | p_2)$ and pattern variables.  We also support
+\texttt{when} guards, which are interesting as they introduce the
+evaluation of expressions during matching.  Decision trees have nodes
+of the form:
 \begin{lstlisting}
 type decision_tree =
  | Unreachable
@ -183,7 +213,7 @@ It satisfies the following correctness statement:
 \[
 \forall t_S, \forall v_S, \quad t_S(v_S) = \sem{t_S}_S(v_S)
 \]
-Running any source values $v_S$ against the source program gives the
+Running any source value $v_S$ against the source program gives the
 same result as running it against the decision tree.

 \subsection{From target programs to decision trees}
@ -192,8 +222,10 @@ The target programs include the following Lambda constructs:
 various comparison operations, guards. The symbolic execution engine
 traverses the target program and builds an environment that maps
 variables to accessors. It branches at every control flow statement
-and emits a Switch node. The branch condition $\pi_i$ is expressed as
+and emits a \texttt{Switch} node. The branch condition $\pi_i$ is expressed as
 an interval set of possible values at that point.
+In comparison with the source decision trees, \texttt{Unreachable}
+nodes are never emitted.
 Guards result in branching. In comparison with the source decision
 trees, \texttt{Unreachable} nodes are never emitted.

@ -204,8 +236,106 @@ $t_T$, satisfying the following correctness statement:
 \]

 \subsection{Equivalence checking}
-TODO
+The equivalence checking algorithm takes as input a domain of
+possible values \emph{S} and a pair of source and target decision trees and
+in case the two trees are not equivalent it returns a counter example.
+The algorithm respects the following correctness statement:
+\begin{comment}
+% TODO: we have to define what \covers mean for readers to understand the specifications
+% (or we use a simplifying lie by hiding \covers in the statements).
+\end{comment}

+\begin{align*}
+ \Equiv S {C_S} {C_T} = \Yes \;\land\; \covers {C_T} S
+ & \implies
+ \forall v_S \approx v_T \in S,\; C_S(v_S) = C_T(v_T)
+\\
+ \Equiv S {C_S} {C_T} = \No {v_S} {v_T} \;\land\; \covers {C_T} S
+ & \implies
+ v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
+\end{align*}
+
+The algorithm proceeds by case analysis. Inference rules are shown.
+If $S$ is empty the results is $\Yes$.
+
+\begin{verbatim}
+------------------------
+equiv \emptyset Cs Ct gs
+\end{verbatim}
+
+If the two decision trees are both terminal nodes the algorithm checks
+for content equality.
+\begin{verbatim}
+--------------------------
+equiv S Failure Failure []
+
+equiv_BB BBs BBt
+-------------------------------
+equiv S (Leaf BBs) (Leaf BBt) []
+\end{verbatim}
+
+If the source decision tree (left hand side) is a terminal while the
+target decistion tree (right hand side) is not, the algorithm proceeds
+by \emph{explosion} of the right hand side. Explosion means that every
+child of the right hand side is tested for equality against the left
+hand side.
+
+\begin{verbatim}
+(equiv S Cs Ci gs)^i
+equiv S Cs Cf gs
+-----------------------------------------
+equiv S Cs (Node(a, (Domi,Ci)^i, Cf)) gs
+\end{verbatim}
+
+
+\begin{comment}
+% TODO: [Gabriel] in practice the $dom_S$ are constructors and the
+% $dom_T$ are domains. Do we want to hide this fact, or be explicit
+% about it? (Maybe we should introduce explicitly/clearly the idea of
+% target domains at some point).
+\end{comment}
+
+When the left hand side is not a terminal, the algorithm explodes the
+left hand side while trimming every right hand side subtree. Trimming
+a left hand side tree on an interval set $dom_S$ computed from the right hand
+side tree constructor means mapping every branch condition $dom_T$ (interval set of
+possible values) on the left to the intersection of $dom_T$ and $dom_S$ when
+the accessors on both side are equal, and removing the branches that
+result in an empty intersection. If the accessors are different,
+\emph{$dom_T$} is left unchanged.
+
+\begin{verbatim}
+equiv S Ci (trim Ct a=Ki) gs
+equiv S Cf (trim Ct (a\notin(K_i)^i) gs
+-------------------------------------
+equiv S (Node(a, (Ki,Ci)^i, Cf) Ct gs
+\end{verbatim}
+
+The equivalence checking algorithm deals with guards by storing a
+queue. A guard blackbox is pushed to the queue whenever the algorithm
+encounters a Guard node on the right, while it pops a blackbox from
+the queue whenever a Guard node appears on the left hand side.
+The algorithm stops with failure if the popped blackbox and the and
+blackbox on the left hand Guard node are different, otherwise in
+continues by exploding to two subtrees, one in which the guard
+condition evaluates to true, the other when it evaluates to false.
+Termination of the algorithm is successful only when the guards queue
+is empty.
+\begin{verbatim}
+equiv S Ctrue Ct (gs++[condition])
+equiv S Cfalse Ct (gs++[condition])
+--------------------------------------------
+equiv S (Guard condition Ctrue Cfalse) Ct gs
+
+equiv S Cs Ctrue gs
+equiv S Cs Cfalse gs
+--------------------------------------------
+equiv S Cs (Guard condition Ctrue Cfalse) ([condition]++gs)
+\end{verbatim}
+
+\begin{comment}
+TODO: replace inference rules with good latex
+\end{comment}

 * Background
  
@ -503,7 +633,9 @@ Moreover, the  pattern matching compiler emits a warning when a
 pattern is not exhaustive or some patterns are shadowed by precedent ones.

 ** Symbolic execution
-TODO
+
+Symbolic execution is a widely used techniques in the field of
+computer security.

 ** Translation validation
 Translators, such as translators and code generators, are huge pieces of