correzioni coppo #1 vero

2020-04-11 17:56:40 +02:00 · 2020-04-11 17:56:40 +02:00 · 14cf53ee8d
commit 14cf53ee8d
parent f464d23c8f
3 changed files with 142 additions and 57 deletions
--- a/tesi/conv.py
+++ b/tesi/conv.py
@ -7,7 +7,7 @@ try:
 except:
    allsymbols = json.load(open('../unicode-latex.json'))
    
-mysymbols = ['≡', '≠', '≼', '→', '←', '⊀', '⋠', '≺', '∀',  'ε', '₀', '₂', '₁', '₃', 'ₐ', 'ₖ', 'ₘ', 'ₙ', 'ᵢ', 'ⁱ', '⋮', 'ₛ', 'ₜ', '≃', '⇔', '∧', '∅', 'ℕ', 'ⱼ', 'ʲ', '⊥', 'π', 'α', 'β', '∞', 'σ', '≤', '⊈', '∧', '∨', '∃', '⇒', '∩', '∉', '⋃', 'ᵏ', 'ₗ', 'ˡ', 'ₒ', 'ᵣ', 'ᴵ', '≈', '⊆' ]
+mysymbols = ['≡', '≠', '≼', '→', '←', '⊀', '⋠', '≺', '∀',  'ε', '₀', '₂', '₁', '₃', '₄', 'ₐ', 'ₖ', 'ₘ', 'ₙ', 'ᵢ', 'ⁱ', '⋮', 'ₛ', 'ₜ', '≃', '⇔', '∧', '∅', 'ℕ', 'ⱼ', 'ʲ', '⊥', 'π', 'α', 'β', '∞', 'σ', '≤', '⊈', '∧', '∨', '∃', '⇒', '∩', '∉', '⋃', 'ᵏ', 'ₗ', 'ˡ', 'ₒ', 'ᵣ', 'ᴵ', '≈', '⊆' ]
 extrasymbols = {'〚': '\llbracket', '〛': r'\rrbracket', '̸': '\neg', '¬̸': '\neg', '∈': '\in ', 'ₛ': '_S', 'ₜ': '_T'}

 symbols = {s: allsymbols[s] for s in mysymbols}
--- a/tesi/tesi.pdf
+++ b/tesi/tesi.pdf
--- a/tesi/tesi_unicode.org
+++ b/tesi/tesi_unicode.org
@ -106,7 +106,7 @@ by applying symbolic execution.
 A pattern matching compiler turns a series of pattern matching clauses
 into simple control flow structures such as \texttt{if, switch}, for example:
 \begin{lstlisting}
-  match x with
+  match scrutinee with
  | [] -> (0, None)
  | x::[] -> (1, Some x)
  | _::y::_ -> (2, Some y)
@ -115,17 +115,25 @@ Given as input to the pattern matching compiler, this snippet of code
 gets translated into the Lambda intermediate representation of
 the OCaml compiler. The Lambda representation of a program is shown by
 calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
+In this example we renamed the variables assigned in order to ease the
+understanding of the tests that are performed when the code is
+translated into the Lambda form.
+code phase.
 \begin{lstlisting}
-(if scrutinee
-    (let (field_1 =a (field 1 scrutinee))
-        (if field_1
+(function scrutinee
+    (if scrutinee    ;;; true when scrutinee (list) not empty
+    (let (tail =a (field 1 scrutinee/81))    ;;; assignment
+        (if tail
        (let
-                (field_1_1 =a (field 1 field_1)
-                 x =a (field 0 field_1))
-                (makeblock 0 2 (makeblock 0 x)))
-            (let (y =a (field 0 scrutinee))
-                (makeblock 0 1 (makeblock 0 y)))))
-    [0: 0 0a])
+            y =a (field 0 tail)) 
+            ;;; y is the first element of the tail
+            (makeblock 0 2 (makeblock 0 y)))
+            ;;; allocate memory for tuple (2, Some y)
+        (let (x =a (field 0 scrutinee))      
+            ;;; x is the head of the scrutinee
+            (makeblock 0 1 (makeblock 0 x))))) 
+            ;;; allocate memory for tuple (1, Some x)
+    [0: 0 0a])))  ;;; low level representatio of (0, None)
 \end{lstlisting}

 The OCaml pattern matching compiler is a critical part of the OCaml compiler
@ -215,10 +223,28 @@ reduced tree is equivalent to $C_i$.
 \subsection{From source programs to decision trees}
 Our source language supports integers, lists, tuples and all algebraic
 datatypes. Patterns support wildcards, constructors and literals,
-Or-patterns such as $(p_1 | p_2)$ and pattern variables.  We also support
-\texttt{when} guards, which are interesting as they introduce the
-evaluation of expressions during matching.  Decision trees have nodes
-of the form:
+Or-patterns such as $(p_1 | p_2)$ and pattern variables.
+In particular Or-patterns provide a more compact way to group patterns
+that point to the same expression.
+
+\begin{minipage}{0.4\linewidth}
+\begin{lstlisting}
+match w with
+| p₁ -> expr
+| p₂ -> expr
+| p₃ -> expr
+\end{lstlisting}
+\end{minipage}
+\begin{minipage}{0.6\linewidth}
+\begin{lstlisting}
+match w with
+| p₁|p₂|p₃ -> expr
+\end{lstlisting}
+\end{minipage}
+We also support \texttt{when} guards, which are interesting as they
+introduce the evaluation of expressions during matching. 
+This is the type definition of decision tree as they are used in the
+prototype implementation:
 \begin{lstlisting}
 type decision_tree =
  | Unreachable
@ -236,15 +262,18 @@ matched by the source clauses.
 \texttt{Unreachable} is used when we statically know that no value
 can flow to that subtree.

-We write 〚tₛ〛ₛ for the decision tree of the source program
-t_S, computed by a matrix decomposition algorithm (each column
+We write 〚tₛ〛ₛ to denote the translation of the source program (the
+set of pattern matching clauses) into a decision tree, computed by a matrix decomposition algorithm (each column
 decomposition step gives a \texttt{Switch} node).
 It satisfies the following correctness statement:
 \[
 \forall t_s, \forall v_s, \quad t_s(v_s) = \semTEX{t_s}_s(v_s)
 \]
-Running any source value $v_S$ against the source program gives the
-same result as running it against the decision tree.
+The correctness statement intuitively states that for every source
+program, for every source value that is well-formed input to a source
+program, running the program tₛ against the input value vₛ is the same
+as running the compiled source program 〚tₛ〛 (that is a decision tree) against the same input
+value vₛ".

 \subsection{From target programs to decision trees}
 The target programs include the following Lambda constructs:
@ -259,8 +288,10 @@ nodes are never emitted.
 Guards result in branching. In comparison with the source decision
 trees, \texttt{Unreachable} nodes are never emitted.

-We write $\semTEX{t_T}_T$ for the decision tree of the target program
-$t_T$, satisfying the following correctness statement:
+We write $\semTEX{t_T}_T$ to denote the translation of a target
+program tₜ into a decision tree of the target program
+$t_T$, satisfying the following correctness statement that is
+simmetric to the correctness statement for the translation of source programs:
 \[
 \forall t_T, \forall v_T, \quad t_T(v_T) = \semTEX{t_T}_T(v_T)
 \]
@ -819,7 +850,7 @@ existing translators that consists of taking the source and the target
 * Translation validation of the Pattern Matching Compiler

 ** Source program
-The algorithm takes as its input a source program and translates it
+Our algorithm takes as its input a source program and translates it
 into an algebraic data structure which type we call /decision_tree/.

 #+BEGIN_SRC 
@ -957,18 +988,32 @@ All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
 <arg> are expressed using the OCaml pattern matching language.
 Similarly, all the right-hand-side expressions are of the form
 \texttt{observe <arg> <arg> ...} with the same constraints on arguments.
-
 #+BEGIN_SRC 
 type t = K1 | K2 of t (* declaration of an algebraic and recursive datatype t *)

 let _ = function
  | K1 -> observe 0
  | K2 K1 -> observe 1
-  | K2 x when guard x -> observe 2
+  | K2 x when guard x -> observe 2 (* guard inspects the x variable *)
  | K2 (K2 x) as y when guard x y -> observe 3
  | K2 _ -> observe 4
 #+END_SRC
-
+We note that the right hand side of /observe/ is just an arbitrary
+value and in this case just enumerates the order in which expressions
+appear.
+Even if this is an oversimplification of the problem for the
+prototype, it must be noted that at the compiler level we have the
+possibility to compile the pattern clauses in two separate steps so
+that the guards and right-hand-side expressions are semantically equal
+to their counterparts at the target program level.
+\begin{lstlisting}
+let _ = function
+  | K1 -> lambda₀
+  | K2 K1 -> lambda₁
+  | K2 x when lambda-guard₀ -> lambda₂
+  | K2 (K2 x) as y when lambda-guard₁ -> lambda₃
+  | K2 _ -> lambda₄
+\end{lstlisting}
 The source program is parsed using the ocaml-compiler-libs library.
 The result of parsing, when successful, results in a list of clauses
 and a list of type declarations.
@ -1008,24 +1053,27 @@ following rules apply
 When a value /v/ matches pattern /p/ we say that /v/ is an /instance/ of /p/.


-During compilation by the translators, expressions are compiled into
+During compilation by the translators, expressions at the
+right-hand-side are compiled into
 Lambda code and are referred as lambda code actions lᵢ.

-The entire pattern matching code is represented as a clause matrix
-that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to
-lambda code action lⁱ
+We define the /pattern matrix/ P as the matrix |m x n| where m bigger
+or equal than the number of clauses in the source program and n is
+equal to the arity of the constructor with the gratest arity.
 \begin{equation*}
-(P → L) =
+P =
 \begin{pmatrix}
-p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\
-p_{2,1} & p_{2,2} & \cdots & p_{2,n} & →  l₂ \\
-\vdots & \vdots & \ddots & \vdots & → \vdots \\
-p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ
+p_{1,1} & p_{1,2} & \cdots & p_{1,n}  \\
+p_{2,1} & p_{2,2} & \cdots & p_{2,n}  \\
+\vdots & \vdots & \ddots & \vdots  \\
+p_{m,1} & p_{m,2} & \cdots & p_{m,n} )
 \end{pmatrix}
 \end{equation*}
-
+every row of /P/ is called a pattern vector
+ $\vec{p_i}$ = (p₁, p₂, ..., pₙ); In every instance of P pattern
+vectors appear normalized on the length of the longest pattern vector.
 Considering the pattern matrix P we say that the value vector
-$\vec{v}$ = (v₁, v₂, ..., vᵢ) matches the line number i in P  if and only if the following two
+$\vec{v}$ = (v₁, v₂, ..., vᵢ) matches the pattern vector pᵢ in P  if and only if the following two
 conditions are satisfied:
 - p_{i,1}, p_{i,2}, \cdots, p_{i,n}  ≼ (v₁, v₂, ..., vᵢ)
 - ∀j < i p_{j,1}, p_{j,2}, \cdots, p_{j,n} ⋠ (v₁, v₂, ..., vᵢ) 
@ -1038,10 +1086,20 @@ We can define the following three relations with respect to patterns:
 - Patterns p and q are compatible when they share a common instance

 \subsubsection{Matrix decomposition of pattern clauses}
-
+We define a new object, the /clause matrix/ P → L of size |m x n+1| that associates
+pattern vectors $\vec{p_i}$ to lambda code action lᵢ.
+\begin{equation*}
+P → L = 
+\begin{pmatrix}
+p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
+p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\
+\vdots & \vdots & \ddots & \vdots →  \vdots \\
+p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ
+\end{pmatrix}
+\end{equation*}
 The initial input of the decomposition algorithm C consists of a vector of variables
 $\vec{x}$ = (x₁, x₂, ..., xₙ)  of size /n/ where /n/ is the arity of
-the type of /x/ and a clause matrix P → L of width n and height m.
+the type of /x/ and the /clause matrix/ P → L.
 That is:

 \begin{equation*}
@ -1050,12 +1108,12 @@ C((\vec{x} = (x₁, x₂, ..., xₙ),
 p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
 p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\
 \vdots & \vdots & \ddots & \vdots →  \vdots \\
-p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ)
-\end{pmatrix}
+p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ
+\end{pmatrix})
 \end{equation*}

 The base case C₀ of the algorithm is the case in which the $\vec{x}$
-is empty and the result of the compilation
+is an empty sequence and the result of the compilation
 C₀ is l₁
 \begin{equation*}
 C₀((),
@ -1091,7 +1149,7 @@ following four rules:
    for wildcard patterns and the lambda action lᵢ remains unchanged.

 2) Constructor rule: if all patterns in the first column of P are
-   constructors patterns of the form k(q₁, q₂, ..., qₙ) we define a
+   constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a
   new matrix, the specialized clause matrix S, by applying the
   following transformation on every row /p/:
    \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
@ -1147,11 +1205,40 @@ following four rules:
   largest prefix matrix for which one of the three previous rules
   apply, and P₂ → L₂ containing the remaining rows. The algorithm is
   applied to both matrices.
-
-
+It is important to note that the application of the decomposition
+algorithm converges. This intuition can be verified by defining the
+size of the clause matrix P → L as equal to the length of the longest
+pattern vector $\vec{p_i}$ and the length of a pattern vector as the
+number of symbols that appear in the clause.
+While it is very easy to see that the application of rules 1) and 4)
+produces new matrices of length equal or smaller than the original
+clause matrix, we can show that:
+- with the application of the constructor rule the pattern vector $\vec{p_i}$ loses one
+  symbol after its decomposition:
+  | \vert{}(p_{i,1} (q₁, q₂, ..., q_{n'}), p_{i,2}, p_{i,3}, ..., p_{i,n})\vert{} = n + n'
+  | \vert{}(q_{i,1}, q_{i,2}, ..., q_{i,n'}, p_{i,2}, p_{i,3}, ..., p_{i,n})\vert{} = n + n' - 1
+- with the application of the orpat rule, we add one row to the clause
+  matrix P → L but the length of a row containing an
+  Or-pattern decreases
+\begin{equation*}
+\vert{}P → L\vert{} = \big\lvert
+\begin{pmatrix}
+(p_{1,1}\vert{}q_{1,1}) & p_{1,2} & \cdots & p_{1,n} → l₁ \\
+\vdots & \vdots & \ddots & \vdots →  \vdots \\
+\end{pmatrix}\big\rvert = n + 1
+\end{equation*}
+\begin{equation*}
+\vert{}P' → L'\vert{} = \big\lvert
+\begin{pmatrix}
+p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
+q_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
+\vdots & \vdots & \ddots & \vdots →  \vdots \\
+\end{pmatrix}\big\rvert = n
+\end{equation*}
 In our prototype we make use of accessors to encode stored values.
 \begin{minipage}{0.2\linewidth}
 \begin{verbatim}
+
 let value =  1 :: 2 :: 3 :: []
 (* that can also be written *)
 let value = []
@ -1228,7 +1315,6 @@ Base cases:
   because a source matrix in the case of empty rows returns
   the first expression and (Leaf bb)(v) := Match bb
 2. [| (aⱼ)ʲ, ∅ |] ≡ Failure 
-
 Regarding non base cases:
 Let's first define
 | let Idx(k) := [0; arity(k)[
@ -1240,16 +1326,15 @@ m := ((a_i)^i ((p_{ij})^i \to e_j)^{ij})
 \[
 (k_k)^k := headconstructor(p_{i0})^i
 \]
-\begin{equation}
- Groups(m) := ( k_k \to  ((a)_{0.l})^{l  \in  Idx(k_k)} +++ (a_i)^{i \in I\DZ}), \\
-    ( if p_{0j} is k(q_l) then \\
-        (qₗ)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\DZ} \to e_j \\
-     if p_{0j} is \_ then \\
-        (\_)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\DZ}   \to e_j \\
-     else \bot  )^j ), \\
-  ((a_i)^{i \in I\DZ}, ((p_{ij})^{i \in I\DZ}  \to eⱼ if p_{0j} is \_ else \bot)^{j \in J}) 
-\end{equation}
-
+| Groups(m) := (k_{k} \to  ((a)_{0.l})^{l  \in  Idx(k_{k})} +++ (a_{i})^{i\in{}I\DZ}), 
+|(
+| \quad if p_{0j} is k(q_{l}) then 
+| \quad \quad \quad (qₗ)^{l \in Idx(k_k)} +++ (p_{ij})^{i\in{}I\DZ} \to e_{j} 
+| \quad    if p_{0j} is "_" then 
+| \quad \quad \quad ("_")^{l \in Idx(k_{k})} +++ (p_{ij})^{i\in{}I\DZ}   \to e_{j} 
+|  \quad   else \bot
+|)^{j ∈ J},
+|  ((a_{i})^{i\in{}I\DZ}, ((p_{ij})^{i\in{}I\DZ}  \to eⱼ if p_{0j} is \_ else \bot)^{j\in{}J}) 
 Groups(m) is an auxiliary function that source a matrix m into
 submatrices, according to the head constructor of their first pattern.
 Groups(m) returns one submatrix m_r for each head constructor k that
@ -1266,7 +1351,7 @@ the wildcard submatrix.

 We formalize this intuition as follows:
 Lemma (Groups):
-Let \[m\] be a matrix with \[Groups(m) = (k_r \to m_r)^k, m_{wild}\].
+Let /m/ be a matrix with \[Groups(m) = (k_r \to m_r)^k, m_{wild}\].
 For any value vector $(v_i)^l$ such that $v_0 = k(v'_l)^l$ for some
 constructor k,
 we have: