Coppo corrections #1 (for real)

This commit is contained in:
Francesco Mecca 2020-04-11 17:56:40 +02:00
parent f464d23c8f
commit 14cf53ee8d
3 changed files with 142 additions and 57 deletions


@ -7,7 +7,7 @@ try:
except: except:
allsymbols = json.load(open('../unicode-latex.json')) allsymbols = json.load(open('../unicode-latex.json'))
mysymbols = ['', '', '', '', '', '', '', '', '', 'ε', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'ʲ', '', 'π', 'α', 'β', '', 'σ', '', '', '', '', '', '', '', '', '', '', '', 'ˡ', '', '', '', '', '' ] mysymbols = ['', '', '', '', '', '', '', '', '', 'ε', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'ʲ', '', 'π', 'α', 'β', '', 'σ', '', '', '', '', '', '', '', '', '', '', '', 'ˡ', '', '', '', '', '' ]
extrasymbols = {'': '\llbracket', '': r'\rrbracket', '̸': '\neg', '¬̸': '\neg', '': '\in ', '': '_S', '': '_T'} extrasymbols = {'': '\llbracket', '': r'\rrbracket', '̸': '\neg', '¬̸': '\neg', '': '\in ', '': '_S', '': '_T'}
symbols = {s: allsymbols[s] for s in mysymbols} symbols = {s: allsymbols[s] for s in mysymbols}

Binary file not shown.


@ -106,7 +106,7 @@ by applying symbolic execution.
A pattern matching compiler turns a series of pattern matching clauses A pattern matching compiler turns a series of pattern matching clauses
into simple control flow structures such as \texttt{if, switch}, for example: into simple control flow structures such as \texttt{if, switch}, for example:
\begin{lstlisting} \begin{lstlisting}
match x with match scrutinee with
| [] -> (0, None) | [] -> (0, None)
| x::[] -> (1, Some x) | x::[] -> (1, Some x)
| _::y::_ -> (2, Some y) | _::y::_ -> (2, Some y)
@ -115,17 +115,25 @@ Given as input to the pattern matching compiler, this snippet of code
gets translated into the Lambda intermediate representation of gets translated into the Lambda intermediate representation of
the OCaml compiler. The Lambda representation of a program is shown by the OCaml compiler. The Lambda representation of a program is shown by
calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag. calling the \texttt{ocamlc} compiler with \texttt{-drawlambda} flag.
In this example we renamed the assigned variables to make it easier to
follow the tests that are performed when the code is translated into
the Lambda form.
\begin{lstlisting} \begin{lstlisting}
(if scrutinee (function scrutinee
(let (field_1 =a (field 1 scrutinee)) (if scrutinee ;;; true when scrutinee (list) not empty
(if field_1 (let (tail =a (field 1 scrutinee/81)) ;;; assignment
(if tail
(let (let
(field_1_1 =a (field 1 field_1) y =a (field 0 tail))
x =a (field 0 field_1)) ;;; y is the first element of the tail
(makeblock 0 2 (makeblock 0 x))) (makeblock 0 2 (makeblock 0 y)))
(let (y =a (field 0 scrutinee)) ;;; allocate memory for tuple (2, Some y)
(makeblock 0 1 (makeblock 0 y))))) (let (x =a (field 0 scrutinee))
[0: 0 0a]) ;;; x is the head of the scrutinee
(makeblock 0 1 (makeblock 0 x)))))
;;; allocate memory for tuple (1, Some x)
[0: 0 0a]))) ;;; low level representation of (0, None)
\end{lstlisting} \end{lstlisting}
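As an illustration only, the tests performed by the Lambda code above can be mirrored by hand in OCaml. The names `source` and `compiled` are hypothetical and the scrutinee is assumed to be an `int list`; this is a sketch of the control flow, not the output of the compiler.

```ocaml
(* The source clauses from the example above. *)
let source (scrutinee : int list) : int * int option =
  match scrutinee with
  | [] -> (0, None)
  | x :: [] -> (1, Some x)
  | _ :: y :: _ -> (2, Some y)

(* Hand-written mirror of the if-tests in the Lambda code:
   first test the scrutinee, then its tail. *)
let compiled (scrutinee : int list) : int * int option =
  if scrutinee <> [] then
    let tail = List.tl scrutinee in
    if tail <> [] then
      (* y is the first element of the tail *)
      let y = List.hd tail in
      (2, Some y)
    else
      (* x is the head of the scrutinee *)
      let x = List.hd scrutinee in
      (1, Some x)
  else (0, None)
```

Both functions are expected to agree on every list, which is the property the rest of the document sets out to check automatically.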
The OCaml pattern matching compiler is a critical part of the OCaml compiler The OCaml pattern matching compiler is a critical part of the OCaml compiler
@ -215,10 +223,28 @@ reduced tree is equivalent to $C_i$.
\subsection{From source programs to decision trees} \subsection{From source programs to decision trees}
Our source language supports integers, lists, tuples and all algebraic Our source language supports integers, lists, tuples and all algebraic
datatypes. Patterns support wildcards, constructors and literals, datatypes. Patterns support wildcards, constructors and literals,
Or-patterns such as $(p_1 | p_2)$ and pattern variables. We also support Or-patterns such as $(p_1 | p_2)$ and pattern variables.
\texttt{when} guards, which are interesting as they introduce the In particular Or-patterns provide a more compact way to group patterns
evaluation of expressions during matching. Decision trees have nodes that point to the same expression.
of the form:
\begin{minipage}{0.4\linewidth}
\begin{lstlisting}
match w with
| p₁ -> expr
| p₂ -> expr
| p₃ -> expr
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
\begin{lstlisting}
match w with
| p₁|p₂|p₃ -> expr
\end{lstlisting}
\end{minipage}
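A small concrete instance of the two forms above, assuming a hypothetical type `color` that does not appear in the original text:

```ocaml
(* Hypothetical datatype used only for this illustration. *)
type color = Red | Green | Blue | Black

(* One clause per pattern, all pointing to the same expression. *)
let is_primary_expanded = function
  | Red -> true
  | Green -> true
  | Blue -> true
  | Black -> false

(* The same clauses grouped with an Or-pattern. *)
let is_primary_compact = function
  | Red | Green | Blue -> true
  | Black -> false
```

The two functions return the same result on every constructor of `color`.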
We also support \texttt{when} guards, which are interesting as they
introduce the evaluation of expressions during matching.
This is the type definition of decision trees as used in the
prototype implementation:
\begin{lstlisting} \begin{lstlisting}
type decision_tree = type decision_tree =
| Unreachable | Unreachable
@ -236,15 +262,18 @@ matched by the source clauses.
\texttt{Unreachable} is used when we statically know that no value \texttt{Unreachable} is used when we statically know that no value
can flow to that subtree. can flow to that subtree.
We write 〚tₛ〛ₛ for the decision tree of the source program We write 〚tₛ〛ₛ to denote the translation of the source program (the
t_S, computed by a matrix decomposition algorithm (each column set of pattern matching clauses) into a decision tree, computed by a matrix decomposition algorithm (each column
decomposition step gives a \texttt{Switch} node). decomposition step gives a \texttt{Switch} node).
It satisfies the following correctness statement: It satisfies the following correctness statement:
\[ \[
\forall t_s, \forall v_s, \quad t_s(v_s) = \semTEX{t_s}_s(v_s) \forall t_s, \forall v_s, \quad t_s(v_s) = \semTEX{t_s}_s(v_s)
\] \]
Running any source value $v_S$ against the source program gives the The correctness statement intuitively states that for every source
same result as running it against the decision tree. program, for every source value that is well-formed input to a source
program, running the program tₛ against the input value vₛ is the same
as running the compiled source program 〚tₛ〛ₛ (that is, a decision tree) against the same input
value vₛ.
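The correctness statement can be checked on sample values with a toy evaluator. The sketch below assumes a much simplified, hypothetical tree type over `int list` values (`tree`, `drop`, `eval` and `compiled_tree` are illustrative names, not the prototype's definitions).

```ocaml
(* Hypothetical, simplified decision tree over int lists. *)
type tree =
  | Leaf of int                  (* index of the selected clause *)
  | Failure                      (* no clause matches *)
  | Switch of int * tree * tree  (* Switch (depth, if_nil, if_cons): inspect the
                                    list left after dropping [depth] elements *)

(* Drop the first n elements of a list (fewer if the list is shorter). *)
let rec drop n l =
  if n <= 0 then l
  else match l with
    | [] -> []
    | _ :: t -> drop (n - 1) t

(* Run a decision tree against a value. *)
let rec eval (t : tree) (v : int list) : int option =
  match t with
  | Leaf k -> Some k
  | Failure -> None
  | Switch (depth, if_nil, if_cons) ->
    (match drop depth v with
     | [] -> eval if_nil v
     | _ :: _ -> eval if_cons v)

(* A source program and a tree written by hand for it. *)
let source = function
  | [] -> 0
  | [_] -> 1
  | _ :: _ :: _ -> 2

let compiled_tree = Switch (0, Leaf 0, Switch (1, Leaf 1, Leaf 2))
```

Evaluating `compiled_tree` on a list and running `source` on the same list should give the same clause index, which is the shape of the equation above restricted to a few test inputs.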
\subsection{From target programs to decision trees} \subsection{From target programs to decision trees}
The target programs include the following Lambda constructs: The target programs include the following Lambda constructs:
@ -259,8 +288,10 @@ nodes are never emitted.
Guards result in branching. In comparison with the source decision Guards result in branching. In comparison with the source decision
trees, \texttt{Unreachable} nodes are never emitted. trees, \texttt{Unreachable} nodes are never emitted.
We write $\semTEX{t_T}_T$ for the decision tree of the target program We write $\semTEX{t_T}_T$ to denote the translation of a target
$t_T$, satisfying the following correctness statement: program tₜ into a decision tree of the target program
$t_T$, satisfying the following correctness statement that is
symmetric to the correctness statement for the translation of source programs:
\[ \[
\forall t_T, \forall v_T, \quad t_T(v_T) = \semTEX{t_T}_T(v_T) \forall t_T, \forall v_T, \quad t_T(v_T) = \semTEX{t_T}_T(v_T)
\] \]
@ -819,7 +850,7 @@ existing translators that consists of taking the source and the target
* Translation validation of the Pattern Matching Compiler * Translation validation of the Pattern Matching Compiler
** Source program ** Source program
The algorithm takes as its input a source program and translates it Our algorithm takes as its input a source program and translates it
into an algebraic data structure which type we call /decision_tree/. into an algebraic data structure which type we call /decision_tree/.
#+BEGIN_SRC #+BEGIN_SRC
@ -957,18 +988,32 @@ All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
<arg> are expressed using the OCaml pattern matching language. <arg> are expressed using the OCaml pattern matching language.
Similarly, all the right-hand-side expressions are of the form Similarly, all the right-hand-side expressions are of the form
\texttt{observe <arg> <arg> ...} with the same constraints on arguments. \texttt{observe <arg> <arg> ...} with the same constraints on arguments.
#+BEGIN_SRC #+BEGIN_SRC
type t = K1 | K2 of t (* declaration of an algebraic and recursive datatype t *) type t = K1 | K2 of t (* declaration of an algebraic and recursive datatype t *)
let _ = function let _ = function
| K1 -> observe 0 | K1 -> observe 0
| K2 K1 -> observe 1 | K2 K1 -> observe 1
| K2 x when guard x -> observe 2 | K2 x when guard x -> observe 2 (* guard inspects the x variable *)
| K2 (K2 x) as y when guard x y -> observe 3 | K2 (K2 x) as y when guard x y -> observe 3
| K2 _ -> observe 4 | K2 _ -> observe 4
#+END_SRC #+END_SRC
We note that the argument of /observe/ is just an arbitrary
value and in this case simply enumerates the clauses in the order in
which they appear.
Although this is an oversimplification made for the prototype, at the
compiler level it is possible to compile the pattern clauses in two
separate steps, so that the guards and right-hand-side expressions are
semantically equal to their counterparts in the target program.
\begin{lstlisting}
let _ = function
| K1 -> lambda₀
| K2 K1 -> lambda₁
| K2 x when lambda-guard₀ -> lambda₂
| K2 (K2 x) as y when lambda-guard₁ -> lambda₃
| K2 _ -> lambda₄
\end{lstlisting}
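To experiment with the clauses above, the blackboxes can be replaced by stubs. In the following sketch `observe`, `guard1` and `guard2` are hypothetical stand-ins (the original uses a single `guard` with a varying number of arguments, which plain OCaml functions do not allow), and their bodies are arbitrary choices made so that every clause is reachable.

```ocaml
type t = K1 | K2 of t

(* Hypothetical stubs for the blackboxes of the example. *)
let observe (n : int) : int = n
let guard1 (x : t) : bool = (x = K2 K1)
let guard2 (x : t) (_ : t) : bool = (x = K2 K1)

let f = function
  | K1 -> observe 0
  | K2 K1 -> observe 1
  | K2 x when guard1 x -> observe 2
  | K2 (K2 x) as y when guard2 x y -> observe 3
  | K2 _ -> observe 4
```

With these stubs each of the five clauses is selected by some input, so the value returned by `f` identifies which clause fired.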
The source program is parsed using the ocaml-compiler-libs library. The source program is parsed using the ocaml-compiler-libs library.
The result of parsing, when successful, results in a list of clauses The result of parsing, when successful, results in a list of clauses
and a list of type declarations. and a list of type declarations.
@ -1008,24 +1053,27 @@ following rules apply
When a value /v/ matches pattern /p/ we say that /v/ is an /instance/ of /p/. When a value /v/ matches pattern /p/ we say that /v/ is an /instance/ of /p/.
During compilation by the translators, expressions are compiled into During compilation by the translators, expressions at the
right-hand-side are compiled into
Lambda code and are referred as lambda code actions lᵢ. Lambda code and are referred as lambda code actions lᵢ.
The entire pattern matching code is represented as a clause matrix We define the /pattern matrix/ P as the matrix |m x n| where m is greater than
that associates rows of patterns (p_{i,1}, p_{i,2}, ..., p_{i,n}) to or equal to the number of clauses in the source program and n is
lambda code action lⁱ equal to the arity of the constructor with the greatest arity.
\begin{equation*} \begin{equation*}
(P → L) = P =
\begin{pmatrix} \begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} & → l₁ \\ p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} & → l₂ \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\
\vdots & \vdots & \ddots & \vdots & → \vdots \\ \vdots & \vdots & \ddots & \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} & → lₘ p_{m,1} & p_{m,2} & \cdots & p_{m,n}
\end{pmatrix} \end{pmatrix}
\end{equation*} \end{equation*}
Every row of /P/ is called a pattern vector
$\vec{p_i}$ = (p₁, p₂, ..., pₙ). In every instance of P, pattern
vectors are normalized to the length of the longest pattern vector.
Considering the pattern matrix P we say that the value vector Considering the pattern matrix P we say that the value vector
$\vec{v}$ = (v₁, v₂, ..., vᵢ) matches the line number i in P if and only if the following two $\vec{v}$ = (v₁, v₂, ..., vᵢ) matches the pattern vector pᵢ in P if and only if the following two
conditions are satisfied: conditions are satisfied:
- p_{i,1}, p_{i,2}, \cdots, p_{i,n} ≼ (v₁, v₂, ..., vᵢ) - p_{i,1}, p_{i,2}, \cdots, p_{i,n} ≼ (v₁, v₂, ..., vᵢ)
- ∀j < i p_{j,1}, p_{j,2}, \cdots, p_{j,n} ⋠ (v₁, v₂, ..., vᵢ) - ∀j < i p_{j,1}, p_{j,2}, \cdots, p_{j,n} ⋠ (v₁, v₂, ..., vᵢ)
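The two conditions amount to first-match semantics: a value vector matches row i when row i accepts it and no earlier row does. A minimal sketch, assuming hypothetical `value` and `pattern` types that are not the prototype's:

```ocaml
(* Hypothetical value and pattern representations. *)
type value = VInt of int | VConstr of string * value list

type pattern =
  | PWild
  | PInt of int
  | PConstr of string * pattern list
  | POr of pattern * pattern

(* The instance relation: does value v match pattern p? *)
let rec matches (p : pattern) (v : value) : bool =
  match p, v with
  | PWild, _ -> true
  | PInt a, VInt b -> a = b
  | PConstr (k, ps), VConstr (k', vs) ->
    k = k' && List.length ps = List.length vs && List.for_all2 matches ps vs
  | POr (p1, p2), _ -> matches p1 v || matches p2 v
  | _ -> false

(* Pointwise extension to pattern vectors and value vectors. *)
let matches_vector (ps : pattern list) (vs : value list) : bool =
  List.length ps = List.length vs && List.for_all2 matches ps vs

(* 1-based index of the first row that accepts the value vector, if any.
   Returning the first such row enforces the second condition. *)
let first_match (rows : pattern list list) (vs : value list) : int option =
  let rec go i = function
    | [] -> None
    | row :: rest -> if matches_vector row vs then Some i else go (i + 1) rest
  in
  go 1 rows
```

Scanning rows top to bottom makes the condition on earlier rows hold by construction, since a later row is only considered after every earlier one has rejected the value vector.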
@ -1038,10 +1086,20 @@ We can define the following three relations with respect to patterns:
- Patterns p and q are compatible when they share a common instance - Patterns p and q are compatible when they share a common instance
\subsubsection{Matrix decomposition of pattern clauses} \subsubsection{Matrix decomposition of pattern clauses}
We define a new object, the /clause matrix/ P → L of size |m x n+1|, that associates
each pattern vector $\vec{p_i}$ with its lambda code action lᵢ.
\begin{equation*}
P → L =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\
\vdots & \vdots & \ddots & \vdots → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ
\end{pmatrix}
\end{equation*}
The initial input of the decomposition algorithm C consists of a vector of variables The initial input of the decomposition algorithm C consists of a vector of variables
$\vec{x}$ = (x₁, x₂, ..., xₙ) of size /n/ where /n/ is the arity of $\vec{x}$ = (x₁, x₂, ..., xₙ) of size /n/ where /n/ is the arity of
the type of /x/ and a clause matrix P → L of width n and height m. the type of /x/ and the /clause matrix/ P → L.
That is: That is:
\begin{equation*} \begin{equation*}
@ -1050,12 +1108,12 @@ C((\vec{x} = (x₁, x₂, ..., xₙ),
p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\ p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} → l₂ \\
\vdots & \vdots & \ddots & \vdots → \vdots \\ \vdots & \vdots & \ddots & \vdots → \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ) p_{m,1} & p_{m,2} & \cdots & p_{m,n} → lₘ
\end{pmatrix} \end{pmatrix})
\end{equation*} \end{equation*}
The base case C₀ of the algorithm is the case in which the $\vec{x}$ The base case C₀ of the algorithm is the case in which the $\vec{x}$
is empty and the result of the compilation is an empty sequence and the result of the compilation
C₀ is l₁ C₀ is l₁
\begin{equation*} \begin{equation*}
C₀((), C₀((),
@ -1091,7 +1149,7 @@ following four rules:
for wildcard patterns and the lambda action lᵢ remains unchanged. for wildcard patterns and the lambda action lᵢ remains unchanged.
2) Constructor rule: if all patterns in the first column of P are 2) Constructor rule: if all patterns in the first column of P are
constructors patterns of the form k(q₁, q₂, ..., q) we define a constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a
new matrix, the specialized clause matrix S, by applying the new matrix, the specialized clause matrix S, by applying the
following transformation on every row /p/: following transformation on every row /p/:
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,] \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
@ -1147,11 +1205,40 @@ following four rules:
largest prefix matrix for which one of the three previous rules largest prefix matrix for which one of the three previous rules
apply, and P₂ → L₂ containing the remaining rows. The algorithm is apply, and P₂ → L₂ containing the remaining rows. The algorithm is
applied to both matrices. applied to both matrices.
It is important to note that the decomposition algorithm terminates.
This can be verified by defining the size of the clause matrix P → L
as the length of its longest pattern vector $\vec{p_i}$, and the
length of a pattern vector as the number of symbols that appear in the
clause.
While it is easy to see that applying rules 1) and 4) produces new
matrices whose size is equal to or smaller than that of the original
clause matrix, we can show that:
- with the application of the constructor rule the pattern vector $\vec{p_i}$ loses one
symbol after its decomposition:
| \vert{}(p_{i,1} (q₁, q₂, ..., q_{n'}), p_{i,2}, p_{i,3}, ..., p_{i,n})\vert{} = n + n'
| \vert{}(q_{i,1}, q_{i,2}, ..., q_{i,n'}, p_{i,2}, p_{i,3}, ..., p_{i,n})\vert{} = n + n' - 1
- with the application of the orpat rule, we add one row to the clause
matrix P → L but the length of a row containing an
Or-pattern decreases
\begin{equation*}
\vert{}P → L\vert{} = \big\lvert
\begin{pmatrix}
(p_{1,1}\vert{}q_{1,1}) & p_{1,2} & \cdots & p_{1,n} → l₁ \\
\vdots & \vdots & \ddots & \vdots → \vdots \\
\end{pmatrix}\big\rvert = n + 1
\end{equation*}
\begin{equation*}
\vert{}P' → L'\vert{} = \big\lvert
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
q_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
\vdots & \vdots & \ddots & \vdots → \vdots \\
\end{pmatrix}\big\rvert = n
\end{equation*}
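The size argument can be tried out on small matrices. The sketch below assumes a hypothetical `pattern` type and counts every wildcard, constructor and Or-pattern node as one symbol; `specialize_row` and `expand_or` are simplified stand-ins for the constructor and orpat rules, without lambda actions.

```ocaml
(* Hypothetical pattern representation for this illustration. *)
type pattern =
  | PWild
  | PConstr of string * pattern list
  | POr of pattern * pattern

(* Number of symbols in a pattern (assumed measure). *)
let rec size = function
  | PWild -> 1
  | PConstr (_, args) -> 1 + List.fold_left (fun acc p -> acc + size p) 0 args
  | POr (p, q) -> 1 + size p + size q

(* Length of a pattern vector and size of a matrix (its longest row). *)
let row_size row = List.fold_left (fun acc p -> acc + size p) 0 row
let matrix_size rows = List.fold_left (fun acc r -> max acc (row_size r)) 0 rows

(* Constructor rule on one row: replace k(q1, ..., qn') by q1, ..., qn'. *)
let specialize_row = function
  | PConstr (_, args) :: rest -> args @ rest
  | row -> row

(* Orpat rule: a row starting with (p | q) becomes two rows. *)
let expand_or rows =
  List.concat
    (List.map
       (function
         | POr (p, q) :: rest -> [p :: rest; q :: rest]
         | row -> [row])
       rows)
```

Under this measure, specializing a row removes exactly the head constructor symbol, and expanding an Or-pattern adds a row while shortening the longest one, which matches the two cases above.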
In our prototype we make use of accessors to encode stored values. In our prototype we make use of accessors to encode stored values.
\begin{minipage}{0.2\linewidth} \begin{minipage}{0.2\linewidth}
\begin{verbatim} \begin{verbatim}
let value = 1 :: 2 :: 3 :: [] let value = 1 :: 2 :: 3 :: []
(* that can also be written *) (* that can also be written *)
let value = [] let value = []
@ -1228,7 +1315,6 @@ Base cases:
because a source matrix in the case of empty rows returns because a source matrix in the case of empty rows returns
the first expression and (Leaf bb)(v) := Match bb the first expression and (Leaf bb)(v) := Match bb
2. [| (aⱼ)ʲ, ∅ |] ≡ Failure 2. [| (aⱼ)ʲ, ∅ |] ≡ Failure
Regarding non base cases: Regarding non base cases:
Let's first define Let's first define
| let Idx(k) := [0; arity(k)[ | let Idx(k) := [0; arity(k)[
@ -1240,16 +1326,15 @@ m := ((a_i)^i ((p_{ij})^i \to e_j)^{ij})
\[ \[
(k_k)^k := headconstructor(p_{i0})^i (k_k)^k := headconstructor(p_{i0})^i
\] \]
\begin{equation} | Groups(m) := (k_{k} \to ((a)_{0.l})^{l \in Idx(k_{k})} +++ (a_{i})^{i\in{}I\DZ}),
Groups(m) := ( k_k \to ((a)_{0.l})^{l \in Idx(k_k)} +++ (a_i)^{i \in I\DZ}), \\ |(
( if p_{0j} is k(q_l) then \\ | \quad if p_{0j} is k(q_{l}) then
(qₗ)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\DZ} \to e_j \\ | \quad \quad \quad (qₗ)^{l \in Idx(k_k)} +++ (p_{ij})^{i\in{}I\DZ} \to e_{j}
if p_{0j} is \_ then \\ | \quad if p_{0j} is "_" then
(\_)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\DZ} \to e_j \\ | \quad \quad \quad ("_")^{l \in Idx(k_{k})} +++ (p_{ij})^{i\in{}I\DZ} \to e_{j}
else \bot )^j ), \\ | \quad else \bot
((a_i)^{i \in I\DZ}, ((p_{ij})^{i \in I\DZ} \to eⱼ if p_{0j} is \_ else \bot)^{j \in J}) |)^{j ∈ J},
\end{equation} | ((a_{i})^{i\in{}I\DZ}, ((p_{ij})^{i\in{}I\DZ} \to eⱼ if p_{0j} is \_ else \bot)^{j\in{}J})
Groups(m) is an auxiliary function that splits a matrix m into
submatrices, according to the head constructor of their first pattern. submatrices, according to the head constructor of their first pattern.
Groups(m) returns one submatrix m_r for each head constructor k that Groups(m) returns one submatrix m_r for each head constructor k that
@ -1266,7 +1351,7 @@ the wildcard submatrix.
We formalize this intuition as follows: We formalize this intuition as follows:
Lemma (Groups): Lemma (Groups):
Let \[m\] be a matrix with \[Groups(m) = (k_r \to m_r)^k, m_{wild}\]. Let /m/ be a matrix with \[Groups(m) = (k_r \to m_r)^k, m_{wild}\].
For any value vector $(v_i)^l$ such that $v_0 = k(v'_l)^l$ for some For any value vector $(v_i)^l$ such that $v_0 = k(v'_l)^l$ for some
constructor k, constructor k,
we have: we have: