659 lines
22 KiB
TeX
659 lines
22 KiB
TeX
% Created 2020-04-09 Thu 01:36
|
|
% Intended LaTeX compiler: pdflatex
|
|
\documentclass[11pt]{article}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{graphicx}
|
|
\usepackage{grffile}
|
|
\usepackage{longtable}
|
|
\usepackage{wrapfig}
|
|
\usepackage{rotating}
|
|
\usepackage[normalem]{ulem}
|
|
\usepackage{amsmath}
|
|
\usepackage{textcomp}
|
|
\usepackage{amssymb}
|
|
\usepackage{capt-of}
|
|
\usepackage{hyperref}
|
|
\linespread{1.25}
|
|
\usepackage{algorithm}
|
|
\usepackage{comment}
|
|
\usepackage{algpseudocode}
|
|
\usepackage{amsmath,amssymb,amsthm}
|
|
\newtheorem{definition}{Definition}
|
|
\usepackage{mathpartir}
|
|
\usepackage{graphicx}
|
|
\usepackage{listings}
|
|
\usepackage{color}
|
|
\usepackage{stmaryrd}
|
|
\newcommand{\semTEX}[1]{{\llbracket{#1}\rrbracket}}
|
|
\newcommand{\EquivTEX}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken
|
|
\newcommand{\coversTEX}[2]{#1 \mathrel{\mathsf{covers}} #2}
|
|
\newcommand{\YesTEX}{\mathsf{Yes}}
|
|
\newcommand{\NoTEX}[2]{\mathsf{No}(#1, #2)}
|
|
\usepackage{comment}
|
|
\usepackage{mathpartir}
|
|
\usepackage{stmaryrd} % llbracket, rrbracket
|
|
\usepackage{listings}
|
|
\usepackage{notations}
|
|
\lstset{
|
|
mathescape=true,
|
|
language=[Objective]{Caml},
|
|
basicstyle=\ttfamily,
|
|
extendedchars=true,
|
|
showstringspaces=false,
|
|
aboveskip=\smallskipamount,
|
|
% belowskip=\smallskipamount,
|
|
columns=fullflexible,
|
|
moredelim=**[is][\color{blue}]{/*}{*/},
|
|
moredelim=**[is][\color{green!60!black}]{/!}{!/},
|
|
moredelim=**[is][\color{orange}]{/(}{)/},
|
|
moredelim=[is][\color{red}]{/[}{]/},
|
|
xleftmargin=1em,
|
|
}
|
|
\lstset{aboveskip=0.4ex,belowskip=0.4ex}
|
|
\author{Francesco Mecca}
|
|
\date{}
|
|
\hypersetup{
|
|
pdfkeywords={},
|
|
pdfsubject={},
|
|
pdfcreator={Emacs 26.3 (Org mode 9.1.9)},
|
|
pdflang={English}}
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
\begin{comment}
|
|
TODO: neg is parsed incorrectly
|
|
TODO: chiedi a Gabriel se T e S vanno bene, ma prima controlla che siano coerenti
|
|
\section{{\bfseries\sffamily TODO} Scaletta [1/6]}
|
|
\label{sec:org530707f}
|
|
\begin{itemize}
|
|
\item[{$\boxtimes$}] Introduction
|
|
\item[{$\boxminus$}] Background [80\%]
|
|
\begin{itemize}
|
|
\item[{$\boxtimes$}] Low level representation
|
|
\item[{$\boxtimes$}] Lambda code [0\%]
|
|
\item[{$\boxtimes$}] Pattern matching
|
|
\item[{$\boxtimes$}] Symbolic execution
|
|
\item[{$\square$}] Translation Validation
|
|
\end{itemize}
|
|
\item[{$\square$}] Translation validation of the Pattern Matching Compiler
|
|
\begin{itemize}
|
|
\item[{$\square$}] Source translation
|
|
\begin{itemize}
|
|
\item[{$\square$}] Formal Grammar
|
|
\item[{$\square$}] Compilation of source patterns
|
|
\item[{$\square$}] Rest?
|
|
\end{itemize}
|
|
\item[{$\square$}] Target translation
|
|
\begin{itemize}
|
|
\item[{$\square$}] Formal Grammar
|
|
\item[{$\square$}] Symbolic execution of the Lambda target
|
|
\end{itemize}
|
|
\item[{$\square$}] Equivalence between source and target
|
|
\end{itemize}
|
|
\item[{$\square$}] Proof of correctness
|
|
\item[{$\square$}] Practical results
|
|
\end{itemize}
|
|
|
|
Magari prima pattern matching poi compilatore?
|
|
|
|
\end{comment}
|
|
\section{Correctness of the algorithm}
|
|
\label{sec:org4188d38}
|
|
Running a program t$_S$ or its translation $\llbracket$t$_S$$\rrbracket$ against an input v$_S$
|
|
produces as a result \emph{r} in the following way:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
( $\llbracket$t$_S$$\rrbracket$$_S$(v$_S$) = C$_S$(v$_S$) ) $\to$ r\\
|
|
t$_S$(v$_S$) $\to$ r\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Likewise
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
( $\llbracket$t$_T$$\rrbracket$$_T$(v$_T$) = C$_T$(v$_T$) ) $\to$ r\\
|
|
t$_T$(v$_T$) $\to$ r\\
|
|
\end{tabular}
|
|
\end{center}
|
|
where result r ::= guard list * (Match blackbox | NoMatch | Absurd)
|
|
and guard ::= blackbox.
|
|
|
|
Having defined equivalence between two inputs of which one is
|
|
expressed in the source language and the other in the target language
|
|
v$_S$ $\simeq$ v$_T$ (TODO define, this talks about the representation of source values in the target)
|
|
|
|
we can define the equivalence between a couple of programs or a couple
|
|
of decision trees
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
t$_S$ $\simeq$ t$_T$ := $\forall$v$_S$$\simeq$v$_T$, t$_S$(v$_S$) = t$_T$(v$_T$)\\
|
|
C$_S$ $\simeq$ C$_T$ := $\forall$v$_S$$\simeq$v$_T$, C$_S$(v$_S$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
The proposed equivalence algorithm that works on a couple of
|
|
decision trees is returns either \emph{Yes} or \emph{No(v$_S$, v$_T$)} where v$_S$ and
|
|
v$_T$ are a couple of possible counter examples for which the constraint
|
|
trees produce a different result.
|
|
|
|
\subsection{Statements}
|
|
\label{sec:orgccf94c1}
|
|
Theorem. We say that a translation of a source program to a decision tree
|
|
is correct when for every possible input, the source program and its
|
|
respective decision tree produces the same result
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_S$, t$_S$(v$_S$) = $\llbracket$t$_S$$\rrbracket$$_S$(v$_S$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
Likewise, for the target language:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_T$, t$_T$(v$_T$) = $\llbracket$t$_T$$\rrbracket$$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Definition: in the presence of guards we can say that two results are
|
|
equivalent modulo the guards queue, written \emph{r$_1$ $\simeq$gs r$_2$}, when:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(gs$_1$, r$_1$) $\simeq$gs (gs$_2$, r$_2$) $\Leftrightarrow$ (gs$_1$, r$_1$) = (gs$_2$ ++ gs, r$_2$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Definition: we say that C$_T$ covers the input space \emph{S}, written
|
|
/covers(C$_T$, S) when every value v$_S$$\in $S is a valid input to the
|
|
decision tree C$_T$. (TODO: rephrase)
|
|
|
|
Theorem: Given an input space \emph{S} and a couple of decision trees, where
|
|
the target decision tree C$_T$ covers the input space \emph{S}, we say that
|
|
the two decision trees are equivalent when:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$, gs) = Yes $\wedge$ covers(C$_T$, S) $\to$ $\forall$v$_S$$\simeq$v$_T$ $\in $ S, C$_S$(v$_S$) $\simeq$gs C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Similarly we say that a couple of decision trees in the presence of
|
|
an input space \emph{S} are \emph{not} equivalent when:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$, gs) = No(v$_S$,v$_T$) $\wedge$ covers(C$_T$, S) $\to$ v$_S$$\simeq$v$_T$ $\in $ S $\wedge$ C$_S$(v$_S$) $\ne$gs C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Corollary: For a full input space \emph{S}, that is the universe of the
|
|
target program we say:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, $\llbracket$t$_S$$\rrbracket$$_S$, $\llbracket$t$_T$$\rrbracket$$_T$, $\varnothing$) = Yes $\Leftrightarrow$ t$_S$ $\simeq$ t$_T$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
\begin{enumerate}
|
|
\item Proof of the correctness of the translation from source programs to source decision trees
|
|
\label{sec:orgfa2a152}
|
|
|
|
We define a source term t$_S$ as a collection of patterns pointing to blackboxes
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
t$_S$ ::= (p $\to$ bb)\(^{\text{i$\in $I}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
A pattern is defined as either a constructor pattern, an or pattern or
|
|
a constant pattern
|
|
\begin{center}
|
|
\begin{tabular}{lllll}
|
|
p ::= & K(p$_i$)$^i$, i $\in $ I & (p & q) & n $\in $ $\mathbb{N}$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
A decision tree is defined as either a Leaf, a Failure terminal or
|
|
an intermediate node with different children sharing the same accessor \emph{a}
|
|
and an optional fallback.
|
|
Failure is emitted only when the patterns don't cover the whole set of
|
|
possible input values \emph{S}. The fallback is not needed when the user
|
|
doesn't use a wildcard pattern.
|
|
\%\%\% Give example of thing
|
|
|
|
\begin{center}
|
|
\begin{tabular}{ll}
|
|
C$_S$ ::= Leaf bb & Node(a, (K$_i$ $\to$ C$_i$)\(^{\text{i$\in $S}}\) , C?)\\
|
|
a ::= Here & n.a\\
|
|
v$_S$ ::= K(v$_i$)\(^{\text{i$\in $I}}\) & n $\in $ $\mathbb{N}$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
\begin{comment}
|
|
Are K and Here clear here?
|
|
\end{comment}
|
|
|
|
We define the decomposition matrix \emph{m$_S$} as
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
SMatrix m$_S$ := (a$_j$)\(^{\text{j$\in $J}}\), ((p\(_{\text{ij}}\))\(^{\text{j$\in $J}}\) $\to$ bb$_i$)\(^{\text{i$\in $I}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
Correggi prendendo in considerazione l'accessor
|
|
\end{comment}
|
|
|
|
We define the decision tree of source programs
|
|
$\llbracket$t$_S$$\rrbracket$
|
|
in terms of the decision tree of pattern matrices
|
|
$\llbracket$m$_S$$\rrbracket$
|
|
by the following:
|
|
$\llbracket$((p$_i$ $\to$ bb$_i$)\(^{\text{i$\in $I}}\)$\rrbracket$ := $\llbracket$(Root), (p$_i$ $\to$ bb$_i$)\(^{\text{i$\in $I}}\) $\rrbracket$
|
|
|
|
decision tree computed from pattern matrices respect the following invariant:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v (v$_i$)\(^{\text{i$\in $I}}\) = v(a$_i$)\(^{\text{i$\in $I}}\) $\to$ $\llbracket$m$\rrbracket$(v) = m(v$_i$)\(^{\text{i$\in $I}}\) for m = ((a$_i$)\(^{\text{i$\in $I}}\), (r$_i$)\(^{\text{i$\in $I}}\))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
where
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
v(Here) = v\\
|
|
K(v$_i$)$^i$(k.a) = v$_k$(a) if k $\in $ [0;n[\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
TODO: EXPLAIN
|
|
\end{comment}
|
|
|
|
We proceed to show the correctness of the invariant by a case
|
|
analysys.
|
|
|
|
Base cases:
|
|
\begin{enumerate}
|
|
\item\relax [| $\varnothing$, ($\varnothing$ $\to$ bb$_i$)$^i$ |] := Leaf bb$_i$ where i := min(I), that is a
|
|
decision tree [|ms|] defined by an empty accessor and empty
|
|
patterns pointing to blackboxes \emph{bb$_i$}. This respects the invariant
|
|
because a decomposition matrix in the case of empty rows returns
|
|
the first expression and we known that (Leaf bb)(v) := Match bb
|
|
\item\relax [| (a$_j$)$^j$, $\varnothing$ |] := Failure
|
|
\end{enumerate}
|
|
|
|
Regarding non base cases:
|
|
Let's first define
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
let Idx(k) := [0; arity(k)[\\
|
|
let First($\varnothing$) := $\bot$\\
|
|
let First((a$_j$)$^j$) := a\(_{\text{min(j$\in $J$\ne$$\varnothing$)}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\[
|
|
m := ((a_i)^i ((p_{ij})^i \to e_j)^{ij})
|
|
\]
|
|
\[
|
|
(k_k)^k := headconstructor(p_{i0})^i
|
|
\]
|
|
\begin{equation}
|
|
Groups(m) := ( k_k \to ((a)_{0.l})^{l \in Idx(k_k)} +++ (a_i)^{i \in I\backslash \{0\} }), \\
|
|
( if p_{0j} is k(q_l) then \\
|
|
(q_l)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\backslash \{0\}} \to e_j \\
|
|
if p_{0j} is \_ then \\
|
|
(\_)^{l \in Idx(k_k)} +++ (p_{ij})^{i \in I\backslash \{0\}} \to e_j \\
|
|
else \bot )^j ), \\
|
|
((a_i)^{i \in I\backslash \{0\}}, ((p_{ij})^{i \in I\backslash \{0\}} \to e_j if p_{0j} is \_ else \bot)^{j \in J})
|
|
\end{equation}
|
|
|
|
Groups(m) is an auxiliary function that decomposes a matrix m into
|
|
submatrices, according to the head constructor of their first pattern.
|
|
Groups(m) returns one submatrix m\_r for each head constructor k that
|
|
occurs on the first row of m, plus one "wildcard submatrix" m\(_{\text{wild}}\)
|
|
that matches on all values that do not start with one of those head
|
|
constructors.
|
|
|
|
Intuitively, m is equivalent to its decompositionin the following
|
|
sense: if the first pattern of an input vector (v\_i)\^{}i starts with one
|
|
of the head constructors k, then running (v\_i)\^{}i against m is the same
|
|
as running it against the submatrix m\_k; otherwise (its head
|
|
constructor is none of the k) it is equivalent to running it against
|
|
the wildcard submatrix.
|
|
|
|
We formalize this intuition as follows:
|
|
Lemma (Groups):
|
|
Let \[m\] be a matrix with \[Groups(m) = (k_r \to m_r)^k, m_{wild}\].
|
|
For any value vector \[(v_i)^l\] such that \[v_0 = k(v'_l)^l\] for some
|
|
constructor k,
|
|
we have:
|
|
\[
|
|
if k = k_k for some k then
|
|
m(v_i)^i = m_k((v'_l)^l +++ (v_i)^{i\in I\backslash \{0\}})
|
|
else
|
|
m(v_i)^i = m_{wild}(v_i)^{i\in I\backslash \{0\}}
|
|
\]
|
|
|
|
\item Proof:
|
|
\label{sec:org8dad6fc}
|
|
Let \[m\] be a matrix with \[Group(m) = (k_r \to m_r)^k, m_{wild}\].
|
|
Let \[(v_i)^i\] be an input matrix with \[v_0 = k(v'_l)^l\] for some k.
|
|
We proceed by case analysis:
|
|
\begin{itemize}
|
|
\item either k is one of the k$_k$ for some k
|
|
\item or k is none of the (k$_k$)$^k$
|
|
\end{itemize}
|
|
|
|
Both m(v$_i$)$^i$ and m$_k$(v$_k$)$^k$ are defined as the first matching result of
|
|
a family over each row r$_j$ of a matrix
|
|
|
|
We know, from the definition of
|
|
Groups(m), that m$_k$ is
|
|
\[
|
|
((a){0.l})^{l\in Idx(k_k)} +++ (a_i)^{i\in I\backslash \{0\}}),
|
|
(
|
|
if p_{0j} is k(q_l) then
|
|
(q_l)^l +++ (p_{ij})^{i\in I\backslash \{0\}} \to e_j
|
|
if p_{0j} is _ then
|
|
(_)^l +++ (p_{ij})^{i\in I\backslash \{0\}} \to e_j
|
|
else \bot
|
|
)^j
|
|
\]
|
|
|
|
By definition, m(v$_i$)$^i$ is
|
|
m(v$_i$)$^i$ = First(r$_j$(v$_i$)$^i$)$^j$ for m = ((a$_i$)$^i$, (r$_j$)$^j$)
|
|
(p$_i$)$^i$ (v$_i$)$^i$ = \{
|
|
if p$_0$ = k(q$_l$)$^l$, v$_0$ = k'(v'$_k$)$^k$, k=Idx(k') and l=Idx(k)
|
|
if k $\ne$ k' then $\bot$
|
|
if k = k' then ((q$_l$)$^l$ \sout{+} (p$_i$)\(^{\text{i$\in $I}\backslash\ \text{$\backslash${0$\backslash$}}}\)) ((v'$_k$)$^k$ \sout{+} (v$_i$)\(^{\text{i$\in $I}\backslash\ \text{$\backslash${0$\backslash$}}}\))
|
|
if p$_0$ = (q$_1$|q$_2$) then
|
|
First( (q$_1$p$_i$\(^{\text{i$\in $I }\backslash\ \text{$\backslash${0$\backslash$}}}\)) v$_i$\(^{\text{i$\in $I }\backslash\ \text{$\backslash${0$\backslash$}}}\), (q$_2$p$_i$\(^{\text{i$\in $I }\backslash\ \text{$\backslash${0$\backslash$}}}\)) v$_i$\(^{\text{i$\in $I }\backslash\ \text{$\backslash${0$\backslash$}}}\) )
|
|
\}
|
|
|
|
For this reason, if we can prove that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$j, r$_j$(v$_i$)$^i$ = r'$_j$((v'$_k$)$^k$ ++ (v$_i$)$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
it follows that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
m(v$_i$)$^i$ = m$_k$((v'$_k$)$^k$ ++ (v$_i$)$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
from the above definition.
|
|
|
|
We can also show that a$_i$ = a\(_{\text{0.l}}\)$^l$ \sout{+} a\(_{\text{i$\in $I}\backslash\ \text{$\backslash${0$\backslash$}}}\) because v(a$_0$) = K(v(a)\{0.l\})$^l$)
|
|
\end{enumerate}
|
|
|
|
|
|
|
|
|
|
\subsection{Proof of equivalence checking}
|
|
\label{sec:orgffce3ea}
|
|
\begin{comment}
|
|
TODO: put ^i$\in $I where needed
|
|
\end{comment}
|
|
\subsubsection{The trimming lemma}
|
|
The trimming lemma allows to reduce the size of a decision tree given
|
|
an accessor $\to$ $\pi$ relation (TODO: expand)
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_T$ $\in $ (a$\to$$\pi$), C$_T$(v$_T$) = C\(_{\text{t/a$\to$$\pi$(k$_i$)}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We prove this by induction on C$_T$:
|
|
a. C$_T$ = Leaf\(_{\text{bb}}\): when the decision tree is a leaf terminal, we
|
|
know that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Leaf\(_{\text{bb/a$\to$$\pi$}}\)(v) = Leaf\(_{\text{bb}}\)(v)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
That means that the result of trimming on a Leaf is the Leaf itself
|
|
b. The same applies to Failure terminal
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Failure\(_{\text{/a$\to$$\pi$}}\)(v) = Failure(v)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
c. When C$_T$ = Node(b, ($\pi$$\to$C$_i$)$^i$)\(_{\text{/a$\to$$\pi$}}\) then
|
|
we look at the accessor \emph{a} of the subtree C$_i$ and
|
|
we define $\pi$$_i$' = $\pi$$_i$ if a$\ne$b else $\pi$$_i$$\cap$$\pi$
|
|
Trimming a switch node yields the following result:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Node(b, ($\pi$$\to$C$_i$)$^i$)\(_{\text{/a$\to$$\pi$}}\) := Node(b, ($\pi$'$_i$$\to$C\(_{\text{i/a$\to$$\pi$}}\))$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
\begin{comment}
|
|
Actually in the proof.org file I transcribed:
|
|
e. Unreachabe $\to$ $\bot$
|
|
This is not correct because you don't have Unreachable nodes in target decision trees
|
|
\end{comment}
|
|
|
|
For the trimming lemma we have to prove that running the value v$_T$ against
|
|
the decistion tree C$_T$ is the same as running v$_T$ against the tree
|
|
C\(_{\text{trim}}\) that is the result of the trimming operation on C$_T$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_T$(v$_T$) = C\(_{\text{trim}}\)(v$_T$) = Node(b, ($\pi$$_i$'$\to$C\(_{\text{i/a$\to$$\pi$}}\))$^i$)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We can reason by first noting that when v$_T$$\notin$(b$\to$$\pi$$_i$)$^i$ the node must be a Failure node.
|
|
In the case where $\exists$k| v$_T$$\in $(b$\to$$\pi$$_k$) then we can prove that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C\(_{\text{k/a$\to$$\pi$}}\)(v$_T$) = Node(b, ($\pi$$_i$'$\to$C\(_{\text{i/a$\to$$\pi$}}\))$^i$)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
because when a $\ne$ b then $\pi$$_k$'= $\pi$$_k$ and this means that v$_T$$\in $$\pi$$_k$'
|
|
while when a = b then $\pi$$_k$'=($\pi$$_k$$\cap$$\pi$) and vt$\in $$\pi$$_k$' because:
|
|
\begin{itemize}
|
|
\item by the hypothesis, v$_T$$\in $$\pi$
|
|
\item we are in the case where v$_T$$\in $$\pi$$_k$
|
|
\end{itemize}
|
|
So v$_T$ $\in $ $\pi$$_k$' and by induction
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_k$(v$_T$) = C\(_{\text{k/a$\to$$\pi$}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We also know that $\forall$v$_T$$\in $(b$\to$$\pi$$_k$) $\to$ C$_T$(v$_T$) = C$_k$(v$_T$)
|
|
By putting together the last two steps, we have proven the trimming
|
|
lemma.
|
|
|
|
\begin{comment}
|
|
TODO: what should I say about covering??? I swap $\pi$ and $\pi$'
|
|
Covering lemma:
|
|
$\forall$a,$\pi$ covers(C$_T$,S) $\to$ covers(C_{t/a$\to$$\pi$}, (S$\cap$a$\to$$\pi$))
|
|
U$_i$$\pi$$^i$ $\approx$ U$_i$$\pi$'$\cap$(a$\to$$\pi$) $\approx$ (U$_i$$\pi$')$\cap$(a$\to$$\pi$) %%
|
|
|
|
|
|
%%%%%%% Also: Should I swap $\pi$ and $\pi$' ?
|
|
\end{comment}
|
|
|
|
\subsubsection{Equivalence checking}
|
|
The equivalence checking algorithm takes as parameters an input space
|
|
\emph{S}, a source decision tree \emph{C$_S$} and a target decision tree \emph{C$_T$}:
|
|
\begin{center}
|
|
\begin{tabular}{ll}
|
|
equiv(S, C$_S$, C$_T$) $\to$ Yes & No(v$_S$, v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When the algorithm returns Yes and the input space is covered by \emph{C$_S$}
|
|
we can say that the couple of decision trees are the same for
|
|
every couple of source value \emph{v$_S$} and target value \emph{v$_T$} that are equivalent.
|
|
\begin{comment}
|
|
Define "covered"
|
|
Is it correct to say the same? How to correctly distinguish in words $\simeq$ and = ?
|
|
\end{comment}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$) = Yes and cover(C$_T$, S) $\to$ $\forall$ v$_S$ $\simeq$ v$_T$$\in $S $\wedge$ C$_S$(v$_S$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
In the case where the algorithm returns No we have at least a couple
|
|
of counter example values v$_S$ and v$_T$ for which the two decision trees
|
|
outputs a different result.
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$) = No(v$_S$,v$_T$) and cover(C$_T$, S) $\to$ $\forall$ v$_S$ $\simeq$ v$_T$$\in $S $\wedge$ C$_S$(v$_S$) $\ne$ C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
We define the following
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Forall(Yes) = Yes\\
|
|
Forall(Yes::l) = Forall(l)\\
|
|
Forall(No(v$_S$,v$_T$)::\_) = No(v$_S$,v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
There exists and are injective:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
int(k) $\in $ $\mathbb{N}$ (arity(k) = 0)\\
|
|
tag(k) $\in $ $\mathbb{N}$ (arity(k) > 0)\\
|
|
$\pi$(k) = \{n\(\vert{}\) int(k) = n\} x \{n\(\vert{}\) tag(k) = n\}\\
|
|
\end{tabular}
|
|
\end{center}
|
|
where k is a constructor.
|
|
|
|
\begin{comment}
|
|
TODO: explain:
|
|
$\forall$v$\in $a$\to$$\pi$, C_{/a$\to$$\pi$}(v) = C(v)
|
|
\end{comment}
|
|
|
|
We proceed by case analysis:
|
|
\begin{comment}
|
|
I start numbering from zero to leave the numbers as they were on the blackboard, were we skipped some things
|
|
I think the unreachable case should go at the end.
|
|
\end{comment}
|
|
\begin{enumerate}
|
|
\item in case of unreachable:
|
|
\end{enumerate}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$(v$_S$) = Absurd(Unreachable) $\ne$ C$_T$(v$_T$) $\forall$v$_S$,v$_T$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{enumerate}
|
|
\item In the case of an empty input space
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv($\varnothing$, C$_S$, C$_T$) := Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
and that is trivial to prove because there is no pair of values (v$_S$, v$_T$) that could be
|
|
tested against the decision trees.
|
|
In the other subcases S is always non-empty.
|
|
\item When there are \emph{Failure} nodes at both sides the result is \emph{Yes}:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Failure, Failure) := Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Given that $\forall$v, Failure(v) = Failure, the statement holds.
|
|
\item When we have a Leaf or a Failure at the left side:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Failure as C$_S$, Node(a, ($\pi$$_i$ $\to$ C$_T$$_i$)$^i$)) := Forall(equiv( S$\cap$a$\to$$\pi$(k$_i$)), C$_S$, C$_T$$_i$)$^i$)\\
|
|
equiv(S, Leaf bb$_S$ as C$_S$, Node(a, ($\pi$$_i$ $\to$ C$_T$$_i$)$^i$)) := Forall(equiv( S$\cap$a$\to$$\pi$(k$_i$)), C$_S$, C$_T$$_i$)$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The algorithm either returns Yes for every sub-input space S$_i$ := S$\cap$(a$\to$$\pi$(k$_i$)) and
|
|
subtree C$_T$$_i$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_i$, C$_S$, C$_T$$_i$) = Yes $\forall$i\\
|
|
\end{tabular}
|
|
\end{center}
|
|
or we have a counter example v$_S$, v$_T$ for which
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
v$_S$$\simeq$v$_T$$\in $S$_k$ $\wedge$ c$_S$(v$_S$) $\ne$ C$_T$$_k$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
then because
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
v$_T$$\in $(a$\to$$\pi$$_k$) $\to$ C$_T$(v$_T$) = C$_T$$_k$(v$_T$) ,\\
|
|
v$_S$$\simeq$v$_T$$\in $S $\wedge$ C$_S$(v$_S$)$\ne$C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
we can say that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_i$, C$_S$, C$_T$$_i$) = No(v$_S$, v$_T$) for some minimal k$\in $I\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item When we have a Node on the right we define $\pi$$_n$ as the domain of
|
|
values not covered but the union of the constructors k$_i$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\pi$$_n$ = ¬($\bigcup$$\pi$(k$_i$)$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The algorithm proceeds by trimming
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Node(a, (k$_i$ $\to$ C$_S$$_i$)$^i$, C\(_{\text{sf}}\)), C$_T$) :=\\
|
|
Forall(equiv( S$\cap$(a$\to$$\pi$(k$_i$)$^i$), C$_S$$_i$, C\(_{\text{t/a$\to$$\pi$(k$_i$)}}\))$^i$ \sout{+} equiv(S$\cap$(a$\to$$\pi$(k$_i$)), C$_S$, C\(_{\text{a$\to$$\pi$$_n$}}\)))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The statement still holds and we show this by first analyzing the
|
|
\emph{Yes} case:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Forall(equiv( S$\cap$(a$\to$$\pi$(k$_i$)$^i$), C$_S$$_i$, C\(_{\text{t/a$\to$$\pi$(k$_i$)}}\))$^i$ = Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The constructor k is either included in the set of constructors k$_i$:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
k \(\vert{}\) k$\in $(k$_i$)$^i$ $\wedge$ C$_S$(v$_S$) = C$_S$$_i$(v$_S$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We also know that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(1) C$_S$$_i$(v$_S$) = C\(_{\text{t/a$\to$$\pi$$_i$}}\)(v$_T$)\\
|
|
(2) C\(_{\text{T/a$\to$$\pi$$_i$}}\)(v$_T$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
(1) is true by induction and (2) is a consequence of the trimming lemma.
|
|
Putting everything together:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$(v$_S$) = C$_S$$_i$(v$_S$) = C\(_{\text{T/a$\to$$\pi$$_i$}}\)(v$_T$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When the k$\notin$(k$_i$)$^i$ [TODO]
|
|
|
|
The auxiliary Forall function returns \emph{No(v$_S$, v$_T$)} when, for a minimum k,
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_k$, C$_S$$_k$, C\(_{\text{T/a$\to$$\pi$$_k$}}\) = No(v$_S$, v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Then we can say that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$$_k$(v$_S$) $\ne$ C\(_{\text{t/a$\to$$\pi$$_k$}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
that is enough for proving that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$$_k$(v$_S$) $\ne$ (C\(_{\text{t/a$\to$$\pi$$_k$}}\)(v$_T$) = C$_T$(v$_T$))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\end{enumerate}
|
|
\end{document}
|