1152 lines
No EOL
41 KiB
TeX
1152 lines
No EOL
41 KiB
TeX
% Created 2020-04-12 Sun 22:50
|
|
% Intended LaTeX compiler: pdflatex
|
|
\documentclass[11pt]{article}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{graphicx}
|
|
\usepackage{grffile}
|
|
\usepackage{longtable}
|
|
\usepackage{wrapfig}
|
|
\usepackage{rotating}
|
|
\usepackage[normalem]{ulem}
|
|
\usepackage{amsmath}
|
|
\usepackage{textcomp}
|
|
\usepackage{amssymb}
|
|
\usepackage{capt-of}
|
|
\usepackage{hyperref}
|
|
\linespread{1.25}
|
|
\usepackage{algorithm}
|
|
\usepackage{comment}
|
|
\usepackage{algpseudocode}
|
|
\usepackage{amsmath,amssymb,amsthm}
|
|
\newtheorem{definition}{Definition}
|
|
\usepackage{mathpartir}
|
|
\usepackage{graphicx}
|
|
\usepackage{listings}
|
|
\usepackage{color}
|
|
\usepackage{stmaryrd}
|
|
\newcommand{\semTEX}[1]{{\llbracket{#1}\rrbracket}}
|
|
\newcommand{\EquivTEX}[3]{\mathsf{equiv}(#1, #2, #3)} % \equiv is already taken
|
|
\newcommand{\coversTEX}[2]{#1 \mathrel{\mathsf{covers}} #2}
|
|
\newcommand{\YesTEX}{\mathsf{Yes}}
|
|
\newcommand{\DZ}{\backslash\text{\{0\}}}
|
|
\newcommand{\NoTEX}[2]{\mathsf{No}(#1, #2)}
|
|
\usepackage{comment}
|
|
\usepackage{mathpartir}
|
|
\usepackage{stmaryrd} % llbracket, rrbracket
|
|
\usepackage{listings}
|
|
\usepackage{notations}
|
|
\lstset{
|
|
mathescape=true,
|
|
language=[Objective]{Caml},
|
|
basicstyle=\ttfamily,
|
|
extendedchars=true,
|
|
showstringspaces=false,
|
|
aboveskip=\smallskipamount,
|
|
% belowskip=\smallskipamount,
|
|
columns=fullflexible,
|
|
moredelim=**[is][\color{blue}]{/*}{*/},
|
|
moredelim=**[is][\color{green!60!black}]{/!}{!/},
|
|
moredelim=**[is][\color{orange}]{/(}{)/},
|
|
moredelim=[is][\color{red}]{/[}{]/},
|
|
xleftmargin=1em,
|
|
}
|
|
\lstset{aboveskip=0.4ex,belowskip=0.4ex}
|
|
\date{\today}
|
|
\title{}
|
|
\hypersetup{
|
|
pdfauthor={},
|
|
pdftitle={},
|
|
pdfkeywords={},
|
|
pdfsubject={},
|
|
pdfcreator={Emacs 26.3 (Org mode 9.1.9)},
|
|
pdflang={English}}
|
|
\begin{document}
|
|
|
|
\section{Translation validation of the Pattern Matching Compiler}
|
|
\label{sec:org02550ae}
|
|
|
|
\subsection{Source program}
|
|
\label{sec:org8c14d2a}
|
|
Our algorithm takes as its input a source program and translates it
|
|
into an algebraic data structure which type we call \emph{decision\_tree}.
|
|
|
|
\begin{verbatim}
|
|
type decision_tree =
|
|
| Unreachable
|
|
| Failure
|
|
| Leaf of source_expr
|
|
| Guard of source_blackbox * decision_tree * decision_tree
|
|
| Switch of accessor * (constructor * decision_tree) list * decision_tree
|
|
\end{verbatim}
|
|
|
|
Unreachable, Leaf of source\_expr and Failure are the terminals of the three.
|
|
We distinguish
|
|
\begin{itemize}
|
|
\item Unreachable: statically it is known that no value can go there
|
|
\item Failure: a value matching this part results in an error
|
|
\item Leaf: a value matching this part results into the evaluation of a
|
|
source black box of code
|
|
\end{itemize}
|
|
Our algorithm doesn't support type-declaration-based analysis
|
|
to know the list of constructors at a given type.
|
|
Let's consider some trivial examples:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
function true -> 1\\
|
|
\end{tabular}
|
|
\end{center}
|
|
is translated to
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Switch ([(true, Leaf 1)], Failure)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
while
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
function\\
|
|
\(\vert{}\) true -> 1\\
|
|
\(\vert{}\) false -> 2\\
|
|
\end{tabular}
|
|
\end{center}
|
|
will be translated to
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Switch ([(true, Leaf 1); (false, Leaf 2)])\\
|
|
\end{tabular}
|
|
\end{center}
|
|
It is possible to produce Unreachable examples by using
|
|
refutation clauses (a "dot" in the right-hand-side)
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
function\\
|
|
\(\vert{}\) true -> 1\\
|
|
\(\vert{}\) false -> 2\\
|
|
\(\vert{}\) \_ -> .\\
|
|
\end{tabular}
|
|
\end{center}
|
|
that gets translated into
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Switch ([(true, Leaf 1); (false, Leaf 2)], Unreachable)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
We trust this annotation, which is reasonable as the type-checker
|
|
verifies that it indeed holds.
|
|
|
|
Guard nodes of the tree are emitted whenever a guard is found. Guards
|
|
node contains a blackbox of code that is never evaluated and two
|
|
branches, one that is taken in case the guard evaluates to true and
|
|
the other one that contains the path taken when the guard evaluates to
|
|
true.
|
|
|
|
\begin{comment}
|
|
[ ] Finisci con Switch
|
|
[ ] Spiega del fallback
|
|
\end{comment}
|
|
|
|
The source code of a pattern matching function has the
|
|
following form:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
match variable with\\
|
|
\(\vert{}\) pattern$_1$ \(\to\) expr$_1$\\
|
|
\(\vert{}\) pattern$_2$ when guard \(\to\) expr$_2$\\
|
|
\(\vert{}\) pattern$_3$ as var \(\to\) expr$_3$\\
|
|
$\vdots$\\
|
|
\(\vert{}\) p$_n$ \(\to\) expr$_n$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Patterns could or could not be exhaustive.
|
|
|
|
Pattern matching code could also be written using the more compact form:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
function\\
|
|
\(\vert{}\) pattern$_1$ \(\to\) expr$_1$\\
|
|
\(\vert{}\) pattern$_2$ when guard \(\to\) expr$_2$\\
|
|
\(\vert{}\) pattern$_3$ as var \(\to\) expr$_3$\\
|
|
$\vdots$\\
|
|
\(\vert{}\) p$_n$ \(\to\) expr$_n$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
This BNF grammar describes formally the grammar of the source program:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
start ::= "match" id "with" patterns \(\vert{}\) "function" patterns\\
|
|
patterns ::= (pattern0\(\vert{}\)pattern1) pattern1+\\
|
|
;; pattern0 and pattern1 are needed to distinguish the first case in which\\
|
|
;; we can avoid writing the optional vertical line\\
|
|
pattern0 ::= clause\\
|
|
pattern1 ::= "\(\vert{}\)" clause\\
|
|
clause ::= lexpr "->" rexpr\\
|
|
lexpr ::= rule ($\varepsilon$\(\vert{}\)condition)\\
|
|
rexpr ::= \_code ;; arbitrary code\\
|
|
rule ::= wildcard\(\vert{}\)variable\(\vert{}\)constructor\_pattern\(\vert{}\) or\_pattern ;;\\
|
|
wildcard ::= "\_"\\
|
|
variable ::= identifier\\
|
|
constructor\_pattern ::= constructor (rule\(\vert{}\)$\varepsilon$) (assignment\(\vert{}\)$\varepsilon$)\\
|
|
constructor ::= int\(\vert{}\)float\(\vert{}\)char\(\vert{}\)string\(\vert{}\)bool \(\vert{}\)unit\(\vert{}\)record\(\vert{}\)exn\(\vert{}\)objects\(\vert{}\)ref \(\vert{}\)list\(\vert{}\)tuple\(\vert{}\)array\(\vert{}\)variant\(\vert{}\)parameterized\_variant ;; data types\\
|
|
or\_pattern ::= rule ("\(\vert{}\)" wildcard\(\vert{}\)variable\(\vert{}\)constructor\_pattern)+\\
|
|
condition ::= "when" bexpr\\
|
|
assignment ::= "as" id\\
|
|
bexpr ::= \_code ;; arbitrary code\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The source program is parsed using the ocaml-compiler-libs library.
|
|
The result of parsing, when successful, results in a list of clauses
|
|
and a list of type declarations.
|
|
Every clause consists of three objects: a left-hand-side that is the
|
|
kind of pattern expressed, an option guard and a right-hand-side expression.
|
|
Patterns are encoded in the following way:
|
|
\begin{center}
|
|
\begin{tabular}{ll}
|
|
pattern & type\\
|
|
\hline
|
|
\_ & Wildcard\\
|
|
p$_1$ as x & Assignment\\
|
|
c(p$_1$,p$_2$,\ldots{},p$_n$) & Constructor\\
|
|
(p$_1$\(\vert{}\) p$_2$) & Orpat\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
Once parsed, the type declarations and the list of clauses are encoded in the form of a matrix
|
|
that is later evaluated using a matrix decomposition algorithm.
|
|
|
|
Patterns are of the form
|
|
\begin{center}
|
|
\begin{tabular}{ll}
|
|
pattern & type of pattern\\
|
|
\hline
|
|
\_ & wildcard\\
|
|
x & variable\\
|
|
c(p$_1$,p$_2$,\ldots{},p$_n$) & constructor pattern\\
|
|
(p$_1$\(\vert{}\) p$_2$) & or-pattern\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
The pattern \emph{p} matches a value \emph{v}, written as p $\preccurlyeq$ v, when one of the
|
|
following rules apply
|
|
|
|
\begin{center}
|
|
\begin{tabular}{llll}
|
|
\hline
|
|
\_ & $\preccurlyeq$ & v & $\forall$v\\
|
|
x & $\preccurlyeq$ & v & $\forall$v\\
|
|
(p$_1$ \(\vert{}\) p$_2$) & $\preccurlyeq$ & v & iff p$_1$ $\preccurlyeq$ v or p$_2$ $\preccurlyeq$ v\\
|
|
c(p$_1$, p$_2$, \ldots{}, p$_a$) & $\preccurlyeq$ & c(v$_1$, v$_2$, \ldots{}, v$_a$) & iff (p$_1$, p$_2$, \ldots{}, p$_a$) $\preccurlyeq$ (v$_1$, v$_2$, \ldots{}, v$_a$)\\
|
|
(p$_1$, p$_2$, \ldots{}, p$_a$) & $\preccurlyeq$ & (v$_1$, v$_2$, \ldots{}, v$_a$) & iff p$_i$ $\preccurlyeq$ v$_i$ $\forall$i $\in $ [1..a]\\
|
|
\hline
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When a value \emph{v} matches pattern \emph{p} we say that \emph{v} is an \emph{instance} of \emph{p}.
|
|
|
|
|
|
During compilation by the translators, expressions at the
|
|
right-hand-side are compiled into
|
|
Lambda code and are referred as lambda code actions l$_i$.
|
|
|
|
We define the \emph{pattern matrix} P as the matrix |m x n| where m bigger
|
|
or equal than the number of clauses in the source program and n is
|
|
equal to the arity of the constructor with the gratest arity.
|
|
\begin{equation*}
|
|
P =
|
|
\begin{pmatrix}
|
|
p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\
|
|
p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\
|
|
\vdots & \vdots & \ddots & \vdots \\
|
|
p_{m,1} & p_{m,2} & \cdots & p_{m,n} )
|
|
\end{pmatrix}
|
|
\end{equation*}
|
|
every row of \emph{P} is called a pattern vector
|
|
\(\vec{p_i}\) = (p$_1$, p$_2$, \ldots{}, p$_n$); In every instance of P pattern
|
|
vectors appear normalized on the length of the longest pattern vector.
|
|
Considering the pattern matrix P we say that the value vector
|
|
\(\vec{v}\) = (v$_1$, v$_2$, \ldots{}, v$_i$) matches the pattern vector p$_i$ in P if and only if the following two
|
|
conditions are satisfied:
|
|
\begin{itemize}
|
|
\item p\(_{\text{i,1}}\), p\(_{\text{i,2}}\), \(\cdots{}\), p\(_{\text{i,n}}\) $\preccurlyeq$ (v$_1$, v$_2$, \ldots{}, v$_i$)
|
|
\item $\forall$j < i p\(_{\text{j,1}}\), p\(_{\text{j,2}}\), \(\cdots{}\), p\(_{\text{j,n}}\) $\npreceq$ (v$_1$, v$_2$, \ldots{}, v$_i$)
|
|
\end{itemize}
|
|
|
|
We can define the following three relations with respect to patterns:
|
|
\begin{itemize}
|
|
\item Pattern p is less precise than pattern q, written p $\preccurlyeq$ q, when all
|
|
instances of q are instances of p
|
|
\item Pattern p and q are equivalent, written p $\equiv$ q, when their instances
|
|
are the same
|
|
\item Patterns p and q are compatible when they share a common instance
|
|
\end{itemize}
|
|
|
|
Wit the support of two auxiliary functions
|
|
\begin{itemize}
|
|
\item tail of an ordered family
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
tail((x$_i$)\(^{\text{i $\in $ I}}\)) := (x$_i$)\(^{\text{i $\ne$ min(I)}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item first non-$\bot$ element of an ordered family
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
First((x$_i$)$^i$) := $\bot$ \quad \quad \quad if $\forall$i, x$_i$ = $\bot$\\
|
|
First((x$_i$)$^i$) := x\_min\{i \(\vert{}\) x$_i$ $\ne$ $\bot$\} \quad if $\exists$i, x$_i$ $\ne$ $\bot$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\end{itemize}
|
|
we now define what it means to run a pattern row against a value
|
|
vector of the same length, that is (p$_i$)$^i$(v$_i$)$^i$
|
|
\begin{center}
|
|
\begin{tabular}{lll}
|
|
p$_i$ & v$_i$ & result\(_{\text{pat}}\)\\
|
|
\hline
|
|
$\varnothing$ & ($\varnothing$) & []\\
|
|
(\_, tail(p$_i$)$^i$) & (v$_i$) & tail(p$_i$)$^i$(tail(v$_i$)$^i$)\\
|
|
(x, tail(p$_i$)$^i$) & (v$_i$) & $\sigma$[x$\mapsto$v$_0$] if tail(p$_i$)$^i$(tail(v$_i$)$^i$) = $\sigma$\\
|
|
(K(q$_j$)$^j$, tail(p$_i$)$^i$) & (K(v'$_j$)$^j$,tail(v$_j$)$^j$) & ((q$_j$)$^j$ \sout{+} tail(p$_i$)$^i$)((v'$_j$)$^j$ \sout{+} tail(v$_i$)$^i$)\\
|
|
(K(q$_j$)$^j$, tail(p$_i$)$^i$) & (K'(v'$_l$)$^l$,tail(v$_j$)$^j$) & $\bot$ if K $\ne$ K'\\
|
|
(q$_1$\(\vert{}\)q$_2$, tail(p$_i$)$^i$) & (v$_i$)$^i$ & First((q$_1$,tail(p$_i$)$^i$)(v$_i$)$^i$, (q$_2$,tail(p$_i$)$^i$)(v$_i$)$^i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
A source program t$_S$ is a collection of pattern clauses pointing to
|
|
\emph{bb} terms. Running a program t$_S$ against an input value v$_S$, written
|
|
t$_S$(v$_S$) produces a result \emph{r} belonging to the following grammar:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
t$_S$ ::= (p $\to$ bb)\(^{\text{i$\in $I}}\)\\
|
|
t$_S$(v$_S$) $\to$ r\\
|
|
r ::= guard list * (Match bb \(\vert{}\) NoMatch \(\vert{}\) Absurd)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
TODO: understand how to explain guard
|
|
\end{comment}
|
|
We can define what it means to run an input value v$_S$ against a source
|
|
program t$_S$:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
t$_S$(v$_S$) := $\bot$ \quad \quad \quad if $\forall$i, p$_i$(v$_S$) = $\bot$\\
|
|
First((x$_i$)$^i$) := x\_min\{i \(\vert{}\) x$_i$ $\ne$ $\bot$\} \quad if $\exists$i, x$_i$ $\ne$ $\bot$\\
|
|
t$_S$(v$_S$) = Absurd if bb\_min\{p$_i$ $\to$ bb$_i$ \(\vert{}\) p$_i$(v$_S$) $\ne$ $\bot$\} = \emph{refutation clause}\\
|
|
t$_S$(v$_S$) = Match bb\_min\{p$_i$ $\to$ bb$_i$ \(\vert{}\) p$_i$(v$_S$) $\ne$ $\bot$\}\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
[ \ldots{} ] Big part that I think doesn't need revision from you [ \ldots{} ]
|
|
|
|
In our prototype we make use of accessors to encode stored values.
|
|
|
|
\begin{minipage}{0.6\linewidth}
|
|
\begin{verbatim}
|
|
|
|
let value = 1 :: 2 :: 3 :: []
|
|
(* that can also be written *)
|
|
let value = []
|
|
|> List.cons 3
|
|
|> List.cons 2
|
|
|> List.cons 1
|
|
\end{verbatim}
|
|
\end{minipage}
|
|
\hfill
|
|
\begin{minipage}{0.5\linewidth}
|
|
\begin{verbatim}
|
|
|
|
|
|
(field 0 x) = 1
|
|
(field 0 (field 1 x)) = 2
|
|
(field 0 (field 1 (field 1 x)) = 3
|
|
(field 0 (field 1 (field 1 (field 1 x)) = []
|
|
\end{verbatim}
|
|
\end{minipage}
|
|
An \emph{accessor} \emph{a} represents the
|
|
access path to a value that can be reached by deconstructing the
|
|
scrutinee.
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
a ::= Here \(\vert{}\) n.a\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The above example, in encoded form:
|
|
\begin{verbatim}
|
|
Here = 1
|
|
1.Here = 2
|
|
1.1.Here = 3
|
|
1.1.1.Here = []
|
|
\end{verbatim}
|
|
In our prototype the source matrix m$_S$ is defined as follows
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
SMatrix m$_S$ := (a$_j$)\(^{\text{j$\in $J}}\), ((p\(_{\text{ij}}\))\(^{\text{j$\in $J}}\) $\to$ bb$_i$)\(^{\text{i$\in $I}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
Source matrices are used to build source decision trees C$_S$.
|
|
A decision tree is defined as either a Leaf, a Failure terminal or
|
|
an intermediate node with different children sharing the same accessor \emph{a}
|
|
and an optional fallback.
|
|
Failure is emitted only when the patterns don't cover the whole set of
|
|
possible input values \emph{S}. The fallback is not needed when the user
|
|
doesn't use a wildcard pattern.
|
|
\%\%\% Give example of thing
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$ ::= Leaf bb \(\vert{}\) Switch(a, (K$_i$ $\to$ C$_i$)\(^{\text{i$\in $S}}\) , C?) \(\vert{}\) Failure \(\vert{}\) Unreachable\\
|
|
v$_S$ ::= K(v$_i$)\(^{\text{i$\in $I}}\) \(\vert{}\) n $\in $ $\mathbb{N}$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
Are K and Here clear here?
|
|
\end{comment}
|
|
We say that a translation of a source program to a decision tree
|
|
is correct when for every possible input, the source program and its
|
|
respective decision tree produces the same result
|
|
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_S$, t$_S$(v$_S$) = $\llbracket$t$_S$$\rrbracket$$_S$(v$_S$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
We define the decision tree of source programs
|
|
$\llbracket$t$_S$$\rrbracket$
|
|
in terms of the decision tree of pattern matrices
|
|
$\llbracket$m$_S$$\rrbracket$
|
|
by the following:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\llbracket$((p$_i$ $\to$ bb$_i$)\(^{\text{i$\in $I}}\)$\rrbracket$ := $\llbracket$(Here), (p$_i$ $\to$ bb$_i$)\(^{\text{i$\in $I}}\) $\rrbracket$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Decision tree computed from pattern matrices respect the following invariant:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v (v$_i$)\(^{\text{i$\in $I}}\) = v(a$_i$)\(^{\text{i$\in $I}}\) $\to$ $\llbracket$m$\rrbracket$(v) = m(v$_i$)\(^{\text{i$\in $I}}\) for m = ((a$_i$)\(^{\text{i$\in $I}}\), (r$_i$)\(^{\text{i$\in $I}}\))\\
|
|
v(Here) = v\\
|
|
K(v$_i$)$^i$(k.a) = v$_k$(a) if k $\in $ [0;n[\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
TODO: EXPLAIN
|
|
\end{comment}
|
|
|
|
We proceed to show the correctness of the invariant by a case
|
|
analysys.
|
|
|
|
Base cases:
|
|
\begin{enumerate}
|
|
\item\relax [| $\varnothing$, ($\varnothing$ $\to$ bb$_i$)$^i$ |] $\equiv$ Leaf bb$_i$ where i := min(I), that is a
|
|
decision tree [|ms|] defined by an empty accessor and empty
|
|
patterns pointing to blackboxes \emph{bb$_i$}. This respects the invariant
|
|
because a source matrix in the case of empty rows returns
|
|
the first expression and (Leaf bb)(v) := Match bb
|
|
\item\relax [| (a$_j$)$^j$, $\varnothing$ |] $\equiv$ Failure
|
|
\end{enumerate}
|
|
Regarding non base cases:
|
|
Let's first define some auxiliary functions
|
|
\begin{itemize}
|
|
\item The index family of a constructor
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Idx(K) := [0; arity(K)[\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item head of an ordered family (we write x for any object here, value, pattern etc.)
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
head((x$_i$)\(^{\text{i $\in $ I}}\)) = x\_min(I)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item tail of an ordered family
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
tail((x$_i$)\(^{\text{i $\in $ I}}\)) := (x$_i$)\(^{\text{i $\ne$ min(I)}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item head constructor of a value or pattern
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
constr(K(x$_i$)$^i$) = K\\
|
|
constr(\_) = $\bot$\\
|
|
constr(x) = $\bot$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item first non-$\bot$ element of an ordered family
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
First((x$_i$)$^i$) := $\bot$ \quad \quad \quad if $\forall$i, x$_i$ = $\bot$\\
|
|
First((x$_i$)$^i$) := x\_min\{i \(\vert{}\) x$_i$ $\ne$ $\bot$\} \quad if $\exists$i, x$_i$ $\ne$ $\bot$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item definition of group decomposition:
|
|
\end{itemize}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
let constrs((p$_i$)\(^{\text{i $\in $ I}}\)) = \{ K \(\vert{}\) K = constr(p$_i$), i $\in $ I \}\\
|
|
let Groups(m) where m = ((a$_i$)$^i$ ((p$_i$$_j$)$^i$ $\to$ e$_j$)$^i$$^j$) =\\
|
|
\quad \quad let (K$_k$)$^k$ = constrs(p$_i$$_0$)$^i$ in\\
|
|
\quad \quad ( K$_k$ $\to$\\
|
|
\quad \quad \quad \quad ((a$_0$.$_l$)$^l$ \sout{+} tail(a$_i$)$^i$)\\
|
|
\quad \quad \quad \quad (\\
|
|
\quad \quad \quad \quad if p$_o$$_j$ is K$_k$(q$_l$) then\\
|
|
\quad \quad \quad \quad \quad \quad (q$_l$)$^l$ \sout{+} tail(p$_i$$_j$)$^i$ $\to$ e$_j$\\
|
|
\quad \quad \quad \quad if p$_o$$_j$ is \_ then\\
|
|
\quad \quad \quad \quad \quad \quad (\_)$^l$ \sout{+} tail(p$_i$$_j$)$^i$ $\to$ e$_j$\\
|
|
\quad \quad \quad \quad else $\bot$\\
|
|
\quad \quad \quad \quad )$^j$\\
|
|
\quad \quad ), (\\
|
|
\quad \quad \quad \quad tail(a$_i$)$^i$, (tail(p$_i$$_j$)$^i$ $\to$ e$_j$ if p$_0$$_j$ is \_ else $\bot$)$^j$\\
|
|
\quad \quad )\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Groups(m) is an auxiliary function that decomposes a matrix m into
|
|
submatrices, according to the head constructor of their first pattern.
|
|
Groups(m) returns one submatrix m\_r for each head constructor K that
|
|
occurs on the first row of m, plus one "wildcard submatrix" m\(_{\text{wild}}\)
|
|
that matches on all values that do not start with one of those head
|
|
constructors.
|
|
Intuitively, m is equivalent to its decomposition in the following
|
|
sense: if the first pattern of an input vector (v\_i)\^{}i starts with one
|
|
of the head constructors K$_k$, then running (v\_i)\^{}i against m is the same
|
|
as running it against the submatrix m\(_{\text{K$_k$}}\); otherwise (its head
|
|
constructor $\notin$ (K$_k$)$^k$) it is equivalent to running it against
|
|
the wildcard submatrix.
|
|
|
|
We formalize this intuition as follows
|
|
\begin{enumerate}
|
|
\item Lemma (Groups):
|
|
\label{sec:org3d5a08a}
|
|
Let \emph{m} be a matrix with
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Groups(m) = (k\_r \(\to\) m\_r)\^{}k, m\(_{\text{wild}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
For any value vector \((v_i)^l\) such that \(v_0 = k(v'_l)^l\) for some
|
|
constructor k,
|
|
we have:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
if k = k$_k$ \text{ for some k then}\\
|
|
\quad m(v$_i$)$^i$ = m$_k$((v\(_{\text{l}}\)')$^l$ \sout{+} (v\(_{\text{i}}\))\(^{\text{i$\in $I}\DZ}\))\\
|
|
\text{else}\\
|
|
\quad m(v$_i$)$^i$ = m\(_{\text{wild}}\)(v$_i$)\(^{\text{i$\in $I}\DZ}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
\begin{comment}
|
|
TODO: fix \{0}
|
|
\end{comment}
|
|
|
|
\item Proof:
|
|
\label{sec:org416e0ec}
|
|
Let \emph{m} be a matrix ((a$_i$)$^i$, ((p$_i$$_j$)$^i$ $\to$ e$_j$)$^j$) with
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Groups(m) = (K$_k$ $\to$ m$_k$)$^k$, m\(_{\text{wild}}\)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Below we are going to assume that m is a simplified matrix such that
|
|
the first row does not contain an or-pattern or a binding to a
|
|
variable.
|
|
|
|
Let (v$_i$)$^i$ be an input matrix with v$_0$ = K$_v$(v'\(_{\text{l}}\))$^l$ for some constructor K$_v$.
|
|
We have to show that:
|
|
\begin{itemize}
|
|
\item if K$_k$ = K$_v$ for some K$_k$ $\in $ constrs(p$_0$$_j$)$^j$, then
|
|
m(v$_i$)$^i$ = m$_k$((v'$_l$)$^l$ \sout{+} tail(v$_i$)$^i$)
|
|
\item otherwise
|
|
m(v$_i$)$^i$ = m\(_{\text{wild}}\)(tail(v$_i$)$^i$)
|
|
\end{itemize}
|
|
Let us call (r$_k$$_j$) the j-th row of the submatrix m$_k$, and r$_j$\(_{\text{wild}}\)
|
|
the j-th row of the wildcard submatrix m\(_{\text{wild}}\).
|
|
|
|
Our goal contains same-behavior equalities between matrices, for
|
|
a fixed input vector (v$_i$)$^i$. It suffices to show same-behavior
|
|
equalities between each row of the matrices for this input
|
|
vector. We show that for any j,
|
|
\begin{itemize}
|
|
\item if K$_k$ = K$_v$ for some K$_k$ $\in $ constrs(p$_0$$_j$)$^j$, then
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(p$_i$$_j$)$^i$(v$_i$)$^i$ = r$_k$$_j$((v'$_l$)$^l$ \sout{+} tail(v$_i$)$^i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item otherwise
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(p$_i$$_j$)$^i$(v$_i$)$^i$ = r$_j$\(_{\text{wild}}\) tail(v$_i$)$^i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\end{itemize}
|
|
In the first case (K$_v$ is K$_k$ for some K$_k$ $\in $ constrs(p$_0$$_j$)$^j$), we
|
|
have to prove that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(p$_i$$_j$)$^i$(v$_i$)$^i$ = r$_k$$_j$((v'$_l$)$^l$ \sout{+} tail(v$_i$)$^i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
By definition of m$_k$ we know that r$_k$$_j$ is equal to
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
if p$_o$$_j$ is K$_k$(q$_l$) then\\
|
|
\quad (q$_l$)$^l$ \sout{+} tail(p$_i$$_j$)$^i$ $\to$ e$_j$\\
|
|
if p$_o$$_j$ is \_ then\\
|
|
\quad (\_)$^l$ \sout{+} tail(p$_i$$_j$)$^i$ $\to$ e$_j$\\
|
|
else $\bot$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{comment}
|
|
Maybe better as a table?
|
|
| p$_o$$_j$ | r$_k$$_j$ |
|
|
|--------+---------------------------|
|
|
| K$_k$(q$_l$) | (q$_l$)$^l$ +++ tail(p$_i$$_j$)$^i$ $\to$ e$_j$ |
|
|
| _ | (_)$^l$ +++ tail(p$_i$$_j$)$^i$ $\to$ e$_j$ |
|
|
| else | $\bot$ |
|
|
\end{comment}
|
|
By definition of (p$_i$)$^i$(v$_i$)$^i$ we know that (p$_i$$_j$)$^i$(v$_i$) is equal to
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(K(q$_j$)$^j$, tail(p$_i$$_j$)$^i$) (K(v'$_l$)$^l$,tail(v$_i$)$^i$) := ((q$_j$)$^j$ \sout{+} tail(p$_i$$_j$)$^i$)((v'$_l$)$^l$ \sout{+} tail(v$_i$)$^i$)\\
|
|
(\_, tail(p$_i$$_j$)$^i$) (v$_i$) := tail(p$_i$$_j$)$^i$(tail(v$_i$)$^i$)\\
|
|
(K(q$_j$)$^j$, tail(p$_i$$_j$)$^i$) (K'(v'$_l$)$^l$,tail(v$_j$)$^j$) := $\bot$ if K $\ne$ K'\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
We prove this first case by a second case analysis on p$_0$$_j$.
|
|
|
|
TODO
|
|
|
|
In the second case (K$_v$ is distinct from K$_k$ for all K$_k$ $\in $ constrs(p$_o$$_j$)$^j$),
|
|
we have to prove that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(p$_i$$_j$)$^i$(v$_i$)$^i$ = r$_j$\(_{\text{wild}}\) tail(v$_i$)$^i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
TODO
|
|
\end{enumerate}
|
|
|
|
|
|
\subsection{Target translation}
|
|
\label{sec:org7394f38}
|
|
|
|
The target program of the following general form is parsed using a parser
|
|
generated by Menhir, a LR(1) parser generator for the OCaml programming language.
|
|
Menhir compiles LR(1) a grammar specification, in this case a subset
|
|
of the Lambda intermediate language, down to OCaml code.
|
|
This is the grammar of the target language (TODO: check menhir grammar)
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
start ::= sexpr\\
|
|
sexpr ::= variable \(\vert{}\) string \(\vert{}\) "(" special\_form ")"\\
|
|
string ::= "$\backslash$"" identifier "$\backslash$"" ;; string between doublequotes\\
|
|
variable ::= identifier\\
|
|
special\_form ::= let\(\vert{}\)catch\(\vert{}\)if\(\vert{}\)switch\(\vert{}\)switch-star\(\vert{}\)field\(\vert{}\)apply\(\vert{}\)isout\\
|
|
let ::= "let" assignment sexpr ;; (assignment sexpr)+ outside of pattern match code\\
|
|
assignment ::= "function" variable variable+ ;; the first variable is the identifier of the function\\
|
|
field ::= "field" digit variable\\
|
|
apply ::= ocaml\_lambda\_code ;; arbitrary code\\
|
|
catch ::= "catch" sexpr with sexpr\\
|
|
with ::= "with" "(" label ")"\\
|
|
exit ::= "exit" label\\
|
|
switch-star ::= "switch*" variable case*\\
|
|
switch::= "switch" variable case* "default:" sexpr\\
|
|
case ::= "case" casevar ":" sexpr\\
|
|
casevar ::= ("tag"\(\vert{}\)"int") integer\\
|
|
if ::= "if" bexpr sexpr sexpr\\
|
|
bexpr ::= "(" ("!="\(\vert{}\)"\texttt{="\textbackslash{}vert\{\}">}"\(\vert{}\)"<="\(\vert{}\)">"\(\vert{}\)"<") sexpr digit \(\vert{}\) field ")"\\
|
|
label ::= integer\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The prototype doesn't support strings.
|
|
|
|
The AST built by the parser is traversed and evaluated by the symbolic
|
|
execution engine.
|
|
Given that the target language supports jumps in the form of "catch - exit"
|
|
blocks the engine tries to evaluate the instructions inside the blocks
|
|
and stores the result of the partial evaluation into a record.
|
|
When a jump is encountered, the information at the point allows to
|
|
finalize the evaluation of the jump block.
|
|
In the environment the engine also stores bindings to values and
|
|
functions.
|
|
Integer additions and subtractions are simplified in a second pass.
|
|
The result of the symbolic evaluation is a target decision tree C$_T$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_T$ ::= Leaf bb \(\vert{}\) Switch(a, ($\pi$$_i$ $\to$ C$_i$)\(^{\text{i$\in $S}}\) , C?) \(\vert{}\) Failure\\
|
|
v$_T$ ::= Cell(tag $\in $ $\mathbb{N}$, (v$_i$)\(^{\text{i$\in $I}}\)) \(\vert{}\) n $\in $ $\mathbb{N}$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Every branch of the decision tree is "constrained" by a domain
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Domain $\pi$ = \{ n\(\vert{}\)n$\in $$\mathbb{N}$ x n\(\vert{}\)n$\in $Tag$\subseteq$$\mathbb{N}$ \}\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Intuitively, the $\pi$ condition at every branch tells us the set of
|
|
possible values that can "flow" through that path.
|
|
$\pi$ conditions are refined by the engine during the evaluation; at the
|
|
root of the decision tree the domain corresponds to the set of
|
|
possible values that the type of the function can hold.
|
|
C? is the fallback node of the tree that is taken whenever the value
|
|
at that point of the execution can't flow to any other subbranch.
|
|
Intuitively, the $\pi$\(_{\text{fallback}}\) condition of the C? fallback node is
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\pi$\(_{\text{fallback}}\) = ¬\bigcup\limits\(_{\text{i$\in $I}}\)$\pi$$_i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The fallback node can be omitted in the case where the domain of the
|
|
children nodes correspond to set of possible values pointed by the
|
|
accessor at that point of the execution
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
domain(v$_S$(a)) = \bigcup\limits\(_{\text{i$\in $I}}\)$\pi$$_i$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We say that a translation of a target program to a decision tree
|
|
is correct when for every possible input, the target program and its
|
|
respective decision tree produces the same result
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_T$, t$_T$(v$_T$) = $\llbracket$t$_T$$\rrbracket$$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
|
|
\subsection{Equivalence checking}
|
|
\label{sec:org7194f33}
|
|
|
|
The equivalence checking algorithm takes as input a domain of
|
|
possible values \emph{S} and a pair of source and target decision trees and
|
|
in case the two trees are not equivalent it returns a counter example.
|
|
Our algorithm respects the following correctness statement:
|
|
|
|
\begin{comment}
|
|
TODO: we have to define what \coversTEX mean for readers to understand the specifications
|
|
(or we use a simplifying lie by hiding \coversTEX in the statements).
|
|
\end{comment}
|
|
|
|
\begin{align*}
|
|
\EquivTEX S {C_S} {C_T} \emptyqueue = \YesTEX \;\land\; \coversTEX {C_T} S
|
|
& \implies
|
|
\forall v_S \approx v_T \in S,\; C_S(v_S) = C_T(v_T)
|
|
\\
|
|
\EquivTEX S {C_S} {C_T} \emptyqueue = \NoTEX {v_S} {v_T} \;\land\; \coversTEX {C_T} S
|
|
& \implies
|
|
v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
|
|
\end{align*}
|
|
Our equivalence-checking algorithm \(\EquivTEX S {C_S} {C_T} G\) is
|
|
a exactly decision procedure for the provability of the judgment
|
|
\((\EquivTEX S {C_S} {C_T} G)\), defined below.
|
|
\begin{mathpar}
|
|
\begin{array}{l@{~}r@{~}l}
|
|
& & \text{\emph{constraint trees}} \\
|
|
C & \bnfeq & \Leaf t \\
|
|
& \bnfor & \Failure \\
|
|
& \bnfor & \Switch a {\Fam i {\pi_i, C_i}} \Cfb \\
|
|
& \bnfor & \Guard t {C_0} {C_1} \\
|
|
\end{array}
|
|
|
|
\begin{array}{l@{~}r@{~}l}
|
|
& & \text{\emph{boolean result}} \\
|
|
b & \in & \{ 0, 1 \} \\[0.5em]
|
|
& & \text{\emph{guard queues}} \\
|
|
G & \bnfeq & (t_1 = b_1), \dots, (t_n = b_n) \\
|
|
\end{array}
|
|
|
|
\begin{array}{l@{~}r@{~}l}
|
|
& & \text{\emph{input space}} \\
|
|
S & \subseteq & \{ (v_S, v_T) \mid \vrel {v_S} {v_T} \} \\
|
|
\end{array}
|
|
\\
|
|
\infer{ }
|
|
{\EquivTEX \emptyset {C_S} {C_T} G}
|
|
|
|
\infer{ }
|
|
{\EquivTEX S \Failure \Failure \emptyqueue}
|
|
|
|
\infer
|
|
{\trel {t_S} {t_T}}
|
|
{\EquivTEX S {\Leaf {t_S}} {\Leaf {t_T}} \emptyqueue}
|
|
|
|
\infer
|
|
{\forall i,\;
|
|
\EquivTEX
|
|
{(S \land a = K_i)}
|
|
{C_i} {\trim {C_T} {a = K_i}} G
|
|
\\
|
|
\EquivTEX
|
|
{(S \land a \notin \Fam i {K_i})}
|
|
\Cfb {\trim {C_T} {a \notin \Fam i {K_i}}} G
|
|
}
|
|
{\EquivTEX S
|
|
{\Switch a {\Fam i {K_i, C_i}} \Cfb} {C_T} G}
|
|
|
|
\begin{comment}
|
|
% TODO explain somewhere why the judgment is not symmetric:
|
|
% we avoid defining trimming on source trees, which would
|
|
% require more expressive conditions than just constructors
|
|
\end{comment}
|
|
\infer
|
|
{C_S \in {\Leaf t, \Failure}
|
|
\\
|
|
\forall i,\; \EquivTEX {(S \land a \in D_i)} {C_S} {C_i} G
|
|
\\
|
|
\EquivTEX {(S \land a \notin \Fam i {D_i})} {C_S} \Cfb G}
|
|
{\EquivTEX S
|
|
{C_S} {\Switch a {\Fam i {D_i} {C_i}} \Cfb} G}
|
|
|
|
\infer
|
|
{\EquivTEX S {C_0} {C_T} {G, (t_S = 0)}
|
|
\\
|
|
\EquivTEX S {C_1} {C_T} {G, (t_S = 1)}}
|
|
{\EquivTEX S
|
|
{\Guard {t_S} {C_0} {C_1}} {C_T} G}
|
|
|
|
\infer
|
|
{\trel {t_S} {t_T}
|
|
\\
|
|
\EquivTEX S {C_S} {C_b} G}
|
|
{\EquivTEX S
|
|
{C_S} {\Guard {t_T} {C_0} {C_1}} {(t_S = b), G}}
|
|
\end{mathpar}
|
|
Running a program t$_S$ or its translation $\llbracket$t$_S$$\rrbracket$ against an input v$_S$
|
|
produces as a result \emph{r} in the following way:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
( $\llbracket$t$_S$$\rrbracket$$_S$(v$_S$) $\equiv$ C$_S$(v$_S$) ) $\to$ r\\
|
|
t$_S$(v$_S$) $\to$ r\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Likewise
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
( $\llbracket$t$_T$$\rrbracket$$_T$(v$_T$) $\equiv$ C$_T$(v$_T$) ) $\to$ r\\
|
|
t$_T$(v$_T$) $\to$ r\\
|
|
result r ::= guard list * (Match blackbox \(\vert{}\) NoMatch \(\vert{}\) Absurd)\\
|
|
guard ::= blackbox.\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Having defined equivalence between two inputs of which one is
|
|
expressed in the source language and the other in the target language,
|
|
v$_S$ $\simeq$ v$_T$, we can define the equivalence between a couple of programs or
|
|
a couple of decision trees
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
t$_S$ $\simeq$ t$_T$ := $\forall$v$_S$$\simeq$v$_T$, t$_S$(v$_S$) = t$_T$(v$_T$)\\
|
|
C$_S$ $\simeq$ C$_T$ := $\forall$v$_S$$\simeq$v$_T$, C$_S$(v$_S$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The result of the proposed equivalence algorithm is \emph{Yes} or \emph{No(v$_S$,
|
|
v$_T$)}. In particular, in the negative case, v$_S$ and v$_T$ are a couple of
|
|
possible counter examples for which the decision trees produce a
|
|
different result.
|
|
|
|
In the presence of guards we can say that two results are
|
|
equivalent modulo the guards queue, written \emph{r$_1$ $\simeq$gs r$_2$}, when:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(gs$_1$, r$_1$) $\simeq$gs (gs$_2$, r$_2$) $\Leftrightarrow$ (gs$_1$, r$_1$) = (gs$_2$ ++ gs, r$_2$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We say that C$_T$ covers the input space \emph{S}, written
|
|
\emph{covers(C$_T$, S)} when every value v$_S$$\in $S is a valid input to the
|
|
decision tree C$_T$. (TODO: rephrase)
|
|
Given an input space \emph{S} and a couple of decision trees, where
|
|
the target decision tree C$_T$ covers the input space \emph{S} we can define equivalence:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$, gs) = Yes $\wedge$ covers(C$_T$, S) $\to$ $\forall$v$_S$$\simeq$v$_T$ $\in $ S, C$_S$(v$_S$) $\simeq$gs C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Similarly we say that a couple of decision trees in the presence of
|
|
an input space \emph{S} are \emph{not} equivalent in the following way:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$, gs) = No(v$_S$,v$_T$) $\wedge$ covers(C$_T$, S) $\to$ v$_S$$\simeq$v$_T$ $\in $ S $\wedge$ C$_S$(v$_S$) $\ne$gs C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Corollary: For a full input space \emph{S}, that is the universe of the
|
|
target program:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, $\llbracket$t$_S$$\rrbracket$$_S$, $\llbracket$t$_T$$\rrbracket$$_T$, $\varnothing$) = Yes $\Leftrightarrow$ t$_S$ $\simeq$ t$_T$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
|
|
\begin{comment}
|
|
TODO: put ^i$\in $I where needed
|
|
\end{comment}
|
|
\subsubsection{The trimming lemma}
|
|
The trimming lemma allows to reduce the size of a decision tree given
|
|
an accessor \emph{a} $\to$ $\pi$ relation (TODO: expand)
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\forall$v$_T$ $\in $ (a$\to$$\pi$), C$_T$(v$_T$) = C\(_{\text{t/a$\to$$\pi$}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We prove this by induction on C$_T$:
|
|
|
|
\begin{itemize}
|
|
\item C$_T$ = Leaf\(_{\text{bb}}\): when the decision tree is a leaf terminal, the result of trimming on a Leaf is the Leaf itself
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Leaf\(_{\text{bb/a$\to$$\pi$}}\)(v) = Leaf\(_{\text{bb}}\)(v)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item The same applies to Failure terminal
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Failure\(_{\text{/a$\to$$\pi$}}\)(v) = Failure(v)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item When C$_T$ = Switch(b, ($\pi$$\to$C$_i$)$^i$)\(_{\text{/a$\to$$\pi$}}\) then we look at the accessor
|
|
\emph{a} of the subtree C$_i$ and we define $\pi$$_i$' = $\pi$$_i$ if a$\ne$b else $\pi$$_i$$\cap$$\pi$ Trimming
|
|
a switch node yields the following result:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Switch(b, ($\pi$$\to$C$_i$)\(^{\text{i$\in $I}}\))\(_{\text{/a$\to$$\pi$}}\) := Switch(b, ($\pi$'$_i$$\to$C\(_{\text{i/a$\to$$\pi$}}\))\(^{\text{i$\in $I}}\))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\end{itemize}
|
|
\begin{comment}
|
|
TODO: understand how to properly align lists
|
|
check that every list is aligned
|
|
\end{comment}
|
|
For the trimming lemma we have to prove that running the value v$_T$ against
|
|
the decision tree C$_T$ is the same as running v$_T$ against the tree
|
|
C\(_{\text{trim}}\) that is the result of the trimming operation on C$_T$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_T$(v$_T$) = C\(_{\text{trim}}\)(v$_T$) = Switch(b, ($\pi$$_i$'$\to$C\(_{\text{i/a$\to$$\pi$}}\))\(^{\text{i$\in $I}}\))(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We can reason by first noting that when v$_T$$\notin$(b$\to$$\pi$$_i$)$^i$ the node must be a Failure node.
|
|
In the case where $\exists$k \(\vert{}\) v$_T$$\in $(b$\to$$\pi$$_k$) then we can prove that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C\(_{\text{k/a$\to$$\pi$}}\)(v$_T$) = Switch(b, ($\pi$$_i$'$\to$C\(_{\text{i/a$\to$$\pi$}}\))\(^{\text{i$\in $I}}\))(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
because when a $\ne$ b then $\pi$$_k$'= $\pi$$_k$ and this means that v$_T$$\in $$\pi$$_k$'
|
|
while when a = b then $\pi$$_k$'=($\pi$$_k$$\cap$$\pi$) and v$_T$$\in $$\pi$$_k$' because:
|
|
\begin{itemize}
|
|
\item by the hypothesis, v$_T$$\in $$\pi$
|
|
\item we are in the case where v$_T$$\in $$\pi$$_k$
|
|
\end{itemize}
|
|
So v$_T$ $\in $ $\pi$$_k$' and by induction
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_k$(v$_T$) = C\(_{\text{k/a$\to$$\pi$}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We also know that $\forall$v$_T$$\in $(b$\to$$\pi$$_k$) $\to$ C$_T$(v$_T$) = C$_k$(v$_T$)
|
|
By putting together the last two steps, we have proven the trimming
|
|
lemma.
|
|
|
|
\begin{comment}
|
|
TODO: what should I say about covering??? I swap $\pi$ and $\pi$'
|
|
Covering lemma:
|
|
$\forall$a,$\pi$ covers(C$_T$,S) $\to$ covers(C_{t/a$\to$$\pi$}, (S$\cap$a$\to$$\pi$))
|
|
U$_i$$\pi$$^i$ $\approx$ U$_i$$\pi$'$\cap$(a$\to$$\pi$) $\approx$ (U$_i$$\pi$')$\cap$(a$\to$$\pi$) %%
|
|
|
|
|
|
%%%%%%% Also: Should I swap $\pi$ and $\pi$' ?
|
|
\end{comment}
|
|
|
|
\subsubsection{Equivalence checking}
|
|
The equivalence checking algorithm takes as parameters an input space
|
|
\emph{S}, a source decision tree \emph{C$_S$} and a target decision tree \emph{C$_T$}:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$) $\to$ Yes \(\vert{}\) No(v$_S$, v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When the algorithm returns Yes and the input space is covered by \emph{C$_S$}
|
|
we can say that the couple of decision trees are the same for
|
|
every couple of source value \emph{v$_S$} and target value \emph{v$_T$} that are equivalent.
|
|
\begin{comment}
|
|
Define "covered"
|
|
Is it correct to say the same? How to correctly distinguish in words $\simeq$ and = ?
|
|
\end{comment}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$) = Yes and cover(C$_T$, S) $\to$ $\forall$ v$_S$ $\simeq$ v$_T$$\in $S $\wedge$ C$_S$(v$_S$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
In the case where the algorithm returns No we have at least a couple
|
|
of counter example values v$_S$ and v$_T$ for which the two decision trees
|
|
outputs a different result.
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, C$_S$, C$_T$) = No(v$_S$,v$_T$) and cover(C$_T$, S) $\to$ $\forall$ v$_S$ $\simeq$ v$_T$$\in $S $\wedge$ C$_S$(v$_S$) $\ne$ C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We define the following
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Forall(Yes) = Yes\\
|
|
Forall(Yes::l) = Forall(l)\\
|
|
Forall(No(v$_S$,v$_T$)::\_) = No(v$_S$,v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
There exists and are injective:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
int(k) $\in $ $\mathbb{N}$ (arity(k) = 0)\\
|
|
tag(k) $\in $ $\mathbb{N}$ (arity(k) > 0)\\
|
|
$\pi$(k) = \{n\(\vert{}\) int(k) = n\} x \{n\(\vert{}\) tag(k) = n\}\\
|
|
\end{tabular}
|
|
\end{center}
|
|
where k is a constructor.
|
|
|
|
\begin{comment}
|
|
TODO: explain:
|
|
$\forall$v$\in $a$\to$$\pi$, C_{/a$\to$$\pi$}(v) = C(v)
|
|
\end{comment}
|
|
|
|
We proceed by case analysis:
|
|
\begin{comment}
|
|
I start numbering from zero to leave the numbers as they were on the blackboard, were we skipped some things
|
|
I think the unreachable case should go at the end.
|
|
\end{comment}
|
|
\begin{enumerate}
|
|
\item in case of unreachable:
|
|
\end{enumerate}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$(v$_S$) = Absurd(Unreachable) $\ne$ C$_T$(v$_T$) $\forall$v$_S$,v$_T$\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\begin{enumerate}
|
|
\item In the case of an empty input space
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv($\varnothing$, C$_S$, C$_T$) := Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
and that is trivial to prove because there is no pair of values (v$_S$, v$_T$) that could be
|
|
tested against the decision trees.
|
|
In the other subcases S is always non-empty.
|
|
\item When there are \emph{Failure} nodes at both sides the result is \emph{Yes}:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Failure, Failure) := Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Given that $\forall$v, Failure(v) = Failure, the statement holds.
|
|
\item When we have a Leaf or a Failure at the left side:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Failure as C$_S$, Switch(a, ($\pi$$_i$ $\to$ C$_T$$_i$)\(^{\text{i$\in $I}}\))) := Forall(equiv( S$\cap$a$\to$$\pi$(k$_i$)), C$_S$, C$_T$$_i$)\(^{\text{i$\in $I}}\))\\
|
|
equiv(S, Leaf bb$_S$ as C$_S$, Switch(a, ($\pi$$_i$ $\to$ C$_T$$_i$)\(^{\text{i$\in $I}}\))) := Forall(equiv( S$\cap$a$\to$$\pi$(k$_i$)), C$_S$, C$_T$$_i$)\(^{\text{i$\in $I}}\))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Our algorithm either returns Yes for every sub-input space S$_i$ := S$\cap$(a$\to$$\pi$(k$_i$)) and
|
|
subtree C$_T$$_i$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_i$, C$_S$, C$_T$$_i$) = Yes $\forall$i\\
|
|
\end{tabular}
|
|
\end{center}
|
|
or we have a counter example v$_S$, v$_T$ for which
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
v$_S$$\simeq$v$_T$$\in $S$_k$ $\wedge$ c$_S$(v$_S$) $\ne$ C$_T$$_k$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
then because
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
v$_T$$\in $(a$\to$$\pi$$_k$) $\to$ C$_T$(v$_T$) = C$_T$$_k$(v$_T$) ,\\
|
|
v$_S$$\simeq$v$_T$$\in $S $\wedge$ C$_S$(v$_S$)$\ne$C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
we can say that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_i$, C$_S$, C$_T$$_i$) = No(v$_S$, v$_T$) for some minimal k$\in $I\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\item When we have a Switch on the right we define $\pi$\(_{\text{fallback}}\) as the domain of
|
|
values not covered but the union of the constructors k$_i$
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
$\pi$\(_{\text{fallback}}\) = ¬\bigcup\limits\(_{\text{i$\in $I}}\)$\pi$(k$_i$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Our algorithm proceeds by trimming
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S, Switch(a, (k$_i$ $\to$ C$_S$$_i$)\(^{\text{i$\in $I}}\), C\(_{\text{sf}}\)), C$_T$) :=\\
|
|
Forall(equiv( S$\cap$(a$\to$$\pi$(k$_i$)\(^{\text{i$\in $I}}\)), C$_S$$_i$, C\(_{\text{t/a$\to$$\pi$(k$_i$)}}\))\(^{\text{i$\in $I}}\) \sout{+} equiv(S$\cap$(a$\to$$\pi$$_n$), C$_S$, C\(_{\text{a$\to$$\pi$}_{\text{fallback}}}\)))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The statement still holds and we show this by first analyzing the
|
|
\emph{Yes} case:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
Forall(equiv( S$\cap$(a$\to$$\pi$(k$_i$)\(^{\text{i$\in $I}}\)), C$_S$$_i$, C\(_{\text{t/a$\to$$\pi$(k$_i$)}}\))\(^{\text{i$\in $I}}\) = Yes\\
|
|
\end{tabular}
|
|
\end{center}
|
|
The constructor k is either included in the set of constructors k$_i$:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
k \(\vert{}\) k$\in $(k$_i$)$^i$ $\wedge$ C$_S$(v$_S$) = C$_S$$_i$(v$_S$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
We also know that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
(1) C$_S$$_i$(v$_S$) = C\(_{\text{t/a$\to$$\pi$$_i$}}\)(v$_T$)\\
|
|
(2) C\(_{\text{T/a$\to$$\pi$$_i$}}\)(v$_T$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
(1) is true by induction and (2) is a consequence of the trimming lemma.
|
|
Putting everything together:
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$(v$_S$) = C$_S$$_i$(v$_S$) = C\(_{\text{T/a$\to$$\pi$$_i$}}\)(v$_T$) = C$_T$(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When the k$\notin$(k$_i$)$^i$ [TODO]
|
|
|
|
The auxiliary Forall function returns \emph{No(v$_S$, v$_T$)} when, for a minimum k,
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
equiv(S$_k$, C$_S$$_k$, C\(_{\text{T/a$\to$$\pi$$_k$}}\) = No(v$_S$, v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
Then we can say that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$$_k$(v$_S$) $\ne$ C\(_{\text{t/a$\to$$\pi$$_k$}}\)(v$_T$)\\
|
|
\end{tabular}
|
|
\end{center}
|
|
that is enough for proving that
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
C$_S$$_k$(v$_S$) $\ne$ (C\(_{\text{t/a$\to$$\pi$$_k$}}\)(v$_T$) = C$_T$(v$_T$))\\
|
|
\end{tabular}
|
|
\end{center}
|
|
\end{enumerate}
|
|
\end{document}
|
|
|