accessors

This commit is contained in:
Francesco Mecca 2020-06-08 00:08:07 +02:00
parent f317ed7c6b
commit 587e2797f5
11 changed files with 170 additions and 123 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 918 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 398 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 519 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 655 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 409 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 519 B

Binary file not shown.

View file

@ -439,7 +439,6 @@ Our algorithm respects the following correctness statement:
v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T) v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
\end{align*} \end{align*}
* Background * Background
** OCaml ** OCaml
Objective Caml (OCaml) is a dialect of the ML (Meta-Language) Objective Caml (OCaml) is a dialect of the ML (Meta-Language)
family of programming that features with other dialects of ML, such family of programming that features with other dialects of ML, such
@ -449,9 +448,10 @@ provides many advantages with respect to static type systems of traditional impe
oriented language such as C, C++ and Java, such as: oriented language such as C, C++ and Java, such as:
- Polymorphism: in certain scenarios a function can accept more than one - Polymorphism: in certain scenarios a function can accept more than one
type for the input parameters. For example a function that computes the length of a type for the input parameters. For example a function that computes the length of a
list doesn't need to inspect the type of the elements of the list and for this reason list doesn't need to inspect the type of the elements and for this reason
a List.length function can accept lists of integers, lists of strings and in general a List.length function can accept lists of integers, lists of strings and in general
lists of any type. Such languages offer polymorphic functions through subtyping at lists of any type. JVM languages offer polymorphic functions and
classes through subtyping at
runtime only, while other languages such as C++ offer polymorphism through compile runtime only, while other languages such as C++ offer polymorphism through compile
time templates and function overloading. time templates and function overloading.
With the Hindley-Milner type system each well typed function can have more than one With the Hindley-Milner type system each well typed function can have more than one
@ -459,8 +459,8 @@ oriented language such as C, C++ and Java, such as:
For example the principal type of the List.length function is "For any /a/, function from For example the principal type of the List.length function is "For any /a/, function from
list of /a/ to /int/" and /a/ is called the /type parameter/. list of /a/ to /int/" and /a/ is called the /type parameter/.
- Strong typing: Languages such as C and C++ allow the programmer to operate on data - Strong typing: Languages such as C and C++ allow the programmer to operate on data
without considering its type, mainly through pointers. Other languages such as C# without considering its type, mainly through pointers[CIT]. Other languages such as Swift
and Go allow type erasure so at runtime the type of the data can't be queried. and Java performs type erasure[CIT] so at runtime the type of the data can't be queried.
In the case of programming languages using an Hindley-Milner type system the In the case of programming languages using an Hindley-Milner type system the
programmer is not allowed to operate on data by ignoring or promoting its type. programmer is not allowed to operate on data by ignoring or promoting its type.
- Type Inference: the principal type of a well formed term can be inferred without any - Type Inference: the principal type of a well formed term can be inferred without any
@ -479,10 +479,10 @@ oriented language such as C, C++ and Java, such as:
Moreover ML languages are functional, meaning that functions are Moreover ML languages are functional, meaning that functions are
treated as first class citizens and variables are immutable, treated as first class citizens and variables are immutable,
although mutable statements and imperative constructs are permitted. although mutable statements and imperative constructs are permitted.
In addition to that features an object system, that provides In addition to that OCaml features an object system, that provides
inheritance, subtyping and dynamic binding, and modules, that inheritance, subtyping and dynamic binding, and modules, that
provide a way to encapsulate definitions. Modules are checked provide a way to encapsulate definitions. Modules are checked
statically and can be reifycated through functors. statically and can be reifycated through functors[CIT].
** Compiling OCaml code ** Compiling OCaml code
@ -496,17 +496,17 @@ is trasformed into an untyped syntax tree. Code with syntax errors is
rejected at this stage. rejected at this stage.
After that the AST is processed by the type checker that performs After that the AST is processed by the type checker that performs
three steps at once: three steps at once:
- type inference, using the classical /Algorithm W/ - type inference, using the classical /Algorithm W/[CIT]
- perform subtyping and gathers type information from the module system - perform subtyping and gathers type information from the module system
- ensures that the code obeys the rule of the OCaml type system - ensures that the code obeys the rule of the OCaml type system
At this stage, incorrectly typed code is rejected. In case of success, At this stage, incorrectly typed code is rejected. In case of success,
the untyped AST in trasformed into a /Typed Tree/. the untyped AST in trasformed into a /Typed Tree/.
After the typechecker has proven that the program is type safe, After the typechecker has proven that the program is type safe,
the compiler lower the code to /Lambda/, an s-expression based the compiler lower the code to /Lambda/, an s-expression based
language that assumes that its input has already been proved safe. language that assumes that its input has already been proved safe[CIT].
After the Lambda pass, the Lambda code is either translated into After the Lambda pass, the Lambda code is either translated into
bytecode or goes through a series of optimization steps performed by bytecode or goes through a series of optimization steps performed by
the /Flambda/ optimizer before being translated into assembly. the /Flambda/ optimizer[CIT] before being translated into assembly.
\begin{comment} \begin{comment}
TODO: Talk about flambda passes? TODO: Talk about flambda passes?
\end{comment} \end{comment}
@ -549,6 +549,7 @@ set to 0.
\begin{comment} \begin{comment}
https://dev.realworld.org/compiler-backend.html https://dev.realworld.org/compiler-backend.html
CITE: realworldocaml CITE: realworldocaml
[CIT]
\end{comment} \end{comment}
A Lambda code target file is produced by the compiler and consists of a A Lambda code target file is produced by the compiler and consists of a
single s-expression. Every s-expression consist of /(/, a sequence of single s-expression. Every s-expression consist of /(/, a sequence of
@ -577,7 +578,8 @@ let test = function
| Fred -> "fred" | Fred -> "fred"
#+END_SRC #+END_SRC
The Lambda output for this code can be obtained by running the The Lambda output for this code can be obtained by running the
compiler with the /-dlambda/ flag: compiler with the /-drawlambda/ flag or in a more compact form with
the /-dlambda/ flag:
#+BEGIN_SRC #+BEGIN_SRC
(setglobal Prova! (setglobal Prova!
(let (let
@ -590,9 +592,9 @@ compiler with the /-dlambda/ flag:
case int 3: "fred"))) case int 3: "fred")))
(makeblock 0 test/85))) (makeblock 0 test/85)))
#+END_SRC #+END_SRC
As outlined by the example, the /makeblock/ directive is responsible As outlined by the example, the /makeblock/ directive allows to
for allocating low level OCaml values and every constant constructor allocate low level OCaml values and every constant constructor
ot the algebraic type /t/ is stored in memory as an integer. of the algebraic type /t/ is stored in memory as an integer.
The /setglobal/ directives declares a new binding in the global scope: The /setglobal/ directives declares a new binding in the global scope:
Every concept of modules is lost at this stage of compilation. Every concept of modules is lost at this stage of compilation.
The pattern matching compiler uses a jump table to map every pattern The pattern matching compiler uses a jump table to map every pattern
@ -624,7 +626,7 @@ are mapped to cons cell that are accessed using the /tag/ directive.
case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz")))) case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz"))))
(makeblock 0 test/89))) (makeblock 0 test/89)))
#+END_SRC #+END_SRC
In the Lambda language are several numeric types: In the Lambda language defines several numeric types:
- integers: that us either 31 or 63 bit two's complement arithmetic - integers: that us either 31 or 63 bit two's complement arithmetic
depending on system word size, and also wrapping on overflow depending on system word size, and also wrapping on overflow
- 32 bit and 64 bit integers: that use 32-bit and 64-bit two's complement arithmetic - 32 bit and 64 bit integers: that use 32-bit and 64-bit two's complement arithmetic
@ -633,14 +635,12 @@ In the Lambda language are several numeric types:
- floats: that use IEEE754 double-precision (64-bit) arithmetic with - floats: that use IEEE754 double-precision (64-bit) arithmetic with
the addition of the literals /infinity/, /neg_infinity/ and /nan/. the addition of the literals /infinity/, /neg_infinity/ and /nan/.
\\ \\
The are various numeric operations defined: The are various numeric operations:
- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation) - Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending) - Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
- Numeric comparisons: <, >, <=, >=, == - Numeric comparisons: <, >, <=, >=, ==
*** Functions *** Functions
Functions are defined using the following syntax, and close over all Functions are defined using the following syntax, and close over all
bindings in scope: (lambda (arg1 arg2 arg3) BODY) bindings in scope: (lambda (arg1 arg2 arg3) BODY)
and are applied using the following syntax: (apply FUNC ARG ARG ARG) and are applied using the following syntax: (apply FUNC ARG ARG ARG)
@ -663,8 +663,7 @@ Spiega che la sintassi che supporti e` quella nella BNF
** Pattern matching ** Pattern matching
Pattern matching is a widely adopted mechanism to interact with ADT[CIT].
Pattern matching is a widely adopted mechanism to interact with ADT.
C family languages provide branching on predicates through the use of C family languages provide branching on predicates through the use of
if statements and switch statements. if statements and switch statements.
Pattern matching on the other hands express predicates through Pattern matching on the other hands express predicates through
@ -673,7 +672,7 @@ arbitrary shapes. One common example of pattern matching is the use of regular
expressions on strings. provides pattern matching on ADT and expressions on strings. provides pattern matching on ADT and
primitive data types. primitive data types.
The result of a pattern matching operation is always one of: The result of a pattern matching operation is always one of:
- this value does not match this pattern - this value does not match this pattern
- this value matches this pattern, resulting the following bindings of - this value matches this pattern, resulting the following bindings of
names to values and the jump to the expression pointed at the names to values and the jump to the expression pointed at the
pattern. pattern.
@ -687,54 +686,42 @@ match color with
| Green -> print "green" | Green -> print "green"
| _ -> print "white or black" | _ -> print "white or black"
#+END_SRC #+END_SRC
Pattern matching clauses provide tokens to express data destructoring.
provides tokens to express data destructoring.
For example we can examine the content of a list with pattern matching For example we can examine the content of a list with pattern matching
#+BEGIN_SRC #+BEGIN_SRC
begin match list with begin match list with
| [ ] -> print "empty list" | [ ] -> print "empty list"
| element1 :: [ ] -> print "one element" | element1 :: [ ] -> print "one element"
| (element1 :: element2) :: [ ] -> print "two elements" | (element1 :: element2) :: [ ] -> print "two elements"
| head :: tail-> print "head followed by many elements" | head :: tail-> print "head followed by many elements"
#+END_SRC #+END_SRC
\\
Parenthesized patterns, such as the third one in the previous example, Parenthesized patterns, such as the third one in the previous example,
matches the same value as the pattern without parenthesis. matches the same value as the pattern without parenthesis.
\\ \\
The same could be done with tuples The same could be done with tuples
#+BEGIN_SRC #+BEGIN_SRC
begin match tuple with begin match tuple with
| (Some _, Some _) -> print "Pair of optional types" | (Some _, Some _) -> print "Pair of optional types"
| (Some _, None) | (None, Some _) -> print "Pair of optional types, one of which is null" | (Some _, None) | (None, Some _) -> print "Pair of optional types, one of which is null"
| (None, None) -> print "Pair of optional types, both null" | (None, None) -> print "Pair of optional types, both null"
#+END_SRC #+END_SRC
The pattern pattern₁ | pattern₂ represents the logical "or" of the The pattern pattern₁ | pattern₂ represents the logical "or" of the
two patterns pattern₁ and pattern₂. A value matches pattern₁ | two patterns, pattern₁ and pattern₂. A value matches /pattern₁ | pattern₂/ if it matches pattern₁ or pattern₂.
pattern₂ if it matches pattern₁ or pattern₂.
\\ \\
Pattern clauses can make the use of /guards/ to test predicates and Pattern clauses can make the use of /guards/ to test predicates and
variables can captured (binded in scope). variables can captured (binded in scope).
#+BEGIN_SRC #+BEGIN_SRC
begin match token_list with begin match token_list with
| "switch"::var::"{"::rest -> ... | "switch"::var::"{"::rest -> ...
| "case"::":"::var::rest when is_int var -> ... | "case"::":"::var::rest when is_int var -> ...
| "case"::":"::var::rest when is_string var -> ... | "case"::":"::var::rest when is_string var -> ...
| "}"::[ ] -> ... | "}"::[ ] -> ...
| "}"::rest -> error "syntax error: " rest | "}"::rest -> error "syntax error: " rest
#+END_SRC #+END_SRC
\\
Moreover, the pattern matching compiler emits a warning when a Moreover, the pattern matching compiler emits a warning when a
pattern is not exhaustive or some patterns are shadowed by precedent ones. pattern is not exhaustive or some patterns are shadowed by precedent ones.
** Symbolic execution ** Symbolic execution
Symbolic execution is a widely used techniques in the field of Symbolic execution is a widely used techniques in the field of
computer security. computer security.
It allows to analyze different execution paths of a program It allows to analyze different execution paths of a program
@ -749,8 +736,12 @@ domain of all possible inputs of a program, detecting infeasible
paths, dead code and proving that two code segments are equivalent. paths, dead code and proving that two code segments are equivalent.
\\ \\
Let's take as example this signedness bug that was found in the Let's take as example this signedness bug that was found in the
FreeBSD kernel and allowed, when calling the getpeername function, to FreeBSD kernel [CIT] and allowed, when calling the /getpeername/ function, to
read portions of kernel memory. read portions of kernel memory.
\begin{comment}
TODO: controlla paginazione
\end{comment}
\clearpage
#+BEGIN_SRC #+BEGIN_SRC
int compat; int compat;
{ {
@ -775,10 +766,11 @@ bad:
return (error); return (error);
} }
#+END_SRC #+END_SRC
The tree of the execution is presented below.
The tree of the execution when the function is evaluated considering It is built by evaluating the function by consider the integer
/int len/ our symbolic variable α, sa->sa_len as symbolic variable β variable /len/ the symbolic variable α, sa->sa_len the symbolic variable β
and π as the set of constraints on a symbolic variable: and π indicates the set of constraints on the symbolic variables.
The input values to the functions are identified by σ.
[[./files/symb_exec.png]] [[./files/symb_exec.png]]
\begin{comment} \begin{comment}
[1] compat (...) { π_{α}: -∞ < α < ∞ } [1] compat (...) { π_{α}: -∞ < α < ∞ }
@ -793,15 +785,15 @@ We can see that at step 3 the set of possible values of the scrutinee
α is bigger than the set of possible values of the input σ to the α is bigger than the set of possible values of the input σ to the
/cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ may fail when α is /len/ /cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ may fail when α is /len/
negative number, outside the domain π_{σ}. In C this would trigger undefined behaviour (signed negative number, outside the domain π_{σ}. In C this would trigger undefined behaviour (signed
overflow) that made the exploitation possible. overflow) that made the exploit possible.
\\ *** Symbolic Execution in terms of Hoare Logic
Every step of evaluation can be modelled as the following transition: Every step of the evaluation in a symbolic engine can be modelled as the following transition:
\[ \[
_{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ) _{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ)
\] \]
if we express the π constraints as logical formulas we can model the if we express the π transitions as logical formulas we can model the
execution of the program in terms of Hoare Logic. execution of the program in terms of Hoare Logic.
State of the computation is a Hoare triple {P}C{Q} where P and Q are The state of the computation is a Hoare triple {P}C{Q} where P and Q are
respectively the /precondition/ and the /postcondition/ that respectively the /precondition/ and the /postcondition/ that
constitute the assertions of the program. C is the directive being constitute the assertions of the program. C is the directive being
executed. executed.
@ -812,7 +804,8 @@ In the Hoare rules assertions could also take the form of
| P ::= \forall i. P \vert \exists i. P \vert P_{1} \Rightarrow P_{2} | P ::= \forall i. P \vert \exists i. P \vert P_{1} \Rightarrow P_{2}
where i is a logical variable, but assertions of these kinds increases where i is a logical variable, but assertions of these kinds increases
the complexity of the symbolic engine. the complexity of the symbolic engine.
Execution follows the rules of Hoare logic: \
Execution follows the following inference rules:
- Empty statement : - Empty statement :
\begin{mathpar} \begin{mathpar}
\infer{ } \infer{ }
@ -826,7 +819,7 @@ Execution follows the rules of Hoare logic:
\end{mathpar} \end{mathpar}
\\ \\
- Composition : c₁ and c₂ are directives that are executed in order; - Composition : c₁ and c₂ are directives that are executed in order;
{Q} is called the /midcondition/. {Q} is called the /mid condition/.
\begin{mathpar} \begin{mathpar}
\infer { \{P\}c_1\{R\}, \{R\}c_2\{Q\} } \infer { \{P\}c_1\{R\}, \{R\}c_2\{Q\} }
{ \{P\}c₁;c₂\{Q\} } { \{P\}c₁;c₂\{Q\} }
@ -839,7 +832,7 @@ Execution follows the rules of Hoare logic:
\end{mathpar} \end{mathpar}
\\ \\
- Loop : {P} is the loop invariant. After the loop is finished /P/ - Loop : {P} is the loop invariant. After the loop is finished /P/
holds and ¬̸b caused the loop to end. holds and ¬b caused the loop to end.
\begin{mathpar} \begin{mathpar}
\infer { \{P \wedge b \}c\{P\} } \infer { \{P \wedge b \}c\{P\} }
{ \{P\}\text{while b do c} \{P \wedge \neg b\} } { \{P\}\text{while b do c} \{P \wedge \neg b\} }
@ -847,14 +840,14 @@ Execution follows the rules of Hoare logic:
\\ \\
Even if the semantics of symbolic execution engines are well defined, Even if the semantics of symbolic execution engines are well defined,
the user may run into different complications when applying such the user may run into different complications when applying such
analysis to non trivial codebases. analysis to non trivial codebases.[CIT]
For example, depending on the domain, loop termination is not For example, depending on the domain, loop termination is not
guaranteed. Even when termination is guaranteed, looping causes guaranteed. Even when termination is guaranteed, looping causes
exponential branching that may lead to path explosion or state exponential branching that may lead to path explosion or state
explosion. explosion.[CIT]
Reasoning about all possible executions of a program is not always Reasoning about all possible executions of a program is not always
feasible and in case of explosion usually symbolic execution engines feasible[CIT] and in case of explosion usually symbolic execution engines
implement heuristics to reduce the size of the search space. implement heuristics to reduce the size of the search space.[CIT]
** Translation validation ** Translation validation
Translators, such as translators and code generators, are huge pieces of Translators, such as translators and code generators, are huge pieces of
@ -873,10 +866,97 @@ existing translators that consists of taking the source and the target
- [ ] Translation validation through symbolic execution - [ ] Translation validation through symbolic execution
* Translation validation of the Pattern Matching Compiler * Translation validation of the Pattern Matching Compiler
** Accessors
OCaml encourages widespread usage of composite types to encapsulate
data. Composite types include /discriminated unions/, of which we have
seen different use cases, and /records/, that are a form of /product
types/ [CIT] such as /structures/ in C. \\
\begin{minipage}{0.6\linewidth}
\begin{verbatim}
struct Shape {
int centerx;
int centery;
enum ShapeKind kind;
union {
struct { int side; };
struct { int length, height; };
struct { int radius; };
};
};
\end{verbatim}
\end{minipage}
\hfill
\begin{minipage}{0.4\linewidth}
\begin{verbatim}
type shape = {
x:int;
y:int;
kind:shapekind
}
and shapekind
| Square of int
| Rect of int * int
| Circle of int
\end{verbatim}
\end{minipage}
\begin{comment}
TODO: metti linea qualcosa a dividere
\end{comment}
Primitive OCaml datatypes include aggregate types in the form of
/tuples/ and /lists/. Other aggregate types are built using module
functors. [CIT]
Low level /Lambda/ untyped constructors of the form
#+BEGIN_SRC
type t = Constant | Cell of int * t
#+END_SRC
provides enough flexibility to encode source higher kinded types.
This shouldn't surprise because the /Lambda/ language consists of
s-expressions. The /field/ operation allows to address a /Cell/ value
and the expressions /(field 0 x)/ and /(field 1 x)/ are equivalent to
the Lisp primitives /(car x)/ and /(cdr x)/ respectively.
\begin{comment}
(* that can also be written *)
let value = List.cons 1
@@ List.cons 2
@@ List.cons 3 []
\end{comment}
\begin{minipage}{0.3\linewidth}
\begin{verbatim}
let value = 1 :: 2 :: 3 :: []
\end{verbatim}
\end{minipage}
\hfill
\begin{minipage}{0.5\linewidth}
\begin{verbatim}
(field 0 x) = 1
(field 0 (field 1 x)) = 2
(field 0 (field 1 (field 1 x)) = 3
(field 0 (field 1 (field 1 (field 1 x)) = []
\end{verbatim}
\end{minipage} \\
\\
We can represent the concrete value of a higher kinded type as a flat
list of cells.
In the prototype we call this "view" into the value of a datatype the
\emph{accessor} /a/.
| a ::= Here \vert n.a
Accessors have some resemblance with the low level /Cell/ values, such
as the fact that both don't encode type informations; for example the
accessor of a list of integers is structurally equivalent to the
accessor of a tuple containing the same elements.
\\
We can intuitively think of the /accessor/ as the
access path to a value that can be reached by deconstructing the
scrutinee.
At the source level /accessors/ are constructed by inspecting the structure of
the patterns at hand.
At the target level /accessors/ are constructed by compressing the steps taken by
the symbolic engine during the evaluation of a value.
/Accessors/ don't store informations about the concrete value of the scrutinee or
it's symbolic domain of values.
** Source program ** Source program
The OCaml source code of a pattern matching function has the
The source code of a pattern matching function has the
following form: following form:
|match variable with |match variable with
@ -895,10 +975,10 @@ Pattern matching code could also be written using the more compact form:
|\vert pattern₃ as var \to expr₃ |\vert pattern₃ as var \to expr₃
|⋮ |⋮
|\vert pₙ \to exprₙ |\vert pₙ \to exprₙ
\begin{comment}
TODO: paginazione seria per BNF
\end{comment}
This BNF grammar describes formally the grammar of the source program: This BNF grammar describes formally the grammar of the source program:
| start ::= "match" id "with" patterns \vert{} "function" patterns | start ::= "match" id "with" patterns \vert{} "function" patterns
| patterns ::= (pattern0\vert{}pattern1) pattern1+ | patterns ::= (pattern0\vert{}pattern1) pattern1+
| ;; pattern0 and pattern1 are needed to distinguish the first case in which | ;; pattern0 and pattern1 are needed to distinguish the first case in which
@ -908,16 +988,17 @@ This BNF grammar describes formally the grammar of the source program:
| clause ::= lexpr "->" rexpr | clause ::= lexpr "->" rexpr
| lexpr ::= rule (ε\vert{}condition) | lexpr ::= rule (ε\vert{}condition)
| rexpr ::= _code ;; arbitrary code | rexpr ::= _code ;; arbitrary code
| rule ::= wildcard\vert{}variable\vert{}constructor_pattern\vert or_pattern ;; | rule ::= wildcard\vert{}variable\vert{}constructor_pattern\vert or_pattern
| wildcard ::= "_" | wildcard ::= "_"
| variable ::= identifier | variable ::= identifier
| constructor_pattern ::= constructor (rule\vert{}ε) (assignment\vert{}ε) | constructor_pattern ::= constructor (rule\vert{}ε) (assignment\vert{}ε)
| constructor ::= int\vert{}float\vert{}char\vert{}string\vert{}bool \vert{}unit\vert{}record\vert{}exn\vert{}objects\vert{}ref \vert{}list\vert{}tuple\vert{}array\vert{}variant\vert{}parameterized_variant ;; data types | constructor ::= int\vert{}float\vert{}char\vert{}string\vert{}bool \vert{}unit
| \quad\quad\quad\quad\quad\quad\quad \vert{}record\vert{}exn\vert{}objects\vert{}ref \vert{}list\vert{}tuple\vert{}array\vert{}variant\vert{}parameterized_variant ;; data types
| or_pattern ::= rule ("\vert{}" wildcard\vert{}variable\vert{}constructor_pattern)+ | or_pattern ::= rule ("\vert{}" wildcard\vert{}variable\vert{}constructor_pattern)+
| condition ::= "when" bexpr | condition ::= "when" b_guard
| assignment ::= "as" id | assignment ::= "as" id
| bexpr ::= _code ;; arbitrary code | b_guard ::= ocaml_expression ;; arbitrary code
The source program is parsed using the ocaml-compiler-libs library. The source program is parsed using the ocaml-compiler-libs [CIT] library.
The result of parsing, when successful, results in a list of clauses The result of parsing, when successful, results in a list of clauses
and a list of type declarations. and a list of type declarations.
Every clause consists of three objects: a left-hand-side that is the Every clause consists of three objects: a left-hand-side that is the
@ -1037,17 +1118,15 @@ program tₛ:
A sound approach for treating these blackboxes would be to inspect the A sound approach for treating these blackboxes would be to inspect the
OCaml compiler during translation to Lambda code and extract the OCaml compiler during translation to Lambda code and extract the
blackboxes compiled in their Lambda representation. blackboxes compiled in their Lambda representation.
This would allow to test for equality with the respective blackbox at This would allow to test for structural equality with the counterpart
the target level. Lambda blackboxes at the target level.
Given that this level of introspection is currently not possibile, we Given that this level of introspection is currently not possibile, we
decided to restrict the structure of blackboxes to the following (valid) OCaml decided to restrict the structure of blackboxes to the following (valid) OCaml
code: code:
#+BEGIN_SRC #+BEGIN_SRC
external guard : 'a -> 'b = "guard" external guard : 'a -> 'b = "guard"
external observe : 'a -> 'b = "observe" external observe : 'a -> 'b = "observe"
#+END_SRC #+END_SRC
We assume these two external functions /guard/ and /observe/ with a valid We assume these two external functions /guard/ and /observe/ with a valid
type that lets the user pass any number of arguments to them. type that lets the user pass any number of arguments to them.
All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
@ -1068,11 +1147,9 @@ let _ = function
We note that the right hand side of /observe/ is just an arbitrary We note that the right hand side of /observe/ is just an arbitrary
value and in this case just enumerates the order in which expressions value and in this case just enumerates the order in which expressions
appear. appear.
Even if this is an oversimplification of the problem for the This oversimplification of the structure of arbitrary code blackboxes
prototype, it must be noted that at the compiler level we have the allows us to test for structural equivalence without querying the
possibility to compile the pattern clauses in two separate steps so compiler during the /TypedTree/ to /Lambda/ translation phase.
that the guards and right-hand-side expressions are semantically equal
to their counterparts at the target program level.
\begin{lstlisting} \begin{lstlisting}
let _ = function let _ = function
| K1 -> lambda₀ | K1 -> lambda₀
@ -1133,6 +1210,7 @@ following four rules:
\end{equation*} \end{equation*}
where where
\begin{equation*} \begin{equation*}
P' \to L' =
\begin{pmatrix} \begin{pmatrix}
p_{1,2} & \cdots & p_{1,n} & → & (let & y₁ & x₁) & l₁ \\ p_{1,2} & \cdots & p_{1,n} & → & (let & y₁ & x₁) & l₁ \\
p_{2,2} & \cdots & p_{2,n} & → & (let & y₂ & x₁) & l₂ \\ p_{2,2} & \cdots & p_{2,n} & → & (let & y₂ & x₁) & l₂ \\
@ -1141,9 +1219,9 @@ following four rules:
\end{pmatrix} \end{pmatrix}
\end{equation*} \end{equation*}
That means in every lambda action lᵢ there is a binding of x₁ to the That means in every lambda action lᵢ in the P' → L' matrix there is a binding of x₁ to the
variable that appears on the pattern p_{i,1}. Bindings are omitted variable that appears on the pattern p_{i,1}. When there are
for wildcard patterns and the lambda action lᵢ remains unchanged. wildcard patterns, bindings are omitted the lambda action lᵢ remains unchanged.
2) Constructor rule: if all patterns in the first column of P are 2) Constructor rule: if all patterns in the first column of P are
constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a
@ -1157,7 +1235,7 @@ following four rules:
p ← $q_{i,1}$, $q_{i,2}$, ..., $q_{i,n'}$, $p_{i,2}$, $p_{i,3}$, ..., $p_{i, n}$ p ← $q_{i,1}$, $q_{i,2}$, ..., $q_{i,n'}$, $p_{i,2}$, $p_{i,3}$, ..., $p_{i, n}$
\end{lstlisting} \end{lstlisting}
Patterns of the form $q_{i,j}$ matches on the values of the Patterns of the form $q_{i,j}$ matches on the values of the
constructor and we define new fresh variables y₁, y₂, ..., yₐ so constructor and we define the variables y₁, y₂, ..., yₐ so
that the lambda action lᵢ becomes that the lambda action lᵢ becomes
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,] \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
@ -1169,38 +1247,41 @@ following four rules:
\end{lstlisting} \end{lstlisting}
and the result of the compilation for the set of constructors and the result of the compilation for the set of constructors
{c₁, c₂, ..., cₖ} is: ${\{c_{1}, c_{2}, ..., c_{k}\}}$ is:
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,] \begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
switch x₁ with switch x₁ with
case c₁: l₁ case c₁: l₁
case c₂: l₂ case c₂: l₂
... ...
case cₖ: lₖ case cₖ: lₖ
default: exit default: exit
\end{lstlisting} \end{lstlisting}
\\ \\
\begin{comment}
TODO: il tre viene cambiato in uno e la lista divisa. Risolvi.
\end{comment}
3) Orpat rule: there are various strategies for dealing with 3) Orpat rule: there are various strategies for dealing with
or-patterns. The most naive one is to split the or-patterns. or-patterns. The most naive one is to split the or-patterns.
For example a row p containing an or-pattern: For example a row p containing an or-pattern:
\begin{equation*} \begin{equation*}
(p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,m} → l (p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,m} → l
\end{equation*} \end{equation*}
results in three rows added to the clause matrix results in three rows added to the clause matrix
\begin{equation*} \begin{equation*}
p_{i,1}, p_{i,2}, ..., p_{i,m} → l \\ p_{i,1}, p_{i,2}, ..., p_{i,m} → l \\
\end{equation*} \end{equation*}
\begin{equation*} \begin{equation*}
q_{i,1}, p_{i,2}, ..., p_{i,m} → l \\ q_{i,1}, p_{i,2}, ..., p_{i,m} → l \\
\end{equation*} \end{equation*}
\begin{equation*} \begin{equation*}
r_{i,1}, p_{i,2}, ..., p_{i,m} → l r_{i,1}, p_{i,2}, ..., p_{i,m} → l
\end{equation*} \end{equation*}
4) Mixture rule: 4) Mixture rule:
When none of the previous rules apply the clause matrix P → L is When none of the previous rules apply the clause matrix ${P\space{}\to\space{}L}$ is
split into two clause matrices, the first P₁ → L₁ that is the split into two clause matrices, the first $P_{1}\space\to\space{}L_{1}$ that is the
largest prefix matrix for which one of the three previous rules largest prefix matrix for which one of the three previous rules
apply, and P₂ → L₂ containing the remaining rows. The algorithm is apply, and $P_{2}\space\to\space{}L_{2}$ containing the remaining rows. The algorithm is
applied to both matrices. applied to both matrices.
It is important to note that the application of the decomposition It is important to note that the application of the decomposition
algorithm converges. This intuition can be verified by defining the algorithm converges. This intuition can be verified by defining the
@ -1232,41 +1313,6 @@ q_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
\vdots & \vdots & \ddots & \vdots → \vdots \\ \vdots & \vdots & \ddots & \vdots → \vdots \\
\end{pmatrix}\big\rvert = n \end{pmatrix}\big\rvert = n
\end{equation*} \end{equation*}
In our prototype we make use of accessors to encode stored values.
\begin{minipage}{0.6\linewidth}
\begin{verbatim}
let value = 1 :: 2 :: 3 :: []
(* that can also be written *)
let value = []
|> List.cons 3
|> List.cons 2
|> List.cons 1
\end{verbatim}
\end{minipage}
\hfill
\begin{minipage}{0.5\linewidth}
\begin{verbatim}
(field 0 x) = 1
(field 0 (field 1 x)) = 2
(field 0 (field 1 (field 1 x)) = 3
(field 0 (field 1 (field 1 (field 1 x)) = []
\end{verbatim}
\end{minipage}
An \emph{accessor} /a/ represents the
access path to a value that can be reached by deconstructing the
scrutinee.
| a ::= Here \vert n.a
The above example, in encoded form:
\begin{verbatim}
Here = 1
1.Here = 2
1.1.Here = 3
1.1.1.Here = []
\end{verbatim}
In our prototype the source matrix mₛ is defined as follows In our prototype the source matrix mₛ is defined as follows
| SMatrix mₛ := (aⱼ)^{j∈J}, ((p_{ij})^{j∈J} → bbᵢ)^{i∈I} | SMatrix mₛ := (aⱼ)^{j∈J}, ((p_{ij})^{j∈J} → bbᵢ)^{i∈I}

View file

@ -126,6 +126,7 @@
+ [ ] Line breaks + [ ] Line breaks
+ [ ] Esempio full signature match: allinea + [ ] Esempio full signature match: allinea
+ [ ] correct. stat. di eq. checking: fuori margine + [ ] correct. stat. di eq. checking: fuori margine
+ [ ] Minipages: spazio o divisore dalle linee prima e dopo
- [X] segreteria: tesi inglese - [X] segreteria: tesi inglese
- [X] Gatti: inglese - [X] Gatti: inglese
- [X] Gatti: Coppo mio relatore - [X] Gatti: Coppo mio relatore