accessors
After Width: | Height: | Size: 918 B |
After Width: | Height: | Size: 398 B |
After Width: | Height: | Size: 519 B |
After Width: | Height: | Size: 655 B |
After Width: | Height: | Size: 409 B |
After Width: | Height: | Size: 2.2 KiB |
After Width: | Height: | Size: 2.8 KiB |
After Width: | Height: | Size: 519 B |
BIN
tesi/tesi.pdf
|
@ -439,7 +439,6 @@ Our algorithm respects the following correctness statement:
|
||||||
v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
|
v_S \approx v_T \in S \;\land\; C_S(v_S) \neq C_T(v_T)
|
||||||
\end{align*}
|
\end{align*}
|
||||||
* Background
|
* Background
|
||||||
|
|
||||||
** OCaml
|
** OCaml
|
||||||
Objective Caml (OCaml) is a dialect of the ML (Meta-Language)
|
Objective Caml (OCaml) is a dialect of the ML (Meta-Language)
|
||||||
family of programming that features with other dialects of ML, such
|
family of programming that features with other dialects of ML, such
|
||||||
|
@ -449,9 +448,10 @@ provides many advantages with respect to static type systems of traditional impe
|
||||||
oriented language such as C, C++ and Java, such as:
|
oriented language such as C, C++ and Java, such as:
|
||||||
- Polymorphism: in certain scenarios a function can accept more than one
|
- Polymorphism: in certain scenarios a function can accept more than one
|
||||||
type for the input parameters. For example a function that computes the length of a
|
type for the input parameters. For example a function that computes the length of a
|
||||||
list doesn't need to inspect the type of the elements of the list and for this reason
|
list doesn't need to inspect the type of the elements and for this reason
|
||||||
a List.length function can accept lists of integers, lists of strings and in general
|
a List.length function can accept lists of integers, lists of strings and in general
|
||||||
lists of any type. Such languages offer polymorphic functions through subtyping at
|
lists of any type. JVM languages offer polymorphic functions and
|
||||||
|
classes through subtyping at
|
||||||
runtime only, while other languages such as C++ offer polymorphism through compile
|
runtime only, while other languages such as C++ offer polymorphism through compile
|
||||||
time templates and function overloading.
|
time templates and function overloading.
|
||||||
With the Hindley-Milner type system each well typed function can have more than one
|
With the Hindley-Milner type system each well typed function can have more than one
|
||||||
|
@ -459,8 +459,8 @@ oriented language such as C, C++ and Java, such as:
|
||||||
For example the principal type of the List.length function is "For any /a/, function from
|
For example the principal type of the List.length function is "For any /a/, function from
|
||||||
list of /a/ to /int/" and /a/ is called the /type parameter/.
|
list of /a/ to /int/" and /a/ is called the /type parameter/.
|
||||||
- Strong typing: Languages such as C and C++ allow the programmer to operate on data
|
- Strong typing: Languages such as C and C++ allow the programmer to operate on data
|
||||||
without considering its type, mainly through pointers. Other languages such as C#
|
without considering its type, mainly through pointers[CIT]. Other languages such as Swift
|
||||||
and Go allow type erasure so at runtime the type of the data can't be queried.
|
and Java performs type erasure[CIT] so at runtime the type of the data can't be queried.
|
||||||
In the case of programming languages using an Hindley-Milner type system the
|
In the case of programming languages using an Hindley-Milner type system the
|
||||||
programmer is not allowed to operate on data by ignoring or promoting its type.
|
programmer is not allowed to operate on data by ignoring or promoting its type.
|
||||||
- Type Inference: the principal type of a well formed term can be inferred without any
|
- Type Inference: the principal type of a well formed term can be inferred without any
|
||||||
|
@ -479,10 +479,10 @@ oriented language such as C, C++ and Java, such as:
|
||||||
Moreover ML languages are functional, meaning that functions are
|
Moreover ML languages are functional, meaning that functions are
|
||||||
treated as first class citizens and variables are immutable,
|
treated as first class citizens and variables are immutable,
|
||||||
although mutable statements and imperative constructs are permitted.
|
although mutable statements and imperative constructs are permitted.
|
||||||
In addition to that features an object system, that provides
|
In addition to that OCaml features an object system, that provides
|
||||||
inheritance, subtyping and dynamic binding, and modules, that
|
inheritance, subtyping and dynamic binding, and modules, that
|
||||||
provide a way to encapsulate definitions. Modules are checked
|
provide a way to encapsulate definitions. Modules are checked
|
||||||
statically and can be reifycated through functors.
|
statically and can be reifycated through functors[CIT].
|
||||||
|
|
||||||
** Compiling OCaml code
|
** Compiling OCaml code
|
||||||
|
|
||||||
|
@ -496,17 +496,17 @@ is trasformed into an untyped syntax tree. Code with syntax errors is
|
||||||
rejected at this stage.
|
rejected at this stage.
|
||||||
After that the AST is processed by the type checker that performs
|
After that the AST is processed by the type checker that performs
|
||||||
three steps at once:
|
three steps at once:
|
||||||
- type inference, using the classical /Algorithm W/
|
- type inference, using the classical /Algorithm W/[CIT]
|
||||||
- perform subtyping and gathers type information from the module system
|
- perform subtyping and gathers type information from the module system
|
||||||
- ensures that the code obeys the rule of the OCaml type system
|
- ensures that the code obeys the rule of the OCaml type system
|
||||||
At this stage, incorrectly typed code is rejected. In case of success,
|
At this stage, incorrectly typed code is rejected. In case of success,
|
||||||
the untyped AST in trasformed into a /Typed Tree/.
|
the untyped AST in trasformed into a /Typed Tree/.
|
||||||
After the typechecker has proven that the program is type safe,
|
After the typechecker has proven that the program is type safe,
|
||||||
the compiler lower the code to /Lambda/, an s-expression based
|
the compiler lower the code to /Lambda/, an s-expression based
|
||||||
language that assumes that its input has already been proved safe.
|
language that assumes that its input has already been proved safe[CIT].
|
||||||
After the Lambda pass, the Lambda code is either translated into
|
After the Lambda pass, the Lambda code is either translated into
|
||||||
bytecode or goes through a series of optimization steps performed by
|
bytecode or goes through a series of optimization steps performed by
|
||||||
the /Flambda/ optimizer before being translated into assembly.
|
the /Flambda/ optimizer[CIT] before being translated into assembly.
|
||||||
\begin{comment}
|
\begin{comment}
|
||||||
TODO: Talk about flambda passes?
|
TODO: Talk about flambda passes?
|
||||||
\end{comment}
|
\end{comment}
|
||||||
|
@ -549,6 +549,7 @@ set to 0.
|
||||||
\begin{comment}
|
\begin{comment}
|
||||||
https://dev.realworld.org/compiler-backend.html
|
https://dev.realworld.org/compiler-backend.html
|
||||||
CITE: realworldocaml
|
CITE: realworldocaml
|
||||||
|
[CIT]
|
||||||
\end{comment}
|
\end{comment}
|
||||||
A Lambda code target file is produced by the compiler and consists of a
|
A Lambda code target file is produced by the compiler and consists of a
|
||||||
single s-expression. Every s-expression consist of /(/, a sequence of
|
single s-expression. Every s-expression consist of /(/, a sequence of
|
||||||
|
@ -577,7 +578,8 @@ let test = function
|
||||||
| Fred -> "fred"
|
| Fred -> "fred"
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
The Lambda output for this code can be obtained by running the
|
The Lambda output for this code can be obtained by running the
|
||||||
compiler with the /-dlambda/ flag:
|
compiler with the /-drawlambda/ flag or in a more compact form with
|
||||||
|
the /-dlambda/ flag:
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
(setglobal Prova!
|
(setglobal Prova!
|
||||||
(let
|
(let
|
||||||
|
@ -590,9 +592,9 @@ compiler with the /-dlambda/ flag:
|
||||||
case int 3: "fred")))
|
case int 3: "fred")))
|
||||||
(makeblock 0 test/85)))
|
(makeblock 0 test/85)))
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
As outlined by the example, the /makeblock/ directive is responsible
|
As outlined by the example, the /makeblock/ directive allows to
|
||||||
for allocating low level OCaml values and every constant constructor
|
allocate low level OCaml values and every constant constructor
|
||||||
ot the algebraic type /t/ is stored in memory as an integer.
|
of the algebraic type /t/ is stored in memory as an integer.
|
||||||
The /setglobal/ directives declares a new binding in the global scope:
|
The /setglobal/ directives declares a new binding in the global scope:
|
||||||
Every concept of modules is lost at this stage of compilation.
|
Every concept of modules is lost at this stage of compilation.
|
||||||
The pattern matching compiler uses a jump table to map every pattern
|
The pattern matching compiler uses a jump table to map every pattern
|
||||||
|
@ -624,7 +626,7 @@ are mapped to cons cell that are accessed using the /tag/ directive.
|
||||||
case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz"))))
|
case tag 1: (if (!= (field 0 param/90) 0) "fred" "baz"))))
|
||||||
(makeblock 0 test/89)))
|
(makeblock 0 test/89)))
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
In the Lambda language are several numeric types:
|
In the Lambda language defines several numeric types:
|
||||||
- integers: that us either 31 or 63 bit two's complement arithmetic
|
- integers: that us either 31 or 63 bit two's complement arithmetic
|
||||||
depending on system word size, and also wrapping on overflow
|
depending on system word size, and also wrapping on overflow
|
||||||
- 32 bit and 64 bit integers: that use 32-bit and 64-bit two's complement arithmetic
|
- 32 bit and 64 bit integers: that use 32-bit and 64-bit two's complement arithmetic
|
||||||
|
@ -633,14 +635,12 @@ In the Lambda language are several numeric types:
|
||||||
- floats: that use IEEE754 double-precision (64-bit) arithmetic with
|
- floats: that use IEEE754 double-precision (64-bit) arithmetic with
|
||||||
the addition of the literals /infinity/, /neg_infinity/ and /nan/.
|
the addition of the literals /infinity/, /neg_infinity/ and /nan/.
|
||||||
\\
|
\\
|
||||||
The are various numeric operations defined:
|
The are various numeric operations:
|
||||||
|
|
||||||
- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
|
- Arithmetic operations: +, -, *, /, % (modulo), neg (unary negation)
|
||||||
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
|
- Bitwise operations: &, |, ^, <<, >> (zero-shifting), a>> (sign extending)
|
||||||
- Numeric comparisons: <, >, <=, >=, ==
|
- Numeric comparisons: <, >, <=, >=, ==
|
||||||
|
|
||||||
*** Functions
|
*** Functions
|
||||||
|
|
||||||
Functions are defined using the following syntax, and close over all
|
Functions are defined using the following syntax, and close over all
|
||||||
bindings in scope: (lambda (arg1 arg2 arg3) BODY)
|
bindings in scope: (lambda (arg1 arg2 arg3) BODY)
|
||||||
and are applied using the following syntax: (apply FUNC ARG ARG ARG)
|
and are applied using the following syntax: (apply FUNC ARG ARG ARG)
|
||||||
|
@ -663,8 +663,7 @@ Spiega che la sintassi che supporti e` quella nella BNF
|
||||||
|
|
||||||
|
|
||||||
** Pattern matching
|
** Pattern matching
|
||||||
|
Pattern matching is a widely adopted mechanism to interact with ADT[CIT].
|
||||||
Pattern matching is a widely adopted mechanism to interact with ADT.
|
|
||||||
C family languages provide branching on predicates through the use of
|
C family languages provide branching on predicates through the use of
|
||||||
if statements and switch statements.
|
if statements and switch statements.
|
||||||
Pattern matching on the other hands express predicates through
|
Pattern matching on the other hands express predicates through
|
||||||
|
@ -673,7 +672,7 @@ arbitrary shapes. One common example of pattern matching is the use of regular
|
||||||
expressions on strings. provides pattern matching on ADT and
|
expressions on strings. provides pattern matching on ADT and
|
||||||
primitive data types.
|
primitive data types.
|
||||||
The result of a pattern matching operation is always one of:
|
The result of a pattern matching operation is always one of:
|
||||||
- this value does not match this pattern”
|
- this value does not match this pattern
|
||||||
- this value matches this pattern, resulting the following bindings of
|
- this value matches this pattern, resulting the following bindings of
|
||||||
names to values and the jump to the expression pointed at the
|
names to values and the jump to the expression pointed at the
|
||||||
pattern.
|
pattern.
|
||||||
|
@ -687,54 +686,42 @@ match color with
|
||||||
| Green -> print "green"
|
| Green -> print "green"
|
||||||
| _ -> print "white or black"
|
| _ -> print "white or black"
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
Pattern matching clauses provide tokens to express data destructoring.
|
||||||
provides tokens to express data destructoring.
|
|
||||||
For example we can examine the content of a list with pattern matching
|
For example we can examine the content of a list with pattern matching
|
||||||
|
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
|
|
||||||
begin match list with
|
begin match list with
|
||||||
| [ ] -> print "empty list"
|
| [ ] -> print "empty list"
|
||||||
| element1 :: [ ] -> print "one element"
|
| element1 :: [ ] -> print "one element"
|
||||||
| (element1 :: element2) :: [ ] -> print "two elements"
|
| (element1 :: element2) :: [ ] -> print "two elements"
|
||||||
| head :: tail-> print "head followed by many elements"
|
| head :: tail-> print "head followed by many elements"
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
\\
|
|
||||||
Parenthesized patterns, such as the third one in the previous example,
|
Parenthesized patterns, such as the third one in the previous example,
|
||||||
matches the same value as the pattern without parenthesis.
|
matches the same value as the pattern without parenthesis.
|
||||||
\\
|
\\
|
||||||
The same could be done with tuples
|
The same could be done with tuples
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
|
|
||||||
begin match tuple with
|
begin match tuple with
|
||||||
| (Some _, Some _) -> print "Pair of optional types"
|
| (Some _, Some _) -> print "Pair of optional types"
|
||||||
| (Some _, None) | (None, Some _) -> print "Pair of optional types, one of which is null"
|
| (Some _, None) | (None, Some _) -> print "Pair of optional types, one of which is null"
|
||||||
| (None, None) -> print "Pair of optional types, both null"
|
| (None, None) -> print "Pair of optional types, both null"
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
|
||||||
The pattern pattern₁ | pattern₂ represents the logical "or" of the
|
The pattern pattern₁ | pattern₂ represents the logical "or" of the
|
||||||
two patterns pattern₁ and pattern₂. A value matches pattern₁ |
|
two patterns, pattern₁ and pattern₂. A value matches /pattern₁ | pattern₂/ if it matches pattern₁ or pattern₂.
|
||||||
pattern₂ if it matches pattern₁ or pattern₂.
|
|
||||||
\\
|
\\
|
||||||
Pattern clauses can make the use of /guards/ to test predicates and
|
Pattern clauses can make the use of /guards/ to test predicates and
|
||||||
variables can captured (binded in scope).
|
variables can captured (binded in scope).
|
||||||
|
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
|
|
||||||
begin match token_list with
|
begin match token_list with
|
||||||
| "switch"::var::"{"::rest -> ...
|
| "switch"::var::"{"::rest -> ...
|
||||||
| "case"::":"::var::rest when is_int var -> ...
|
| "case"::":"::var::rest when is_int var -> ...
|
||||||
| "case"::":"::var::rest when is_string var -> ...
|
| "case"::":"::var::rest when is_string var -> ...
|
||||||
| "}"::[ ] -> ...
|
| "}"::[ ] -> ...
|
||||||
| "}"::rest -> error "syntax error: " rest
|
| "}"::rest -> error "syntax error: " rest
|
||||||
|
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
\\
|
|
||||||
Moreover, the pattern matching compiler emits a warning when a
|
Moreover, the pattern matching compiler emits a warning when a
|
||||||
pattern is not exhaustive or some patterns are shadowed by precedent ones.
|
pattern is not exhaustive or some patterns are shadowed by precedent ones.
|
||||||
|
|
||||||
** Symbolic execution
|
** Symbolic execution
|
||||||
|
|
||||||
Symbolic execution is a widely used techniques in the field of
|
Symbolic execution is a widely used techniques in the field of
|
||||||
computer security.
|
computer security.
|
||||||
It allows to analyze different execution paths of a program
|
It allows to analyze different execution paths of a program
|
||||||
|
@ -749,8 +736,12 @@ domain of all possible inputs of a program, detecting infeasible
|
||||||
paths, dead code and proving that two code segments are equivalent.
|
paths, dead code and proving that two code segments are equivalent.
|
||||||
\\
|
\\
|
||||||
Let's take as example this signedness bug that was found in the
|
Let's take as example this signedness bug that was found in the
|
||||||
FreeBSD kernel and allowed, when calling the getpeername function, to
|
FreeBSD kernel [CIT] and allowed, when calling the /getpeername/ function, to
|
||||||
read portions of kernel memory.
|
read portions of kernel memory.
|
||||||
|
\begin{comment}
|
||||||
|
TODO: controlla paginazione
|
||||||
|
\end{comment}
|
||||||
|
\clearpage
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
int compat;
|
int compat;
|
||||||
{
|
{
|
||||||
|
@ -775,10 +766,11 @@ bad:
|
||||||
return (error);
|
return (error);
|
||||||
}
|
}
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
The tree of the execution is presented below.
|
||||||
The tree of the execution when the function is evaluated considering
|
It is built by evaluating the function by consider the integer
|
||||||
/int len/ our symbolic variable α, sa->sa_len as symbolic variable β
|
variable /len/ the symbolic variable α, sa->sa_len the symbolic variable β
|
||||||
and π as the set of constraints on a symbolic variable:
|
and π indicates the set of constraints on the symbolic variables.
|
||||||
|
The input values to the functions are identified by σ.
|
||||||
[[./files/symb_exec.png]]
|
[[./files/symb_exec.png]]
|
||||||
\begin{comment}
|
\begin{comment}
|
||||||
[1] compat (...) { π_{α}: -∞ < α < ∞ }
|
[1] compat (...) { π_{α}: -∞ < α < ∞ }
|
||||||
|
@ -793,15 +785,15 @@ We can see that at step 3 the set of possible values of the scrutinee
|
||||||
α is bigger than the set of possible values of the input σ to the
|
α is bigger than the set of possible values of the input σ to the
|
||||||
/cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ may fail when α is /len/
|
/cast/ directive, that is: π_{α} ⊈ π_{σ}. For this reason the /cast/ may fail when α is /len/
|
||||||
negative number, outside the domain π_{σ}. In C this would trigger undefined behaviour (signed
|
negative number, outside the domain π_{σ}. In C this would trigger undefined behaviour (signed
|
||||||
overflow) that made the exploitation possible.
|
overflow) that made the exploit possible.
|
||||||
\\
|
*** Symbolic Execution in terms of Hoare Logic
|
||||||
Every step of evaluation can be modelled as the following transition:
|
Every step of the evaluation in a symbolic engine can be modelled as the following transition:
|
||||||
\[
|
\[
|
||||||
(π_{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ)
|
(π_{σ}, (πᵢ)ⁱ) → (π'_{σ}, (π'ᵢ)ⁱ)
|
||||||
\]
|
\]
|
||||||
if we express the π constraints as logical formulas we can model the
|
if we express the π transitions as logical formulas we can model the
|
||||||
execution of the program in terms of Hoare Logic.
|
execution of the program in terms of Hoare Logic.
|
||||||
State of the computation is a Hoare triple {P}C{Q} where P and Q are
|
The state of the computation is a Hoare triple {P}C{Q} where P and Q are
|
||||||
respectively the /precondition/ and the /postcondition/ that
|
respectively the /precondition/ and the /postcondition/ that
|
||||||
constitute the assertions of the program. C is the directive being
|
constitute the assertions of the program. C is the directive being
|
||||||
executed.
|
executed.
|
||||||
|
@ -812,7 +804,8 @@ In the Hoare rules assertions could also take the form of
|
||||||
| P ::= \forall i. P \vert \exists i. P \vert P_{1} \Rightarrow P_{2}
|
| P ::= \forall i. P \vert \exists i. P \vert P_{1} \Rightarrow P_{2}
|
||||||
where i is a logical variable, but assertions of these kinds increases
|
where i is a logical variable, but assertions of these kinds increases
|
||||||
the complexity of the symbolic engine.
|
the complexity of the symbolic engine.
|
||||||
Execution follows the rules of Hoare logic:
|
\
|
||||||
|
Execution follows the following inference rules:
|
||||||
- Empty statement :
|
- Empty statement :
|
||||||
\begin{mathpar}
|
\begin{mathpar}
|
||||||
\infer{ }
|
\infer{ }
|
||||||
|
@ -826,7 +819,7 @@ Execution follows the rules of Hoare logic:
|
||||||
\end{mathpar}
|
\end{mathpar}
|
||||||
\\
|
\\
|
||||||
- Composition : c₁ and c₂ are directives that are executed in order;
|
- Composition : c₁ and c₂ are directives that are executed in order;
|
||||||
{Q} is called the /midcondition/.
|
{Q} is called the /mid condition/.
|
||||||
\begin{mathpar}
|
\begin{mathpar}
|
||||||
\infer { \{P\}c_1\{R\}, \{R\}c_2\{Q\} }
|
\infer { \{P\}c_1\{R\}, \{R\}c_2\{Q\} }
|
||||||
{ \{P\}c₁;c₂\{Q\} }
|
{ \{P\}c₁;c₂\{Q\} }
|
||||||
|
@ -839,7 +832,7 @@ Execution follows the rules of Hoare logic:
|
||||||
\end{mathpar}
|
\end{mathpar}
|
||||||
\\
|
\\
|
||||||
- Loop : {P} is the loop invariant. After the loop is finished /P/
|
- Loop : {P} is the loop invariant. After the loop is finished /P/
|
||||||
holds and ¬̸b caused the loop to end.
|
holds and ¬b caused the loop to end.
|
||||||
\begin{mathpar}
|
\begin{mathpar}
|
||||||
\infer { \{P \wedge b \}c\{P\} }
|
\infer { \{P \wedge b \}c\{P\} }
|
||||||
{ \{P\}\text{while b do c} \{P \wedge \neg b\} }
|
{ \{P\}\text{while b do c} \{P \wedge \neg b\} }
|
||||||
|
@ -847,14 +840,14 @@ Execution follows the rules of Hoare logic:
|
||||||
\\
|
\\
|
||||||
Even if the semantics of symbolic execution engines are well defined,
|
Even if the semantics of symbolic execution engines are well defined,
|
||||||
the user may run into different complications when applying such
|
the user may run into different complications when applying such
|
||||||
analysis to non trivial codebases.
|
analysis to non trivial codebases.[CIT]
|
||||||
For example, depending on the domain, loop termination is not
|
For example, depending on the domain, loop termination is not
|
||||||
guaranteed. Even when termination is guaranteed, looping causes
|
guaranteed. Even when termination is guaranteed, looping causes
|
||||||
exponential branching that may lead to path explosion or state
|
exponential branching that may lead to path explosion or state
|
||||||
explosion.
|
explosion.[CIT]
|
||||||
Reasoning about all possible executions of a program is not always
|
Reasoning about all possible executions of a program is not always
|
||||||
feasible and in case of explosion usually symbolic execution engines
|
feasible[CIT] and in case of explosion usually symbolic execution engines
|
||||||
implement heuristics to reduce the size of the search space.
|
implement heuristics to reduce the size of the search space.[CIT]
|
||||||
|
|
||||||
** Translation validation
|
** Translation validation
|
||||||
Translators, such as translators and code generators, are huge pieces of
|
Translators, such as translators and code generators, are huge pieces of
|
||||||
|
@ -873,10 +866,97 @@ existing translators that consists of taking the source and the target
|
||||||
- [ ] Translation validation through symbolic execution
|
- [ ] Translation validation through symbolic execution
|
||||||
|
|
||||||
* Translation validation of the Pattern Matching Compiler
|
* Translation validation of the Pattern Matching Compiler
|
||||||
|
** Accessors
|
||||||
|
OCaml encourages widespread usage of composite types to encapsulate
|
||||||
|
data. Composite types include /discriminated unions/, of which we have
|
||||||
|
seen different use cases, and /records/, that are a form of /product
|
||||||
|
types/ [CIT] such as /structures/ in C. \\
|
||||||
|
\begin{minipage}{0.6\linewidth}
|
||||||
|
\begin{verbatim}
|
||||||
|
struct Shape {
|
||||||
|
int centerx;
|
||||||
|
int centery;
|
||||||
|
enum ShapeKind kind;
|
||||||
|
union {
|
||||||
|
struct { int side; };
|
||||||
|
struct { int length, height; };
|
||||||
|
struct { int radius; };
|
||||||
|
};
|
||||||
|
};
|
||||||
|
\end{verbatim}
|
||||||
|
\end{minipage}
|
||||||
|
\hfill
|
||||||
|
\begin{minipage}{0.4\linewidth}
|
||||||
|
\begin{verbatim}
|
||||||
|
type shape = {
|
||||||
|
x:int;
|
||||||
|
y:int;
|
||||||
|
kind:shapekind
|
||||||
|
}
|
||||||
|
and shapekind
|
||||||
|
| Square of int
|
||||||
|
| Rect of int * int
|
||||||
|
| Circle of int
|
||||||
|
\end{verbatim}
|
||||||
|
\end{minipage}
|
||||||
|
\begin{comment}
|
||||||
|
TODO: metti linea qualcosa a dividere
|
||||||
|
\end{comment}
|
||||||
|
Primitive OCaml datatypes include aggregate types in the form of
|
||||||
|
/tuples/ and /lists/. Other aggregate types are built using module
|
||||||
|
functors. [CIT]
|
||||||
|
Low level /Lambda/ untyped constructors of the form
|
||||||
|
#+BEGIN_SRC
|
||||||
|
type t = Constant | Cell of int * t
|
||||||
|
#+END_SRC
|
||||||
|
provides enough flexibility to encode source higher kinded types.
|
||||||
|
This shouldn't surprise because the /Lambda/ language consists of
|
||||||
|
s-expressions. The /field/ operation allows to address a /Cell/ value
|
||||||
|
and the expressions /(field 0 x)/ and /(field 1 x)/ are equivalent to
|
||||||
|
the Lisp primitives /(car x)/ and /(cdr x)/ respectively.
|
||||||
|
\begin{comment}
|
||||||
|
(* that can also be written *)
|
||||||
|
let value = List.cons 1
|
||||||
|
@@ List.cons 2
|
||||||
|
@@ List.cons 3 []
|
||||||
|
\end{comment}
|
||||||
|
\begin{minipage}{0.3\linewidth}
|
||||||
|
\begin{verbatim}
|
||||||
|
let value = 1 :: 2 :: 3 :: []
|
||||||
|
\end{verbatim}
|
||||||
|
\end{minipage}
|
||||||
|
\hfill
|
||||||
|
\begin{minipage}{0.5\linewidth}
|
||||||
|
\begin{verbatim}
|
||||||
|
(field 0 x) = 1
|
||||||
|
(field 0 (field 1 x)) = 2
|
||||||
|
(field 0 (field 1 (field 1 x)) = 3
|
||||||
|
(field 0 (field 1 (field 1 (field 1 x)) = []
|
||||||
|
\end{verbatim}
|
||||||
|
\end{minipage} \\
|
||||||
|
\\
|
||||||
|
We can represent the concrete value of a higher kinded type as a flat
|
||||||
|
list of cells.
|
||||||
|
In the prototype we call this "view" into the value of a datatype the
|
||||||
|
\emph{accessor} /a/.
|
||||||
|
| a ::= Here \vert n.a
|
||||||
|
Accessors have some resemblance with the low level /Cell/ values, such
|
||||||
|
as the fact that both don't encode type informations; for example the
|
||||||
|
accessor of a list of integers is structurally equivalent to the
|
||||||
|
accessor of a tuple containing the same elements.
|
||||||
|
\\
|
||||||
|
We can intuitively think of the /accessor/ as the
|
||||||
|
access path to a value that can be reached by deconstructing the
|
||||||
|
scrutinee.
|
||||||
|
At the source level /accessors/ are constructed by inspecting the structure of
|
||||||
|
the patterns at hand.
|
||||||
|
At the target level /accessors/ are constructed by compressing the steps taken by
|
||||||
|
the symbolic engine during the evaluation of a value.
|
||||||
|
/Accessors/ don't store informations about the concrete value of the scrutinee or
|
||||||
|
it's symbolic domain of values.
|
||||||
|
|
||||||
** Source program
|
** Source program
|
||||||
|
The OCaml source code of a pattern matching function has the
|
||||||
The source code of a pattern matching function has the
|
|
||||||
following form:
|
following form:
|
||||||
|
|
||||||
|match variable with
|
|match variable with
|
||||||
|
@ -895,10 +975,10 @@ Pattern matching code could also be written using the more compact form:
|
||||||
|\vert pattern₃ as var \to expr₃
|
|\vert pattern₃ as var \to expr₃
|
||||||
|⋮
|
|⋮
|
||||||
|\vert pₙ \to exprₙ
|
|\vert pₙ \to exprₙ
|
||||||
|
\begin{comment}
|
||||||
|
TODO: paginazione seria per BNF
|
||||||
|
\end{comment}
|
||||||
This BNF grammar describes formally the grammar of the source program:
|
This BNF grammar describes formally the grammar of the source program:
|
||||||
|
|
||||||
| start ::= "match" id "with" patterns \vert{} "function" patterns
|
| start ::= "match" id "with" patterns \vert{} "function" patterns
|
||||||
| patterns ::= (pattern0\vert{}pattern1) pattern1+
|
| patterns ::= (pattern0\vert{}pattern1) pattern1+
|
||||||
| ;; pattern0 and pattern1 are needed to distinguish the first case in which
|
| ;; pattern0 and pattern1 are needed to distinguish the first case in which
|
||||||
|
@ -908,16 +988,17 @@ This BNF grammar describes formally the grammar of the source program:
|
||||||
| clause ::= lexpr "->" rexpr
|
| clause ::= lexpr "->" rexpr
|
||||||
| lexpr ::= rule (ε\vert{}condition)
|
| lexpr ::= rule (ε\vert{}condition)
|
||||||
| rexpr ::= _code ;; arbitrary code
|
| rexpr ::= _code ;; arbitrary code
|
||||||
| rule ::= wildcard\vert{}variable\vert{}constructor_pattern\vert or_pattern ;;
|
| rule ::= wildcard\vert{}variable\vert{}constructor_pattern\vert or_pattern
|
||||||
| wildcard ::= "_"
|
| wildcard ::= "_"
|
||||||
| variable ::= identifier
|
| variable ::= identifier
|
||||||
| constructor_pattern ::= constructor (rule\vert{}ε) (assignment\vert{}ε)
|
| constructor_pattern ::= constructor (rule\vert{}ε) (assignment\vert{}ε)
|
||||||
| constructor ::= int\vert{}float\vert{}char\vert{}string\vert{}bool \vert{}unit\vert{}record\vert{}exn\vert{}objects\vert{}ref \vert{}list\vert{}tuple\vert{}array\vert{}variant\vert{}parameterized_variant ;; data types
|
| constructor ::= int\vert{}float\vert{}char\vert{}string\vert{}bool \vert{}unit
|
||||||
|
| \quad\quad\quad\quad\quad\quad\quad \vert{}record\vert{}exn\vert{}objects\vert{}ref \vert{}list\vert{}tuple\vert{}array\vert{}variant\vert{}parameterized_variant ;; data types
|
||||||
| or_pattern ::= rule ("\vert{}" wildcard\vert{}variable\vert{}constructor_pattern)+
|
| or_pattern ::= rule ("\vert{}" wildcard\vert{}variable\vert{}constructor_pattern)+
|
||||||
| condition ::= "when" bexpr
|
| condition ::= "when" b_guard
|
||||||
| assignment ::= "as" id
|
| assignment ::= "as" id
|
||||||
| bexpr ::= _code ;; arbitrary code
|
| b_guard ::= ocaml_expression ;; arbitrary code
|
||||||
The source program is parsed using the ocaml-compiler-libs library.
|
The source program is parsed using the ocaml-compiler-libs [CIT] library.
|
||||||
The result of parsing, when successful, results in a list of clauses
|
The result of parsing, when successful, results in a list of clauses
|
||||||
and a list of type declarations.
|
and a list of type declarations.
|
||||||
Every clause consists of three objects: a left-hand-side that is the
|
Every clause consists of three objects: a left-hand-side that is the
|
||||||
|
@ -1037,17 +1118,15 @@ program tₛ:
|
||||||
A sound approach for treating these blackboxes would be to inspect the
|
A sound approach for treating these blackboxes would be to inspect the
|
||||||
OCaml compiler during translation to Lambda code and extract the
|
OCaml compiler during translation to Lambda code and extract the
|
||||||
blackboxes compiled in their Lambda representation.
|
blackboxes compiled in their Lambda representation.
|
||||||
This would allow to test for equality with the respective blackbox at
|
This would allow to test for structural equality with the counterpart
|
||||||
the target level.
|
Lambda blackboxes at the target level.
|
||||||
Given that this level of introspection is currently not possibile, we
|
Given that this level of introspection is currently not possibile, we
|
||||||
decided to restrict the structure of blackboxes to the following (valid) OCaml
|
decided to restrict the structure of blackboxes to the following (valid) OCaml
|
||||||
code:
|
code:
|
||||||
|
|
||||||
#+BEGIN_SRC
|
#+BEGIN_SRC
|
||||||
external guard : 'a -> 'b = "guard"
|
external guard : 'a -> 'b = "guard"
|
||||||
external observe : 'a -> 'b = "observe"
|
external observe : 'a -> 'b = "observe"
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
|
||||||
We assume these two external functions /guard/ and /observe/ with a valid
|
We assume these two external functions /guard/ and /observe/ with a valid
|
||||||
type that lets the user pass any number of arguments to them.
|
type that lets the user pass any number of arguments to them.
|
||||||
All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
|
All the guards are of the form \texttt{guard <arg> <arg> <arg>}, where the
|
||||||
|
@ -1068,11 +1147,9 @@ let _ = function
|
||||||
We note that the right hand side of /observe/ is just an arbitrary
|
We note that the right hand side of /observe/ is just an arbitrary
|
||||||
value and in this case just enumerates the order in which expressions
|
value and in this case just enumerates the order in which expressions
|
||||||
appear.
|
appear.
|
||||||
Even if this is an oversimplification of the problem for the
|
This oversimplification of the structure of arbitrary code blackboxes
|
||||||
prototype, it must be noted that at the compiler level we have the
|
allows us to test for structural equivalence without querying the
|
||||||
possibility to compile the pattern clauses in two separate steps so
|
compiler during the /TypedTree/ to /Lambda/ translation phase.
|
||||||
that the guards and right-hand-side expressions are semantically equal
|
|
||||||
to their counterparts at the target program level.
|
|
||||||
\begin{lstlisting}
|
\begin{lstlisting}
|
||||||
let _ = function
|
let _ = function
|
||||||
| K1 -> lambda₀
|
| K1 -> lambda₀
|
||||||
|
@ -1133,6 +1210,7 @@ following four rules:
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
where
|
where
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
|
P' \to L' =
|
||||||
\begin{pmatrix}
|
\begin{pmatrix}
|
||||||
p_{1,2} & \cdots & p_{1,n} & → & (let & y₁ & x₁) & l₁ \\
|
p_{1,2} & \cdots & p_{1,n} & → & (let & y₁ & x₁) & l₁ \\
|
||||||
p_{2,2} & \cdots & p_{2,n} & → & (let & y₂ & x₁) & l₂ \\
|
p_{2,2} & \cdots & p_{2,n} & → & (let & y₂ & x₁) & l₂ \\
|
||||||
|
@ -1141,9 +1219,9 @@ following four rules:
|
||||||
\end{pmatrix}
|
\end{pmatrix}
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
|
|
||||||
That means in every lambda action lᵢ there is a binding of x₁ to the
|
That means in every lambda action lᵢ in the P' → L' matrix there is a binding of x₁ to the
|
||||||
variable that appears on the pattern p_{i,1}. Bindings are omitted
|
variable that appears on the pattern p_{i,1}. When there are
|
||||||
for wildcard patterns and the lambda action lᵢ remains unchanged.
|
wildcard patterns, bindings are omitted the lambda action lᵢ remains unchanged.
|
||||||
|
|
||||||
2) Constructor rule: if all patterns in the first column of P are
|
2) Constructor rule: if all patterns in the first column of P are
|
||||||
constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a
|
constructors patterns of the form k(q₁, q₂, ..., q_{n'}) we define a
|
||||||
|
@ -1157,7 +1235,7 @@ following four rules:
|
||||||
p ← $q_{i,1}$, $q_{i,2}$, ..., $q_{i,n'}$, $p_{i,2}$, $p_{i,3}$, ..., $p_{i, n}$
|
p ← $q_{i,1}$, $q_{i,2}$, ..., $q_{i,n'}$, $p_{i,2}$, $p_{i,3}$, ..., $p_{i, n}$
|
||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
Patterns of the form $q_{i,j}$ matches on the values of the
|
Patterns of the form $q_{i,j}$ matches on the values of the
|
||||||
constructor and we define new fresh variables y₁, y₂, ..., yₐ so
|
constructor and we define the variables y₁, y₂, ..., yₐ so
|
||||||
that the lambda action lᵢ becomes
|
that the lambda action lᵢ becomes
|
||||||
|
|
||||||
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
|
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
|
||||||
|
@ -1169,7 +1247,7 @@ following four rules:
|
||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
|
|
||||||
and the result of the compilation for the set of constructors
|
and the result of the compilation for the set of constructors
|
||||||
{c₁, c₂, ..., cₖ} is:
|
${\{c_{1}, c_{2}, ..., c_{k}\}}$ is:
|
||||||
|
|
||||||
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
|
\begin{lstlisting}[mathescape,columns=fullflexible,basicstyle=\fontfamily{lmvtt}\selectfont,]
|
||||||
switch x₁ with
|
switch x₁ with
|
||||||
|
@ -1180,27 +1258,30 @@ following four rules:
|
||||||
default: exit
|
default: exit
|
||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
\\
|
\\
|
||||||
|
\begin{comment}
|
||||||
|
TODO: il tre viene cambiato in uno e la lista divisa. Risolvi.
|
||||||
|
\end{comment}
|
||||||
3) Orpat rule: there are various strategies for dealing with
|
3) Orpat rule: there are various strategies for dealing with
|
||||||
or-patterns. The most naive one is to split the or-patterns.
|
or-patterns. The most naive one is to split the or-patterns.
|
||||||
For example a row p containing an or-pattern:
|
For example a row p containing an or-pattern:
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
(p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,m} → lᵢ
|
(p_{i,1}|q_{i,1}|r_{i,1}), p_{i,2}, ..., p_{i,m} → l
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
results in three rows added to the clause matrix
|
results in three rows added to the clause matrix
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
p_{i,1}, p_{i,2}, ..., p_{i,m} → lᵢ \\
|
p_{i,1}, p_{i,2}, ..., p_{i,m} → l \\
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
q_{i,1}, p_{i,2}, ..., p_{i,m} → lᵢ \\
|
q_{i,1}, p_{i,2}, ..., p_{i,m} → l \\
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
r_{i,1}, p_{i,2}, ..., p_{i,m} → lᵢ
|
r_{i,1}, p_{i,2}, ..., p_{i,m} → l
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
4) Mixture rule:
|
4) Mixture rule:
|
||||||
When none of the previous rules apply the clause matrix P → L is
|
When none of the previous rules apply the clause matrix ${P\space{}\to\space{}L}$ is
|
||||||
split into two clause matrices, the first P₁ → L₁ that is the
|
split into two clause matrices, the first $P_{1}\space\to\space{}L_{1}$ that is the
|
||||||
largest prefix matrix for which one of the three previous rules
|
largest prefix matrix for which one of the three previous rules
|
||||||
apply, and P₂ → L₂ containing the remaining rows. The algorithm is
|
apply, and $P_{2}\space\to\space{}L_{2}$ containing the remaining rows. The algorithm is
|
||||||
applied to both matrices.
|
applied to both matrices.
|
||||||
It is important to note that the application of the decomposition
|
It is important to note that the application of the decomposition
|
||||||
algorithm converges. This intuition can be verified by defining the
|
algorithm converges. This intuition can be verified by defining the
|
||||||
|
@ -1232,41 +1313,6 @@ q_{1,1} & p_{1,2} & \cdots & p_{1,n} → l₁ \\
|
||||||
\vdots & \vdots & \ddots & \vdots → \vdots \\
|
\vdots & \vdots & \ddots & \vdots → \vdots \\
|
||||||
\end{pmatrix}\big\rvert = n
|
\end{pmatrix}\big\rvert = n
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
In our prototype we make use of accessors to encode stored values.
|
|
||||||
|
|
||||||
\begin{minipage}{0.6\linewidth}
|
|
||||||
\begin{verbatim}
|
|
||||||
|
|
||||||
let value = 1 :: 2 :: 3 :: []
|
|
||||||
(* that can also be written *)
|
|
||||||
let value = []
|
|
||||||
|> List.cons 3
|
|
||||||
|> List.cons 2
|
|
||||||
|> List.cons 1
|
|
||||||
\end{verbatim}
|
|
||||||
\end{minipage}
|
|
||||||
\hfill
|
|
||||||
\begin{minipage}{0.5\linewidth}
|
|
||||||
\begin{verbatim}
|
|
||||||
|
|
||||||
|
|
||||||
(field 0 x) = 1
|
|
||||||
(field 0 (field 1 x)) = 2
|
|
||||||
(field 0 (field 1 (field 1 x)) = 3
|
|
||||||
(field 0 (field 1 (field 1 (field 1 x)) = []
|
|
||||||
\end{verbatim}
|
|
||||||
\end{minipage}
|
|
||||||
An \emph{accessor} /a/ represents the
|
|
||||||
access path to a value that can be reached by deconstructing the
|
|
||||||
scrutinee.
|
|
||||||
| a ::= Here \vert n.a
|
|
||||||
The above example, in encoded form:
|
|
||||||
\begin{verbatim}
|
|
||||||
Here = 1
|
|
||||||
1.Here = 2
|
|
||||||
1.1.Here = 3
|
|
||||||
1.1.1.Here = []
|
|
||||||
\end{verbatim}
|
|
||||||
In our prototype the source matrix mₛ is defined as follows
|
In our prototype the source matrix mₛ is defined as follows
|
||||||
| SMatrix mₛ := (aⱼ)^{j∈J}, ((p_{ij})^{j∈J} → bbᵢ)^{i∈I}
|
| SMatrix mₛ := (aⱼ)^{j∈J}, ((p_{ij})^{j∈J} → bbᵢ)^{i∈I}
|
||||||
|
|
||||||
|
|
1
todo.org
|
@ -126,6 +126,7 @@
|
||||||
+ [ ] Line breaks
|
+ [ ] Line breaks
|
||||||
+ [ ] Esempio full signature match: allinea
|
+ [ ] Esempio full signature match: allinea
|
||||||
+ [ ] correct. stat. di eq. checking: fuori margine
|
+ [ ] correct. stat. di eq. checking: fuori margine
|
||||||
|
+ [ ] Minipages: spazio o divisore dalle linee prima e dopo
|
||||||
- [X] segreteria: tesi inglese
|
- [X] segreteria: tesi inglese
|
||||||
- [X] Gatti: inglese
|
- [X] Gatti: inglese
|
||||||
- [X] Gatti: Coppo mio relatore
|
- [X] Gatti: Coppo mio relatore
|
||||||
|
|