diff --git a/tesi/files/ocamlcompilation b/tesi/files/ocamlcompilation new file mode 100644 index 0000000..98fd67b --- /dev/null +++ b/tesi/files/ocamlcompilation @@ -0,0 +1 @@ +7VrbctowEP0aHtuxfIM8cjGBSbgMuNPSl47BCrgRFjUi4Hx9pVjGWDIxSTGQ0BeQVvc9R7taySWtPt/cBs5i1sEuRCVVcTclrVFSVaCrCv1jkjCSVACIBNPAc3mlRDD0niEX8nbTlefCZaoiwRgRb5EWTrDvwwlJyZwgwOt0tQeM0qMunCmUBMOJg2Tpd88lM74KQ0nkLehNZ/HIQOElcyeuzAXLmePi9Y5Is0paPcCYRKn5pg4RU16sl6hdc0/pdmIB9MkhDR6NWiP8Y4SkZQx/9Z/HzrJf/aJGvTw5aMUX3K8Ohu3uLZ8zCWNFBHjlu5D1BUpabT3zCBwunAkrXVPoqWxG5ogXP2CfNJ25hxjqLYieIPEmDi8Y8k75Cp5gQOBm76LAVlWUYxDPIQlCWoU30DSu3XCr/yi/TsACOpfNdoAqc5nD+THddp2okCa4Ft+gUU3SqKRK6LtVRk2amyBnufQmB2oPbjzyg8qUrwbPjXgTlm4wnShxJuSZaHToSjQXVExniFfBBOaRRYZiR9VGhqZjWQCRQ7yn9DSy1M9H6GOPTjBBupxGWtcFBKPp81a7+0DoSFfSHaliR8QJppBIHb2wYbvs9xNEL5IgPp3bDkNYdhSzgmUSjrzk3k6SSDmXShIRW+O9HFHT/Wg3p+WIIXGEb07m6FwoEYYaUJKmyJIE+BHWMcIBlfjYh4w3HkKCyEHe1Gc8o0BDKq8xc0zJhKq8YO65Lhsm0+QnTkF5jZZHMPQisqZs5ysZvFKLMvNmxi42EQNhTBNTlrBHfYtWqbes+t2LR+XlQVwhltAJjBNZIX73GK5W32M3d12tmoGBWRQG5XwM7qudWqNKKzV7gw4Do9fpt++rdrvX/Wx4aBlb4rR43OTjURvZVr3XsLZQWIOPjoMq+Aq9cm4c4jPwa0A0tzuj17fbnfbPjw+EboJLAwLkA7GzI9pd2xr0B5b98bEQN4WpnB0LOdSVvUW7e8d0r1S7DfY7HFqd2icwUbp5cWhcWphcfOSUG17zI2VuhFXOxvpEYXhlz6njzSGWcHOjAaGjgkMsIMdYrzGQh0v/EoSDI1HpUoLro93A6GKUVjT0WcHbsYzP1aCoVvace0+FYlb4dzQXcv6LNKB8CpqYgp03tBPTpPL/pLEnUM+n4FktFRBiiLLoJg6lIBCeDkzx8adoCmZdjFw5BePTXj4H1bNyUHByhkidg72laAZPfOaJh9vh4DefhAsanrJA15YIeQVPCiANiZoRFZsZ1CrsTUGVr4tsjpAdwKt89REg0lQZIsM4JUTyvUVv/Jt9eKIqVM9weYUgGcJDhJHxEHHafSQ/sddCAq/05RTc5MOT9YXMO+Ch2eR7pshzJV+FadZf \ No newline at end of file diff --git a/tesi/files/ocamlcompilation.png b/tesi/files/ocamlcompilation.png new file mode 100644 index 0000000..3a18d6d Binary files /dev/null and b/tesi/files/ocamlcompilation.png differ diff --git a/tesi/files/ocamlcompilation.svg b/tesi/files/ocamlcompilation.svg new file mode 100644 index 0000000..cd59066 --- /dev/null +++ b/tesi/files/ocamlcompilation.svg @@ -0,0 +1,3 @@ + + +
PARSING
PARSING
source code
source code
TYPE CHECKING
TYPE CHECKING
LAMBDA FORM COMPILATION
LAMBDA FORM COMPILAT...
BYTECODE COMPILER
BYTECODE COMPILER
FLAMBDA OPTIMIZER
FLAMBDA OPTIMIZER
BYTECODE INTERPRETER
BYTECODE INTERPRETER
LINKER AND ASSEMBLER
LINKER AND ASSEMBLER
Untyped AST
Untyped AST
Typed Tree
Typed Tree
Object files
Object fil...
Bytecode
Byteco...
Viewer does not support full SVG 1.1
\ No newline at end of file diff --git a/tesi/tesi.pdf b/tesi/tesi.pdf index 93eb788..c575dad 100644 Binary files a/tesi/tesi.pdf and b/tesi/tesi.pdf differ diff --git a/tesi/tesi_unicode.org b/tesi/tesi_unicode.org index 9f9b7ec..040b22f 100644 --- a/tesi/tesi_unicode.org +++ b/tesi/tesi_unicode.org @@ -20,7 +20,8 @@ - [ ] Statement of correctness - [ ] Proof of correctness - [ ] Practical results -TODO: talk about compiler stuff + +Magari prima pattern matching poi compilatore? \end{comment} @@ -253,35 +254,72 @@ inheritance, subtyping and dynamic binding, and modules, that provide a way to encapsulate definitions. Modules are checked statically and can be reifycated through functors. -** Lambda form compilation -\begin{comment} -https://dev.realworld.org/compiler-backend.html -\end{comment} +** Compiling OCaml code - provides compilation in form of a bytecode executable with an -optionally embeddable interpreter and a native executable that could +The OCaml compiler provides compilation of source files in form of a bytecode executable with an +optionally embeddable interpreter or as a native executable that could be statically linked to provide a single file executable. - +Every source file is treated as a separate /compilation unit/ that is +advanced through different states. +The first stage of compilation is the parsing of the input code that +is trasformed into an untyped syntax tree. Code with syntax errors is +rejected at this stage. +After that the AST is processed by the type checker that performs +three steps at once: +- type inference, using the classical /Algorithm W/ +- perform subtyping and gathers type information from the module system +- ensures that the code obeys the rule of the OCaml type system +At this stage, incorrectly typed code is rejected. In case of success, +the untyped AST in trasformed into a /Typed Tree/. After the typechecker has proven that the program is type safe, the compiler lower the code to /Lambda/, an s-expression based language that assumes that its input has already been proved safe. -On the /Lambda/ representation of the source program, the compiler -performs a series of optimization passes before translating the -lambda form to assembly code. +After the Lambda pass, the Lambda code is either translated into +bytecode or goes through a series of optimization steps performed by +the /Flambda/ optimizer before being translated into assembly. +\begin{comment} +TODO: Talk about flambda passes? +\end{comment} -*** OCaml Native Datatypes - -Most native data types in , such as integers, tuples, lists, -records, can be seen as instances of the following definition +This is an overview of the different compiler steps. +[[./files/ocamlcompilation.png]] +** Memory representation of OCaml values +An usual OCaml source program contains few to none explicit type +signatures. +This is possible because of type inference that allows to annotate the +AST with type informations. However, since the OCaml typechecker guarantes that a program is well typed +before being transformed into Lambda code, values at runtime contains +only a minimal subset of type informations needed to distinguish +polymorphic values. +For runtime values, OCaml uses a uniform memory representation in +which every variable is stored as a value in a contiguous block of +memory. +Every value is a single word that is either a concrete integer or a +pointer to another block of memory, that is called /cell/ or /box/. +We can abstract the type of OCaml runtime values as the following: #+BEGIN_SRC -type t = Nil | One of int | Cons of int * t +type t = Constant | Cell of int * t #+END_SRC -that is a type /t/ with three constructors that define its complete -signature. -Every constructor has an arity. Nil, a constructor of arity 0, is -called a constant constructor. +where a one bit tag is used to distinguish between Constant or Cell. +In particular this bit of metadata is stored as the lowest bit of a +memory block. +Given that all the OCaml target architectures guarantee that all +pointers are divisible by four and that means that two lowest bits are +always 00 storing this bit of metadata at the lowest bit allows an +optimization. Constant values in OCaml, such as integers, empty lists, +Unit values and constructors of arity zero (/constant/ constructors) +are unboxed at runtime while pointers are recognized by the lowest bit +set to 0. + + + +** Lambda form compilation +\begin{comment} +https://dev.realworld.org/compiler-backend.html +CITE: realworldocaml +\end{comment} *** Lambda form types A Lambda form target file produced by the compiler consists of a single s-expression. Every s-expression consist of /(/, a sequence of