OCaml code generation tutorial
From Gallium
This page deals with OCaml code generation in Camlp4. It is recommended that you read the Syntax extension tutorial first (if this is not already done).
Contents |
Aim
What we want to do here is generate OCaml code from a DSL (Domain Specific Language) that allows easy manipulation of vectors using the following syntax:
- 42 is the scalar 42.
- [1, 2., 3] is the list [ 1. ; 2. ; 3. ]
- a + b is the sum of the two vectors a and b.
- a * b is the scalar multiplication of the vector b by the scalar a.
First step: a syntax extension
Just like in the Syntax extension tutorial, we will start from an empty grammar, and use original syntax. Another tutorial will deal with extending OCaml syntax.
So we start with the following syntax extension:
Header
open Camlp4.PreCast
type vec =
| Scalar of string
| Vector of string list
| Sum of vec * vec
| ScalarProduct of vec * vec
let expression = Gram.Entry.mk "expression"
We define our vector type as being either a scalar, a vector, a sum of two vectors, or a scalar product. As before, we have our global expression entry "expression".
EXTEND statement
EXTEND Gram
GLOBAL: expression;
expression:
[ "sum" LEFTA
[ x = SELF; "+"; y = SELF -> Sum (x, y) ]
| "scalar" LEFTA
[ x = SELF; "*"; y = SELF -> ScalarProduct (x, y) ]
| "simple" NONA
[ "("; e = SELF; ")" -> e
| s = scalar -> Scalar s
| v = vector -> v ] ];
scalar:
[ [ `INT (i, _) -> string_of_float (float i)
| `FLOAT (_, f) -> f ] ];
vector:
[ [ "["; v = LIST1 [ s = scalar -> s ] SEP ","; "]" -> Vector v ] ];
END
In addition to what we've seen in the Syntax extension tutorial, we have `FLOAT (whose meaning should not be too hard to guess), and LIST1. LIST1 will try to build a (non-empty) list of "scalar"s separated by ",". LIST0 does the same, but allows an empty list as well.
Please note that `INT and `FLOAT have two arguments: the first one is the interpreted content (an "int" for `INT and a "float" for `FLOAT), and the second one is the un-interpreted content, as a string. The string is useful when you want to keep the number the same way you entered it: "3.0000" will be printed "3.0000", not "3.". But here, since we convert integers to float, we have a mixed solution.
Defining the location
let _loc = Loc.mk "<string>"
A location is necessary in order to build syntactic trees. In our small example we don't need to have several locations so we create one named "_loc" as a function. When calling a tree builder function without giving a location in parameter, "_loc" will be automatically given.
Parsing
let parse_and_generate_code str =
let e = Gram.parse_string expression _loc str in
(* generate_code *) e
We will define the "generate_code" function in the second part of this tutorial. Then you will need to remove the comment. Note that we have to explicitly give the location in parameter of the "Gram.parse_string" function.
Main
let _ =
print_string "# ";
let str = read_line () in
let e = parse_and_generate_code str in
ignore (e)
This main will be modified in order to display the syntactic tree that we are going to build.
Ok, now let's move on!
Second step: generating code from our DSL AST
Here comes the interesting things. Once we have defined our grammar and our parser we can start generating code. Our parser returns a structure of imbricated vec so we just have to filter this structure and create the AST while doing this.
Let's complete our main function and then implement the "generate_code" function.
The main branch
We want the code that we are going to generate to be executable in byte code so we will create a "let _" function. In the quotation syntax (see Abstract Syntax Tree) the top level items such as functions, modules, types, classes, etc... are represented by a <:str_item < .. >> quotation. We can put whatever we want in place of the dots : other AST branchs (between $) or text that does not depend on the parsed expression.
Note that if you want to create several top level items in your generated code you will have to make a list of <:str_item< .. >> quotations.
So here is our completed main :
let _ =
print_string "# ";
let str = read_line () in
let e = parse_and_generate_code str in
let ast_e = <:expr< $e$ >> in
Camlp4.PreCast.Printers.OCaml.print_implem <:str_item< let _ = let res = $ast_e$ in print_float res >>
Let's detail this a little :
- We first create a "ast_e" variable which will contain the AST generated by the "generate_code" function (the AST of our parsed expression)
- The "Camlp4.PreCast.Printers.OCaml.print_implem" function allows to print AST contained in a <:str_item< .. >> quotation.
- In our <:str_item< .. >> quotation we create the "let _" main function. We also create a "res" variable whose expression will be the content of our "ast_e" variable, i.e. the AST of our parsed expression.
The generate_code function
This function must obviously be defined before the "parse_and_generate_code" function which calls it in order to avoid the "unbound variable" error.
Since we only parse expressions, we will only need the <:expr< .. >> quotation.
let rec generate_code = function
| Scalar s -> <:expr< $flo:s$ >>
$flo:s$ means the value of "s" as float
| Vector vlist -> List.fold_right
(fun x l -> <:expr< $flo:x$ :: $l$ >>)
vlist
<:expr< [] >>
This creates the expression of the float list representing the vector. The way of creating an AST representing a list is a little complicated. I used the warning found in the Ocaml documentation http://caml.inria.fr/pub/old_caml_site/camlp4/manual/manual010.html#toc36
| Sum (v1, v2) -> let ast_v1 = generate_code v1
and ast_v2 = generate_code v2
in
(match (v1, v2) with
| Scalar _, Scalar _ -> <:expr< ( +. ) $ast_v1$ $ast_v2$ >>
| _, _ -> <:expr< List.map2 ( +. ) $ast_v1$ $ast_v2$ >>)
We recursively generate the AST for the Sum expression parameters and we check if we want to sum 2 scalars or 2 things asserted to be vectors. In the first case we generate the code for a simple addition whereas in the other case we use the "List.map2" function in order to add the 2 vectors. Remark : if we want to sum a scalar and a vector the code will be generated with no error but it will crash when compiling ("this function is float but is here used with float list message)
| ScalarProduct (v1, v2) ->let ast_v1 = generate_code v1
and ast_v2 = generate_code v2
in
(match (v1, v2) with
| Scalar _, Scalar _ -> <:expr< ( *. ) $ast_v1$ $ast_v2$ >>
| Scalar _, _ -> <:expr< List.map ( fun a -> a *. $ast_v1$ ) $ast_v2$ >>
| _, Scalar _ -> <:expr< List.map ( fun a -> a *. $ast_v2$ ) $ast_v1$ >>
| _, _ -> <:expr< List.fold_right
( +. )
(List.map2 ( *. ) $ast_v1$ $ast_v2$ )
0. >>)
This will do nearly the same thing as above for ScalarProduct expressions.
Compiling and testing
Since we use original syntax, it is camlp4of:
ocamlc -I +camlp4 -pp camlp4of.opt camlp4lib.cma pa_vector.ml
camlp4 -parser pa_vector.cmo
Just type in the expression you want to parse after the "#".
To go further
See Abstract Syntax Tree for a complete list of quotations, their signification and equivalent AST node.