Quotation

A comprehensive story of quotations

Quotations are often considered an overly complicated feature of Camlp4. This section tries to show the opposite: quotations are no more complicated than literal strings with expansion.

Definition: quotations are delimiters around pieces of foreign syntax.

This is often used to extend a host language with an embedded sub-language, without having to change the lexer or to fear introducing new conflicts in the grammar of the host language.

Examples of quotations

String literals can be seen as syntactic sugar for an array of characters. That is not how OCaml actually treats strings, but it is a useful way to get the idea.

"Hello\nWorld!" -> [|'H';'e';'l';'l';'o';'\n';'W';'o';'r';'l';'d';'!'|]

Here the delimiters are the double-quote characters, and the inner language (foreign, embedded...) is made of characters and meta-characters such as the newline character.
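
The desugaring view above can be sketched in plain OCaml. This is only an illustration of the analogy: explode is a hypothetical helper name, and OCaml does not actually represent strings this way.

```ocaml
(* Hypothetical helper: view a string literal as the array of its characters. *)
let explode (s : string) : char array =
  Array.init (String.length s) (String.get s)

(* The "quotation" "Hello\nWorld!" seen as its expanded form. *)
let hello : char array = explode "Hello\nWorld!"
```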

More examples

 (* An SQL query *)
 <:sql< SELECT name,firstname FROM people WHERE favorite_language='OCaml' >>
 
 (* A regular expression *)
 <:re< ('a' | 'b')* 'c' 'd'+ >>
 
 (* A C statement *)
 <:Cstm< while (x > 0) --x >>
 
 (* An OCaml expression *)
 <:expr< List.map (fun x -> x + 1) xs >>
 
 (* A simpler OCaml expression with its translation *)
 <:expr< f x >> ->
   Ast.ExApp(_loc,
     Ast.ExId(_loc,
       Ast.IdLid(_loc,"f")),
     Ast.ExId(_loc,
       Ast.IdLid(_loc,"x")))

In all of these examples, the delimiters follow the same pattern, but the contents are quite different. To support this, the host language can provide named delimiters, whose contents are expanded by a custom function selected by the name of the delimiter.

Again, quotations are just syntax: after expansion, no quotation remains. The quotation expander has to return the AST that replaces the expanded quotation.

Informal Camlp4 syntax of quotations

In Camlp4, the standard lexer recognizes the following informal syntax:

 quotation ::=
      | <<contents>>               (* no name, the simplest form *)
      | <:name<contents>>          (* a name and a contents      *)
      | <:name@loc_name<contents>> (* plus a location name       *)
      | <@loc_name<contents>>      (* just a location name       *)
 
 contents ::=
      | plain-character contents
      | quotation contents
      | (* empty *)

Examples:

 <<foo>>
 <:foo<bar>>
 <@loc<bar>>
 <:foo@loc<bar>>

Analogy with strings: one can consider literal strings as a particular quotation, with "..." as a notation instead of something like <:string<...>>.

The standard lexer produces a single token, QUOTATION, for such a quotation; it takes an argument of type Camlp4.Sig.quotation. The lexer assumes that quotation marks are well balanced. This means that quotations can be nested, although the lexer still delivers the whole outer quotation as a single token, whose contents may later be treated as containing quotations themselves. From the point of view of the parser, a quotation is simply a specific kind of token that holds a string (together with a name for the quotation). The parser is free to do anything with such a token.

Often, when Camlp4 is used as a parser generator for a structured language, the abstract syntax tree of the parsed language does not include nodes for quotations. Instead, the parser looks at the content of quotations and produces "normal" nodes in the AST. Usually, this involves parsing the content of the quotation, possibly, but not necessarily, with the same grammar as the host language.
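
To make the shape of such a token concrete, here is a minimal sketch in plain OCaml that splits a complete quotation into a name, a location name and its contents. The record type and the function parse_quotation are assumptions made for illustration; they are a simplified model, not the definition of Camlp4.Sig.quotation nor the code of the Camlp4 lexer.

```ocaml
(* Simplified, hypothetical model of what a quotation token carries. *)
type quotation = { name : string; loc : string; contents : string }

(* Split a complete quotation such as "<:foo@loc<bar>>" into its parts.
   Returns None when the text does not have the overall quotation shape.
   Nested quotations are simply kept as part of the contents string. *)
let parse_quotation (s : string) : quotation option =
  let n = String.length s in
  if n < 4 || s.[0] <> '<' || String.sub s (n - 2) 2 <> ">>" then None
  else
    match String.index_from_opt s 1 '<' with
    | None -> None
    | Some i ->
      (* header is "", ":name", ":name@loc" or "@loc" *)
      let header = String.sub s 1 (i - 1) in
      let contents = String.sub s (i + 1) (n - i - 3) in
      let name_part, loc =
        match String.index_opt header '@' with
        | None -> (header, "")
        | Some j ->
          ( String.sub header 0 j,
            String.sub header (j + 1) (String.length header - j - 1) )
      in
      if name_part = "" then Some { name = ""; loc; contents }
      else if name_part.[0] = ':' then
        let name = String.sub name_part 1 (String.length name_part - 1) in
        Some { name; loc; contents }
      else None
```

For instance, under these assumptions parse_quotation "<:foo@loc<bar>>" yields the name "foo", the location name "loc" and the contents "bar", while a nested quotation stays inside the contents of the outer one.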

Antiquotations

In many cases, we want to allow the content of a quotation to refer to objects from the host language. Again, the standard lexer in Camlp4 provides some generic support: the so-called antiquotations. Antiquotations are in fact very similar to quotations: they are just named delimiters that produce a token holding their content as a string. The syntax recognized by the lexer is:

 antiquotation ::=
      | $anti_contents$        (* simplest antiquotation, unnamed *)
      | $name:anti_contents$   (* named antiquotation *)
 
 anti_contents ::=
      | plain-character anti_contents
      | quotation anti_contents
      | (* empty *)

Hence antiquotations cannot be nested, but they can contain quotations, even if only for balancing purposes.
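
As a rough sketch of what such a token holds, the following plain OCaml function splits an antiquotation into its optional name and its contents. The function split_antiquotation is hypothetical, and the rule used to detect a name (a non-empty run of identifier characters followed by a colon) is an assumption of this sketch, not a description of the actual Camlp4 lexer.

```ocaml
let is_ident_char (c : char) : bool =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Split "$name:contents$" or "$contents$" into (name, contents).
   The name is "" for an unnamed antiquotation. *)
let split_antiquotation (s : string) : (string * string) option =
  let n = String.length s in
  if n < 2 || s.[0] <> '$' || s.[n - 1] <> '$' then None
  else
    let body = String.sub s 1 (n - 2) in
    let len = String.length body in
    (* index of the first character that cannot be part of a name *)
    let rec ident_end k =
      if k < len && is_ident_char body.[k] then ident_end (k + 1) else k
    in
    let k = ident_end 0 in
    if k > 0 && k < len && body.[k] = ':' then
      Some (String.sub body 0 k, String.sub body (k + 1) (len - k - 1))
    else Some ("", body)
```

With this rule, a quotation appearing inside an antiquotation is kept whole in the contents, because its leading "<" cannot start a name.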

Generally, antiquotations should be rejected from the host language and only recognized by the grammar for the sub-language. But this is purely a matter of convention, and Camlp4 parsers built on top of the standard lexer can just do whatever they want with antiquotation tokens.

Analogy with strings with expansion:

The constructions "..." and #{...} of strings with expansion define one particular quotation language.

Quotations and the OCaml syntax

When Camlp4 is used to parse OCaml syntax (revised or original, maybe extended with some syntax extensions), it will recognize quotations in the source files in position of OCaml expressions, patterns, types, ... (See Syntactic Category for an exhaustive list).

Example:

 1 + <:foo<...>>;;             (* here the quotation is an expression *)
 function <:bar<...>> -> ...;; (* here it's a pattern *)
 type t = <:moo<...>>;;        (* here it's a type *)

Of course, the OCaml AST doesn't allow quotations, so the Camlp4 grammar for OCaml will need to expand those quotations to pieces of OCaml AST. To achieve that, it lets syntax extensions register so-called Quotation Expanders for each kind of Syntactic Category. For instance, a "pattern quotation expander" is an arbitrary function that maps a string to a pattern in the internal Camlp4 representation of the OCaml AST. A Quotation Expander is bound to a name, and the Camlp4 grammar for OCaml will use the quotation name and the current Syntactic Category to select the correct Quotation Expander. For instance, if you register an expression quotation expander named "foo", Camlp4 will be able to parse OCaml sources with quotations like <:foo<...>> in position of an OCaml expression (and the ... will be passed to your Quotation Expander, which is then expected to produce a value of type Ast.expr, that is, the Camlp4 AST representation of an OCaml expression).

Example:

 val my_quotation_expander : string -> Ast.expr

After defining a Quotation Expander, one can register it to treat quotations of a given name.

 (* Merely the type of the registering function *)
 val add_quotation_expander : string -> 'a tag -> (string -> 'a) -> unit;;
 (* This registers an expander for <:foo<...>> in position of expressions. *)
 add_quotation_expander "foo" expr_tag my_quotation_expander;;
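
The selection of an expander by quotation name and Syntactic Category can be modelled in a few lines of plain OCaml. Everything in this sketch is an assumption made for illustration: the toy expr and patt types, the tag constructors and the functions register and expand are hypothetical stand-ins, not Camlp4's implementation.

```ocaml
(* Toy ASTs standing in for Ast.expr and Ast.patt. *)
type expr = EVar of string
type patt = PVar of string

(* One tag per syntactic category; the index is the type the expander returns. *)
type _ tag =
  | Expr_tag : expr tag
  | Patt_tag : patt tag

(* One table of expanders per category, indexed by quotation name. *)
let expr_expanders : (string, string -> expr) Hashtbl.t = Hashtbl.create 8
let patt_expanders : (string, string -> patt) Hashtbl.t = Hashtbl.create 8

(* Bind an expander to a name for the category designated by the tag. *)
let register : type a. string -> a tag -> (string -> a) -> unit =
 fun name tag f ->
  match tag with
  | Expr_tag -> Hashtbl.replace expr_expanders name f
  | Patt_tag -> Hashtbl.replace patt_expanders name f

(* Select the expander from the name and the category, then run it.
   Raises Not_found when nothing is registered for that pair. *)
let expand : type a. string -> a tag -> string -> a =
 fun name tag contents ->
  match tag with
  | Expr_tag -> (Hashtbl.find expr_expanders name) contents
  | Patt_tag -> (Hashtbl.find patt_expanders name) contents

let () =
  register "foo" Expr_tag (fun s -> EVar (String.trim s));
  register "foo" Patt_tag (fun s -> PVar (String.trim s))
```

The same name "foo" is bound twice here, once per category, which mirrors how a quotation <:foo<...>> can mean different things in expression and in pattern position.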

In versions <= 3.09 of Camlp4, there was a single kind of expander, which had to be able to expand both expressions and patterns. Version 3.10 lifts this shortcoming by letting the user provide an expander for any Syntactic Category (types, signature items, structure items...).

The default quotation

There is a default quotation name, which the programmer can choose in order to use a shorter syntax. If we set the default quotation name to "foo", <<bar>> is equivalent to <:foo<bar>>.

The default quotation name can be set in the code of the expander itself:

 Camlp4.PreCast.Syntax.Quotation.default.val := "foo";;

Or in the client code:

 #default_quotation "foo";;
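
The effect of the default name can be pictured with a small model in plain OCaml. The reference default_quotation and the function resolve_name are hypothetical names used only for this sketch; they do not reproduce Camlp4's internals.

```ocaml
(* Model: the default name lives in a mutable reference. *)
let default_quotation : string ref = ref ""

(* An unnamed quotation (empty name) is treated as if it carried the default name. *)
let resolve_name (name : string) : string =
  if name = "" then !default_quotation else name

let () = default_quotation := "foo"
```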

Quotations: a concrete syntax for an object language

Quotations are ideal when dealing with DSLs such as SQL, HTML, XML... They are equally useful when working with programming languages, in a compiler for instance, and more specifically when doing Program Transformation.

To be clear, the most important place where we want to put quotations is in expressions.

In expressions, quotations are expanded to constructors...

In expressions, antiquotations mean...

Let's say that we have a big template of code, where the major part is constant but which still holds some dynamic parts.

Example:

 let instanciate x y z =
   print_expr <<
     let foo a b c = ... in
     let bar a =
       match a with
       | ... -> $x$
       | ...
     in
     (foo $y$, bar $z$)
   >>
 ;;
 instanciate
   (parse_expr argv.(1))
   (parse_expr argv.(2))
   (parse_expr argv.(3))
 ;;
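
To give an idea of what happens to such a template, here is a sketch in plain OCaml of a possible expansion for the last line of the example, (foo $y$, bar $z$). The toy AST and its constructor names are assumptions made for illustration and are much simpler than the Camlp4 AST.

```ocaml
(* Toy expression AST; the constructor names are illustrative, not Camlp4's. *)
type expr =
  | Var of string
  | App of expr * expr
  | Tuple of expr list

(* What a quotation such as << (foo $y$, bar $z$) >> could expand to:
   the constant parts become constructor applications, and each
   antiquotation becomes a reference to the host variable of that name. *)
let instantiate (y : expr) (z : expr) : expr =
  Tuple [ App (Var "foo", y); App (Var "bar", z) ]

(* Minimal printer, only here to inspect the result. *)
let rec to_string (e : expr) : string =
  match e with
  | Var x -> x
  | App (f, a) -> "(" ^ to_string f ^ " " ^ to_string a ^ ")"
  | Tuple es -> "(" ^ String.concat ", " (List.map to_string es) ^ ")"

let example : string =
  to_string (instantiate (Var "a") (App (Var "g", Var "b")))
```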

Then comes the need to deconstruct terms, which arises very quickly when doing Program Transformation. In practice, this leads to defining Quotation Expanders for quotations in pattern position.
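
The following sketch shows, on a toy AST in plain OCaml, the kind of pattern that a quotation in pattern position would stand for. The types and the rewriting rule are hypothetical examples; with a suitable pattern expander, the first case could be written with concrete syntax instead of constructors.

```ocaml
(* Toy expression AST; the constructor names are illustrative. *)
type expr =
  | Var of string
  | App of expr * expr

(* Rewrite every `id e` into `e`.  The first pattern is the hand-written
   form of what a pattern quotation like << id $e$ >> could expand to. *)
let rec simplify (e : expr) : expr =
  match e with
  | App (Var "id", e') -> simplify e'
  | App (f, a) -> App (simplify f, simplify a)
  | Var _ -> e
```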


Locations

The programmer of a Quotation Expander probably wants to insert meaningful location information in generated AST nodes.

In previous versions of Camlp4, the programmer had to assume that parsing started at the beginning of the quotation being expanded. The Camlp4 engine then had to shift all location information so that it was relative to the current point in the whole file. This led to many location bugs in Camlp4, so it was decided that all parsing functions would receive location information about the current position.

Accordingly, the expander receives the current position so that it can pass it on to such a parsing function.

Location names

Real-life programming languages have a representation (for expressions, patterns...) that holds location information precisely indicating the position of each construct in the source file given by the programmer. When building programs using concrete syntax, one therefore needs a way to transmit the location information. In Camlp4, one gives just one location for the whole quotation, including its sub-terms. Historically, this is done by generating occurrences of the free variable _loc (it was just loc before the warning about unused variables).

Example:

 let _loc = ... some location information ... in
 << some term that will be placed at the location given by _loc >>

When expanding a quotation in pattern position, we could generate binding variables to capture locations. However, producing more than one occurrence of _loc yields invalid code, since a variable cannot be bound twice in the same pattern.

Example:

 << f x >> (* This AST is clearly not atomic, and producing more than one _loc will be wrong *)

One can argue that producing at most one _loc (the topmost for instance) would solve the problem. In fact that's not the case:

 match (a,b) with
 | (<< f x >>, << g y >>) -> ...
    (* here, even if each pattern produces at most one _loc, the final pattern has two of them *)
 | ...

In camlp4 <= 3.09, the choice was to produce wildcards (denoted by underscore in OCaml). This was not satisfactory so this limitation has been lifted in 3.10 by introducing location names.

Given a quotation like <:expr@loc<...>>, the author of the Quotation Expander receives the location name and can choose to produce at most one variable with that name. The above example then becomes:

 match (a,b) with
 | (<@loc1< f x >>, <@loc2< g y >>) -> ...
     (* no problem here *)
 | ...

The current set of Camlp4 Quotation Expanders uses the following rule:

  • When no location name is given: Produce wildcards as before.
  • When a name is given: Use that name for the topmost node.
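
The rule above can be illustrated by writing by hand, on a toy located AST in plain OCaml, the kind of pattern that two named quotations would produce. The types and the function top_locations are assumptions made for this sketch and do not reproduce the Camlp4 AST.

```ocaml
(* Toy located AST; the type and constructor names are illustrative. *)
type loc = { line : int; col : int }

type expr =
  | Id of loc * string
  | App of loc * expr * expr

(* Hand-written equivalent of matching (<@loc1< f x >>, <@loc2< g y >>):
   each quotation binds its own location variable on the topmost node and
   uses wildcards for the inner locations.  Naming both variables _loc
   would be rejected, since OCaml forbids binding a variable twice in one
   pattern. *)
let top_locations (a : expr) (b : expr) : (loc * loc) option =
  match (a, b) with
  | ( App (loc1, Id (_, "f"), Id (_, "x")),
      App (loc2, Id (_, "g"), Id (_, "y")) ) -> Some (loc1, loc2)
  | _ -> None
```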

These names also become very useful in expression position, since we then have more than one location variable.

Don't be confused about locations in your expander: there are two levels of locations. There is the current position information, which can be used to produce OCaml AST nodes with meaningful positions. There is also the location variable, which will hold position information (perhaps of a different type if we are dealing with another language). At the level of the expander, that variable is just an identifier, to be used as a free variable when expanding expressions and in binding position when expanding patterns.

Loc me here!

To push things a little further, let's talk about code generation. When generating code that is not easily linked to an input source file, one is a little embarrassed about locations.

Before 3.10 two approaches have been used:

However while generating code, it makes sense to annotate generated nodes with the position in the generator itself. That way when some wrong code is generated, one can also look at the generation site.

Doing so is now feasible in Camlp4 (from 3.10). This is accomplished by using the special location variable 'here'.

Example:

 <:expr@here< f x >>
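
The idea of recording the generation site can be imitated in plain OCaml with the built-in __POS__ value. This is only an analogy and says nothing about how Camlp4 implements 'here'; the toy AST and the helper loc_of_pos are assumptions of this sketch.

```ocaml
(* Toy located AST; the type and constructor names are illustrative. *)
type loc = { file : string; line : int }

type expr =
  | Id of loc * string
  | App of loc * expr * expr

(* Build a location from OCaml's built-in __POS__, which evaluates to
   (file, line, start column, end column) of the place where it is written. *)
let loc_of_pos ((file, line, _, _) : string * int * int * int) : loc =
  { file; line }

(* The generated nodes for `f x` are annotated with the position of this
   very definition in the generator, not with a position in some input file. *)
let generated : expr =
  let here = loc_of_pos __POS__ in
  App (here, Id (here, "f"), Id (here, "x"))
```

When the generated code turns out to be wrong, such a location points back to the generator, which is the purpose described above.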

Quotation Expanders in the real life

The example "Lambda calculus quotations" gives a very concise illustration of registering a Quotation Expander.
