Generic Lexer

From Gallium

Camlp4 provides a somewhat generic Lexer with an associated Generic Token Type.

This lexer is generic in that it embeds only lexical conventions; for instance, it knows nothing about keywords.

These conventions include the basic tokens:

  • LIDENT: lower case identifier.
  • UIDENT: upper case identifier.
  • Integers, and floats:
    • INT: 42, 0xa0, 0XffFFff, 0b1010101, 0O644, 0o644...
    • INT32: 42l, 0xa0l...
    • INT64: 42L, 0xa0L...
    • NATIVEINT: 42n, 0xa0n...
    • FLOAT: 42.5, 1.0, 2.4e32
  • Strings and characters with escaping:
    • STRING: "", "foo", "bar\n", "nl=\n, tab=\t, bs=\\ dq=\", sq=', \010, \xa0"...
    • CHAR: 'a', 'B', '\n', '\''...
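
The suffix and prefix conventions for numeric literals listed above can be illustrated with a small standalone sketch. It does not use Camlp4 itself: the `num_lit` type and the `classify` function are hypothetical names introduced here, and the conversion is delegated to the OCaml standard library (`int_of_string_opt` and friends, available since OCaml 4.05), which accepts the same 0x/0X, 0o/0O and 0b/0B prefixes.

```ocaml
(* Hypothetical variant: one constructor per numeric token kind listed
   above. This is an illustration, not the Camlp4 token type. *)
type num_lit =
  | INT of int
  | INT32 of int32
  | INT64 of int64
  | NATIVEINT of nativeint
  | FLOAT of float

(* Classify a numeric literal by its last character:
   'l' -> INT32, 'L' -> INT64, 'n' -> NATIVEINT, otherwise INT or FLOAT.
   Returns None when the text is not a valid literal of that kind. *)
let classify (s : string) : num_lit option =
  let n = String.length s in
  if n = 0 then None
  else
    let body = String.sub s 0 (n - 1) in
    match s.[n - 1] with
    | 'l' ->
      (match Int32.of_string_opt body with
       | Some i -> Some (INT32 i)
       | None -> None)
    | 'L' ->
      (match Int64.of_string_opt body with
       | Some i -> Some (INT64 i)
       | None -> None)
    | 'n' ->
      (match Nativeint.of_string_opt body with
       | Some i -> Some (NATIVEINT i)
       | None -> None)
    | _ ->
      (* No suffix: try a plain integer first, then a float. *)
      (match int_of_string_opt s with
       | Some i -> Some (INT i)
       | None ->
         (match float_of_string_opt s with
          | Some f -> Some (FLOAT f)
          | None -> None))
```

For example, `classify "0xa0L"` yields `Some (INT64 160L)` and `classify "0b1010101"` yields `Some (INT 85)`. The standard library converters are somewhat more permissive than a real lexer (they also accept forms such as `nan`), so this is only an approximation of the lexical rules.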

They also include more advanced tokens:

  • Quotations and Antiquotations:
    • QUOTATION:
      • << foo >>
      • <:quot_name< bar >>
      • <@loc_name< bar >>
      • <:foo@loc_name< bar >>
    • ANTIQUOT:
      • $foo$
      • $anti_name:foo$
      • $`anti_name:foo$
  • Symbols and escaped identifiers:
    • SYMBOL: *, +, +++*%, %#@...
    • ESCAPED_IDENT: ( * ), ( ++##> ), ( foo )
  • LINE_DIRECTIVE: #line 42, #foo "string"...
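
The four quotation openers shown above differ only in the text between the two `<` characters: nothing, `:name`, `@loc`, or `:name@loc`. A minimal sketch of how such a header could be split is given below. The `quot_header` record and the `parse_header` function are hypothetical names, not part of Camlp4, and the sketch does not check that the names are valid identifiers.

```ocaml
(* Hypothetical record; the field names are illustrative only. *)
type quot_header = { name : string option; loc : string option }

(* [h] is the text found between the two '<' of a quotation opener:
   ""  ":quot_name"  "@loc_name"  ":foo@loc_name"
   Returns None when the header does not have one of these shapes. *)
let parse_header (h : string) : quot_header option =
  let n = String.length h in
  if n = 0 then Some { name = None; loc = None }        (* << ... >> *)
  else
    let rest = String.sub h 1 (n - 1) in
    match h.[0] with
    | '@' when rest <> "" -> Some { name = None; loc = Some rest }
    | ':' ->
      (match String.index_opt rest '@' with
       | None when rest <> "" -> Some { name = Some rest; loc = None }
       | Some i when i > 0 && i < String.length rest - 1 ->
         let l = String.sub rest (i + 1) (String.length rest - i - 1) in
         Some { name = Some (String.sub rest 0 i); loc = Some l }
       | _ -> None)
    | _ -> None
```

With this sketch, `parse_header ":foo@loc_name"` gives `Some { name = Some "foo"; loc = Some "loc_name" }`, while a malformed header such as `":"` gives `None`.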

The lexer also produces layout tokens (new in 3.10):

  • NEWLINE: a newline
  • BLANKS: a run of blank characters other than newlines
  • COMMENT: a comment (comments can be nested; strings, characters and quotations inside a comment must be properly terminated)
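
The nesting rule for comments can be sketched as a depth counter. The function below is a standalone illustration, not the Camlp4 implementation: `comment_end` is a hypothetical name, and for brevity it only tracks nested comments and string literals, ignoring the character and quotation cases mentioned above.

```ocaml
(* Return the index just past the "*)" that closes the comment opened at
   [start] (which must point at "(*"). Returns None if the comment, or a
   string literal inside it, is not terminated. *)
let comment_end (s : string) (start : int) : int option =
  let n = String.length s in
  (* Scan a string literal body starting at [i], just after its opening
     quote. A backslash escapes the following character. *)
  let rec skip_string i =
    if i >= n then None
    else
      match s.[i] with
      | '"' -> Some (i + 1)
      | '\\' -> skip_string (i + 2)
      | _ -> skip_string (i + 1)
  in
  (* [depth] counts the comments currently open. *)
  let rec go i depth =
    if i >= n then None
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
      go (i + 2) (depth + 1)
    else if i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then
      if depth = 1 then Some (i + 2) else go (i + 2) (depth - 1)
    else if s.[i] = '"' then
      (match skip_string (i + 1) with
       | Some j -> go j depth
       | None -> None)
    else go (i + 1) depth
  in
  if start >= 0 && start + 1 < n && s.[start] = '(' && s.[start + 1] = '*'
  then go (start + 2) 1
  else None
```

Note that a `*)` occurring inside a string literal does not close the comment, which is why an unterminated string inside a comment makes the whole comment unterminated.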