Generic Lexer
From Gallium
Camlp4 provides a somewhat generic Lexer with an associated Generic Token Type.
This lexer is generic because it only embeds lexical conventions; for instance, it knows nothing about keywords.
These conventions cover basic tokens:
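To illustrate what "knows nothing about keywords" means, here is a minimal OCaml sketch. The `token` type, `classify_word`, and `filter_keywords` are hypothetical and only borrow a few Camlp4-style constructor names; they are not Camlp4's API. The lexer side classifies a word purely by its first character. The sketch assumes the keyword set is supplied by a later stage, which promotes matching identifiers to `KEYWORD`.

```ocaml
(* Hypothetical token type: a tiny subset of Camlp4-style constructor
   names, for illustration only. *)
type token =
  | LIDENT of string
  | UIDENT of string
  | KEYWORD of string

(* Lexer side: only lexical conventions. A word starting with an
   uppercase ASCII letter is a UIDENT; anything else is a LIDENT.
   Assumes [w] is a non-empty identifier. *)
let classify_word (w : string) : token =
  match w.[0] with
  | 'A' .. 'Z' -> UIDENT w
  | _ -> LIDENT w

(* Later stage (hypothetical): the keyword set is supplied from
   outside the lexer, and matching identifiers are promoted. *)
let filter_keywords (keywords : string list) (t : token) : token =
  match t with
  | LIDENT s | UIDENT s when List.mem s keywords -> KEYWORD s
  | other -> other
```

With this split, `classify_word "let"` is just `LIDENT "let"`; it only becomes `KEYWORD "let"` after `filter_keywords ["let"; "in"]`, so the same lexical conventions can serve languages with different keyword sets.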
- LIDENT: lower case identifier.
- UIDENT: upper case identifier.
- Integers and floats:
  - INT: 42, 0xa0, 0XffFFff, 0b1010101, 0O644, 0o644...
  - INT32: 42l, 0xa0l...
  - INT64: 42L, 0xa0L...
  - NATIVEINT: 42n, 0xa0n...
  - FLOAT: 42.5, 1.0, 2.4e32
- Strings and characters with escaping:
  - STRING: "", "foo", "bar\n", "nl=\n, tab=\t, bs=\\ dq=\", sq=', \010, \xa0"...
  - CHAR: 'a', 'B', '\n', '\'', '\\'...
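The suffix rules for integer literals can be made concrete with a short OCaml sketch. The `int_token` type and `classify_int` function are hypothetical; pairing each value with its original spelling is a design choice of the sketch, so a literal such as `0xa0l` can be printed back as written. The digits are converted with the standard library's `int_of_string`, `Int32.of_string`, `Int64.of_string`, and `Nativeint.of_string`, which accept the `0x`/`0X`, `0o`/`0O`, and `0b`/`0B` prefixes. Floats, strings, and chars are left out.

```ocaml
(* Hypothetical token type: each constructor keeps the converted value
   together with the literal's original spelling. *)
type int_token =
  | INT of int * string
  | INT32 of int32 * string
  | INT64 of int64 * string
  | NATIVEINT of nativeint * string

(* Classify a complete integer literal by its suffix:
   l -> INT32, L -> INT64, n -> NATIVEINT, no suffix -> INT.
   The stdlib of_string functions accept the 0x/0X, 0o/0O and 0b/0B
   prefixes, so they handle the digits. Assumes [lit] is a non-empty,
   well-formed literal; otherwise an exception is raised. *)
let classify_int (lit : string) : int_token =
  let n = String.length lit in
  let digits = String.sub lit 0 (n - 1) in
  match lit.[n - 1] with
  | 'l' -> INT32 (Int32.of_string digits, lit)
  | 'L' -> INT64 (Int64.of_string digits, lit)
  | 'n' -> NATIVEINT (Nativeint.of_string digits, lit)
  | _ -> INT (int_of_string lit, lit)
```

For instance, `classify_int "0xa0l"` gives `INT32 (160l, "0xa0l")`, while `classify_int "0O644"` gives `INT (420, "0O644")`.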
It also recognizes more advanced tokens:
- Quotations and Antiquotations:
  - QUOTATION:
    - << foo >>
    - <:quot_name< bar >>
    - <@loc_name< bar >>
    - <:foo@loc_name< bar >>
  - ANTIQUOT:
    - $foo$
    - $anti_name:foo$
    - $`anti_name:foo$
- Symbols and escaped identifiers:
  - SYMBOL: *, +, +++*%, %#@...
  - ESCAPED_IDENT: ( * ), ( ++##> ), ( foo )
- LINE_DIRECTIVE: #line 42, #foo "string"...
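The quotation and antiquotation shapes listed above can be taken apart with plain string operations. This is a minimal OCaml sketch under stated assumptions: the `quotation` record, `parse_quotation`, and `parse_antiquot` are hypothetical names, each input is assumed to be one complete token already delimited by the lexer, and nested quotations are ignored. For the backquoted antiquotation form, the sketch simply keeps the backquote as part of the name; it makes no claim about how Camlp4 interprets it.

```ocaml
(* Hypothetical representation of a lexed quotation; the field names are
   illustrative. An empty string means that part was omitted. *)
type quotation = { name : string; loc : string; contents : string }

(* Split a complete quotation such as <:name@loc< body >> into its parts.
   The header between the leading < and the second < may be empty,
   :name, @loc, or :name@loc. Nested quotations are not handled. *)
let parse_quotation (q : string) : quotation option =
  let n = String.length q in
  if n < 4 || q.[0] <> '<' || String.sub q (n - 2) 2 <> ">>" then None
  else
    match String.index_from_opt q 1 '<' with
    | None -> None
    | Some j ->
      let header = String.sub q 1 (j - 1) in
      (* body: everything after the second < and before the final >> *)
      let contents = String.sub q (j + 1) (n - j - 3) in
      let before_at, loc =
        match String.index_opt header '@' with
        | Some k ->
          ( String.sub header 0 k,
            String.sub header (k + 1) (String.length header - k - 1) )
        | None -> (header, "")
      in
      if before_at = "" then Some { name = ""; loc; contents }
      else if before_at.[0] = ':' then
        Some
          { name = String.sub before_at 1 (String.length before_at - 1);
            loc;
            contents }
      else None

(* Split an antiquotation $name:body$ into (name, body); without a colon
   the name is empty. A leading backquote stays part of the name here. *)
let parse_antiquot (a : string) : (string * string) option =
  let n = String.length a in
  if n < 2 || a.[0] <> '$' || a.[n - 1] <> '$' then None
  else
    let inner = String.sub a 1 (n - 2) in
    match String.index_opt inner ':' with
    | Some k ->
      Some (String.sub inner 0 k,
            String.sub inner (k + 1) (String.length inner - k - 1))
    | None -> Some ("", inner)
```

For example, `parse_quotation "<:foo@loc_name< bar >>"` yields a record with `name = "foo"`, `loc = "loc_name"` and `contents = " bar "`, and `parse_antiquot "$anti_name:foo$"` yields `Some ("anti_name", "foo")`.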
It also emits layout tokens (new in 3.10):
- NEWLINE: a newline
- BLANKS: a run of blank characters other than newlines
- COMMENT: a comment (comments can be nested; strings, chars, and quotations inside a comment must be well terminated)
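The nesting rule for comments can be sketched as a depth counter. The helper below, `comment_end`, is hypothetical and not part of Camlp4: given the position of a `(*` opener, it returns the position just past the matching `*)`, skipping string literals so that a `*)` inside a string does not end the comment. As a stated simplification, it does not treat character literals or quotations specially, even though the rule above says those must be well terminated too.

```ocaml
(* Hypothetical helper: [comment_end s start] expects a comment opener
   at position [start] of [s]. It returns [Some i] where [i] is the index
   just past the matching closer, or [None] if the comment is
   unterminated. Nested comments are tracked with a depth counter, and
   string literals are skipped so that a closer inside a string does not
   end the comment. Character literals and quotations are NOT handled. *)
let comment_end (s : string) (start : int) : int option =
  let n = String.length s in
  (* [skip_string i] scans a string body starting at [i], just after the
     opening double quote; it returns the index after the closing quote. *)
  let rec skip_string i =
    if i >= n then None
    else
      match s.[i] with
      | '\\' -> skip_string (i + 2) (* skip the escaped character *)
      | '"' -> Some (i + 1)
      | _ -> skip_string (i + 1)
  in
  let rec go i depth =
    if i >= n then None
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
      go (i + 2) (depth + 1)
    else if i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then
      (if depth = 1 then Some (i + 2) else go (i + 2) (depth - 1))
    else if s.[i] = '"' then
      (match skip_string (i + 1) with
       | None -> None
       | Some j -> go j depth)
    else go (i + 1) depth
  in
  if start + 1 >= n || s.[start] <> '(' || s.[start + 1] <> '*' then None
  else go start 0
```

On the input `(* a (* b *) "*)" c *) rest`, the inner comment and the `*)` inside the string are both skipped, so the result points just before ` rest`.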