Friday, February 8, 2008

Flex

flex for lexical analysis

http://flex.sourceforge.net/

http://www.gnu.org/software/flex/manual/

In the rules section, any indented or %{ %} enclosed text appearing before the first rule may be used to declare variables which are local to the scanning routine and (after the declarations) code which is to be executed whenever the scanning routine is entered. Other indented or %{ %} text in the rule section is still copied to the output, but its meaning is not well-defined and it may well cause compile-time errors (this feature is present for POSIX compliance; see the "Lex and Posix" section of the flex manual for other such features).

Any indented text or text enclosed in `%{' and `%}' is copied verbatim to the output (with the %{ and %} symbols removed). The %{ and %} symbols must appear unindented on lines by themselves.
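As a rough illustration of the two kinds of %{ %} blocks described above, here is a minimal sketch of a scanner that counts words. The variable name words_this_call and the word pattern are my own choices for the example, not anything from the manual.

```lex
%option noyywrap
%{
/* Definitions section: copied verbatim to the top of the generated C file. */
#include <stdio.h>
%}

%%
%{
    /* Before the first rule: this declaration becomes a local variable
       of yylex(), initialized each time yylex() is entered. */
    int words_this_call = 0;   /* hypothetical name, for illustration */
%}
[A-Za-z]+   { words_this_call++; }
.|\n        { /* ignore everything else */ }
<<EOF>>     { printf("%d\n", words_this_call); return 0; }
%%

int main(void)
{
    return yylex();
}
```

Assuming the file is saved as count.l, something like `flex count.l && cc lex.yy.c -o count` should build it, and `./count < somefile` prints the number of words read from standard input.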

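When specifying a lexer, place the patterns from the more restrictive to the more general. Flex always prefers the longest match, but when two patterns match the same amount of text, the rule listed first wins.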
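A small sketch of why the order matters, assuming a toy language where "if" is the only keyword (the token labels printed here are made up for the example):

```lex
%option noyywrap
%{
#include <stdio.h>
%}

%%
"if"                     { printf("KEYWORD(%s)\n", yytext); }
[A-Za-z_][A-Za-z0-9_]*   { printf("IDENT(%s)\n", yytext); }
[ \t\n]+                 { /* skip whitespace */ }
.                        { printf("OTHER(%s)\n", yytext); }
%%

int main(void)
{
    return yylex();
}
```

On the input "if" both of the first two patterns match two characters, so the earlier keyword rule is chosen. On "iffy" the identifier pattern matches four characters and wins because it is the longer match. If the two rules were swapped, the identifier rule would always take the tie and the keyword rule could never fire.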

`\X'
if X is `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C interpretation of `\X'. Otherwise, a literal `X' (used to escape operators such as `*')

Inside a character class, all regular expression operators lose their special meaning except escape (`\') and the character class operators, `-', `]', and, at the beginning of the class, `^'.


How do I expand backslash-escape sequences in C-style quoted strings?

A key point when scanning quoted strings is that you cannot (easily) write a single rule that will precisely match the string if you allow things like embedded escape sequences and newlines. If you try to match strings with a single rule then you'll wind up having to rescan the string anyway to find any escape sequences.

Instead you can use exclusive start conditions and a set of rules, one for matching non-escaped text, one for matching a single escape, one for matching an embedded newline, and one for recognizing the end of the string. Each of these rules is then faced with the question of where to put its intermediary results. The best solution is for the rules to append their local value of yytext to the end of a “string literal” buffer. A rule like the escape-matcher will append to the buffer the meaning of the escape sequence rather than the literal text in yytext. In this way, yytext does not need to be modified at all.
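Here is a minimal sketch of that approach, loosely modeled on the string example in the start conditions chapter of the flex manual. The names STR, buf, append and MAX_STR are my own, the fixed-size buffer is an assumption made to keep the example short, and only \n and \t are expanded specially (any other escaped character is taken literally).

```lex
%option noyywrap
%x STR
%{
#include <stdio.h>
#include <string.h>

#define MAX_STR 1024              /* hypothetical limit for this sketch */
static char buf[MAX_STR];         /* the "string literal" buffer */
static size_t len;                /* number of bytes currently in buf */

/* Append n bytes to the buffer, silently truncating on overflow. */
static void append(const char *s, size_t n)
{
    if (len + n >= MAX_STR)
        n = MAX_STR - 1 - len;
    memcpy(buf + len, s, n);
    len += n;
}

static void append_char(char c) { append(&c, 1); }
%}

%%
\"                { len = 0; BEGIN(STR); }
<STR>\"           { buf[len] = '\0'; printf("STRING: %s\n", buf); BEGIN(INITIAL); }
<STR>\n           { fprintf(stderr, "unterminated string\n"); BEGIN(INITIAL); }
<STR>\\n          { append_char('\n'); }
<STR>\\t          { append_char('\t'); }
<STR>\\\n         { /* backslash-newline: continuation, append nothing */ }
<STR>\\.          { append_char(yytext[1]); }
<STR>[^\\\n\"]+   { append(yytext, (size_t)yyleng); }
.|\n              { /* ignore text outside strings */ }
%%

int main(void)
{
    return yylex();
}
```

Each rule in the STR condition only appends to buf, so yytext is never modified. The specific escape rules come before the catch-all `\\.` rule because both match two characters and the earlier rule wins the tie.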


