Friday, April 18, 2008

Bison: Actions in Mid-Rule

http://uw714doc.sco.com/cgi-bin/info2html?(bison.info)Mid-Rule%2520Actions&lang=en

Actions in Mid-Rule
-------------------

Occasionally it is useful to put an action in the middle of a rule.
These actions are written just like usual end-of-rule actions, but they
are executed before the parser even recognizes the following components.

A mid-rule action may refer to the components preceding it using
`$N', but it may not refer to subsequent components because it is run
before they are parsed.

The mid-rule action itself counts as one of the components of the
rule. This makes a difference when there is another action later in
the same rule (and usually there is another at the end): you have to
count the actions along with the symbols when working out which number
N to use in `$N'.

The mid-rule action can also have a semantic value. The action can
set its value with an assignment to `$$', and actions later in the rule
can refer to the value using `$N'. Since there is no symbol to name
the action, there is no way to declare a data type for the value in
advance, so you must use the `$<...>' construct to specify a data type
each time you refer to this value.

There is no way to set the value of the entire rule with a mid-rule
action, because assignments to `$$' do not have that effect. The only
way to set the value for the entire rule is with an ordinary action at
the end of the rule.

Here is an example from a hypothetical compiler, handling a `let'
statement that looks like `let (VARIABLE) STATEMENT' and serves to
create a variable named VARIABLE temporarily for the duration of
STATEMENT. To parse this construct, we must put VARIABLE into the
symbol table while STATEMENT is parsed, then remove it afterward. Here
is how it is done:

stmt: LET '(' var ')'
{ $$ = push_context ();
declare_variable ($3); }
stmt { $$ = $6;
pop_context ($5); }

As soon as `let (VARIABLE)' has been recognized, the first action is
run. It saves a copy of the current semantic context (the list of
accessible variables) as its semantic value, using alternative
`context' in the data-type union. Then it calls `declare_variable' to
add the new variable to that list. Once the first action is finished,
the embedded statement `stmt' can be parsed. Note that the mid-rule
action is component number 5, so the `stmt' is component number 6.

After the embedded statement is parsed, its semantic value becomes
the value of the entire `let'-statement. Then the semantic value from
the earlier action is used to restore the prior list of variables. This
removes the temporary `let'-variable from the list so that it won't
appear to exist while the rest of the program is parsed.

Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute
the action. For example, the following two rules, without mid-rule
actions, can coexist in a working parser because the parser can shift
the open-brace token and look at what follows before deciding whether
there is a declaration or not:

compound: '{' declarations statements '}'
| '{' statements '}'
;

But when we add a mid-rule action as follows, the rules become
nonfunctional:

compound: { prepare_for_local_variables (); }
'{' declarations statements '}'
| '{' statements '}'
;

Now the parser is forced to decide whether to run the mid-rule action
when it has read no farther than the open-brace. In other words, it
must commit to using one rule or the other, without sufficient
information to do it correctly. (The open-brace token is what is called
the "look-ahead" token at this time, since the parser is still deciding
what to do about it. Look-Ahead Tokens Look-Ahead.)

You might think that you could correct the problem by putting
identical actions into the two rules, like this:

compound: { prepare_for_local_variables (); }
'{' declarations statements '}'
| { prepare_for_local_variables (); }
'{' statements '}'
;

But this does not help, because Bison does not realize that the two
actions are identical. (Bison never tries to understand the C code in
an action.)

If the grammar is such that a declaration can be distinguished from a
statement by the first token (which is true in C), then one solution
which does work is to put the action after the open-brace, like this:

compound: '{' { prepare_for_local_variables (); }
declarations statements '}'
| '{' statements '}'
;

Now the first token of the following declaration or statement, which
would in any case tell Bison which rule to use, can still do so.

Another solution is to bury the action inside a nonterminal symbol
which serves as a subroutine:

subroutine: /* empty */
{ prepare_for_local_variables (); }
;

compound: subroutine
'{' declarations statements '}'
| subroutine
'{' statements '}'
;

Now Bison can execute the action in the rule for `subroutine' without
deciding which rule for `compound' it will eventually use. Note that
the action is now at the end of its rule. Any mid-rule action can be
converted to an end-of-rule action in this way, and this is what Bison
actually does to implement mid-rule actions.

No comments: