187 Re: What would the wily programmer do?
- Feb 1, 1999
Hey, Janet.
> From: Janet Riley <jriley@...>
> I asked: >>When does it make sense to break parsing up into separate
> I have a text file that contains standard HTML commands, custom commands
> for the Netscape server to handle (delimited by tags), and plain text that
> people will see when they view the page. I want to run the file through
> ANTLR and return a tree that contains the text grouped into related
> chunks. <a href="">stuff</a> would be one group. I want to retain all of
> the text in the file in my tree.

I'm still not sure I understand, having read over the description a couple
of times. What would this file look like?
> Here's where I got into trouble:
> There are three or four different sets of delimiter tags that could be
> used. Rather than type every rule three or four times, I created rules
> called command_open and command_close.
> I found when I expressed this in the lexer that I got ambiguity errors on
> COMMAND_OPEN 'a' vs. the WORD rule, so I decided to stuff all the reserved
> words into the parser with literals, like the HTML example grammar.
> Happy day, it now recognizes all the reserved words. Unfortunately, it
> does not distinguish between reserved words inside my rule and reserved
> words occurring in the plain text. So it recognizes
> command_open "if" WS command_close
> but also recognizes
> "if wishes were horses then beggars would ride"
> and complains that there was an unexpected token IF. I reckon ANTLR made a
> LITERAL_if token, recognized if, did not find a command_close, and
> freaked. The HTML example grammar must work because it includes the
> delimiters in the literal e.g. "<a" .

I'm a little rusty on my ANTLR, but this sounds somewhat reminiscent of my [...]
Have you tried protecting most of the lexer rules and checking for the
delimiters and commands with syntactic predicates?
: (COMMAND_OPEN AREF)=> ...
| (COMMAND_OPEN TABLE)=> ...
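In case the predicate idea is easier to see outside of ANTLR, here is a rough hand-written analogue in Python. It is only a sketch: the "<" delimiter, the command names, and the match_command helper are all made up for illustration, and this is not what ANTLR generates. The point is just "look ahead for delimiter plus command name, and consume input only if both are there".

```python
import re

# Hypothetical delimiter and command names, for illustration only.
COMMAND_OPEN = "<"
COMMANDS = ("a", "table")

_NAME = re.compile(r"[A-Za-z]+")


def match_command(text, pos):
    """Return (command, next_pos) if a command starts at pos, else None.

    Mimics a syntactic predicate: look ahead first, and report a new
    position only when the whole lookahead succeeded. Case matters.
    """
    if not text.startswith(COMMAND_OPEN, pos):
        return None
    m = _NAME.match(text, pos + len(COMMAND_OPEN))
    if m and m.group() in COMMANDS:
        return m.group(), m.end()
    return None
```

A bare word such as "table" in running text returns None here because the delimiter is missing, so it would fall through to whatever rule handles plain words.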
> I realized that there are times I want to say WHEN. When inside
> command_open and command_close, check it against these rules. Case
> matters. When inside a conditional, which is denoted by '(' and ')', '<'
> and '>' do not mean command_open and command_close. When NOT in a
> command, ignore the text. I thought I had done this with the rules, but
> the literal tokens threw a curveball. Filtering out everything but the
> commands, as described in the "Filtering Input Streams" part of the lexer
> documentation, isn't an option because I do want to retain non-commands.

So, if you know your conditionals, do something like:
: (COMMAND_OPEN "if")=> COMMAND_OPEN "if" CONDITIONAL_TEXT
: ( ALPHA | DIGIT | WS | '<' | '>' )+
Of course, most of what I've stuck in here is entirely incomplete, and
mostly culled from the source for one of my lexers.
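For what it's worth, the "WHEN" behaviour described in the quoted text can also be written down as a tiny state-tracking scanner. The Python below is a sketch under assumptions, not a translation of any real grammar: the token kind names, the KEYWORDS set, and the single "<" / ">" delimiter pair are all invented for the example. It promotes a word to a keyword only between the delimiters, treats "<" and ">" as ordinary text inside parentheses, and keeps every character of the input.

```python
import re

KEYWORDS = {"if"}  # hypothetical reserved words; matching is case-sensitive


def tokenize(text, open_delim="<", close_delim=">"):
    """Split text into (kind, value) pairs without dropping any input.

    Words become KEYWORD only between the delimiters. Inside a
    parenthesised conditional the delimiters are plain characters.
    """
    tokens = []
    in_command = False
    paren_depth = 0
    # Words, whitespace runs, or any single other character.
    for m in re.finditer(r"\w+|\s+|.", text, re.S):
        s = m.group()
        if in_command and s == "(":
            paren_depth += 1
            kind = "LPAREN"
        elif in_command and s == ")" and paren_depth > 0:
            paren_depth -= 1
            kind = "RPAREN"
        elif paren_depth > 0:
            kind = "COND_TEXT"  # '<' and '>' land here inside ( ... )
        elif not in_command and s == open_delim:
            in_command = True
            kind = "COMMAND_OPEN"
        elif in_command and s == close_delim:
            in_command = False
            kind = "COMMAND_CLOSE"
        elif in_command and s in KEYWORDS:
            kind = "KEYWORD"
        elif s.isspace():
            kind = "WS"
        else:
            kind = "TEXT"
        tokens.append((kind, s))
    return tokens
```

With this, the "if" in "if wishes were horses" comes out as plain TEXT, while the "if" in "<if (a > b)>" comes out as KEYWORD, and joining the token values back together reproduces the original text.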
Am I overlooking something obvious?