187 Re: What would the wily programmer do?

  • Greg Haverkamp
    Feb 1, 1999
      Hey, Janet.

      > From: Janet Riley <jriley@...>
      >
      > I asked: >>When does it make sense to break parsing up into separate
      > passes?
      >
      > I have a text file that contains standard HTML commands, custom commands
      > for the Netscape server to handle (delimited by tags), and plain text that
      > people will see when they view the page. I want to run the file through
      > ANTLR and return a tree that contains the text grouped into related
      > chunks. <a href="">stuff</a> would be one group. I want to retain all of
      > the text in the file in my tree.

      I'm still not sure I understand, having read over the description a couple
      of times. What would this file look like?

      > Here's where I got into trouble:
      > --------------------------------------
      > There are three or four different sets of delimiter tags that could be
      > used. Rather than type every rule three or four times, I created rules
      > called command_open and command_close.
      >
      > I found when I expressed this in the lexer that I got ambiguity errors on
      > COMMAND_OPEN 'a' vs. the WORD rule, so I decided to stuff all the reserved
      > words into the parser with literals, like the HTML example grammar.
      > Happy day, it now recognizes all the reserved words. Unfortunately, it
      > does not distinguish between reserved words inside my rule and reserved
      > words occurring in the plain text. So it recognizes
      > command_open "if" WS command_close
      > but also recognizes
      > "if wishes were horses then beggars would ride"
      > and complains that there was an unexpected token IF. I reckon ANTLR made a
      > LITERAL_if token, recognized if, did not find a command_close, and
      > freaked. The HTML example grammar must work because it includes the
      > delimiters in the literal e.g. "<a" .

      I'm a little rusty on my ANTLR, but this sounds somewhat reminiscent of my
      last ANTLR project.

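      Have you tried making most of the lexer rules protected and checking for
      the delimiters and commands with syntactic predicates? A protected rule
      can only be invoked from another lexer rule, so it never produces a token
      on its own; a reserved word is then matched as a command only when the
      predicate has already seen COMMAND_OPEN in front of it.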

      COMMAND
      : (COMMAND_OPEN AREF)=> ...
      | (COMMAND_OPEN TABLE)=> ...
      ...
      ;


      protected
      AREF
      : 'a'
      ;

      protected
      TABLE
      : "table"
      ;
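
      To see the same idea outside ANTLR, here is a minimal hand-rolled sketch
      in Python. It is not what ANTLR generates, and the delimiters ('<' and
      '>'), the reserved-word list, the token names, and the tokenize function
      are all assumptions made up for illustration. A reserved word only counts
      as a command when the lookahead sees the opening delimiter directly in
      front of it; everywhere else it falls through to WORD, and every
      character of the input is kept.

```python
import re

# Hypothetical reserved words; the original post does not give the full list.
RESERVED = ("table", "if", "a")

# Acts like the syntactic predicate: the opening delimiter must be followed
# directly by a reserved word, then anything up to the closing delimiter.
COMMAND = re.compile(r"<(%s)\b[^>]*>" % "|".join(RESERVED))
WORD = re.compile(r"[A-Za-z]+")
WS = re.compile(r"\s+")

def tokenize(text):
    """Split text into (kind, value) pairs without dropping any characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try COMMAND first so that "<if ...>" wins over a bare "<".
        for kind, pattern in (("COMMAND", COMMAND), ("WORD", WORD), ("WS", WS)):
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group(0)))
                pos = match.end()
                break
        else:
            # Anything unrecognized is kept as a single-character token.
            tokens.append(("CHAR", text[pos]))
            pos += 1
    return tokens
```

      With this sketch, the "if" in "if wishes were horses" comes back as an
      ordinary WORD, while "<if x>" comes back as a single COMMAND token.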


      > I realized that there are times I want to say WHEN. When inside
      > command_open and command_close, check it against these rules. Case
      > matters. When inside a conditional, which is denoted by '(' and ')', '<'
      > and '>' do not mean command_open and command_close . When NOT in a
      > command, ignore the text. I thought I had done this with the rules, but
      > the literal tokens threw a curveball. Filtering out everything but the
      > commands, as described in the "Filtering Input Streams" part of the lexer
      > documentation, isn't an option because I do want to retain non-commands.

      So, if you know your conditionals, do something like:
      : (COMMAND_OPEN "if")=> COMMAND_OPEN "if" CONDITIONAL_TEXT

      protected
      CONDITIONAL_TEXT
      : ( ALPHA | DIGIT | WS | '<' | '>' )+
      ;
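
      The "inside a conditional, '<' and '>' are not delimiters" part can be
      sketched the same way. Again this is hand-written Python rather than
      ANTLR, and the function name and the choice of '<' and '>' as the
      command delimiters are assumptions for illustration. The scan tracks
      parenthesis depth and accepts a '>' as the command close only when it
      is outside every '(' ... ')' pair.

```python
def find_command_close(text, start):
    """Return the index of the '>' that closes the command opened at
    text[start], or -1 if the command is never closed.

    A '>' inside parentheses belongs to the conditional and is skipped.
    """
    depth = 0
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch == ">" and depth == 0:
            # Only a '>' outside all parentheses ends the command.
            return i
    return -1
```

      For an input such as "<if (a > b)> rest", the first '>' is skipped
      because it sits inside the parentheses, and the second one is reported
      as the close of the command.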


      Of course, most of what I've stuck in here is entirely incomplete, and
      mostly culled from the source for one of my lexers.

      Am I overlooking something obvious?

      Greg