Re: What would the wily programmer do?

  • Sean Fausett
    Message 1 of 4 , Feb 1, 1999
      I have a similar problem, which I solved in flex by using rules with
      'state' prefixes. It would be nice if ANTLR had a similar feature.

      Maybe if you could write separate lexer grammars and switch them on the
      fly...
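
To make the state-prefix idea concrete, here is a minimal, language-neutral sketch in Python. It is neither flex nor ANTLR code. The state names, the '<' and '>' delimiters, the rule tables, and the `tokenize` helper are all assumptions made for illustration. Each state has its own rule list, and a rule may name the state to switch to after it matches, which is roughly what flex state prefixes give you.

```python
import re

# Hypothetical rule tables, one per lexer state.
# Each rule is (pattern, token type, state to switch to or None).
RULES = {
    "TEXT": [
        (re.compile(r"<"), "COMMAND_OPEN", "COMMAND"),   # enter command state
        (re.compile(r"[^<]+"), "TEXT", None),            # plain text is kept
    ],
    "COMMAND": [
        (re.compile(r">"), "COMMAND_CLOSE", "TEXT"),     # back to plain text
        (re.compile(r"\s+"), "WS", None),
        (re.compile(r"if\b"), "IF", None),               # keyword only here
        (re.compile(r"[A-Za-z]+"), "WORD", None),
    ],
}


def tokenize(source):
    """Return (type, text) pairs, trying only the rules of the current state."""
    state, pos, tokens = "TEXT", 0, []
    while pos < len(source):
        for pattern, kind, next_state in RULES[state]:
            match = pattern.match(source, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                if next_state:
                    state = next_state
                break
        else:
            raise ValueError(f"no rule matches at offset {pos} in state {state}")
    return tokens
```

With these assumed rules, `tokenize("if wishes <if >")` yields a single TEXT token for "if wishes " and an IF token only for the "if" between the delimiters, so reserved words in plain text are never treated as keywords.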

      Cheers, Sean.

      -----Original Message-----
      From: Janet Riley [mailto:jriley@...]
      Sent: Saturday, 30 January 1999 04:28
      To: antlr-interest@onelist.com
      Subject: [antlr-interest] Re: What would the wily programmer do?


      From: Janet Riley <jriley@...>

      [...]
      I realized that there are times I want to say WHEN. When inside
      command_open and command_close, check it against these rules. Case
      matters. When inside a conditional, which is denoted by '(' and ')',
      '<' and '>' do not mean command_open and command_close. When NOT in a
      command, ignore the text. I thought I had done this with the rules, but
      the literal tokens threw a curveball. Filtering out everything but the
      commands, as described in the "Filtering Input Streams" part of the
      lexer documentation, isn't an option because I do want to retain
      non-commands.
      [...]
    • Greg Haverkamp
      Message 2 of 4 , Feb 1, 1999
        Hey, Janet.

        > From: Janet Riley <jriley@...>
        >
        > I asked: >>When does it make sense to break parsing up into separate
        > passes?
        >
        > I have a text file that contains standard HTML commands, custom commands
        > for the Netscape server to handle (delimited by tags), and plain text that
        > people will see when they view the page. I want to run the file through
        > ANTLR and return a tree that contains the text grouped into related
        > chunks. <a href="">stuff</a> would be one group. I want to retain all of
        > the text in the file in my tree.

        I'm still not sure I understand, having read over the description a couple
        of times. What would this file look like?

        > Here's where I got into trouble:
        > --------------------------------------
        > There are three or four different sets of delimiter tags that could be
        > used. Rather than type every rule three or four times, I created rules
        > called command_open and command_close.
        >
        > I found when I expressed this in the lexer that I got ambiguity errors on
        > COMMAND_OPEN 'a' vs. the WORD rule, so I decided to stuff all the reserved
        > words into the parser with literals, like the HTML example grammar.
        > Happy day, it now recognizes all the reserved words. Unfortunately, it
        > does not distinguish between reserved words inside my rule and reserved
        > words occurring in the plain text. So it recognizes
        > command_open "if" WS command_close
        > but also recognizes
        > "if wishes were horses then beggars would ride"
        > and complains that there was an unexpected token IF. I reckon ANTLR made a
        > LITERAL_if token, recognized if, did not find a command_close, and
        > freaked. The HTML example grammar must work because it includes the
        > delimiters in the literal e.g. "<a" .

        I'm a little rusty on my antlr, but this sounds somewhat reminiscent of my
        last antlr project.

        Have you tried protecting most of the lexer rules and checking for the
        delimiters and commands with syntactic predicates?

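        // Each alternative is taken only if its syntactic predicate matches.
        COMMAND
            : (COMMAND_OPEN AREF)=> ...
            | (COMMAND_OPEN TABLE)=> ...
            ...
            ;

        // Protected rules are helpers called from other lexer rules;
        // they are not returned to the parser as tokens of their own.
        protected
        AREF
            : 'a'
            ;

        protected
        TABLE
            : "table"
            ;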
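
The lookahead that a syntactic predicate performs can be sketched outside ANTLR as well. The Python below is an illustration only, not what ANTLR generates: the '<' delimiter, the `KEYWORDS` table, and the `match_command` and `next_token` helpers are assumptions. A keyword counts as a command only when it directly follows the opening delimiter, and everything else is kept as text. ANTLR re-matches the input after a predicate succeeds; this sketch simply reuses the speculative match.

```python
import re

COMMAND_OPEN = re.compile(r"<")               # assumed opening delimiter
NAME = re.compile(r"[A-Za-z]+")
KEYWORDS = {"a": "AREF", "table": "TABLE"}    # mirrors the protected rules


def match_command(source, pos):
    """Speculatively match COMMAND_OPEN followed by a keyword at pos.

    Returns (token type, end offset) on success, or None if this is not
    a command. Matching is case-sensitive.
    """
    opened = COMMAND_OPEN.match(source, pos)
    if not opened:
        return None
    name = NAME.match(source, opened.end())
    if not name or name.group() not in KEYWORDS:
        return None
    return KEYWORDS[name.group()], name.end()


def next_token(source, pos):
    """Return ((type, text), new offset) for the token starting at pos."""
    guess = match_command(source, pos)
    if guess is not None:
        kind, end = guess
        return (kind, source[pos:end]), end
    # Not a command: keep the text up to the next possible delimiter.
    end = source.find("<", pos + 1)
    if end == -1:
        end = len(source)
    return ("TEXT", source[pos:end]), end
```

Under these assumptions, the words "a" and "table" in running text come back as TEXT, while "<table" is reported as TABLE.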


        > I realized that there are times I want to say WHEN. When inside
        > command_open and command_close, check it against these rules. Case
        > matters. When inside a conditional, which is denoted by '(' and ')', '<'
        > and '>' do not mean command_open and command_close . When NOT in a
        > command, ignore the text. I thought I had done this with the rules, but
        > the literal tokens threw a curveball. Filtering out everything but the
        > commands, as described in the "Filtering Input Streams" part of the lexer
        > documentation, isn't an option because I do want to retain non-commands.

        So, if you know your conditionals, do something like:
        : (COMMAND_OPEN "if")=> COMMAND_OPEN "if" CONDITIONAL_TEXT

        protected
        CONDITIONAL_TEXT
        : ( ALPHA | DIGIT | WS | '<' | '>' )+
        ;
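
As a rough check of the idea behind CONDITIONAL_TEXT, the Python sketch below scans a parenthesised conditional in which '<' and '>' are ordinary characters. It assumes the conditional is delimited by '(' and ')', as described in the quoted text, and that ALPHA, DIGIT and WS mean ASCII letters, digits and whitespace. The `scan_conditional` helper is hypothetical.

```python
import re

# Assumed equivalent of ( ALPHA | DIGIT | WS | '<' | '>' )+
CONDITIONAL_TEXT = re.compile(r"[A-Za-z0-9\s<>]+")


def scan_conditional(source, pos):
    """Scan '(' CONDITIONAL_TEXT ')' starting at pos.

    Returns (body text, end offset), or None if no well-formed
    conditional starts at pos.
    """
    if not source.startswith("(", pos):
        return None
    body = CONDITIONAL_TEXT.match(source, pos + 1)
    if not body or not source.startswith(")", body.end()):
        return None
    return body.group(), body.end() + 1
```

Because '<' and '>' are consumed as part of the body, a caller that uses this scanner while inside a conditional never sees them as command_open or command_close.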


        Of course, most of what I've stuck in here is entirely incomplete, and
        mostly culled from the source for one of my lexers.

        Am I overlooking something obvious?

        Greg