RE: [antlr-interest] column-sensitive grammars

  • Millaway, John
    Message 1 of 8 , Sep 6, 2000
      > That works if field name and field value are the same token, however, in my
      > case they are not. The field name has to be more restricted than the value
      > field.
      >

      Something like this, maybe?

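      class FooParser extends Parser;
      line:
      NAME COLON (NAME|VALUE) EOL ;

      class FooLexer extends Lexer;
      // NAME and VALUE have no rules of their own; $setType needs them declared.
      tokens { NAME; VALUE; }

      // Names are strictly uppercase, values are anycase.
      // Look ahead over the word: any lowercase letter makes the whole word a VALUE.
      NAME_VALUE:
      (('A'..'Z')* 'a'..'z') => ('A'..'Z'|'a'..'z')+ { $setType(VALUE); }
      | ('A'..'Z')+ { $setType(NAME); }
      ;

      COLON: ':' ;
      EOL: ('\r')? '\n' ;
      WS: (' '|'\t')+ { $setType(Token.SKIP); } ;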
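The case rule in `NAME_VALUE` comes down to a single test over a whole word. Below is a plain-Java sketch of that test. It is not ANTLR-generated code, and `WordClassifier`, `classify` and `Type` are hypothetical names.

```java
// Hypothetical plain-Java helper, not ANTLR output: decides NAME vs. VALUE
// for one word using "names are strictly uppercase, values are anycase".
public class WordClassifier {
    enum Type { NAME, VALUE }

    // Scan the entire word before deciding, so a mixed-case word such as
    // "Foo" is one VALUE rather than an uppercase prefix plus a remainder.
    static Type classify(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (Character.isLowerCase(word.charAt(i))) {
                return Type.VALUE;   // any lowercase letter makes it a value
            }
        }
        return Type.NAME;            // only uppercase letters: a name
    }

    public static void main(String[] args) {
        for (String w : new String[] {"SUBJECT", "Foo", "hello"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

An all-uppercase value such as `SUBJECT` still classifies as NAME. That is why the parser rule accepts `(NAME|VALUE)` after the colon.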
    • Geoff Hardy
      Message 2 of 8 , Sep 6, 2000
        I found another way to disambiguate. In lexer:

        NAME: ('a'..'z'|'A'..'Z'|'-'|'_')+;

        VALUE: ':'! ( ~('#'|'\r'|'\n') )* EOL;

        In parser:

        field: NAME VALUE;
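Below is a rough plain-Java equivalent of those two lexer rules, assuming one field per line. The class `FieldLineLexer` and its methods are hypothetical, not ANTLR output. It shows why the rules disambiguate: VALUE only starts at the colon, so name-like text after the colon never becomes a NAME.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not ANTLR output) of the two lexer rules above:
//   NAME  : ('a'..'z'|'A'..'Z'|'-'|'_')+ ;
//   VALUE : ':'! ( ~('#'|'\r'|'\n') )* EOL ;
public class FieldLineLexer {
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "[" + text + "]"; }
    }

    static boolean isNameChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '-' || c == '_';
    }

    // Lexes one "Name: value" line (ending in "\n" or "\r\n") into NAME and VALUE.
    static List<Token> lexLine(String line) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length() && isNameChar(line.charAt(i))) i++;
        if (i == 0) throw new IllegalArgumentException("expected NAME at 0");
        tokens.add(new Token("NAME", line.substring(0, i)));

        if (i >= line.length() || line.charAt(i) != ':')
            throw new IllegalArgumentException("expected ':' at " + i);
        i++;                       // ':' is matched but kept out of the text, like ':'!
        int start = i;
        // VALUE stops at '#', '\r' or '\n', as in the ~('#'|'\r'|'\n') loop
        while (i < line.length() && "#\r\n".indexOf(line.charAt(i)) < 0) i++;
        // (the ANTLR rule would also keep the EOL characters in VALUE's text;
        // this sketch leaves them out)
        tokens.add(new Token("VALUE", line.substring(start, i)));

        // VALUE must end with EOL; a '#' left here is an error, as in the grammar
        if (!line.startsWith("\n", i) && !line.startsWith("\r\n", i))
            throw new IllegalArgumentException("expected EOL at " + i);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lexLine("Field-Name: Field-Value\n"));
    }
}
```

`lexLine("Field-Name: Field-Value\n")` yields NAME `Field-Name` and VALUE ` Field-Value`. The leading space stays because neither rule skips whitespace.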

        I would still like to know how to match the beginning of the line, if there
        is an easy way. Should I keep track of column numbers? What a pain. Is
        there an easier way?

        Thanks.

        Geoff Hardy

        > -----Original Message-----
        > From: Millaway, John [mailto:john@...]
        > Sent: Wednesday, September 06, 2000 6:39 PM
        > To: antlr-interest@egroups.com
        > Subject: RE: [antlr-interest] column-sensitive grammars
      • Ken Lidster
        Message 3 of 8 , Sep 6, 2000
          Geoff,

          You might try treating a line as a logical unit, with EOL a valid
          terminating token. You could then multiplex a couple of different lexers,
          make sure that the beginning-of-line lexer is the initial one, switch to the
          other lexer when you get a colon, and reset back to the BOL lexer on the EOL
          token.
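The mode switch described above can be sketched in plain Java as one scanner with a mode flag standing in for the two lexers. The sketch assumes lines end in `\n` and names are made of letters, `-` and `_`. `ModalHeaderLexer` is a hypothetical name, and this is not ANTLR code. In ANTLR 2 itself, switching between real lexers is, as far as I know, done through `antlr.TokenStreamSelector`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of two multiplexed "lexers" over one input (not ANTLR's
// own classes). The beginning-of-line lexer is active first, a ':' switches
// to the value lexer, and an end of line switches back. Lines are assumed to
// end in '\n'.
public class ModalHeaderLexer {
    enum Mode { BOL, VALUE }

    static boolean isNameChar(char c) {
        return Character.isLetter(c) || c == '-' || c == '_';
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Mode mode = Mode.BOL;                       // the BOL lexer is the initial one
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') {                        // EOL: reset to the BOL lexer
                out.add("EOL");
                mode = Mode.BOL;
                i++;
            } else if (mode == Mode.BOL) {
                if (c == ':') {                     // colon: switch to the value lexer
                    out.add("COLON");
                    mode = Mode.VALUE;
                    i++;
                } else if (isNameChar(c)) {         // restricted field-name characters
                    int start = i;
                    while (i < input.length() && isNameChar(input.charAt(i))) i++;
                    out.add("NAME(" + input.substring(start, i) + ")");
                } else {
                    throw new IllegalArgumentException("bad name character at offset " + i);
                }
            } else {                                // value lexer: everything up to EOL
                int start = i;
                while (i < input.length() && input.charAt(i) != '\n') i++;
                out.add("VALUE(" + input.substring(start, i).trim() + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The second ':' and the name-like text are plain VALUE text here.
        System.out.println(tokenize("Subject: Re: Field-Name\n"));
    }
}
```

The second `:` is seen by the value lexer, so `Re: Field-Name` stays a single VALUE. The output is `[NAME(Subject), COLON, VALUE(Re: Field-Name), EOL]`.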

          Ken

          > -----Original Message-----
          > From: Geoff Hardy [mailto:ghardy@...]
          > Sent: Wednesday, September 06, 2000 2:52 PM
          > To: antlr-interest@egroups.com
          > Subject: [antlr-interest] column-sensitive grammars
          >
          > I'm writing a parser for a language similar to MIME. MIME is
          > column-dependent, i.e., fields have to begin at column zero. Since the
          > field name can contain some of the same characters as the field value,
          > it is ambiguous (unless I can force a match to the beginning of a
          > line). For example,
          >
          > Field-Name: Field-Value
          >
          > Does anyone have any examples of how to build such a parser with
          > ANTLR? Am I crazy for wanting to use ANTLR for this?
          >
          > Thanks.
          > Geoff Hardy
          > ghardy@...
        • Sinan Karasu
          Message 4 of 8 , Sep 6, 2000
            (EOL)*
            (LINE EOL)+
            EOF


            or

            ((EOL)=>EOL)*
            (LINE EOL)+
            EOF
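To make the shape of that rule concrete, here is a hand-written recursive-descent recognizer for `(EOL)* (LINE EOL)+ EOF` in plain Java. The sketch treats LINE as a single token and represents tokens as strings; both are assumptions. `LineFileParser` is a hypothetical name.

```java
import java.util.List;

// Hand-written recognizer for the rule shape
//     (EOL)* (LINE EOL)+ EOF
// Tokens are plain strings and LINE is treated as a single token; both are
// simplifications for illustration.
public class LineFileParser {
    private final List<String> tokens;
    private int pos = 0;

    LineFileParser(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; running off the end reads as EOF.
    private String la() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void match(String expected) {
        if (!la().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + la());
        }
        pos++;
    }

    void file() {
        while (la().equals("EOL")) match("EOL");   // (EOL)*: leading blank lines
        do {                                        // (LINE EOL)+: at least one line
            match("LINE");
            match("EOL");
        } while (la().equals("LINE"));
        match("EOF");
    }

    static boolean accepts(List<String> tokens) {
        try {
            new LineFileParser(tokens).file();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("EOL", "LINE", "EOL", "LINE", "EOL"))); // true
        System.out.println(accepts(List.of("LINE")));                              // false
    }
}
```

As the rule is written, blank lines are allowed only before the first LINE. An extra EOL between two lines is rejected.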

            See the tiny basic example...

            Sinan
          • Geoff Hardy
            Message 5 of 8 , Sep 7, 2000
              Ken, do you have an example of this? Does one of the examples that come in
              the distribution demonstrate this method?

              Thanks for your help.
              Geoff

              > -----Original Message-----
              > From: Ken Lidster [mailto:ken@...]
              > Sent: Wednesday, September 06, 2000 8:32 PM
              > To: 'antlr-interest@egroups.com'
              > Subject: RE: [antlr-interest] column-sensitive grammars
            • Ken Lidster
              Message 6 of 8 , Sep 7, 2000
                Geoff,

                I had to solve this problem before there were multiplexed lexers, so I don't
                have a specific example. What I did "back then" (a couple of years ago) was
                to extend the token class so that it would map an identifier to a different
                set of tokens based on a state. I had to parse a line-based language in
                which the code line

                IF, IF(IF) XCALL IF( )

                is valid and contains a label, statement, variable, and routine identifier,
                all named "IF". By switching states as a line is parsed (which I found a way
                to do within the parser, but the lexer is really the right place for this),
                I was able to parse this code. With regard to an identifier, this is simply
                another form of a multiplexed lexer, and the technique works quite well.
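Ken extended the token class. The sketch below swaps in a simpler arrangement: a hand-written Java scanner that picks each identifier's token type from a small state machine. The states and rules are hypothetical. For example, a word followed by `,` at the start of a line is a label, and the word after `XCALL` is a routine. They were chosen only so that the example line comes out as label, statement, variable, statement, routine. The actual language's rules are not given in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of state-dependent identifier mapping: the same spelling becomes a
// different token type depending on where the line "is". The state rules are
// hypothetical and cover only the example line from the message.
public class StatefulIdentifiers {
    enum State { LINE_START, STATEMENT, ROUTINE, EXPRESSION }

    static List<String> classify(String line) {
        List<String> out = new ArrayList<>();
        State state = State.LINE_START;
        String statement = null;   // most recent statement keyword
        int depth = 0;             // parenthesis nesting
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isLetter(c)) {
                int start = i;
                while (i < line.length() && Character.isLetterOrDigit(line.charAt(i))) i++;
                String id = line.substring(start, i);
                String type;
                if (state == State.LINE_START && i < line.length() && line.charAt(i) == ',') {
                    type = "LABEL";            // "IF," at the start of the line
                    i++;                       // skip the ','
                    state = State.STATEMENT;
                } else if (state == State.LINE_START || state == State.STATEMENT) {
                    type = "STATEMENT";
                    statement = id;
                    state = id.equals("XCALL") ? State.ROUTINE : State.EXPRESSION;
                } else if (state == State.ROUTINE) {
                    type = "ROUTINE";          // the name right after XCALL
                    state = State.EXPRESSION;
                } else {
                    type = "VARIABLE";
                }
                out.add(type + "(" + id + ")");
            } else {
                if (c == '(') depth++;
                if (c == ')' && --depth == 0 && "IF".equals(statement)) {
                    state = State.STATEMENT;   // a statement follows IF's condition
                }
                i++;                           // spaces, commas, parentheses
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("IF, IF(IF) XCALL IF( )"));
    }
}
```

`classify("IF, IF(IF) XCALL IF( )")` returns `[LABEL(IF), STATEMENT(IF), VARIABLE(IF), STATEMENT(XCALL), ROUTINE(IF)]`.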

                Ken

                > -----Original Message-----
                > From: Geoff Hardy [mailto:ghardy@...]
                > Sent: Thursday, September 07, 2000 7:07 AM
                > To: antlr-interest@egroups.com
                > Subject: RE: [antlr-interest] column-sensitive grammars