Loading ...
Sorry, an error occurred while loading the content.

ANTLR 3.0 status: got nongreedy loops going

Expand Messages
  • Terence Parr
    Howdy, Spent 3 days thinking this week and one hour coding to get nongreedy loops going properly. ANTLR lexers are much easier to specify now. For example,
    Message 1 of 4 , Jul 31, 2004
    • 0 Attachment
      Howdy,

      Spent 3 days thinking this week and one hour coding to get nongreedy
      loops going properly. ANTLR lexers are much easier to specify now.
      For example, here is a sample grammar I'm working with:

      lexer grammar L;

      IF : "if" ;
      ID : ('a'..'z')+ ;
      WS : (' '|'\n')+ ;
      CMT : "/*" ( greedy=false : . )* "*/" ;

      It properly deals with IF vs ID and it handles the CMT rule properly.
      It stops when reading when it sees "*/". Here is the test example:

      java TestLexer "bbd if /* * / ** foo */ abc"

      [bbd/65538,0:0]
      [ /65539,0:0]
      [if/65536,0:0]
      [ /65539,0:0]
      [/* * / ** foo *//65540,0:0]
      [ /65539,0:0]
      [abc/65538,0:0]

      TestLexer is just a loop that prints out Token objects.

      IntegerStream charStream = new ANTLRStringStream(args[0]);
      L lexer = new L(charStream);
      Token t = lexer.nextToken();
      while ( t.getType()!= IntegerStream.EOF ) {
      System.out.println(t.toString());
      t = lexer.nextToken();
      }

      I feel confident that soon I'll be able to handle the Java grammar. :)

      BTW, org.antlr.runtime.* is only 370 lines of code so far. :)

      runtime/ANTLRStringStream.java
      runtime/CommonToken.java
      runtime/CommonTokenStream.java
      runtime/DFA.java
      runtime/IntegerStream.java
      runtime/Lexer.java
      runtime/Parser.java
      runtime/Token.java
      runtime/TokenSource.java
      runtime/TokenStream.java

      L8R,
      Ter
      --
      CS Professor & Grad Director, University of San Francisco
      Creator, ANTLR Parser Generator, http://www.antlr.org
      Cofounder, http://www.jguru.com
      Cofounder, http://www.knowspam.net enjoy email again!
      Cofounder, http://www.peerscope.com pure link sharing
    • thrutchy
      A few things about greediness: - how about supporting regex-like ? modifier (greedy=false): CMT : /* ( . )*? */ ; - in antlr 2.7.4 warnings and docs, it
      Message 2 of 4 , Jul 31, 2004
      • 0 Attachment
        A few things about greediness:

        - how about supporting regex-like "?" modifier (greedy=false):

        CMT : "/*" ( . )*? "*/" ;

        - in antlr 2.7.4 warnings and docs, it says greediness doesn't make
        sense with the optional ()? construct, but it does (regex's use ??).
        greedy=true would be the same as warnWhenFollowAbiguous=false, and
        non-greedy would be the same as not matching the optional construct if
        it matches what's next.

        - also, the java regex "+" greedy=true modifier would be nice:

        "if" LPAREN expr RPAREN stat ( "else" stat )?+

        I find the majority of times I use options it can be boiled down to
        greediness, so the regex shorthands would be nice.

        Eric


        --- In antlr-interest@yahoogroups.com, Terence Parr <parrt@c...> wrote:
        > Howdy,
        >
        > Spent 3 days thinking this week and one hour coding to get nongreedy
        > loops going properly. ANTLR lexers are much easier to specify now.
        > For example, here is a sample grammar I'm working with:
        >
        > lexer grammar L;
        >
        > IF : "if" ;
        > ID : ('a'..'z')+ ;
        > WS : (' '|'\n')+ ;
        > CMT : "/*" ( greedy=false : . )* "*/" ;
        >
        > It properly deals with IF vs ID and it handles the CMT rule properly.
        > It stops when reading when it sees "*/". Here is the test example:
        >
        > java TestLexer "bbd if /* * / ** foo */ abc"
        >
        > [bbd/65538,0:0]
        > [ /65539,0:0]
        > [if/65536,0:0]
        > [ /65539,0:0]
        > [/* * / ** foo *//65540,0:0]
        > [ /65539,0:0]
        > [abc/65538,0:0]
        >
        > TestLexer is just a loop that prints out Token objects.
        >
        > IntegerStream charStream = new ANTLRStringStream(args[0]);
        > L lexer = new L(charStream);
        > Token t = lexer.nextToken();
        > while ( t.getType()!= IntegerStream.EOF ) {
        > System.out.println(t.toString());
        > t = lexer.nextToken();
        > }
        >
        > I feel confident that soon I'll be able to handle the Java grammar. :)
        >
        > BTW, org.antlr.runtime.* is only 370 lines of code so far. :)
        >
        > runtime/ANTLRStringStream.java
        > runtime/CommonToken.java
        > runtime/CommonTokenStream.java
        > runtime/DFA.java
        > runtime/IntegerStream.java
        > runtime/Lexer.java
        > runtime/Parser.java
        > runtime/Token.java
        > runtime/TokenSource.java
        > runtime/TokenStream.java
        >
        > L8R,
        > Ter
        > --
        > CS Professor & Grad Director, University of San Francisco
        > Creator, ANTLR Parser Generator, http://www.antlr.org
        > Cofounder, http://www.jguru.com
        > Cofounder, http://www.knowspam.net enjoy email again!
        > Cofounder, http://www.peerscope.com pure link sharing
      • Terence Parr
        ... you mean instead of the verbose greedy=false? Well, ? means optional in my meta-language. ... Greedy is more about finding longest match in normal regexs,
        Message 3 of 4 , Aug 1, 2004
        • 0 Attachment
          On Jul 31, 2004, at 7:22 PM, thrutchy wrote:

          > A few things about greediness:
          >
          > - how about supporting regex-like "?" modifier (greedy=false):
          >
          > CMT : "/*" ( . )*? "*/" ;

          you mean instead of the verbose greedy=false? Well, ? means optional
          in my meta-language.

          > - in antlr 2.7.4 warnings and docs, it says greediness doesn't make
          > sense with the optional ()? construct, but it does (regex's use ??).

          Greedy is more about finding longest match in normal regexs, but it
          requires backtracking. I use it to indicate how to resolve a
          deterministic lookahead decision.

          > greedy=true would be the same as warnWhenFollowAbiguous=false, and
          > non-greedy would be the same as not matching the optional construct if
          > it matches what's next.

          I'm hoping that there will just generally be less bitching from antlr
          unless there is something that really needs your attention :)

          > - also, the java regex "+" greedy=true modifier would be nice:
          >
          > "if" LPAREN expr RPAREN stat ( "else" stat )?+

          That means to match greedily (the default) and shut up about it right?

          > I find the majority of times I use options it can be boiled down to
          > greediness, so the regex shorthands would be nice.

          I see... Well, i'm really tempted to simply turn off warning emanating
          from subrules vs "what follows". It will always default to greedy
          anyway, so might as well just ignore the warning.

          Ter

          >
          > Eric
          >
          >
          > --- In antlr-interest@yahoogroups.com, Terence Parr <parrt@c...> wrote:
          >> Howdy,
          >>
          >> Spent 3 days thinking this week and one hour coding to get nongreedy
          >> loops going properly. ANTLR lexers are much easier to specify now.
          >> For example, here is a sample grammar I'm working with:
          >>
          >> lexer grammar L;
          >>
          >> IF : "if" ;
          >> ID : ('a'..'z')+ ;
          >> WS : (' '|'\n')+ ;
          >> CMT : "/*" ( greedy=false : . )* "*/" ;
          >>
          >> It properly deals with IF vs ID and it handles the CMT rule properly.
          >> It stops when reading when it sees "*/". Here is the test example:
          >>
          >> java TestLexer "bbd if /* * / ** foo */ abc"
          >>
          >> [bbd/65538,0:0]
          >> [ /65539,0:0]
          >> [if/65536,0:0]
          >> [ /65539,0:0]
          >> [/* * / ** foo *//65540,0:0]
          >> [ /65539,0:0]
          >> [abc/65538,0:0]
          >>
          >> TestLexer is just a loop that prints out Token objects.
          >>
          >> IntegerStream charStream = new ANTLRStringStream(args[0]);
          >> L lexer = new L(charStream);
          >> Token t = lexer.nextToken();
          >> while ( t.getType()!= IntegerStream.EOF ) {
          >> System.out.println(t.toString());
          >> t = lexer.nextToken();
          >> }
          >>
          >> I feel confident that soon I'll be able to handle the Java grammar. :)
          >>
          >> BTW, org.antlr.runtime.* is only 370 lines of code so far. :)
          >>
          >> runtime/ANTLRStringStream.java
          >> runtime/CommonToken.java
          >> runtime/CommonTokenStream.java
          >> runtime/DFA.java
          >> runtime/IntegerStream.java
          >> runtime/Lexer.java
          >> runtime/Parser.java
          >> runtime/Token.java
          >> runtime/TokenSource.java
          >> runtime/TokenStream.java
          >>
          >> L8R,
          >> Ter
          >> --
          >> CS Professor & Grad Director, University of San Francisco
          >> Creator, ANTLR Parser Generator, http://www.antlr.org
          >> Cofounder, http://www.jguru.com
          >> Cofounder, http://www.knowspam.net enjoy email again!
          >> Cofounder, http://www.peerscope.com pure link sharing
          >
          >
          >
          >
          > Yahoo! Groups Links
          >
          >
          >
          >
          >
          >
          --
          CS Professor & Grad Director, University of San Francisco
          Creator, ANTLR Parser Generator, http://www.antlr.org
          Cofounder, http://www.jguru.com
          Cofounder, http://www.knowspam.net enjoy email again!
          Cofounder, http://www.peerscope.com pure link sharing
        • thrutchy
          I m basically suggesting syntax similar to what in java.util.regex ( http://java.sun.com/docs/books/tutorial/extra/regex/quant.html ). Here s a little table:
          Message 4 of 4 , Aug 1, 2004
          • 0 Attachment
            I'm basically suggesting syntax similar to what in java.util.regex (
            http://java.sun.com/docs/books/tutorial/extra/regex/quant.html ).
            Here's a little table:

            new old
            --- ---
            (x)?? {i=0} ( options{greedy=false;}: {!i}? x {++i;} )*
            (x)*? ( options{greedy=false;}: x )*
            (x)+? ( options{greedy=false;}: x )+
            (x)?+ ( options{warnWhenFollowAbiguous=false};}: x )?
            (x)*+ ( options{greedy=true;}: x )*
            (x)++ ( options{greedy=true;}: x )+

            As you can see, it's a hack now to do a non-greedy/reluctant
            conditional. I faked a ( )? using a ( )*.

            You can reuse ?/+ as suffixes since if they follow ?/*/+ they would
            make no since otherwise. Also, it's nice to reuse stuff from an
            existing standard. In this case regex.

            If you are thinking about backtracking with any of these, you could
            use a "*" suffix to ?/*/+ to mean do backtracking, since we've already
            used ?/+ suffixes. If you did that, you'd have quite a bit of options
            just for loop:

            rule result when something can match x or y
            ---- --------------------------------------
            (x)* y Warn (or error?) of ambiguity
            (x)*? y use y (non-greedy/reluctant w/o backtracking)
            (x)*+ y use x (regex possessive)
            (x)** y prefer x but use y if it fails later (regex default)
            (x)*?* y prefer y but use x if it fails later (regex reluctant)

            Feel free to ignore my ramblings above about backtracking since there
            probably isn't much usefulness for it :) I just thought I mention it
            since we are on the topic of greediness and backtracking is a related
            thing.

            I definitely don't think you should make turn off the ambiguity
            warnings. Overall I find them useful.

            Eric

            --- In antlr-interest@yahoogroups.com, Terence Parr <parrt@c...> wrote:
            >
            > On Jul 31, 2004, at 7:22 PM, thrutchy wrote:
            >
            > > A few things about greediness:
            > >
            > > - how about supporting regex-like "?" modifier (greedy=false):
            > >
            > > CMT : "/*" ( . )*? "*/" ;
            >
            > you mean instead of the verbose greedy=false? Well, ? means optional
            > in my meta-language.
            >
            > > - in antlr 2.7.4 warnings and docs, it says greediness doesn't make
            > > sense with the optional ()? construct, but it does (regex's use ??).
            >
            > Greedy is more about finding longest match in normal regexs, but it
            > requires backtracking. I use it to indicate how to resolve a
            > deterministic lookahead decision.
            >
            > > greedy=true would be the same as warnWhenFollowAbiguous=false, and
            > > non-greedy would be the same as not matching the optional construct if
            > > it matches what's next.
            >
            > I'm hoping that there will just generally be less bitching from antlr
            > unless there is something that really needs your attention :)
            >
            > > - also, the java regex "+" greedy=true modifier would be nice:
            > >
            > > "if" LPAREN expr RPAREN stat ( "else" stat )?+
            >
            > That means to match greedily (the default) and shut up about it right?
            >
            > > I find the majority of times I use options it can be boiled down to
            > > greediness, so the regex shorthands would be nice.
            >
            > I see... Well, i'm really tempted to simply turn off warning emanating
            > from subrules vs "what follows". It will always default to greedy
            > anyway, so might as well just ignore the warning.
            >
            > Ter
            >
            > >
            > > Eric
            > >
            > >
            > > --- In antlr-interest@yahoogroups.com, Terence Parr <parrt@c...>
            wrote:
            > >> Howdy,
            > >>
            > >> Spent 3 days thinking this week and one hour coding to get nongreedy
            > >> loops going properly. ANTLR lexers are much easier to specify now.
            > >> For example, here is a sample grammar I'm working with:
            > >>
            > >> lexer grammar L;
            > >>
            > >> IF : "if" ;
            > >> ID : ('a'..'z')+ ;
            > >> WS : (' '|'\n')+ ;
            > >> CMT : "/*" ( greedy=false : . )* "*/" ;
            > >>
            > >> It properly deals with IF vs ID and it handles the CMT rule properly.
            > >> It stops when reading when it sees "*/". Here is the test example:
            > >>
            > >> java TestLexer "bbd if /* * / ** foo */ abc"
            > >>
            > >> [bbd/65538,0:0]
            > >> [ /65539,0:0]
            > >> [if/65536,0:0]
            > >> [ /65539,0:0]
            > >> [/* * / ** foo *//65540,0:0]
            > >> [ /65539,0:0]
            > >> [abc/65538,0:0]
            > >>
            > >> TestLexer is just a loop that prints out Token objects.
            > >>
            > >> IntegerStream charStream = new ANTLRStringStream(args[0]);
            > >> L lexer = new L(charStream);
            > >> Token t = lexer.nextToken();
            > >> while ( t.getType()!= IntegerStream.EOF ) {
            > >> System.out.println(t.toString());
            > >> t = lexer.nextToken();
            > >> }
            > >>
            > >> I feel confident that soon I'll be able to handle the Java
            grammar. :)
            > >>
            > >> BTW, org.antlr.runtime.* is only 370 lines of code so far. :)
            > >>
            > >> runtime/ANTLRStringStream.java
            > >> runtime/CommonToken.java
            > >> runtime/CommonTokenStream.java
            > >> runtime/DFA.java
            > >> runtime/IntegerStream.java
            > >> runtime/Lexer.java
            > >> runtime/Parser.java
            > >> runtime/Token.java
            > >> runtime/TokenSource.java
            > >> runtime/TokenStream.java
            > >>
            > >> L8R,
            > >> Ter
            > >> --
            > >> CS Professor & Grad Director, University of San Francisco
            > >> Creator, ANTLR Parser Generator, http://www.antlr.org
            > >> Cofounder, http://www.jguru.com
            > >> Cofounder, http://www.knowspam.net enjoy email again!
            > >> Cofounder, http://www.peerscope.com pure link sharing
            > >
            > >
            > >
            > >
            > > Yahoo! Groups Links
            > >
            > >
            > >
            > >
            > >
            > >
            > --
            > CS Professor & Grad Director, University of San Francisco
            > Creator, ANTLR Parser Generator, http://www.antlr.org
            > Cofounder, http://www.jguru.com
            > Cofounder, http://www.knowspam.net enjoy email again!
            > Cofounder, http://www.peerscope.com pure link sharing
          Your message has been successfully submitted and would be delivered to recipients shortly.