
I can have one or the other, but not both....

  • Michael T. Richter
    Message 1 of 12 , Nov 9, 1999
      Comment
      : "--"
      ( ( ~( '\n' | '\r' ) )* ( '\n' | '\r' ( '\n' )? )
      { $setType(Token::SKIP); newline(); }
      | ( ( ~( '-' ) )* '-' )+ '-'
      { $setType(Token::SKIP); }
      )
      ;

      The above rule is trying to describe two different kinds of comments. The
      first kind starts with "--" and terminates at the end of a line. The
      second kind starts with "--" and terminates with another "--". The intent
      is to support something like this:

      LANGUAGE ITEMS--THIS IS A COMMENT--MORE LANGUAGE ITEMS--ANOTHER COMMENT

      (Yes, I know this is an abortion, but welcome to the wonderful world of
      ISO/IEC/ITU specifications....)
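To make the intent concrete, here is a minimal Python sketch (not ANTLR, and not part of the original post) that splits the example line into language text and comments. It assumes a comment opened by "--" ends at the next "--" or at the end of the line, whichever comes first; `split_comments` is a hypothetical helper name.

```python
def split_comments(line):
    """Split one line into (code_parts, comments).

    Assumption: a comment opened by "--" ends at the next "--"
    or at the end of the line, whichever comes first.
    """
    code_parts, comments = [], []
    pos = 0
    while pos < len(line):
        start = line.find("--", pos)
        if start == -1:                 # no more comments on this line
            code_parts.append(line[pos:])
            break
        if start > pos:
            code_parts.append(line[pos:start])
        end = line.find("--", start + 2)
        if end == -1:                   # comment runs to the end of the line
            comments.append(line[start + 2:])
            break
        comments.append(line[start + 2:end])
        pos = end + 2                   # resume after the closing "--"
    return code_parts, comments


example = "LANGUAGE ITEMS--THIS IS A COMMENT--MORE LANGUAGE ITEMS--ANOTHER COMMENT"
code_parts, comments = split_comments(example)
print(code_parts)   # ['LANGUAGE ITEMS', 'MORE LANGUAGE ITEMS']
print(comments)     # ['THIS IS A COMMENT', 'ANOTHER COMMENT']
```

A hand-written scanner like this only needs to look for the next "--" or line ending, which is the behaviour the grammar rule is trying to express.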

      When I have either one of the two above subrules commented out, the grammar
      compiles with no muss, no fuss. When I have both in place, however, I
      get this:

      warning: line 1166: lexical nondeterminism upon
      k==1:'\t','\n','\14','\r',' ','-','0'..'9','A'..'Z','a'..'z'
      k==2:'\t','\n','\14','\r',' ','-','0'..'9','A'..'Z','a'..'z'
      between alts 1 and 2 of block

      (The '\t' to ' ' characters derive from my whitespace rule and the '-' to
      'z' characters derive from my various identifier rules. Cute trick.)
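One way to see why a fixed lookahead depth struggles here: for any k, there are two inputs that agree on their first k characters after the opening "--" yet need different alternatives. The sketch below uses Python regular expressions as stand-ins for the two subrules. This is only an approximation, since `re` backtracks and an ANTLR 2 lexer decision does not; `ALT1`, `ALT2`, and `ambiguous_pair` are hypothetical names.

```python
import re

# Stand-ins for the two subrules that follow the opening "--".
ALT1 = re.compile(r"[^\n\r]*(?:\n|\r\n?)")   # runs to the end of the line
ALT2 = re.compile(r"(?:[^-]*-)+-")           # runs to a closing "--"


def ambiguous_pair(k):
    """Build two inputs that share a k-character prefix but need different alternatives."""
    prefix = "x" * k
    return prefix + "\n", prefix + "--"


for k in (1, 2, 5):
    to_eol, to_dashes = ambiguous_pair(k)
    # Each input is accepted by exactly one of the two alternatives.
    assert ALT1.fullmatch(to_eol) and not ALT2.fullmatch(to_eol)
    assert ALT2.fullmatch(to_dashes) and not ALT1.fullmatch(to_dashes)
    print(f"k={k}: shared prefix {to_eol[:k]!r}, inputs differ only afterwards")
```

Raising k only moves the problem further along the line, which is why something other than plain lookahead is needed to pick the alternative.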

      How do I go about getting my comments to work out?

      --
      Michael T. Richter <mtr@...> http://www.igs.net/~mtr/
      PGP Key: http://www.igs.net/~mtr/pgp-key.html
      PGP Fingerprint: 40D1 33E0 F70B 6BB5 8353 4669 B4CC DD09 04ED 4FE8
    • Michael T. Richter
      Message 2 of 12 , Nov 9, 1999
        I solved this one myself, but a pre-emptive "thank you" to anyone who
        responded. :-)

        Once I learned to relax and start loving the "parsing lexer", I got this
        one straightened out. (Basically the fact that I can do predicates dawned
        on me.) The new rule set for my whitespace (including comments) looks like
        this:

        Whitespace
        : ( ( ' ' | '\t' | '\f' | Newline )
        | ( InlineComment ) =>
        InlineComment
        | TerminatingComment
        )
        { $setType(Token::SKIP); }
        ;

        protected
        InlineComment
        : "--"
        ( ( ~( '-' ) )* '-' )+ '-'
        ;

        protected
        Newline
        : ( '\n' | '\r' ( '\n' )? )
        { newline(); }
        ;

        protected
        TerminatingComment
        : "--"
        ( ~( '\n' | '\r' ) )* Newline
        ;

        This, to me, highlights the strengths (and single weakness) of the new
        lexer metalanguage. I've made some very complex rules to handle all of my
        logical whitespace here, but each rule is a nice, self-contained chunk.
        This makes the resulting grammar look much nicer and more comprehensible
        than Lex/Flex or DLG let me have before, and it gives me a lot of powerful
        capabilities. One thing I particularly like about this is that using
        Newline, for example, guarantees that I will *always* count lines, whether
        Newline is part of a token which is actually used or not. I won't ever be
        able to forget that newlines increment line counts.

        The downside, of course, is that the new lexer generator is a real bitch to
        learn. :-)
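For readers new to the idea, the two points above (try one alternative speculatively and fall back, and count lines in exactly one place) can be sketched outside ANTLR. This is a toy Python scanner, not generated lexer code; the class and method names are hypothetical, and it assumes an inline comment must close on the same line, which is an assumption about the intent and not necessarily what the posted InlineComment rule matches.

```python
class Scanner:
    """Toy scanner: speculative comment matching plus central line counting."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1

    def newline(self):
        """Consume one line ending (LF, CR or CRLF) and count it."""
        if self.text.startswith("\r\n", self.pos):
            self.pos += 2
        elif self.pos < len(self.text) and self.text[self.pos] in "\r\n":
            self.pos += 1
        else:
            return False
        self.line += 1                       # the only place lines are counted
        return True

    def inline_comment(self):
        """Match "--" ... "--" on one line, or rewind and report failure."""
        mark = self.pos                      # remember where speculation began
        if not self.text.startswith("--", self.pos):
            return False
        self.pos += 2
        while self.pos < len(self.text) and self.text[self.pos] not in "\r\n":
            if self.text.startswith("--", self.pos):
                self.pos += 2
                return True
            self.pos += 1
        self.pos = mark                      # speculation failed: rewind
        return False

    def terminating_comment(self):
        """Match "--" up to and including the end of the line."""
        if not self.text.startswith("--", self.pos):
            return False
        self.pos += 2
        while self.pos < len(self.text) and self.text[self.pos] not in "\r\n":
            self.pos += 1
        self.newline()                       # reuse the shared line counter
        return True

    def comment(self):
        """Try the inline form first, then fall back to the end-of-line form."""
        return self.inline_comment() or self.terminating_comment()


s = Scanner("--inline--x --to end\r\nnext")
s.comment()
print(s.pos, s.line)    # 10 1
s.pos += 2              # skip over "x "
s.comment()
print(s.pos, s.line)    # 22 2
```

Because every rule that crosses a line ending goes through `newline()`, the line count cannot drift, which mirrors the benefit described for the protected Newline rule.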

      • Jonathan Sergent
        Message 3 of 12 , Nov 9, 1999
          /// "Michael T. Richter" <mtr@...>:
          ]
          ] protected
          ] InlineComment
          ] : "--"
          ] ( ( ~( '-' ) )* '-' )+ '-'
          ] ;
          ]

          Won't this suck up:

          --xyz--abc--def--

          or

          --abc----

          as one comment?

          (or is this what you intended?)


          --jss.
        • Luke Blanshard
          Message 4 of 12 , Nov 9, 1999
            Careful about the comment rules. Right now, the "InlineComment" will suck
            up the entire file looking for a closing pair of hyphens, happily skipping
            ends of lines in the process. For example:

            LANGUAGE ITEMS -- A COMMENT STARTS HERE
            ... AND YOUR RULE THINKS IT CONTINUES TO HERE -- WHOOPS!

            You probably want to combine the two rules for comments.

            Luke
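The line-crossing behaviour can be reproduced with a rough regular-expression stand-in for the posted rule. This is an approximation only (Python's `re` backtracks, an ANTLR 2 lexer does not), and `SAME_LINE` is a hypothetical variant, not something proposed in the thread. The point is that a negated set such as ~('-') also accepts line endings.

```python
import re

# Approximation of the posted InlineComment rule: "--" ( (~'-')* '-' )+ '-'
POSTED = re.compile(r"--(?:[^-]*-)+-")
# Hypothetical variant whose negated set also excludes line endings.
SAME_LINE = re.compile(r"--(?:[^-\r\n]*-)+-")

text = ("-- A COMMENT STARTS HERE\n"
        "... AND YOUR RULE THINKS IT CONTINUES TO HERE -- WHOOPS!\n")

posted = POSTED.match(text)
print("\n" in posted.group(0))     # True: the match spans the line break
print(SAME_LINE.match(text))       # None: no closing "--" on the first line
```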


          • Michael T. Richter
            Message 5 of 12 , Nov 9, 1999
              At 11:55 AM 11/9/99 , you wrote:
              >> protected
              >> InlineComment
              >> : "--"
              >> ( ( ~( '-' ) )* '-' )+ '-'
              >> ;

              > Won't this suck up:
              > --xyz--abc--def--
              > or
              > --abc----
              > as one comment?

              Hmmm....

              Let's use your first case: --xyz--abc--def--

              First the "--" is recognized, leaving me with: xyz--abc--def--.

              Now ( ~( '-' ) )* recognizes the "xyz", leaving me with: --abc--def--.

              Now the '-' is recognized, leaving me with: -abc--def--.

              And here it breaks down: the next '-' is recognized as part of the ( )+
              subrule because I can have zero or more non-hyphens ending with a hyphen.

              Back to the drawing board. I *hate* ISO specs!

              > (or is this what you intended?)

              Nope. Thanks for catching that for me.
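The trace above can be checked mechanically with a regular-expression stand-in for the rule. This is a sketch under assumptions: Python's `re` backtracks where an ANTLR 2 lexer would not, so it only approximates the greedy loop, and `FIRST_CLOSE` is a hypothetical variant shown for contrast, not a rule from this thread.

```python
import re

# Greedy stand-in for: "--" ( (~'-')* '-' )+ '-'
POSTED = re.compile(r"--(?:[^-]*-)+-")
# Hypothetical contrast: a lazy loop that stops at the first closing "--".
FIRST_CLOSE = re.compile(r"--(?:[^-]*-)+?-")

for sample in ("--xyz--abc--def--", "--abc----"):
    greedy = POSTED.match(sample).group(0)
    lazy = FIRST_CLOSE.match(sample).group(0)
    print(f"{sample!r}: greedy takes {greedy!r}, first-close takes {lazy!r}")
```

On both samples the greedy form swallows the whole string as one comment, while the first-close form stops after "--xyz--" and "--abc--" respectively.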

            • Michael T. Richter
              Message 6 of 12 , Nov 9, 1999
                At 11:58 AM 11/9/99 , you wrote:
                >Careful about the comment rules. Right now, the "InlineComment" will suck
                >up the entire file looking for a closing pair of hyphens, happily skipping
                >ends of lines in the process. For example:

                > LANGUAGE ITEMS -- A COMMENT STARTS HERE
                > ... AND YOUR RULE THINKS IT CONTINUES TO HERE -- WHOOPS!

                >You probably want to combine the two rules for comments.

                How? Put together I got nothing but endless streams of non-determinisms.
                Maybe if I recognized a terminating comment first (complete with syntactic
                predicate) part of the problem will be solved?

              • Luke Blanshard
                Message 7 of 12 , Nov 9, 1999
                  Here's what I had in mind for combining the comment rules:

                  Comment
                  : "--" ( ('-')? ~('-'|'\r'|'\n') )*
                  ( "--" | ('-')? Newline ) { $setType(Token.SKIP); }
                  ;

                  It's pretty damn complicated.

                  Luke
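To see what the combined rule accepts, it can be transliterated into a Python regular expression and tried on the examples from this thread. This is a rough stand-in, not ANTLR output: `re` backtracks, so it says nothing about the lookahead depth a generated lexer would need, and `COMMENT` and `first_comment` are hypothetical names.

```python
import re

# "--" ( ('-')? ~('-'|'\r'|'\n') )* ( "--" | ('-')? Newline )
COMMENT = re.compile(r"--(?:-?[^-\r\n])*(?:--|-?(?:\n|\r\n?))")


def first_comment(text):
    """Return the comment matched at the start of text, or None."""
    m = COMMENT.match(text)
    return m.group(0) if m else None


print(repr(first_comment("--THIS IS A COMMENT--MORE LANGUAGE ITEMS")))  # '--THIS IS A COMMENT--'
print(repr(first_comment("--xyz--abc--def--")))                         # '--xyz--'
print(repr(first_comment("-- A COMMENT STARTS HERE\n... -- WHOOPS!\n")))  # '-- A COMMENT STARTS HERE\n'
print(repr(first_comment("--a-b--c")))                                  # '--a-b--'
```

The body allows a single hyphen only when a non-hyphen, non-newline character follows it, so the comment ends at the first "--" or line ending, whichever comes first.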

                • Sinan
                  Message 8 of 12 , Nov 9, 1999
                    "Michael T. Richter" wrote:
                    >
                    > From: "Michael T. Richter" <mtr@...>
                    >
                    > I solved this one myself, but a pre-emptive "thank you" to anyone who
                    > responded. :-)
                    >
                    > Once I learned to relax and start loving the "parsing lexer", I got this
                    > one straightened out. (Basically the fact that I can do predicates dawned
                    > on me.) The new rule set for my whitespace (including comments) looks like
                    > this:
                    >

                    Aha! The Aha! experience. Welcome to antlr....
                    Once you love it, you love it forever.

                    Sinan
                  • Sinan
                    Message 9 of 12 , Nov 9, 1999
                      "Michael T. Richter" wrote:
                      [....]

                      I would try something like this (I have not chewed on this long enough,
                      but....):

                      ---------------------------------------------------------------------
                      InlineComment
                      : CommentDelimiter CommentBody (CommentDelimiter|NewLine)


                      protected
                      CommentBody
                      :
                      ( (~('-') | ~NewLine)+ ((('-')? CommentBody)? | NewLine ))?
                      ;

                      protected
                      CommentDelimiter
                      :
                      '-' '-'



                      Sinan
                    • Michael T. Richter
                      Message 10 of 12 , Nov 9, 1999
                        At 12:25 PM 11/9/99 , you wrote:
                        >Comment
                        > : "--" ( ('-')? ~('-'|'\r'|'\n') )*
                        > ( "--" | ('-')? Newline ) { $setType(Token.SKIP); }
                        > ;

                        This is making ANTLR whine about non-determinisms for k as high as 5 (I
                        didn't try beyond that point). Monty's stab at it is deterministic for
                        k=3. I can't immediately see where I'd put disambiguating predicates in
                        either of the solutions, but I'll tinker with it for a while. I'm
                        beginning to get the hang of things here.

                        >It's pretty damn complicated.

                        You can say that again. Did I mention how much I hated ISO/IEC/ITU specs
                        yet? :-)

                      • Michael T. Richter
                        Message 11 of 12 , Nov 9, 1999
                          At 12:27 PM 11/9/99 , you wrote:
                          >> Once I learned to relax and start loving the "parsing lexer", I got this
                          >> one straightened out. (Basically the fact that I can do predicates dawned
                          >> on me.) The new rule set for my whitespace (including comments) looks like
                          >> this:

                          >Aha! The Aha! experience. Welcome to antlr....
                          >Once you love it , you love it forever.

                          Well, my "Aha!" was a bit premature, but at least I understand what I can
                          do with this stuff.

                          And I'm not new to ANTLR. I'm new to ANTLR 2.n.... :-)

                        • Sinan
                          Message 12 of 12 , Nov 9, 1999
                            Sinan wrote:
                            [....]
                            > :
                            > ( (~('-') | ~NewLine)+ ((('-')? CommentBody)? | NewLine ))?

                            Ok I take it back. John's solution is much cleaner. Besides, this has an
                            error (should have ~NewLine, which is hard to code....). However,
                            CommentBody does lend itself to recursion nicely.

                            Sinan
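As an illustration of the recursive shape of a comment body, here is a small Python sketch. It is not a translation of the CommentBody rule above; it assumes the body ends at the first "--", line ending, or end of input, and `comment_body_end` is a hypothetical name. Python's recursion limit makes this unsuitable for very long comments, so treat it as a picture of the idea only.

```python
def comment_body_end(text, pos):
    """Return the index just past the comment body that starts at pos.

    Assumption: the body ends at the first "--", at a line ending,
    or at the end of the input.
    """
    if pos >= len(text) or text[pos] in "\r\n":
        return pos                           # end of line or input closes the body
    if text.startswith("--", pos):
        return pos                           # closing delimiter closes the body
    return comment_body_end(text, pos + 1)   # consume one character and recurse


text = "--abc--def"
end = comment_body_end(text, 2)              # start just past the opening "--"
print(text[2:end])                           # abc
```

A lone hyphen inside the body is consumed like any other character, because only a pair of hyphens stops the recursion.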