Re: RFC: defaulterrorhandler [WAS: Re: how do i skip unmatched characters?]

  • Ric Klaren
    Message 1 of 19 , Jul 6, 2001
      Hi,

      Finally found the time to answer this one with some thought ...

      On Tue, Jul 03, 2001 at 11:08:50AM -0700, Terence Parr wrote:
      > > (or am I jumping the gun =) )

      Yup!

      > Unless it's a bug, we should discuss changes to behavior I think.

      Ack that's why I did this RFC thing =)

      > Anyway...I can't remember my reasons and I'm foggy at the moment, but
      > lexers are different in the sense that you don't want the errors to be
      > trapped in the rules I think--all output of the lexer goes thru the
      > nextToken method.

      Yup.

      > If an error is trapped in a rule, it will return with bogus information and
      > most importantly w/o knowledge that an error occurred. nextToken will
      > return bogus tokens to the parser. Unless the lexer is very complicated,
      > it's usually ok to just say "this text 'xxx' is bogus on line n."

      Aha.

      > So, when I want to detect errors WITHIN a token and keep going to return
      > some valid token to the parser (fault tolerance) I use the default handlers
      > or specify one for a protected rule.

      Only problem is that you can't specify an error handler for the nextToken
      rule... So if you want unexpected chars reported inside your lexer without
      going back to the parser (which is not practical in some cases), you
      a) have to specify defaultErrorHandler = true; and maybe in lots of other
      places defaultErrorHandler = false; (AFAIK the only way to get
      defaultErrorHandler in just the nextToken rule), or
      b) use the filter rule 'hack', which is IMHO not the most intuitive way to deal
      with these things. (FAQs on this topic are shortish.)
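
The idea of reporting unexpected characters at the nextToken level, without going back to the parser, can be sketched roughly as follows. This is hand-written Java, not ANTLR-generated code and not ANTLR's API: the class NextTokenRecoverySketch, the exception NoMatchException, and the "ID:"/"INT:" token strings are all hypothetical names invented for illustration. It only models the control flow: rules throw, nextToken reports and skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated lexer; none of this is ANTLR's API.
class NextTokenRecoverySketch {
    // Hypothetical exception thrown by a rule that cannot match.
    static class NoMatchException extends Exception {
        NoMatchException(char ch, int offset) {
            super("unexpected char '" + ch + "' at offset " + offset);
        }
    }

    private final String input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    NextTokenRecoverySketch(String input) {
        this.input = input;
    }

    // Token rules do not trap errors themselves; they just throw.
    private String matchRule() throws NoMatchException {
        char c = input.charAt(pos);
        int start = pos;
        if (Character.isLetter(c)) {
            while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
            return "ID:" + input.substring(start, pos);
        }
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return "INT:" + input.substring(start, pos);
        }
        throw new NoMatchException(c, pos);
    }

    // Single exit point of the lexer: report the bad character here,
    // skip it, and keep scanning so the caller only sees valid tokens.
    String nextToken() {
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            try {
                return matchRule();
            } catch (NoMatchException e) {
                errors.add(e.getMessage());
                pos++; // consume the offending character and retry
            }
        }
        return "EOF";
    }
}
```

Calling nextToken() repeatedly on an input such as "ab $ 12" yields only the identifier and integer tokens, while the stray '$' ends up in the errors list instead of reaching the caller as a bogus token.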

      > Ok, i've convinced myself that the current behavior is appropriate.

      Me as well =) but with the above notes.

      I guess we should do a few documentation fixes with respect to this. Maybe
      add a section on skipping/reporting on unrecognized chars in the lexer.

      I've been thinking of extending the grammar to allow a:

      class MyParser extends Parser;
      options {
      ...
      }
      exception catch [ ... ] { .. }

      syntax for at least (tree)parsers, so you can specify a different
      defaultErrorHandler for all rules (this should work nicely together with
      Ernest's $lookaheadSet patch).

      For a lexer we could then modify the behaviour to change the errorhandler
      for nextToken?

      Any thoughts?

      Ric
      --
      -----+++++*****************************************************+++++++++-------
      ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
      -----+++++*****************************************************+++++++++-------
      Why don't we just invite them to dinner and massacre them all when they're
      drunk? You heard the man. There's seven hundred thousand of them.
      Ah? ... So it'd have to be something simple with pasta, then.
      --- From: Interesting Times by Terry Pratchett
      -----+++++*****************************************************+++++++++-------
    • Terence Parr
      Message 2 of 19 , Jul 19, 2001
        Friday, July 06, 2001, Ric Klaren hath spoken:
        > I guess we should do a few documentation fixes with respect to this. Maybe
        > add a section on skipping/reporting on unrecognized chars in the lexer.

        > I've been thinking of extending the grammar to allow a:

        > class MyParser extends Parser;
        > options {
        > ...
        > }
        > exception catch [ ... ] { .. }

        > syntax for at least (tree)parsers, so you can specify a different
        > defaultErrorHandler for all rules (this should work nicely together with
        > Ernest's $lookaheadSet patch).

        > For a lexer we could then modify the behaviour to change the errorhandler
        > for nextToken?

        Yes, I definitely think we should fix the documentation and figure out
        how to specify exception handling for nextToken. Note that somebody
        correctly figured out that the filter=UNKNOWN_TOKEN option gives you
        the desired behavior: all errors go to the UNKNOWN_TOKEN rule. Is
        that cool for now? I.e., add a FAQ entry / update the doc?
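
As a rough model of what "all errors go to the UNKNOWN_TOKEN rule" means, here is a hand-written Java sketch; it is not ANTLR output and uses no ANTLR classes. The class FilterRuleSketch, the method unknownToken, and the token strings are hypothetical, and emitting a visible token for each unmatched character is an assumption made so the effect is easy to see. In a real grammar the filter rule's action decides what happens to that input.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of a lexer with a fallback ("filter") rule.
// Not ANTLR's API; all names here are illustrative.
class FilterRuleSketch {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            int start = pos;
            if (Character.isLetter(c)) {
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
                tokens.add("ID:" + input.substring(start, pos));
            } else if (Character.isDigit(c)) {
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                tokens.add("INT:" + input.substring(start, pos));
            } else {
                // No ordinary rule matched: hand the character to the
                // fallback rule instead of raising an error.
                tokens.add(unknownToken(c));
                pos++;
            }
        }
        return tokens;
    }

    // Stand-in for the UNKNOWN_TOKEN rule: consumes one unmatched character.
    static String unknownToken(char c) {
        return "UNKNOWN_TOKEN:" + c;
    }
}
```

The design point is that error handling lives in one named rule rather than being scattered across every token rule, which is why it can stand in for an error handler on nextToken.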

        BTW, do people find the FAQ at http://www.jguru.com/faq/ANTLR useful?

        Ter
        --
        Chief Scientist & Co-founder, http://www.jguru.com
        Co-founder, http://www.NoWebPatents.org -- Stop Patent Stupidity
        parrt@...