
Re: [antlr-interest] how do i skip unmatched characters?

  • Matthew Ford
    Message 1 of 19 , Jun 29, 2001
      I had this problem when handling commands coming in via telnet.
      I think this should be a common requirement. There should be an FAQ about
      it.
      matthew

      This is what I did to skip to the next ';' and then continue parsing.
      (ErrorLog and Debug are my own reporting logs.)


      In the Parser

      options {
          k = 1;                        // one token lookahead
          defaultErrorHandler = false;  // don't generate parser error handlers
          buildAST = true;
          importVocab = TIMEFRAMECOMMONLEXER;
          exportVocab = TIMEFRAMESERVERPARSER;
      }

      {
          ICommandGeneratorServer commandGenerator;
          TimeFrameCommonLexer TFlexer;
          static String nl = System.getProperty("line.separator", "\n");

          final static int MAJOR = 401;
          final static int TRANSLATION_ERROR = 1;
          final static int FIELD_QUALIFIER_ERROR = 2;
          final static int QUERY_NOT_DEFINED = 3;
          final static int NO_AGE_INDEX = 4;

          public TimeFrameCommandsParser(TimeFrameCommonLexer lexer,
                  ICommandGeneratorServer commandGenerator, IScope iScope) {
              this(lexer);
              TFlexer = lexer;
              this.commandGenerator = commandGenerator;
              this.iScope = iScope;
          }

          public void reportError(ANTLRException ex) {
              Debug.out("in reportError", MAJOR, 0);
              commandGenerator.translationError(TFlexer.getLineBuffer(), 0, 0, null, null,
                      TimeFrameException.makeTimeFrameException(MAJOR, TRANSLATION_ERROR,
                              ex.getMessage()));
              Debug.out("******* end reportError *******", MAJOR, 0);
          }

          public void processError(ANTLRException ex)
                  throws TokenStreamException, CharStreamException {
              // actually only throws TokenStreamIOException; others are caught here
              int tokenType = 0;
              LexerSharedInputState inputState = TFlexer.getInputState();
              inputState.guessing = 0; // clear guessing mode
              Debug.out("in processError", MAJOR, 0);
              if (!errorFlag) { // first error
                  reportError(ex);
                  errorFlag = true; // block new errors until after syncing
              }

              do {
                  try {
                      if (ex instanceof TokenStreamRecognitionException) {
                          TokenStreamRecognitionException rex =
                                  (TokenStreamRecognitionException) ex;
                          // get underlying exception
                          ex = null; // have handled this one now
                          if ((rex.recog instanceof MismatchedCharException) ||
                                  (rex.recog instanceof NoViableAltForCharException)) {
                              try {
                                  TFlexer.consume(); // remove current error char
                              } catch (CharStreamException cse) {
                                  if (cse instanceof CharStreamIOException) {
                                      throw new TokenStreamIOException(
                                              ((CharStreamIOException) cse).io);
                                  } else {
                                      throw new TokenStreamIOException(
                                              new IOException(cse.getMessage()));
                                  }
                              }
                          }
                      }

                      tokenType = LA(1);
                      if ((tokenType != EOF) && (tokenType != SEMI)) {
                          consume(); // discard this token and keep looking for the ;
                          Debug.out("Input buffer:'" + TFlexer.getLineBuffer() + "'", MAJOR, 0);
                      }

                  } catch (TokenStreamRecognitionException ex1) {
                      ex = ex1; // and loop
                      // TFlexer.consume(); // remove current error char
                      Debug.out("** found :" + ex1, MAJOR, 0);
                  } catch (TokenStreamRetryException ex1) {
                      Debug.out("** found :" + ex1, MAJOR, 0);
                      throw new TokenStreamIOException(new IOException(ex1.getMessage()));
                  }
              } while (tokenType != SEMI && tokenType != EOF && !isEOF());
              Debug.out("** end processError *******", MAJOR, 0);
              // if telnet, print prompt again (How??)
          }

          private boolean errorFlag = false;

          private boolean eofFlag = false;

          public boolean isEOF() {
              return eofFlag;
          }

          private void clearErrorFlag() {
              errorFlag = false;
          }
      }



      After the SEMI is seen I expect to find a new statement;
      once I find a valid statement I call clearErrorFlag().

      // SetAttribute("attribute1","attributeValue");
      setattribute!
          :   SETATTRIBUTE
              LPAREN attr:STRING_LITERAL COMMA value:STRING_LITERAL RPAREN SEMI
              {
                  clearErrorFlagAndScope();
                  commandGenerator.setAttribute(TFlexer.getLineBuffer(),
                          attr.getText(), value.getText());
              }
          ;



      Finally, to tie it all together:

      In the main program I start the parser like this (in its own thread).
      The commands the parser finds are put on a command stack to be handled by
      another thread. This lets you issue cancel commands at any time.

      /**
       * The method reads in commands one by one.
       */
      public void run() {
          Debug.out("" + GlobalData.nl + " ---------- InputThread " + connectionNo
                  + " starts.", MAJOR, 0);

          try {
              do {
                  try {
                      Debug.out("InputThread Call Parser", MAJOR, 0);
                      parser.program();
                  } catch (RecognitionException ex) {
                      ErrorLog.log.println("RecognitionException: " + ex.getMessage());
                      Debug.out("InputThread RecognitionException: "
                              + ex.getMessage(), MAJOR, 0);
                      parser.processError(ex);
                  } catch (TokenStreamRecognitionException ex) {
                      ErrorLog.log.println("TokenStreamRecognitionException: "
                              + ex.getMessage());
                      Debug.out("InputThread TokenStreamRecognitionException: "
                              + ex.getMessage(), MAJOR, 0);
                      parser.processError(ex);
                  } catch (TokenStreamRetryException ex) {
                      ErrorLog.log.println("TokenStreamRetryException: " + ex.getMessage());
                      Debug.out("InputThread TokenStreamRetryException: "
                              + ex.getMessage(), MAJOR, 0);
                      parser.processError(ex);
                  } catch (TokenStreamIOException ex) {
                      Debug.out("InputThread TokenStreamIOException: "
                              + ex.getMessage(), MAJOR, 0);
                      if (getStopped()) {
                          break;
                      }
                  }
                  if (parser.isEOF()) {
                      Debug.out("parser found EOF *****************", MAJOR, 0);
                      break; // do not call program() again
                  }
              } while (!getStopped()); // was: while (true)

          } catch (Exception e) { // TokenStreamIOExceptions or CharStreamExceptions
              Debug.out(ExceptionBuffer.getStackTrace(e), Debug.STACKTRACE, 0);
              // Close stream on IO errors
              if (e instanceof antlr.TokenStreamIOException) {
                  Debug.out("TokenStreamIOException: one connection is lost", MAJOR, 0);
                  listener.cancelConnection();
              } else {
                  Debug.out("other exception:", MAJOR, 0);
                  TimeFrameException tfe = TimeFrameException.makeTimeFrameException(MAJOR,
                          EXCEPTION, e.toString() + e.getMessage());
                  Errors error = new Errors(tfe);
                  beaconit.serverutils.CommandLog.log.printCommand(sessionNo,
                          connectionNo, error.toLogText());
                  ErrorLog.log.printError(sessionNo, connectionNo, error, "");
                  stack.add(error);
              }
          }
      }


      This is how I stop the input thread:
      /**
       * This method is used to set the stop flag.
       */
      synchronized public void stopThread() {
          Debug.out("stopThread is called in InputThread " + connectionNo, MAJOR, 0);
          stopped = true;
      }

      /**
       * Synchronized method returns the status of the stop flag.
       * @return the status of the stop flag.
       */
      synchronized public boolean getStopped() {
          return stopped;
      }
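      The command-stack hand-off described above can be sketched with a blocking
      queue. This is a minimal standalone illustration of the idea, not the
      original TimeFrame code; the class and command names are invented.

      ```java
      import java.util.List;
      import java.util.concurrent.ArrayBlockingQueue;
      import java.util.concurrent.BlockingQueue;

      public class CommandQueueDemo {
          public static void main(String[] args) throws InterruptedException {
              // The parser thread produces commands; a worker thread consumes them.
              BlockingQueue<String> stack = new ArrayBlockingQueue<>(16);

              Thread worker = new Thread(() -> {
                  try {
                      while (true) {
                          String cmd = stack.take();       // blocks until a command arrives
                          if (cmd.equals("CANCEL")) break; // a cancel can arrive at any time
                          System.out.println("executing " + cmd);
                      }
                  } catch (InterruptedException ignored) { }
              });
              worker.start();

              // What the parser thread would do after recognizing each statement:
              for (String cmd : List.of("setAttribute", "CANCEL")) {
                  stack.put(cmd);
              }
              worker.join();
          }
      }
      ```

      Because the worker only ever sees whole commands, the input thread can keep
      reading (and recovering from errors) without blocking command execution.
      
      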

      ----- Original Message -----
      From: "Stdiobe" <stdiobe@...>
      To: <antlr-interest@yahoogroups.com>
      Sent: Saturday, June 30, 2001 4:47 AM
      Subject: [antlr-interest] how do i skip unmatched characters?


      >
      > Hi,
      >
      > when the lexer generated by ANTLR encounters an unmatched character,
      > it throws a TokenStreamRecognitionException which causes my lexer
      > to exit (and also my parser).
      >
      > Does anyone know how I can skip unmatched characters in the lexer
      > by reporting the error to the user (with linenumber, etc.), and have
      > the lexer continue scanning for valid tokens.
      >
      > Stdiobe.
    • Stdiobe
      Message 2 of 19 , Jul 2, 2001
        Gary,

        thanks for your response!

        I've tried your solution but it also skips ignore-characters in tokens
        being matched, which is not what I want.

        From the documentation I understand I can use "setCommitToPath"
        to prevent this from happening but then I would have to add this to
        every rule :-(

        But, if I can't find any other solution ....
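        For reference, the commit mechanism from the documentation looks roughly
        like this inside a filter-mode lexer rule. The rule below is invented for
        illustration (see lexer.html for the real example): once enough of the
        token has matched, the rule commits, so a later mismatch is reported as
        an error instead of being silently handed back to the IGNORE filter rule.

        ```
        // Hypothetical sketch, ANTLR 2 filter mode:
        SETATTRIBUTE
            :   "Set" { setCommitToPath(true); } "Attribute"
            ;
        ```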

        > Something like this might work. You'll have to define
        > "unmatched character" more precisely of course.
        >
        > class MyLexer extends Lexer;
        > options {
        > k=2;
        > filter = IGNORE;
        > charVocabulary = '\3'..'\177';
        > }
        >
        > ....
        >
        > protected
        > IGNORE
        > : '\3'..'\177'
        > {
        > newline();
        > System.err.println("Error: unmatched character "
        > + "\"" + getText() + "\" line: " + getLine());
        > }
        > ;
        >
        > Gary Schaps
      • Stdiobe
        Message 3 of 19 , Jul 2, 2001
          Matthew,

          thanks for your response!

          If I understand your solution correctly, you catch the exception when
          the parser exits and restart the parser until you get correct input.

          That wouldn't work in my case, because my parser expects a "complete"
          program to parse, so the lexer should NOT return an unexpected character
          exception to the parser, but instead should capture it itself.

          ----- Original Message -----
          From: Matthew Ford <Matthew.Ford@...>
          To: <antlr-interest@yahoogroups.com>
          Sent: Saturday, June 30, 2001 1:05 AM
          Subject: Re: [antlr-interest] how do i skip unmatched characters?


        • Ric Klaren
          Message 4 of 19 , Jul 2, 2001
            Hi,

            On Fri, Jun 29, 2001 at 08:47:04PM +0200, Stdiobe wrote:
            > when the lexer generated by ANTLR encounters an unmatched character,
            > it throws a TokenStreamRecognitionException which causes my lexer
            > to exit (and also my parser).

            How did you set up your exception handling in the lexer? E.g.
            defaultErrorHandler true or false? My guess is that you have it turned off.

            (As an aside, I think it is kind of a deficiency that you can't specify an
            error handler for the nextToken rule.)

            > Does anyone know how I can skip unmatched characters in the lexer
            > by reporting the error to the user (with linenumber, etc.), and have
            > the lexer continue scanning for valid tokens.

            In my experience this is the default behaviour:

            -------snip---
            Parsing...

            int aé;
            line 1:6: unexpected char: 0xE9
            ------snip----

            (the 0xE9 is devel snapshot output of 'unprintable' chars, parsing continues
            after this error)

            This is with the following relevant options:

            charVocabulary= '\u0000' .. '\u00FF';
            defaultErrorHandler = true;

            If you need defaultErrorHandler false, you need to fix the error outside the
            lexer (the reason for my aside remark above). I'm not 100% sure where in
            the path to the parser the most practical place is to catch this beast.
            Relevant files would be the inputbuffer/charbuffer/sharedinputstate files.
            Maybe it's better to have defaultErrorHandler set to true and override per
            rule (as far as possible).

            Groetsels,

            Ric
            --
            -----+++++*****************************************************+++++++++-------
            ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
            -----+++++*****************************************************+++++++++-------
            Why don't we just invite them to dinner and massacre them all when they're
            drunk? You heard the man. There's seven hundred thousand of them.
            Ah? ... So it'd have to be something simple with pasta, then.
            --- From: Interesting Times by Terry Pratchett
            -----+++++*****************************************************+++++++++-------
          • Stdiobe
            Message 5 of 19 , Jul 2, 2001
              Ric,

              > How did you setup your exception handling in the lexer? e.g.
              > defaultErrorHandler true or false? My guess is that you have it turned
              off?

              I don't use the "defaultErrorHandler" option so I'm getting the default
              behaviour, which I thought was "on", but I just checked and it turns
              out the default behaviour is "off"! That sucks!

              After specifying defaultErrorHandler = "true" I get the desired behaviour.

              I've checked the source code of ANTLR and found that class LexerGrammar
              turns the defaultErrorHandler off. That's not mentioned anywhere in the
              documentation! Also, I can't think of any reason why this is turned "off" by
              default!

              Many thanks for the advice!
              Stdiobe

            • Ric Klaren
              Message 6 of 19 , Jul 2, 2001
                On Mon, Jul 02, 2001 at 05:55:21PM +0200, Stdiobe wrote:
                > I don't use the "defaultErrorHandler" option so I'm getting the default
                > behaviour, which I thought was "on", but I just checked and it turns
                > out the default behaviour is "off"! That sucks!
                >
                > After specifying defaultErrorHandler = "true" I get the desired behaviour.
                >
                > I've checked the source code of ANTLR and found that class LexerGrammar
                > turns the defaultErrorHandler off. That's not mentioned anywhere in the
                > documentation! Also, I can't think of any reason why this is turned "off" by
                > default!

                Doh! What version of antlr are you using, one of my snapshots? According to
                the docs it should be on by default?

                Ric
              • Stdiobe
                Message 7 of 19 , Jul 2, 2001
                  Ric,

                  > Doh! What version of antlr are you using one of my snapshots? According to
                  > the docs it should be on by default?

                  I'm using standard 2.7.1, but it's also turned off in the patched versions
                  (and also in version 2.6.0).

                  I just checked the documentation again and it seems the documentation
                  doesn't literally say it's "on", but it does suggest it in "err.html"
                  under "Default Exception Handling in the Lexer":

                  "Normally you want the lexer to keep trying to get a valid token upon
                  lexical error. That way, the parser doesn't have to deal with lexical
                  errors
                  and ask for another token ...... To get ANTLR to generate lexers that
                  pass on RecognitionException's to the parser as TokenStreamException's,
                  use the defaultErrorHandler=false grammar option."

                  However, the documentation in "options.html" mentions that "ANTLR
                  will generate default exception handling code for a parser or
                  tree-parser rule". It doesn't say that a default error handler is
                  generated for lexers, so it seems I was wrong with my assumption that
                  Antlr will also generate a default error handler for lexers (although
                  I still think the current behaviour is not logical).

                  Note that the lexer of Antlr itself (in antlr.g) doesn't use a default
                  error handler. Sometimes I get unmatched character exceptions with
                  ANTLR where it doesn't report the line number. Turning error handling
                  on would solve that problem.
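                  For anyone hitting the same thing, the fix discussed earlier in the
                  thread is just a lexer options block like this (the lexer name is
                  hypothetical; the two options are the ones Ric listed):

                  ```
                  class MyLexer extends Lexer;
                  options {
                      charVocabulary = '\u0000'..'\u00FF';
                      defaultErrorHandler = true; // report the bad char, skip it, keep scanning
                  }
                  ```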

                  Again, thanks for your advice!

                  Stdiobe.
                • Matthew Ford
                  Message 8 of 19 , Jul 2, 2001
                    You can also use it to resync inside programs where there is a statement
                    delimiter (such as ';' in C and Java).
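                    The resync idea itself is independent of ANTLR and can be sketched in
                    plain Java (all names here are invented for illustration): after an
                    error, discard tokens until the delimiter, then restart parsing just
                    past it.

                    ```java
                    import java.util.ArrayList;
                    import java.util.List;

                    public class Resync {
                        // Returns the index just after the next ";" (or end of input),
                        // i.e. where parsing can safely restart.
                        static int skipToDelimiter(List<String> tokens, int errorPos) {
                            int i = errorPos;
                            while (i < tokens.size() && !tokens.get(i).equals(";")) {
                                i++; // consume and discard the offending tokens
                            }
                            return Math.min(i + 1, tokens.size()); // step past the ';'
                        }

                        public static void main(String[] args) {
                            List<String> toks = new ArrayList<>(
                                    List.of("set", "@", "bad", ";", "get", "x", ";"));
                            int resume = skipToDelimiter(toks, 1); // error at the '@'
                            System.out.println(resume);            // 4
                            System.out.println(toks.get(resume));  // "get"
                        }
                    }
                    ```
                    
                    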

                    ----- Original Message -----
                    From: "Stdiobe" <stdiobe@...>
                    To: <antlr-interest@yahoogroups.com>
                    Sent: Monday, July 02, 2001 11:46 PM
                    Subject: Re: [antlr-interest] how do i skip unmatched characters?


                    >
                    > Matthew,
                    >
                    > thanks for your response!
                    >
                    > If I understand your solution correctly, you catch the exception when
                    > the parser exits and restart the parser until you get correct input.
                    >
                    > That wouldn't work in my case, because my parser expects a "complete"
                    > program to parse, so the lexer should NOT return an unexpected character
                    > exception to the parser, but instead should capture it itself.
                    >
                    > ----- Original Message -----
                    > From: Matthew Ford <Matthew.Ford@...>
                    > To: <antlr-interest@yahoogroups.com>
                    > Sent: Saturday, June 30, 2001 1:05 AM
                    > Subject: Re: [antlr-interest] how do i skip unmatched characters?
                    >
                    >
                    > > I had this problem when handling commands coming in via telnet.
                    > > I think this should be a common requirement. There should be an FAQ
                    > > about it.
                    > > matthew
                    > >
                    > > This is what I did to skip to the next ; and then continue parsing
                    > > (ErrorLog and Debug are my own reporting logs)
                    > >
                    > >
                    > > In the Parser
                    > >
                    > > options {
                    > > k = 1; // one token lookahead
                    > > defaultErrorHandler = false; // Don't generate parser error handlers
                    > > buildAST = true;
                    > > importVocab = TIMEFRAMECOMMONLEXER;
                    > > exportVocab = TIMEFRAMESERVERPARSER;
                    > >
                    > > }
                    > >
                    > >
                    > > {
                    > >
                    > > ICommandGeneratorServer commandGenerator;
                    > > TimeFrameCommonLexer TFlexer;
                    > > static String nl = System.getProperty("line.separator","\n");
                    > >
                    > > final static int MAJOR = 401;
                    > > final static int TRANSLATION_ERROR = 1;
                    > > final static int FIELD_QUALIFIER_ERROR = 2;
                    > > final static int QUERY_NOT_DEFINED = 3;
                    > > final static int NO_AGE_INDEX = 4;
                    > >
                    > > public TimeFrameCommandsParser(TimeFrameCommonLexer lexer,
                    > > ICommandGeneratorServer commandGenerator, IScope iScope) {
                    > > this(lexer);
                    > > TFlexer = lexer;
                    > > this.commandGenerator = commandGenerator;
                    > > this.iScope = iScope;
                    > > }
                    > >
                    > > public void reportError(ANTLRException ex) {
                    > > Debug.out("in reportError", MAJOR, 0);
                    > >
                    > > commandGenerator.translationError(TFlexer.getLineBuffer(),
                    > > 0,0,null,null,TimeFrameException.makeTimeFrameException(MAJOR,TRANSLATION_ERROR,ex.getMessage()));
                    > > Debug.out("******* end reportError *******", MAJOR, 0);
                    > > }
                    > >
                    > >
                    > >
                    > > public void processError(ANTLRException ex) throws TokenStreamException,
                    > > CharStreamException {
                    > > // actually only throws TokenStreamIOException others caught here
                    > > int tokenType=0;
                    > > LexerSharedInputState inputState = TFlexer.getInputState();
                    > > inputState.guessing = 0; // clear guessing mode
                    > > Debug.out("in processError", MAJOR, 0);
                    > > if (!errorFlag) { // first error
                    > > reportError(ex);
                    > > errorFlag=true; // block new errors until after syncing.
                    > > }
                    > >
                    > > do {
                    > > try {
                    > > if (ex instanceof TokenStreamRecognitionException) {
                    > > TokenStreamRecognitionException rex =
                    > > (TokenStreamRecognitionException)ex;
                    > > // get underlying exception
                    > > ex = null; // have handled this one now
                    > > if ((rex.recog instanceof MismatchedCharException) ||
                    > > (rex.recog instanceof NoViableAltForCharException)) {
                    > > try {
                    > > TFlexer.consume(); // remove current error char;
                    > > } catch (CharStreamException cse) {
                    > > if ( cse instanceof CharStreamIOException ) {
                    > > throw new TokenStreamIOException(((CharStreamIOException)cse).io);
                    > > } else {
                    > > throw new TokenStreamIOException(new IOException(cse.getMessage()));
                    > > }
                    > > }
                    > > }
                    > > }
                    > >
                    > > tokenType = LA(1);
                    > > if ((tokenType != EOF) && (tokenType != SEMI)) {
                    > > consume(); // skip this token (not yet at SEMI or EOF)
                    > > Debug.out("Input buffer:'"+TFlexer.getLineBuffer()+"'", MAJOR, 0);
                    > > }
                    > >
                    > > } catch (TokenStreamRecognitionException ex1) {
                    > > ex = ex1; // and loop
                    > > // TFlexer.consume(); // remove current error char;
                    > > Debug.out("** found :"+ ex1, MAJOR, 0);
                    > > } catch (TokenStreamRetryException ex1) {
                    > > Debug.out("** found :"+ ex1, MAJOR, 0);
                    > > throw new TokenStreamIOException(new IOException(ex1.getMessage()));
                    > > }
                    > > } while ( tokenType != SEMI && tokenType != EOF && !isEOF());
                    > > Debug.out("** end processError *******", MAJOR, 0);
                    > > // if telnet print prompt again (How??)
                    > >
                    > > }
                    > >
                    > > private boolean errorFlag = false;
                    > >
                    > > private boolean eofFlag = false;
                    > >
                    > > public boolean isEOF() {
                    > > return eofFlag;
                    > > }
                    > >
                    > > private void clearErrorFlag() {
                    > > errorFlag = false;
                    > > }
                    > >
                    > >
                    > >
                    > > After the SEMI is seen I expect to find a new statement.
                    > > If I do, then after finding a valid statement I call clearErrorFlag().
                    > >
                    > > // SetAttribute("attribute1","attributeValue");
                    > > setattribute!
                    > > : SETATTRIBUTE
                    > > LPAREN attr:STRING_LITERAL COMMA value:STRING_LITERAL RPAREN SEMI
                    > > { clearErrorFlagAndScope();
                    > >
                    >
                    > > commandGenerator.setAttribute(TFlexer.getLineBuffer(),attr.getText(),value.getText());
                    > > }
                    > > ;
                    > >
                    > >
                    > >
                    > > Finally to tie it all together
                    > >
                    > > In the main program I start the parser like this (in its own thread)
                    > > The command the parser finds are put on a command stack to be handled by
                    > > another thread. This lets you issue cancel commands at any time.
                    > >
                    > > /**
                    > > * The method reads in commands one by one.
                    > > */
                    > > public void run() {
                    > > Debug.out(""+GlobalData.nl+" ---------- InputThread " + connectionNo + " starts.",MAJOR,0);
                    > >
                    > > try {
                    > > do {
                    > > try {
                    > > Debug.out("InputThread Call Parser",MAJOR,0);
                    > > parser.program();
                    > > } catch (RecognitionException ex) {
                    > > ErrorLog.log.println("RecognitionException: "+ ex.getMessage());
                    > > Debug.out("InputThread RecognitionException: "+
                    > > ex.getMessage(),MAJOR,0);
                    > > parser.processError(ex);
                    > > } catch (TokenStreamRecognitionException ex) {
                    > > ErrorLog.log.println("TokenStreamRecognitionException: " +
                    > > ex.getMessage());
                    > > Debug.out("InputThread TokenStreamRecognitionException: " +
                    > > ex.getMessage(),MAJOR,0);
                    > > parser.processError(ex);
                    > > } catch (TokenStreamRetryException ex) {
                    > > ErrorLog.log.println("TokenStreamRetryException: " +
                    > > ex.getMessage());
                    > > Debug.out("InputThread TokenStreamRetryException: " +
                    > > ex.getMessage(),MAJOR,0);
                    > > parser.processError(ex);
                    > > } catch(TokenStreamIOException ex) {
                    > > Debug.out("InputThread TokenStreamIOException: " +
                    > > ex.getMessage(),MAJOR,0);
                    > > if (getStopped()) {
                    > > break;
                    > > }
                    > > }
                    > > if (parser.isEOF()) {
                    > > Debug.out("parser found EOF *****************",MAJOR,0);
                    > > break; // do not call the program again
                    > > }
                    > > } while (!getStopped()); // was while true
                    > >
                    > > }
                    > > catch(Exception e) { // TokenStream IO exceptions or CharStreamExceptions
                    > > Debug.out(ExceptionBuffer.getStackTrace(e), Debug.STACKTRACE, 0);
                    > > // Close stream on IO errors
                    > > if (e instanceof antlr.TokenStreamIOException) {
                    > > Debug.out("TokenStreamIOException: one connection is lost",MAJOR,0);
                    > > listener.cancelConnection();
                    > > }
                    > > else {
                    > > Debug.out("other exception:",MAJOR,0);
                    > > TimeFrameException tfe =
                    > > TimeFrameException.makeTimeFrameException(MAJOR,EXCEPTION,e.toString() +
                    > > e.getMessage());
                    > > Errors error = new Errors(tfe);
                    > > beaconit.serverutils.CommandLog.log.printCommand(sessionNo,
                    > > connectionNo, error.toLogText());
                    > > ErrorLog.log.printError(sessionNo, connectionNo, error, "");
                    > > stack.add(error);
                    > > }
                    > > }
                    > >
                    > >
                    > > This is how I stop the input thread
                    > > /**
                    > > * This method is used to set the stop flag.
                    > > */
                    > > synchronized public void stopThread() {
                    > > Debug.out("stopThread is called in InputThread " + connectionNo,MAJOR,0);
                    > > stopped = true;
                    > > }
                    > >
                    > > /**
                    > > * Synchronized method returns the status of stop flag.
                    > > * @return the status of stop flag.
                    > > */
                    > > synchronized public boolean getStopped() {
                    > > return stopped;
                    > > }
                    > >
                    > > ----- Original Message -----
                    > > From: "Stdiobe" <stdiobe@...>
                    > > To: <antlr-interest@yahoogroups.com>
                    > > Sent: Saturday, June 30, 2001 4:47 AM
                    > > Subject: [antlr-interest] how do i skip unmatched characters?
                    > >
                    > >
                    > > >
                    > > > Hi,
                    > > >
                    > > > when the lexer generated by ANTLR encounters an unmatched character,
                    > > > it throws a TokenStreamRecognitionException which causes my lexer
                    > > > to exit (and also my parser).
                    > > >
                    > > > Does anyone know how I can skip unmatched characters in the lexer
                    > > > by reporting the error to the user (with linenumber, etc.), and have
                    > > > the lexer continue scanning for valid tokens.
                    > > >
                    > > > Stdiobe.
                    > > >
                    > > >
                    > > >
                    > > >
                    > > >
                    > > >
                    > > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                    > > >
                    > > >
                    > >
                    > >
                    > >
                    > >
                    > >
                    > >
                    >
                    >
                    >
                    >
                    >
                    >
                  • Ric Klaren
                    Hi, People should the default errorhandler be on in lexers ? (like it is with all the other parsers?) If I m not receiving counterarguments I m gonna change
                    Message 9 of 19 , Jul 3, 2001
                    • 0 Attachment
                      Hi,

                      People, should the default error handler be on in lexers? (like it is with
                      all the other parsers?)

                      If I don't receive counterarguments I'm gonna change it, so it's
                      consistent with the rest of the behaviour of the tool.

                      (or am I jumping the gun =) )

                      Ric

                      On Mon, Jul 02, 2001 at 06:56:44PM +0200, Stdiobe wrote:
                      > > Doh! What version of antlr are you using one of my snapshots? According to
                      > > the docs it should be on by default?

                      > I just checked the documentation again and it seems the documentation
                      > doesn't literally say it's "on" but it does suggest it in "err.html" under
                      > "Default Exception Handling in the Lexer":
                      >
                      > "Normally you want the lexer to keep trying to get a valid token upon
                      > lexical error. That way, the parser doesn't have to deal with lexical
                      > errors and ask for another token ...... To get ANTLR to generate lexers
                      > that pass on RecognitionException's to the parser as
                      > TokenStreamException's, use the defaultErrorHandler=false grammar option."
                      >
                      > However, the documentation in "options.html" mentions that "ANTLR will
                      > generate default exception handling code for a parser or tree-parser rule".

                      Hmm, looks like someone changed their mind somewhere along the line.

                      > It doesn't say that a default error handler is generated for lexers, so it
                      > seems I was wrong with my assumption that Antlr will also generate a
                      > default error handler for lexers (although I still think the current
                      > behaviour is not logical).

                      I think I agree on that. It didn't bite me with my lexer because I usually
                      don't rely too much on defaults and put the stuff in the options header
                      anyway.

                      > Note that the lexer of Antlr itself (in antlr.g) doesn't use a default
                      > error handler. Sometimes I get unmatched character exceptions with ANTLR
                      > where it doesn't report the linenumber. Turning error handling on would
                      > solve that problem.

                      Which would not be a bad thing (tm)

                      Ric
                      --
                      -----+++++*****************************************************+++++++++-------
                      ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
                      -----+++++*****************************************************+++++++++-------
                      Why don't we just invite them to dinner and massacre them all when they're
                      drunk? You heard the man. There's seven hundred thousand of them.
                      Ah? ... So it'd have to be something simple with pasta, then.
                      --- From: Interesting Times by Terry Pratchett
                      -----+++++*****************************************************+++++++++-------
                    • Terence Parr
                      ... Unless it s a bug, we should discuss changes to behavior I think. Anyway...I m can t remember my reasons and I m foggy at the moment, but lexers are
                      Message 10 of 19 , Jul 3, 2001
                      • 0 Attachment
                        Tuesday, July 03, 2001, Ric Klaren hath spoken:
                        > Hi,

                        > People should the default errorhandler be on in lexers ? (like it is with
                        > all the other parsers?)

                        > If I'm not receiving counterarguments I'm gonna change it. So it's
                        > consistent with the rest of the behaviour of the tool.

                        > (or am I jumping the gun =) )

                        Unless it's a bug, we should discuss changes to behavior I think.

                        Anyway... I can't remember my reasons and I'm foggy at the moment,
                        but lexers are different in the sense that you don't want the errors
                        to be trapped in the rules I think--all output of the lexer goes thru
                        the nextToken method. If an error is trapped in a rule, it will return
                        with bogus information and most importantly w/o knowledge that an
                        error occurred. nextToken will return bogus tokens to the parser.
                        Unless the lexer is very complicated, it's usually ok to just say
                        "this text 'xxx' is bogus on line n."

                        Note that I specifically turn ON default handling often in protected
                        rules (note these are not invoked directly by the nextToken method,
                        hence, avoiding the abovementioned problem). In these rules, such as
                        the args for an HTML tag, I often want to say "bogus image tag
                        argument on line n" and keep going.
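                        [A rule-level handler of the kind Ter describes might be sketched roughly
                        like this. The rule names, the WORD/WS/STRING helper rules and the error
                        action are illustrative assumptions, not his actual HTML grammar:]

```
OTAG
    :   '<' WORD (WS TAG_ARG)* '>'
    ;

protected
TAG_ARG
    :   WORD '=' STRING
    ;
    exception
    catch [RecognitionException ex] {
        System.err.println("bogus tag argument on line " + getLine());
    }
```

                        [Because the handler is attached to the protected rule, OTAG still
                        completes and returns a token, so nextToken never sees the error.]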

                        So, when I want to detect errors WITHIN a token and keep going to
                        return some valid token to the parser (fault tolerance) I use the
                        default handlers or specify one for a protected rule. Ok, i've
                        convinced myself that the current behavior is appropriate. Somebody
                        could convince me though that they should be on for protected rules by
                        default, but these rules are already confusing enough for people ;)

                        Counter examples?

                        Thanks,
                        Ter
                        --
                        Chief Scientist & Co-founder, http://www.jguru.com
                        Co-founder, http://www.NoWebPatents.org -- Stop Patent Stupidity
                        parrt@...
                      • Stdiobe
                        Ter, ... With a DFA based lexer (like LEX) behaviour is simple: either a pattern matches completely or it doesn t. If it doesn t then the unmatched character
                        Message 11 of 19 , Jul 3, 2001
                        • 0 Attachment
                          Ter,

                          > Counter examples?

                          With a DFA based lexer (like LEX) behaviour is simple: either a pattern
                          matches completely or it doesn't. If it doesn't then the unmatched character
                          will be reported and the lexer tries again.

                          This is the kind of behaviour I expect from a lexer, including the
                          Antlr lexer.

                          However, the Antlr lexer works differently:

                          - "defaultErrorHandler=false" will ensure that patterns are always matched
                          completely. However, unmatched characters are not caught by the
                          lexer, but instead result in a TokenStreamRecognitionException that will
                          be passed on to the parser (which will pass it on to the caller of the
                          parser).

                          To illustrate my point: just put an illegal character like '$' somewhere
                          in an ANTLR grammar (outside an action) and run ANTLR. It will
                          report a token stream exception, but will not mention on what line,
                          and will exit.

                          - "defaultErrorHandler=true" will ensure that unmatched characters are
                          properly caught and reported. However, it also causes protected
                          lexer rules to match bogus input which should not happen (unless
                          specifically specified by the developer, as you describe in your
                          comment).

                          For me the best solution would be to turn defaultErrorHandler "off" by
                          default, but with the possibility of catching RecognitionExceptions within
                          the nextToken method, and having the lexer retry.

                          But, I have no idea how I can catch a RecognitionException in the
                          nextToken method which is completely generated by the lexer.
                          If that were possible (and preferably the default behaviour) then
                          defaultErrorHandler should be "off" by default.

                          However, without that possibility I think the default behaviour for
                          defaultErrorHandler should be "on". If needed I can turn it off for
                          protected rules to force exact matches (and if I don't, then mismatches
                          will at least be reported to the user so they know something is wrong,
                          and they get file/line information for locating the error).

                          Note: whatever the default behaviour, it should be well documented.
                          After reading the documentation I (incorrectly) assumed the default
                          behaviour was "on", but after reading more carefully I learned that
                          the documentation doesn't explicitly say that it is "on" (neither does
                          it explicitly say it's "off" ...).

                          Stdiobe.
                        • Matthew Mower
                          On Tue, 3 Jul 2001 23:39:24 +0200, you wrote in ... Could you subclass the generated lexer and override nextToken() to implement your exception catching?
                          Message 12 of 19 , Jul 4, 2001
                          • 0 Attachment
                            On Tue, 3 Jul 2001 23:39:24 +0200, you wrote in
                            <014c01c10408$d5e61640$0d412d3e@daemon>:

                            >But, I have no idea how I can catch a RecognitionException in the
                            >nextToken method which is completely generated by the lexer.
                            >If that were possible (and preferably the default behaviour) then
                            >defaultErrorHandler should be "off" by default.
                            >

                            Could you subclass the generated lexer and override nextToken() to
                            implement your exception catching?

                            Regards,

                            Matt.

                            -----
                            Matt Mower, Development Manager, MetaDyne Ltd, 44-1895-254254

                            "Never play cards with a man named 'Doc'." - Nelson Algren
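                            [Matt's subclass-and-override suggestion can be sketched in plain Java.
                            The classes below are simplified stand-ins for the antlr 2.x runtime and
                            a generated lexer (real code would extend the generated lexer class and
                            catch TokenStreamRecognitionException), so treat this as shape only:]

```java
import java.util.Iterator;

// Stand-in for the scanner error an ANTLR-generated lexer would throw.
class ScannerError extends RuntimeException {
    ScannerError(String msg) { super(msg); }
}

// Stand-in for a generated lexer: returns tokens, throws on bad input.
class GeneratedLexer {
    private final Iterator<String> input;
    GeneratedLexer(Iterator<String> input) { this.input = input; }
    public String nextToken() {
        String t = input.next();
        if (t.startsWith("$")) throw new ScannerError("unexpected char " + t);
        return t;
    }
}

// The subclass overrides nextToken(), reports the error, and retries,
// so the parser never sees the exception.
class RecoveringLexer extends GeneratedLexer {
    RecoveringLexer(Iterator<String> input) { super(input); }
    @Override public String nextToken() {
        while (true) {
            try {
                return super.nextToken();
            } catch (ScannerError e) {
                System.err.println("skipping: " + e.getMessage());
                // loop: try the next token
            }
        }
    }
}
```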
                          • Stdiobe
                            Matt, ... I have thought about that idea, but due to the large number of lexers that I (will) have, I considered this to be not very practical. (I m developing
                            Message 13 of 19 , Jul 4, 2001
                            • 0 Attachment
                              Matt,

                              > Could you subclass the generated lexer and override nextToken() to
                              > implement your exception catching?

                              I have thought about that idea, but due to the large number of lexers that
                              I (will) have, I considered this to be not very practical.

                              (I'm developing parsers for several programming languages, each
                              with multiple dialects, preprocessors, etc., so that will result in a large
                              number of lexers. I don't want to solve this problem in each (derived)
                              lexer, although that's still an option.)

                              I would prefer a solution where ANTLR generates the desired behaviour,
                              i.e. catch RecognitionExceptions in the nextToken method.
                              (I know, I could change the Antlr generator, but before I do that, I want
                              to be sure that I'm not re-inventing the wheel).

                              Stdiobe
                            • Stdiobe
                              Ter, ... I finally figured it out .... the answer was there all the time; I just didn t see it. All that was needed was: - filter = UNKNOWN_TOKEN -
                              Message 14 of 19 , Jul 4, 2001
                              • 0 Attachment
                                Ter,

                                > However, without that possibility I think the default behaviour for
                                > defaultErrorHandler should be "on". If needed I can turn it off for
                                > protected rules to force exact matches (and if I don't, then mismatches
                                > will at least be reported to the user so they know something is wrong,
                                > and they get file/line information for locating the error).

                                I finally figured it out .... the answer was there all the time; I just
                                didn't see it.

                                All that was needed was:
                                - "filter = UNKNOWN_TOKEN"
                                - UNKNOWN_TOKEN : . {issue message} ;
                                - don't call setCommitToPath so the lexer will try again.
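                                 [Put together, a minimal sketch of that filter solution might look like
                                 the following; the lexer name, the charVocabulary range and the error
                                 message are assumptions, not Stdiobe's actual grammar:]

```
class MyLexer extends Lexer;
options {
    k = 2;
    filter = UNKNOWN_TOKEN;          // unmatched input is routed here
    charVocabulary = '\3'..'\377';
}

// ... normal token rules go here ...

protected
UNKNOWN_TOKEN
    :   .   { System.err.println("line " + getLine()
                + ": skipping unexpected char '" + getText() + "'"); }
    ;
```

                                 [Text matched by a named filter rule is discarded automatically, so the
                                 rule only needs to report; not calling setCommitToPath lets the lexer
                                 try the normal rules again on the next character.]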

                                Given this solution, I must conclude that the current behaviour of the
                                lexer is indeed desired and no changes in the generator are needed.

                                However, it would be nice if:
                                - the documentation was clearer about the error handling behaviour
                                of the lexer (i.e. mentioned explicitly that the lexer does not default
                                to error handling).
                                - the Antlr grammar itself included such a rule, so that unmatched
                                characters are reported including the linenumber.

                                Sorry for the trouble, thanks for your response,

                                Stdiobe.
                              • Stdiobe
                                Gary Schaps, I finally figured out your solution. Seems you were right all along! Thanks. Stdiobe
                                Message 15 of 19 , Jul 4, 2001
                                • 0 Attachment
                                  Gary Schaps,

                                  I finally figured out your solution. Seems you were right all along! Thanks.

                                  Stdiobe

                                  > > Something like this might work. You'll have to define
                                  > > "unmatched character" more precisely of course.
                                  > >
                                  > > class MyLexer extends Lexer;
                                  > > options {
                                  > > k=2;
                                  > > filter = IGNORE;
                                  > > charVocabulary = '\3'..'\177';
                                  > > }
                                • Ric Klaren
                                  Hi, Finaly found the time to answer this one with thinking ... ... Yup! ... Ack that s why I did this RFC thing =) ... Yup. ... Aha. ... Only problem is that
                                  Message 16 of 19 , Jul 6, 2001
                                  • 0 Attachment
                                    Hi,

                                     Finally found the time to answer this one with thinking ...

                                    On Tue, Jul 03, 2001 at 11:08:50AM -0700, Terence Parr wrote:
                                    > > (or am I jumping the gun =) )

                                    Yup!

                                    > Unless it's a bug, we should discuss changes to behavior I think.

                                    Ack that's why I did this RFC thing =)

                                    > Anyway...I'm can't remember my reasons and I'm foggy at the moment, but
                                    > lexers are different in the sense that you don't want the errors to be
                                    > trapped in the rules I think--all output of the lexer goes thru the
                                    > nextToken method.

                                    Yup.

                                    > If an error is trapped in a rule, it will return with bogus information and
                                    > most importantly w/o knowledge that an error occurred. nextToken will
                                    > return bogus tokens to the parser. Unless the lexer is very complicated,
                                    > it's usually ok to just say "this text 'xxx' is bogus on line n."

                                    Aha.

                                    > So, when I want to detect errors WITHIN a token and keep going to return
                                    > some valid token to the parser (fault tolerance) I use the default handlers
                                    > or specify one for a protected rule.

                                    The only problem is that you can't specify an error handler for the nextToken
                                    rule... So if you want unexpected chars reported inside your lexer without
                                    going back to the parser (which is not practical in some cases), you
                                    a) have to specify defaultErrorHandler = true; and maybe, in lots of other
                                    places, defaultErrorHandler = false; (AFAIK the only way to get a default
                                    error handler in just the nextToken rule)
                                    b) use the filter rule 'hack', which is IMHO not the most intuitive way to
                                    deal with these things. (FAQs on this topic are shortish)

                                    > Ok, i've convinced myself that the current behavior is appropriate.

                                    Me as well =) but with the above notes.

                                    I guess we should do a few documentation fixes with respect to this. Maybe
                                    add a section on skipping/reporting on unrecognized chars in the lexer.

                                    I've been thinking of extending the grammar to allow a:

                                    class MyParser extends Parser;
                                    options {
                                    ...
                                    }
                                    exception catch [ ... ] { .. }

                                    Syntax for at least (tree)parsers so you can specify a different
                                    defaultErrorHandler for all rules (this should work nicely together with
                                    Ernest's $lookaheadSet patch).

                                    For a lexer we could then modify the behaviour to change the error handler
                                    for nextToken?

                                    Any thoughts?

                                    Ric
                                    --
                                    -----+++++*****************************************************+++++++++-------
                                    ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
                                    -----+++++*****************************************************+++++++++-------
                                    Why don't we just invite them to dinner and massacre them all when they're
                                    drunk? You heard the man. There's seven hundred thousand of them.
                                    Ah? ... So it'd have to be something simple with pasta, then.
                                    --- From: Interesting Times by Terry Pratchett
                                    -----+++++*****************************************************+++++++++-------
                                  • Terence Parr
                                      Message 17 of 19, Jul 19, 2001
                                      Friday, July 06, 2001, Ric Klaren hath spoken:
                                      > [snip]
                                      > For a lexer we could then modify the behaviour to change the errorhandler
                                      > for nextToken?

                                      Yes, I definitely think we should fix the documentation and figure out
                                      how to specify exception handling for nextToken. Note that somebody
                                      correctly figured out that the filter=UNKNOWN_TOKEN option gives you
                                      the desired behavior: all errors go to the UNKNOWN_TOKEN rule. Is
                                      that cool for now? I.e., add a FAQ entry / update the DOC?
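The filter=UNKNOWN_TOKEN approach Ter mentions differs from filter = IGNORE in that unmatched input is routed to a named rule that can report it rather than silently dropping it. Continuing the earlier analogy (a plain-Python sketch with made-up rules; the message format echoes Ter's "this text 'xxx' is bogus on line n" wording and is illustrative, not ANTLR output):

```python
import re

# Hypothetical token rules; a real lexer's rules come from its grammar.
TOKEN_RULES = [
    ("ID", re.compile(r"[a-zA-Z_]\w*")),
    ("WS", re.compile(r"\s+")),
]

def tokenize_reporting(text):
    """Like a filter lexer, but the catch-all rule reports each
    unmatched character instead of swallowing it silently."""
    tokens, errors, pos, line = [], [], 0, 1
    while pos < len(text):
        for name, rx in TOKEN_RULES:
            m = rx.match(text, pos)
            if m:
                line += m.group().count("\n")  # track line numbers
                if name != "WS":               # whitespace is discarded
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # the UNKNOWN_TOKEN rule: report, then resume lexing
            errors.append("text %r is bogus on line %d" % (text[pos], line))
            pos += 1
    return tokens, errors
```

Here `tokenize_reporting("a\n# b")` recovers both identifiers and records a diagnostic for the stray `#` on line 2, which is the fault-tolerant behaviour the thread is after: the parser still sees a clean token stream, and the bad input is logged.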

                                      BTW, do people find the FAQ useful http://www.jguru.com/faq/ANTLR ?

                                      Ter
                                      --
                                      Chief Scientist & Co-founder, http://www.jguru.com
                                      Co-founder, http://www.NoWebPatents.org -- Stop Patent Stupidity
                                      parrt@...