Re: [antlr-interest] how do i skip unmatched characters?

  • Stdiobe
    Message 1 of 19 , Jul 2, 2001
      Gary,

      thanks for your response!

      I've tried your solution but it also skips ignore-characters in tokens
      being matched, which is not what I want.

      From the documentation I understand I can use "setCommitToPath"
      to prevent this from happening but then I would have to add this to
      every rule :-(

      But, if I can't find any other solution ....

      > Something like this might work. You'll have to define
      > "unmatched character" more precisely of course.
      >
      > class MyLexer extends Lexer;
      > options {
      > k=2;
      > filter = IGNORE;
      > charVocabulary = '\3'..'\177';
      > }
      >
      > ....
      >
      > protected
      > IGNORE
      > : '\3'..'\177'
      > {
      > newline();
      > System.err.println("Error: unmatched character "
      > + "\"" + getText() + "\" line: " + getLine());
      > }
      > ;
      >
      > Gary Schaps
      >
      > Stdiobe wrote:
      >
      > > Hi,
      > >
      > > when the lexer generated by ANTLR encounters an unmatched character,
      > > it throws a TokenStreamRecognitionException which causes my lexer
      > > to exit (and also my parser).
      > >
      > > Does anyone know how I can skip unmatched characters in the lexer
      > > by reporting the error to the user (with linenumber, etc.), and have
      > > the lexer continue scanning for valid tokens.
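The behaviour being asked for here (report the unmatched character with its line number, then keep scanning for valid tokens) can be sketched without any ANTLR machinery. The following toy scanner is only an illustration of that recovery pattern; every name in it (ToyLexer, nextToken, isAsciiLetter) is invented for this sketch and is not ANTLR API:

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    private final String input;
    private int pos = 0, line = 1;

    public ToyLexer(String input) { this.input = input; }

    /** Returns the next identifier token, reporting and skipping anything else. */
    public String nextToken() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\n') { line++; pos++; continue; }
            if (isAsciiLetter(c)) {
                int start = pos;
                while (pos < input.length() && isAsciiLetter(input.charAt(pos))) pos++;
                return input.substring(start, pos);
            }
            if (c != ' ' && c != '\t' && c != '\r') {
                // Report the unmatched character with its line number, then continue.
                System.err.println("line " + line + ": unexpected char: '" + c + "'");
            }
            pos++; // skip the offending character and keep scanning
        }
        return null; // end of input
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer("int a\u00e9;\nfoo");
        List<String> tokens = new ArrayList<>();
        for (String t; (t = lexer.nextToken()) != null; ) tokens.add(t);
        System.out.println(tokens); // prints "[int, a, foo]"
    }
}
```

The key point is that the error is handled inside the token loop, so the caller never sees an exception for a bad character, which is exactly what the poster wants the ANTLR-generated lexer to do.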
    • Stdiobe
      Message 2 of 19 , Jul 2, 2001
        Matthew,

        thanks for your response!

        If I understand your solution correctly, you catch the exception when
        the parser exits and restart the parser until you get correct input.

        That wouldn't work in my case, because my parser expects a "complete"
        program to parse, so the lexer should NOT return an unexpected character
        exception to the parser, but instead should capture it itself.

        ----- Original Message -----
        From: Matthew Ford <Matthew.Ford@...>
        To: <antlr-interest@yahoogroups.com>
        Sent: Saturday, June 30, 2001 1:05 AM
        Subject: Re: [antlr-interest] how do i skip unmatched characters?


        > I had this problem when handling commands coming in via telnet.
        > I think this should be a common requirement; there should be an FAQ about
        > it.
        > matthew
        >
        > This is what I did to skip to the next ; and then continue parsing
        > (ErrorLog and Debug are my own reporting logs)
        >
        >
        > In the Parser
        >
        > options {
        > k = 1; // one token lookahead
        > defaultErrorHandler = false; // Don't generate parser error handlers
        > buildAST = true;
        > importVocab = TIMEFRAMECOMMONLEXER;
        > exportVocab = TIMEFRAMESERVERPARSER;
        >
        > }
        >
        >
        > {
        >
        > ICommandGeneratorServer commandGenerator;
        > TimeFrameCommonLexer TFlexer;
        > static String nl = System.getProperty("line.separator","\n");
        >
        > final static int MAJOR = 401;
        > final static int TRANSLATION_ERROR = 1;
        > final static int FIELD_QUALIFIER_ERROR = 2;
        > final static int QUERY_NOT_DEFINED = 3;
        > final static int NO_AGE_INDEX = 4;
        >
        > public TimeFrameCommandsParser(TimeFrameCommonLexer lexer,
        > ICommandGeneratorServer commandGenerator, IScope iScope) {
        > this(lexer);
        > TFlexer = lexer;
        > this.commandGenerator = commandGenerator;
        > this.iScope = iScope;
        > }
        >
        > public void reportError(ANTLRException ex) {
        > Debug.out("in reportError", MAJOR, 0);
        >
        > commandGenerator.translationError(TFlexer.getLineBuffer(), 0, 0, null, null,
        >   TimeFrameException.makeTimeFrameException(MAJOR, TRANSLATION_ERROR, ex.getMessage()));
        > Debug.out("******* end reportError *******", MAJOR, 0);
        > }
        >
        >
        >
        > public void processError(ANTLRException ex) throws TokenStreamException,
        > CharStreamException {
        > // actually only throws TokenStreamIOException; others are caught here
        > int tokenType=0;
        > LexerSharedInputState inputState = TFlexer.getInputState();
        > inputState.guessing = 0; // clear guessing mode
        > Debug.out("in processError", MAJOR, 0);
        > if (!errorFlag) { // first error
        > reportError(ex);
        > errorFlag=true; // block new errors until after syncing.
        > }
        >
        > do {
        > try {
        > if (ex instanceof TokenStreamRecognitionException) {
        > TokenStreamRecognitionException rex =
        > (TokenStreamRecognitionException)ex;
        > // get underlying exception
        > ex = null; // have handled this one now
        > if ((rex.recog instanceof MismatchedCharException) ||
        > (rex.recog instanceof NoViableAltForCharException)) {
        > try {
        > TFlexer.consume(); // remove current error char;
        > } catch (CharStreamException cse) {
        > if ( cse instanceof CharStreamIOException ) {
        > throw new TokenStreamIOException(((CharStreamIOException)cse).io);
        > } else {
        > throw new TokenStreamIOException(new IOException(cse.getMessage()));
        > }
        > }
        > }
        > }
        >
        > tokenType = LA(1);
        > if ((tokenType != EOF) && (tokenType != SEMI)) {
        > consume(); // skip this token (not yet at SEMI)
        > Debug.out("Input buffer:'"+TFlexer.getLineBuffer()+"'", MAJOR, 0);
        > }
        >
        > } catch (TokenStreamRecognitionException ex1) {
        > ex = ex1; // and loop
        > // TFlexer.consume(); // remove current error char;
        > Debug.out("** found :"+ ex1, MAJOR, 0);
        > } catch (TokenStreamRetryException ex1) {
        > Debug.out("** found :"+ ex1, MAJOR, 0);
        > throw new TokenStreamIOException(new IOException(ex1.getMessage()));
        > }
        > } while ( tokenType != SEMI && tokenType != EOF && !isEOF());
        > Debug.out("** end processError *******", MAJOR, 0);
        > // if telnet print prompt again (How??)
        >
        > }
        >
        > private boolean errorFlag = false;
        >
        > private boolean eofFlag = false;
        >
        > public boolean isEOF() {
        > return eofFlag;
        > }
        >
        > private void clearErrorFlag() {
        > errorFlag = false;
        > }
        >
        >
        >
        > After the SEMI is seen I expect to find a new statement;
        > once I find a valid statement I call clearErrorFlag()
        >
        > // SetAttribute("attribute1","attributeValue");
        > setattribute!
        > : SETATTRIBUTE
        > LPAREN attr:STRING_LITERAL COMMA value:STRING_LITERAL RPAREN SEMI
        > { clearErrorFlagAndScope();
        >
        commandGenerator.setAttribute(TFlexer.getLineBuffer(), attr.getText(), value.getText());
        > }
        > ;
        >
        >
        >
        > Finally to tie it all together
        >
        > In the main program I start the parser like this (in its own thread)
        > The commands the parser finds are put on a command stack to be handled by
        > another thread. This lets you issue cancel commands at any time.
        >
        > /**
        > * The method reads in commands one by one.
        > */
        > public void run() {
        > Debug.out(""+GlobalData.nl+" ---------- InputThread " + connectionNo + " starts.",MAJOR,0);
        >
        > try {
        > do {
        > try {
        > Debug.out("InputThread Call Parser",MAJOR,0);
        > parser.program();
        > } catch (RecognitionException ex) {
        > ErrorLog.log.println("RecognitionException: "+ ex.getMessage());
        > Debug.out("InputThread RecognitionException: "+
        > ex.getMessage(),MAJOR,0);
        > parser.processError(ex);
        > } catch (TokenStreamRecognitionException ex) {
        > ErrorLog.log.println("TokenStreamRecognitionException: " +
        > ex.getMessage());
        > Debug.out("InputThread TokenStreamRecognitionException: " +
        > ex.getMessage(),MAJOR,0);
        > parser.processError(ex);
        > } catch (TokenStreamRetryException ex) {
        > ErrorLog.log.println("TokenStreamRetryException: " + ex.getMessage());
        > Debug.out("InputThread TokenStreamRetryException: " +
        > ex.getMessage(),MAJOR,0);
        > parser.processError(ex);
        > } catch(TokenStreamIOException ex) {
        > Debug.out("InputThread TokenStreamIOException: " +
        > ex.getMessage(),MAJOR,0);
        > if (getStopped()) {
        > break;
        > }
        > }
        > if (parser.isEOF()) {
        > Debug.out("parser found EOF *****************",MAJOR,0);
        > break; // do not call the program again
        > }
        > } while (!getStopped()); // was while true
        >
        > }
        > catch(Exception e) { // TokenStreamIOExceptions or CharStreamExceptions
        > Debug.out(ExceptionBuffer.getStackTrace(e), Debug.STACKTRACE, 0);
        > // Close stream on IO errors
        > if (e instanceof antlr.TokenStreamIOException) {
        > Debug.out("TokenStreamIOException: one connection is lost",MAJOR,0);
        > listener.cancelConnection();
        > }
        > else {
        > Debug.out("other exception:",MAJOR,0);
        > TimeFrameException tfe =
        > TimeFrameException.makeTimeFrameException(MAJOR,EXCEPTION,e.toString() +
        > e.getMessage());
        > Errors error = new Errors(tfe);
        > beaconit.serverutils.CommandLog.log.printCommand(sessionNo,
        > connectionNo, error.toLogText());
        > ErrorLog.log.printError(sessionNo, connectionNo, error, "");
        > stack.add(error);
        > }
        > }
        >
        >
        > This is how I stop the input thread
        > /**
        > * This method is used to set the stop flag.
        > */
        > synchronized public void stopThread() {
        > Debug.out("stopThread is called in InputThread " + connectionNo,MAJOR,0);
        > stopped = true;
        > }
        >
        > /**
        > * Synchronized method returns the status of stop flag.
        > * @return the status of stop flag.
        > */
        > synchronized public boolean getStopped() {
        > return stopped;
        > }
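The recovery strategy in the processError code above boils down to panic-mode resynchronisation: on any error, consume tokens until the next `;` (inclusive), then resume parsing at the following statement. A minimal, self-contained distillation of that idea, with invented names (Resync, resyncToSemi) standing in for the real parser machinery:

```java
import java.util.Arrays;
import java.util.List;

public class Resync {
    static final String SEMI = ";";

    /** Skip tokens up to and including the next SEMI; stop at end of input. */
    static int resyncToSemi(List<String> tokens, int pos) {
        while (pos < tokens.size() && !tokens.get(pos).equals(SEMI)) pos++;
        return pos < tokens.size() ? pos + 1 : pos; // consume the SEMI itself
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("set", "(", "@@@", ";", "get", "(", ")", ";");
        int pos = 2;                         // the parser just hit the bad token "@@@"
        pos = resyncToSemi(tokens, pos);     // panic-mode recovery
        System.out.println(tokens.get(pos)); // prints "get": parsing resumes here
    }
}
```

This is why the approach needs a statement delimiter to be reliable: without a token like `;` to sync on, the loop would consume the rest of the input after a single error.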
        >
        > ----- Original Message -----
        > From: "Stdiobe" <stdiobe@...>
        > To: <antlr-interest@yahoogroups.com>
        > Sent: Saturday, June 30, 2001 4:47 AM
        > Subject: [antlr-interest] how do i skip unmatched characters?
        >
        >
        > >
        > > Hi,
        > >
        > > when the lexer generated by ANTLR encounters an unmatched character,
        > > it throws a TokenStreamRecognitionException which causes my lexer
        > > to exit (and also my parser).
        > >
        > > Does anyone know how I can skip unmatched characters in the lexer
        > > by reporting the error to the user (with linenumber, etc.), and have
        > > the lexer continue scanning for valid tokens.
        > >
        > > Stdiobe.
        > >
        > >
        > >
        > >
        > >
        > >
        > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
        > >
        > >
        >
        >
        >
        >
        >
        >
      • Ric Klaren
        Message 3 of 19 , Jul 2, 2001
          Hi,

          On Fri, Jun 29, 2001 at 08:47:04PM +0200, Stdiobe wrote:
          > when the lexer generated by ANTLR encounters an unmatched character,
          > it throws a TokenStreamRecognitionException which causes my lexer
          > to exit (and also my parser).

          How did you setup your exception handling in the lexer? e.g.
          defaultErrorHandler true or false? My guess is that you have it turned off?

          (As an aside, I think it is kind of a deficiency that you can't specify an
          error handler for the nextToken rule.)

          > Does anyone know how I can skip unmatched characters in the lexer
          > by reporting the error to the user (with linenumber, etc.), and have
          > the lexer continue scanning for valid tokens.

          In my experience this is the default behaviour:

          -------snip---
          Parsing...

          int aé;
          line 1:6: unexpected char: 0xE9
          ------snip----

          (the 0xE9 is devel snapshot output of 'unprintable' chars, parsing continues
          after this error)

          This is with the following relevant options:

          charVocabulary= '\u0000' .. '\u00FF';
          defaultErrorHandler = true;

          If you need defaultErrorHandler false you need to fix the error outside the
          lexer (the reason for my aside remark above). I'm not 100% sure where in
          the path to the parser the most practical place is to catch this beast.
          Relevant files would be the inputbuffer/charbuffer/sharedinputstate files.
          Maybe it's better to have defaultErrorHandler set to true and override per
          rule (as far as possible).
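The "fix it outside the lexer" route mentioned above amounts to wrapping the token source and retrying whenever nextToken fails. A sketch of that wrapper pattern, assuming nothing about ANTLR's actual classes (TokenSource, LexError, and nextValidToken are all invented names for this illustration):

```java
import java.util.Arrays;
import java.util.Iterator;

public class FilterWrapper {
    interface TokenSource { String nextToken() throws LexError; }

    static class LexError extends Exception {
        LexError(String msg) { super(msg); }
    }

    /** Keep asking for tokens, reporting and skipping lexical errors. */
    static String nextValidToken(TokenSource src) {
        while (true) {
            try {
                return src.nextToken();
            } catch (LexError e) {
                System.err.println("skipping: " + e.getMessage()); // report, continue
            }
        }
    }

    public static void main(String[] args) {
        // A fake lexer that fails once, then delivers a token.
        Iterator<Object> script = Arrays.<Object>asList(
                new LexError("unexpected char: 0xE9"), "ident").iterator();
        TokenSource src = () -> {
            Object next = script.next();
            if (next instanceof LexError) throw (LexError) next;
            return (String) next;
        };
        System.out.println(nextValidToken(src)); // prints "ident"
    }
}
```

The caveat from the message above still applies: where exactly to install such a wrapper in the real inputbuffer/charbuffer/sharedinputstate chain is the hard part, which is why having the lexer handle the error itself (defaultErrorHandler = true) is the simpler option.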

          Groetsels,

          Ric
          --
          -----+++++*****************************************************+++++++++-------
          ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
          -----+++++*****************************************************+++++++++-------
          Why don't we just invite them to dinner and massacre them all when they're
          drunk? You heard the man. There's seven hundred thousand of them.
          Ah? ... So it'd have to be something simple with pasta, then.
          --- From: Interesting Times by Terry Pratchet
          -----+++++*****************************************************+++++++++-------
        • Stdiobe
          Message 4 of 19 , Jul 2, 2001
            Ric,

            > How did you setup your exception handling in the lexer? e.g.
            > defaultErrorHandler true or false? My guess is that you have it turned
            off?

            I don't use the "defaultErrorHandler" option so I'm getting the default
            behaviour, which I thought was "on", but I just checked and it turns
            out the default behaviour is "off"! That sucks!

            After specifying defaultErrorHandler = "true" I get the desired behaviour.

            I've checked the source code of ANTLR and found that class LexerGrammar
            turns the defaultErrorHandler off. That's not mentioned anywhere in the
            documentation! Also, I can't think of any reason why this is turned "off" by
            default!

            Many thanks for the advice!
            Stdiobe

            > -------snip---
            > Parsing...
            >
            > int aé;
            > line 1:6: unexpected char: 0xE9
            > ------snip----
            >
            > (the 0xE9 is devel snapshot output of 'unprintable' chars, parsing
            continues
            > after this error)
            >
            > This is with the following relevant options:
            >
            > charVocabulary= '\u0000' .. '\u00FF';
            > defaultErrorHandler = true;
            >
            > If you need defaultErrorHandler false you need to fix the error outside
            the
            > lexer (the reason for my aside remark above). I'm not 100% sure where in
            > the path to the parser the most practical place is to catch this beast.
            > Relevant files would be the inputbuffer/charbuffer/sharedinputstate files.
            > Maybe its better to have defaultErrorHandler set to true and override per
            > rule (for as far as possible).
            >
            > Groetsels,
            >
            > Ric
            > --
            > -----+++++*****************************************************+++++++++--
            -----
            > ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
            > -----+++++*****************************************************+++++++++--
            -----
            > Why don't we just invite them to dinner and massacre them all when
            they're
            > drunk? You heard the man. There's seven hundred thousand of them.
            > Ah? ... So it'd have to be something simple with pasta, then.
            > --- From: Interesting Times by Terry
            Pratchet
            > -----+++++*****************************************************+++++++++--
            -----
            >
            >
            >
            >
            >
          • Ric Klaren
            Message 5 of 19 , Jul 2, 2001
              On Mon, Jul 02, 2001 at 05:55:21PM +0200, Stdiobe wrote:
              > I don't use the "defaultErrorHandler" option so I'm getting the default
              > behaviour, which I thought was "on", but I just checked and it turns
              > out the default behaviour is "off"! That sucks!
              >
              > After specifying defaultErrorHandler = "true" I get the desired behaviour.
              >
              > I've checked the source code of ANTLR and found that class LexerGrammar
              > turns the defaultErrorHandler off. That's not mentioned anywhere in the
              > documentation! Also, I can't think of any reason why this is turned "off" by
              > default!

              Doh! What version of antlr are you using, one of my snapshots? According to
              the docs it should be on by default?

              Ric
              --
              -----+++++*****************************************************+++++++++-------
              ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
              -----+++++*****************************************************+++++++++-------
              Why don't we just invite them to dinner and massacre them all when they're
              drunk? You heard the man. There's seven hundred thousand of them.
              Ah? ... So it'd have to be something simple with pasta, then.
              --- From: Interesting Times by Terry Pratchet
              -----+++++*****************************************************+++++++++-------
            • Stdiobe
              Message 6 of 19 , Jul 2, 2001
                Ric,

                > Doh! What version of antlr are you using, one of my snapshots? According to
                > the docs it should be on by default?

                I'm using standard 2.7.1, but it's also turned off in the patched versions
                (and also in version 2.6.0).

                I just checked the documentation again and it seems the documentation
                doesn't literally say it's "on", but it does suggest it in "err.html" under
                "Default Exception Handling in the Lexer":

                "Normally you want the lexer to keep trying to get a valid token upon
                lexical error. That way, the parser doesn't have to deal with lexical
                errors
                and ask for another token ...... To get ANTLR to generate lexers that
                pass on RecognitionException's to the parser as TokenStreamException's,
                use the defaultErrorHandler=false grammar option."

                However, the documentation in "options.html" mentions that "ANTLR
                will generate default exception handling code for a parser or tree-parser
                rule". It doesn't say that a default error handler is generated for lexers,
                so it seems I was wrong in my assumption that ANTLR will also generate
                a default error handler for lexers (although I still think the current
                behaviour is not logical).

                Note that the lexer of ANTLR itself (in antlr.g) doesn't use a default
                error handler. Sometimes I get unmatched character exceptions with
                ANTLR where it doesn't report the line number. Turning error handling
                on would solve that problem.

                Again, thanks for your advice!

                Stdiobe.
              • Matthew Ford
                Message 7 of 19 , Jul 2, 2001
                  you can also use it to resync inside programs where there is a statement
                  delimiter (such as ; in C and Java)

                  ----- Original Message -----
                  From: "Stdiobe" <stdiobe@...>
                  To: <antlr-interest@yahoogroups.com>
                  Sent: Monday, July 02, 2001 11:46 PM
                  Subject: Re: [antlr-interest] how do i skip unmatched characters?


                  >
                  > Matthew,
                  >
                  > thanks for your response!
                  >
                  > If I understand your solution correctly, you catch the exception when
                  > the parser exits and restart the parser until you get correct input.
                  >
                  > That wouldn't work in my case, because my parser expects a "complete"
                  > program to parse, so the lexer should NOT return an unexpected character
                  > exception to the parser, but instead should capture it itself.
                  >
                  > ----- Original Message -----
                  > From: Matthew Ford <Matthew.Ford@...>
                  > To: <antlr-interest@yahoogroups.com>
                  > Sent: Saturday, June 30, 2001 1:05 AM
                  > Subject: Re: [antlr-interest] how do i skip unmatched characters?
                  >
                  >
                  > > I had this problem when handling commands comming in via telnet
                  > > I think this should be a common requirement. There should be an FAQ
                  about
                  > > it.
                  > > matthew
                  > >
                  > > This is what I did to skip to the next ; and then continue parsing
                  > > (ErrorLog and Debug are my own reporting logs)
                  > >
                  > >
                  > > In the Parser
                  > >
                  > > options {
                  > > k = 1; // one token lookahead
                  > > defaultErrorHandler = false; // Don't generate parser error
                  handlers
                  > > buildAST = true;
                  > > importVocab = TIMEFRAMECOMMONLEXER;
                  > > exportVocab = TIMEFRAMESERVERPARSER;
                  > >
                  > > }
                  > >
                  > >
                  > > {
                  > >
                  > > ICommandGeneratorServer commandGenerator;
                  > > TimeFrameCommonLexer TFlexer;
                  > > static String nl = System.getProperty("line.separator","\n");
                  > >
                  > > final static int MAJOR = 401;
                  > > final static int TRANSLATION_ERROR = 1;
                  > > final static int FIELD_QUALIFIER_ERROR = 2;
                  > > final static int QUERY_NOT_DEFINED = 3;
                  > > final static int NO_AGE_INDEX = 4;
                  > >
                  > > public TimeFrameCommandsParser(TimeFrameCommonLexer lexer,
                  > > ICommandGeneratorServer commandGenerator, IScope iScope) {
                  > > this(lexer);
                  > > TFlexer = lexer;
                  > > this.commandGenerator = commandGenerator;
                  > > this.iScope = iScope;
                  > > }
                  > >
                  > > public void reportError(ANTLRException ex) {
                  > > Debug.out("in reportError", MAJOR, 0);
                  > >
                  > > commandGenerator.translationError(TFlexer.getLineBuffer(),
                  > >
                  > >
                  >
                  0,0,null,null,TimeFrameException.makeTimeFrameException(MAJOR,TRANSLATION_ER
                  > > ROR,ex.getMessage()));
                  > > Debug.out("******* end reportError *******", MAJOR, 0);
                  > > }
                  > >
                  > >
                  > >
                  > > public void processError(ANTLRException ex) throws
                  TokenStreamException,
                  > > CharStreamException {
                  > > // actually only throws TokenStreamIOException others caught here
                  > > int tokenType=0;
                  > > LexerSharedInputState inputState = TFlexer.getInputState();
                  > > inputState.guessing = 0; // clear guessing mode
                  > > Debug.out("in processError", MAJOR, 0);
                  > > if (!errorFlag) { // first error
                  > > reportError(ex);
                  > > errorFlag=true; // block new errors until after syncing.
                  > > }
                  > >
                  > > do {
                  > > try {
                  > > if (ex instanceof TokenStreamRecognitionException) {
                  > > TokenStreamRecognitionException rex =
                  > > (TokenStreamRecognitionException)ex;
                  > > // get underlying exception
                  > > ex = null; // have handled this one now
                  > > if ((rex.recog instanceof MismatchedCharException) ||
                  > > (rex.recog instanceof NoViableAltForCharException)) {
                  > > try {
                  > > TFlexer.consume(); // remove current error char;
                  > > } catch (CharStreamException cse) {
                  > > if ( cse instanceof CharStreamIOException ) {
                  > > throw new
                  TokenStreamIOException(((CharStreamIOException)cse).io);
                  > > } else {
                  > > throw new TokenStreamIOException(new
                  > IOException(cse.getMessage()));
                  > > }
                  > > }
                  > > }
                  > > }
                  > >
                  > > tokenType = LA(1);
                  > > if ((tokenType != EOF) && (tokenType != SEMI)) {
                  > > consume(); // remove ;
                  > > Debug.out("Input buffer:'"+TFlexer.getLineBuffer()+"'", MAJOR,
                  > 0);
                  > > }
                  > >
                  > > } catch (TokenStreamRecognitionException ex1) {
                  > > ex = ex1; // and loop
                  > > // TFlexer.consume(); // remove current error char;
                  > > Debug.out("** found :"+ ex1, MAJOR, 0);
                  > > } catch (TokenStreamRetryException ex1) {
                  > > Debug.out("** found :"+ ex1, MAJOR, 0);
                  > > throw new TokenStreamIOException(new
                  IOException(ex1.getMessage()));
                  > > }
                  > > } while ( tokenType != SEMI && tokenType != EOF && !isEOF());
                  > > Debug.out("** end processError *******", MAJOR, 0);
                  > > // if telnet print prompt again (How??)
                  > >
                  > > }
                  > >
                  > > private boolean errorFlag = false;
                  > >
                  > > private boolean eofFlag = false;
                  > >
                  > > public boolean isEOF() {
                  > > return eofFlag;
                  > > }
                  > >
                  > > private void clearErrorFlag() {
                  > > errorFlag = false;
                  > > }
                  > >
                  > >
                  > >
                  > > After the SEMI is seen I expect to find a new statement
                  > > if I do then after I find a valid statement I call clearErrorFlag()
                  > >
                  > > // SetAttribute("attribute1","attributeValue");
                  > > setattribute!
                  > > : SETATTRIBUTE
                  > > LPAREN attr:STRING_LITERAL COMMA value:STRING_LITERAL RPAREN SEMI
                  > > { clearErrorFlagAndScope();
                  > >
                  >
                  commandGenerator.setAttribute(TFlexer.getLineBuffer(),attr.getText(),value.g
                  > > etText());
                  > > }
                  > > ;
                  > >
                  > >
                  > >
                  > > Finally to tie it all together
                  > >
                  > > In the main program I start the parser like this (in its own thread)
                  > > The command the parser finds are put on a command stack to be handled by
                  > > another thread. This lets you issue cancel commands at any time.
                  > >
                  > > /**
                  > > * The method reads in commands one by one.
                  > > */
                  > > public void run() {
                  > > Debug.out(""+GlobalData.nl+" ---------- InputThread " + connectionNo +
                  "
                  > > starts.",MAJOR,0);
                  > >
                  > > try {
                  > > do {
                  > > try {
                  > > Debug.out("InputThread Call Parser",MAJOR,0);
                  > > parser.program();
                  > > } catch (RecognitionException ex) {
                  > > ErrorLog.log.println("RecognitionException: "+ ex.getMessage());
                  > > Debug.out("InputThread RecognitionException: "+
                  > > ex.getMessage(),MAJOR,0);
                  > > parser.processError(ex);
                  > > } catch (TokenStreamRecognitionException ex) {
                  > > ErrorLog.log.println("TokenStreamRecognitionException: " +
                  > > ex.getMessage());
                  > > Debug.out("InputThread TokenStreamRecognitionException: " +
                  > > ex.getMessage(),MAJOR,0);
                  > > parser.processError(ex);
                  > > } catch (TokenStreamRetryException ex) {
                  > > ErrorLog.log.println("TokenStreamRetryException: " +
                  > ex.getMessage());
                  > > Debug.out("InputThread TokenStreamRetryException: " +
                  > > ex.getMessage(),MAJOR,0);
                  > > parser.processError(ex);
                  > > } catch(TokenStreamIOException ex) {
                  > > Debug.out("InputThread TokenStreamIOException: " +
                  > > ex.getMessage(),MAJOR,0);
                  > > if (getStopped()) {
                  > > break;
                  > > }
                  > > }
                  > > if (parser.isEOF()) {
                  > > Debug.out("parser found EOF *****************",MAJOR,0);
                  > > break; // do not call the program again
                  > > }
                  > > } while (!getStopped()); // was while true
                  > >
                  > > }
                  > > catch (Exception e) { // TokenStream IO exceptions or
                  > > // CharStreamExceptions
                  > > Debug.out(ExceptionBuffer.getStackTrace(e), Debug.STACKTRACE, 0);
                  > > // Close stream on IO errors
                  > > if (e instanceof antlr.TokenStreamIOException) {
                  > > Debug.out("TokenStreamIOException: one connection is lost",MAJOR,0);
                  > > listener.cancelConnection();
                  > > }
                  > > else {
                  > > Debug.out("other exception:",MAJOR,0);
                  > > TimeFrameException tfe =
                  > > TimeFrameException.makeTimeFrameException(MAJOR,EXCEPTION,e.toString() +
                  > > e.getMessage());
                  > > Errors error = new Errors(tfe);
                  > > beaconit.serverutils.CommandLog.log.printCommand(sessionNo,
                  > > connectionNo, error.toLogText());
                  > > ErrorLog.log.printError(sessionNo, connectionNo, error, "");
                  > > stack.add(error);
                  > > }
                  > > }
                  > >
                  > >
                  > > This is how I stop the input thread
                  > > /**
                  > > * This method is used to set the stop flag.
                  > > */
                  > > synchronized public void stopThread() {
                  > > Debug.out("stopThread is called in InputThread " +
                  > connectionNo,MAJOR,0);
                  > > stopped = true;
                  > > }
                  > >
                  > > /**
                  > > * Synchronized method returns the status of stop flag.
                  > > * @return the status of stop flag.
                  > > */
                  > > synchronized public boolean getStopped() {
                  > > return stopped;
                  > > }
                  > >
                  > > ----- Original Message -----
                  > > From: "Stdiobe" <stdiobe@...>
                  > > To: <antlr-interest@yahoogroups.com>
                  > > Sent: Saturday, June 30, 2001 4:47 AM
                  > > Subject: [antlr-interest] how do i skip unmatched characters?
                  > >
                  > >
                  > > >
                  > > > Hi,
                  > > >
                  > > > when the lexer generated by ANTLR encounters an unmatched character,
                  > > > it throws a TokenStreamRecognitionException which causes my lexer
                  > > > to exit (and also my parser).
                  > > >
                  > > > Does anyone know how I can skip unmatched characters in the lexer
                  > > > by reporting the error to the user (with linenumber, etc.), and have
                  > > > the lexer continue scanning for valid tokens.
                  > > >
                  > > > Stdiobe.
                  > > >
                  > > >
                  > > >
                  > > >
                  > > >
                  > > >
                  > > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                  > > >
                  > > >
                  > >
                  > >
                  > >
                  > >
                  > >
                  > >
                  >
                  >
                  >
                  >
                  >
                  >
                • Ric Klaren
                  Message 8 of 19, Jul 3, 2001
                    Hi,

                    People, should the default errorhandler be on in lexers (like it is
                    with all the other parsers)?

                    If I don't receive counterarguments I'm going to change it, so it's
                    consistent with the rest of the behaviour of the tool.

                    (or am I jumping the gun =) )

                    Ric

                    On Mon, Jul 02, 2001 at 06:56:44PM +0200, Stdiobe wrote:
                    > > Doh! What version of antlr are you using one of my snapshots? According to
                    > > the docs it should be on by default?

                    > I just checked the documentation again and it seems the documentation
                    > doesn't literally say it's "on" but it does suggest it in "err.html" under
                    > "Default Exception Handling in the Lexer":
                    >
                    > "Normally you want the lexer to keep trying to get a valid token upon
                    > lexical error. That way, the parser doesn't have to deal with lexical
                    > errors and ask for another token ...... To get ANTLR to generate lexers
                    > that pass on RecognitionException's to the parser as
                    > TokenStreamException's, use the defaultErrorHandler=false grammar option."
                    >
                    > However, the documentation in "options.html" mentions that "ANTLR will
                    > generate default exception handling code for a parser or tree-parser rule".

                    Hmm, looks like someone changed their mind somewhere along the line.

                    > It doesn't say that a default error handler is generated for lexers, so it
                    > seems I was wrong with my assumption that Antlr will also generate a
                    > default error handler for lexers (although I still think the current
                    > behaviour is not logical).

                    I think I agree on that. It didn't bite me with my lexer because I usually
                    don't rely too much on defaults and put the stuff in the options header
                    anyway.

                    > Note that the lexer of Antlr itself (in antlr.g) doesn't use a default
                    > error handler. Sometimes I get unmatched character exceptions with ANTLR
                    > where it doesn't report the linenumber. Turning error handling on would
                    > solve that problem.

                    Which would not be a bad thing (tm)

                    Ric
                    --
                    -----+++++*****************************************************+++++++++-------
                    ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
                    -----+++++*****************************************************+++++++++-------
                    Why don't we just invite them to dinner and massacre them all when they're
                    drunk? You heard the man. There's seven hundred thousand of them.
                    Ah? ... So it'd have to be something simple with pasta, then.
                    --- From: Interesting Times by Terry Pratchett
                    -----+++++*****************************************************+++++++++-------
                  • Terence Parr
                    Message 9 of 19, Jul 3, 2001
                      Tuesday, July 03, 2001, Ric Klaren hath spoken:
                      > Hi,

                      > People should the default errorhandler be on in lexers ? (like it is with
                      > all the other parsers?)

                      > If I'm not receiving counterarguments I'm gonna change it. So it's
                      > consistent with the rest of the behaviour of the tool.

                      > (or am I jumping the gun =) )

                      Unless it's a bug, we should discuss changes to behavior I think.

                      Anyway... I can't remember my reasons and I'm foggy at the moment,
                      but lexers are different in the sense that you don't want the errors
                      to be trapped in the rules I think--all output of the lexer goes thru
                      the nextToken method. If an error is trapped in a rule, it will return
                      with bogus information and most importantly w/o knowledge that an
                      error occurred. nextToken will return bogus tokens to the parser.
                      Unless the lexer is very complicated, it's usually ok to just say
                      "this text 'xxx' is bogus on line n."

                      Note that I specifically turn ON default handling often in protected
                      rules (note these are not invoked directly by the nextToken method,
                      hence, avoiding the abovementioned problem). In these rules, such as
                      the args for an HTML tag, I often want to say "bogus image tag
                      argument on line n" and keep going.

                      So, when I want to detect errors WITHIN a token and keep going to
                      return some valid token to the parser (fault tolerance) I use the
                      default handlers or specify one for a protected rule. OK, I've
                      convinced myself that the current behavior is appropriate. Somebody
                      could convince me though that they should be on for protected rules by
                      default, but these rules are already confusing enough for people ;)

                      Counter examples?

                      Thanks,
                      Ter
                      --
                      Chief Scientist & Co-founder, http://www.jguru.com
                      Co-founder, http://www.NoWebPatents.org -- Stop Patent Stupidity
                      parrt@...
                    • Stdiobe
                      Message 10 of 19, Jul 3, 2001
                        Ter,

                        > Counter examples?

                        With a DFA based lexer (like LEX) behaviour is simple: either a pattern
                        matches completely or it doesn't. If it doesn't then the unmatched character
                        will be reported and the lexer tries again.

                        This is the kind of behaviour I expect from a lexer, including the
                        Antlr lexer.

                        However, the Antlr lexer works differently:

                        - "defaultErrorHandler=false" will ensure that patterns are always matched
                        completely. However, unmatched characters are not caught by the
                        lexer, but instead result in a TokenStreamRecognitionException that will
                        be passed on to the parser (which will pass it on to the caller of the
                        parser).

                        To illustrate my point: just put an illegal character like '$' somewhere
                        in an ANTLR grammar (outside an action) and run ANTLR. It will
                        report a token stream exception, but will not mention on what line,
                        and will exit.

                        - "defaultErrorHandler=true" will ensure that unmatched characters are
                        properly caught and reported. However, it also causes protected
                        lexer rules to match bogus input which should not happen (unless
                        specifically specified by the developer, as you describe in your
                        comment).

                        For me the best solution would be to turn defaultErrorHandler "off" by
                        default, but with the possibility of catching RecognitionExceptions within
                        the nextToken method, and having the lexer retry.

                        But I have no idea how I can catch a RecognitionException in the
                        nextToken method, which is generated entirely by ANTLR.
                        If that were possible (and preferably the default behaviour) then
                        defaultErrorHandler should be "off" by default.

                        However, without that possibility I think the default behaviour for
                        defaultErrorHandler should be "on". If needed I can turn it off for
                        protected rules to force exact matches (and if I don't, then mismatches
                        will at least be reported to the user so they know something is wrong,
                        and they get file/line information for locating the error).

                        Note: whatever the default behaviour, it should be well documented.
                        After reading the documentation I (incorrectly) assumed the default
                        behaviour was "on", but after reading more carefully I learned that
                        the documentation doesn't explicitly say that it is "on" (neither does
                        it explicitly say it's "off" ...).

                        Stdiobe.
                      • Matthew Mower
                          Message 11 of 19, Jul 4, 2001
                          On Tue, 3 Jul 2001 23:39:24 +0200, you wrote in
                          <014c01c10408$d5e61640$0d412d3e@daemon>:

                          >But, I have no idea how I can catch a RecognitionException in the
                          >nextToken method which is completely generated by the lexer.
                          >If that would be possibly (and preferably default behaviour) then
                          >defaultErrorHandler should be "off" as default.
                          >

                          Could you subclass the generated lexer and override nextToken() to
                          implement your exception catching?

                          Regards,

                          Matt.

                          -----
                          Matt Mower, Development Manager, MetaDyne Ltd, 44-1895-254254

                          "Never play cards with a man named 'Doc'." - Nelson Algren
                        • Stdiobe
                          Message 12 of 19, Jul 4, 2001
                            Matt,

                            > Could you subclass the generated lexer and override nextToken() to
                            > implement your exception catching?

                            I have thought about that idea, but due to the large number of lexers
                            that I (will) have, I considered it not very practical.

                            (I'm developing parsers for several programming languages, each
                            with multiple dialects, preprocessors, etc., so that will result in a large
                            number of lexers. I don't want to solve this problem in each (derived)
                            lexer, although that's still an option.)

                            I would prefer a solution where ANTLR generates the desired behaviour,
                            i.e. catch RecognitionExceptions in the nextToken method.
                            (I know, I could change the Antlr generator, but before I do that, I want
                            to be sure that I'm not re-inventing the wheel).

                            Stdiobe
                          • Stdiobe
                          Message 13 of 19, Jul 4, 2001
                              Ter,

                              > However, without that possibility I think the default behaviour for
                              > defaultErrorHandler should be "on". If needed I can turn it off for
                              > protected rules to force exact matches (and if I don't, then mismatches
                              > will at least be reported to the user so they know something is wrong,
                              > and they get file/line information for locating the error).

                              I finally figured it out .... the answer was there all the time; I just
                              didn't see it.

                              All that was needed was:
                              - "filter = UNKNOWN_TOKEN"
                              - UNKNOWN_TOKEN : . {issue message} ;
                              - don't call setCommitToPath so the lexer will try again.
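Put together, the three pieces above come out roughly as follows (a sketch in ANTLR 2.x syntax; the class name, rule body, and message wording are illustrative, not taken from a real grammar):

```
class MyLexer extends Lexer;
options {
    filter = UNKNOWN_TOKEN;         // unmatched input is routed here
    charVocabulary = '\3'..'\177';
}

// ... normal token rules ...

protected
UNKNOWN_TOKEN
    :   .   // any single stray character
        {
            System.err.println("unexpected char \"" + getText()
                + "\" on line " + getLine());
            // no setCommitToPath(true) call, so after reporting
            // the lexer resumes scanning for valid tokens
        }
    ;
```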

                              Given this solution, I must conclude that the current behaviour of the
                              lexer is indeed desired and no changes in the generator are needed.

                              However, it would be nice if:
                              - the documentation were clearer about the error handling behaviour
                              of the lexer (i.e. mentioned explicitly that the lexer does not default
                              to error handling).
                              - the ANTLR generator's own grammar included such a rule, so that
                              unmatched characters are reported with line numbers.

                              Sorry for the trouble, thanks for your response,

                              Stdiobe.
                            • Stdiobe
                              Message 14 of 19, Jul 4, 2001
                                Gary Schaps,

                                I finally figured out your solution. Seems you were right all along! Thanks.

                                Stdiobe

                                > > Something like this might work. You'll have to define
                                > > "unmatched character" more precisely of course.
                                > >
                                > > class MyLexer extends Lexer;
                                > > options {
                                > > k=2;
                                > > filter = IGNORE;
                                > > charVocabulary = '\3'..'\177';
                                > > }
                              • Ric Klaren
                                Message 15 of 19, Jul 6, 2001
                                  Hi,

                                  Finally found the time to answer this one with some thought ...

                                  On Tue, Jul 03, 2001 at 11:08:50AM -0700, Terence Parr wrote:
                                  > > (or am I jumping the gun =) )

                                  Yup!

                                  > Unless it's a bug, we should discuss changes to behavior I think.

                                  Ack that's why I did this RFC thing =)

                                  > Anyway...I'm can't remember my reasons and I'm foggy at the moment, but
                                  > lexers are different in the sense that you don't want the errors to be
                                  > trapped in the rules I think--all output of the lexer goes thru the
                                  > nextToken method.

                                  Yup.

                                  > If an error is trapped in a rule, it will return with bogus information and
                                  > most importantly w/o knowledge that an error occurred. nextToken will
                                  > return bogus tokens to the parser. Unless the lexer is very complicated,
                                  > it's usually ok to just say "this text 'xxx' is bogus on line n."

                                  Aha.

                                  > So, when I want to detect errors WITHIN a token and keep going to return
                                  > some valid token to the parser (fault tolerance) I use the default handlers
                                  > or specify one for a protected rule.

                                  Only problem is that you can't specify an errorhandler for the nextToken
                                  rule... So if you want unexpected chars reported inside your lexer without
                                  going back to the parser (which is not practical in some cases), you
                                  a) have to specify defaultErrorHandler = true; and maybe, in lots of other
                                  places, defaultErrorHandler = false; (AFAIK the only way to get a default
                                  errorhandler in just the nextToken rule), or
                                  b) use the filter rule 'hack', which is IMHO not the most intuitive way to
                                  deal with these things (FAQs on this topic are shortish).

                                  > Ok, i've convinced myself that the current behavior is appropriate.

                                  Me as well =) but with the above notes.

                                  I guess we should do a few documentation fixes with respect to this. Maybe
                                  add a section on skipping/reporting on unrecognized chars in the lexer.

                                  I've been thinking of extending the grammar to allow:

                                  class MyParser extends Parser;
                                  options {
                                  ...
                                  }
                                  exception catch [ ... ] { .. }

                                  Syntax for at least (tree)parsers so you can specify a different
                                  defaultErrorhandler for all rules (this should work nicely together with
                                  Ernest's $lookaheadSet patch).

                                  For a lexer we could then modify the behaviour to change the errorhandler
                                  for nextToken?

                                  Any thoughts?

                                  Ric
                                  --
                                  -----+++++*****************************************************+++++++++-------
                                  ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
                                  -----+++++*****************************************************+++++++++-------
                                  Why don't we just invite them to dinner and massacre them all when they're
                                  drunk? You heard the man. There's seven hundred thousand of them.
                                  Ah? ... So it'd have to be something simple with pasta, then.
                                  --- From: Interesting Times by Terry Pratchett
                                  -----+++++*****************************************************+++++++++-------
                                • Terence Parr
                                  Message 16 of 19, Jul 19, 2001
                                    Friday, July 06, 2001, Ric Klaren hath spoken:
                                    > Hi,

                                    > Finaly found the time to answer this one with thinking ...

                                    > On Tue, Jul 03, 2001 at 11:08:50AM -0700, Terence Parr wrote:
                                    >> > (or am I jumping the gun =) )

                                    > Yup!

                                    >> Unless it's a bug, we should discuss changes to behavior I think.

                                    > Ack that's why I did this RFC thing =)

                                    >> Anyway...I'm can't remember my reasons and I'm foggy at the moment, but
                                    >> lexers are different in the sense that you don't want the errors to be
                                    >> trapped in the rules I think--all output of the lexer goes thru the
                                    >> nextToken method.

                                    > Yup.

                                    >> If an error is trapped in a rule, it will return with bogus information and
                                    >> most importantly w/o knowledge that an error occurred. nextToken will
                                    >> return bogus tokens to the parser. Unless the lexer is very complicated,
                                    >> it's usually ok to just say "this text 'xxx' is bogus on line n."

                                    > Aha.

                                    >> So, when I want to detect errors WITHIN a token and keep going to return
                                    >> some valid token to the parser (fault tolerance) I use the default handlers
                                    >> or specify one for a protected rule.

                                    > Only problem is that you can't specify a errorhandler for the nextToken
                                    > rule... So if you want unexpected char's reported inside your lexer without
                                    > going back to the parser (which is not practical in some cases). You
                                    > a) have to specify defaultErrorhandler = true; and maybe in lot's of other
                                    > places defaultErrorHandler = false; (AFAIK only way to get
                                    > defaulterrorhandler in just the nextToken rule)
                                    > b) use the filter rule 'hack' which is IMHO not the most intuitive way to deal
                                    > with these things. (faq's on this topic are shortish)

                                    >> Ok, i've convinced myself that the current behavior is appropriate.

                                    > Me as well =) but with the above notes.

                                    > I guess we should do a few documentation fixes with respect to this. Maybe
                                    > add a section on skipping/reporting on unrecognized chars in the lexer.

                                    > I've been thinking in extending the grammar to allow a:

                                    > class MyParser extends Parser;
                                    > options {
                                    > ...
                                    > }
                                    > exception catch [ ... ] { .. }

                                    > Syntax for at least (tree)parsers so you can specify a different
                                    > defaultErrorhandler for all rules (this should work nicely together with
                                    > Ernest's $lookaheadSet patch).

                                    > For a lexer we could then modify the behaviour to change the errorhandler
                                    > for nextToken?

                                    Yes, I definitely think we should fix the documentation and figure out
                                    how to specify exception handling for nextToken. Note that somebody
                                    correctly figured out that the filter=UNKNOWN_TOKEN option gives you
                                    the desired behavior: all errors go to the UNKNOWN_TOKEN rule. Is
                                    that cool for now? I.e., add a FAQ entry / update the doc?

                                    BTW, do people find the FAQ useful http://www.jguru.com/faq/ANTLR ?

                                    Ter
                                    --
                                    Chief Scientist & Co-founder, http://www.jguru.com
                                    Co-founder, http://www.NoWebPatents.org -- Stop Patent Stupidity
                                    parrt@...