recursive semantic scanning (recursive lookahead?)

  • thereisnofreeid
    Message 1 of 2 , Apr 1 4:19 AM
      hello all,

      I want to do the following:

      1. Split a sentence into words.
      2. Check whether a word is equal to a certain term, or whether it is
      the beginning of a certain term (if the term is a phrase). In the
      latter case, checking should be recursive to detect when several words
      match one term.
      3. Count all words (a detected phrase counts as one).
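
For reference, the three steps above can be sketched in plain Java, outside ANTLR. This is only a minimal sketch under assumptions: the class `PhraseCounter`, its `count` method and the `maxTermWords` parameter are hypothetical names, and a plain `Set<String>` stands in for `searchedTerms`, whose real interface is not shown in this post. At each position it tries the longest candidate phrase first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PhraseCounter {
    // Counts the tokens in a sentence. A run of words that equals a known
    // term counts as one token; every other word counts on its own.
    // 'terms' is a hypothetical stand-in for the runtime-provided searchedTerms.
    public static int count(String sentence, Set<String> terms, int maxTermWords) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        String[] words = trimmed.split("\\s+");   // step 1: split into words
        int count = 0;
        int i = 0;
        while (i < words.length) {
            int consumed = 1;                       // default: one plain word
            int limit = Math.min(maxTermWords, words.length - i);
            // step 2: try the longest candidate phrase first
            for (int len = limit; len >= 2; len--) {
                String candidate =
                    String.join(" ", Arrays.copyOfRange(words, i, i + len));
                if (terms.contains(candidate)) {
                    consumed = len;
                    break;
                }
            }
            i += consumed;
            count++;                                // step 3: a phrase counts once
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("free id"));
        // there | is | no | free id | here
        System.out.println(count("there is no free id here", terms, 2)); // prints 5
    }
}
```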

      I have no problems with point 1 (done in the Lexer) and 3 (done in the
      Parser). For 2, I am stuck with the following code (in the lexer):

      TERM1
      : { searchedTerms.equalsFirstTerm($getText()) }? PART_TERM1
      | { searchedTerms.equalsFirstTerm($getText()) }? WORD
      // reset token type for terms that start like but are not equal to term1
      | PART_TERM1 { $setType(Token.WORD); }
      ;

      protected PART_TERM1
      : ( pt:PART_TERM1 WS w:WORD
      { searchedTerms.firstTermStartsWith(pt.getText() + " " +
      w.getText()) }? )
      => ( PART_TERM1 )
      | ( { searchedTerms.firstTermStartsWith($getText()) }? WORD ) => (
      PART_TERM1 )
      //| ( PART_TERM1 WS )* WORD
      ;

      (I get infinite recursion messages from antlr with this code.)

      searchedTerms is an instance of a custom Java class that does specific
      string operations. searchedTerms stores several terms that can match
      the so-called "first term" (which is really a set of terms with equal
      meaning). searchedTerms will be provided at runtime, so I cannot
      hard-code any terms into the lexer/parser.

      My problems are:

      - Where shall I do the checking: in the lexer, while recognizing the
      words, or in the parser, after splitting into words?
      - How can I tell the lexer (or parser) that if a word is the
      beginning of a term (rule PART_TERM1), it shall try to match the
      term (rule TERM1), or else concatenate the next word and try matching
      again (first with PART_TERM1, then with TERM1), and so on, until
      PART_TERM1 _and_ TERM1 both "return true" (well, match)? If they do
      not match, all words shall get the plain WORD token type and be sent
      to the parser for counting.

      Or should I rather write one (protected) rule that does all the checking
      and returns some value to indicate the result? (That has just come to
      my mind.)

      Maybe there is already lots of sample code around but I don't know the
      terms that would describe it and that I could use to search for it.

      Thanks for your help
      Chantal
    • thereisnofreeid
      Message 2 of 2 , Apr 2 4:09 AM
        hi again,

        I have come up with a new idea, but I am still stuck with infinite
        recursion messages. I moved the term decision logic to the parser. The
        lexer just splits the input into WORD tokens. In the parser I have the
        following rules:

        protected startTerm
        : ( { searchedTerms.firstTermStartsWith(LT(1).getText() + ' ' +
        LT(2).getText()) }?
        ( startTerm WORD ) ) => startTerm
        | ( { searchedTerms.secondTermStartsWith(LT(1).getText() + ' ' +
        LT(2).getText()) }?
        ( startTerm WORD ) ) => startTerm
        | ( { searchedTerms.firstTermStartsWith(LT(1).getText()) }?
        ( WORD ) ) => startTerm
        | ( { searchedTerms.secondTermStartsWith(LT(1).getText()) }?
        ( WORD ) ) => startTerm
        ;

        protected term1
        : ( { searchedTerms.equalsFirstTerm(LT(1).getText() + ' ' +
        LT(2).getText()) }?
        ( startTerm WORD ) ) => term1
        | ( { searchedTerms.equalsFirstTerm(LT(1).getText()) }?
        ( WORD ) ) => term1
        ;

        protected term2
        : ( { searchedTerms.equalsSecondTerm(LT(1).getText() + ' ' +
        LT(2).getText()) }?
        ( startTerm WORD ) ) => term2
        | ( { searchedTerms.equalsSecondTerm(LT(1).getText()) }?
        ( WORD ) ) => term2
        ;

        protected term
        : ( WORD ) => term
        ;

        I get infinite recursion messages for term1, term2 and term. Well, I
        do want to implement a recursion, but how can I tell the parser to stop
        the iteration whenever the equals methods return false? That still does
        not solve the problem that startTerm is just a helper token: if
        startTerm is not recognized as part of term1 or term2, the WORD tokens
        it consists of shall be marked as term tokens. As I write this, I
        suspect it is not possible to do this at all. Or could a TreeParser
        provide some more means to do this?
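
One possible way around the recursion, sketched here in plain Java rather than as grammar rules: ANTLR 2 generates recursive-descent LL(k) code, so a rule that begins by invoking itself (as startTerm, term1, term2 and term do above) is left-recursive and is reported as infinite recursion. A loop over the WORD tokens avoids that. The sketch below is an assumption-laden illustration, not ANTLR API: `TermScanner`, `TermMatcher`, `scan` and `of` are hypothetical names, and `TermMatcher` only imitates the startsWith/equals methods of `searchedTerms`. The loop extends the candidate phrase while it is still a prefix of some term, remembers the last position where it was equal to a term, and otherwise falls back to emitting a single plain word.

```java
import java.util.ArrayList;
import java.util.List;

class TermScanner {
    // Hypothetical stand-in for the searchedTerms object from the post.
    public interface TermMatcher {
        boolean startsWith(String prefix); // is 'prefix' the beginning of a term?
        boolean equalsTerm(String text);   // is 'text' exactly a term?
    }

    // Classifies a list of words into TERM(...) and WORD(...) entries.
    public static List<String> scan(List<String> words, TermMatcher matcher) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            StringBuilder candidate = new StringBuilder();
            int matchedEnd = -1;           // end index of the longest full match
            for (int j = i; j < words.size(); j++) {
                if (j > i) {
                    candidate.append(' ');
                }
                candidate.append(words.get(j));
                String text = candidate.toString();
                if (matcher.equalsTerm(text)) {
                    matchedEnd = j + 1;    // remember it, but keep extending
                }
                if (!matcher.startsWith(text)) {
                    break;                 // no term can start like this: stop
                }
            }
            if (matchedEnd > 0) {
                out.add("TERM(" + String.join(" ", words.subList(i, matchedEnd)) + ")");
                i = matchedEnd;
            } else {
                out.add("WORD(" + words.get(i) + ")"); // fallback: plain word
                i++;
            }
        }
        return out;
    }

    // Simple matcher over a fixed term list, matching on whole-word boundaries.
    public static TermMatcher of(final List<String> terms) {
        return new TermMatcher() {
            public boolean startsWith(String prefix) {
                for (String t : terms) {
                    if (t.equals(prefix) || t.startsWith(prefix + " ")) {
                        return true;
                    }
                }
                return false;
            }
            public boolean equalsTerm(String text) {
                return terms.contains(text);
            }
        };
    }
}
```

With the single term "free id", the words no, free, id, free, lunch come out as WORD(no), TERM(free id), WORD(free), WORD(lunch): the second "free" starts like a term but never completes one, so it is handed on as an ordinary word.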

        thanks for any input
        Chantal