ocamllex - return 2 tokens on eof?

  • gmwass
    Message 1 of 3, Feb 27, 2007
      I have a lexer and parser that I want to modify to make the last token
      optional. Currently, the mly file looks like the following:

      (***** begin mly file *****)
      main:
      stmtlist EOF
      ;

      stmtlist:
      /* empty */
      | stmt stmtlist
      ;

      stmt:
      stmtend
      | exp stmtend
      (* many more rules *)
      ;

      stmtend:
      ENDTAG
      ;
      (* many more rules *)
      (***** end mly file *****)

      For my setting, ENDTAG is not strictly necessary at the end of the
      file. The grammar is pretty big, and refactoring it would be a lot of
      work. I would really like to change the rules as follows:

      (***** begin changes *****)
      main:
      stmtlist EOF1 EOF2
      | stmtlist EOF2
      ;

      stmtend:
      ENDTAG
      | EOF1
      ;
      (***** end changes *****)

      For this change to work, the lexer would have to return two tokens
      on eof. Currently, the mll file has a line such as:

      (***** begin mll example *****)
      | eof { EOF }
      (***** end mll example *****)

      I'm trying to replace it with something like:

      (***** begin mll changes *****)
      let hit_eof = ref false
      (* skip lots of stuff *)
      | eof {
      if !hit_eof
      then (EOF2)
      else (hit_eof := true; EOF1) }
      (***** end mll changes *****)

      However, when I compile the system with these changes and run it on a
      file (my_file) that omits "ENDTAG", I get the following error:

      Fatal error: exception Failure("Parser error (lexing: empty token):
      file "./my_file" line 144")

      Can anyone tell me how to make the lexer (compiled with ocamllex)
      return two tokens on eof? I read in another post that eof is just a
      flag that can be read multiple times, but that doesn't seem to be the
      case (or perhaps I did not understand it correctly). Thanks in advance.

      --Gary
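
One way to get the two-token behaviour without relying on the eof rule firing twice is to wrap the lexer function before handing it to the parser. The following is a minimal sketch, not something proposed in the thread: the `token` type is a stand-in for the one ocamlyacc would generate, `two_eofs` and `lexer_of_list` are hypothetical names, and the raw lexer is assumed to return `EOF2` at end of input.

```ocaml
(* Stand-in token type; in a real project this comes from the
   ocamlyacc-generated parser module. *)
type token = ENDTAG | EXP of string | EOF1 | EOF2

(* Wrap a raw lexer so that end of input is reported as EOF1 the first
   time and as EOF2 on every later call. Assumption: [raw] returns EOF2
   when it reaches end of input. The raw lexer is not called again
   once end of input has been seen. *)
let two_eofs (raw : Lexing.lexbuf -> token) : Lexing.lexbuf -> token =
  let hit_eof = ref false in
  fun lexbuf ->
    if !hit_eof then EOF2
    else
      match raw lexbuf with
      | EOF2 -> hit_eof := true; EOF1
      | tok -> tok

(* Hypothetical raw lexer for the demo: replays a fixed token list and
   ignores the lexbuf. *)
let lexer_of_list toks =
  let rest = ref toks in
  fun (_ : Lexing.lexbuf) ->
    match !rest with
    | [] -> EOF2
    | t :: ts -> rest := ts; t

let () =
  let lexbuf = Lexing.from_string "" in
  let next = two_eofs (lexer_of_list [EXP "x"; ENDTAG; EXP "y"]) in
  (* Pull exactly n tokens, in order. *)
  let rec take n =
    if n = 0 then [] else let t = next lexbuf in t :: take (n - 1)
  in
  assert (take 5 = [EXP "x"; ENDTAG; EXP "y"; EOF1; EOF2])
```

With a generated lexer and parser, the wrapper would be applied at the call site, along the lines of `Parser.main (two_eofs Lexer.token) lexbuf`, where `Parser.main` and `Lexer.token` are assumed names for the generated entry points.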
    • Mathias Kende
      Message 2 of 3, Feb 28, 2007
        You could use something like:

        main:
        stmtlist EOF
        | stmtlist stmt2 EOF
        ;

        and just define stmt2 without an ENDTAG token.
        This is not much to write and should be enough.


      • gmwass
        Message 3 of 3, Feb 28, 2007
          I tried this, but without success. I get many (567) shift/reduce
          conflicts and many (54) reduce/reduce conflicts, and the same error
          message comes up.

          I defined stmt2 by copying all the rules for stmt: if a rule ends
          with stmtend, I remove the stmtend symbol; if it does not, I remove
          the rule.

          Probably part of the reason for the many conflicts is that there are
          rules like:

          stmt:
          (* lots of rules *)
          | WHILE LPAREN exp RPAREN stmt

          So I add new rules like:

          | WHILE LPAREN exp RPAREN stmt2

          I may be able to rewrite the grammar to get rid of the conflicts, but
          I'd rather not risk introducing unintended changes. I would really
          like to do something more like having the lexer return two tokens on
          EOF (as I mentioned in my previous post). Any ideas?

          --Gary
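
A variation that would leave the grammar untouched is to have a wrapper around the lexer supply the missing ENDTAG itself: if end of input arrives right after a token other than ENDTAG, it hands the parser an ENDTAG first and EOF afterwards. This is only a sketch under assumptions. The `token` type below is a stand-in for the generated one, `close_last_stmt` and `lexer_of_list` are hypothetical names, and whether a synthesized ENDTAG is acceptable depends on how the rest of the grammar uses that token.

```ocaml
(* Stand-in token type; the real one comes from the generated parser. *)
type token = ENDTAG | EXP of string | EOF

(* Wrap a raw lexer so that a missing final ENDTAG is supplied: when end
   of input follows a token other than ENDTAG, return ENDTAG once, then
   EOF on all later calls. The raw lexer is never called again after it
   has reported EOF. *)
let close_last_stmt (raw : Lexing.lexbuf -> token) : Lexing.lexbuf -> token =
  (* Starts as true so that an empty input yields EOF directly. *)
  let last_was_end = ref true in
  let at_eof = ref false in
  fun lexbuf ->
    let tok = if !at_eof then EOF else raw lexbuf in
    match tok with
    | EOF when not !last_was_end ->
        at_eof := true; last_was_end := true; ENDTAG
    | EOF -> at_eof := true; EOF
    | _ -> last_was_end := (tok = ENDTAG); tok

(* Hypothetical raw lexer for the demo: replays a fixed token list and
   ignores the lexbuf. *)
let lexer_of_list toks =
  let rest = ref toks in
  fun (_ : Lexing.lexbuf) ->
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; t

(* Collect everything the wrapped lexer produces, up to and including EOF. *)
let tokens_of toks =
  let lexbuf = Lexing.from_string "" in
  let next = close_last_stmt (lexer_of_list toks) in
  let rec go acc =
    match next lexbuf with
    | EOF -> List.rev (EOF :: acc)
    | t -> go (t :: acc)
  in
  go []

let () =
  (* Final ENDTAG omitted: one is synthesized before EOF. *)
  assert (tokens_of [EXP "x"; ENDTAG; EXP "y"]
          = [EXP "x"; ENDTAG; EXP "y"; ENDTAG; EOF]);
  (* Final ENDTAG present: the stream is passed through unchanged. *)
  assert (tokens_of [EXP "x"; ENDTAG] = [EXP "x"; ENDTAG; EOF])
```

Because the parser still sees the original `stmtlist EOF` token sequence, no stmt2 rules are needed with this approach and the existing conflict count should not change.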

          --- In ocaml_beginners@yahoogroups.com, Mathias Kende <mathias@...> wrote:
          >
          > You may use something like :
          >
          > main:
          > stmtlist EOF
          > |stmtlist stmt2 EOF
          >
          > and just define stmt2 without an ENDTAG token.
          > This is not much to write and should be enough.
          >
          >