Re: "ocaml_beginners"::[] ocamllex

  • William D. Neumann
    Message 1 of 16 , Dec 2, 2005
      On Fri, 2 Dec 2005, Oliver Bandel wrote:

      > your solution looks very low-level (in the sense of
      > going down to the basement using a candle for lighting the flowers
      > instead of switching on the electrical light...).

      Not sure I understand that... could you elaborate?

      >> {
      >> let readFile fn =
      >> (SNIP)
      >
      > well, looks like a horror. :(
      >
      > Is this really necessary?
      >
      > I don't think so.
      >
      > But maybe I missed the advantage of this solution?

      Regarding the readFile bit, yes. You missed the advantage that it was
      some code I had lying around in another .mll file on my drive. Cut and
      paste is faster than writing new code to read a file... But that's not
      the interesting part of the example anyway... The interesting parts are
      the rules and the auxiliary functions in the header.

      > Here is, how I would possibly make it:
      >
      >
      > =====================================================
      > (SNIP)
      > rule atbegin = parse
      > (SNIP)
      > and
      > endthisline_nomatch = parse
      > (SNIP)
      > and
      > endthisline_withmatch = parse
      > (SNIP)
      > =====================================================
      >
      > I tried it out and it works, if I have not overlooked something...

      It's a matter of scale and code reuse.

      The problem with your approach is that you will likely duplicate your
      rulesets. Imagine the simple case where you have three main types of
      tokens:
      LINENO i, which represents a non-negative integer at the start of a line;
      INT i, which represents a non-negative integer not at the start of a
      line, or a negative integer anywhere;
      WORD w, which matches an alphabetic string.

      Using the offset check, the ruleset is:
      (******************************************)
      let word = ['a'-'z' 'A'-'Z']+
      let num = ['0'-'9']+
      let negnum = '-'['0'-'9']+

      rule main = parse
        | num    { let n = Lexing.lexeme lexbuf in
                   if at_start lexbuf then `LINENO n else `INT n }
        | negnum { `INT (Lexing.lexeme lexbuf) }
        | word   { `WORD (Lexing.lexeme lexbuf) }
        | '\n'   { newline lexbuf; `EOL }
        | eof    { `EOF }
        | _      { main lexbuf }
      (******************************************)
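
The header helpers used by this ruleset (at_start and newline) were snipped from the message. The following is only a guess at what they could look like, assuming lineno and start are references defined in the header. Only the four names come from the rules in this post; the bodies are an assumption, not the original code.

```ocaml
(* Hypothetical header helpers: the originals were snipped from the message.
   Only the names at_start, newline, lineno and start appear in the rules. *)

(* Current line number, 1-based. *)
let lineno = ref 1

(* Absolute offset of the first character of the current line. *)
let start = ref 0

(* Call from the '\n' action: the next line starts where this lexeme ends. *)
let newline lexbuf =
  incr lineno;
  start := Lexing.lexeme_end lexbuf

(* True when the current lexeme begins at the first character of its line. *)
let at_start lexbuf =
  Lexing.lexeme_start lexbuf = !start
```

In a .mll file these definitions would sit between the braces of the header section, before the rules.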

      Using your multiple rule version we need something like:

      (******************************************)
      rule atbegin = parse
        | '\n'   { incr count; print_newline (); atbegin lexbuf }
        | num    { Printf.printf "#%s " (Lexing.lexeme lexbuf);
                   not_at_begin lexbuf }
        | negnum { Printf.printf "INT (%s) " (Lexing.lexeme lexbuf);
                   not_at_begin lexbuf }
        | word   { Printf.printf "WORD (%s) " (Lexing.lexeme lexbuf);
                   not_at_begin lexbuf }
        | eof    { raise End_of_file }
        | _      { not_at_begin lexbuf }

      and not_at_begin = parse
        | '\n'   { incr count; print_newline (); atbegin lexbuf }
        | num | negnum
                 { Printf.printf "INT (%s) " (Lexing.lexeme lexbuf);
                   not_at_begin lexbuf }
        | word   { Printf.printf "WORD (%s) " (Lexing.lexeme lexbuf);
                   not_at_begin lexbuf }
        | eof    { raise End_of_file }
        | _      { not_at_begin lexbuf }
      (******************************************)

      Note that you have to duplicate most things because you have to check for
      them both at the start of a line with one rule, and everywhere else with
      another rule. Now imagine a case where you've got 100 different patterns
      to match...

      Also, say you want to add error handling. Let's say that any char that is
      not an alphanumeric character, whitespace, or a minus sign should print a
      warning telling the line number and position on the line where the error
      occurred, skip to the end of the line, then continue processing... unless
      it occurs at the start of the line, in which case we skip it as a comment.
      (Yeah, I know... a bit contrived, but I'm trying to keep it simple.)

      I've already got all that infrastructure in my setup. The rules now
      become:
      (******************************************)
      let word = ['a'-'z' 'A'-'Z']+
      let num = ['0'-'9']+
      let negnum = '-'['0'-'9']+
      let ws = [' ' '\t' '\r']+

      rule main = parse
        | num    { let n = Lexing.lexeme lexbuf in
                   if at_start lexbuf then `LINENO n else `INT n }
        | negnum { `INT (Lexing.lexeme lexbuf) }
        | word   { `WORD (Lexing.lexeme lexbuf) }
        | '\n'   { newline lexbuf; `EOL }
        | eof    { `EOF }
        | ws     { main lexbuf }
        | _      { (* warn unless the character starts the line (a comment) *)
                   if not (at_start lexbuf) then
                     Printf.eprintf
                       "Warning: Illegal character: '%s' at line: %d, position: %d\n"
                       (Lexing.lexeme lexbuf)
                       !lineno
                       (Lexing.lexeme_start lexbuf - !start + 1);
                   (* in both cases, discard the rest of the line *)
                   skip_to_end lexbuf }
      and skip_to_end = parse
        | '\n'   { newline lexbuf; `EOL }
        | eof    { `EOF }   (* input ended before a newline *)
        | _      { skip_to_end lexbuf }
      (******************************************)

      I only need to split the one _ pattern into a ws pattern and a _ pattern,
      in one spot, whereas you would need to change the _ rule in both atbegin
      and not_at_begin to skip whitespace and to implement the different
      behavior for the illegal/comment characters.

      Additionally, you would need to add a counter that tracks the position
      within a line just to do the error reporting, and it would not be used
      for anything else.
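
To make the point about the shared offset concrete without needing ocamllex, here is a small hand-written scanner in plain OCaml. It is a stand-in for the generated lexer, not the code from this thread: the function scan, its warn callback and the token type are hypothetical names introduced for illustration. The one recorded line-start offset decides LINENO versus INT, decides comment versus illegal character, and gives the column for the warning.

```ocaml
(* Illustrative stand-in for the single-rule lexer; not generated by ocamllex. *)
type token = LINENO of string | INT of string | WORD of string | EOL | EOF

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Scan [s] into tokens. [warn] is called with (char, line, column)
   for an illegal character that is not the first one on its line. *)
let scan ?(warn = fun _ _ _ -> ()) s =
  let n = String.length s in
  let lineno = ref 1 in   (* current line, 1-based *)
  let start = ref 0 in    (* offset of the first char of the current line *)
  (* First index >= i at which predicate p no longer holds. *)
  let span p i =
    let j = ref i in
    while !j < n && p s.[!j] do incr j done;
    !j
  in
  let rec go i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      let c = s.[i] in
      if c = '\n' then begin
        incr lineno;
        start := i + 1;
        go (i + 1) (EOL :: acc)
      end
      else if is_digit c then
        let j = span is_digit i in
        let lx = String.sub s i (j - i) in
        (* the offset check: same pattern, different token at line start *)
        go j ((if i = !start then LINENO lx else INT lx) :: acc)
      else if c = '-' && i + 1 < n && is_digit s.[i + 1] then
        let j = span is_digit (i + 1) in
        go j (INT (String.sub s i (j - i)) :: acc)
      else if is_alpha c then
        let j = span is_alpha i in
        go j (WORD (String.sub s i (j - i)) :: acc)
      else if c = ' ' || c = '\t' || c = '\r' then go (i + 1) acc
      else begin
        (* at line start this is a comment; elsewhere it is an error *)
        if i <> !start then warn c !lineno (i - !start + 1);
        (* skip the rest of the line; the '\n' branch then emits EOL *)
        go (span (fun ch -> ch <> '\n') i) acc
      end
  in
  go 0 []
```

For example, scanning "20 x ? y" as the second line of an input reports the '?' at line 2, position 6, using nothing but the stored line-start offset; no separate per-line counter is needed.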

      William D. Neumann

      ---

      "There's just so many extra children, we could just feed the
      children to these tigers. We don't need them, we're not doing
      anything with them.

      Tigers are noble and sleek; children are loud and messy."

      -- Neko Case

      Life is unfair. Kill yourself or get over it.
      -- Black Box Recorder