Loading ...
Sorry, an error occurred while loading the content.

grepping a large file (part 2)

Expand Messages
  • Mihamina Rakotomandimby
    Manao ahoana, Hello, Bonjour, This is my piece of code: (* mail.log is more or less 900k lines postfix logs *) let mail_log = open_in
    Message 1 of 9 , Dec 21, 2009
    • 0 Attachment
      Manao ahoana, Hello, Bonjour,

      This is my piece of code:

      (* "mail.log" is more or less 900k lines postfix logs *)
      let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;

      (* to have "true" or "false" if match or not *)
      let line_matched email line =
      try
      (Str.search_forward (Str.regexp email) line 0); true
      with
      Not_found -> false;;

      (* put matching lines in a list *)
      let rec find_email accumulator email =
      try
      let line = (input_line mail_log)
      in
      match line_matched email line with
      true -> find_email (line::accumulator) email
      | false -> find_email (accumulator) email
      with
      End_of_file -> accumulator;;

      When I launch find_email, I get:

      # find_email [] "mihamina@...";;
      Stack overflow during evaluation (looping recursion?).

      On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
      the final production server will be a virtual machine of 1GB RAM.

      How to manage this: More efficient recursivity? How?

      Misaotra, Thanks, Merci.

      --
      Architecte Informatique chez Blueline/Gulfsat:
      Administration Systeme, Recherche & Developpement
      +261 34 29 155 34 / +261 33 11 207 36
    • Ashish Agarwal
      Your function is not tail-recursive because the recursive call is inside a try-with block. A purely functional solution is possible, but a while loop is also a
      Message 2 of 9 , Dec 21, 2009
      • 0 Attachment
        Your function is not tail-recursive because the recursive call is inside a
        try-with block. A purely functional solution is possible, but a while loop
        is also a reasonable solution in this case. For example, try this:

        let run email =
        let accum = ref [] in
        try
        while true do
        let line = input_line mail_log in
        if line_matched email line then
        accum := line::!accum
        done;
        assert false (* you can never get here *)
        with
        End_of_file -> List.rev (!accum) (* return the accumulator at end of
        file *)



        On Mon, Dec 21, 2009 at 2:36 PM, Mihamina Rakotomandimby <
        mihamina@...> wrote:

        >
        >
        > Manao ahoana, Hello, Bonjour,
        >
        > This is my piece of code:
        >
        > (* "mail.log" is more or less 900k lines postfix logs *)
        > let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;
        >
        > (* to have "true" or "false" if match or not *)
        > let line_matched email line =
        > try
        > (Str.search_forward (Str.regexp email) line 0); true
        > with
        > Not_found -> false;;
        >
        > (* put matching lines in a list *)
        > let rec find_email accumulator email =
        > try
        > let line = (input_line mail_log)
        > in
        > match line_matched email line with
        > true -> find_email (line::accumulator) email
        > | false -> find_email (accumulator) email
        > with
        > End_of_file -> accumulator;;
        >
        > When I launch find_email, I get:
        >
        > # find_email [] "mihamina@... <mihamina%40gulfsat.mg>";;
        > Stack overflow during evaluation (looping recursion?).
        >
        > On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
        > the final production server will be a virtual machine of 1GB RAM.
        >
        > How to manage this: More efficient recursivity? How?
        >
        > Misaotra, Thanks, Merci.
        >
        > --
        > Architecte Informatique chez Blueline/Gulfsat:
        > Administration Systeme, Recherche & Developpement
        > +261 34 29 155 34 / +261 33 11 207 36
        >
        >


        [Non-text portions of this message have been removed]
      • "Markus W. Weißmann"
        You could also write a wrapper around input_line like let input_line_opt x = try Some (input_line x) with End_of_file - None and so make your function tail
        Message 3 of 9 , Dec 21, 2009
        • 0 Attachment
          You could also write a wrapper around 'input_line' like

          let input_line_opt x = try Some (input_line x) with End_of_file -> None

          and so make your function tail recursive

          let rec find_email accumulator email =
          match (input_line_opt mail_log) with
          | None -> accumulator
          | Some line ->
          begin match line_matched email line with
          true -> find_email (line::accumulator) email
          | false -> find_email (accumulator) email
          end;;


          Regards,

          -Markus

          On 21 Dec 2009, at 21:54, Ashish Agarwal wrote:

          > Your function is not tail-recursive because the recursive call is inside a
          > try-with block. A purely functional solution is possible, but a while loop
          > is also a reasonable solution in this case. For example, try this:
          >
          > let run email =
          > let accum = ref [] in
          > try
          > while true do
          > let line = input_line mail_log in
          > if line_matched email line then
          > accum := line::!accum
          > done;
          > assert false (* you can never get here *)
          > with
          > End_of_file -> List.rev (!accum) (* return the accumulator at end of
          > file *)
          >
          >
          >
          > On Mon, Dec 21, 2009 at 2:36 PM, Mihamina Rakotomandimby <
          > mihamina@...> wrote:
          >
          >>
          >>
          >> Manao ahoana, Hello, Bonjour,
          >>
          >> This is my piece of code:
          >>
          >> (* "mail.log" is more or less 900k lines postfix logs *)
          >> let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;
          >>
          >> (* to have "true" or "false" if match or not *)
          >> let line_matched email line =
          >> try
          >> (Str.search_forward (Str.regexp email) line 0); true
          >> with
          >> Not_found -> false;;
          >>
          >> (* put matching lines in a list *)
          >> let rec find_email accumulator email =
          >> try
          >> let line = (input_line mail_log)
          >> in
          >> match line_matched email line with
          >> true -> find_email (line::accumulator) email
          >> | false -> find_email (accumulator) email
          >> with
          >> End_of_file -> accumulator;;
          >>
          >> When I launch find_email, I get:
          >>
          >> # find_email [] "mihamina@... <mihamina%40gulfsat.mg>";;
          >> Stack overflow during evaluation (looping recursion?).
          >>
          >> On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
          >> the final production server will be a virtual machine of 1GB RAM.
          >>
          >> How to manage this: More efficient recursivity? How?
          >>
          >> Misaotra, Thanks, Merci.
          >>
          >> --
          >> Architecte Informatique chez Blueline/Gulfsat:
          >> Administration Systeme, Recherche & Developpement
          >> +261 34 29 155 34 / +261 33 11 207 36
          >>
          >>
          >
          >
          > [Non-text portions of this message have been removed]
          >
          >
          >
          > ------------------------------------
          >
          > Archives up to December 31, 2008 are also downloadable at http://www.connettivo.net/cntprojects/ocaml_beginners/
          > The archives of the very official ocaml list (the seniors' one) can be found at http://caml.inria.fr
          > Attachments are banned and you're asked to be polite, avoid flames etc.Yahoo! Groups Links
          >
          >
          >

          --
          Markus Weißmann, M.Sc.
          Institut für Informatik
          Technische Universität München
          Raum 03.07.054, Boltzmannstr. 3, 85748 Garching

          Tel. +49 (89) 2 89-1 81 05
          Fax +49 (89) 2 89-1 81 07
        • Mihamina Rakotomandimby
          ... I d rather this one. ... find_email is directly reccursively called, without any pending operation that couls make it non tail Could you help me to
          Message 4 of 9 , Dec 22, 2009
          • 0 Attachment
            > "Markus W. Weißmann" <markus.weissmann@...> :
            > let input_line_opt x =
            > try Some (input_line x)
            > with End_of_file -> None
            >
            > let rec find_email accumulator email =
            > match (input_line_opt mail_log) with
            > | None -> accumulator
            > | Some line ->
            > begin match line_matched email line with
            > true -> find_email (line::accumulator) email
            > | false -> find_email (accumulator) email
            > end;;


            I'd rather this one.
            But I wonder why mine has not been tail recursive:

            >> let rec find_email accumulator email =
            >> try
            >> let line = (input_line mail_log)
            >> in
            >> match line_matched email line with
            >> true -> find_email (line::accumulator) email
            >> | false -> find_email (accumulator) email
            >> with
            >> End_of_file -> accumulator;;

            "find_email" is directly reccursively called, without any pending
            operation that couls make it "non tail"

            Could you help me to make it clearer in my mind?

            Thank you.

            --
            Architecte Informatique chez Blueline/Gulfsat:
            Administration Systeme, Recherche & Developpement
            +261 34 29 155 34 / +261 33 11 207 36
          • Frédéric van der Plancke
            ... There are hidden pending operations. As I understand it (without knowledge of what OCaml exactly does) when you enter a try block, the runtimes sets up
            Message 5 of 9 , Dec 22, 2009
            • 0 Attachment
              Mihamina Rakotomandimby wrote:
              > But I wonder why mine has not been tail recursive:
              >
              >>> let rec find_email accumulator email =
              >>> try
              >>> let line = (input_line mail_log)
              >>> in
              >>> match line_matched email line with
              >>> true -> find_email (line::accumulator) email
              >>> | false -> find_email (accumulator) email
              >>> with
              >>> End_of_file -> accumulator;;
              >>>
              >
              > "find_email" is directly reccursively called, without any pending
              > operation that couls make it "non tail"
              >
              > Could you help me to make it clearer in my mind?
              >
              > Thank you.
              >
              >
              There are hidden pending operations. As I understand it (without
              knowledge of what OCaml exactly does) when you enter a "try" block, the
              runtimes sets up your exception handler. When you leave the "try" block,
              this setup must be undone. This implies there is some work to perform
              after returning from find_email which prevent find_email to be tail-called.

              Frédéric
            • Richard Jones
              ... The try ... with block breaks tail recursion. Honestly though, the while loop version is the best version. Some thing are just better done imperatively,
              Message 6 of 9 , Dec 22, 2009
              • 0 Attachment
                On Tue, Dec 22, 2009 at 11:27:34AM +0300, Mihamina Rakotomandimby wrote:
                > But I wonder why mine has not been tail recursive:
                >
                > >> let rec find_email accumulator email =
                > >> try
                > >> let line = (input_line mail_log)
                > >> in
                > >> match line_matched email line with
                > >> true -> find_email (line::accumulator) email
                > >> | false -> find_email (accumulator) email
                > >> with
                > >> End_of_file -> accumulator;;
                >
                > "find_email" is directly reccursively called, without any pending
                > operation that couls make it "non tail"

                The try ... with block breaks tail recursion.

                Honestly though, the while loop version is the best version. Some
                thing are just better done imperatively, and looping over a list of
                input items is one of those things. Luckily OCaml is a flexible
                programming language that gives you a nice choice of styles to use!

                Rich.

                --
                Richard Jones
                Red Hat
              • Mihamina Rakotomandimby
                ... Sure. But I am still in the step of getting my eyes used with pure functions I previously coded in PHP and Python which make intensive use of for loops,
                Message 7 of 9 , Dec 22, 2009
                • 0 Attachment
                  > Richard Jones <rich@...> :
                  > Honestly though, the while loop version is the best version.

                  Sure.

                  But I am still in the step of getting my eyes used with "pure functions"
                  I previously coded in PHP and Python which make intensive use of for
                  loops, I would like to radically change, just for a moment.

                  > Some
                  > thing are just better done imperatively, and looping over a list of
                  > input items is one of those things. Luckily OCaml is a flexible
                  > programming language that gives you a nice choice of styles to use!

                  Luckily, as you said :-)

                  --
                  Architecte Informatique chez Blueline/Gulfsat:
                  Administration Systeme, Recherche & Developpement
                  +261 34 29 155 34 / +261 33 11 207 36
                • Lars Nilsson
                  On Tue, Dec 22, 2009 at 8:11 AM, Mihamina Rakotomandimby ... You could always replace the while loop with a recursive function that does the same thing, if you
                  Message 8 of 9 , Dec 22, 2009
                  • 0 Attachment
                    On Tue, Dec 22, 2009 at 8:11 AM, Mihamina Rakotomandimby
                    <mihamina@...> wrote:
                    > > Richard Jones <rich@...> :
                    >
                    > > Honestly though, the while loop version is the best version.
                    >
                    > Sure.
                    >
                    > But I am still in the step of getting my eyes used with "pure functions"
                    > I previously coded in PHP and Python which make intensive use of for
                    > loops, I would like to radically change, just for a moment.

                    You could always replace the while loop with a recursive function that
                    does the same thing, if you really wanted to..

                    let find_str str =
                    let accumulator = ref [] in
                    try
                    let rec find_str_rec () =
                    let line = input_line mail_log in
                    if line_matched str line then accumulator := line :: !accumulator;
                    find_str_rec ()
                    in find_str_rec ()
                    with End_of_file -> !accumulator

                    Lars Nilsson
                  • jshaw10
                    It s simpler IMO to pass the accumulator as an argument. Furthermore you can clean things up a bit by separating the search part from the input/error handling
                    Message 9 of 9 , Dec 30, 2009
                    • 0 Attachment
                      It's simpler IMO to pass the accumulator as an argument. Furthermore you can clean things up a bit by separating the search part from the input/error handling part.

                      let find_str str =
                      let rec get_line accumulator =
                      match try Some (input_line mail_log) with End_of_file -> None with
                      Some line -> search line accumulator
                      | None -> accumulator
                      and search line accumulator =
                      get_line
                      (
                      if line_matched str line
                      then line::accumulator
                      else accumulator
                      )
                      in
                      get_line []

                      > You could always replace the while loop with a recursive function that
                      > does the same thing, if you really wanted to..
                      >
                      > let find_str str =
                      > let accumulator = ref [] in
                      > try
                      > let rec find_str_rec () =
                      > let line = input_line mail_log in
                      > if line_matched str line then accumulator := line :: !accumulator;
                      > find_str_rec ()
                      > in find_str_rec ()
                      > with End_of_file -> !accumulator
                      >
                      > Lars Nilsson
                      >
                    Your message has been successfully submitted and would be delivered to recipients shortly.