Sorry, an error occurred while loading the content.

## grepping a large file (part 2)

Expand Messages
• Manao ahoana, Hello, Bonjour, This is my piece of code: (* mail.log is more or less 900k lines postfix logs *) let mail_log = open_in
Message 1 of 9 , Dec 21, 2009
Manao ahoana, Hello, Bonjour,

This is my piece of code:

(* "mail.log" is more or less 900k lines postfix logs *)
let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;

(* to have "true" or "false" if match or not *)
let line_matched email line =
try
(Str.search_forward (Str.regexp email) line 0); true
with
Not_found -> false;;

(* put matching lines in a list *)
let rec find_email accumulator email =
try
let line = (input_line mail_log)
in
match line_matched email line with
true -> find_email (line::accumulator) email
| false -> find_email (accumulator) email
with
End_of_file -> accumulator;;

When I launch find_email, I get:

# find_email [] "mihamina@...";;
Stack overflow during evaluation (looping recursion?).

On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
the final production server will be a virtual machine of 1GB RAM.

How to manage this: More efficient recursivity? How?

Misaotra, Thanks, Merci.

--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34 / +261 33 11 207 36
• Your function is not tail-recursive because the recursive call is inside a try-with block. A purely functional solution is possible, but a while loop is also a
Message 2 of 9 , Dec 21, 2009
Your function is not tail-recursive because the recursive call is inside a
try-with block. A purely functional solution is possible, but a while loop
is also a reasonable solution in this case. For example, try this:

let run email =
let accum = ref [] in
try
while true do
if line_matched email line then
accum := line::!accum
done;
assert false (* you can never get here *)
with
End_of_file -> List.rev (!accum) (* return the accumulator at end of
file *)

On Mon, Dec 21, 2009 at 2:36 PM, Mihamina Rakotomandimby <
mihamina@...> wrote:

>
>
> Manao ahoana, Hello, Bonjour,
>
> This is my piece of code:
>
> (* "mail.log" is more or less 900k lines postfix logs *)
> let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;
>
> (* to have "true" or "false" if match or not *)
> let line_matched email line =
> try
> (Str.search_forward (Str.regexp email) line 0); true
> with
> Not_found -> false;;
>
> (* put matching lines in a list *)
> let rec find_email accumulator email =
> try
> let line = (input_line mail_log)
> in
> match line_matched email line with
> true -> find_email (line::accumulator) email
> | false -> find_email (accumulator) email
> with
> End_of_file -> accumulator;;
>
> When I launch find_email, I get:
>
> # find_email [] "mihamina@... <mihamina%40gulfsat.mg>";;
> Stack overflow during evaluation (looping recursion?).
>
> On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
> the final production server will be a virtual machine of 1GB RAM.
>
> How to manage this: More efficient recursivity? How?
>
> Misaotra, Thanks, Merci.
>
> --
> Architecte Informatique chez Blueline/Gulfsat:
> Administration Systeme, Recherche & Developpement
> +261 34 29 155 34 / +261 33 11 207 36
>
>

[Non-text portions of this message have been removed]
• You could also write a wrapper around input_line like let input_line_opt x = try Some (input_line x) with End_of_file - None and so make your function tail
Message 3 of 9 , Dec 21, 2009
You could also write a wrapper around 'input_line' like

let input_line_opt x = try Some (input_line x) with End_of_file -> None

and so make your function tail recursive

let rec find_email accumulator email =
match (input_line_opt mail_log) with
| None -> accumulator
| Some line ->
begin match line_matched email line with
true -> find_email (line::accumulator) email
| false -> find_email (accumulator) email
end;;

Regards,

-Markus

On 21 Dec 2009, at 21:54, Ashish Agarwal wrote:

> Your function is not tail-recursive because the recursive call is inside a
> try-with block. A purely functional solution is possible, but a while loop
> is also a reasonable solution in this case. For example, try this:
>
> let run email =
> let accum = ref [] in
> try
> while true do
> let line = input_line mail_log in
> if line_matched email line then
> accum := line::!accum
> done;
> assert false (* you can never get here *)
> with
> End_of_file -> List.rev (!accum) (* return the accumulator at end of
> file *)
>
>
>
> On Mon, Dec 21, 2009 at 2:36 PM, Mihamina Rakotomandimby <
> mihamina@...> wrote:
>
>>
>>
>> Manao ahoana, Hello, Bonjour,
>>
>> This is my piece of code:
>>
>> (* "mail.log" is more or less 900k lines postfix logs *)
>> let mail_log = open_in "/home/mihamina/public_html/TrackMail/mail.log";;
>>
>> (* to have "true" or "false" if match or not *)
>> let line_matched email line =
>> try
>> (Str.search_forward (Str.regexp email) line 0); true
>> with
>> Not_found -> false;;
>>
>> (* put matching lines in a list *)
>> let rec find_email accumulator email =
>> try
>> let line = (input_line mail_log)
>> in
>> match line_matched email line with
>> true -> find_email (line::accumulator) email
>> | false -> find_email (accumulator) email
>> with
>> End_of_file -> accumulator;;
>>
>> When I launch find_email, I get:
>>
>> # find_email [] "mihamina@... <mihamina%40gulfsat.mg>";;
>> Stack overflow during evaluation (looping recursion?).
>>
>> On my 4GB RAM laptop. Same result on my 8GB RAM workstation.
>> the final production server will be a virtual machine of 1GB RAM.
>>
>> How to manage this: More efficient recursivity? How?
>>
>> Misaotra, Thanks, Merci.
>>
>> --
>> Architecte Informatique chez Blueline/Gulfsat:
>> Administration Systeme, Recherche & Developpement
>> +261 34 29 155 34 / +261 33 11 207 36
>>
>>
>
>
> [Non-text portions of this message have been removed]
>
>
>
> ------------------------------------
>
> Archives up to December 31, 2008 are also downloadable at http://www.connettivo.net/cntprojects/ocaml_beginners/
> The archives of the very official ocaml list (the seniors' one) can be found at http://caml.inria.fr
> Attachments are banned and you're asked to be polite, avoid flames etc.Yahoo! Groups Links
>
>
>

--
Markus Weißmann, M.Sc.
Institut für Informatik
Technische Universität München
Raum 03.07.054, Boltzmannstr. 3, 85748 Garching

Tel. +49 (89) 2 89-1 81 05
Fax +49 (89) 2 89-1 81 07
• ... I d rather this one. ... find_email is directly reccursively called, without any pending operation that couls make it non tail Could you help me to
Message 4 of 9 , Dec 22, 2009
> "Markus W. Weißmann" <markus.weissmann@...> :
> let input_line_opt x =
> try Some (input_line x)
> with End_of_file -> None
>
> let rec find_email accumulator email =
> match (input_line_opt mail_log) with
> | None -> accumulator
> | Some line ->
> begin match line_matched email line with
> true -> find_email (line::accumulator) email
> | false -> find_email (accumulator) email
> end;;

I'd rather this one.
But I wonder why mine has not been tail recursive:

>> let rec find_email accumulator email =
>> try
>> let line = (input_line mail_log)
>> in
>> match line_matched email line with
>> true -> find_email (line::accumulator) email
>> | false -> find_email (accumulator) email
>> with
>> End_of_file -> accumulator;;

"find_email" is directly reccursively called, without any pending
operation that couls make it "non tail"

Could you help me to make it clearer in my mind?

Thank you.

--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34 / +261 33 11 207 36
• ... There are hidden pending operations. As I understand it (without knowledge of what OCaml exactly does) when you enter a try block, the runtimes sets up
Message 5 of 9 , Dec 22, 2009
Mihamina Rakotomandimby wrote:
> But I wonder why mine has not been tail recursive:
>
>>> let rec find_email accumulator email =
>>> try
>>> let line = (input_line mail_log)
>>> in
>>> match line_matched email line with
>>> true -> find_email (line::accumulator) email
>>> | false -> find_email (accumulator) email
>>> with
>>> End_of_file -> accumulator;;
>>>
>
> "find_email" is directly reccursively called, without any pending
> operation that couls make it "non tail"
>
> Could you help me to make it clearer in my mind?
>
> Thank you.
>
>
There are hidden pending operations. As I understand it (without
knowledge of what OCaml exactly does) when you enter a "try" block, the
runtimes sets up your exception handler. When you leave the "try" block,
this setup must be undone. This implies there is some work to perform
after returning from find_email which prevent find_email to be tail-called.

Frédéric
• ... The try ... with block breaks tail recursion. Honestly though, the while loop version is the best version. Some thing are just better done imperatively,
Message 6 of 9 , Dec 22, 2009
On Tue, Dec 22, 2009 at 11:27:34AM +0300, Mihamina Rakotomandimby wrote:
> But I wonder why mine has not been tail recursive:
>
> >> let rec find_email accumulator email =
> >> try
> >> let line = (input_line mail_log)
> >> in
> >> match line_matched email line with
> >> true -> find_email (line::accumulator) email
> >> | false -> find_email (accumulator) email
> >> with
> >> End_of_file -> accumulator;;
>
> "find_email" is directly reccursively called, without any pending
> operation that couls make it "non tail"

The try ... with block breaks tail recursion.

Honestly though, the while loop version is the best version. Some
thing are just better done imperatively, and looping over a list of
input items is one of those things. Luckily OCaml is a flexible
programming language that gives you a nice choice of styles to use!

Rich.

--
Richard Jones
Red Hat
• ... Sure. But I am still in the step of getting my eyes used with pure functions I previously coded in PHP and Python which make intensive use of for loops,
Message 7 of 9 , Dec 22, 2009
> Richard Jones <rich@...> :
> Honestly though, the while loop version is the best version.

Sure.

But I am still in the step of getting my eyes used with "pure functions"
I previously coded in PHP and Python which make intensive use of for
loops, I would like to radically change, just for a moment.

> Some
> thing are just better done imperatively, and looping over a list of
> input items is one of those things. Luckily OCaml is a flexible
> programming language that gives you a nice choice of styles to use!

Luckily, as you said :-)

--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34 / +261 33 11 207 36
• On Tue, Dec 22, 2009 at 8:11 AM, Mihamina Rakotomandimby ... You could always replace the while loop with a recursive function that does the same thing, if you
Message 8 of 9 , Dec 22, 2009
On Tue, Dec 22, 2009 at 8:11 AM, Mihamina Rakotomandimby
<mihamina@...> wrote:
> > Richard Jones <rich@...> :
>
> > Honestly though, the while loop version is the best version.
>
> Sure.
>
> But I am still in the step of getting my eyes used with "pure functions"
> I previously coded in PHP and Python which make intensive use of for
> loops, I would like to radically change, just for a moment.

You could always replace the while loop with a recursive function that
does the same thing, if you really wanted to..

let find_str str =
let accumulator = ref [] in
try
let rec find_str_rec () =
if line_matched str line then accumulator := line :: !accumulator;
find_str_rec ()
in find_str_rec ()
with End_of_file -> !accumulator

Lars Nilsson
• It s simpler IMO to pass the accumulator as an argument. Furthermore you can clean things up a bit by separating the search part from the input/error handling
Message 9 of 9 , Dec 30, 2009
It's simpler IMO to pass the accumulator as an argument. Furthermore you can clean things up a bit by separating the search part from the input/error handling part.

let find_str str =
let rec get_line accumulator =
match try Some (input_line mail_log) with End_of_file -> None with
Some line -> search line accumulator
| None -> accumulator
and search line accumulator =
get_line
(
if line_matched str line
then line::accumulator
else accumulator
)
in
get_line []

> You could always replace the while loop with a recursive function that
> does the same thing, if you really wanted to..
>
> let find_str str =
> let accumulator = ref [] in
> try
> let rec find_str_rec () =
> let line = input_line mail_log in
> if line_matched str line then accumulator := line :: !accumulator;
> find_str_rec ()
> in find_str_rec ()
> with End_of_file -> !accumulator
>
> Lars Nilsson
>
Your message has been successfully submitted and would be delivered to recipients shortly.