Ocaml and Python (was: Writing to many files)

  • Johann Spies
    Message 1 of 17 , Apr 1, 2008
      I have made changes to my source from what I have learnt from your
      emails as well as replies from Richard Jones.

      I have also replaced my Str-functions with Scanf.

      That did not bring about significant improvement in the execution
      speed.

      I compared my OCaml and Python efforts. This is not a scientific
      comparison. I am an amateur programmer who mainly uses his hobby to
      create tools for system administration. I used Python regularly
      about 8 years ago and rarely since then. For the past 5 years most
      of my programming was done in OCaml. So I am not an expert in
      either language.

      Filesize (lines)   Python       Ocaml        Ocaml

      342,044            0m6.706s     0m3.838s     (This was done on my PC)
      469,513,283        44m+         486m+
      7,200,684          0m38.483s    3m34.512s


      The last two tests were run on a quad-core server with 8 GB of RAM.
      The second test was done over the last weekend, before I made the
      abovementioned changes to my OCaml code.

      Regards
      Johann
      --
      Johann Spies Telefoon: 021-808 4036
      Informasietegnologie, Universiteit van Stellenbosch

      "Therefore if any man be in Christ, he is a new
      creature: old things are passed away; behold, all
      things are become new."
      II Corinthians 5:17
    • William D. Neumann
      Message 2 of 17 , Apr 1, 2008
        On Tue, 1 Apr 2008 10:10:59 +0200, Johann Spies wrote
        > I have made changes to my source from what I have learnt from your
        > emails as well as replies from Richard Jones.
        >
        > I have also replaced my Str-functions with Scanf.
        >
        > That did not bring about significant improvement in the execution
        > speed.

        Have you done any profiling with ocamlprof or gprof/shark/etc.?
        Where is the code spending its time?

        --

        William D. Neumann
      • Johann Spies
        Message 3 of 17 , Apr 1, 2008
          On Tue, Apr 01, 2008 at 07:31:18AM -0600, William D. Neumann wrote:

          > > That did not bring about significant improvement in the execution
          > > speed.
          >
          > Have you done any profiling with ocamlprof or gprof/shark/etc.?
          > Where is the code spending its time?

          I did not do any profiling. I have never used it in the past. Maybe
          I will try to learn how to use it when I have some time on my hands.

          Regards
          Johann
          --
          Johann Spies Telefoon: 021-808 4036
          Informasietegnologie, Universiteit van Stellenbosch

          "Therefore if any man be in Christ, he is a new
          creature: old things are passed away; behold, all
          things are become new."
          II Corinthians 5:17
        • Jon Harrop
          Message 4 of 17 , Apr 1, 2008
            On Tuesday 01 April 2008 15:29:19 Johann Spies wrote:
            > On Tue, Apr 01, 2008 at 07:31:18AM -0600, William D. Neumann wrote:
            > > > That did not bring about significant improvement in the execution
            > > > speed.
            > >
            > > Have you done any profiling with ocamlprof or gprof/shark/etc.?
            > > Where is the code spending its time?
            >
            > I did not do any profiling. I have never used it in the past. Maybe
            > I will try and learn how to use when I have some time at hands.

            Profiling OCaml is very easy. Just compile with -p, then run and collect
            profiling results using gprof:

            $ ocamlopt -p foo.ml -o foo
            $ ./foo
            $ gprof foo >profile.txt

            Right at the top of "profile.txt" will be the most time consuming function,
            which should be a huge help here!
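Where gprof is not an option, a coarse alternative is to time suspect
sections by hand. The following is only a minimal sketch: the helper name
`time` and the label are hypothetical, and it relies on nothing beyond
`Sys.time`, which reports processor time in seconds.

```ocaml
(* Run [f ()], report the CPU time it used on stderr, and return its result.
   [label] is just a tag for the report; the name is an assumption. *)
let time label f =
  let t0 = Sys.time () in
  let result = f () in
  Printf.eprintf "%s: %.3fs\n%!" label (Sys.time () -. t0);
  result

(* Example use: wrap the call that is suspected of being slow. *)
let total = time "sum" (fun () -> List.fold_left ( + ) 0 [ 1; 2; 3 ])
```

This only shows how long a whole section takes, not which function inside it
is responsible, so it complements a real profiler rather than replacing it.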

            --
            Dr Jon D Harrop, Flying Frog Consultancy Ltd.
            http://www.ffconsultancy.com/products/?e
          • William D. Neumann
            Message 5 of 17 , Apr 1, 2008
              On Tue, 1 Apr 2008 15:32:56 +0100, Jon Harrop wrote

              > Profiling OCaml is very easy. Just compile with -p, then run and
              > collect profiling results using gprof:
              >
              > $ ocamlopt -p foo.ml -o foo
              > $ ./foo
              > $ gprof foo >profile.txt
              >
              > Right at the top of "profile.txt" will be the most time consuming
              > function, which should be a huge help here!

              Unless you're on OS X, in which case gprof doesn't work (or it didn't in
              3.09; I never did check with 3.10). But that's OK, as Shark is a better
              tool anyway. See <http://developer.apple.com/tools/shark_optimize.html>,
              <http://developer.apple.com/tools/sharkoptimize.html>, and the bottom of
              <http://wiki.cocan.org/getting_started_with_ocaml_on_mac_os_x> for more info
              on Shark (plus you should have a manual on your system).

              Of course, if you're not on OS X, none of the above really helps you.

              --

              William D. Neumann
            • Johann Spies
              Message 6 of 17 , Apr 2, 2008
                On Tue, Apr 01, 2008 at 03:32:56PM +0100, Jon Harrop wrote:

                > Profiling OCaml is very easy. Just compile with -p, then run and collect
                > profiling results using gprof:
                >
                > $ ocamlopt -p foo.ml -o foo
                > $ ./foo
                > $ gprof foo >profile.txt
                >
                > Right at the top of "profile.txt" will be the most time consuming function,
                > which should be a huge help here!

                Thanks Jon!

                I did that. The top lines show what I expected: the Scanf and
                Printf functions dominate. They are called for each line in the files.


                     % cumulative     self                self   total
                  time    seconds  seconds       calls  s/call  s/call  name
                  6.25     208.74   208.74   939026570    0.00    0.00  camlScanf__skip_spaces_391
                  4.17     347.83   139.09  3756106280    0.00    0.00  camlScanf__scan_conversion_615
                  3.57     466.97   119.14  2653670522    0.00    0.00  camlBuffer__find_ident_141
                  3.55     585.34   118.37  3592697092    0.00    0.00  camlScanf__loop_381
                  2.46     667.32    81.98                              caml_format_exception
                  2.37     746.41    79.10                              writeblock
                  2.32     823.69    77.28  6103672696    0.00    0.00  caml_apply2
                  2.30     900.29    76.61  1878053136    0.00    0.00  camlPrintf__scan_conv_275
                  2.06     968.94    68.64  1878053134    0.00    0.00  extern_rec
                  2.05    1037.24    68.30  1408539853    0.00    0.00  camlPrintf__scan_flags_142
                  1.92    1101.21    63.96   939026570    0.00    0.00  camlScanf__entry
                  1.73    1159.04    57.83                              camlScanf__peek_char_112
                  1.66    1214.58    55.54  3592697090    0.00    0.00  camlList__find_225
                  1.49    1264.30    49.71  1878053136    0.00    0.00  camlPrintf__sub_fmt_125
                  1.27    1306.84    42.54  1878053138    0.00    0.00  camlPrintf__fun_528
                  1.20    1346.75    39.91   469513283    0.00    0.00  camlLees_csv__hanteer_92
                  1.19    1386.57    39.82                              camlPrintf__parse_format_78
                  1.18    1425.99    39.42   469513285    0.00    0.00  writecode8
                  1.12    1463.40    37.40  3286592991    0.00    0.00  caml_apply3
                  1.05    1498.43    35.03  1408539860    0.00    0.00  caml_curry3
                  1.00    1531.95    33.52  1408539853    0.00    0.00  camlPrintf__kapr_199
                  0.99    1564.83    32.88   469513289    0.00    0.00  caml_MD5Transform
                  0.97    1597.20    32.36   400165547    0.00    0.00  caml_ml_output_char
                  0.95    1628.75    31.55   469513285    0.00    0.00  camlPervasives__scan_272
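Several of the entries above are Printf format-processing functions called
billions of times. If the output lines are plain strings joined by commas,
one possible way to avoid that work is to build each line in a Buffer
instead. This is a sketch only: the program itself is not shown in the
thread, so the names `add_record` and `output_record` and the assumption
that fields need no quoting are hypothetical, and whether it actually helps
would have to be measured.

```ocaml
(* Append one CSV record to [buf] without going through Printf.
   Assumes the fields need no quoting or escaping. *)
let add_record buf fields =
  List.iteri
    (fun i field ->
      if i > 0 then Buffer.add_char buf ',';
      Buffer.add_string buf field)
    fields;
  Buffer.add_char buf '\n'

(* Write one record to [oc], reusing a caller-supplied buffer so that no
   new buffer is allocated per line. *)
let output_record oc buf fields =
  Buffer.clear buf;
  add_record buf fields;
  Buffer.output_buffer oc buf
```

The buffer would be created once, for example with `Buffer.create 256`, and
passed to `output_record` for every line.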


                Regards
                Johann
                --
                Johann Spies Telefoon: 021-808 4036
                Informasietegnologie, Universiteit van Stellenbosch

                "There hath no temptation taken you but such as is
                common to man: but God is faithful, who will not
                suffer you to be tempted above that ye are able; but
                will with the temptation also make a way to escape,
                that ye may be able to bear it."
                I Corinthians 10:13
              • Richard Jones
                Message 7 of 17 , Apr 3, 2008
                  On Thu, Apr 03, 2008 at 08:52:14AM +0200, Johann Spies wrote:
                  > I did that. The top lines show what I have expected: the Scanf and
                  > PrintF functions dominate. That is called for each line in the files.

                  OCaml-CSV, which as I said you should be using instead of attempting
                  to parse CSV files yourself, uses an imperative state machine so it
                  should be reasonably fast.

                  Rich.

                  --
                  Richard Jones
                  Red Hat
                • Johann Spies
                  Message 8 of 17 , Apr 4, 2008
                    On Thu, Apr 03, 2008 at 09:41:20AM +0100, Richard Jones wrote:

                    > OCaml-CSV, which as I said you should be using instead of attempting
                    > to parse CSV files yourself, uses an imperative state machine so it
                    > should be reasonably fast.

                    I know about OCaml-CSV and have used it in another program. In this
                    case, however, I am only interested in the second field on each line.
                    So I thought it would be faster to use Scanf to scan the string
                    just up to the second field than to use OCaml-CSV, which will parse the
                    whole line and create a list from it.

                    I might as well experiment with OCaml-CSV in this case.
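For the narrow case described here, taking only the second field, a third
option is to look for the first two commas directly. This is a minimal
sketch, not a replacement for a real CSV parser: it assumes fields never
contain quoted commas, the function name `second_field` is hypothetical, and
`String.index_opt` / `String.index_from_opt` assume OCaml 4.05 or later.

```ocaml
(* Return the second comma-separated field of [line], or None if the line
   contains no comma. Quoted fields are NOT handled. *)
let second_field line =
  match String.index_opt line ',' with
  | None -> None
  | Some i ->
    let start = i + 1 in
    (* The field ends at the next comma, or at the end of the line. *)
    let stop =
      match String.index_from_opt line start ',' with
      | Some j -> j
      | None -> String.length line
    in
    Some (String.sub line start (stop - start))
```

Nothing after the second comma is examined, which is the property the Scanf
approach was meant to provide.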

                    Regards
                    Johann
                    --
                    Johann Spies Telefoon: 021-808 4036
                    Informasietegnologie, Universiteit van Stellenbosch

                    "Thou wilt keep him in perfect peace, whose mind is
                    stayed on thee: because he trusteth in thee."
                    Isaiah 26:3
                  • Johann Spies
                    Message 9 of 17 , Apr 22, 2008
                      On Fri, Mar 28, 2008 at 07:24:52AM -0600, William D. Neumann wrote:
                      > 2: If you have too many files to keep open, or if you have really long runs
                      > of the same date in your data, you can just cache the last date and channel
                      > in outputfile.
                      >

                      I have adapted your code because I want to use two different output
                      files. My code produces an error which I do not understand:

                      Characters 226-250:
                      This pattern matches values of type 'a option * 'b option * 'c option
                      but is here used to match values of type 'd option * 'e option

                      The code:


                      let last_date = ref None
                      and last_auth = ref None
                      and last_acc = ref None

                      let outputfile date =
                        match !last_date,!last_auth, !last_acc with
                        | Some d, Some oc when d = date, Some ot -> oc,ot
                        | Some d, None when d = date, None ->
                            failwith "Yipes! Cached date but no cached channel!"
                        | Some d, Some oc, Some ot ->
                            close_out oc;
                            close_out ot;
                            let new_oc = open_out_gen
                                [Open_creat;Open_append;Open_wronly]
                                0o644
                                ( date ^ "_acc.csv" ) in
                            let new_ot = open_out_gen
                                [Open_creat;Open_append;Open_wronly]
                                0o644
                                ( date ^ "_auth.csv")
                            in
                            last_date := Some date;
                            last_acc := Some new_oc;
                            last_auth := Some new_ot;
                            new_oc, new_ot
                        | None,None,_ ->
                            let new_oc = open_out_gen
                                [Open_creat;Open_append;Open_wronly]
                                0o644
                                ( date ^ "_acc.csv" ) in
                            let new_ot = open_out_gen
                                [Open_creat;Open_append;Open_wronly]
                                0o644
                                ( date ^ "_auth.csv")
                            in
                            last_date := Some date;
                            last_acc := Some new_oc;
                            last_auth := Some new_ot;
                            new_oc, new_ot

                        | _ ->
                            (* None for date, some open channel -- should never happen *)
                            assert false


                      Can somebody enlighten me on this one please?

                      Regards
                      Johann
                      --
                      Johann Spies Telefoon: 021-808 4036
                      Informasietegnologie, Universiteit van Stellenbosch

                      "If my people, which are called by my name, shall
                      humble themselves, and pray, and seek my face, and
                      turn from their wicked ways; then will I hear from
                      heaven, and will forgive their sin, and will heal
                      their land." II Chronicles 7:14
                    • Peng Zang
                      Message 10 of 17 , Apr 22, 2008
                        Just looking at this code without trying it in the toplevel, I think you mean
                        this:

                        match !last_date,!last_auth, !last_acc with

                        | Some d, Some oc, Some ot when d = date -> oc,ot
                        | Some d, None, None when d = date ->

                        Peng


                      • William D. Neumann
                        Message 11 of 17 , Apr 22, 2008
                          On Tue, 22 Apr 2008 11:39:51 +0200, Johann Spies wrote

                          > I have adapted your code because I want to use two different output
                          > files. My code produces an error which I do not understand:
                          >
                          > Characters 226-250:
                          > This pattern matches values of type 'a option * 'b option * 'c option
                          > but is here used to match values of type 'd option * 'e option
                          >
                          > The code:
                          >
                          > let last_date = ref None
                          > and last_auth = ref None
                          > and last_acc = ref None
                          >
                          > let outputfile date =
                          > match !last_date,!last_auth, !last_acc with
                          > | Some d, Some oc when d = date, Some ot -> oc,ot
                          > | Some d, None when d = date, None ->

                          The problem is in these last two lines. The when clause has to come after
                          the entire pattern, not in the middle of it. Change those two lines to the
                          following and it works:

                          | Some d, Some oc , Some ot when d = date -> oc,ot
                          | Some d, None, None when d = date ->

                          See chapter 6 of the manual for more detail.
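Putting the corrected guards into the whole function might look like the
sketch below. It is an illustration under assumptions, not the poster's final
code: the helpers `open_append` and `open_pair` are hypothetical names added
to remove the duplicated opening code, the type annotations are added for
clarity, and the refs are matched in the same order as the returned pair
(acc first, then auth), which is a guess at the intent.

```ocaml
let last_date : string option ref = ref None
let last_acc : out_channel option ref = ref None
let last_auth : out_channel option ref = ref None

(* Open [name] for appending, creating it if needed. *)
let open_append name =
  open_out_gen [ Open_creat; Open_append; Open_wronly ] 0o644 name

(* Open both files for [date] and remember them in the cache. *)
let open_pair date =
  let oc = open_append (date ^ "_acc.csv") in
  let ot = open_append (date ^ "_auth.csv") in
  last_date := Some date;
  last_acc := Some oc;
  last_auth := Some ot;
  (oc, ot)

let outputfile date =
  match !last_date, !last_acc, !last_auth with
  (* The guard follows the complete tuple pattern. *)
  | Some d, Some oc, Some ot when d = date -> (oc, ot)
  | Some _, Some oc, Some ot ->
    (* Different date: close the cached channels and open new ones. *)
    close_out oc;
    close_out ot;
    open_pair date
  | None, None, None -> open_pair date
  | _ -> failwith "Yipes! Inconsistent cache of date and channels!"
```

Calling `outputfile` twice in a row with the same date returns the same pair
of channels; a new date closes the old pair before opening the next one.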

                          --

                          William D. Neumann
                        • Johann Spies
                          Message 12 of 17 , Apr 23, 2008
                            On Tue, Apr 22, 2008 at 08:25:10AM -0500, William D. Neumann wrote:
                            > On Tue, 22 Apr 2008 11:39:51 +0200, Johann Spies wrote
                            >
                            > > I have adapted your code because I want to use two different output
                            > > files. My code produces an error which I do not understand:
                            > >
                            > > Characters 226-250:
                            > > This pattern matches values of type 'a option * 'b option * 'c option
                            > > but is here used to match values of type 'd option * 'e option
                            > >
                            > > The code:
                            > >
                            > > let last_date = ref None
                            > > and last_auth = ref None
                            > > and last_acc = ref None
                            > >
                            > > let outputfile date =
                            > > match !last_date,!last_auth, !last_acc with
                            > > | Some d, Some oc when d = date, Some ot -> oc,ot
                            > > | Some d, None when d = date, None ->
                            >
                            > The problem is in these last two lines. The when clause has to come after
                            > the entire pattern, not in the middle of it. change those two lines to the
                            > following and it works:
                            >
                            > | Some d, Some oc , Some ot when d = date -> oc,ot
                            > | Some d, None, None when d = date ->
                            >
                            > See chapter 6 of the manual for more detail.
                            >

                            Thanks.

                            Johann
                            --
                            Johann Spies Telefoon: 021-808 4036
                            Informasietegnologie, Universiteit van Stellenbosch

                            "That Christ may dwell in your hearts by faith; that
                            ye, being rooted and grounded in love, May be able to
                            comprehend with all saints what is the breadth, and
                            length, and depth, and height; And to know the love of
                            Christ, which passeth knowledge, that ye might be
                            filled with all the fulness of God."
                            Ephesians 3:17-19