Loading ...
Sorry, an error occurred while loading the content.

Re: "ocaml_beginners"::[] reading big file -> how to make it faster?

Expand Messages
  • Michael Wohlwend
    ... here you set all elements of the matrix to the same record (not every element to a new record). You have to use a function to initialize the matrix. cheers
    Message 1 of 28 , Jun 4, 2006
    • 0 Attachment
      On Monday 05 June 2006 01:29, Oliver Bandel wrote:
      > Maybe I'm stupid today, but what is wrong here?
      >
      >
      >
      > ==================================================
      > let picture = Array.make_matrix width height {red=0;green=0;blue=0} in

      here you set all elements of the matrix to the same record (not every element
      to a new record). You have to use a function to initialize the matrix.

      cheers
      Michael
    • Oliver Bandel
      ... OK, thanks. This is a thing in OCaml, I really hate. :( The only thing I really hate in OCaml.... ...and I again and again forget this thing... :( ... OK,
      Message 2 of 28 , Jun 4, 2006
      • 0 Attachment
        On Sun, Jun 04, 2006 at 04:40:53PM -0700, Frederick Akalin wrote:
        > I've run into this before. Basically, behind the scenes, picture is an
        > array of pointers. When you first call make_matrix, you're passing in a
        > default value, but that means that every pointer in picture points to
        > the *same* record with r=g=b=0, rather than each pointer pointing to a
        > *different* record with r=g=b=0.
        >
        > In the old code, you change each pointer in picture to point to a
        > newly-created record, but in the new code you're updating the fields of
        > the same record.

        OK, thanks.


        This is a thing in OCaml, I really hate. :(

        The only thing I really hate in OCaml....
        ...and I again and again forget this thing... :(
        ...maybe I use Arrays too seldom to remember it...

        >
        > You probably want:
        >
        > Array.init (width * height) (fun i -> {red=0;green=0;blue=0})

        OK, thanks...

        ... and in that case I also can use the correct values while doing it,
        so I can throw out the later loop completely.


        Thanks,
        Oliver
      • Remi Vanicat
        ... Then you should know that is is false. I had known (but forget) of a simple example where it was obviously false. You should also probably learn that even
        Message 3 of 28 , Jun 5, 2006
        • 0 Attachment
          2006/6/4, Oliver Bandel <oliver@...-berlin.de>:

          > Compiled to native code the whole program now runs in about
          > 6.4 seconds.
          >
          > Even more strange (but nice performance now).
          >
          > How can it be that the performance of the whole
          > program in bytecode increases while the performance of
          > a part decreases by an "optimization", but the performance
          > of the native code program is extremely faster?
          >
          > I thought that both, programs created by ocamlc and ocamlopt
          > would scale in the same way in performance, when code is changed.

          Then you should know that is is false. I had known (but forget) of a
          simple example where it was obviously false.

          You should also probably learn that even with ocamlopt, some
          modification of the code might make the code faster on intel cpu, and
          slower on amd cpu (with exactly the same generated executable). Even,
          with cache miss effect, the very same modification on a similar cpu
          might make the program faster or slower depending of the size of the
          cache.
        • Oliver Bandel
          ... Some months ago someone on the list mentioned something like if bytecode and nativ code differ in performance, then I/O is the bottleneck (slow disk or
          Message 4 of 28 , Jun 5, 2006
          • 0 Attachment
            On Mon, Jun 05, 2006 at 10:27:59AM +0200, Remi Vanicat wrote:
            > 2006/6/4, Oliver Bandel <oliver@...-berlin.de>:
            >
            > > Compiled to native code the whole program now runs in about
            > > 6.4 seconds.
            > >
            > > Even more strange (but nice performance now).
            > >
            > > How can it be that the performance of the whole
            > > program in bytecode increases while the performance of
            > > a part decreases by an "optimization", but the performance
            > > of the native code program is extremely faster?
            > >
            > > I thought that both, programs created by ocamlc and ocamlopt
            > > would scale in the same way in performance, when code is changed.
            >
            > Then you should know that is is false. I had known (but forget) of a
            > simple example where it was obviously false.
            >
            > You should also probably learn that even with ocamlopt, some
            > modification of the code might make the code faster on intel cpu, and
            > slower on amd cpu (with exactly the same generated executable). Even,
            > with cache miss effect, the very same modification on a similar cpu
            > might make the program faster or slower depending of the size of the
            > cache.

            Some months ago someone on the list mentioned something like
            "if bytecode and nativ code differ in performance, then I/O
            is the bottleneck" (slow disk or something like that).
            When reading your words here, I think that this can't hold
            in general.
            So, profiling should be what is necessary.

            I will try with ocamlopt/gprof.

            Ciao,
            Oliver
          • Oliver Bandel
            On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote: [...] ... I have changed my code so that it now uses both: Array.init and Array.make.
            Message 5 of 28 , Jun 5, 2006
            • 0 Attachment
              On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote:
              [...]
              > Another alternative is to make the red, blue, and green components
              > mutable, and updating their values in the array directly, rather than
              > create new records to replace the initial ones. Or, if that isn't
              > feasible, use Array.init instead of Array.make_matrix.
              >
              > let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))

              I have changed my code so that it now uses both: Array.init and Array.make.

              =========
              let picture = Array.init width (fun x -> Array.make height {red=0;green=0;blue=0}) in
              (...)
              do
              let start = 3 *xi in
              picture.(xi).(y).red <- int_of_char buffer.[start];
              picture.(xi).(y).green <- int_of_char buffer.[start+1];
              picture.(xi).(y).blue <- int_of_char buffer.[start+2]
              done
              =========

              I have done this, because a second init is not necessary,
              when I refill the array with the data I read.

              In native-code its between 5 and 6 seconds now.

              Maybe more optimizations are possible,
              but it's now much better than before. :)
              And for this tool I will use native code now.

              Thanks to all. :)

              Regards,
              Oliver
            • Richard Jones
              ... I think it s more likely that they said the opposite: If the same program compiled as bytecode and native code performs the same both ways, then the
              Message 6 of 28 , Jun 5, 2006
              • 0 Attachment
                On Mon, Jun 05, 2006 at 11:15:35AM +0200, Oliver Bandel wrote:
                > Some months ago someone on the list mentioned something like
                > "if bytecode and nativ code differ in performance, then I/O
                > is the bottleneck" (slow disk or something like that).

                I think it's more likely that they said the opposite: If the same
                program compiled as bytecode and native code performs the same both
                ways, then the bottleneck is likely to be I/O. It's a simple
                consequence of Amdahl's law.

                http://en.wikipedia.org/wiki/Amdahl's_law

                Rich.

                --
                Richard Jones, CTO Merjis Ltd.
                Merjis - web marketing and technology - http://merjis.com
                Team Notepad - intranets and extranets for business - http://team-notepad.com
              • Oliver Bandel
                ... [...] didn t work?! Need both init s? Or must make and init be changed? I thought the inner array can be used like above... Today I will study the OReilly
                Message 7 of 28 , Jun 5, 2006
                • 0 Attachment
                  On Mon, Jun 05, 2006 at 11:31:33AM +0200, Oliver Bandel wrote:
                  > On Mon, Jun 05, 2006 at 08:07:13AM +1200, Jonathan Roewen wrote:
                  > [...]
                  > > Another alternative is to make the red, blue, and green components
                  > > mutable, and updating their values in the array directly, rather than
                  > > create new records to replace the initial ones. Or, if that isn't
                  > > feasible, use Array.init instead of Array.make_matrix.
                  > >
                  > > let init_matrix n m f = Array.init n (fun n -> Array.init m (f n))
                  >
                  > I have changed my code so that it now uses both: Array.init and Array.make.
                  >
                  > =========
                  > let picture = Array.init width (fun x -> Array.make height {red=0;green=0;blue=0}) in
                  [...]

                  didn't work?!
                  Need both init's?
                  Or must make and init be changed? I thought the inner array can be used like above...

                  Today I will study the OReilly book on that topic...

                  ...come back later. ;-)

                  Ciao,
                  Oliver
                • Oliver Bandel
                  ... Yes, you are right, it s this way around. ... OK, thanks. Ciao, Oliver
                  Message 8 of 28 , Jun 5, 2006
                  • 0 Attachment
                    On Mon, Jun 05, 2006 at 10:34:36AM +0100, Richard Jones wrote:
                    > On Mon, Jun 05, 2006 at 11:15:35AM +0200, Oliver Bandel wrote:
                    > > Some months ago someone on the list mentioned something like
                    > > "if bytecode and nativ code differ in performance, then I/O
                    > > is the bottleneck" (slow disk or something like that).
                    >
                    > I think it's more likely that they said the opposite: If the same
                    > program compiled as bytecode and native code performs the same both
                    > ways, then the bottleneck is likely to be I/O.

                    Yes, you are right, it's this way around.


                    > It's a simple
                    > consequence of Amdahl's law.
                    >
                    > http://en.wikipedia.org/wiki/Amdahl's_law

                    OK, thanks.


                    Ciao,
                    Oliver
                  • Jonathan Roewen
                    ... Array.make works fine with primitives .. e.g. ints, bools, chars, enums, as they re not allocated blocks. All other structures that are allocated blocks
                    Message 9 of 28 , Jun 5, 2006
                    • 0 Attachment
                      > > I have changed my code so that it now uses both: Array.init and Array.make.

                      Array.make works fine with primitives .. e.g. ints, bools, chars,
                      enums, as they're not allocated blocks. All other structures that are
                      allocated blocks need Array.init.
                    • dmitry grebeniuk
                      Shalom, Richard. RJ I think it s more likely that they said the opposite: If the same RJ program compiled as bytecode and native code performs the same both
                      Message 10 of 28 , Jun 5, 2006
                      • 0 Attachment
                        Shalom, Richard.

                        RJ> I think it's more likely that they said the opposite: If the same
                        RJ> program compiled as bytecode and native code performs the same both
                        RJ> ways, then the bottleneck is likely to be I/O.

                        In ocaml there is another possible bottleneck: heavy
                        use of pervasives' comparisons (Pervasives.compare and other
                        "'a -> 'a -> bool" functions), especially for large
                        data structures.

                        --
                        WBR,
                        dmitry mailto:gds-mlsts@...
                      • Oliver Bandel
                        ... OK, thanks for the hint; this is what I now also found in the OReilly-book. Float and all datastructures that do not fit into one machine word will be
                        Message 11 of 28 , Jun 5, 2006
                        • 0 Attachment
                          On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
                          > > > I have changed my code so that it now uses both: Array.init and Array.make.
                          >
                          > Array.make works fine with primitives .. e.g. ints, bools, chars,
                          > enums, as they're not allocated blocks. All other structures that are
                          > allocated blocks need Array.init.

                          OK, thanks for the hint; this is what I now also found in the OReilly-book.
                          Float and all datastructures that do not fit into one machine word will
                          be referred to by pointers.

                          (How will that be on 64 Bit machines? Is it the same?)

                          So, if I have to use Array.init inside Array.init I doubt that will be
                          faster than using make_matrix and inserting new records.

                          Maybe my first way was ok (with immutable records)?
                          I can test this again.


                          Ciao,
                          Oliver
                        • Oliver Bandel
                          ... [...] But when Array.make is called by the Array.init each time again... ..isn t it the creating a new record every time?
                          Message 12 of 28 , Jun 5, 2006
                          • 0 Attachment
                            On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
                            > > > I have changed my code so that it now uses both: Array.init and Array.make.
                            >
                            > Array.make works fine with primitives .. e.g. ints, bools, chars,
                            > enums, as they're not allocated blocks. All other structures that are
                            > allocated blocks need Array.init.
                            [...]

                            But when Array.make is called by the Array.init each time again...
                            ..isn't it the creating a new record every time?

                            ============================================================
                            # let mm4 = Array.init 3 (fun x -> Array.make 3 {x=9;y=99;z=999});;
                            val mm4 : xyz array array =
                            [|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                            {x = 9; y = 99; z = 999}|];
                            [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                            {x = 9; y = 99; z = 999}|];
                            [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                            {x = 9; y = 99; z = 999}|]|]
                            # mm4.(01).(1) <- { x=12;y=55;z=99};;
                            - : unit = ()
                            # mm4;;
                            - : xyz array array =
                            [|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                            {x = 9; y = 99; z = 999}|];
                            [|{x = 9; y = 99; z = 999}; {x = 12; y = 55; z = 99};
                            {x = 9; y = 99; z = 999}|];
                            [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                            {x = 9; y = 99; z = 999}|]|]
                            #
                            ============================================================


                            Look like that it should work...?!

                            Why not in my code?

                            Ciao,
                            Oliver
                          • Frederick Akalin
                            Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers) Fred
                            Message 13 of 28 , Jun 5, 2006
                            • 0 Attachment
                              Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers)

                              Fred

                              Oliver Bandel wrote:
                              > On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
                              > > > > I have changed my code so that it now uses both: Array.init and Array.make.
                              > >
                              > > Array.make works fine with primitives .. e.g. ints, bools, chars,
                              > > enums, as they're not allocated blocks. All other structures that are
                              > > allocated blocks need Array.init.
                              >
                              > OK, thanks for the hint; this is what I now also found in the OReilly-book.
                              > Float and all datastructures that do not fit into one machine word will
                              > be referred to by pointers.
                              >
                              > (How will that be on 64 Bit machines? Is it the same?)
                              >
                              > So, if I have to use Array.init inside Array.init I doubt that will be
                              > faster than using make_matrix and inserting new records.
                              >
                              > Maybe my first way was ok (with immutable records)?
                              > I can test this again.
                              >
                              >
                              > Ciao,
                              > Oliver
                              >
                            • Jonathan Roewen
                              ... No, floats are double-precision, and therefore boxed. But OCaml also has an optimisation of unboxing floats in arrays (and records when all fields are
                              Message 14 of 28 , Jun 5, 2006
                              • 0 Attachment
                                > Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers)

                                No, floats are double-precision, and therefore boxed. But OCaml also
                                has an optimisation of unboxing floats in arrays (and records when all
                                fields are floats).
                              • Frederick Akalin
                                Because in your code you were setting the fields of the record directly. Try going mm4.(1).(1).x
                                Message 15 of 28 , Jun 5, 2006
                                • 0 Attachment
                                  Because in your code you were setting the fields of the record
                                  directly. Try going "mm4.(1).(1).x <- 1;;".

                                  Oliver Bandel wrote:
                                  > On Mon, Jun 05, 2006 at 10:03:36PM +1200, Jonathan Roewen wrote:
                                  > > > > I have changed my code so that it now uses both: Array.init and Array.make.
                                  > >
                                  > > Array.make works fine with primitives .. e.g. ints, bools, chars,
                                  > > enums, as they're not allocated blocks. All other structures that are
                                  > > allocated blocks need Array.init.
                                  > [...]
                                  >
                                  > But when Array.make is called by the Array.init each time again...
                                  > ..isn't it the creating a new record every time?
                                  >
                                  > ============================================================
                                  > # let mm4 = Array.init 3 (fun x -> Array.make 3 {x=9;y=99;z=999});;
                                  > val mm4 : xyz array array =
                                  > [|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                                  > {x = 9; y = 99; z = 999}|];
                                  > [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                                  > {x = 9; y = 99; z = 999}|];
                                  > [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                                  > {x = 9; y = 99; z = 999}|]|]
                                  > # mm4.(01).(1) <- { x=12;y=55;z=99};;
                                  > - : unit = ()
                                  > # mm4;;
                                  > - : xyz array array =
                                  > [|[|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                                  > {x = 9; y = 99; z = 999}|];
                                  > [|{x = 9; y = 99; z = 999}; {x = 12; y = 55; z = 99};
                                  > {x = 9; y = 99; z = 999}|];
                                  > [|{x = 9; y = 99; z = 999}; {x = 9; y = 99; z = 999};
                                  > {x = 9; y = 99; z = 999}|]|]
                                  > #
                                  > ============================================================
                                  >
                                  >
                                  > Look like that it should work...?!
                                  >
                                  > Why not in my code?
                                  >
                                  > Ciao,
                                  > Oliver
                                  >
                                  >
                                  > Archives up to August 22, 2005 are also downloadable at
                                  > http://www.connettivo.net/cntprojects/ocaml_beginners/
                                  > The archives of the very official ocaml list (the seniors' one) can be found at
                                  > http://caml.inria.fr
                                  > Attachments are banned and you're asked to be polite, avoid flames etc.
                                  >
                                  >
                                  >
                                  > SPONSORED LINKS
                                  > Basic programming language
                                  > <http://groups.yahoo.com/gads?t=ms&k=Basic+programming+language&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=bnac3LCZpttb3c9FvbVU-A>
                                  > Computer programming languages
                                  > <http://groups.yahoo.com/gads?t=ms&k=Computer+programming+languages&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=1Czd2hKCO9_u4KVZQperFQ>
                                  > Programming languages
                                  > <http://groups.yahoo.com/gads?t=ms&k=Programming+languages&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=TyHGCjod4YOKITrSq1xccQ>
                                  >
                                  > Java programming language
                                  > <http://groups.yahoo.com/gads?t=ms&k=Java+programming+language&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=PZAexF9LyXpKb3HDJSlB1g>
                                  >
                                  >
                                  >
                                  > --------------------------------------------------------------------------------
                                  > YAHOO! GROUPS LINKS
                                  >
                                  > * Visit your group "ocaml_beginners
                                  > <http://groups.yahoo.com/group/ocaml_beginners>" on the web.
                                  >
                                  > * To unsubscribe from this group, send an email to:
                                  > ocaml_beginners-unsubscribe@yahoogroups.com
                                  > <mailto:ocaml_beginners-unsubscribe@yahoogroups.com?subject=Unsubscribe>
                                  >
                                  > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service
                                  > <http://docs.yahoo.com/info/terms/>.
                                  >
                                  >
                                  > --------------------------------------------------------------------------------
                                  >
                                  >
                                • Frederick Akalin
                                  I looked it up, and you re right. That seems horrible, even with some of the unboxing tricks described in
                                  Message 16 of 28 , Jun 5, 2006
                                  • 0 Attachment
                                    I looked it up, and you're right. That seems horrible, even with some
                                    of the unboxing tricks described in
                                    http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html .

                                    Fred

                                    Jonathan Roewen wrote:
                                    > > Floats are primitive types, too, AFAIK. (i.e., not referred to by pointers)
                                    >
                                    > No, floats are double-precision, and therefore boxed. But OCaml also
                                    > has an optimisation of unboxing floats in arrays (and records when all
                                    > fields are floats).
                                    >
                                    >
                                    > Archives up to August 22, 2005 are also downloadable at
                                    > http://www.connettivo.net/cntprojects/ocaml_beginners/
                                    > The archives of the very official ocaml list (the seniors' one) can be found at
                                    > http://caml.inria.fr
                                    > Attachments are banned and you're asked to be polite, avoid flames etc.
                                    >
                                    >
                                    >
                                    > SPONSORED LINKS
                                    > Basic programming language
                                    > <http://groups.yahoo.com/gads?t=ms&k=Basic+programming+language&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=bnac3LCZpttb3c9FvbVU-A>
                                    > Computer programming languages
                                    > <http://groups.yahoo.com/gads?t=ms&k=Computer+programming+languages&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=1Czd2hKCO9_u4KVZQperFQ>
                                    > Programming languages
                                    > <http://groups.yahoo.com/gads?t=ms&k=Programming+languages&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=TyHGCjod4YOKITrSq1xccQ>
                                    >
                                    > Java programming language
                                    > <http://groups.yahoo.com/gads?t=ms&k=Java+programming+language&w1=Basic+programming+language&w2=Computer+programming+languages&w3=Programming+languages&w4=Java+programming+language&c=4&s=126&.sig=PZAexF9LyXpKb3HDJSlB1g>
                                    >
                                    >
                                    >
                                    > --------------------------------------------------------------------------------
                                    > YAHOO! GROUPS LINKS
                                    >
                                    > * Visit your group "ocaml_beginners
                                    > <http://groups.yahoo.com/group/ocaml_beginners>" on the web.
                                    >
                                    > * To unsubscribe from this group, send an email to:
                                    > ocaml_beginners-unsubscribe@yahoogroups.com
                                    > <mailto:ocaml_beginners-unsubscribe@yahoogroups.com?subject=Unsubscribe>
                                    >
                                    > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service
                                    > <http://docs.yahoo.com/info/terms/>.
                                    >
                                    >
                                    > --------------------------------------------------------------------------------
                                    >
                                    >
                                  • Oliver Bandel
                                    ... Ooops, thank you; I had my original code in mind; there I replaced the record from a matrix with a record that has the correct data. Sorry, confused old
                                    Message 17 of 28 , Jun 6, 2006
                                    • 0 Attachment
                                      On Mon, Jun 05, 2006 at 10:03:49AM -0700, Frederick Akalin wrote:
                                      > Because in your code you were setting the fields of the record
                                      > directly. Try going "mm4.(1).(1).x <- 1;;".


                                      Ooops, thank you; I had my original code in mind; there I replaced the
                                      record from a matrix with a record that has the correct data.

                                      Sorry, confused old and new code...

                                      Ciao,
                                      Oliver
                                    Your message has been successfully submitted and would be delivered to recipients shortly.