Loading ...
Sorry, an error occurred while loading the content.

Re: [Boost-Users] NDEBUG, affects floating point math?

Expand Messages
  • jhr.walter@t-online.de
    Hi Stephen, ... about ... is ... OK, I ve noticed that you use something like C = prod (A, B); As far as I understand, your matrices are alias free, so you
    Message 1 of 6 , Mar 1, 2003
    • 0 Attachment
      Hi Stephen,

      you wrote:

      > > > I wrote a small piece of C code to calculate a symmetric-matrix by
      > > > row-matrix product by directly accessing the memory (which btw is
      about
      > > > 1000% faster than ublas's generated code with gcc 3.2.3 and -O3, why
      is
      > > > ublas so slow?)
      > >
      > > If you have *not* defined NDEBUG, ublas uses many internal consistency
      > > checks and usually no expression templates.
      >
      > I was referring to the case when NDEBUG is defined and comparing
      > symmetric_matrix by general_matrix products. uBLAS is quite slow with this
      > operation. However, I compared my symmetric_matrix by
      > general_matrix_transpose product and ublas was only 3-4 times slower than
      > the optimized C, not too bad.

      OK, I've noticed that you use something like

      C = prod (A, B);

      As far as I understand, your matrices are alias free, so you could try

      C.assign (prod (A, B));

      instead.

      > > > And when I compile with -DNDEBUG the results differ
      > > > by around +/- 1e-16, but when its not defined the results match almost
      > > > perfectly.
      > >
      > > You've probably hit a bug, may be an old one (please consider to check
      > > against CVS, too). Could you please send or post a small sample, if the
      > > problem persists?
      >
      > I think I know the source of the problem now, and I'm not sure if it's one
      > that boost should handle or not. I attached my test code at the end of
      > this email, only 2.7k, I hope it isn't inappropriate for the list.
      >
      > 387 floating point registers are stored as 80 bits internally and rounded
      to
      > 64 bits when assigned to memory. SSE2 floating-point registers are stored
      in
      > the native 64-bit format. The following table lists the compiler flags
      used
      > and the accumulated absolute difference between the two products over 100
      > randomly generated matrices.
      >
      > gcc version 3.2.3 20030221 (Debian prerelease)
      >
      > no compiler flags
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -DNDEBUG
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O1
      > diff sum
      >
      [5,3]((2.91694e-15,2.65348e-15,2.7027e-15),(4.4049e-15,3.29251e-15,3.44863e-
      15),(4.10436e-15,4.14816e-15,3.24393e-15),(3.57873e-15,3.75741e-15,2.64068e-
      15),(5.03091e-15,2.79637e-15,4.06099e-15))
      >
      > -O1 -DNDEBUG
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O2
      > diff sum
      >
      [5,3]((4.37584e-15,4.8659e-15,3.61516e-15),(4.61783e-15,3.77736e-15,4.04169e
      -15),(4.46691e-15,5.10876e-15,5.00576e-15),(3.64118e-15,4.91968e-15,5.17641e
      -15),(3.97273e-15,4.25528e-15,3.75394e-15))
      >
      > -O2 -DNDEBUG
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O3
      > diff sum
      >
      [5,3]((4.96131e-15,3.88968e-15,4.34245e-15),(5.07754e-15,5.91974e-15,4.19109
      e-15),(4.89539e-15,4.29279e-15,5.06344e-15),(4.79781e-15,4.66033e-15,3.77996
      e-15),(4.75227e-15,4.72539e-15,3.33414e-15))
      >
      > -O3 -DNDEBUG
      > diff sum
      >
      [5,3]((4.60569e-15,2.67928e-15,2.7079e-15),(3.21054e-15,2.98112e-15,2.61249e
      -15),(3.91701e-15,3.60822e-15,2.93515e-15),(3.23006e-15,3.42434e-15,3.53363e
      -15),(3.18105e-15,2.77035e-15,3.67198e-15))
      >
      > -O1 -mfpmath=sse,387 -march=pentium4 -msse2
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O1 -mfpmath=sse,387 -march=pentium4 -msse2 -DNDEBUG
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O2 -mfpmath=sse,387 -march=pentium4 -msse2
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O2 -mfpmath=sse,387 -march=pentium4 -msse2 -DNDEBUG
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O3 -mfpmath=sse,387 -march=pentium4 -msse2
      > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
      >
      > -O3 -mfpmath=sse,387 -march=pentium4 -msse2 -DNDEBUG
      > diff sum
      [5,3]((3.62991e-15,3.74006e-15,4.62824e-15),(4.00548e-15,4.5311e-15,3.60129e
      -15),(3.48332e-15,5.11743e-15,3.08174e-15),(3.49373e-15,4.05752e-15,4.2865e-
      15),(4.81169e-15,4.51549e-15,4.18589e-15))
      >
      > Is this perhaps a compiler bug?

      At least an issue ;-) Building your program with Intel 7.0 (boost default
      settings) results in

      diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))

      for both debug and release builds. Setting the magical compiler
      switch -ffloat-store seems to heal the problem for GCC, too.

      > Part of the problem actually comes from
      > whether the products are accumulated into a register by the compiler, or
      > whether they are accumulated into memory. If accumulated into a 387
      > register it has higher-precision before it is rounded back down to 64-bits
      to be
      > stored into memory. If the compiler generates code to accumulate the
      > products directly into memory then the results of each individual product
      > are rounded to 64-bits at every iteration and the result will be slightly
      > less accurate. Now, when the -msse2 flag is used, the registers are 64-bit
      > only and thus no accuracy is gained by using them, only speed.

      Thanks,

      Joerg
    • jhr.walter@t-online.de
      ... But I needed to use double *rdata = new double (n); ... delete [] rdata; instead of the VLA(?) double rdata[n]; I didn t even know, that GCC accepts that
      Message 2 of 6 , Mar 1, 2003
      • 0 Attachment
        One correction:

        > Building your program with Intel 7.0 (boost default
        > settings) results in
        >
        > diff sum [5,3]((0,0,0),(0,0,0),(0,0,0),(0,0,0),(0,0,0))
        >
        > for both debug and release builds.

        But I needed to use

        double *rdata = new double (n);
        ...
        delete [] rdata;

        instead of the VLA(?)

        double rdata[n];

        I didn't even know, that GCC accepts that already (in a C++ program ;-).

        Regards,

        Joerg
      • Thomas Maeder
        ... This dynamically creates one double object and initializes it with the value n and uses its address to intialize rdata. ... And this tells the compiler to
        Message 3 of 6 , Mar 1, 2003
        • 0 Attachment
          jhr.walter@... writes:

          > But I needed to use
          >
          > double *rdata = new double (n);

          This dynamically creates one double object and initializes it with the value
          n and uses its address to intialize rdata.


          > ...
          > delete [] rdata;

          And this tells the compiler to deallocate the dynamically allocated array
          rdata points to. Your program has undefined behavior.

          I haven't followed this thread, but the code you posted here definitely looks
          very fishy.
        • jhr.walter@t-online.de
          Hi Thomas, ... value ... Here s the original code: double *rdata = new double[n]; // stores non-contigous part of the current row of A ... looks ... Thanks for
          Message 4 of 6 , Mar 4, 2003
          • 0 Attachment
            Hi Thomas,

            you wrote:

            > > But I needed to use
            > >
            > > double *rdata = new double (n);
            >
            > This dynamically creates one double object and initializes it with the
            value
            > n and uses its address to intialize rdata.

            Here's the original code:

            double *rdata = new double[n]; // stores non-contigous part of the current
            row of A

            > > ...
            > > delete [] rdata;
            >
            > And this tells the compiler to deallocate the dynamically allocated array
            > rdata points to. Your program has undefined behavior.
            >
            > I haven't followed this thread, but the code you posted here definitely
            looks
            > very fishy.

            Thanks for the correction and sorry for the confusion,

            Joerg
          Your message has been successfully submitted and would be delivered to recipients shortly.