
Re: [boost] [ublas] Weakness in large sparse matrices

  • jhr.walter@t-online.de
    Message 1 of 2, Feb 8, 2003
      Hi Johan,

      you wrote:

      [snip]

      > Let's look at just assembly for now. We start by creating a 1e5 x 1e5
      > sparse matrix. We then successively insert 1e5 elements in the diagonal.
      > What happens is that the insertion speed is linear for the first ~50000
      > elements, and then grinds to a halt. The initial insertion speed is
      > somewhere at 1e5 elements/s, after 50000 elements it sharply falls to
      > 1000 elements/s (to compare, our naive implementation gets 1e6 elements /
      > s).
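
      (The test described here amounts to roughly the following sketch. The
      poster's actual code was stripped from this archive, so the type and
      header names are assumptions based on boost_1_29_0:)

          #include <boost/numeric/ublas/matrix_sparse.hpp>

          namespace ublas = boost::numeric::ublas;

          int main() {
              const std::size_t n = 100000;      // 1e5 x 1e5
              ublas::sparse_matrix<double> m(n, n);
              for (std::size_t i = 0; i < n; ++i)
                  m(i, i) = 1.0;                 // slows sharply past ~50000
          }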

      On 32-bit systems the upper dimensions for sparse_matrix are 65535 x 65535,
      due to the internal mapping (i, j) -> (i * size2 + j) (or (i + j * size1)
      for column-major orientation). I've added a corresponding check to the
      debug version.
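
      To see why 65535 is the boundary: with a 32-bit size type, the flattened
      index i * size2 + j wraps around once it exceeds 2^32 - 1, so distinct
      (i, j) pairs collide. A minimal sketch (not ublas source; the names are
      illustrative):

          #include <cstdint>
          #include <iostream>

          int main() {
              std::uint32_t size2 = 100000;        // 1e5 columns, as in your test
              std::uint32_t i = 65536, j = 0;      // first row past the safe range
              std::uint32_t index = i * size2 + j; // wraps modulo 2^32
              std::cout << index << '\n';          // prints 2258632704, not 6553600000
          }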

      For larger dimensions, please consider using compressed_matrix (FORTRAN
      CRS/CCS format) or coordinate_matrix (FORTRAN COO format). I'm also
      assuming you're testing the boost_1_29_0 release, so you'll probably have
      to look into the current CVS version for these (many changes).
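
      For example, the diagonal fill from your test could look like this with
      compressed_matrix (a sketch against the current CVS headers; in-order
      push_back avoids the per-insert search that operator()(i, j) performs):

          #include <boost/numeric/ublas/matrix_sparse.hpp>

          namespace ublas = boost::numeric::ublas;

          int main() {
              const std::size_t n = 100000;                // 1e5 x 1e5
              ublas::compressed_matrix<double> m(n, n, n); // reserve n non-zeros

              // push_back requires coordinates in strictly increasing
              // (row-major) order, which a diagonal fill satisfies.
              for (std::size_t i = 0; i < n; ++i)
                  m.push_back(i, i, 1.0);
          }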

      > This is quite bizarre.
      >
      > With a 1e6 x 1e6 matrix, the insertion speed is smoother, but slow
      > throughout (on the order 1000 elements / s).
      >
      > I've tested this on two different computers, one dual Athlon and one PII
      > laptop, and the result is identical (aside from the absolute speed
      > numbers). Memory is not at all full, so that can't be an issue.
      >
      > The code for this simple test is at the end.
      >
      > The compiler used was g++-2.95 (in Debian) and Boost 1.29 (also Debian).
      >
      > I've also observed a quadratic time complexity (in the non-zero elements)
      > in sparse matrix-vector multiplication. I think this has been brought up
      > before though.

      Using the experimental (meaning undocumented ;-( ) axpy_prod() functions,
      one achieves the correct complexity.
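
      A sketch of the call (header and signature as in later Boost releases; at
      the time of this thread axpy_prod lived only in CVS):

          #include <algorithm>

          #include <boost/numeric/ublas/matrix_sparse.hpp>
          #include <boost/numeric/ublas/operation.hpp>
          #include <boost/numeric/ublas/vector.hpp>

          namespace ublas = boost::numeric::ublas;

          int main() {
              const std::size_t n = 1000;
              ublas::compressed_matrix<double> a(n, n, n);
              for (std::size_t i = 0; i < n; ++i)
                  a.push_back(i, i, 2.0);

              ublas::vector<double> x(n), y(n);
              std::fill(x.begin(), x.end(), 1.0);

              // y = a * x; the last argument requests zero-initialization of y,
              // and the complexity is linear in the stored non-zeros.
              ublas::axpy_prod(a, x, y, true);
          }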

      > We've also tested MTL, and while it doesn't produce these kinds of wild
      > irregularities, the performance is a factor 2 or 3 worse than our naive
      > implementation, and the memory usage is a factor 1.5-2 worse.
      >
      > This makes me question the claim that genericity does not add overhead (in
      > practice). 10-20% overhead is acceptable, but when we have 100-200%
      > overhead, both in performance and memory, then it makes it impossible to
      > justify its use.

      Alexei Novakov's benchmarks currently show an overhead of around 100% for
      the better ublas sparse matrix-vector multiply implementations. He has
      already proposed an improvement to the sparse matrix iterators, which
      we'll tackle immediately after the upcoming 1_30_0 release.

      Thanks for your feedback,

      Joerg
    • jhr.walter@t-online.de
      Message 2 of 2, Feb 17, 2003
        Hi Johan,

        you wrote:

        > Thanks for all your info. I've run the tests with a Boost from CVS (from
        > january 31st), compressed_matrix and axpy_prod, and the results give
        > roughly the same speed as our implementation, and ca. 30% better memory
        > efficiency. Great!

        Kudos to the guys on groups.yahoo.com/group/ublas-dev. Without them,
        sparse matrices would be as bad as in boost_1_29_0.

        > The -DNDEBUG flag also seems critical, without it
        > performance is terrible (quadratic).

        Oh yes. That's my paranoia. Without -DNDEBUG defined, ublas is in debug
        mode and even double-checks sparse matrix computations against a dense
        control computation. You can customize this via the
        BOOST_UBLAS_TYPE_CHECK preprocessor symbol.
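
        Concretely (a sketch; BOOST_UBLAS_TYPE_CHECK has to be defined before
        any ublas header is included):

            // Release build: all ublas debug checks off.
            //   g++ -O2 -DNDEBUG benchmark.cpp

            // Or stay in debug mode but skip only the dense control
            // computation:
            #define BOOST_UBLAS_TYPE_CHECK 0
            #include <boost/numeric/ublas/matrix_sparse.hpp>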

        > Alexei's proposed optimizations seem interesting. I tried the axpy_prod
        > you provided, but it didn't give any significant change. I trust your
        > figures however.

        Yup. I didn't post the necessary dispatch logic. I'll update Boost CVS
        later with my current version.

        > I will propose that we start using ublas as soon as the linear complexity
        > functions appear in the stable branch.
        >
        > I provide our benchmark below for reference (with the timing calls and
        > other dependencies stripped out):

        Thanks,

        Joerg