Re: [boost] [ublas] Weakness in large sparse matrices
- Hi Johan,
> Let's look at just assembly for now. We start by creating an 1e5 x 1e5On 32-bit systems the upper dimensions for sparse_matrix are 65535 x 65535
> sparse matrix. We then successively insert 1e5 elements in the diagonal.
> What happens is that the insertion speed is linear for the first ~50000
> elements, and then grinds to a halt. The initial insertion speed is
> somewhere at 1e5 elements/s, after 50000 elements it sharply falls to
> 1000 elements/s (to compare, our naive implementation gets 1e6 elements /
due to the internal mapping of (i, j) -> (i * size2+j) (or (i + j * size1).
I've added a corresponding check to the debug version.
For larger dimension consider to use compressed_matrix (FORTRAN CRS/CCS
format) or coordinate_matrix (FORTRAN COO format) please. I'm also assuming
you're checking the boost_1_29_0 release, so you'll probably have to look
into the current CVS version for these (many changes).
> This is quite bizarre.Using the experimental (means undocumented ;-( axpy_prod() functions one
> With an 1e6 x 1e6 matrix, the insertion speed is smoother, but slow
> throughout (on the order 1000 elements / s).
> I've tested this on two different computers, one dual Athlon and one PII
> laptop, and the result is identical (aside from the absolute speed
> numbers). Memory is not at all full, so that can't be an issue.
> The code for this simple test is at the end.
> The compiler used was g++-2.95 (in Debian) and Boost 1.29 (also Debian).
> I've also observed a quadratic time complexity (in the non-zero elements)
> in sparse matrix-vector multiplication. I think this has been brought up
> before though.
achieves the correct complexity.
> We've also tested MTL, and while it doesn't produce these kinds of wildAlexei Novakov's benchmarks show an overhead of around 100% for the better
> irregularities, the performance is a factor 2 or 3 worse than our naive
> implementation, and the memory usage is a factor 1.5-2 worse.
> This makes me question the claim that genericity does not add overhead (in
> practice). 10-20% overhead is acceptable, but when we have 100-200%
> overhead, both in performance and memory, then it makes it impossible to
> justify its use.
ublas sparse matrix vector multiply implementations currently. He proposed
an improvement of sparse matrix iterators already, which we'll tackle
immediately after the upcoming 1_30_0 release.
Thanks for your feedback,
- Hi Johan,
> Thanks for all your info. I've run the tests with a Boost from CVS (fromKudos to the guys on groups.yahoo.com/group/ublas-dev. Without them sparse
> january 31st), compressed_matrix and axpy_prod, and the results give
> roughly the same speed as our implementation, and ca. 30% better memory
> efficiency. Great!
matrices would be as bad as in boost_1_29_0.
> The -DNDEBUG flag also seems critical, without itOh yes. That's my paranoia. Without -DNDEBUG defined ublas is in debug mode
> performance is terrible (quadratic).
and even double checks sparse matrix computations with a dense control
computation. You could customize this using the BOOST_UBLAS_TYPE_CHECK
> Alexei's proposed optimizations seem interesting. I tried the axpy_prodYup. I didn't post the necessary dispatch logic. I'll later update Boost CVS
> you provided, but it didn't give any significant change. I trust your
> figures however.
with my current version.
> I will propose that we start using ublas as soon as the linear complexityThanks,
> functions appear in the stable branch.
> I provide our benchmark below for reference (with the timing calls, and
> other dependencies stripped out):