- Hi Johan,

you wrote:

[snip]

> Let's look at just assembly for now. We start by creating an 1e5 x 1e5
> sparse matrix. We then successively insert 1e5 elements in the diagonal.

On 32-bit systems the upper dimensions for sparse_matrix are 65535 x 65535,
due to the internal mapping of (i, j) -> i * size2 + j (or i + j * size1).
I've added a corresponding check to the debug version.

> What happens is that the insertion speed is linear for the first ~50000
> elements, and then grinds to a halt. The initial insertion speed is
> somewhere at 1e5 elements/s; after 50000 elements it sharply falls to
> 1000 elements/s (to compare, our naive implementation gets 1e6
> elements/s).

For larger dimensions, please consider using compressed_matrix (FORTRAN
CRS/CCS format) or coordinate_matrix (FORTRAN COO format). I'm also assuming
you're using the boost_1_29_0 release, so you'll probably have to look into
the current CVS version for these (many changes).
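To make the size limit concrete, here is a minimal sketch (not uBLAS source; the helper names map_index32/map_index64 are made up for illustration) of the row-major mapping with a 32-bit index type:

```cpp
#include <cstdint>

// Row-major index mapping of the kind sparse_matrix uses internally:
// (i, j) -> i * size2 + j. With a 32-bit index type the product wraps
// once a dimension exceeds 65535, so distinct (i, j) pairs collide.
std::uint32_t map_index32(std::uint32_t i, std::uint32_t j, std::uint32_t size2) {
    return i * size2 + j;  // computed modulo 2^32
}

// The true linear index, computed in 64 bits for comparison.
std::uint64_t map_index64(std::uint64_t i, std::uint64_t j, std::uint64_t size2) {
    return i * size2 + j;
}
```

For the 1e5 x 1e5 test above, the last diagonal element has the true linear index 99999 * 100000 + 99999 = 9999999999, which needs 34 bits; the 32-bit mapping wraps it to 1410065407 and it collides with an unrelated element.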

> This is quite bizarre.
>
> With an 1e6 x 1e6 matrix, the insertion speed is smoother, but slow
> throughout (on the order of 1000 elements/s).
>
> I've tested this on two different computers, one dual Athlon and one PII
> laptop, and the result is identical (aside from the absolute speed
> numbers). Memory is not at all full, so that can't be an issue.
>
> The code for this simple test is at the end.
>
> The compiler used was g++-2.95 (in Debian) and Boost 1.29 (also Debian).
>
> I've also observed a quadratic time complexity (in the non-zero elements)
> in sparse matrix-vector multiplication. I think this has been brought up
> before though.

Using the experimental (meaning undocumented ;-( ) axpy_prod() functions,
one achieves the correct complexity.
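The linear-time behaviour comes straight from the compressed layout; a free-standing sketch (plain C++, not the uBLAS implementation) of a CRS matrix-vector product whose cost is O(nnz):

```cpp
#include <cstddef>
#include <vector>

// Compressed Row Storage (CRS): values and column indices packed per row,
// with row_ptr[i]..row_ptr[i+1] delimiting row i. One pass over the
// non-zeros computes y += A*x in O(nnz) time -- the complexity that an
// axpy_prod-style routine is meant to achieve.
struct CrsMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

void crs_axpy(const CrsMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.col_idx[k]];
}
```

For the diagonal test above, A would be diag(d0, d1, ...) with row_ptr = {0, 1, 2, ...} and one entry per row, so the product touches each stored element exactly once.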

> We've also tested MTL, and while it doesn't produce these kinds of wild
> irregularities, the performance is a factor 2 or 3 worse than our naive
> implementation, and the memory usage is a factor 1.5-2 worse.
>
> This makes me question the claim that genericity does not add overhead (in
> practice). 10-20% overhead is acceptable, but when we have 100-200%
> overhead, both in performance and memory, then it becomes impossible to
> justify its use.

Alexei Novakov's benchmarks currently show an overhead of around 100% for
the better ublas sparse matrix-vector multiply implementations. He has
already proposed an improvement of the sparse matrix iterators, which we'll
tackle immediately after the upcoming 1_30_0 release.
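One source of that kind of overhead can be sketched in isolation (plain C++, not the uBLAS iterators; SparseVec is a made-up stand-in): per-element lookup in a sorted sparse container costs O(log nnz), while walking the stored entries directly costs O(1) amortized per non-zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A sorted-coordinate sparse vector. Element access via binary search is
// O(log nnz) per element, so a traversal through access-style lookups is
// O(nnz log nnz); iterating the stored entries directly is O(nnz).
struct SparseVec {
    std::vector<std::size_t> idx;  // sorted indices of non-zeros
    std::vector<double> val;       // matching values

    double get(std::size_t i) const {  // binary search per access
        auto it = std::lower_bound(idx.begin(), idx.end(), i);
        return (it != idx.end() && *it == i) ? val[it - idx.begin()] : 0.0;
    }
};

double sum_by_access(const SparseVec& v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v.get(i);  // log factor each time
    return s;
}

double sum_by_iteration(const SparseVec& v) {
    double s = 0.0;
    for (double x : v.val) s += x;  // one pass over stored entries only
    return s;
}
```

Both functions compute the same sum; only the access pattern differs, which is why iterator improvements can pay off without changing the storage format.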

Thanks for your feedback,

Joerg

- Hi Johan,

you wrote:

> Thanks for all your info. I've run the tests with a Boost from CVS (from
> january 31st), compressed_matrix and axpy_prod, and the results give
> roughly the same speed as our implementation, and ca. 30% better memory
> efficiency. Great!

Kudos to the guys on groups.yahoo.com/group/ublas-dev. Without them, sparse
matrices would be as bad as in boost_1_29_0.

> The -DNDEBUG flag also seems critical, without it
> performance is terrible (quadratic).

Oh yes. That's my paranoia. Without -DNDEBUG defined, ublas is in debug mode
and even double-checks sparse matrix computations with a dense control
computation. You can customize this using the BOOST_UBLAS_TYPE_CHECK
preprocessor symbol.
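As a build sketch (hedged: the bench.cpp name is hypothetical, and you should check the uBLAS headers of your release for the exact semantics of BOOST_UBLAS_TYPE_CHECK):

```makefile
# Release build: -DNDEBUG takes ublas out of debug mode and disables the
# dense control computations mentioned above.
release:
	g++ -O2 -DNDEBUG bench.cpp -o bench

# Debug build with the expensive dense cross-checks switched off
# (assumption: defining BOOST_UBLAS_TYPE_CHECK to 0 disables them).
debug:
	g++ -g -DBOOST_UBLAS_TYPE_CHECK=0 bench.cpp -o bench
```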

> Alexei's proposed optimizations seem interesting. I tried the axpy_prod
> you provided, but it didn't give any significant change. I trust your
> figures however.

Yup. I didn't post the necessary dispatch logic. I'll update Boost CVS later
with my current version.

> I will propose that we start using ublas as soon as the linear complexity
> functions appear in the stable branch.
>
> I provide our benchmark below for reference (with the timing calls and
> other dependencies stripped out):

Thanks,

Joerg