-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Wednesday 21 April 2004 17:22, you wrote:

> Please see:

>

> http://www.mersenneforum.org/showthread.php?p=26755#post26755

>

> SSE2 experts, can anything be done?

Sure there is. As GW himself says, pick a different memory layout that favors

both the multiplication and modulo operations, and rewrite the whole code to

use it (which is a Herculean undertaking, trust me). And please understand

that a new layout would be a compromise; the multiplication step would be

slowed down slightly and the modulo step would be sped up considerably, so

the net result would be a speedup whenever the program needed both

multiplication and modulo routines. Of course, if (as I understand) GW's

interest is primarily in Mersennes, and since a new layout would slow down

the multiplication routines, then he would have no use for this code. This

would basically be a `fork': GW would implement some improvement to his code,

and someone else would have to port it to the new codebase.

Cache/memory problems are indeed a huge issue. If the problem is as serious as

GW claims (only 16 out of 128 bytes of each cacheline are used each time),

then basically the program is memory bound instead of computation-power

bound. So given proper coding, it should make little difference whether

you're using x87 or SSE2.
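
To put rough numbers on that, here is a minimal sketch in C. The 128-byte cacheline and the 16 bytes used per line are the figures quoted above; the function names and the strided access pattern are my own illustration and are not taken from GW's code.

```c
#include <stddef.h>

/* Figures quoted in the thread: 128-byte cachelines, 16 bytes used per line. */
#define CACHE_LINE_BYTES 128
#define USED_BYTES        16

/* Fraction of each fetched cacheline that the code actually consumes. */
double line_utilization(size_t used_bytes, size_t line_bytes)
{
    return (double)used_bytes / (double)line_bytes;
}

/*
 * Count the distinct cachelines touched when reading n_pairs 16-byte
 * operands whose start addresses are stride_bytes apart.
 * Assumes (hypothetically) a line-aligned base address and a stride that
 * is a multiple of 16, so no operand straddles two lines.
 */
size_t lines_touched(size_t n_pairs, size_t stride_bytes, size_t line_bytes)
{
    size_t count = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line seen yet */

    for (size_t i = 0; i < n_pairs; i++) {
        size_t line = (i * stride_bytes) / line_bytes;
        if (line != last_line) {
            count++;
            last_line = line;
        }
    }
    return count;
}

/*
 * Sum pairs of doubles (16 bytes, one SSE2 register's worth) that sit
 * stride_doubles apart. With stride_doubles = 16 (128 bytes), every pair
 * lands on a fresh cacheline; with stride_doubles = 2 the data is packed.
 */
double sum_pairs(const double *data, size_t n_pairs, size_t stride_doubles)
{
    double s = 0.0;
    for (size_t i = 0; i < n_pairs; i++) {
        s += data[i * stride_doubles] + data[i * stride_doubles + 1];
    }
    return s;
}
```

With these figures, line_utilization(USED_BYTES, CACHE_LINE_BYTES) comes out at 0.125. Reading 1024 operands at a 128-byte stride touches 1024 lines, while the same operands packed at a 16-byte stride touch only 128. The arithmetic in sum_pairs is identical in both cases; only the memory traffic differs, which is why the choice of x87 or SSE2 matters so little here.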

This isn't an SSE2 problem; this is a memory problem and ultimately a

how-much-effort-are-you-really-willing-to-throw-at-it problem.

Décio

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAhuoIFXvAfvngkOIRAg+PAJ42wJAlZUbFcg6EzraNnSwMcX63XACghJo9

lZGwqfjEOcXQcUmcD0wso04=

=hoBy

-----END PGP SIGNATURE-----