fixing SSE2 in LLR/PRP
- Please see:
SSE2 experts, can anything be done?
- On Wednesday 21 April 2004 17:22, you wrote:
> Please see:
> SSE2 experts, can anything be done?
Sure there is. As GW himself says, pick a different memory layout that favors
both the multiplication and modulo operations, and rewrite the whole code to
use it (which is a Herculean undertaking, trust me). Please understand that a
new layout would be a compromise: the multiplication step would be slowed
down slightly and the modulo step would be sped up considerably, so the net
result would be a speedup whenever the program needed both multiplication and
modulo routines. Of course, if (as I understand) GW's interest is primarily
in Mersennes, then he would have no use for this code, since a new layout
would slow down the multiplication routines. This would basically be a
`fork': whenever GW implemented some improvement to his code, someone else
would have to port it to the new codebase.
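
To make the trade-off concrete, here is a minimal cost-model sketch. Every
number in it is hypothetical and chosen only for illustration; none of them
is a measurement of LLR, PRP or GW's code, and the function name is my own.

```python
def net_speedup(mul_time, mod_time, mul_slowdown, mod_speedup):
    """Return old_time / new_time for one iteration under a changed layout.

    mul_time, mod_time: time spent in multiplication and in the modulo step
                        with the current layout (arbitrary units).
    mul_slowdown:       factor by which multiplication gets slower (>= 1).
    mod_speedup:        factor by which the modulo step gets faster (>= 1).
    All four inputs are hypothetical, not measured values.
    """
    old_time = mul_time + mod_time
    new_time = mul_time * mul_slowdown + mod_time / mod_speedup
    return old_time / new_time


# Hypothetical workload that needs both steps: 10% slower multiply,
# modulo twice as fast.
both = net_speedup(mul_time=1.0, mod_time=1.0, mul_slowdown=1.1, mod_speedup=2.0)

# Hypothetical workload with no separate modulo cost: only the slowdown remains.
mul_only = net_speedup(mul_time=1.0, mod_time=0.0, mul_slowdown=1.1, mod_speedup=2.0)

print(f"both steps: {both:.3f}x, multiply only: {mul_only:.3f}x")
```

A result above 1 means the new layout wins. Under these made-up inputs it
wins only when the modulo step accounts for a real share of the run time,
which is the reason a multiplication-only user would not want it.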
Cache/memory problems are indeed a huge issue. If the problem is as serious as
GW claims (only 16 out of 128 bytes of each cacheline are used each time),
then basically the program is memory bound instead of computation-power
bound. So given proper coding, it should make little difference whether
you're using x87 or SSE2.
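
The 16-out-of-128 figure can be checked with a little arithmetic. The sketch
below counts how many cache lines a read pattern touches and what fraction of
the fetched bytes is actually used. The 128-byte line and 16-byte access come
from GW's claim above; the simple fixed-stride pattern and the function names
are my own assumptions and not the real LLR/PRP memory layout.

```python
CACHE_LINE_BYTES = 128  # line size quoted above
USEFUL_BYTES = 16       # bytes used per line per pass, as quoted above


def lines_touched(num_elements, element_bytes, stride_bytes,
                  line_bytes=CACHE_LINE_BYTES):
    """Count distinct cache lines touched when reading num_elements items of
    element_bytes each, spaced stride_bytes apart, starting at address 0."""
    touched = set()
    for i in range(num_elements):
        start = i * stride_bytes
        end = start + element_bytes - 1
        # An element may straddle a line boundary, so add every line it covers.
        for line in range(start // line_bytes, end // line_bytes + 1):
            touched.add(line)
    return len(touched)


def utilization(num_elements, element_bytes, stride_bytes,
                line_bytes=CACHE_LINE_BYTES):
    """Fraction of fetched bytes that the access pattern actually uses."""
    useful = num_elements * element_bytes
    fetched = lines_touched(num_elements, element_bytes, stride_bytes,
                            line_bytes) * line_bytes
    return useful / fetched


# Assumed pattern: one 16-byte read per 128-byte line.
strided = utilization(1024, USEFUL_BYTES, CACHE_LINE_BYTES)
# Same data packed contiguously.
packed = utilization(1024, USEFUL_BYTES, USEFUL_BYTES)

print(f"strided: {strided:.3f}, packed: {packed:.3f}")
```

With one 16-byte read per line only one eighth of the memory traffic is
useful, so the memory system moves eight times more data than the arithmetic
consumes.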
This isn't an SSE2 problem, this is a memory problem and ultimately a