fixing SSE2 in LLR/PRP

  • William Garnett III
    Message 1 of 2 , Apr 21, 2004
      Please see:

      http://www.mersenneforum.org/showthread.php?p=26755#post26755

      SSE2 experts, can anything be done?

      regards,
      william
    • Décio Luiz Gazzoni Filho
      Message 2 of 2 , Apr 21, 2004
        -----BEGIN PGP SIGNED MESSAGE-----
        Hash: SHA1

        On Wednesday 21 April 2004 17:22, you wrote:
        > Please see:
        >
        > http://www.mersenneforum.org/showthread.php?p=26755#post26755
        >
        > SSE2 experts, can anything be done?

        Sure there is. As GW himself says, pick a different memory layout that favors
        both the multiplication and modulo operations, and rewrite the whole code to
        use it (which is a Herculean undertaking, trust me). And please understand
        that a new layout would be a compromise: the multiplication step would be
        slowed down slightly and the modulo step sped up considerably, so the net
        result would be a speedup whenever the program needed both multiplication
        and modulo routines. Of course, if (as I understand) GW's interest is
        primarily in Mersennes, and since a new layout would slow down the
        multiplication routines, then he would have no use for this code. This
        would basically be a `fork': every time GW implemented some improvement to
        his code, someone else would have to port it to the new codebase.

        Cache/memory problems are indeed a huge issue. If the problem is as serious as
        GW claims (only 16 out of 128 bytes of each cacheline are used each time),
        then basically the program is memory bound rather than compute bound. So
        given proper coding, it should make little difference whether you're using
        x87 or SSE2.

        This isn't an SSE2 problem; this is a memory problem and ultimately a
        how-much-effort-are-you-really-willing-to-throw-at-it problem.

        Décio
        -----BEGIN PGP SIGNATURE-----
        Version: GnuPG v1.2.4 (GNU/Linux)

        iD8DBQFAhuoIFXvAfvngkOIRAg+PAJ42wJAlZUbFcg6EzraNnSwMcX63XACghJo9
        lZGwqfjEOcXQcUmcD0wso04=
        =hoBy
        -----END PGP SIGNATURE-----
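
To put the cache-line figure quoted in the reply above (16 of every 128 bytes used) into numbers, here is a rough sketch in Python. It is not taken from GW's code or from LLR/PRP: the function names, the 1024-element count, and the assumption that each access touches one aligned 16-byte element are all hypothetical and chosen only for illustration. It counts how many distinct 128-byte cache lines are fetched when the same elements are read with a 128-byte stride versus contiguously.

```python
def cacheline_utilization(bytes_used: int, line_size: int) -> float:
    """Fraction of each fetched cache line that is actually consumed."""
    if line_size <= 0 or not 0 < bytes_used <= line_size:
        raise ValueError("need 0 < bytes_used <= line_size")
    return bytes_used / line_size


def lines_touched(n_elems: int, elem_size: int, stride: int, line_size: int) -> int:
    """Count distinct cache lines touched when reading n_elems elements.

    Element i is assumed (hypothetically) to start at byte offset i * stride.
    """
    touched = set()
    for i in range(n_elems):
        first_byte = i * stride
        last_byte = first_byte + elem_size - 1
        # An element may straddle a line boundary, so record every line it covers.
        for line in range(first_byte // line_size, last_byte // line_size + 1):
            touched.add(line)
    return len(touched)


LINE_SIZE = 128   # bytes per cache line, the figure quoted in the thread
ELEM_SIZE = 16    # bytes used per access (one SSE2 register's worth)
N_ELEMS = 1024    # hypothetical working-set size, for illustration only

utilization = cacheline_utilization(ELEM_SIZE, LINE_SIZE)
strided = lines_touched(N_ELEMS, ELEM_SIZE, stride=LINE_SIZE, line_size=LINE_SIZE)
contiguous = lines_touched(N_ELEMS, ELEM_SIZE, stride=ELEM_SIZE, line_size=LINE_SIZE)

print(f"utilization per line: {utilization:.3f}")
print(f"lines fetched, strided layout:    {strided}")
print(f"lines fetched, contiguous layout: {contiguous}")
print(f"memory traffic ratio: {strided / contiguous:.1f}x")
```

Under these assumptions the strided layout pulls in eight times as many cache lines for the same useful data. That is the sense in which the code would be memory bound regardless of whether the arithmetic is done in x87 or SSE2.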