Loading ...
Sorry, an error occurred while loading the content.

Re: [PrimeNumbers] Infallible prime tests for n<2^64 and n<2^32, C code

Expand Messages
  • Phil Carmody
    ... For the fixed g==2 case, this is suboptimal, running at about 1.3x (31/24 = (3+7/8)/3) the cost it could have. And 1.3 * 1.6 = 2, as expected. Use balanced
    Message 1 of 7 , Mar 18, 2013
    • 0 Attachment
      --- On Mon, 3/18/13, WarrenS <warren.wds@...> wrote:
      > Note my Lehmer test actually ran in "1.60 Selfridges"
      > experimentally
      ...
      >   while(j>0){
      >     A = mulmod64(A, A, m);
      >     A = mulmod64(A, A, m);
      >     A = mulmod64(A, A, m);
      >     j -= 3;
      >     es = (e>>j)&7;
      >     //prefacing this line with "if(es)" is
      > optional; the "if" gives 2-3% speedup on my machine: 
      >    
      >     if(es){ A = mulmod64(A, gpow[es], m); }
      >   }

      For the fixed g==2 case, this is suboptimal, running at about 1.3x (31/24 = (3+7/8)/3) the cost it could have. And 1.3 * 1.6 = 2, as expected.

      Use balanced residues (-p/2 .. +p/2), and your multiply/squaring can be accompanied by a doubling (multiplication by g=2) almost for free. Then at each single bit step you either do a mulmod() or a mulmoddouble().

      Balanced residues work best when using the mixed floating-point/fixed-point mulmod algorithm. (Find a*b*(1/p) on the FPU, round to nearest, multiply by p in integers, subtract its bottom bits from the bottom bits of integer (a*b).)

      Note, however, that if you're interested in speed, you'll want a sqmod() operation as well as a mulmod() operation, there's no point in creating unnecessary duplicates. A good C compiler might be able to perform this optimisation for you if all the functions are inline and written in a compiler-friendly way, but sometimes it's just best to keep things explicitly simple.

      Phil
      --
      () ASCII ribbon campaign () Hopeless ribbon campaign
      /\ against HTML mail /\ against gratuitous bloodshed

      [stolen with permission from Daniel B. Cristofani]
    • WarrenS
      ... --I also thought of the idea of making a special modular exponentiation routine (B^x mod m) for the special case B=2, but using different ideas than
      Message 2 of 7 , Mar 18, 2013
      • 0 Attachment
        > For the fixed g==2 case, this is suboptimal,

        --I also thought of the idea of making a special modular exponentiation routine
        (B^x mod m) for the special case B=2, but using different ideas than Carmody suggested.
        His ideas were based on a special doubling routine. Mine were based on bitshifting.

        (He also muttered about both-sign integers and use of floating point, but I avoided those
        ideas entirely.)

        Turned out my approach is 3% faster than
        Carmody's for 32-bit uint 2^x mod m based primetest, both unsurprisingly
        were speedups versus the general-base routine.
        Also my approach is 6% faster than Carmody's for 64 bits.
        On my machine anyhow... one might be able to play with compiler and
        loop unrolling and such to make his approach faster, they are close, or might be
        better to hybridize both ideas.

        Carmody also suggested making a special squaring routine, not just multiply.
        This seems to yield a small (perhaps nonexistent?) speedup.

        Hence 2^x mod m now runs in only about "0.68 selfridges" while B^x mod m
        would be 1 selfridge if B>=3.

        As a result, my 32-bit and 64-bit prime tests now both faster than Worley's 32-bit
        test on average!
        Indeed my 32-bit test currently 35% faster on average (random odd numbers) although
        presumably 2X slower in worst case (primes only) than Worley's.
      Your message has been successfully submitted and would be delivered to recipients shortly.