> For the fixed g==2 case, this is suboptimal,
--I also thought of the idea of making a special modular exponentiation routine
(B^x mod m) for the special case B=2, but using different ideas than Carmody suggested.
His ideas were based on a special doubling routine. Mine were based on bitshifting.
(He also muttered about both-sign integers and use of floating point, but I avoided those
Turned out my approach is 3% faster than
Carmody's for 32-bit uint 2^x mod m based primetest, both unsurprisingly
were speedups versus the general-base routine.
Also my approach is 6% faster than Carmody's for 64 bits.
On my machine anyhow... one might be able to play with compiler and
loop unrolling and such to make his approach faster, they are close, or might be
better to hybridize both ideas.
Carmody also suggested making a special squaring routine, not just multiply.
This seems to yield a small (perhaps nonexistent?) speedup.
Hence 2^x mod m now runs in only about "0.68 selfridges" while B^x mod m
would be 1 selfridge if B>=3.
As a result, my 32-bit and 64-bit prime tests now both faster than Worley's 32-bit
test on average!
Indeed my 32-bit test currently 35% faster on average (random odd numbers) although
presumably 2X slower in worst case (primes only) than Worley's.