> For the fixed g==2 case, this is suboptimal,

--I also thought of the idea of making a special modular exponentiation routine

(B^x mod m) for the special case B=2, but using different ideas than Carmody suggested.

His ideas were based on a special doubling routine. Mine were based on bitshifting.

(He also muttered about both-sign integers and use of floating point, but I avoided those

ideas entirely.)

Turned out my approach is 3% faster than

Carmody's for 32-bit uint 2^x mod m based primetest, both unsurprisingly

were speedups versus the general-base routine.

Also my approach is 6% faster than Carmody's for 64 bits.

On my machine anyhow... one might be able to play with compiler and

loop unrolling and such to make his approach faster, they are close, or might be

better to hybridize both ideas.

Carmody also suggested making a special squaring routine, not just multiply.

This seems to yield a small (perhaps nonexistent?) speedup.

Hence 2^x mod m now runs in only about "0.68 selfridges" while B^x mod m

would be 1 selfridge if B>=3.

As a result, my 32-bit and 64-bit prime tests now both faster than Worley's 32-bit

test on average!

Indeed my 32-bit test currently 35% faster on average (random odd numbers) although

presumably 2X slower in worst case (primes only) than Worley's.