
## Re: Infallible prime tests for n<2^64 and n<2^32, C code

Message 1 of 7, Mar 17, 2013
--- In primenumbers@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:

> } //Step 4: Lehmer test main computation:
> A0 = 2; A1 = C;
> while(j>=0){
> es = e>>j;
> if(es&1){ //unlike for modular exponentiation, both the 0 and 1 cases take same time
> A0 = mulmod64(A0, A1, n); A0 = submod64(A0, C, n);
> A1 = mulmod64(A1, A1, n); A1 = submod64(A1, 2ULL, n);
> }else{ //hence any attempt (like for mod.exp.) to go to, e.g., radix-8, is pointless.
> A1 = mulmod64(A0, A1, n); A1 = submod64(A1, C, n);
> A0 = mulmod64(A0, A0, n); A0 = submod64(A0, 2ULL, n);
> }
> j--;
> }
> if( mulmod64(C, A0, n) != mulmod64(2, A1, n) ){ return(FALSE); }
> return(TRUE); //In place of this line, it also is
> //possible to add more code here which takes almost no extra time but
> //would improve the truth rate. But for 64-bit purposes we already get zero failures
> //so to hell with it.
> }
>

This is almost the test 3.6.7 from C&P, but I can't figure out how you are choosing D=P^2-4*Q. It could be similar to Selfridge's choice, Pomerance's choice, or Pari-gp's ispseudoprime() test.

Paul
Message 2 of 7, Mar 17, 2013
> This is almost the test 3.6.7 from C&P, but I can't figure out how you are choosing D=P^2-4*Q. It could be similar to Selfridge's choice, Pomerance's choice, or Pari-gp's ispseudoprime() test.
> Paul Underwood

--it is this.
There is a sequence of odd numbers, I used -3, 5, -7, 9, -11...
but other sequences also work and which is the best is not clear,
and I have not tried many experiments, basically since my first experiment worked.
You might want to try other experiments to see which yields the fewest Lehmer
pseudoprimes.

I pick the first eligible D in that sequence and fix Q=2.
"Eligible" means jacobi( D*(D+8), n ) = -1.

What you are calling P^2 is then determined by D and Q, namely P^2=D+8.
Note P^2 is not necessarily a squared integer for me -- in fact it usually will not be --
indeed it need not even be a square mod n.

What advantages accrue from this choice of D, P^2, and Q?
1. Q is not +1 and not -1, both of which are good to avoid since empirically they yield a lot of pseudoprimes.
2. Q's multiplicative inverse is obviously (n+1)/2 hence need not be computed.
3. I only ask for one jacobi condition to be satisfied for eligibility, not two.
Facts 1-3 keep the code simple and fast, not that it matters hugely.
And it turns out this is good enough to yield zero failures for n<2^64.

Were we to use basically the same algorithm for larger n>2^64, it would become
of interest to add a few extra bells and whistles to my Lehmer test
to decrease its number of pseudoprimes, in the hope of making
the failure-free region for n as large as possible. This would be partly
an experimental effort. I have already done some experiments of that nature, but
I consider them too incomplete to talk about as yet. Identifying the first
failure-n for the combined test should be extremely difficult.
Even deciding whether any failure-n exists is likely to be difficult, and I suspect the
first will be at least 100 digits long.

The top reason all this code is so slow seems to be that my (and many other) computers
have a slow-as-molasses "divide" instruction.

Message 3 of 7, Mar 18, 2013
--- On Mon, 3/18/13, WarrenS <warren.wds@...> wrote:
> Note my Lehmer test actually ran in "1.60 Selfridges"
> experimentally
...
>   while(j>0){
>     A = mulmod64(A, A, m);
>     A = mulmod64(A, A, m);
>     A = mulmod64(A, A, m);
>     j -= 3;
>     es = (e>>j)&7;
>     //prefacing this line with "if(es)" is optional; the "if" gives 2-3% speedup on my machine:
>
>     if(es){ A = mulmod64(A, gpow[es], m); }
>   }

For the fixed g==2 case, this is suboptimal, running at about 1.3x (31/24 = (3+7/8)/3) the cost it could have. And 1.3 * 1.6 = 2, as expected.

Use balanced residues (-p/2 .. +p/2), and your multiply/squaring can be accompanied by a doubling (multiplication by g=2) almost for free. Then at each single bit step you either do a mulmod() or a mulmoddouble().

Balanced residues work best when using the mixed floating-point/fixed-point mulmod algorithm. (Find a*b*(1/p) on the FPU, round to nearest, multiply by p in integers, subtract its bottom bits from the bottom bits of integer (a*b).)

Note, however, that if you're interested in speed, you'll want a sqmod() operation as well as a mulmod() operation; there's no point in creating unnecessary duplicates. A good C compiler might be able to perform this optimisation for you if all the functions are inline and written in a compiler-friendly way, but sometimes it's just best to keep things explicitly simple.

Phil
--
() ASCII ribbon campaign () Hopeless ribbon campaign
/\ against HTML mail /\ against gratuitous bloodshed

[stolen with permission from Daniel B. Cristofani]
Message 4 of 7, Mar 18, 2013
> For the fixed g==2 case, this is suboptimal,

--I also thought of the idea of making a special modular exponentiation routine
(B^x mod m) for the special case B=2, but using different ideas than Carmody suggested.
His ideas were based on a special doubling routine. Mine were based on bitshifting.

(He also muttered about both-sign integers and use of floating point, but I avoided those
ideas entirely.)

It turned out my approach is 3% faster than
Carmody's for a 32-bit uint 2^x mod m based primetest; unsurprisingly, both
were speedups versus the general-base routine.
My approach is also 6% faster than Carmody's for 64 bits.
On my machine anyhow... one might be able to play with the compiler and
loop unrolling and such to make his approach faster (they are close), or it might be
better to hybridize both ideas.

Carmody also suggested making a special squaring routine, not just multiply.
This seems to yield a small (perhaps nonexistent?) speedup.

Hence 2^x mod m now runs in only about "0.68 selfridges" while B^x mod m
would be 1 selfridge if B>=3.

As a result, my 32-bit and 64-bit prime tests are now both faster than Worley's 32-bit
test on average!
Indeed my 32-bit test is currently 35% faster on average (random odd numbers), although
presumably 2X slower in the worst case (primes only) than Worley's.