On Tue, 03 April 2001, "Paul Leyland" wrote:

> If the number is very large, it's usually worthwhile reducing it modulo

> a few small numbers and checking to see whether that is a quadratic

> residue. As a simple example, we know instantly that

> 89798739759179673149047609731096790317607 is not a perfect square

> because the last two digits are 07 and only 00, 25, e1, e4, o6 and e9

> (where e is an even digit and o is an odd digit) are squares modulo 100.

>

> Using the bases 64, 63, 65 and 11 (in that order) will remove 709/715

> (i.e. over 99%) of non-square candidates with four remainder operations

> and four table look-ups and no heavy square root computations. The

> first remainder might be easy to implemnent with a logical AND.

Yup, GMP uses 256 rather than 64. In a base-2 representation of your numbers this is simply looking at the lowest byte.

63 is 3^2*7

65 is 5*13

The reason these were chosen is that they

a) share no common factors

b) have small factors (each makes 2 independent judgements)

c) are calculable using shift and add.

However, I'd deviate slightly from Paul's recommendation, if your processor is sufficiently capable (almost anything past an 8086 should do).

63*65 = (2^6-1)*(2^6+1) = 2^12-1 = 4095

This number too has the property that the residue can be calculated with just shifts and adds.

X (mod 4095) == ([_X/4096_] + (X rem 4096)) (mod 4095)

(with Knuthian [_ _] notation for 'floor' or 'round down', and

'rem' = remainder)

e.g. for a bignum B with hex value ....PQRSTUVW

B (mod 4095) == (....PQRST*4096 + UVW) (mod 4095)

== (....PQRST*1 + UVW) (mod 4095)

iteratively == (... + .PQ + RST + UVW) (mod 4095)

so basically, you keep accumulating 12 bit chunks of the number

(UVW, RST, .PQ, ... are all 12 bits) in a running total.

Note, however, that as you're summing you must make sure you don't overflow.

This is almost always 'memory bound' (i.e the processor can do the shifts and adds faster than the memory can provide the data) on a modern CPU. So basically the optimisation is to sum the 'best' modulus that will fit into your registers nicely.

'best' here means 'has the largest number of discrete odd factors. Pauls examples have 2. Mine has 4.

That _doesn't_ mean my method will be faster. If you can implement Paul's modulae ~25% faster than my modulae, then do Paul's method, as the 63 will give you a reject answer quicker than me 75% of the time. The other 25% of the time he takes twice as long.

Expectation(6-bit modulae) = .75*t6 + .25*(2*t6) = 1.25*t6

Expectation(12-bit modulus) = t12

Paul's modulae immediately give you answers which can be checked using a 64-bit (or 64byte) lookup table (in the mod 65 case don't check to see if 0 is valid, it's obviously a possible square).

My modulus leaves you with a 12 bit value, which you don't want to use in a lookup table. However, you can apply Paul's modulae to my residue in constant time, and end up with the same answer as Paul.

Some time in the past I did create a list of all 2^x+/-1 which are useful for calculating residues, and post it to this list (or was it the primeform list?). However, between {63,65} and {4095}, I'm sure that you have enough to get started, and maybe you can experiment with other modulae too.

Phil

Mathematics should not have to involve martyrdom;

Support Eric Weisstein, see

http://mathworld.wolfram.com
Find the best deals on the web at AltaVista Shopping!

http://www.shopping.altavista.com