Jan 1, 2012. In this post I explain R. S. Lehman's factorization algorithm, provide a proof of validity and a runtime analysis, and compare it with SQUFOF.

1. The algorithm. Let T>0 be a tunable constant. [I will prove validity if T>=1.]

Given an odd integer N>2, the algorithm finds a nontrivial factor (or proves N prime).

B = floor(N^(1/3) * T);
Trial divide N up to bound B;
if(factor found){ return(factor); }
k = 0;
LoopForever{
  k++;
  if(k*T^3 > B){ return(N is prime); }
  if(k even){ m=2; r=1; }else{ m=4; r=(k+N)%4; }
  for(all integers a with 0 <= a*a-4*k*N <= B*B and with a%m == r){
    c = a*a-4*k*N;
    test whether c is square;
    if(issquare){
      b = sqrt(c);
      return( gcd(a+b, N) );
    }
  }
}
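As a concrete illustration, here is a direct transcription of the pseudocode above into Python with T=1 (the function names `icbrt` and `lehman` are my own); it is a sketch for clarity rather than an optimized implementation:

```python
import math

def icbrt(n):
    """Integer cube root: largest r with r**3 <= n."""
    r = round(n ** (1.0 / 3.0))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r

def lehman(N):
    """Return a nontrivial factor of odd N > 2, or N itself if N is prime.
    Direct transcription of the pseudocode above, with T = 1."""
    assert N > 2 and N % 2 == 1
    B = icbrt(N)                       # B = floor(T * N^(1/3)), T = 1
    for d in range(2, B + 1):          # trial division stage
        if N % d == 0:
            return d
    k = 0
    while True:
        k += 1
        if k > B:                      # k*T^3 > B, with T = 1
            return N                   # N is prime
        if k % 2 == 0:
            m, r = 2, 1
        else:
            m, r = 4, (k + N) % 4
        lo, hi = 4 * k * N, 4 * k * N + B * B
        a = math.isqrt(lo)
        if a * a < lo:                 # round a up to ceil(sqrt(4kN))
            a += 1
        a += (r - a) % m               # smallest eligible a with a % m == r
        while a * a <= hi:
            c = a * a - 4 * k * N
            b = math.isqrt(c)
            if b * b == c:             # c is a perfect square
                return math.gcd(a + b, N)
            a += m
```

For example, `lehman(8051)` returns 97 (8051 = 83*97, with both factors above the trial-division bound B = 20; the factor is found already at k=1).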

2. Proof of validity (mostly due to Don Zagier, with some improvements by me):

If we get past trial division up to N^(1/3) (i.e. if T>=1) then N has at most 2 prime factors. If 0<T<1, however, then N could have 3 or more prime factors (which is also OK).

If N>2 is prime then c can never be a square: indeed, if c=b^2 then 4*k*N = a^2-b^2, so N divides (a-b)*(a+b); since N is PRIME it must divide a-b or a+b, and (because k>=1 forces a>b>=0) this gives N <= a+b <= 2*a. But we know a*a <= 4*N^(4/3)/T^2 + T^2*N^(2/3), so this cannot happen if N is large enough with T fixed.

With T=1 it suffices to check validity by hand for N=3,5,7,11,13,17,19,23; the general argument then works for all N>=29. If we decrease T below 1 and make it too small then we may fail for some N; I am not sure what the minimum allowable T would be.

So N=p*q with p<=q and p prime. We have

T*N^(1/3) <= p <= q = N/p <= T^(-1)*N^(2/3)

1 <= q/p <= T^(-2)*N^(1/3)

Now consider the regular continued fraction (RCF) for q/p.

Let r/s and r'/s' be two successive convergents

such that r*s <= T^(-2)*N^(1/3) < r'*s' <= N.

Then s/r and s'/r' also are successive convergents to p/q.

|q/p - r/s| <= |r'/s' - r/s| = 1/(s*s') hence |p*r-q*s| <= p/s'

|p/q - s/r| <= |s'/r' - s/r| = 1/(r*r') hence |p*r-q*s| <= q/r'

hence taking the geometric mean of these two bounds shows

|p*r-q*s| <= sqrt( (p*q)/(r'*s') ) = sqrt(N/(r'*s')) <= N^(1/3) * T

and with

k=r*s

a=p*r+q*s

we have

a*a-4*k*N=(p*r-q*s)^2 <= N^(2/3) * T^2 = B*B

k*T^3 = r*s*T^3 <= T*N^(1/3) = B

hence the squareness test succeeds (the congruences for a are

easily checked by considerations mod 4).

QED.
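The construction in the proof can be watched in action. The sketch below (function names mine) computes the RCF convergents of q/p, picks the successive pair straddling T^(-2)*N^(1/3), and verifies that k = r*s and a = p*r + q*s make a*a - 4*k*N a perfect square of size at most B*B:

```python
from math import isqrt

def convergents(num, den):
    """All RCF convergents of num/den, as (numerator, denominator) pairs."""
    p0, p1 = 0, 1
    q0, q1 = 1, 0
    out = []
    while den:
        d, rem = divmod(num, den)
        p0, p1 = p1, d * p1 + p0
        q0, q1 = q1, d * q1 + q0
        out.append((p1, q1))
        num, den = den, rem
    return out

def zagier_pair(p, q, T=1.0):
    """Successive convergents r/s, r'/s' of q/p with r*s <= N^(1/3)/T^2 < r'*s';
    return k = r*s and a = p*r + q*s as in the proof."""
    N = p * q
    bound = N ** (1.0 / 3.0) / T ** 2
    cv = convergents(q, p)
    for (r, s), (r2, s2) in zip(cv, cv[1:]):
        if r * s <= bound < r2 * s2:
            return r * s, p * r + q * s

p, q = 83, 97          # example: N = 8051, both primes exceed B = 20
N = p * q
k, a = zagier_pair(p, q)
c = a * a - 4 * k * N
b = isqrt(c)
print(k, a, c)          # 1 180 196  (and 196 = 14^2 <= B*B = 400)
```

Here the convergents of 97/83 are 1/1, 6/5, 7/6, 97/83; the pair 1/1, 6/5 straddles N^(1/3) = 20.04, giving k=1 and a=180.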

3. Runtime analysis.

For each value of k there are at most

(1/2) * (1 + sqrt(4*k*N + T^2 * N^(2/3)) - sqrt(4*k*N))

values of a to try (stepping by 2, as for even k), which (using monotonicity of sqrt and of its derivative) is both upper bounded by and asymptotically equal to

1/2 + T^2*N^(1/6)*k^(-1/2)/8; [side note: Lehman agrees]

averaged over both k-parities (step 2 for even k, step 4 for odd k) the count per k is only 3/8 + (3/32)*T^2*N^(1/6)*k^(-1/2).

And since sum(k<=x)k^(-1/2) <= 2*(x+1)^(1/2), and since the k-loop visits both parities deterministically, the total number of squareness tests is <= (3/16)*T*N^(1/3) + (3/8)*N^(1/3)/T^2 + 1 even in the worst case.

Averaged over hard N (the factor is typically found partway through the k-loop) it should be about 2/3 of that: <= (1/8)*T*N^(1/3) + (1/4)*N^(1/3)/T^2 + 1.

The worst-case bound is minimized by choosing T=2^(2/3)=1.5874, getting #squareness tests <= 0.4465*N^(1/3) + 1.

The averaged bound is minimized by the same T, getting #squareness tests <= 0.2977*N^(1/3) + 1.

The worst-case figure is a rigorous upper bound on the number of squareness tests; the averaged one is heuristic.
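A quick numeric sanity check of these constants (taking the worst-case coefficients as 3/16 and 3/8 and the averaged ones as 1/8 and 1/4, which is consistent with the quoted 0.4465 and 0.2977; note a*T + b/T^2 is minimized at T = (2*b/a)^(1/3), which equals 2^(2/3) for both pairs):

```python
# Coefficients of the two squareness-test bounds discussed above.
def worst(T):
    return (3 / 16) * T + (3 / 8) / T ** 2

def avg(T):
    return (1 / 8) * T + (1 / 4) / T ** 2

T = 2 ** (2 / 3)                  # = 1.5874..., minimizer of both bounds
print(worst(T), avg(T))           # ~0.4465 and ~0.2976 (0.2977 when rounded up)
```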

Average time (even averaged over hard N, meaning N for which the trial division stage yields nothing) ought to be at least a factor 2 faster than these worst-case bounds.

If T=1 (or any fixed T) then the trial division stage should be faster by a log or loglog factor than the N^(1/3) time for the square-testing stage (since primes are sparse in the integers by a log factor), so it is negligible in comparison. In practice, though, N might not be large enough for logN or loglogN to make it truly negligible, in which case multiply the runtime estimate by maybe 1.5.

4. Programming tricks:

A. In the inner loop c is bounded between 0 and T^2*N^(2/3) and can be updated additively, by adding stepsize*(anew+aold) to it when we update a to anew = a+stepsize. This saves a multiplication in the inner loop.
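A sketch of one k's inner loop using this additive update (Python, names mine): c is computed with a full-size multiplication once per k, then maintained by additions and a multiply by the small constant stepsize m.

```python
from math import gcd, isqrt

def inner_loop(N, k, B, m, r):
    """Scan eligible a for one k, maintaining c = a*a - 4*k*N additively."""
    lo = 4 * k * N
    a = isqrt(lo)
    if a * a < lo:
        a += 1
    a += (r - a) % m               # first a >= ceil(sqrt(4kN)) with a % m == r
    c = a * a - 4 * k * N          # the only full-size multiplication for this k
    while c <= B * B:
        b = isqrt(c)
        if b * b == c:
            return gcd(a + b, N)
        c += m * (2 * a + m)       # (a+m)^2 - a^2 = stepsize*(anew + aold)
        a += m
    return None
```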

B. The square-testing can use modular info to avoid computing sqrt most of the time.
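For instance (one common approach; the details here are mine, not Lehman's): c can only be a square if c mod 64, c mod 63, and c mod 65 are all quadratic residues, and these three table lookups reject the large majority of non-squares before any square root is computed.

```python
from math import isqrt

# Precomputed sets of quadratic residues modulo 64, 63, and 65.
SQ64 = {(i * i) % 64 for i in range(64)}
SQ63 = {(i * i) % 63 for i in range(63)}
SQ65 = {(i * i) % 65 for i in range(65)}

def is_square_fast(c):
    """Return sqrt(c) if c is a perfect square, else None."""
    if c < 0:
        return None
    if c % 64 not in SQ64 or c % 63 not in SQ63 or c % 65 not in SQ65:
        return None                 # cheap rejection: no sqrt computed
    r = isqrt(c)
    return r if r * r == c else None
```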

C. It helps on typical N to try the k with many divisors first. Any ordering of the k's is valid, but if you try k's with large numbers of divisors first, then you'll usually factor N sooner. Lehman argues that a k with D divisors has likelihood proportional to D of yielding a factor on any particular squareness test.

The average order of the number-of-divisors function d(n) is ln(n), so one might conjecture that Lehman's average time would be sped up by a log factor by such tricks. Lehman suggests one way to accomplish such a favorable ordering, but I think a better programmer could do a better job of this than Lehman did.
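One simple (if memory-hungry, and not Lehman's own method) way to get such an ordering, sketched under my own naming, is just to sort k = 1..kmax by descending divisor count:

```python
def divisor_count(n):
    """Number of divisors of n, via trial factorization."""
    count, d, m = 1, 2, n
    while d * d <= m:
        e = 0
        while m % d == 0:
            m //= d
            e += 1
        count *= e + 1
        d += 1
    if m > 1:
        count *= 2              # one leftover prime factor
    return count

def k_order(kmax):
    """k = 1..kmax, k's with many divisors first (ties: smaller k first)."""
    return sorted(range(1, kmax + 1), key=lambda k: (-divisor_count(k), k))
```

For example `k_order(20)` begins 12, 18, 20 (six divisors each), then 16, then the k's with four divisors.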

5. Comparison with Shanks SQUFOF.

Shanks, in unpublished work completed by J. Gower & S. Wagstaff, argues that for average "hard" N (meaning N = a product of two primes with N = 3 mod 4) the "average number of forms SQUFOF must examine before finding a proper square form" is 1.775*N^(1/4) asymptotically.

CROSSOVER: The N which causes the SQUFOF form count and the Lehman average squareness-test count to be equal is N=2^(42.9).

For larger N, SQUFOF should win; for smaller N, Lehman should win.

However, "runtime" and "#squareness tests" are not exactly the same thing.

Shanks seems to have a nastier inner loop than Lehman, since it has more operations, among them a division. On the brighter side, Shanks can live with narrower words: most numbers are at most N^(1/2) with Shanks, while with Lehman they reach N^(2/3).

If we were to assume Shanks was 4 times nastier at same number of squareness tests, then the crossover point would move to N=2^(66.9).

The true crossover for hard N would then (I guess) be somewhere between 43 and 67 bits.

For "easy" N I would think Lehman would enjoy further advantages versus SQUFOF

because the trial division would then be a win.

Also, even if we assume SQUFOF pays no penalty for its nastier inner loop, and demand Lehman use its worst-case bound while SQUFOF is allowed to use the average-N bound, we'd get a crossover N = 23.9 bits long, and 16.9 bits if Lehman's count is multiplied by 1.5 to account for the trial-division labor not being negligible.
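The crossover arithmetic is easy to reproduce: setting cL*N^(1/3) = cS*N^(1/4) with cS = 1.775 gives N^(1/12) = cS/cL, i.e. a crossover at 12*log2(cS/cL) bits. The sketch below (my arithmetic; it uses 0.2977/2 for Lehman's average time, 0.4465 for its worst-case bound, and the 1.5x trial-division surcharge) recovers the quoted figures:

```python
import math

def crossover_bits(c_lehman, c_squfof=1.775):
    """Bit length of N where c_squfof*N^(1/4) = c_lehman*N^(1/3)."""
    return 12 * math.log2(c_squfof / c_lehman)

print(round(crossover_bits(0.2977 / 2), 1))      # 42.9  (average-time estimate)
print(round(crossover_bits(0.4465), 1))          # 23.9  (Lehman worst-case bound)
print(round(crossover_bits(0.4465 * 1.5), 1))    # 16.9  (plus trial-division surcharge)
print(round(crossover_bits(0.2977 / 2 / 4), 1))  # 66.9  (SQUFOF 4x nastier per test)
```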

So I regard it as completely certain that Lehman is better than SQUFOF for small enough N, and the crossover ought to lie somewhere in the 17-to-67-bit range.

J. Milan, in his INRIA experimental study, found SQUFOF was the best factoring method among those he tried on hard N <= 59 bits long, while for hard N >= 59 bits long the best was the Quadratic Sieve. Milan did not try Lehman. My analysis here indicates that was a mistake, since Lehman ought to dominate SQUFOF over much of, if not all of, SQUFOF's home range.

I would be interested if anybody follows up on this by actually programming and testing.

--Warren D. Smith