In this post I explain R. S. Lehman's factorization algorithm, provide a proof and runtime analysis, and compare it with SQUFOF.

1. The algorithm. Let T>0 be a tunable constant. [I will prove validity if T>=1.]

Given an odd integer N>2, it finds a nontrivial factor (or proves N prime).

B = floor(N^(1/3) * T);

Trial divide N up to bound B;
if(factor found) return(factor);

k = 0;
LoopForever {
    k++;
    if(k*T^3 > B) { return(N is prime); }
    if(k even) { m = 2; r = 1; } else { m = 4; r = (k+N)%4; }
    for(all integers a with 0 <= a*a - 4*k*N <= B*B and with a%m == r) {
        c = a*a - 4*k*N;
        test whether c is square;
        if(issquare) {
            b = sqrt(c);
            return( gcd(a+b, N) );
        }
    }
}
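For concreteness, here is a direct Python transcription of the pseudocode above (a sketch, not Lehman's reference code; the `icbrt` helper and the guard against an accidental trivial gcd are my additions):

```python
import math

def icbrt(n):
    """Largest integer x with x**3 <= n."""
    x = round(n ** (1.0 / 3.0))
    while x ** 3 > n:
        x -= 1
    while (x + 1) ** 3 <= n:
        x += 1
    return x

def lehman(n, T=1):
    """Sketch of the algorithm above (T a positive integer; the proof
    needs T >= 1).  Returns a nontrivial factor of odd n > 2, or n
    itself if n is prime."""
    assert n > 2 and n % 2 == 1
    B = T * icbrt(n)
    for d in range(3, B + 1, 2):     # trial division up to B (n is odd)
        if n % d == 0:
            return d
    k = 0
    while True:
        k += 1
        if k * T ** 3 > B:
            return n                 # N is prime
        if k % 2 == 0:
            m, r = 2, 1
        else:
            m, r = 4, (k + n) % 4
        # scan a in [ceil(sqrt(4kN)), floor(sqrt(4kN + B^2))], a % m == r
        lo = math.isqrt(4 * k * n)
        if lo * lo < 4 * k * n:
            lo += 1
        hi = math.isqrt(4 * k * n + B * B)
        lo += (r - lo) % m
        for a in range(lo, hi + 1, m):
            c = a * a - 4 * k * n
            b = math.isqrt(c)
            if b * b == c:
                g = math.gcd(a + b, n)
                if 1 < g < n:        # guard against a trivial gcd
                    return g
```

For example, `lehman(10403)` finds the factor 103 of 101*103 at k=1, and `lehman(101)` returns 101, certifying primality.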

2. Proof of validity (mostly due to Don Zagier, with some improvements by me):

If we get past trial division up to B >= N^(1/3) then N has <= 2 prime factors.

If 0<T<1 however then N could have 3 or more prime factors (which is also ok).

If N>2 is prime then c cannot ever be square:

indeed if c = b*b then a*a - b*b = 4*k*N, so N divides (a-b)*(a+b);

if N is PRIME then N must divide either a-b or a+b, and both are positive

(k>=1 forces b<a), hence N <= a+b <= 2*a.

But we know a*a <= 4*k*N + B*B < 4*N^(4/3)/T^2 + T^2*N^(2/3),

hence N <= 2*a cannot happen

if N is large enough with T fixed.

With T=1 it suffices to check by hand validity for

N=3,5,7,11,13,17,19,23 and the general argument

then works for all N>=29.

If we decrease T below 1 and get too small then we may fail for some N;

I am not sure what the minimum allowable T would be.

So N=p*q with p and q prime, p<=q. We have

T*N^(1/3) < p <= q = N/p < T^(-1)*N^(2/3)

1 <= q/p < T^(-2)*N^(1/3)

Now consider the regular continued fraction (RCF) for q/p.

Let r/s and r'/s' be two successive convergents

such that r*s <= T^(-2)*N^(1/3) < r'*s' <= N.

Then s/r and s'/r' also are successive convergents to p/q.

|q/p - r/s| <= |r'/s' - r/s| = 1/(s*s') hence |p*r-q*s| <= p/s'

|p/q - s/r| <= |s'/r' - s/r| = 1/(r*r') hence |p*r-q*s| <= q/r'

hence taking the geometric mean of these two bounds shows

|p*r-q*s| <= sqrt( (p*q)/(r'*s') ) = sqrt(N/(r'*s')) <= N^(1/3) * T

and with

k=r*s

a=p*r+q*s

we have

a*a-4*k*N=(p*r-q*s)^2 <= N^(2/3) * T^2 = B*B

k*T^3 <= T*N^(1/3) = B

hence the squareness test succeeds (the congruence conditions a%m == r are

easily checked by considerations mod 4).

QED.
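The construction in the proof can be watched in action on a small example (my illustration, with T = 1 and N = 101*103; the convergent routine is an assumption of mine, not part of Zagier's argument):

```python
import math

# N = p*q with p = 101, q = 103; the proof should hand us k and a for
# which a*a - 4*k*N is a perfect square.
p, q = 101, 103
N = p * q
bound = N ** (1 / 3)  # T^(-2) * N^(1/3) with T = 1, about 21.8

def rcf_convergents(num, den):
    """Convergents r/s of the regular continued fraction of num/den."""
    convs = []
    h2, h1 = 0, 1
    k2, k1 = 1, 0
    while den:
        a0 = num // den
        h2, h1 = h1, a0 * h1 + h2
        k2, k1 = k1, a0 * k1 + k2
        convs.append((h1, k1))
        num, den = den, num - a0 * den
    return convs

convs = rcf_convergents(q, p)   # [(1, 1), (51, 50), (103, 101)]
# last convergent r/s with r*s <= bound (products r*s increase):
r, s = max((rs for rs in convs if rs[0] * rs[1] <= bound),
           key=lambda rs: rs[0] * rs[1])
k = r * s                       # here k = 1
a = p * r + q * s               # here a = 101 + 103 = 204
c = a * a - 4 * k * N           # 41616 - 41612 = 4
b = math.isqrt(c)
assert b * b == c               # a perfect square, as promised
assert math.gcd(a + b, N) == q  # gcd(206, 10403) = 103
```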

3. Runtime analysis.

For each value of k there are at most

(1/2) * (1 + sqrt(4*k*N + T^2 * N^(2/3)) - sqrt(4*k*N))

values of a to try, which (using concavity of sqrt, i.e.

sqrt(x+d) - sqrt(x) <= d/(2*sqrt(x))) is both upper bounded by

and asymptotically equal to

1/2 + T^2*N^(1/6)*k^(-1/2)/8, [side note: Lehman agrees]

and on average over both k-parities only 3/8 + T^2*N^(1/6)*k^(-1/2)*3/32.

And since sum(k<=x)k^(-1/2)<=2*(x+1)^(1/2) we see the total number of squareness tests should be <=(1/8)*T*N^(1/3) + N^(1/3)/(2*T^2) + 1 even in the worst case.

And should be <=(3/16)*T*N^(1/3) + (3/8)*N^(1/3)/T^2 + 1 averaged over hard N.

The former (the worst-case bound) is minimized by choosing T=2^(2/3)=1.5874, getting #squareness tests<=0.4465*N^(1/3) + 1.

The latter is minimized by the same T, now getting

#squareness tests (averaged)<=0.2977*N^(1/3) + 1.

These are rigorous upper bounds on the number of squareness tests.
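As a quick empirical sanity check of the N^(1/3) scaling (my sketch; the factor-2 margin in the assertion is deliberately generous so that nothing hinges on the exact constants above), one can count the a candidates scanned for a prime N, which is the worst case since the whole k-range is traversed:

```python
import math

def count_squareness_tests(n, T=1):
    """Count candidate a values (one squareness test each) that the
    loop of section 1 scans for n (T a positive integer >= 1)."""
    B = T * round(n ** (1 / 3))
    tests, k = 0, 1
    while k * T ** 3 <= B:
        if k % 2 == 0:
            m, r = 2, 1
        else:
            m, r = 4, (k + n) % 4
        lo = math.isqrt(4 * k * n)
        if lo * lo < 4 * k * n:
            lo += 1
        hi = math.isqrt(4 * k * n + B * B)
        lo += (r - lo) % m
        if hi >= lo:
            tests += (hi - lo) // m + 1
        k += 1
    return tests

for n in (1000003, 1000000007):   # primes: the full k-range is scanned
    t = count_squareness_tests(n)
    assert 0 < t < 2 * round(n ** (1 / 3))   # comfortably O(N^(1/3))
```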

Average time (even averaged over hard N meaning N for which the trial division stage yields nothing) ought to be at least a factor 2 faster than these

worst case bounds.

If T=1 (or any fixed T) then the trial division stage should be faster

by a log or loglog factor than the N^(1/3) time for

the square-testing stage (since primes are sparse in the integers by

a log factor) and so is negligible in comparison.

In practice N may not be large enough for logN or loglogN

to make trial division truly negligible, though, in which case

multiply the runtime estimate by maybe 1.5.

4. Programming tricks:

A. In the inner loop c is bounded between 0 and T^2*N^(2/3)

and can be updated additively by adding stepsize*(anew+aold) to it

when we update a to anew = a + stepsize. This saves a multiplication in

the inner loop.
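Trick A can be checked on arbitrary illustration values (the k, N, and stepsize below are hypothetical, chosen only for the demonstration):

```python
# c = a*a - 4*k*N stays in sync under the additive update, because
# anew^2 - a^2 = stepsize*(anew + a).
k, N, step = 3, 10403, 4   # hypothetical illustration values
a = 354
c = a * a - 4 * k * N
for _ in range(5):
    a_new = a + step
    c += step * (a_new + a)        # additive update of c
    a = a_new
    assert c == a * a - 4 * k * N  # no fresh a*a multiply was needed
```

Since stepsize is 2 or 4 in Lehman's loop, the remaining multiply by stepsize is just a shift.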

B. The square-testing can use modular info to avoid computing sqrt most of the time.
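One common realization of trick B (a sketch of a standard technique; the moduli 64, 63, 65 are my choice here, not necessarily Lehman's) is to reject most non-squares with cheap table lookups before paying for a square root:

```python
import math

SQ64 = {i * i % 64 for i in range(64)}  # only 12 of 64 residues are squares
SQ63 = {i * i % 63 for i in range(63)}  # 16 of 63
SQ65 = {i * i % 65 for i in range(65)}  # 21 of 65

def is_square(c):
    """Modular filters reject over 98% of non-squares before the sqrt."""
    if c < 0:
        return False
    if c % 64 not in SQ64 or c % 63 not in SQ63 or c % 65 not in SQ65:
        return False
    b = math.isqrt(c)
    return b * b == c
```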

C. It helps on typical N to try the k with many divisors first.

Any ordering of the k's is valid, but if you try k's with large numbers of

divisors first, then you'll usually factor N sooner.

Lehman argues that k with D divisors has likelihood proportional to D of

yielding a factor on any particular squareness test.

The average order of the #divisors(N) function is lnN so one

might conjecture that Lehman average time would be sped up

by a log factor by such tricks.

Lehman suggests one way to accomplish such a favorable ordering, but I

think a better programmer could do a better job of this than Lehman did.
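A minimal realization of trick C's ordering (my sketch; a naive divisor counter suffices for the small k range Lehman scans, though a precomputed table would be faster):

```python
def divisor_count(k):
    """Naive divisor counter, fine for the small k values scanned."""
    return sum(1 for d in range(1, k + 1) if k % d == 0)

kmax = 12   # in the real algorithm kmax = floor(B / T^3)
# visit k in order of decreasing divisor count instead of increasing k;
# highly composite k such as 12 (six divisors) come first, primes and 1 last
ks = sorted(range(1, kmax + 1), key=lambda k: (-divisor_count(k), k))
```

Any permutation of {1, ..., kmax} is valid here, since validity only requires that every k get tried before declaring N prime.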

5. Comparison with Shanks SQUFOF.

Shanks in unpublished work completed by J.Gower & S.Wagstaff

argues that for average "hard" N (meaning N=product of two primes and N=3 mod 4)

the "average number of forms SQUFOF must examine before finding proper

square form" is 1.775*N^(1/4) asymptotically.

CROSSOVER: The N which causes

#ShanksSQUFOF and #Lehman average squareness tests to be equal,

is N=2^(42.9).

For larger N, SQUFOF should win; for smaller N, Lehman should win.
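The quoted crossover exponents can be recovered from the constants above (equating 1.775*N^(1/4) with c*N^(1/3) gives log2(N) = 12*log2(1.775/c); taking c = 0.2977/2 reflects the factor-2 heuristic for average hard N):

```python
import math

def crossover_bits(squfof_coeff, lehman_coeff):
    """Solve squfof_coeff*N^(1/4) = lehman_coeff*N^(1/3) for log2(N)."""
    return 12 * math.log2(squfof_coeff / lehman_coeff)

# averaged Lehman bound 0.2977, halved per the factor-2 heuristic:
print(round(crossover_bits(1.775, 0.2977 / 2), 1))      # about 42.9
# same, but SQUFOF assumed 4 times costlier per test:
print(round(crossover_bits(4 * 1.775, 0.2977 / 2), 1))  # about 66.9
```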

However, "runtime" and "#squareness tests" are not exactly the same thing.

Shanks seems to have a nastier inner loop than Lehman since it has more operations and

includes among them a division. On the brighter side, Shanks can live with narrower words: most numbers are at most N^(1/2) with Shanks, while with Lehman it is N^(2/3).

If we were to assume Shanks was 4 times nastier at same number of squareness tests, then the crossover point would move to N=2^(66.9).

The true crossover (I guess) for hard N would then be somewhere between

43 and 67 bits long.

For "easy" N I would think Lehman would enjoy further advantages versus SQUFOF

because the trial division would then be a win.

Also even if we assume SQUFOF pays no

penalty for its nastier inner loop and demand Lehman use its worst case bound while SQUFOF is allowed to use the average-N bound, we'd get a crossover N = 23.9 bits long,

and 16.9 bits if Lehman is multiplied by 1.5 to account for trial division labor not being negligible.

So I regard it as completely certain Lehman is better than SQUFOF for small

enough N and the crossover ought to be 17 to 69 bits.

J.Milan in his INRIA experimental study found SQUFOF was the best factoring method among those he tried on hard N <=59 bits

long, while for hard N>=59 bits long, best was Quadratic Sieve.

Milan did not try Lehman. My analysis here indicates this was a mistake since Lehman

ought to dominate SQUFOF over much of, if not all of, its home range.

I would be interested if anybody follows up on this by actually programming and testing.

--Warren D. Smith

It turns out William B. Hart in a preprint

implemented his own heuristic factoring algorithm

which he calls "one line factoring algorithm" because it is very simple.

Except actually it isn't that simple when you try to optimize its performance.

http://137.205.37.242/onelinefactor.pdf

It heuristically runs in O(N^(1/3)) time, except it can and will fail on some N.
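For reference, here is a minimal sketch of the "one line" idea as I read it (the iteration cap and the absence of premultipliers are my simplifications, not Hart's exact method):

```python
import math

def one_line_factor(n, iters=100000):
    """For i = 1, 2, 3, ... test whether ceil(sqrt(n*i))^2 mod n is a
    perfect square; if so, a gcd yields a factor.  Heuristic only:
    unlike Lehman it may fail, in which case None is returned."""
    for i in range(1, iters + 1):
        s = math.isqrt(n * i)
        if s * s < n * i:
            s += 1                   # s = ceil(sqrt(n*i))
        m = s * s % n
        t = math.isqrt(m)
        if t * t == m:
            g = math.gcd(s - t, n)
            if 1 < g < n:
                return g
    return None
```

For example, on n = 101*103 = 10403 it succeeds already at i = 1.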

Hart then points out that a good way to implement Lehman's algorithm involving

"turning the loops inside out" in the part of the search space where Lehman's inner loop

only does 1 iteration... basically coincides with his "one line" algorithm in that part of the search space.

If this is done, Hart finds that Lehman runs in essentially exactly

the same average time as his "one line" method, but is better in the worst case since Lehman always succeeds while the "one line" method can fail.

And then he finds that Lehman is competitive with Pari's built in factorer for N<=42 bits long, and is superior to Pari from 26 to 41 bits. Above 41 Pari (apparently using SQUFOF)

is better. Below 26, Pari is usually better, but never by as much as a factor of 2,

and that is probably just because Pari implemented trial division better than Hart did.

So CONCLUSION:

this programming work by Hart totally confirms my theoretical analysis in the last post,

where I predicted the Lehman/SQUFOF crossover would be at 42 bits; he finds

it is 41 bits.

Hart also finds a better way to code Lehman that reduces loop-boundary overhead.