## Lehman factorization algorithm, theoretical comparison with SQUFOF

Message 1 of 2, Jan 1, 2012
In this post I explain R. S. Lehman's factorization algorithm, provide a proof and runtime analysis, and compare it with SQUFOF.

1. The algorithm. Let T > 0 be a tunable constant. [I will prove validity if T >= 1.]

Given an odd integer N > 2, it finds a nontrivial factor (or proves N prime).

B = floor(N^(1/3) * T);
Trial divide N up to bound B;
if(factor found){ return(factor); }
k = 0;
LoopForever{
  k++;
  if(k*T^3 > B){ return(N is prime); }
  if(k even){ m=2; r=1; }else{ m=4; r=(k+N)%4; }
  for(all integers a with 0 <= a*a - 4*k*N <= B*B and with a%m == r){
    c = a*a - 4*k*N;
    test whether c is a square;
    if(c is square){
      b = sqrt(c);
      return( gcd(a+b, N) );
    }
  }
}
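To make the pseudocode concrete, here is a minimal Python sketch of the algorithm exactly as stated above (names follow the pseudocode; the floating-point cube root is adequate for word-sized N, and the gcd guard is just a safety net):

```python
import math

def is_square(c):
    """True iff c is a perfect square."""
    if c < 0:
        return False
    r = math.isqrt(c)
    return r * r == c

def lehman_factor(n, T=1):
    """Lehman's O(N^(1/3)) method for odd n > 2 (validity proven for T >= 1).
    Returns a nontrivial factor of n, or n itself if n is prime."""
    assert n > 2 and n % 2 == 1
    B = int(T * n ** (1/3))            # trial-division bound ~ T*N^(1/3)
    for p in range(3, B + 1, 2):       # trial division (n is odd, skip evens)
        if n % p == 0:
            return p
    k = 0
    while True:
        k += 1
        if k * T**3 > B:
            return n                   # n is prime
        if k % 2 == 0:
            m, r = 2, 1
        else:
            m, r = 4, (k + n) % 4
        lo = math.isqrt(4 * k * n)     # a ranges over [sqrt(4kN), sqrt(4kN + B^2)]
        hi = math.isqrt(4 * k * n + B * B)
        for a in range(lo, hi + 1):
            if a % m != r or a * a < 4 * k * n:
                continue
            c = a * a - 4 * k * n
            if is_square(c):
                g = math.gcd(a + math.isqrt(c), n)
                if 1 < g < n:          # per the proof below this gcd is nontrivial
                    return g
```

For example, lehman_factor(1037) returns 61, a prime factor of 1037 = 17 * 61, found at k = 3.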

2. Proof of validity (mostly due to Don Zagier, with some improvements by me):
If we get past trial division up to N^(1/3) then N has <=2 prime factors.
If 0<T<1 however then N could have 3 or more prime factors (which is also ok).

If N>2 is prime then c cannot ever be square:
indeed c = b^2 means a^2 - b^2 = 4*k*N, so N divides (a-b)*(a+b), and
if N is PRIME then N must divide either a-b or a+b, hence N <= a+b.
We know a*a<4*N^(4/3)/T^2+T^2*N^(2/3)
and we know b<=a hence this cannot happen
if N is large enough with T fixed.
With T=1 it suffices to check by hand validity for
N=3,5,7,11,13,17,19,23 and the general argument
then works for all N>=29.

If we decrease T below 1 and get too small then we may fail for some N;
I am not sure what the minimum allowable T would be.

So N=p*q with p<=q and p prime. We have
T*N^(1/3) <= p <= q = N/p <= T^(-1)*N^(2/3)
1 <= q/p <= T^(-2)*N^(1/3)
Now consider the regular continued fraction (RCF) for q/p.
Let r/s and r'/s' be two successive convergents
such that r*s <= T^(-2)*N^(1/3) < r'*s' <= N.
Then s/r and s'/r' also are successive convergents to p/q.
|q/p - r/s| <= |r'/s' - r/s| = 1/(s*s') hence |p*r-q*s| <= p/s'
|p/q - s/r| <= |s'/r' - s/r| = 1/(r*r') hence |p*r-q*s| <= q/r'
hence taking the geometric mean of these two bounds shows
|p*r-q*s| <= sqrt( (p*q)/(r'*s') ) = sqrt(N/(r'*s')) <= N^(1/3) * T
and with
k=r*s
a=p*r+q*s
we have
a*a-4*k*N=(p*r-q*s)^2 <= N^(2/3) * T^2 = B*B
k*T^3 <= T*N^(1/3) = B
hence the squareness test succeeds (the congruences for a are
easily checked by considerations mod 4).
QED.
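The construction in the proof can be checked numerically. A small worked example, taking the hypothetical N = 17*61 with T = 1 (p, q, r, s, k, a named as in the proof above):

```python
from math import gcd, isqrt

p, q = 17, 61
N = p * q                                  # 1037, N^(1/3) ~ 10.1

# Regular continued fraction convergents of q/p = 61/17.
convs = []
x, y = q, p
h0, h1, k0, k1 = 0, 1, 1, 0
while y:
    d, x, y = x // y, y, x % y
    h0, h1 = h1, d * h1 + h0
    k0, k1 = k1, d * k1 + k0
    convs.append((h1, k1))                 # (3,1),(4,1),(7,2),(18,5),(61,17)

# Last convergent r/s with r*s <= N^(1/3); the next one has r'*s' = 14 > N^(1/3).
r, s = [rs for rs in convs if rs[0] * rs[1] <= 10][-1]   # (4, 1)

k = r * s                                  # 4, and k*T^3 <= B holds
a = p * r + q * s                          # 129
c = a * a - 4 * k * N                      # 49 = (p*r - q*s)^2 = 7^2 <= B*B
b = isqrt(c)
print(gcd(a + b, N))                       # 17, a nontrivial factor of N
```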

3. Runtime analysis.
For each value of k there are at most
(1/2) * (1 + sqrt(4*k*N + T^2 * N^(2/3)) - sqrt(4*k*N))
values of a to try, which is both upper bounded by (using
monotonicity of sqrt and of its derivative)
and asymptotically equal to
1/2 + T^2*N^(1/6)*k^(-1/2)/8 [side note: Lehman agrees],
and on average over both k-parities only 3/8 + T^2*N^(1/6)*k^(-1/2)*3/32.
And since sum(k<=x) k^(-1/2) <= 2*(x+1)^(1/2), we see the total number of squareness tests should be
<= (1/8)*T*N^(1/3) + N^(1/3)/(2*T^2) + 1 even in the worst case,
and should be <= (3/16)*T*N^(1/3) + (3/8)*N^(1/3)/T^2 + 1 averaged over hard N.
The latter is minimized by choosing T = 2^(2/3) = 1.5874, getting #squareness tests <= 0.4465*N^(1/3) + 1.
The former, minimized by the same T, now gives #squareness tests (averaged) <= 0.2977*N^(1/3) + 1.
These are rigorous upper bounds on the number of squareness tests.
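As a sanity check on the stated constant, the coefficient of N^(1/3) in the averaged bound, f(T) = (3/16)*T + (3/8)/T^2, can be evaluated at T = 2^(2/3):

```python
# Coefficient of N^(1/3) in the averaged bound above.
f = lambda T: (3 / 16) * T + (3 / 8) / T**2

T_opt = 2 ** (2 / 3)                 # ~ 1.5874; solves f'(T) = 3/16 - (3/4)/T^3 = 0
print(round(f(T_opt), 4))            # 0.4465, matching the bound above
```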

Average time (even averaged over hard N, meaning N for which the trial-division stage yields nothing) ought to be at least a factor of 2 faster than these
worst-case bounds.

If T=1 (or any fixed T) then the trial division stage should be faster
by a log or loglog factor than the N^(1/3) time for
the square-testing stage (since primes are sparse in the integers by
a log factor), so it is negligible in comparison.
It might not really be negligible in practice, since N is not large enough for logN or loglogN
to be large enough for negligibility, in which case
multiply the runtime estimate by maybe 1.5.

4. Programming tricks:
A. In the inner loop c stays bounded between 0 and T^2*N^(2/3), and when we update a to anew = a + stepsize
we can update c incrementally via cnew = c + 2*a*stepsize + stepsize^2. This saves a multiplication in
the inner loop.

B. The square-testing can use modular info to avoid computing sqrt most of the time.
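For instance, squares are sparse modulo 64 (only 12 of the 64 residues are quadratic residues), so one possible sketch is a table lookup that rejects roughly 81% of candidates before any square-root call:

```python
import math

# Only 12 of the 64 residues mod 64 are squares, so this lookup
# rejects ~81% of non-squares before any isqrt call.
SQ_MOD64 = {(i * i) % 64 for i in range(64)}

def is_square_fast(c):
    if c < 0 or (c % 64) not in SQ_MOD64:
        return False
    r = math.isqrt(c)
    return r * r == c
```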

C. It helps on typical N to try the k with many divisors first.
Any ordering of the k's is valid, but if you try k's with large numbers of
divisors first, then you'll usually factor N sooner.
Lehman argues that a k with D divisors has likelihood proportional to D of
yielding a factor on any particular squareness test.
The average order of the #divisors(N) function is lnN so one
might conjecture that Lehman average time would be sped up
by a log factor by such tricks.

Lehman suggests one way to accomplish such a favorable ordering, but I
think a better programmer could do a better job of this than Lehman did.
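A naive sketch of such a divisor-rich-first ordering (Lehman's own scheme is more refined; divisor_count here is deliberately crude and only suitable for small k-bounds):

```python
# Crude divisor counter, fine for small k-bounds.
def divisor_count(k):
    return sum(1 for d in range(1, k + 1) if k % d == 0)

K = 20  # hypothetical k-bound (B / T^3 in the algorithm above)
order = sorted(range(1, K + 1), key=lambda k: (-divisor_count(k), k))
print(order[:5])   # [12, 18, 20, 16, 6]: the smoothest k come first
```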

5. Comparison with Shanks SQUFOF.
Shanks, in unpublished work completed by J. Gower & S. Wagstaff,
argues that for average "hard" N (meaning N = product of two primes and N = 3 mod 4)
the "average number of forms SQUFOF must examine before finding proper
square form" is 1.775*N^(1/4) asymptotically.

CROSSOVER: The N at which the SQUFOF and Lehman average squareness-test
counts become equal is N = 2^(42.9).

For larger N, SQUFOF should win; for smaller N, Lehman should win.

However, "runtime" and "#squareness tests" are not exactly the same thing.

Shanks seems to have a nastier inner loop than Lehman, since it has more operations and
includes among them a division. On the brighter side, Shanks can live with narrower words: most numbers are at most N^(1/2) with Shanks, while with Lehman it is N^(2/3).
If we were to assume Shanks was 4 times nastier at the same number of squareness tests, then the crossover point would move to N = 2^(66.9).

The true crossover (I guess) for hard N would then be somewhere between
43 and 67 bits long.
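The crossover arithmetic above can be reproduced as follows. The Lehman constant 0.2977/2 assumes the average-case factor-2 speedup argued earlier, and the nastiness factor is the guessed 4x inner-loop penalty:

```python
import math

# Solve shanks_c * nastiness * N^(1/4) = lehman_c * N^(1/3) for N,
# reporting log2(N), the crossover size in bits.
def crossover_bits(shanks_c, lehman_c, nastiness=1.0):
    ratio = shanks_c * nastiness / lehman_c   # equals N^(1/12) at the crossover
    return 12 * math.log2(ratio)

print(round(crossover_bits(1.775, 0.2977 / 2), 1))       # 42.9
print(round(crossover_bits(1.775, 0.2977 / 2, 4.0), 1))  # 66.9
```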

For "easy" N I would think Lehman would enjoy further advantages versus SQUFOF
because the trial division would then be a win.

Also, even if we assume SQUFOF pays no
penalty for its nastier inner loop, and demand Lehman use its worst-case bound while SQUFOF is allowed to use the average-N bound, we'd get a crossover N = 23.9 bits long,
and 16.9 bits if Lehman's count is multiplied by 1.5 to account for trial division labor not being negligible.
So I regard it as completely certain that Lehman is better than SQUFOF for small
enough N, and the crossover ought to be 17 to 67 bits.

J. Milan in his INRIA experimental study found SQUFOF was the best factoring method among those he tried on hard N <= 59 bits
long, while for hard N >= 59 bits, the best was the Quadratic Sieve.
Milan did not try Lehman. My analysis here indicates this was a mistake, since Lehman
ought to dominate SQUFOF over much of, if not all of, its home range.

I would be interested if anybody follows up on this by actually programming and testing.
--Warren D. Smith
Message 2 of 2, Jan 1, 2012
It turns out William B. Hart in a preprint
implemented his own heuristic factoring algorithm,
which he calls the "one line factoring algorithm" because it is very simple.
Except actually it isn't that simple when you try to optimize its performance.
http://137.205.37.242/onelinefactor.pdf

It heuristically runs in O(N^(1/3)) time, except it can and will fail on some N.
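For reference, here is a hedged sketch of the "one line" idea as commonly described (my own rendering, not Hart's code): for i = 1, 2, ..., take s = ceil(sqrt(n*i)) and test whether s^2 mod n is a perfect square t^2; if so, gcd(s - t, n) often splits n.

```python
import math

def one_line_factor(n, iters=None):
    """Sketch of the "one line" method: may return None (failure)
    within its heuristic O(n^(1/3)) iteration budget."""
    if iters is None:
        iters = round(n ** (1 / 3)) + 1
    for i in range(1, iters + 1):
        s = math.isqrt(n * i)
        if s * s < n * i:
            s += 1                     # ceiling of sqrt(n*i)
        m = (s * s) % n
        t = math.isqrt(m)
        if t * t == m:
            g = math.gcd(s - t, n)
            if 1 < g < n:
                return g
    return None                        # can and will fail on some n
```

On 1037 = 17*61 this succeeds at i = 3 (s = 56, 56^2 mod 1037 = 25), returning 17.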

Hart then points out that a good way to implement Lehman's algorithm, involving
"turning the loops inside out" in the part of the search space where Lehman's inner loop
does only 1 iteration, basically coincides with his "one line" algorithm in that part of the search space.

If this is done, Hart finds that Lehman runs in essentially exactly
the same average time as his "one line" method, but is better in the worst case since Lehman always succeeds while the "one line" method can fail.

And then he finds that Lehman is competitive with Pari's built-in factorer for N <= 42 bits long, and is superior to Pari from 26 to 41 bits. Above 41 bits, Pari (apparently using SQUFOF)
is better. Below 26, Pari is usually better, but never by as much as a factor of 2,
and that is probably just because Pari implemented trial division better than Hart did.

So, CONCLUSION:
this programming work by Hart totally confirms my theoretical analysis in the last post,
where I predicted the Lehman/SQUFOF crossover would be at 42 bits; he finds
it is 41 bits.

Hart also finds a better way to code Lehman that reduces loop-boundary overhead.