Loading ...
Sorry, an error occurred while loading the content.

23964Lehman factorization algorithm, theoretical comparison with SQUFOF

Expand Messages
  • WarrenS
    Jan 1, 2012
      In this post I explain R.S.Lehman's factorization algorithm, provide a proof and runtime analysis, and compare with SQUFOF.

      1. The algorithm. Let T>0 be a tunable constant. [I will prove validity if T>=1.]

      Given odd integer N>2, finds nontrivial factor (or proves N prime).
      B = floor(N^(1/3) * T);
      Trial divide N up to bound B.
      if(factor found) return(Factor);
      k++; if(k*T^3 > B){ return(N is prime); }
      if(k even){ m=2; r=1; }else{ m=4; r=(k+N)%4; }
      for(all integers a with 0<=a*a-4*k*N<=B*B and with a%m==r){
      c = a*a-4*k*N;
      Test whether c is square;
      return( gcd(a+b, N) );

      2. Proof of validity (mostly due to Don Zagier, with some improvements by me):
      If we get past trial division up to N^(1/3) then N has <=2 prime factors.
      If 0<T<1 however then N could have 3 or more prime factors (which is also ok).

      If N>2 is prime then c cannot ever be square:
      indeed if 4*k*N divides a^2-b^2 then N does and
      if N is PRIME then N must divide either a-b or a+b hence N<=a+b.
      We know a*a<4*N^(4/3)/T^2+T^2*N^(2/3)
      and we know b<=a hence this cannot happen
      if N is large enough with T fixed.
      With T=1 it suffices to check by hand validity for
      N=3,5,7,11,13,17,19,23 and the general argument
      then works for all N>=29.

      If we decrease T below 1 and get too small then we may fail for some N;
      I am not sure what the minimum allowable T would be.

      So N=p*q with p<=q and p prime. We have
      T*N^(1/3) <= p <= q = N/p <= T^(-1)*N^(2/3)
      1 <= q/p <= T^(-2)*N^(1/3)
      Now consider the regular continued fraction (RCF) for q/p.
      Let r/s and r'/s' be two successive convergents
      such that r*s <= T^(-2)*N^(1/3) < r'*s' <= N.
      Then s/r and s'/r' also are successive convergents to p/q.
      |q/p - r/s| <= |r'/s' - r/s| = 1/(s*s') hence |p*r-q*s| <= p/s'
      |p/q - s/r| <= |s'/r' - s/r| = 1/(r*r') hence |p*r-q*s| <= q/r'
      hence geometric mean shows
      |p*r-q*s| <= sqrt( (p*q)/(r'*s') ) = sqrt(N/(r'*s')) <= N^(1/3) * T
      and with
      we have
      a*a-4*k*N=(p*s-q*r)^2 <= N^(2/3) * T^2 = B*B
      k*T^3 <= T*N^(1/3) = B
      hence the squareness test succeeds (the congruences for a are
      easily checked by considerations mod 4).

      3. Runtime analysis.
      For each value of k there are at most
      (1/2) * (1 + sqrt(4*k*N + T^2 * N^(2/3)) - sqrt(4*k*N))
      values of a to try, which is both upper bounded by (using
      or monotonicity of sqrt and its the derivative)
      and asymptotically equal to
      1/2 + T^2*N^(1/6)*k^(-1/2)/8, [side note: Lehman agrees]
      and on average over both k-parities only 3/8 + T^2*N^(1/6)*k^(-1/2)*3/32.
      And since sum(k<=x)k^(-1/2)<=2*(x+1)^(1/2) we see the total number of squareness tests should be <=(1/8)*T*N^(1/3) + N^(1/3)/(2*T^2) + 1 even in the worst case.
      And should be <=(3/16)*T*N^(1/3) + (3/8)*N^(1/3)/T^2 + 1 averaged over hard N.
      The latter is minimized by choosing T=2^(2/3)=1.5874 getting #squareness tests<=0.4465*N^(1/3) + 1.
      The former minimized by same T but now getting
      #squareness tests (averaged)<=0.2977*N^(1/3) + 1.
      These are rigorous upper bounds on the number of squareness tests.

      Average time (even averaged over hard N meaning N for which the trial division stage yields nothing) ought to be at least a factor 2 faster than these
      worst case bounds.

      If T=1 (or any fixed T) then the trial division stage should be faster
      by a log or loglog factor than the N^(1/3) time for
      the square-testing stage (since primes are sparse in the integers by
      a log factor) so is neglectible in comparison.
      It might not really be neglectible since N is not large enough for logN or loglogN
      to be large enough for neglectibility, though, in which case
      multiply the runtime estimate by maybe 1.5.

      4. Programming tricks:
      A. In the inner loop c is bounded between 0 and T^2*N^(2/3)
      and can be updated additively by adding stepsize*(anew+aold) to it
      when we update a to anew=a+stepsize. This saves a multiplicaiton in
      the inner loop.

      B. The square-testing can use modular info to avoid computing sqrt most of the time.

      C. It helps on typical N to try the k with many divisors first.
      Any ordering of the k's is valid, but if you try k's with large numbers of
      divisors first, then you'll usually factor N sooner.
      Lehman argues that k with D divisors has likelyhood proportional to D of
      yielding a factor on any particular squarity test.
      The average order of the #divisors(N) function is lnN so one
      might conjecture that Lehman average time would be sped up
      by a log factor by such tricks.

      Lehman suggests one way to accomplish such a favorable ordering, but I
      think a better programmer could do a better job of this than Lehman did.

      5. Comparison with Shanks SQUFOF.
      Shanks in unpublished work completed by J.Gower & S.Wagstaff
      argues that for average "hard" N (meaning N=product of two primes and N=3 mod 4)
      the "average number of forms SQUFOF must examine before finding proper
      square form" is 1.775*N^(1/4) asymptotically.

      CROSSOVER: The N which causes
      #ShanksSQUFOF and #Lehman average squareness tests to be equal,
      is N=2^(42.9).

      For larger N, SQUFOF should win; for smaller N, Lehman should win.

      However, "runtime" and "#squareness tests" are not exactly the same thing.

      Shanks seems to have a nastier inner loop than Lehman since it has more operations and
      includes among them, a division. On the brighter side, Shanks can live with narrower words, most numbers are N^(1/2) at most with Shanks, while with Lehman it is N^(2/3).
      If we were to assume Shanks was 4 times nastier at same number of squareness tests, then the crossover point would move to N=2^(66.9).

      The true crossover then (I guess) - for hard N would be somewhere between
      43 and 67 bits long.

      For "easy" N I would think Lehman would enjoy further advantages versus SQUFOF
      because the trial division would then be a win.

      Also even if we assume SQUFOF pays no
      penalty for its nastier inner loop and demand Lehman use its worst case bound while SQUFOF is allowed to use the average-N bound, we'd get a crossover N = 23.9 bits long,
      and 16.9 bits if Lehman is multipled by 1.5 to account for trial division labor not being negligible.
      So I regard it as completely certain Lehman is better than SQUFOF for small
      enough N and the crossover ought to be 17 to 69 bits.

      J.Milan in his INRIA experimental study found SQUFOF was the best factoring method among those he tried on hard N <=59 bits
      long, while for hard N>=59 bits long, best was Quadratic Sieve.
      Milan did not try Lehman. My analysis here indicates this was a mistake since Lehman
      ought to dominate SQUFOF over much of, if not all of, its home range.

      I would be interested if anybody follows up on this by actually programming and testing.
      --Warren D. Smith
    • Show all 2 messages in this topic