
Lehman factorization algorithm, theoretical comparison with SQUFOF

  • WarrenS
    Message 1 of 2 , Jan 1, 2012
      In this post I explain R.S.Lehman's factorization algorithm, provide a proof and runtime analysis, and compare with SQUFOF.

      1. The algorithm. Let T>0 be a tunable constant. [I will prove validity if T>=1.]

      Given odd integer N>2, finds a nontrivial factor (or proves N prime):
      B = floor(N^(1/3) * T);
      Trial divide N up to bound B;
      if(factor found){ return(factor); }
      k = 0;
      loop{
        k++; if(k*T^3 > B){ return(N is prime); }
        if(k even){ m=2; r=1; }else{ m=4; r=(k+N)%4; }
        for(all integers a with 0<=a*a-4*k*N<=B*B and with a%m==r){
          c = a*a-4*k*N;
          if(c is a perfect square){ b = sqrt(c); return( gcd(a+b, N) ); }
        }
      }
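      For concreteness, here is a direct, unoptimized Python transcription of the algorithm above (my own sketch, using none of the programming tricks from section 4; the helper names are mine):

```python
# Unoptimized sketch of Lehman's algorithm as described above (T >= 1).
# Returns a nontrivial factor of odd N > 2, or N itself if N is prime.
from math import gcd, isqrt

def icbrt(n):
    """Integer cube root: largest r with r**3 <= n."""
    r = round(n ** (1 / 3))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r

def lehman_factor(N, T=1.0):
    assert N > 2 and N % 2 == 1
    B = int(T * icbrt(N))            # B = floor(N^(1/3) * T)
    for d in range(3, B + 1, 2):     # trial division up to B
        if N % d == 0:
            return d
    k = 0
    while True:
        k += 1
        if k * T ** 3 > B:
            return N                 # N proved prime
        if k % 2 == 0:
            m, r = 2, 1
        else:
            m, r = 4, (k + N) % 4
        lo = isqrt(4 * k * N)        # smallest a with a*a - 4*k*N >= 0
        if lo * lo < 4 * k * N:
            lo += 1
        hi = isqrt(4 * k * N + B * B)    # largest a with a*a - 4*k*N <= B*B
        for a in range(lo, hi + 1):
            if a % m != r:
                continue
            c = a * a - 4 * k * N
            b = isqrt(c)
            if b * b == c:           # c is a perfect square
                g = gcd(a + b, N)
                if 1 < g < N:
                    return g
```

      For example, lehman_factor(8051) returns 97 (8051 = 83*97) already at k=1.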

      2. Proof of validity (mostly due to Don Zagier, with some improvements by me):
      If we get past trial division up to B >= N^(1/3) (recall T>=1) then N has at most 2 prime factors.
      If 0<T<1, however, then N could have 3 or more prime factors (which is also OK).

      If N>2 is prime then c can never be a square:
      indeed if c = a^2 - 4*k*N = b^2 then a^2 - b^2 = 4*k*N, so N divides (a-b)*(a+b), and
      since N is PRIME it must divide either a-b or a+b, hence N <= a+b.
      We know a*a <= 4*N^(4/3)/T^2 + T^2*N^(2/3)
      (from k <= B/T^3 and c <= B^2), and we know b <= a, hence N <= 2*a cannot happen
      if N is large enough with T fixed.
      With T=1 it suffices to check by hand validity for
      N=3,5,7,11,13,17,19,23 and the general argument
      then works for all N>=29.
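      The hand-check can be mechanized; with T=1 it turns out there are no admissible (k,a) pairs at all for these small primes, so c is (vacuously) never square. A sketch, with my own integer-cube-root helper:

```python
# Machine version of the hand-check: for T = 1 and N = 3..23 prime,
# no admissible (k, a) pair makes c = a*a - 4*k*N a perfect square.
from math import isqrt

def icbrt(n):
    r = round(n ** (1 / 3))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r

for N in (3, 5, 7, 11, 13, 17, 19, 23):
    B = icbrt(N)                      # trial-division bound, T = 1
    for k in range(1, B + 1):         # k*T^3 <= B
        m, r = (2, 1) if k % 2 == 0 else (4, (k + N) % 4)
        lo = isqrt(4 * k * N)
        if lo * lo < 4 * k * N:
            lo += 1
        for a in range(lo, isqrt(4 * k * N + B * B) + 1):
            if a % m == r:
                c = a * a - 4 * k * N
                assert isqrt(c) ** 2 != c    # never a square
print("checked N = 3..23")
```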

      If we decrease T below 1 and make it too small then we may fail for some N;
      I am not sure what the minimum allowable T is.

      Otherwise N=p*q with p<=q and p prime. We have
      T*N^(1/3) <= p <= q = N/p <= T^(-1)*N^(2/3)
      1 <= q/p <= T^(-2)*N^(1/3)
      Now consider the regular continued fraction (RCF) for q/p.
      Let r/s and r'/s' be two successive convergents
      such that r*s <= T^(-2)*N^(1/3) < r'*s' <= N.
      Then s/r and s'/r' also are successive convergents to p/q.
      |q/p - r/s| <= |r'/s' - r/s| = 1/(s*s') hence |p*r-q*s| <= p/s'
      |p/q - s/r| <= |s'/r' - s/r| = 1/(r*r') hence |p*r-q*s| <= q/r'
      hence taking the geometric mean of these two bounds
      |p*r-q*s| <= sqrt( (p*q)/(r'*s') ) = sqrt(N/(r'*s')) <= N^(1/3) * T
      and with a = p*r + q*s and k = r*s
      we have
      a*a-4*k*N = (p*r-q*s)^2 <= N^(2/3) * T^2 = B*B
      k*T^3 = r*s*T^3 <= T*N^(1/3) = B
      hence the squareness test succeeds for this (k,a) (the congruences for a are
      easily checked by considerations mod 4).
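      The identity driving this step, (p*r + q*s)^2 - 4*(r*s)*(p*q) = (p*r - q*s)^2, is easy to confirm numerically. A worked example with N = 8051 = 83*97 (the numbers are my choice):

```python
# Worked example of the convergent argument for N = 8051 = 83 * 97, T = 1.
p, q = 83, 97
N = p * q

# RCF of q/p = 97/83 is [1; 5, 1, 13]; its convergents:
convergents = [(1, 1), (6, 5), (7, 6), (97, 83)]
bound = 20                            # T^(-2) * N^(1/3) is about 20.04

# Last convergent r/s with r*s <= bound (here the first one, 1/1).
r, s = max((rs for rs in convergents if rs[0] * rs[1] <= bound),
           key=lambda rs: rs[0] * rs[1])

a, k = p * r + q * s, r * s           # a = 180, k = 1
assert a * a - 4 * k * N == (p * r - q * s) ** 2 == 196    # = 14^2 <= B*B
print(a, k, a * a - 4 * k * N)        # 180 1 196
```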

      3. Runtime analysis.
      For each even value of k there are at most
      (1/2) * (1 + sqrt(4*k*N + T^2 * N^(2/3)) - sqrt(4*k*N))
      values of a to try (for odd k, replace the prefactor 1/2 by 1/4, since a
      is then confined to one residue class mod 4). Using concavity of sqrt,
      i.e. sqrt(x+d) - sqrt(x) <= d/(2*sqrt(x)), the even-k count is both
      upper bounded by and asymptotically equal to
      1/2 + T^2*N^(1/6)*k^(-1/2)/8, [side note: Lehman agrees]
      and averaging over both k-parities gives only 3/8 + T^2*N^(1/6)*k^(-1/2)*3/32.
      And since sum(k<=x) k^(-1/2) <= 2*x^(1/2), we see the total number of squareness tests is <= (3/16)*T*N^(1/3) + (3/8)*N^(1/3)/T^2 + 1 even in the worst case,
      and should be <= (1/8)*T*N^(1/3) + (1/4)*N^(1/3)/T^2 + 1 averaged over hard N.
      The former is minimized by choosing T=2^(2/3)=1.5874, getting #squareness tests <= 0.4465*N^(1/3) + 1.
      The latter is minimized by the same T, now getting
      #squareness tests (averaged) <= 0.2977*N^(1/3) + 1.
      The worst-case bound is a rigorous upper bound on the number of squareness tests.
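      A quick numeric sanity check, assuming (my reading of the derivation above) that the bounds take the form worst = (3/16)*T + (3/8)/T^2 and averaged = (1/8)*T + (1/4)/T^2 in units of N^(1/3):

```python
# Check that T = 2^(2/3) minimizes both bound coefficients (per N^(1/3)).
def worst(T):
    return (3 / 16) * T + (3 / 8) / T ** 2

def avg(T):
    return (1 / 8) * T + (1 / 4) / T ** 2

T = 2 ** (2 / 3)                      # = 1.5874...
print(round(worst(T), 3), round(avg(T), 3))    # 0.446 0.298

# Coarse grid search confirms the minimizer for both:
grid = [1 + i / 1000 for i in range(2001)]
assert abs(min(grid, key=worst) - T) < 0.01
assert abs(min(grid, key=avg) - T) < 0.01
```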

      Average time (even averaged over hard N, meaning N for which the trial division stage yields nothing) ought to be at least a factor of 2 faster than these
      worst-case bounds, presumably because the successful squareness test typically arrives well before the search space is exhausted.

      If T=1 (or any fixed T) then the trial division stage should be faster
      by a log or loglog factor than the N^(1/3) time for
      the square-testing stage (since primes are sparse in the integers by
      a log factor), so it is negligible in comparison.
      It might not really be negligible in practice, since N may not be large
      enough for logN or loglogN to be large; in that case
      multiply the runtime estimate by maybe 1.5.

      4. Programming tricks:
      A. In the inner loop c is bounded between 0 and T^2*N^(2/3)
      and can be updated additively, by adding stepsize*(anew+aold) to it
      when we update a to anew = aold + stepsize. This saves a multiplication in
      the inner loop.
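      A minimal sketch of this update (values reused from the N = 8051 example; step plays the role of stepsize, which equals m in the algorithm):

```python
# Trick A: maintain c = a*a - 4*k*N additively across the a-loop, using
# (a + step)^2 - a^2 = step * (a_new + a_old); a is never squared again.
N, k, step = 8051, 1, 4
a = 180
c = a * a - 4 * k * N                 # squared once, up front
for _ in range(5):
    a_new = a + step
    c += step * (a_new + a)           # additive update
    a = a_new
    assert c == a * a - 4 * k * N     # stays in sync with the direct formula
print(a, c)   # 200 7796
```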

      B. The square-testing can use modular info to avoid computing sqrt most of the time.
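      One common way to do this (the moduli 64 and 63 are my choice, not necessarily Lehman's): a perfect square must land in the set of square residues mod any modulus, so two cheap table lookups reject most non-squares before an exact square root is ever taken:

```python
# Trick B sketch: cheap modular filters before the exact square test.
from math import isqrt

SQ64 = {(i * i) % 64 for i in range(64)}   # only 12 of 64 residues are square
SQ63 = {(i * i) % 63 for i in range(63)}   # only 16 of 63 residues are square

def is_square_fast(c):
    if c < 0 or (c % 64) not in SQ64 or (c % 63) not in SQ63:
        return False                        # rejected without any sqrt
    b = isqrt(c)
    return b * b == c                       # rare exact check

assert is_square_fast(196) and not is_square_fast(197)
```

      Together the two filters pass only about (12/64)*(16/63), i.e. roughly 5%, of random inputs on to the exact square-root check.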

      C. It helps on typical N to try the k with many divisors first.
      Any ordering of the k's is valid, but if you try k's with large numbers of
      divisors first, then you'll usually factor N sooner.
      Lehman argues that a k with D divisors has likelihood proportional to D of
      yielding a factor on any particular squareness test.
      The average order of the #divisors(N) function is lnN so one
      might conjecture that Lehman average time would be sped up
      by a log factor by such tricks.
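      A sketch of such an ordering (the divisor-counting helper is mine; any ordering of the k's is valid, per the above):

```python
# Trick C sketch: visit the k's in decreasing order of divisor count.
def num_divisors(k):
    return sum(2 - (d * d == k)             # count d and k//d, once if equal
               for d in range(1, int(k ** 0.5) + 1) if k % d == 0)

K = 20                                       # k ranges over 1..K
order = sorted(range(1, K + 1), key=num_divisors, reverse=True)
print(order[:5])   # [12, 18, 20, 16, 6]
```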

      Lehman suggests one way to accomplish such a favorable ordering, but I
      think a better programmer could do a better job of this than Lehman did.

      5. Comparison with Shanks SQUFOF.
      Shanks in unpublished work completed by J.Gower & S.Wagstaff
      argues that for average "hard" N (meaning N=product of two primes and N=3 mod 4)
      the "average number of forms SQUFOF must examine before finding proper
      square form" is 1.775*N^(1/4) asymptotically.

      CROSSOVER: The N at which #ShanksSQUFOF and #Lehman average squareness tests
      become equal is N=2^(42.9).

      For larger N, SQUFOF should win; for smaller N, Lehman should win.

      However, "runtime" and "#squareness tests" are not exactly the same thing.

      Shanks seems to have a nastier inner loop than Lehman, since it has more operations and
      includes a division among them. On the brighter side, Shanks can live with narrower words: most numbers are at most N^(1/2) with Shanks, while with Lehman it is N^(2/3).
      If we were to assume Shanks's inner loop were 4 times costlier at the same number of squareness tests, then the crossover point would move to N=2^(66.9).

      The true crossover for hard N would then (I guess) be somewhere between
      43 and 67 bits.
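      The crossover arithmetic can be reproduced in a few lines (assuming, as in section 3, that average Lehman gains a further factor of 2 over its averaged bound):

```python
# Solve 1.775 * N^(1/4) = C * N^(1/3): N^(1/12) = 1.775 / C, so
# bits(N) = 12 * log2(1.775 / C), where C is Lehman's N^(1/3) coefficient.
from math import log2

def crossover_bits(C):
    return 12 * log2(1.775 / C)

print(round(crossover_bits(0.2977 / 2), 1))   # 42.9 (averaged Lehman, halved)
print(round(crossover_bits(0.4465), 1))       # 23.9 (worst-case Lehman bound)
```

      Adding 12*log2(4) = 24 bits for a 4x costlier SQUFOF inner loop moves 42.9 to 66.9, matching the figures above.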

      For "easy" N I would think Lehman would enjoy further advantages versus SQUFOF
      because the trial division would then be a win.

      Also, even if we assume SQUFOF pays no
      penalty for its nastier inner loop, and we demand Lehman use its worst-case bound while SQUFOF is allowed its average-N bound, we'd get a crossover N = 23.9 bits long,
      and 16.9 bits if Lehman is multiplied by 1.5 to account for trial-division labor not being negligible.
      So I regard it as completely certain that Lehman is better than SQUFOF for small
      enough N, and the crossover ought to be 17 to 67 bits.

      J.Milan in his INRIA experimental study found SQUFOF was the best factoring method among those he tried on hard N <= 59 bits
      long, while for hard N >= 59 bits, the best was the Quadratic Sieve.
      Milan did not try Lehman. My analysis here indicates that was a mistake, since Lehman
      ought to dominate SQUFOF over much of, if not all of, its home range.

      I would be interested if anybody follows up on this by actually programming and testing.
      --Warren D. Smith
    • WarrenS
      Message 2 of 2 , Jan 1, 2012
        It turns out William B. Hart in a preprint
        implemented his own heuristic factoring algorithm
        which he calls "one line factoring algorithm" because it is very simple.
        Except actually it isn't that simple when you try to optimize its performance.

        It heuristically runs in O(N^(1/3)) time, except it can and will fail on some N.

        Hart then points out that a good way to implement Lehman's algorithm, which involves
        "turning the loops inside out" in the part of the search space where Lehman's inner loop
        does only 1 iteration, basically coincides with his "one line" algorithm in that part of the search space.

        If this is done, Hart finds that Lehman runs in essentially exactly
        the same average time as his "one line" method, but is better in the worst case since Lehman always succeeds while the "one line" method can fail.

        And then he finds that Lehman is competitive with Pari's built-in factorer for N <= 42 bits long, and is superior to Pari from 26 to 41 bits. Above 41 bits, Pari (apparently using SQUFOF)
        is better. Below 26 bits, Pari is usually better, but never by as much as a factor of 2,
        and that is probably just because Pari implemented trial division better than Hart did.

        This programming work by Hart totally confirms my theoretical analysis in the last post,
        where I predicted the Lehman/SQUFOF crossover would be at 42 bits; he finds
        it is 41 bits.

        Hart also finds a better way to code Lehman that reduces loop-boundary overhead.