--- On Wed, 2/15/12, Paul Leyland <

paul@...> wrote:

> Phil Carmody wrote:

> > > Easier said than done. First, you have to get Arjen's database.

> >

> > Presumably you get it from the same places that Arjen got it?

>

> That's one way. It's also consistent with my statement

> that it's easier said than done.

All hail the great godess Obscurity, saving the day once again!

> > > Once you have it, finding factors through GCDs is a non-trivial

> > > task. That is the limit to the help which I am prepared to give.

> >

> > I think I can do it in about the cost of about a dozen half-gigabyte

> > GCDs.

>

> If you think you can do it, you can easily verify whether you can do it

> in practice. The paper gives statistics on how many keys were found,

> the distribution of key lengths and the distribution of keys which share

> at least one prime. A few hours with a good random number generator

> should enable you to produce a synthetic data set with the same

> properties as the keys found in the wild. Once you've got your data go

> ahead and find the factorizations, then report back on how hard it was.

Well, I don't have a machine with multiple gigabytes of RAM, which would be necessary for the GCDs I have in mind. But what's wrong with just modelling the problem?

Do you agree that if the number of keys massively dominates the number of common factors, that the amount of work once you have the product of all common factors is negligible compared to the work required to generate the product of all common factors?

The algorithm I have in mind is asymptotically faster than the following, but this is trivial. Do agree that this finds the product of all common primes:

Number all keys K_0 .. K_{N-1}

For b in [0..ceil(lg(N))]

Build products L and R such that

L = prod {i=0..N-1} (K_i if i has bit b set, else 1)

R = ditto, test flipped

Compute G_b = GCD(L, R).

Asymptotically, the GCD and the multiplications should be similar in cost,

and the arguments are large enough that deviations from the asyptotes are relatively minor.

Pari/GP on this old crate can do GCDs as follows.

1M digits 6.5s

2M digits 15.5s (factor ~2.4)

4M digits 37.2s (factor ~2.35)

8M digits 98.3s (factor ~2.32)

16M digits 205s (factor ~2.3)

Assuming pessimistically that the exponent is 2.3, that can be extrapolated (we're already out of L3, so there shouldn't be any surprising jumps) to 30000s for a gigadigit GCD.

The tree multiplication can similarly be done in approximately (I calculated the time for 1 layer, and multiplied by the depth. Asuming you code for the non-linear access times of the memory heirarchy, this is a clear over-estimate.)

1M digits 8s

2M digits 18.9s (factor ~2.36)

4M digits 44.2s (factor ~2.34)

8M digits 99s (factor ~2.24)

16M digits 218s (factor ~2.2)

Assuming again pessimisatically that the exponent is 2.2, extrapolation again gives us about 24000s for gigadigit product generation.

That's clearly less than a day per bit, and there are probably machines which are between 5 and 10 times faster than this machine. I think also that the George's GIMPS bignum library would dominate GMP for most of the operations. So this phase (generating the product of all and only duplicated factors) is no more than 3 days work on a single core. And of course the above is embarassingly parallel.

And that's the "naive" version of what I have in mind. Some of the multiplcations are redundant, for example, and with only a small constant memory increase (add another gig for this size problem) would reduce the mul cost by a factor of 2. A bit more RAM, and the stage after this, extracting the individual factors and identifying which key they come from becomes trivial, cheaper than one 'b' step above.

Are there any estimations above that you disagree with? (Or do you think any two keys are not getting GCD-ed with each other in the above process?)

And is the title of the paper also shrouded in obscurity that you're sworn not to reveal, or was that just an oversight in your attempt to swiftly assert that the algorithm I hadn't even yet told you about could never work? Does it just refer to the fact that the *R*SA with multiple prime secrets is "wrong" compared the single prime secred of *D*H?

Phil