Oct 2, 2001

Dear All,

I have records from over 4,000,000 ECM trials on composites from
various projects. For my prime GRU project, about 1.5M. As David said,
I use RPB's scheme of B1 = 330*(Trial No.).

Here are some stats on 14,219 factors of N-1, where N is a PrP GRU.
I use Lg() for the base-10 log.

Factor           <---------- ECM TRIALS ---------->
Digits      #      Mean      83pc      98pc      Fit50
     5  1,037      1.00      1.07      1.13       1.54
     6  1,608      1.01      1.10      1.19       1.95
     7  1,386      1.07      1.37      1.68       2.47
     8  1,297      1.20      1.80      2.39       3.12
     9  1,081      1.65      2.97      4.30       3.95
    10  1,006      2.52      4.50      6.48       4.99
    11    861      3.81      6.85      9.89       6.31
    12    749      5.61      9.67     13.72       7.99
    13    660      8.24     13.33     18.41      10.10
    14    580     11.70     18.72     25.73      12.78
    15    554     15.05     24.53     34.01      16.17
    16    505     21.60     33.49     45.39      20.46
    17    487     27.95     44.96     61.97      25.88
    18    417     36.38     56.02     75.66      32.74
    19    359     48.52     75.29    102.06      41.42
    20    340     60.56     90.97    121.39      52.40
    21    297     75.64    117.41    159.18      66.29
    22    207     88.12    135.28    182.44      83.87
    23    204    107.65    174.91    242.17     106.10
    24    159    146.75    231.01    315.27     134.22
    25    108    197.58    337.07    476.56     169.80
    26     78    218.05    353.13    488.21     214.81
    27     70    275.24    431.47    587.70     271.75
    28     49    394.73    653.13    911.52     343.79
    29     53    464.87    747.24  1,029.61     434.92
    30     29    508.48    777.16  1,045.84     550.20
    31     26    796.00  1,135.91  1,475.81     696.05
    32     14    709.14  1,082.72  1,456.30     880.56
    33     11    881.09  1,300.32  1,719.54   1,113.98
    34      4    387.25    495.08    602.90   1,409.27
    35      7  1,129.43  1,605.23  2,081.03   1,782.84
    36      3    663.00  1,120.64  1,578.29   2,255.43
    37      1    255.00         -         -   2,853.29
    38      1  1,166.00         -         -   3,609.64
    39      2  2,678.50  3,086.50  3,494.50   4,566.48
    44      1    958.00         -         -  14,796.84

Mean and STD are calculated from the raw data. "83pc" = Mean + STD;
"98pc" = Mean + 2*STD. "Fit50" is the expected number of trials to
find a factor of "Digits" digits, derived from this formula (obtained
by linear regression on the 3,986 factors of 15+ digits found):

    Lg(Trials) = 0.102117 * Lg(Factor) - 0.323000

Note: as the work involved in [Trials] trials is roughly proportional
to [Trials]^2, the slope of 0.102117 above agrees quite well with the
theory of the method: work growing as [Factor]^(1/5) would give a
slope of exactly 1/10.
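
For concreteness, the Fit50 column can be reproduced with a one-liner
(a Python sketch; note that the regression evidently treats Lg(Factor)
as the digit count, since the table values then match exactly):

    def fit50(digits):
        # expected ECM trials to find a factor of the given digit count
        return 10 ** (0.102117 * digits - 0.323000)

    # e.g. fit50(20) -> 52.40 and fit50(30) -> 550.20, as in the table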

<arms="waving">

It seems that STD < Mean, so running 2*[Fit50] trials should garner over

98% of factors of the given size.

<arms="at sides">

As I don't want to write, export and run batch files for one or two
trials for each composite, I noticed (imagined?) this pattern:

Lg    98%+        Rounded
15    2*16.17     35
20    2*52.40     3*35  = 105
25    2*169.80    10*35 = 350
30    2*550.20    32*35 = 1120

(The multiplier for an extra five digits is theoretically 10^0.5.)
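
In other words (a Python sketch; the helper name is mine), the batch
target is 2*Fit50 rounded up to the next multiple of 35:

    import math

    def batch_target(digits):
        # 2 * Fit50, rounded up to the next multiple of 35
        fit50 = 10 ** (0.102117 * digits - 0.323000)
        return 35 * math.ceil(2 * fit50 / 35)

    # batch_target(15) -> 35, batch_target(20) -> 105,
    # batch_target(25) -> 350, batch_target(30) -> 1120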

So, I always create batches that take composites to a multiple of 35
trials. Even the largest comps get 105 trials, covering 20-digit
factors.

The 100 shortest comps are given to two old machines and ECM'd to
exhaustion - currently about 1,200 trials each, but this will rise.
This is not so much because they are vital or even useful for proofs,
but because I want more data points for 30+ digit factors for the
above modelling :-)

Contrariwise, anything under 80 digits gets ECM for no more than twice
the expected run time of the Quadratic Sieve, whether "PPMP" or
"PPSI", both courtesy of Tomabechi-san. I don't want to skew the ECM
stats by including composites that are the product of two
similar-sized primes, each of which eventually has a comparable chance
of being found by any particular ECM trial.

Middling-sized composites are dealt with on the basis of equal time
being given to each. In practice, this means that each day one of my
K6-2 machines runs another 35 ECM trials on each of the 1% of
composites that have had the least time spent on them.
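
As a sketch (Python; the (id, hours) pair layout is my stand-in for
the underlying database query), picking a day's batch looks like:

    def daily_batch(composites):
        # composites: (id, hours_spent) pairs for the middling comps
        by_time = sorted(composites, key=lambda c: c[1])
        n = max(1, len(composites) // 100)  # the 1% with least ECM time
        return [cid for cid, _ in by_time[:n]]  # each gets 35 more trials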

There are almost 4,000 composite N-1 factors in my database. The
minimum ECM time spent on one is about 1.1 hours; the median is about
2 hours.

Timings:
~~~~~~~~

Expressed in hours on an AMD K6/2/450 (as I have 6 of these).
Modelled by running GMP-ECM4c on the 32 smallest primes above 10^n
for 56 <= n <= 1759 at B1=1000, sigma=16011963. Attempted to fit a
curve of the form Hours = coeff*Lg(Number)^expo. Best fit found by
using two curves, one for Lg(Number) <= 700 and one for
Lg(Number) > 700:

    If Lg(Number) <= 700 Then
        coeff = 0.0000000001013
        expo  = 1.578
    Else
        coeff = 0.00000000002428
        expo  = 1.794
    End If
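
Or, as a direct transcription (a Python sketch; nothing assumed beyond
the constants above):

    def trial_hours(lg_number):
        # estimated hours per ECM trial on a K6-2/450, given the
        # decimal length of the number being worked on
        if lg_number <= 700:
            return 1.013e-10 * lg_number ** 1.578
        return 2.428e-11 * lg_number ** 1.794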

Monte Carlo or Bust
~~~~~~~~~~~~~~~~~~~

It seems that Bouk and I agree on increasing the randomness of a
search on a particular composite. Let us take a composite with
identifier "ID" of, say, 110+ digits (so that none of the sieve
methods is trivial) which is important for an N-1 proof.

My current plan (as yet unimplemented) is:

1. Do the first 350 trials. If no result, then the composite has less
   than a 2% chance of having a factor of under 25 digits.

2. Note the next trial number NT (i.e. 351).

3. Generate records for a table:

       ID         Long Integer   (so keying into my database)
       B1         Long Integer
       Reserved   Boolean
       Completed  Boolean

   with B1 = 330*temp, with temp ranging upwards from NT until
   sufficient "Hours" are included.

4. For each PC that you want to allocate to the task, build a batch
   file by SQL-selecting from the above table (where (NOT Reserved)
   AND (NOT Completed)), in order of B1, moving forwards by
   [RecordCount]*Random()*Random() until the time needed exceeds the
   time you want to allocate on that PC. Using Random() twice skews
   the distribution towards the smaller B1s. As you add each trial to
   the batch, mark it as "Reserved". If insufficient hours are left,
   goto 2. (A sketch of this step follows the list.)

5. Run the batch on the chosen PC. Read the log file back into the
   master PC.

6. When you parse the log file from each PC, mark each trial that ran
   without finding a factor as "Completed" and NOT "Reserved" (or
   delete the record, if space is tight).

7. If factor found then goto pub. If [product(prime factors)^(10/3) > N]
   (i.e. the factored part of N-1 exceeds N^(3/10)) and [Lg(N) > 13034]
   then buy drinks for everyone.

8. Goto 4.

[Hmmm: needs some work before it makes the next edition of Knuth ;-) ]
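
For what it's worth, step 4 might look something like this (a Python
sketch only; the dict-for-table-row stand-in and its field names are
mine, and the per-trial "hours" would come from the timing model
above):

    import random

    def build_batch(pending, hours_budget):
        # pending: the not-Reserved, not-Completed trials in B1 order,
        # each a dict with (hypothetical) keys "id", "b1", "hours"
        batch, spent, pos = [], 0.0, -1
        while spent < hours_budget:
            # move forwards by RecordCount*Random()*Random(); the
            # product of two uniform variates skews the jump small,
            # which favours the low B1s
            pos += 1 + int(len(pending) * random.random() * random.random())
            if pos >= len(pending):
                return None           # insufficient hours left: goto 2
            trial = pending[pos]
            trial["reserved"] = True  # mark as Reserved in the table
            batch.append(trial)
            spent += trial["hours"]
        return batch                  # write out as this PC's batch file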

Hope this rambling is of some interest. If not, apologies.

Raw data and graphs available on request.

Best regards to all,

Andy