- The discussion on this topic has surprised me.

I had expected that the order of the upper bound on the run time for the

Miller-Rabin test

would be a simple and well-understood function of the number of digits of the

number to be tested.

Each test of a witness requires less than 2 log_2(Z) multiplications mod Z,

where Z is the integer to be tested.
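For reference, here is a minimal sketch of the witness test being counted (Python, my own illustration rather than Kermit's code); the 2 log_2(Z) bound is the worst case for square-and-multiply modular exponentiation:

```python
def miller_rabin_witness(a, Z):
    """Test whether a witnesses the compositeness of odd Z > 2.

    Returns True if a proves Z composite, False if Z passes (i.e. Z
    is a strong probable prime to base a).  The work is dominated by
    the modular multiplications inside pow(); square-and-multiply
    uses at most about 2*log_2(Z) of them.
    """
    # Write Z - 1 = d * 2^s with d odd.
    d, s = Z - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, Z)            # square-and-multiply exponentiation
    if x == 1 or x == Z - 1:
        return False
    for _ in range(s - 1):
        x = x * x % Z           # one modular multiply per squaring
        if x == Z - 1:
            return False
    return True                 # a is a witness: Z is composite
```

How long each of those modular multiplies takes is, of course, exactly the question the rest of the thread turns on.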

Kermit <kermit@...>

--- Kermit Rose <kermit@...> wrote:
> The discussion on this topic has surprised me.

Indeed. The real question is 'how long does a modular multiply take',

>

> I had expected that the order of the upper bound on the run time for the

> Miller-Rabin test

> would be a simple and well-understood function of the number of digits of the

> number to be tested.

>

>

> Each test of a witness requires less than 2 log_2(Z) multiplications mod Z,

> where Z is the integer to be tested.

as I indicated in my reply. And as Jack indicated in his, there's not

necessarily a simple answer, depending on the algorithms used.

If your code is optimised for using no more than just the smallest N levels of

the memory hierarchy, then as soon as your problem size outgrows level N, there

can be an enormous, and without accurate modelling pretty unpredictable,

slowdown. (You might be lucky: the cache associativity might be high enough or

low enough to let you off the hook a bit.)
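As a rough illustration of that cliff (my own sketch, not code from the thread; the stride and sizes are arbitrary, and Python's interpreter overhead flattens the effect considerably compared to compiled code):

```python
import time
from array import array

def ns_per_access(nbytes, accesses=200_000):
    """Pseudo-random-ish walk over an nbytes working set; ns/access.

    Once nbytes outgrows a cache level, the figure typically jumps;
    by how much depends entirely on the machine, so treat the result
    as qualitative only.
    """
    n = max(1, nbytes // 8)
    data = array("q", range(n))      # flat buffer of 8-byte ints
    idx, stride = 0, 4099            # odd stride to defeat prefetching
    total = 0
    t0 = time.perf_counter()
    for _ in range(accesses):
        total += data[idx]
        idx = (idx + stride) % n
    elapsed = time.perf_counter() - t0
    return elapsed * 1e9 / accesses
```

Calling this for sizes such as 32 KiB, 1 MiB, and 32 MiB will usually show a step up somewhere near each cache boundary, though the exact shape varies per machine.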

"Almost all programming can be seen as an exercise in caching."

-- Terje Mathisen

Phil

() ASCII ribbon campaign () Hopeless ribbon campaign

/\ against HTML mail /\ against gratuitous bloodshed

[stolen with permission from Daniel B. Cristofani]

__________________________________________________

Do You Yahoo!?

Tired of spam? Yahoo! Mail has the best spam protection around

http://mail.yahoo.com

--- Phil Carmody wrote:
>

When I first started coding prime-constellation-searching sieves

>

> Indeed. The real question is 'how long does a modular multiply

> take', as I indicated in my reply. And as Jack indicated in his,

> there's not necessarily a simple answer, depending on the

> algorithms used.

>

> "Almost all programming can be seen as an exercise in caching."

> -- Terje Mathisen

>

many years ago, I found that contrary to my intuition, I got faster

searching in many cases by keeping the sieve depth fairly shallow,

and using a small window (so I'd run the main sieve loop a lot, but

each time was very fast). I did a lot more primality tests, but by

keeping the core sieve data in the processor's first-level cache,

the sieve ran blazingly fast, more than enough to make up for the

extra primality tests.

I have no doubt that my optimized program executed more instructions,

more arithmetic operations, and more memory accesses, compared to

a deep sieve with a large window. But it was also faster, because

the whole program, instructions + data, was small enough to

fit in the processor's fast cache.

--- jbrennen wrote:
> When I first started coding prime-constellation-searching sieves

I often sieve prime constellations in 2 phases.

> many years ago, I found that contrary to my intuition, I got faster

> searching in many cases by keeping the sieve depth fairly shallow,

> and using a small window.

First sieve a small cache-friendly interval by small primes.

Sieve many small consecutive intervals like this next to each other in memory.

Then treat the concatenated intervals as one large interval and

sieve it by larger primes.

I also have a 3rd phase: Individual trial factoring by the smallest

primes when so few candidates remain that sieving would waste

almost all the time on already eliminated candidates.
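A sketch of the two-phase scheme (my own illustration, not Jens's code; the function names and the small_limit/segment thresholds are invented and would be tuned to the cache in practice):

```python
from math import isqrt

def primes_up_to(n):
    """Plain sieve of Eratosthenes, used only to get the sieving primes."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def two_phase_sieve(lo, hi, small_limit=1 << 8, segment=1 << 14):
    """Return primes in [lo, hi) via the two-phase scheme.

    Phase 1: sieve each cache-sized segment by the small primes while
    it is hot.  Phase 2: sweep the larger primes once across the whole
    concatenated block.
    """
    limit = isqrt(hi - 1) if hi > 1 else 1
    sieving = primes_up_to(limit)
    small = [p for p in sieving if p < small_limit]
    large = [p for p in sieving if p >= small_limit]

    block = bytearray([1]) * (hi - lo)
    # Phase 1: small primes, one cache-friendly segment at a time.
    for seg_lo in range(lo, hi, segment):
        seg_hi = min(seg_lo + segment, hi)
        for p in small:
            start = max(p * p, (seg_lo + p - 1) // p * p)
            for m in range(start, seg_hi, p):
                block[m - lo] = 0
    # Phase 2: larger primes across the whole concatenated block.
    for p in large:
        start = max(p * p, (lo + p - 1) // p * p)
        for m in range(start, hi, p):
            block[m - lo] = 0
    return [lo + i for i, f in enumerate(block) if f and lo + i >= 2]
```

The point of the split is that the small primes hit every segment densely (so they want cache-resident data), while each large prime touches any given segment only a few times, so one pass over the big block costs little extra.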

--

Jens Kruse Andersen

--- jbrennen <jb@...> wrote:
> --- Phil Carmody wrote:

If it's testimonial time, then I'd like to pull up an example from

> > "Almost all programming can be seen as an exercise in caching."

> > -- Terje Mathisen

>

> When I first started coding prime-constellation-searching sieves

> many years ago, I found that contrary to my intuition, I got faster

> searching in many cases by keeping the sieve depth fairly shallow,

> and using a small window (so I'd run the main sieve loop a lot, but

> each time was very fast). I did a lot more primality tests, but by

> keeping the core sieve data in the processor's first-level cache,

> the sieve ran blazingly fast, more than enough to make up for the

> extra primality tests.

mid July 2006.

I noticed that my primorial sieve could do some of my tests blazingly

quickly, and others fairly slowly. I noticed, for example, that

sieving n#+/-1 from n=500000-1000000 was slower than first sieving

n=500000-750000 and then sieving n=750000-1000000 to the same depth.

This is despite the fact that the split technique necessitates the

calculation of 500000# % p twice for every p.
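A toy sketch of where that duplicated work comes from (my own illustration, not Phil's sieve; primorial_hits and its arguments are invented names). For each sieving prime p, n# % p is tracked incrementally across the range, but starting a range at n_lo first requires the seed product n_lo# % p, which a split range must redo for its second half:

```python
def primorial_hits(n_lo, n_hi, p, prime_list):
    """Find primes n in (n_lo, n_hi] with p | n# - 1 or p | n# + 1.

    r tracks n# % p incrementally.  Note the up-front seeding product
    r = n_lo# % p: splitting a range in two means paying that product
    again at the start of the second half.
    """
    r = 1
    for q in prime_list:
        if q > n_lo:
            break
        r = r * q % p               # seed: r = n_lo# % p
    hits = []
    for q in prime_list:
        if q <= n_lo:
            continue
        if q > n_hi:
            break
        r = r * q % p               # now r == q# % p
        if r == 1:
            hits.append((q, -1))    # p divides q# - 1
        elif r == p - 1:
            hits.append((q, 1))     # p divides q# + 1
    return hits
```

The incremental update per step is one modular multiply, so the seeding product dominates only when it is long relative to the half-range, which is exactly why the split being *faster* anyway points at the cache rather than the arithmetic.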

Testing intermediate lengths, and smaller, and larger, proved to

me that the cache was a vital factor, so I made some minor tweaks

to improve L2 usage, and instantly got a 25% boost in the cases

where I was obviously being slowed before the tweaks. And yes, the

500000-1000000 range had also become quicker to do in one swell foop.

Hence my repeated quoting of Terje's .sig.

Phil
