If you are programming a new sieve, I'd rather like it if you tried this so you could do something new.

Mission: find the primes below N in increasing order; it is assumed each prime fits in a single word.

Naive Eratosthenes:
runtime = N*loglogN word-ops
memory = N bits
cache behavior = very bad

Segmented Eratosthenes:
runtime = N*loglogN word-ops
memory = N^(1/2) bits
cache behavior = bad

Warren Smith's new method:
runtime = f(K)*N
memory = N^(1/K)*logN bits, where K = 2,3,4,5,... is a constant
cache behavior = best; K can be chosen to optimize it
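
For concreteness, here is a minimal sketch of the naive Eratosthenes baseline being compared against (Python; the function name and details are my own choices, and nothing here is specific to the new method):

def naive_eratosthenes(N):
    # Primes below N with one flag per number: the "memory = N bits" row above,
    # though Python actually spends a whole byte per flag.
    if N <= 2:
        return []
    flags = bytearray([1]) * N
    flags[0] = flags[1] = 0
    p = 2
    while p * p < N:
        if flags[p]:
            flags[p * p::p] = bytearray(len(range(p * p, N, p)))   # cross off multiples of p
        p += 1
    return [i for i in range(N) if flags[i]]

The segmented variant does the same crossing-off but one window of about N^(1/2) numbers at a time, which is what cuts the memory to N^(1/2) bits.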

The new method is clearly superior asymptotically. Whether asymptopia ever comes, within practical ranges of N, is not so clear... it might only become superior at unreachably large N.

The method involves three steps (rough code sketches of each are given below, after step 3):

1. Pre-construct a "wheel" describing the numbers relatively prime to 2,3,5,...,Pj, where j is maximal so that 2*3*5*...*Pj < N^(1/K). These are the "candidate primes", and the set of them below N has cardinality of order N/loglogN if K is fixed.

2. Sieve out "good" multiples of the primes below N^(1/K). What is left is either prime or a product of fewer than K primes, and furthermore the cardinality of this unsieved set is <= f(K)*N/logN. Here "good" means that the multiplier is a candidate prime greater than the prime it multiplies; we only sieve within the set of candidate primes, and we sieve within segments consisting of N^(1/K)*logN*loglogN consecutive numbers, stored at 1 bit per candidate prime. There are order N^(1/K)*logN candidate primes in each segment.

3. Each unsieved number is tested for primality using the "$640 test" by Baillie, Pomerance, Selfridge, and Wagstaff (http://www.trnicely.net/misc/bpsw.html), which is known never to fail for N < 2^64. (We assume N is never going to be large enough to make that test fail, which ought to be true for most or all practical purposes.) Each such test runs in a constant number of modular exponentiations, and may be implemented in O(logN/loglogN) operations if Brouwer's exponentiation algorithm is used.
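
Here is a rough sketch of step 1, under my reading of it (Python; the function name, the trial-division hunt for the next wheel prime, and the return format are my own choices):

from math import gcd

def build_wheel(N, K):
    # Largest primorial W = 2*3*5*...*Pj with W < N^(1/K), plus the residues mod W
    # that are coprime to W.  The "candidate primes" are exactly the numbers falling
    # in one of these residue classes (the wheel primes themselves are handled separately).
    limit = N ** (1.0 / K)
    wheel_primes, W = [], 1
    p = 2
    while W * p < limit:
        wheel_primes.append(p)
        W *= p
        p += 1
        while any(p % q == 0 for q in wheel_primes):   # next prime; trial division by the
            p += 1                                     # wheel primes found so far suffices here
    residues = [r for r in range(W) if gcd(r, W) == 1]
    return wheel_primes, W, residues

For example, build_wheel(10**9, 3) gives wheel primes [2, 3, 5, 7], W = 210, and 48 residues, so only 48/210 of all numbers survive as candidates.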
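And a rough sketch of step 2, simplified: it ignores the one-bit-per-candidate packing and the wheel-stepping of the multiplier that the real method needs for speed, and (an assumption of mine) it allows multipliers m >= p rather than strictly m > p, so that squares of primes are also crossed off. It expects W and residues from build_wheel above, plus the candidate primes below N^(1/K) as sieving_primes:

from math import gcd

def sieve_segment(lo, hi, W, residues, sieving_primes):
    # Candidates in [lo, hi): numbers > 1 that are coprime to the wheel modulus W.
    alive = {n for n in range(lo, hi) if n > 1 and gcd(n, W) == 1}
    residue_set = set(residues)
    for p in sieving_primes:
        m = max(p, -(-lo // p))            # first multiplier with m >= p and p*m >= lo
        while p * m < hi:
            if m % W in residue_set:       # "good": the multiplier is itself a candidate
                alive.discard(p * m)
            m += 1
    return sorted(alive)                   # primes, plus rare products of larger primes

For the very first segment the wheel primes 2, 3, 5, ... themselves are missing from the output, since they are not coprime to W; a real implementation emits them directly.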
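For step 3 I will not reproduce the full BPSW test; the sketch below (again my own code) is just the strong probable-prime test to base 2, one of BPSW's two ingredients, to show the kind of per-survivor work involved: essentially one modular exponentiation.

def is_strong_probable_prime(n, base=2):
    # Strong (Miller-Rabin) probable-prime test to one base.  NOT the full BPSW test.
    if n < 2:
        return False
    for small in (2, 3, 5, 7):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(base, d, n)            # the modular exponentiation: about lg(n) multiplies
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

Full BPSW additionally requires a Lucas probable-prime test; see the trnicely.net page linked above for the exact recipe.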

Note, step 1 takes a tiny amount of time, of order N^(1/K) times log factors.

Step 3 takes f(K)*N/loglogN time, which is sublinear.

Step 2 is the bottleneck taking time of order N.

Because the memory use can be made as small as we want by making K large enough, we can make everything fit into cache.

In practice, even though step 2's runtime is asymptotically ("infinitely") greater than step 3's, loglogN grows rather slowly to infinity, and step 3 needs to use double-precision modular multiplies; so step 3 might actually consume more time, since it has a large constant factor.

There will then be a tradeoff. Making K large is good since it cuts memory use and speeds up memory access by allowing everything to fit in cache. But making K small is good since fewer calls to the BPSW primality test are then needed. (In fact, if K=2 then we do not need the BPSW prime tests at all; step 3 can be skipped.) The best K must therefore be determined experimentally and will probably vary as a function of both N and the computer. E.g. if you had special-purpose hardware to do BPSW tests, that would help.
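
To put rough numbers on the memory side of that tradeoff, here is a back-of-the-envelope estimate from the memory = N^(1/K)*logN bits formula above (the target N = 10^18 and the use of natural log are my own illustrative choices):

from math import log

N = 10**18                         # illustrative target, not a figure from the post
for K in (2, 3, 4, 5):
    bits = N ** (1.0 / K) * log(N)
    print(f"K={K}: roughly {bits / 8 / 1024:,.0f} KiB of sieve bits")

That works out to very roughly 5 GB at K=2, 5 MB at K=3, 160 KB at K=4, and 20 KB at K=5, so K >= 4 would sit comfortably in cache at this N.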

Warren D. Smith --- March 2013.


My description included the wrong claim that step 3 would take asymptotically negligible time compared to (the big kahuna) step 2. My justification for that was

"Brouwer exponentiation algorithm." By which I actually meant, "A.T.Brauer's addition chain algorithm" which actually shows how to compute X^N in lg(N)+O(lg(N)/loglog(N))

multiplications, where lg(N)=log(N)/log(2).

Anyhow, when you fix this claim, the repaired result is that step 3 takes the same amount of time as step 2 (up to a constant factor): each test costs order logN multiplications, and there are order N/logN survivors to test, so step 3 also costs order N operations. Hence probably step 3 is the biggest kahuna.
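
As an illustration of what is being counted, here is plain left-to-right square-and-multiply with a multiplication counter (my own toy code; this is not Brauer's addition-chain method, just the baseline it improves on):

def pow_count_mults(x, n, mod):
    # Plain binary modular exponentiation, counting multiplications.
    assert n >= 1
    result, mults = x % mod, 0
    for bit in bin(n)[3:]:               # bits of n after the leading 1, most significant first
        result = result * result % mod   # one squaring per remaining bit
        mults += 1
        if bit == "1":
            result = result * x % mod    # one extra multiply per set bit
            mults += 1
    return result, mults

The plain method thus uses about lg(N) squarings plus one extra multiply per set bit, i.e. up to about 2*lg(N) multiplications in all; Brauer's chains cut the extra multiplies down to O(lg(N)/loglog(N)), which is where the corrected per-test cost comes from.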

But in any event, I am still correct as far as I can see that the whole algorithm does improve on Eratosthenes, in that it runs in O(N) operations (i.e. faster by a loglogN factor) and it uses far less memory and hence can stay in cache. How far one needs to go before these advantages overcome Eratosthenes in practice is an experimental question. I personally suspect it will be the best way to reach new sieving records, BUT only with the use of special-purpose hardware.

I'm writing this since Firth was posting here saying he was interested in writing new prime sieves. He can contact me at warren.wds AT gmail.com.

> Please forgive my curtness, but have you read all the material by, for example, Sorenson?
> --Phil Carmody

--no. However, prodded by you, I just looked at "The pseudosquares prime sieve" by Jonathan P. Sorenson (2006) and some other stuff he cites.

The particular algorithm advanced by JPS in this paper looks like a clearly bad idea.

His review of other work contains these claims:

JPS CLAIM 1. The fastest known sieves take O(N/loglogN) time.

This is even faster than mine, which in turn is faster than Eratosthenes.

To back this up he cites P. Pritchard, CACM 24, 1 (1981) 18-23 & 772, and B. Dunten, J. Jones, J.P. Sorenson, Info. Proc. Lett. 59 (1996) 79-84.

I have not seen Pritchard, but I did see D+J+S 1996, and the latter is completely useless because it requires huge memory.

JPS CLAIM 2. Memory in some (other) sieves can be reduced to O(N^(1/3)) or even O(N^(1/4)), perhaps with the aid of conjectures. I'm doing better than that: I can reduce memory to O(N^(1/K)) (up to a log factor) for any fixed integer K.

The idea behind my sieve is extremely simple. Simply sieve enough to get a set of primes and nonprimes that is small (no more than a constant factor larger than the set of primes alone), and then test these using the $640 primality test. (I also use a "wheel" idea in two ways to save loglogN time factors during the sieving, to prevent the time use from being superlinear.)

"Pritchard's Wheel Sieve" as described here

http://programmingpraxis.com/2012/01/06/pritchards-wheel-sieve/

and this description is pretty much exactly the same as "step 2" in my algorithm, and takes order N time.

This contradicts Sorensen's claim that Pritchard achieves O(N/loglogN) time.

So there must have been another idea too. I can see one way to accomplish sublinear time (perhaps the idea Pritchard had?), which is (basically) that you only sieve out multiples M*P of primes P where M is still possibly prime according to your current partially-sieved table. This would be a Pyrrhic victory since you'd need immense memory to do it.
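
For what it's worth, the "only sieve out M*P where M is still possibly prime" idea is the classical Euler sieve, and a toy version of it (mine, not Pritchard's actual algorithm, which adds the wheel to get below linear time) shows both the each-composite-removed-once property and the memory problem: the whole table of survivors has to be kept.

def euler_sieve(N):
    # Each composite is removed exactly once, by its smallest prime factor:
    # for each prime p in increasing order, delete p*M for every M that is still
    # in the table at the START of p's round (hence the snapshot below).
    alive = [True] * N
    p = 2
    while p * p < N:
        if alive[p]:
            survivors = [m for m in range(p, (N - 1) // p + 1) if alive[m]]
            for m in survivors:
                alive[p * m] = False
        p += 1
    return [i for i in range(2, N) if alive[i]]

The deletions total one per composite, but this naive array version still spends superlinear time scanning for survivors each round; the sublinear versions avoid that with linked-list tricks and the wheel, at the cost of keeping the full-size table.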

QUESTION:

Is any method known that achieves sublinear time and way-sublinear memory simultaneously? I know of none.

The best I know of is way-sublinear memory and linear time (my method), but I only achieve that by using the $640 primality test, which probably will ultimately fail, hence my algorithm presumably is, strictly speaking, wrong. In practice this seems irrelevant, since the first failure is way larger than anybody will ever sieve.

[Incidentally, in step 2 of my algorithm one could omit the wheel and just use Eratosthenes, thus suffering a loglogN-factor time expansion in theory, though in practice it might be faster for a long way. There are a lot of games like that one could play to speed up programs non-asymptotically.]