Sorry, an error occurred while loading the content.

## Sieve superior to Eratosthenes

Expand Messages
• If you are programming a new sieve, I d rather like it if you tried this so you could do something new. Naive Eratosthenes: runtime=NloglogN word-ops memory=N
Message 1 of 4 , Mar 6, 2013
If you are programming a new sieve, I'd rather like it if you tried this so you could do something new.

Naive Eratosthenes:
runtime=NloglogN word-ops
memory=N bits
Mission: find the primes below N in increasing order;
assumed each prime fits in a single word.

Segmented Eratosthenes:
runtime=NloglogN word-ops
memory=N^(1/2) bits

Warren Smith's New method:
runtime=f(K)*N
memory=(N^(1/K))*logN bits where K=2,3,4,5,... is constant.
cache behavior=best, can choose K to optimize it.

This is clearly superior asymptotically. Whether asymptopia ever comes, in practical
range of N, is not so clear... it might only become superior at unreachably large N.

The method involves
1. Pre-construct a "wheel" describing the numbers relatively prime to 2,3,5,...,Pj
where j maximal so that 2*3*5*...*Pj<N^(1/K). These are the "candidate primes"
and the set of them below N has cardinality of order N/loglogN if K is fixed.

2. Sieve out "good" multiples of the primes below N^(1/K).
What is left is either prime, or a product of j primes for some j<K,
and further the cardinality of this unsieved set is <=f(K)*N/logN.
Here "good" means that the multiplier is a candidate prime and greater
than the prime it multiplies, and we only sieve within the set of candidate primes, and we sieve within segments consisting of N^(1/k)*logN*loglogN numbers stored in 1 bit per candidate prime. There are order N^(1/k)*logN candidate primes in each segment.

3. Each unsieved number is tested for primality using the "\$640 test"
by Baillie, Pomerance, Selfridge, and Wagstaff
http://www.trnicely.net/misc/bpsw.html
which is known never to fail if N<2^64. (We assume N is never going to be
large enough to make that test fail, which ought to be true for most or all practical
purposes.) Each such test runs in a constant number of modular exponentiations, and may be implemented in O(logN/loglogN) operations if Brouwer's exponentiation algorithm
is used.

Note, step 1 takes tiny time of order N^(1/k)*logs.
Step 3 takes f(K)*N/loglogN time, which is sublinear.
Step 2 is the bottleneck taking time of order N.
Because the memory use can be made as small as we want by making K large enough,
we can make everything fit into cache.

In practice even though step 2's runtime is "infinitely" greater than step 3's... since loglogN grows rather slowly to infinity, and since step 3 needs to use double precision modular multiplies, in practice step 3 might consume more time since it has a large
constant factor.

There then will be a tradeoff. Making K large is good since it cuts memory use and speeds up memory access by allowing everything to fit in cache. But making K small is good since
fewer calls to the BPSW primality test then are needed. (In fact if K=2 then we do
not need to use the BPSW prime tests at all, step 3 can be skipped.)
The best K must therefore be experimentally determined and will probably vary as
a function of both N and the computer. E.g. if you had special purpose hardware to do BPSW tests that'd help.

Warren D. Smith --- March 2013.
• ... Please forgive my curtness, but have you read all the material by, for example, Sorenson? -- () ASCII ribbon campaign () Hopeless ribbon campaign
Message 2 of 4 , Mar 7, 2013
--- On Wed, 3/6/13, WarrenS <warren.wds@...> wrote:

> From: WarrenS <warren.wds@...>
> Subject: [PrimeNumbers] Sieve superior to Eratosthenes
> Date: Wednesday, March 6, 2013, 11:42 PM
> If you are programming a new sieve,
> I'd rather like it if you tried this so you could do
> something new.
>
> Naive Eratosthenes:
> runtime=NloglogN word-ops
> memory=N bits
> cache behavior=very bad
> Mission: find the primes below N in increasing order;
> assumed each prime fits in a single word.
>
> Segmented Eratosthenes:
> runtime=NloglogN word-ops
> memory=N^(1/2) bits
>
> Warren Smith's New method:
> runtime=f(K)*N
> memory=(N^(1/K))*logN bits where K=2,3,4,5,...  is
> constant.
> cache behavior=best, can choose K to optimize it.
>
> This is clearly superior asymptotically.  Whether
> asymptopia ever comes, in practical
> range of N, is not so clear... it might only become superior
> at unreachably large N.
>
> The method involves
> 1. Pre-construct a "wheel" describing the numbers relatively
> prime to 2,3,5,...,Pj
> where j maximal so that
> 2*3*5*...*Pj<N^(1/K).   These are the
> "candidate primes"
> and the set of them below N has cardinality of order
> N/loglogN if K is fixed.
>
> 2. Sieve out "good" multiples of the primes below N^(1/K).
> What is left is either prime, or a product of j primes for
> some j<K,
> and further the cardinality of this unsieved set is
> <=f(K)*N/logN.
> Here "good" means that the multiplier is a candidate prime
> and greater
> than the prime it multiplies, and we only sieve within the
> set of candidate primes, and we sieve within segments
> consisting of N^(1/k)*logN*loglogN numbers stored in 1 bit
> per candidate prime.  There are order N^(1/k)*logN
> candidate primes in each segment.
>
> 3. Each unsieved number is tested for primality using the
> "\$640 test"
> by Baillie, Pomerance, Selfridge, and Wagstaff
>     http://www.trnicely.net/misc/bpsw.html
> which is known never to fail if N<2^64.  (We assume
> N is never going to be
> large enough to make that test fail, which ought to be true
> for most or all practical
> purposes.)  Each such test runs in a constant number of
> modular exponentiations, and may be implemented in
> O(logN/loglogN) operations if Brouwer's exponentiation
> algorithm
> is used.
>
> Note, step 1 takes tiny time of order N^(1/k)*logs.
> Step 3 takes f(K)*N/loglogN time, which is sublinear.
> Step 2 is the bottleneck taking time of order N.
> Because the memory use can be made as small as we want by
> making K large enough,
> we can make everything fit into cache.
>
> In practice even though step 2's runtime is "infinitely"
> greater than step 3's...  since loglogN grows rather
> slowly to infinity, and since step 3 needs to use double
> precision modular multiplies, in practice step 3 might
> consume more time since it has a large
> constant factor.
>
> There then will be a tradeoff.  Making K large is good
> since it cuts memory use and speeds up memory access by
> allowing everything to fit in cache.   But
> making K small is good since
> fewer calls to the BPSW primality test then are
> needed.  (In fact if K=2 then we do
> not need to use the BPSW prime tests at all, step 3 can be
> skipped.)
> The best K must therefore be experimentally determined and
> will probably vary as
> a function of both N and the computer.  E.g. if you had
> special purpose hardware to do BPSW tests that'd help.

Please forgive my curtness, but have you read all the material by, for example, Sorenson?

--
() ASCII ribbon campaign () Hopeless ribbon campaign
/\ against HTML mail /\ against gratuitous bloodshed

[stolen with permission from Daniel B. Cristofani]
• My description included the wrong claim that step 3 would take asymptotically negligible time compared to (the big kahuna) step 2. My justification for that
Message 3 of 4 , Mar 7, 2013
My description included the wrong claim that step 3 would take asymptotically negligible time compared to (the big kahuna) step 2. My justification for that was
"Brouwer exponentiation algorithm." By which I actually meant, "A.T.Brauer's addition chain algorithm" which actually shows how to compute X^N in lg(N)+O(lg(N)/loglog(N))
multiplications, where lg(N)=log(N)/log(2).

Anyhow, when you fix this claim, the repaired result is that step 3 takes the same amount of time as step 2 (up to a constant factor) and hence probably step 3 is the biggest kahuna.

But in any event, I am still correct far as I can see, that the whole algorithm does improve
vs Eratosthenes in that it runs in O(N) operations (i.e. faster by loglogN factor) and it uses way less memory and hence can stay in cache. How far one needs to go before these overcome
Eratosthenes in practice, is an experimental question. I personally suspect it will be the best way to reach new sieving records BUT only with the use of special purpose hardware.

I'm writing this since Firth was posting here saying he was interested in writing new prime sieves. He can contact me at warren.wds AT gmail.com.
• ... --no. However, prodded by you, I just looked at The pseudosquares prime sieve by Jonathan P. Sorensen 2006 and some other stuff he cites. The particular
Message 4 of 4 , Mar 7, 2013
> Please forgive my curtness, but have you read all the material by, for example, Sorenson?
> --Phil Carmody

--no. However, prodded by you, I just looked at
"The pseudosquares prime sieve" by Jonathan P. Sorensen 2006 and some other stuff
he cites.

The particular algorithm advanced by JPS in this paper looks like a clearly bad idea.
His review of other work, contains these claims:

JPS CLAIM 1. the fastest known sieves take O(N/loglogN) time.
This is even faster than mine, which in turn is faster than Eratosthenes.
To back this up he cites: P.Pritchard CACM 24,1 (1981) 18-23 & 772
and B.Dunten, J.Jones, JP Sorensen: Info Proc Lett 59 (1996) 79-84.
I have not seen Pritchard but did see D+J+S 1996 and the latter is completely useless
because it requires huge memory.

JPS CLAIM 2. Memory in some (other) sieves can be reduced to O(N^(1/3)) or even O(N^(1/4)), perhaps with aid of conjectures. I'm doing better than that, I can reduce to O(N^(1/K)) for any fixed integer K.

The idea behind my sieve is extremely simple. Simply sieve enough to get a set of primes
and nonprimes that is small (no more than a constant times the primes alone) and then test these using the \$640 primality test. (I also use a "wheel" idea in 2 ways to save loglogN time factors during the sieving to prevent the time use from being superlinear.)

"Pritchard's Wheel Sieve" as described here
http://programmingpraxis.com/2012/01/06/pritchards-wheel-sieve/
and this description is pretty much exactly the same as "step 2" in my algorithm, and takes order N time.
This contradicts Sorensen's claim that Pritchard achieves O(N/loglogN) time.
So there must have been another idea too. I can see one way to accomplish sublinear time
(perhaps the idea Pritchard had?) which is (basically) that you only sieve out multiples M*P of primes P where M is still possibly-prime according to your current partially-sieved table. This would be a pyrrhic victory since you'd need immense memory consumption to do it.

QUESTION:
Is any method known that achieves sublinear time and way-sublinear memory simultaneously? I know of none.

The best I know is way-sublinear memory and linear time (my method) but I only achieve that by using the \$640 primality test which probably ultimately will fail, hence my algorithm actually is presumably wrong. This in practice seems irrelevant since the first failure is way larger than anybody will ever sieve.

[Incidentally in my algorithm step 2, one could omit the wheel and just use Eratosthenes, thus suffering a loglogN factor time-use expansion in theory, but in practice it might be faster for a long way. There's a lot of games like that one could play to speed up
programs non-asymptotically.]
Your message has been successfully submitted and would be delivered to recipients shortly.