Message 1 of 23 , Dec 27, 2002
--- Peter Boos <peter_boos@...> wrote:
>
> This might be an easy question, I think.
> Ehm I am thinking of writing a program, what should do
> sometihng with prime well the point is I wonder what
> is the optimum way of storing primes.
> I mean placing them in a file will require much space
> on HD. So I wondered then perhaps creating a single
> binary string of data in which a 1 represents a prime
> an a zero represent a non prime. wel that would allow
> me to place more primes in a file (although I am not
> sure as primes will ocure less with higher numbers.
> This binary prime file i will place it in a compressed
> folder. (a zip folder or an ntfs comressed folder)*.
>
> Wel my question is, what would be an better method for
> storing primes, then this binary way ?.

<!--
Linguistic aside: compare the etymology of
THAN (OE. _þanne_, _þonne_, _þænne_; also _þan_, _þon_) with
THEN (OE. _þanne_, _þonne_, _þænne_).
They used to be the same word, and may well return to that state!
-->

On the whole, no.

Some have looked into storing gaps between primes, but it's trivial to get a
bitmap representation to out-perform (size-wise) such delta-coding, as long
as you permit yourself some special cases.

If you permit yourself to just 'know' that 2 is prime, then the bitmap
doesn't need to store all the unnecessary even numbers, which you know will
be all composite. => half size
If you permit yourself to also just 'know' that 3 is a prime, then the bitmap
no longer needs to contain numbers of the form 6n+3, as you know they're
divisible by 3. => 2/3rds of half size = 1/3.
If you also permit yourself to just 'know' that 5 is a prime, then the
bitmap no longer needs to contain numbers of the form 30n+5 and 30n+25, as
you know they're divisible by 5. => 4/5ths of 1/3 size = 4/15

Stopping here lets you store a bitmap range corresponding to 30 numbers in 8
bits, i.e. a byte.
Byte 0 contains bits corresponding to { 1, 7, 11, 13, 17, 19, 23, 29 }
Byte 1 contains bits corresponding to { 31, 37, 41, 43, 47, 49, 53, 59 }
...
Byte n contains bits corresponding to { 30n+1, 30n+7, ........ 30n+29 }

This is called using a "wheel". The wheel in this case has circumference 30

0------------------------------------->- Number line

        29    1
          ######
        ##########
       ############            _
  23  ##############  7       |\   roll
      ##############          |    this
      ##############          /    way
       ############
  19   ##########   11
          ######
        17    13
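As a concrete sketch of the byte layout described above, here is one way the mod-30 wheel bitmap could be built and queried. This is my own illustration, not code from any poster: the function names and the choice of a plain sieve of Eratosthenes to fill the bitmap are mine.

```python
# Mod-30 wheel bitmap: byte n covers the eight residues coprime to 30
# in [30n, 30n+30). 2, 3 and 5 are "known" and not stored.

RESIDUES = (1, 7, 11, 13, 17, 19, 23, 29)
BIT_FOR = {r: i for i, r in enumerate(RESIDUES)}  # residue -> bit position

def small_primes(limit):
    """Plain sieve of Eratosthenes; used here only to fill the bitmap."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(flags[p * p::p]))
    return [i for i, f in enumerate(flags) if f]

def build_bitmap(limit):
    """One byte per 30 numbers."""
    bitmap = bytearray((limit + 29) // 30)
    for p in small_primes(limit):
        r = p % 30
        if r in BIT_FOR:
            bitmap[p // 30] |= 1 << BIT_FOR[r]
    return bitmap

def is_prime(n, bitmap):
    if n in (2, 3, 5):
        return True
    r = n % 30
    return r in BIT_FOR and bool((bitmap[n // 30] >> BIT_FOR[r]) & 1)
```

Note the random access: testing one n touches one byte, which is the property Phil leans on later in the thread.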

It's possible to use larger wheels,
2 -> 1 from 2
2.3 -> 2 from 6
2.3.5 -> 8 from 30
2.3.5.7 -> 48 from 210
2.3.5.7.11 -> 480 from 2310
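The residue counts in that table are just Euler's totient of the wheel circumference. A quick brute-force check (the function name is mine):

```python
from math import gcd

def wheel(primes):
    """Return (circumference, spokes): the product of the given primes and
    the count of residues coprime to it, i.e. Euler's totient."""
    m = 1
    for p in primes:
        m *= p
    spokes = sum(1 for r in range(m) if gcd(r, m) == 1)
    return m, spokes
```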

Note, however, that it rarely makes sense to store tables of primes on
the hard disk, as in most computer languages it's almost always
just as fast to calculate them on the fly. Hard disks are slow compared
to CPUs, on the whole.

> * As far as i now compression acka information
> theorie this might be verry intresting question.
> Because it's how to store information of x primes with
> the less amount of storing data, so it would contain
> the essentials of primes... or something like that.

Information-theoretically, all the primes can be stored in only a couple of
hundred bits, assuming that the primitives like division are atomic.
If you can describe the primes in a line of text, then that proves that the
information content is < 80*7 bits.
"Natural numbers with no proper divisors greater than 1" = 385 bits
"{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits

However, I've assumed that "|N" is known to mean "the natural numbers", and
":" means "such that", "!E" means "there does not exist" etc.

> Ohyeah
>
> A happy and peacefull newyear to all of you ! ! !

Thank you; and to you.

Phil

=====
I Pledge Allegiance to the flag
That appears on my Desktop Startup Screen.
And to the Monopoly for which it Stands:
One Operating System over all, inescapable,
With Freedom and Privacy for none. -- Telecommando on /.

__________________________________________________
Do you Yahoo!?
http://mailplus.yahoo.com
Message 2 of 23 , Dec 27, 2002
At 12:37 AM 12/27/2002 -0800, Peter Boos wrote:

>Wel my question is, what would be an better method for
>storing primes, then this binary way ?.

It depends on how many you want to store. If you consider the odd numbers
only, storing a 1 for a prime and a 0 for a composite is called a "bit
vector", but this is not always the best way. And which method is most
efficient depends on what your upper limit is. But for a practical
example, consider primes < 2^32.

Storage method File size, rounded (bytes)

32-bit unsigned 813,000,000
Bit-vector 268,000,000
Semi-gaps 203,000,000

Semi-gaps is where you store the difference between successive primes
divided by 2. For primes < 2^32, you can store the semigaps in a byte.

Each method has good and bad properties. With the bit vector, you can
easily access it anywhere to see if a particular number is prime. With
the gaps, you have to process the whole file. etc.
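A minimal sketch of the semi-gap scheme Jud describes (my own function names): one byte per prime after 3, holding half the gap to the previous prime. It assumes every gap after 3 is even and, over the range in use, below 512.

```python
def encode_semigaps(primes):
    """Store (p[i] - p[i-1]) // 2 in one byte per prime, starting after 3."""
    out = bytearray()
    for prev, p in zip(primes[1:], primes[2:]):
        out.append((p - prev) // 2)  # raises ValueError once a gap reaches 512
    return out

def decode_semigaps(data):
    """Rebuild the prime list; 2 and 3 are 'known' and not stored."""
    primes = [2, 3]
    for g in data:
        primes.append(primes[-1] + 2 * g)
    return primes
```

As the message says, this trades random access for size: you must decode from the start (or from a checkpoint) to reach a given prime.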

+---------------------------------------------------------+
| Jud McCranie |
| |
| Programming Achieved with Structure, Clarity, And Logic |
+---------------------------------------------------------+

[Non-text portions of this message have been removed]
Message 3 of 23 , Dec 27, 2002
At 06:26 AM 12/27/2002 -0800, Phil Carmody wrote:

>Some have looked into storing gaps between primes, but it's trivial to get a
>bitmap representaton to out-perform (size-wise) such delta-coding, as long
>as you permit yourself some special cases.

I think *eventually* the gap method takes over because the primes get more
and more sparse.

>Note, however, that it rarely makes sense to store tables of primes on
>the hard disk, as in most computer languages it's almost always always
>just as fast to calculate them on the fly. Hard disks are slow compared
>to CPUs, on the whole.

It depends. On my system I can read in a bit vector file (odd numbers
only) for up to 2^32 quicker than I can generate them in a sieve, including
the time to unpack the bits into actual numbers. (The unpacking takes

Message 4 of 23 , Dec 27, 2002
--- Jud McCranie <judmccr@...> wrote:
> At 06:26 AM 12/27/2002 -0800, Phil Carmody wrote:
>
> >Some have looked into storing gaps between primes, but it's trivial to get a
> >bitmap representaton to out-perform (size-wise) such delta-coding, as long
> >as you permit yourself some special cases.
>
> I think *eventually* the gap method takes over because the primes get more
> and more sparse.

Eventually, yes, I'm sure that's the case.

Most people would code gaps with a flat distribution, but that can be
improved upon.
1) Firstly, you could use a static Huffman code. I looked at that once;
maybe I posted my figures here. It gives a worthwhile compression,
assuming you don't mind paying for the bit-twiddling overhead.
You could use a different model for different bands, for slightly
improved compression.

2) You could use some kind of universal code with the right view of the
entropy of the sequence. I've not worked out which universal code
that would be. Maybe a Golomb code with a logarithmically increasing
parameter (in steps/bands, or maybe continuously changing). Maybe a
start-stop code would be better.

What distribution do gaps have? Knowing that would be the best way of
answering the question of which universal code is best... Something's
saying that gaps in a Poisson process are exponential, and therefore
the discrete analogue (woh! you don't find discrete analogues that
often!) would be geometric, and Golomb is a perfect match for a geometric
distribution.
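For what it's worth, here is the power-of-two special case of a Golomb code (a Rice code), sketched over bit strings for clarity; a real implementation would pack bits, and the parameter choice is left open, as in the post. The function names are mine.

```python
def rice_encode(n, k):
    """Rice code (Golomb with divisor 2^k): unary quotient, then k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0{}b".format(k))

def rice_decode(bits, k):
    """Decode one codeword; return (value, remaining bits)."""
    q = bits.index("0")                  # unary part: count of leading 1s
    r = int(bits[q + 1:q + 1 + k], 2)    # fixed-width remainder
    return (q << k) | r, bits[q + 1 + k:]
```

Because only the single parameter k is needed, there are no per-symbol frequency tables to store alongside the data, which is the advantage over Huffman mentioned later in the thread.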

There are a couple of tricks that can make the gap method compress even
better. No technique that doesn't take into account knowledge about
primality will ever be better than one based on the same compression
principles but also taking into account knowledge about primes.

> >Note, however, that it rarely makes sense to store tables of primes on
> >the hard disk, as in most computer languages it's almost always always
> >just as fast to calculate them on the fly. Hard disks are slow compared
> >to CPUs, on the whole.
>
> It depends. On my system I can read in a bit vector file (odd numbers
> only) for up to 3^32 quicker than I can generate them in a sieve, including
> the time to unpack the bits into actual numbers. (The unpacking takes

You have faster hard disks than me, and I think I have a faster sieve than
you. A couple of factors of 50% can make a huge difference to the outcome of
such comparisons.

A typical SoE will be superlinear, and therefore eventually precalculation
will win, as restoration from precalculation is linear.

It's a shame that the scale constant in Pritchard+Wheel is high, otherwise
the sub-linear sieve might have some chance of winning. With the right kind
of memory/DMA/blitting architecture, a Pritchard can be done outrageously
quickly. You'd probably want bit-addressed memory though, so we're not talking
a typical CPU here by any means.

Phil

Message 5 of 23 , Dec 27, 2002
--- Jud McCranie <judmccr@...> wrote:
> At 12:37 AM 12/27/2002 -0800, Peter Boos wrote:
>
> >Wel my question is, what would be an better method for
> >storing primes, then this binary way ?.
>
> It depends on how many you want to store. If you consider the odd numbers
> only, storing a 1 for a prime and a 0 for a composite is called a "bit
> vector", but this is not always the best way. And which method is most
> efficient depends on what your upper limit is. But for a practical
> example, consider primes < 2^32.
>
> Storage method File size, rounded (bytes)
>
> 32-bit unsigned 813,000,000
> Bit-vector 268,000,000
> Semi-gaps 203,000,000

268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....

2^32 bits is 4Gb = 536870912 bytes
2-wheel 268435456 bytes
6-wheel 178956970 bytes
30-wheel 143165576 bytes <- commonly used
210-wheel 130150524 bytes <- I've toyed with this

The bitmap based on a wheel mod 30 is 30% smaller than the
otherwise-uncompressed semigaps.
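Those wheel-bitmap sizes can be reproduced directly. This is my own check, rounding the inexact divisions down (Phil appears to round up, so results can differ by a byte):

```python
from math import gcd

def wheel_bitmap_bytes(limit, wheel_primes):
    """Bytes to cover [0, limit) with one bit per residue coprime to the
    wheel circumference (floor rounding)."""
    m = 1
    for p in wheel_primes:
        m *= p
    spokes = sum(1 for r in range(m) if gcd(r, m) == 1)
    return limit * spokes // m // 8
```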

> Semi-gaps is where you store the difference between successive primes
> divided by 2. For primes < 2^32, you can store the semigaps in a byte.

Yup, a byte can cope with semigaps for primes up to 300G.

Other length encodings can be useful over other ranges, depending on how
much extra bit-twiddling you want to do

length range range+ range++ size
8 bits 300G 40T 1600T 203MB
7 bits 400M 20G 240G 178MB
6 bits 1M 20M 190M x

I seem to have lost my data on how well a static huffman would do, if you
cared more about size than compression/decompression speed. With this kind
of data set there's practically no difference between Huffman and Golomb,
I'd guess, except that with Golomb you don't need to remember individual
frequencies or codes, just one parameter works fine.

> Each method has good and bad properties. With the bit vector, you can
> easily access it anywhere to see if a particular number is prime. With
> the gaps, you have to process the whole file. etc.

Or do it in bands, so you only have to decompress the band where your
start-point is.

Peeling out all primes with a modular property related to your wheel size is
vastly easier in bitmap form. However, that's just repeated abuse of the
fact that there's random access.

Phil

Message 6 of 23 , Dec 27, 2002
At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:

>268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....

These are ones I actually use. I know that it isn't hard to make it twice
as small, but that adds complication in the program. Anyhow, my point was
that gaps will eventually beat a bit vector for any fixed set of moduli.

>Other length encodings can be useful over other ranges, depending on how
>much extra bit-twiddling you want to do
>
>length range range+ range++ size
>8 bits 300G 40T 1600T 203MB
>7 bits 400M 20G 240G 178MB
>6 bits 1M 20M 190M x

What do you mean by range+ and range++, using an extra 6/7/8 bits where needed?

>Or do it in bands, so you only have to decompress the band where your
>start-point is.

Well, I try to keep it simple, to reduce the chance of me making an
error. Like the saying "do you want it done fast or do you want it done
right?" And unless you're doing something pretty simple with the primes,
getting the primes is the smaller part of it. Most of the time I read the
primes as 32-bit unsigned integers from a disk file because (1) that is
fast enough, (2) it is less prone to error. I know the disk file is right,
and it is simpler to do that than to plug the processing into my sieve
program (less chance of error). And I don't have to worry about pulling
the wrong version of my sieve, one that might have an error.
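Reading such a file of 32-bit unsigned primes is nearly a one-liner in most languages, which supports the simplicity argument. A sketch (the little-endian layout and function name are my assumptions, not Jud's actual format):

```python
import struct

def read_primes_u32(path):
    """Read a file of consecutive little-endian unsigned 32-bit integers."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack("<{}I".format(len(data) // 4), data))
```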

Message 7 of 23 , Dec 27, 2002
At 10:55 AM 12/27/2002 -0800, Phil Carmody wrote:

> I think *eventually* the gap method takes over because the primes get more
> > and more sparse.
>
>Eventually, yes, I'm sure that's the case.

Consider getting the primes less than some limit. Reading a bit vector is
linear in time and disk space. A sieve is super-linear and I'm pretty sure
reading a file of gaps is sub-linear.

>What distribution do gaps have?

I don't know, but undoubtedly someone has worked on that. A gap of < 512
(so it can be stored in a byte) works into the hundreds of billions. IIRC.

>You have faster hard disks than me, and I think I have a faster sieve than
>you. A couple of factors of 50% can make a huge difference to the outcome of
>such comparisons.

you're probably right. I could probably speed up my sieve, but when I'm
using it to do something with the primes, generally that takes longer than
the sieve itself. Since the sieve isn't the bottleneck, speeding it up
won't help the overall time so much.

I have two HDs, the faster one reads about 51,000,000 bytes per
second. Subsequent reads of the same file are about 3.5 times faster than
that.

Message 8 of 23 , Dec 27, 2002
--- Jud McCranie <judmccr@...> wrote:
> At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:
>
> >268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....
>
> These are ones I actually use. I know that it isn't hard to make it twice
> as small, but that adds complication in the program. Anyhow, my point was
> that gaps will eventually beat a bit vector for any fixed set of moduli.

For some as-yet-unknown value of "eventually".
Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
size-wise?

> >Other length encodings can be useful over other ranges, depending on how
> >much extra bit-twiddling you want to do
> >
> >length range range+ range++ size
> >8 bits 300G 40T 1600T 203MB
> >7 bits 400M 20G 240G 178MB
> >6 bits 1M 20M 190M x
>
> What do you mean by range+ and range++, using an extra 6/7/8 bits where needed?

"range" is the prime range that can be reached using semigaps encoded as X-bits
"range+" is the prime range that can be reached using a very small extra bit
twiddle, but still based on semi-gaps. Consider it Markov modelled. Each
codeword is still X-bits.
"range++" is the prime range that can be reached us you're prepared to pull
out a bigger Markov model, probably a table-driven scheme. Still X-bits
codewords.

The 8-bit, 7-bit and 6-bit rows were for the codeword length. You're
using 8-bits. /If/ size is an issue more than speed, then using 7-bit
or 6-bit codewords might make more sense, and the prime range is compatible
with the codeword length.

The 'x' in the 6-bit column is slightly misleading as it is possible to code
primes up to 4G using 6-bit gaps, as long as you reserve an escape code to
indicate that a gap is over-sized. However, that leads you towards schemes
like start-stop codes and eventually to Golomb codes.
An m_i=3 SSC might be interesting to try, as its decoding speed is pretty
good due to the simplicity of using nybbles.

> >Or do it in bands, so you only have to decompress the band where your
> >start-point is.
>
> Well, I try to keep it simple,

That's no fun! Recursive Elias codes are fun. Ulam s-additive codes are even
more fun, OEIS A002858. I bet you didn't realise you could efficiently
(asymptotically) encode prime number gaps using it when you added your
comment to that sequence! Not that I'd ever choose to use them for that
purpose myself.

> to reduce the chance of me making an
> error. Like the saying "do you want it done fast or do you want it done
> right?" And unless you're doing something pretty simple with the primes,
> getting the primes is the smaller part of it. Most of the time I read the
> primes as 32-bit unsigned integers from a disk file because (1) that is
> fast enough, (2) it is less prone to error. I know the disk file is right,
> and it is simpler to do that than to plug the processing into my sieve
> program (less chance of error). And I don't have to worry about pulling
> the wrong version of my sieve, one that might have an error.

Don't you like surprises?
;-)

Phil

Message 9 of 23 , Dec 27, 2002
You know, NewPGen would work a LOT better if it used some of these
ideas. For k*2^n+-1 searches, if you knew which k were divisible by
3 you could hold 33% more values in the bitmap; toss in 5, 7 and 11 and you
could hold 2.40625 (1155/480) times as many k values. When searching for
really sparse forms like CC3 etc. this could shave weeks off sieving
times.

Markus

>
> It's possible to use larger wheels,
> 2 -> 1 from 2
> 2.3 -> 2 from 6
> 2.3.5 -> 8 from 30
> 2.3.5.7 -> 48 from 210
> 2.3.5.7.11 -> 480 from 2310
>
>
> Note, however, that it rarely makes sense to store tables of primes on
> the hard disk, as in most computer languages it's almost always
> just as fast to calculate them on the fly. Hard disks are slow compared
> to CPUs, on the whole.
>
> > * As far as i now compression acka information
> > theorie this might be verry intresting question.
> > Because it's how to store information of x primes with
> > the less amount of storing data, so it would contain
> > the essentials of primes... or something like that.
>
> Information-theoretically, all the primes can be stored in only a couple of
> hundred bits, assuming that the primitives like division are atomic.
> If you can describe the primes in a line of text, then that proves that the
> information content is < 80*7 bits.
> "Natural numbers with no proper divisors greater than 1" = 385 bits
> "{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits
>
> However, I've assumed that "|N" is known to mean "the natural numbers", and
> ":" means "such that", "!E" means "there does not exist" etc.
>
> > Ohyeah
> >
> > A happy and peacefull newyear to all of you ! ! !
>
> Thank you; and to you.
>
> Phil
>
>
Message 10 of 23 , Dec 27, 2002
At 02:38 PM 12/27/2002 -0800, Phil Carmody wrote:

> Some some as yet unknown value of eventually.
>Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
>size-wise?

If you assume 1 byte per gap, about 4.5 x 10^15.

>That's no fun! Recursive Elias codes are fun. Ulam s-additive codes are even
>more fun, OEIS A002858. I bet you didn't realise you could efficiently
>(asymptotically) encode prime number gaps using it when you added your
>comment to that sequence!

No, I didn't.

>Don't you like surprises?
>;-)

It depends, of course. I don't like being surprised by finding out that an
error in my program caused it to produce incorrect results, which happens
all too often. :-( ;-)

Message 11 of 23 , Dec 27, 2002
--- "jim_fougeron <jfoug@...>" <jfoug@...> wrote:
> >> example, consider primes < 2^32.
> >>
> >> Storage method File size, rounded (bytes)
> >>
> >> 32-bit unsigned 813,000,000
> >> Bit-vector 268,000,000
> >> Semi-gaps 203,000,000
> >
> > 268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21,
> 27,....
>
> True Phil, but wasting 8 bits on each gap is also unfair. I have
> done a compression in CPAPSieve using 5 bits per gap (and escape
> sequences for gaps that are too large for 5 bits). The above
> 203mb should also be stated as 126 MB. More efficiency can be
> obtained than my algorithm can give, but my algorithm was very
> very fast (and btw, it is better compression than your mod-210
> wheel method)

I typoed (did 10/11 rather than 6/7); the 210-wheel is 122,713,352 bytes.
The 2310-wheel (which is the first one I'd actually treat as a wheel;
anything smaller is just bit-twiddling) would be 111,557,593 bytes.
Hmm, actually, I could probably do 2310 just as bit-twiddling, but I
wouldn't want to.

Ignore my 210-wheel in the pfgw source, that was just proof of concept,
and left as pretty dumb code (fewer surprises that way). Its loop can
be unrolled into something that's not much longer than the current loop
by working a word at a time rather than a byte at a time. On the alpha
or using MMX, the whole loop, all 6 iterations of it, could be replaced

Don't get me wrong, for repeated sequential access, then gaps is the way to
go. I'm more of a random-access merchant myself, and for one-off sequential
access I just use my tweaked version of Bernstein's primegen. i.e. I almost
never have a table of primes. Horses for courses.

Phil

Message 12 of 23 , Dec 27, 2002
--- In primenumbers@yahoogroups.com, Phil Carmody <thefatphil@y...>
wrote:
>--- Jud McCranie <judmccr@b...> wrote:
>> At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:
>>
>>>268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21,
>> 27,....
>>
>> These are ones I actually use. I know that it isn't hard to make
>> it twice as small, but that adds complication in the program.
>> Anyhow, my point was that gaps will eventually beat a bit vector
>> for any fixed set of moduli.
>
>Some some as yet unknown value of eventually.
>Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
>size-wise?

Pretty low.

The size is very small where the gaps method (coded correctly with
escapes) beats your mod-30 wheel method. It only takes 5 bits to
encode 98.5% of the primes under 100,000,000. The number of bits
needed for different "ranges" is this:

p=1048575, #primes=82024 Gaps>30=4652 Gap>60=196
so at 4 bit level, requires 82024/2+4652/2+196/2+C bytes using
4bit/prime (with escapes) (or about 43436 bytes)
and 82024*5/8+196*5/8+C bytes using 5bit/prime or about 51387 bytes.
Wheel 30 needs 34952 bytes

p=100139007 #primes=5769024 Gaps>30=808721 Gaps>60=91065
so at 4 bit level, requires 5769024/2+808721/2+91065/2+C bytes using
4bit/prime (with escapes) (or about 3334405 bytes)
and 5769024*5/8+91065*5/8+C bytes using 5bit/prime or about 3662556
bytes.
Wheel 30 needs 3337967 bytes

p=2000158719 #primes=98229799 Gaps>30=19249713 Gaps>60=3178956
so at 4 bit level, requires 98229799/2+19249713/2+3178956/2+C bytes
using 4bit/prime (with escapes) (or about 60329234 bytes)
and at 5 bit level, 98229799*5/8+(c*3178956*5/8) (c>1) bytes
using 5bit/prime, or about 63380472 bytes.
Wheel 30 needs 66671958 bytes.

Note that the above numbers for the 4-bit and 5-bit gap compressions are
low. They do NOT take into account the need for multiple escapes
to handle larger gaps; however, the size is relatively close (at least
for the 5-bit version). The 4-bit variant calculation above does take
into account the computation for a second escape, capable of handling
all gaps up to 60 (not 62).

As you can see, a quality "variable-sized" (i.e. using escapes) gap
compression can do very well.

In my CPAPSieve output file format, I chose a 5-bit gap compression
with escapes. This choice was made due to the fact that gaps in
the heavily trial factored ranges up around 1000 digits or more,
are very frequently over a gap of 30 (but not as often over a gap
of 62). I ran many tests, and used the 5-bit gap compression as
being very fast, and pretty optimal for compression rate.

Jim.
Message 13 of 23 , Dec 28, 2002
<flames@u...> wrote:
> You know newpgen would work a LOT better if it used some of these
> idea's. For k*2^n+-1 searches if you knew which k was divisible
by
> 3 you could hold 33% more values in the bitmap toss in 5,7,11 and
you

I know, and I half implemented it a while ago, and I just need some
time to get it finished. There are some issues, it is not completely
straightforward. Hopefully I will find some time and get it finished
soon.

Regards,

Paul.
Message 14 of 23 , Dec 28, 2002
This is the best thread I've read here in a while (at least to me). I've read references to a 'wheel' in reference to sieves/bitmaps before, but didn't understand the concept. Phil's description is great.

At 06:26 AM 12/27/02 -0800, Phil Carmody wrote:
>--- Peter Boos <peter_boos@...> wrote:
>>
>> This might be an easy question, I think.
>> Ehm I am thinking of writing a program, what should do
>> sometihng with prime well the point is I wonder what
>> is the optimum way of storing primes.

>> Wel my question is, what would be an better method for
>> storing primes, then this binary way ?.

I wrote some C code years ago to do this, it's still on my website at this page:
I'm looking at the actual code, and I see my goshawful cryptic notes in the comments. But at least the code that reads the file format has some specific documentation on it.
My format encodes prime gaps into four bits and is quite efficient for primes up to 2^32. I looked at similar encodings using 3 and 5 bits, and 4 bits is the most efficient for this range, but with not-too-much-larger primes, where longer gaps are more common, 5 bits becomes more efficient. I also forget exactly how this compares with storing an odd-number-only bitmap (which I used in a sieve to generate the primes), except that this is a lot more compact.

> ... { excellent description snipped }

>This is called using a "wheel". The wheel in this case has circumference 30

> ...

>It's possible to use larger wheels,

For comparison, I've added a percentage of table size vs. a bitmap of all even and odd numbers:

>2 -> 1 from 2 1/2 = 50%
>2.3 -> 2 from 6 2/6 = 33%
>2.3.5 -> 8 from 30 8/30 = 27%
>2.3.5.7 -> 48 from 210 48/210 = 23%
>2.3.5.7.11 -> 480 from 2310 480/2310 = 21%
>
>
>Note, however, that it rarely makes sense to store tables of primes on
>the hard disk, as in most computer languages it's almost always always
>just as fast to calculate them on the fly. Hard disks are slow compared
>to CPUs, on the whole.

I've read that repeatedly over the years, but it didn't seem right to me. But some thought shows that it depends on the number of primes, and it's certainly true for a smaller table of primes.
Sieving is the fastest method of finding large numbers of consecutive primes, but for a search range of N numbers the time is super-linear (about N log log N for a simple sieve of Eratosthenes). Reading or writing an already-calculated list is (roughly) proportional to N. So with larger lists, reading from disk can be faster than calculating. I recall my code taking (on my P200) about a half hour to sieve primes below 10^8 or so (I forget the exact numbers - I should have documented these observations), but only five minutes to encode/write or read/decode the file of primes stored as 4-bit gaps.
It just now occurs to me that I'm already using a wheel of circumference 2 in calculating and storing the gaps. A larger wheel would lower the average gap and only reduce the number of instances where a 4-bit encoding is insufficient, which would not be a significant reduction in file size. However, since the gaps are smaller with larger wheels, using 3-bit gaps might be more efficient (for primes less than 2^32 or some such value).
Certainly for huge ranges (I recall reading (on primepages.org?) that all primes less than 10^15 or so have been calculated), such a method (or something even more efficient) would be mandatory.

>Information-theoretically, all the primes can be stored in only a couple of
>hundred bits, assuming that the primitives like division are atomic.
>If you can describe the primes in a line of text, then that proves that the
>information content is < 80*7 bits.
>"Natural numbers with no proper divisors greater than 1" = 385 bits

For the sake of argument, someone who can decode that surely must know that it generates all prime numbers, so it can be reduced to:

"All prime numbers"

:)

>"{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits

I recall seeing and using an APL program to generate primes that was one line long (less than 80 characters, if you count each Greek letter used in APL as one character). It was in Byte, Creative Computing or some other such magazine in the late '70s. The program was 'elegant' but inefficient and impractical: to get the primes less than n, it generated an n*n array, and due to memory limitations it would only generate the primes less than 120 or so on the mainframe I used at the time.
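A Python analogue of that outer-product approach (a guess at the one-liner's method, not the original APL): build the n-by-n remainder table and keep the numbers with exactly two divisors.

```python
def primes_upto(n):
    # Conceptually an n-by-n remainder table: for each p, count how many
    # d in 1..n divide it, and keep those with exactly two divisors
    # (1 and p itself). O(n^2) time - 'elegant' but impractical, and done
    # literally as one big array it is O(n^2) space too, which explains
    # the n = 120 ceiling on a memory-starved mainframe.
    nums = range(1, n + 1)
    return [p for p in nums if sum(p % d == 0 for d in nums) == 2]
```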

>> A happy and peacefull newyear to all of you ! ! !
>
>Thank you; and to you.
>
>Phil

Ditto's.
>
>=====
>I Pledge Allegiance to the flag
>That appears on my Desktop Startup Screen.
>And to the Monopoly for which it Stands:
>One Operating System over all, inescapable,
>With Freedom and Privacy for none. -- Telecommando on /.

---
Message 15 of 23 , Dec 28, 2002
At 01:43 PM 12/28/2002 -0500, Ben Bradley wrote:

>Note, however, that it rarely makes sense to store tables of primes on
> >the hard disk, as in most computer languages it's almost always
> >just as fast to calculate them on the fly. Hard disks are slow compared
> >to CPUs, on the whole.
>
> I've read that repeatedly over the years, but it didn't seem right to me.

It is in Knuth, vol 3, but it isn't always true. It depends on the
hardware, software, and how far you are going.

> Sieving is the fastest method of finding large numbers of consecutive
> primes, but still for a search range of N numbers, the time is (IIRC)
> proportional to N^2.

No, it isn't that bad - a little worse than linear (O(N log log N) for the sieve of Eratosthenes).

+---------------------------------------------------------+
| Jud McCranie |
| |
| Programming Achieved with Structure, Clarity, And Logic |
+---------------------------------------------------------+
Message 16 of 23 , Dec 30, 2002
yep, right !

my hardware isn't that great, it's a Pentium 70.
So when I want to do something with a lot of numbers
it costs me lots of time; for example, checking the
first 100,000 primes takes quite a while.
Changing a program a bit and then waiting again
is frustrating. So finally I made a prime algorithm in
which I can use a file to start with and then, only if
needed, continue finding primes by calculation.
So just for representation I don't have to calculate
a large amount of primes again, which saves me time.

Now since i deal with this prime number file I just
wonder how to store them with the least amount of disk
space.

I do have enough HD space for my goals, but it makes
me wonder: when one starts to think about it, if I had
x MB, how many primes at most could be stored in it?

--- Jud McCranie <judmccr@...> wrote:
> At 01:43 PM 12/28/2002 -0500, Ben Bradley wrote:
>
> >Note, however, that it rarely makes sense to store tables of primes on
> >the hard disk, as in most computer languages it's almost always
> >just as fast to calculate them on the fly. Hard disks are slow compared
> >to CPUs, on the whole.
> >
> > I've read that repeatedly over the years, but it didn't seem right to me.
>
> It is in Knuth, vol 3, but it isn't always true. It depends on the
> hardware, software, and how far you are going.
>
> > Sieving is the fastest method of finding large numbers of consecutive
> > primes, but still for a search range of N numbers, the time is (IIRC)
> > proportional to N^2.
>
> No, it isn't that bad - a little worse than linear.
>
> +---------------------------------------------------------+
> | Jud McCranie                                            |
> |                                                         |
> | Programming Achieved with Structure, Clarity, And Logic |
> +---------------------------------------------------------+

Message 17 of 23 , Dec 30, 2002
At 01:47 AM 12/30/2002 -0800, Peter Boos wrote:
>yep, right !
>
>my hardware isn't that great, it's a Pentium 70.
>So when I want to do something with a lot of numbers
>it costs me lots of time; for example, checking the
>first 100,000 primes takes quite a while.

It should take very little time to generate the first 100,000 primes with a
sieve of Eratosthenes. Are you doing it a harder way?
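For reference, a plain sieve sketch (mine, not from the thread). Getting the first 100,000 primes means sieving up to roughly 1.3 million, which should only take moments even on modest hardware.

```python
def sieve(limit):
    # Sieve of Eratosthenes: O(n log log n) - "a little worse than linear".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off multiples starting at p*p; smaller multiples
            # were already crossed off by smaller prime factors.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]
```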

>Now since i deal with this prime number file I just
>wonder how to store them with the least amount of disk
>space.

I don't know what the absolute minimum space to store a list of primes is,
but there will be a trade-off - the more compact the file, probably the
more processing it is going to take to get the numbers out.

If I'm going over 2^32 I generate them with a sieve. Less than 2^32 I
usually read them from a file. I have three types of file: the primes as
32-bit unsigned integers (the largest but easiest to deal with), the gaps
between successive primes, and a bit-vector. The last two take less space
and are good for various things, depending on what you need.
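A sketch of the bit-vector variant (my illustration; the actual file layout used above isn't shown): one bit per odd number, with 2 implied, matching the circumference-2 wheel discussed earlier in the thread.

```python
def primes_to_bits(primes, limit):
    # One bit per odd number >= 3: bit i represents 2*i + 3.
    # The prime 2 is implied rather than stored.
    nbits = (limit - 1) // 2               # odd numbers 3, 5, ..., <= limit
    buf = bytearray((nbits + 7) // 8)
    for p in primes:
        if p > 2:
            i = (p - 3) // 2
            buf[i >> 3] |= 1 << (i & 7)
    return bytes(buf)

def bits_to_primes(buf):
    # Expand the bit-vector back into an explicit list of primes.
    return [2] + [2 * i + 3 for i in range(8 * len(buf))
                  if buf[i >> 3] >> (i & 7) & 1]
```

This costs a fixed 1 bit per odd number regardless of prime density, so it wins over gap files for dense low ranges and loses for sparse high ones.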

+---------------------------------------------------------+
| Jud McCranie |
| |
| Programming Achieved with Structure, Clarity, And Logic |
+---------------------------------------------------------+
