Re: [PrimeNumbers] Question.

  • Phil Carmody
    Message 1 of 23 , Dec 27, 2002
      --- Peter Boos <peter_boos@...> wrote:
      >
      > This might be an easy question, I think.
      > Ehm, I am thinking of writing a program that should do
      > something with primes; well, the point is I wonder what
      > is the optimum way of storing primes.
      > I mean, placing them in a file will require much space
      > on HD. So I wondered about perhaps creating a single
      > binary string of data in which a 1 represents a prime
      > and a zero represents a non-prime. Well, that would allow
      > me to place more primes in a file (although I am not
      > sure, as primes will occur less often with higher numbers).
      > This binary prime file I will place in a compressed
      > folder (a zip folder or an NTFS compressed folder)*.
      >
      > Well, my question is, what would be a better method for
      > storing primes than this binary way?

      <!--
      Linguistic aside: compare the etymology of
      THAN (OE. _þanne_, _þonne_, _þænne_, also _þan_, _þon_) with
      THEN (OE. _þanne_, _þonne_, _þænne_).
      They used to be the same word, and may well return to that state!
      -->

      On the whole, no.

      Some have looked into storing gaps between primes, but it's trivial to get a
      bitmap representation to out-perform (size-wise) such delta-coding, as long
      as you permit yourself some special cases.

      If you permit yourself to just 'know' that 2 is prime, then the bitmap
      doesn't need to store all the unnecessary even numbers, which you know will
      be all composite. => half size
      If you permit yourself to also just 'know' that 3 is a prime, then the bitmap
      no longer needs to contain numbers of the form 6n+3, as you know they're
      divisible by 3. => 2/3rds of half size = 1/3.
      If you also permit yourself to just 'know' that 5 is a prime, then the
      bitmap no longer needs to contain numbers of the form 30n+5 and 30n+25, as
      you know they're divisible by 5. => 4/5ths of 1/3 size = 4/15

      Stopping here lets you store a bitmap range corresponding to 30 numbers in 8
      bits, i.e. a byte.
      Byte 0 contains bits corresponding to { 1, 7, 11, 13, 17, 19, 23, 29 }
      Byte 1 contains bits corresponding to { 31, 37, 41, 43, 47, 49, 53, 59 }
      ...
      Byte n contains bits corresponding to { 30n+1, 30n+7, ........ 30n+29 }
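      As a rough illustration (the function and array names here are invented for
      this sketch, not taken from any particular program), a lookup against such a
      mod-30 bitmap might look like this in C:

      #include <stdint.h>

      /* bit position within a byte for each residue mod 30 coprime to 30;
         -1 marks residues that (beyond 2, 3, 5 themselves) are never prime */
      static const int8_t bitpos[30] = {
          -1,  0, -1, -1, -1, -1, -1,  1, -1, -1,
          -1,  2, -1,  3, -1, -1, -1,  4, -1,  5,
          -1, -1, -1,  6, -1, -1, -1, -1, -1,  7
      };

      /* bitmap[i] holds the eight primality bits for 30*i .. 30*i+29;
         the bit for "1" in byte 0 is simply stored as 0 */
      int is_prime_in_bitmap(const uint8_t *bitmap, uint64_t n)
      {
          if (n == 2 || n == 3 || n == 5) return 1;   /* the 'known' primes */
          int b = bitpos[n % 30];
          if (b < 0) return 0;                        /* divisible by 2, 3 or 5 */
          return (bitmap[n / 30] >> b) & 1;
      }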

      This is called using a "wheel". The wheel in this case has circumference 30

      [ASCII-art figure: a wheel of circumference 30, with the spokes
      1, 7, 11, 13, 17, 19, 23, 29 marked around its rim, rolling along
      the number line.]

      It's possible to use larger wheels,
      2 -> 1 from 2
      2.3 -> 2 from 6
      2.3.5 -> 8 from 30
      2.3.5.7 -> 48 from 210
      2.3.5.7.11 -> 480 from 2310
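      Those counts are just the residues coprime to each primorial; for what it's
      worth, a throwaway C loop (nothing clever) will confirm them:

      #include <stdio.h>

      int main(void)
      {
          const int primes[] = {2, 3, 5, 7, 11};
          long wheel = 1;
          for (int i = 0; i < 5; i++) {
              wheel *= primes[i];
              long spokes = 0;                 /* residues coprime to the wheel */
              for (long r = 0; r < wheel; r++) {
                  int coprime = 1;
                  for (int j = 0; j <= i; j++)
                      if (r % primes[j] == 0) { coprime = 0; break; }
                  spokes += coprime;
              }
              printf("%ld -> %ld\n", wheel, spokes);  /* 2->1, 6->2, 30->8, 210->48, 2310->480 */
          }
          return 0;
      }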


      Note, however, that it rarely makes sense to store tables of primes on
      the hard disk, as in most computer languages it's almost always
      just as fast to calculate them on the fly. Hard disks are slow compared
      to CPUs, on the whole.

      > * As far as I know compression, a.k.a. information
      > theory, this might be a very interesting question.
      > Because it's how to store information of x primes with
      > the least amount of data, so it would contain
      > the essentials of primes... or something like that.

      Information-theoretically, all the primes can be stored in only a couple of
      hundred bits, assuming that the primitives like division are atomic.
      If you can describe the primes in a line of text, then that proves that the
      information content is < 80*7 bits.
      "Natural numbers with no proper divisors greater than 1" = 385 bits
      "{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits

      However, I've assumed that "|N" is known to mean "the natural numbers", and
      ":" means "such that", "!E" means "there does not exist" etc.

      > Ohyeah
      >
      > A happy and peacefull newyear to all of you ! ! !

      Thank you; and to you.

      Phil


      =====
      I Pledge Allegiance to the flag
      That appears on my Desktop Startup Screen.
      And to the Monopoly for which it Stands:
      One Operating System over all, inescapable,
      With Freedom and Privacy for none. -- Telecommando on /.

    • Jud McCranie
      Message 2 of 23 , Dec 27, 2002
        At 12:37 AM 12/27/2002 -0800, Peter Boos wrote:

        >Well, my question is, what would be a better method for
        >storing primes than this binary way?

        It depends on how many you want to store. If you consider the odd numbers
        only, storing a 1 for a prime and a 0 for a composite is called a "bit
        vector", but this is not always the best way. And which method is most
        efficient depends on what your upper limit is. But for a practical
        example, consider primes < 2^32.

        Storage method      File size, rounded (bytes)

        32-bit unsigned         813,000,000
        Bit-vector              268,000,000
        Semi-gaps               203,000,000

        Semi-gaps is where you store the difference between successive primes
        divided by 2. For primes < 2^32, you can store the semigaps in a byte.
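        For illustration only (this is not the actual file format described above;
        it assumes an in-memory array of 32-bit primes starting from 3, since the
        single gap 2->3 doesn't halve cleanly), encoding and decoding such
        semi-gaps is only a few lines of C:

        #include <stdint.h>
        #include <stddef.h>

        /* write (p[i+1]-p[i])/2 as one byte each;
           caller guarantees every semi-gap fits in a byte (true below 2^32) */
        void encode_semigaps(const uint32_t *p, size_t n, uint8_t *out)
        {
            for (size_t i = 0; i + 1 < n; i++)
                out[i] = (uint8_t)((p[i + 1] - p[i]) / 2);
        }

        /* rebuild the primes from the first prime and n-1 semi-gap bytes */
        void decode_semigaps(uint32_t first, const uint8_t *in, size_t n, uint32_t *p)
        {
            p[0] = first;
            for (size_t i = 1; i < n; i++)
                p[i] = p[i - 1] + 2u * in[i - 1];
        }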

        Each method has good and bad properties. With the bit vector, you can
        easily access it anywhere to see if a particular number is prime. With
        the gaps, you have to process the whole file. etc.




        +---------------------------------------------------------+
        | Jud McCranie                                             |
        | Programming Achieved with Structure, Clarity, And Logic  |
        +---------------------------------------------------------+



      • Jud McCranie
        Message 3 of 23 , Dec 27, 2002
          At 06:26 AM 12/27/2002 -0800, Phil Carmody wrote:

          >Some have looked into storing gaps between primes, but it's trivial to get a
          >bitmap representaton to out-perform (size-wise) such delta-coding, as long
          >as you permit yourself some special cases.

          I think *eventually* the gap method takes over because the primes get more
          and more sparse.


          >Note, however, that it rarely makes sense to store tables of primes on
          >the hard disk, as in most computer languages it's almost always
          >just as fast to calculate them on the fly. Hard disks are slow compared
          >to CPUs, on the whole.

          It depends. On my system I can read in a bit vector file (odd numbers
          only) for up to 2^32 quicker than I can generate them in a sieve, including
          the time to unpack the bits into actual numbers. (The unpacking takes
          longer than actually reading.)
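          For what it's worth, the unpacking step is just a linear scan of the bits;
          a minimal C sketch of the idea (not the actual code referred to above,
          just the shape of it), assuming bit i of the vector stands for the odd
          number 2*i+1:

          #include <stdint.h>
          #include <stddef.h>

          /* expand an odd-only bit vector into an array of primes;
             vec holds nbits bits (bit i = the number 2*i+1), 2 is added by hand */
          size_t unpack_odd_bitvector(const uint8_t *vec, size_t nbits, uint32_t *primes)
          {
              size_t count = 0;
              primes[count++] = 2;
              for (size_t i = 1; i < nbits; i++)        /* start at 1: bit 0 is "1" */
                  if (vec[i >> 3] & (1u << (i & 7)))
                      primes[count++] = (uint32_t)(2 * i + 1);
              return count;
          }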



          +---------------------------------------------------------+
          | Jud McCranie                                             |
          | Programming Achieved with Structure, Clarity, And Logic  |
          +---------------------------------------------------------+



        • Phil Carmody
          Message 4 of 23 , Dec 27, 2002
            --- Jud McCranie <judmccr@...> wrote:
            > At 06:26 AM 12/27/2002 -0800, Phil Carmody wrote:
            >
            > >Some have looked into storing gaps between primes, but it's trivial to get a
            > >bitmap representaton to out-perform (size-wise) such delta-coding, as long
            > >as you permit yourself some special cases.
            >
            > I think *eventually* the gap method takes over because the primes get more
            > and more sparse.

            Eventually, yes, I'm sure that's the case.

            Most people would code gaps with a flat distribution, but that can be
            improved upon.
            1) Firstly you could use a static Huffman code. I looked at that once;
            maybe I posted my figures here. It gives a worthwhile compression,
            assuming you don't mind paying for the bit-twiddling overhead.
            You could use a different model for different bands, for slightly
            improved compression.

            2) You could use some kind of universal code with the right view of the
            entropy of the sequence. I've not worked out which universal code
            that would be. Maybe a Golomb code with a logarithmically increasing
            parameter (in steps/bands, or maybe continuously changing). Maybe a
            start-stop code would be better.

            What distribution do gaps have? Knowing that would be the best way of
            answering the question of which universal code is best... Something's
            telling me that gaps in a Poisson process are exponentially distributed,
            and therefore the discrete analogue (woh! you don't find discrete
            analogues that often!) would be geometric, and a Golomb code is a
            perfect match for a geometric distribution.
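            To make that concrete: a Golomb code with parameter m writes floor(g/m)
            in unary followed by the remainder g mod m; with m a power of two, 2^k
            (the Rice special case), the remainder is exactly k bits, which keeps
            the bit-twiddling cheap. A bare-bones C sketch, purely as an
            illustration of the idea and not tuned for real use:

            #include <stdint.h>
            #include <stdio.h>

            /* trivial MSB-first bit writer */
            typedef struct { uint8_t *buf; size_t bit; } BitWriter;

            static void put_bit(BitWriter *w, int b)
            {
                if (b) w->buf[w->bit >> 3] |= (uint8_t)(0x80u >> (w->bit & 7));
                w->bit++;
            }

            /* Golomb code with power-of-two parameter m = 2^k (a Rice code):
               quotient g>>k in unary, then the low k bits of g */
            static void golomb_encode(BitWriter *w, unsigned g, unsigned k)
            {
                for (unsigned q = g >> k; q > 0; q--) put_bit(w, 1);
                put_bit(w, 0);                               /* unary terminator */
                for (int i = (int)k - 1; i >= 0; i--) put_bit(w, (g >> i) & 1);
            }

            int main(void)
            {
                uint8_t buf[64] = {0};
                BitWriter w = { buf, 0 };
                /* semi-gaps between the odd primes 3,5,7,11,13,17,19,23,29,31 */
                unsigned semigaps[] = {1, 1, 2, 1, 2, 1, 2, 3, 1};
                for (size_t i = 0; i < sizeof semigaps / sizeof *semigaps; i++)
                    golomb_encode(&w, semigaps[i], 2);
                printf("%zu bits used\n", (size_t)w.bit);
                return 0;
            }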

            There are a couple of tricks that can make the gap method compress even
            better. No technique that doesn't take into account knowledge about
            primality will ever be better than one based on the same compression
            principles but also taking into account knowledge about primes.

            > >Note, however, that it rarely makes sense to store tables of primes on
            > >the hard disk, as in most computer languages it's almost always
            > >just as fast to calculate them on the fly. Hard disks are slow compared
            > >to CPUs, on the whole.
            >
            > It depends. On my system I can read in a bit vector file (odd numbers
            > only) for up to 3^32 quicker than I can generate them in a sieve, including
            > the time to unpack the bits into actual numbers. (The unpacking takes
            > longer than actually reading.)

            You have faster hard disks than me, and I think I have a faster sieve than
            you. A couple of factors of 50% can make a huge difference to the outcome of
            such comparisons.

            A typical SoE will be superlinear, and therefore eventually precalculation
            will win, as restoration from precalculation is linear.

            It's a shame that the scale constant in Pritchard+Wheel is high, otherwise
            the sub-linear sieve might have some chance of winning. With the right kind
            of memory/DMA/blitting architecture, a Pritchard can be done outrageously
            quickly. You'd probably want bit-addressed memory though, so we're not talking
            a typical CPU here by any means.

            Phil


            =====
            I Pledge Allegiance to the flag
            That appears on my Desktop Startup Screen.
            And to the Monopoly for which it Stands:
            One Operating System over all, inescapable,
            With Freedom and Privacy for none. -- Telecommando on /.

          • Phil Carmody
            Message 5 of 23 , Dec 27, 2002
              --- Jud McCranie <judmccr@...> wrote:
              > At 12:37 AM 12/27/2002 -0800, Peter Boos wrote:
              >
              > >Well, my question is, what would be a better method for
              > >storing primes than this binary way?
              >
              > It depends on how many you want to store. If you consider the odd numbers
              > only, storing a 1 for a prime and a 0 for a composite is called a "bit
              > vector", but this is not always the best way. And which method is most
              > efficient depends on what your upper limit is. But for a practical
              > example, consider primes < 2^32.
              >
              > Storage method File size, rounded (bytes)
              >
              > 32-bit unsigned 813,000,000
              > Bit-vector 268,000,000
              > Semi-gaps 203,000,000

              268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....

              2^32 bits is 4Gb = 536870912 bytes
              2-wheel 268435456 bytes
              6-wheel 178956970 bytes
              30-wheel 143165576 bytes <- commonly used
              210-wheel 130150524 bytes <- I've toyed with this

              The bitmap based on a wheel mod 30 is 30% smaller than the
              otherwise-uncompressed semigaps.

              > Semi-gaps is where you store the difference between successive primes
              > divided by 2. For primes < 2^32, you can store the semigaps in a byte.

              Yup, a byte can cope with semigaps for primes up to 300G.

              Other length encodings can be useful over other ranges, depending on how
              much extra bit-twiddling you want to do.

              length    range   range+   range++   size
              8 bits    300G    40T      1600T     203MB
              7 bits    400M    20G      240G      178MB
              6 bits    1M      20M      190M      x

              I seem to have lost my data on how well a static Huffman would do, if you
              cared more about size than compression/decompression speed. With this kind
              of data set there's practically no difference between Huffman and Golomb,
              I'd guess, except that with Golomb you don't need to remember individual
              frequencies or codes, just one parameter works fine.

              > Each method has good and bad properties. With the bit vector, you can
              > easily access it anywhere to see if a particular number is prime. With
              > the gaps, you have to process the whole file. etc.

              Or do it in bands, so you only have to decompress the band where your
              start-point is.

              Peeling out all primes with a modular property related to your wheel size is
              vastly easier in bitmap form. However, that's just repeated abuse of the
              fact that there's random access.
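              For example, with the byte layout described earlier, every prime
              congruent to 7 (mod 30) sits in the same bit position of its byte, so
              pulling out that whole residue class is just a scan of one bit per
              byte; a quick C sketch (names invented for the illustration):

              #include <stdint.h>
              #include <stdio.h>

              /* print every prime of the form 30*i + 7 recorded in a mod-30 wheel
                 bitmap; residue 7 is bit 1 of each byte in the layout above */
              void primes_7_mod_30(const uint8_t *bitmap, uint64_t nbytes)
              {
                  for (uint64_t i = 0; i < nbytes; i++)
                      if (bitmap[i] & (1u << 1))
                          printf("%llu\n", (unsigned long long)(30 * i + 7));
              }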


              Phil


              =====
              I Pledge Allegiance to the flag
              That appears on my Desktop Startup Screen.
              And to the Monopoly for which it Stands:
              One Operating System over all, inescapable,
              With Freedom and Privacy for none. -- Telecommando on /.

            • Jud McCranie
              Message 6 of 23 , Dec 27, 2002
                At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:

                >268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....

                These are ones I actually use. I know that it isn't hard to make it twice
                as small, but that adds complication in the program. Anyhow, my point was
                that gaps will eventually beat a bit vector for any fixed set of moduli.


                >Other length encodings can be useful over other ranges, depending on how
                >much extra bit-twiddling you want to do
                >
                >length range range+ range++ size
                >8 bits 300G 40T 1600T 203MB
                >7 bits 400M 20G 240G 178MB
                >6 bits 1M 20M 190M x

                What do you mean by range+ and range++, using an extra 6/7/8 bits where needed?


                >Or do it in bands, so you only have to decompress the band where your
                >start-point is.

                Well, I try to keep it simple, to reduce the chance of me making an
                error. Like the saying "do you want it done fast or do you want it done
                right?" And unless you're doing something pretty simple with the primes,
                getting the primes is the smaller part of it. Most of the time I read the
                primes as 32-bit unsigned integers from a disk file because (1) that is
                fast enough, (2) it is less prone to error. I know the disk file is right,
                and it is simpler to do that than to plug the processing into my sieve
                program (less chance of error). And I don't have to worry about pulling
                the wrong version of my sieve, one that might have an error.


                +---------------------------------------------------------+
                | Jud McCranie                                             |
                | Programming Achieved with Structure, Clarity, And Logic  |
                +---------------------------------------------------------+



              • Jud McCranie
                Message 7 of 23 , Dec 27, 2002
                  At 10:55 AM 12/27/2002 -0800, Phil Carmody wrote:

                  > I think *eventually* the gap method takes over because the primes get more
                  > > and more sparse.
                  >
                  >Eventually, yes, I'm sure that's the case.

                  Consider getting the primes less than some limit. Reading a bit vector is
                  linear in time and disk space. A sieve is super-linear and I'm pretty sure
                  reading a file of gaps is sub-linear.

                  >What distribution do gaps have?

                  I don't know, but undoubtedly someone has worked on that. A gap of < 512
                  (so it can be stored in a byte) works into the hundreds of billions. IIRC.

                  >You have faster hard disks than me, and I think I have a faster sieve than
                  >you. A couple of factors of 50% can make a huge difference to the outcome of
                  >such comparisons.

                  You're probably right. I could probably speed up my sieve, but when I'm
                  using it to do something with the primes, generally that takes longer than
                  the sieve itself. Since the sieve isn't the bottleneck, speeding it up
                  won't help the overall time so much.

                  I have two HDs, the faster one reads about 51,000,000 bytes per
                  second. Subsequent reads of the same file are about 3.5 times faster than
                  that.


                  +---------------------------------------------------------+
                  | Jud McCranie                                             |
                  | Programming Achieved with Structure, Clarity, And Logic  |
                  +---------------------------------------------------------+



                • Phil Carmody
                  Message 8 of 23 , Dec 27, 2002
                    --- Jud McCranie <judmccr@...> wrote:
                    > At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:
                    >
                    > >268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21, 27,....
                    >
                    > These are ones I actually use. I know that it isn't hard to make it twice
                    > as small, but that adds complication in the program. Anyhow, my point was
                    > that gaps will eventually beat a bit vector for any fixed set of moduli.

                    For some as-yet-unknown value of 'eventually', yes.
                    Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
                    size-wise?

                    > >Other length encodings can be useful over other ranges, depending on how
                    > >much extra bit-twiddling you want to do
                    > >
                    > >length range range+ range++ size
                    > >8 bits 300G 40T 1600T 203MB
                    > >7 bits 400M 20G 240G 178MB
                    > >6 bits 1M 20M 190M x
                    >
                    > What do you mean by range+ and range++, using an extra 6/7/8 bits where needed?

                    "range" is the prime range that can be reached using semigaps encoded as X-bits
                    "range+" is the prime range that can be reached using a very small extra bit
                    twiddle, but still based on semi-gaps. Consider it Markov modelled. Each
                    codeword is still X-bits.
                    "range++" is the prime range that can be reached us you're prepared to pull
                    out a bigger Markov model, probably a table-driven scheme. Still X-bits
                    codewords.

                    The 8-bit, 7-bit and 6-bit rows were for the codeword length. You're
                    using 8-bits. /If/ size is an issue more than speed, then using 7-bit
                    or 6-bit codewords might make more sense, and the prime range is compatible
                    with the codeword length.

                    The 'x' in the 6-bit column is slightly misleading as it is possible to code
                    primes up to 4G using 6-bit gaps, as long as you reserve an escape code to
                    indicate that a gap is over-sized. However, that leads you towards schemes
                    like start-stop codes and eventually to Golomb codes.
                    An m_i=3 SSC might be interesting to try, as its decoding speed is pretty
                    good due to the simplicity of using nybbles.

                    > >Or do it in bands, so you only have to decompress the band where your
                    > >start-point is.
                    >
                    > Well, I try to keep it simple,

                    That's no fun! Recursive Elias codes are fun. Ulam s-additive codes are even
                    more fun, OEIS A002858. I bet you didn't realise you could efficiently
                    (asymptotically) encode prime number gaps using it when you added your
                    comment to that sequence! Not that I'd ever choose to use them for that
                    purpose myself.

                    > to reduce the chance of me making an
                    > error. Like the saying "do you want it done fast or do you want it done
                    > right?" And unless you're doing something pretty simple with the primes,
                    > getting the primes is the smaller part of it. Most of the time I read the
                    > primes as 32-bit unsigned integers from a disk file because (1) that is
                    > fast enough, (2) it is less prone to error. I know the disk file is right,
                    > and it is simpler to do that than to plug the processing into my sieve
                    > program (less chance of error). And I don't have to worry about pulling
                    > the wrong version of my sieve, one that might have an error.

                    Don't you like surprises?
                    ;-)

                    Phil


                    =====
                    I Pledge Allegiance to the flag
                    That appears on my Desktop Startup Screen.
                    And to the Monopoly for which it Stands:
                    One Operating System over all, inescapable,
                    With Freedom and Privacy for none. -- Telecommando on /.

                  • Markus <flames@unforgettable.com>
                    Message 9 of 23 , Dec 27, 2002
                      You know NewPGen would work a LOT better if it used some of these
                      ideas. For k*2^n+-1 searches, if you knew which k gave candidates
                      divisible by 3 you could hold 33% more values in the bitmap; toss in
                      5, 7, 11 and you could hold 2.40625 (1155/480) times more k values.
                      When searching for really sparse forms like CC3 etc. this could shave
                      weeks off sieving times.
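                      To make the idea concrete: for k*2^n+c with c = +1 or -1, whether 3
                      divides the candidate depends only on k mod 3 and the parity of n,
                      because 2 = -1 (mod 3). A small hypothetical C helper (not anything
                      from NewPGen) that returns the single residue class of k to drop:

                      /* residue r (1 or 2) such that 3 | k*2^n + c exactly when k == r (mod 3);
                         c must be +1 or -1.  Since 2 == -1 (mod 3), 2^n == (-1)^n (mod 3),
                         so the condition k*2^n + c == 0 becomes k == -c*(-1)^n (mod 3). */
                      int bad_k_mod_3(unsigned n, int c)
                      {
                          int sign = (n & 1) ? -1 : 1;          /* (-1)^n */
                          return ((-c * sign) % 3 + 3) % 3;
                      }

                      Every k in that class can be dropped from the bitmap before sieving
                      even starts, and the same game works mod 5, 7 and 11 with small
                      lookup tables.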

                      Markus


                      >
                      > It's possible to use larger wheels,
                      > 2 -> 1 from 2
                      > 2.3 -> 2 from 6
                      > 2.3.5 -> 8 from 30
                      > 2.3.5.7 -> 48 from 210
                      > 2.3.5.7.11 -> 480 from 2310
                      >
                      >
                      > Note, however, that it rarely makes sense to store tables of primes on
                      > the hard disk, as in most computer languages it's almost always
                      > just as fast to calculate them on the fly. Hard disks are slow compared
                      > to CPUs, on the whole.
                      >
                      > > * As far as I know compression, a.k.a. information
                      > > theory, this might be a very interesting question.
                      > > Because it's how to store information of x primes with
                      > > the least amount of data, so it would contain
                      > > the essentials of primes... or something like that.
                      >
                      > Information-theoretically, all the primes can be stored in only a couple of
                      > hundred bits, assuming that the primitives like division are atomic.
                      > If you can describe the primes in a line of text, then that proves that the
                      > information content is < 80*7 bits.
                      > "Natural numbers with no proper divisors greater than 1" = 385 bits
                      > "{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits
                      >
                      > However, I've assumed that "|N" is known to mean "the natural numbers", and
                      > ":" means "such that", "!E" means "there does not exist" etc.
                      >
                      > > Ohyeah
                      > >
                      > > A happy and peacefull newyear to all of you ! ! !
                      >
                      > Thank you; and to you.
                      >
                      > Phil
                      >
                      >
                      > =====
                      > I Pledge Allegiance to the flag
                      > That appears on my Desktop Startup Screen.
                      > And to the Monopoly for which it Stands:
                      > One Operating System over all, inescapable,
                      > With Freedom and Privacy for none. -- Telecommando on /.
                      >
                    • Jud McCranie
                      Message 10 of 23 , Dec 27, 2002
                        At 02:38 PM 12/27/2002 -0800, Phil Carmody wrote:

                        > For some as-yet-unknown value of 'eventually', yes.
                        >Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
                        >size-wise?

                        If you assume 1 byte per gap, about 4.5 x 10^15.
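                        (Roughly: the 48/210 bitmap costs 48/(210*8) = 1 byte per 35 integers,
                        while one byte per gap costs about N/(ln N - 1) bytes up to N, so the
                        two meet where ln N - 1 ~= 35, i.e. N ~= e^36 ~= 4.3 x 10^15.)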


                        >That's no fun! Recursive Elias codes are fun. Ulam s-additive codes are even
                        >more fun, OEIS A002858. I bet you didn't realise you could efficiently
                        >(asymptotically) encode prime number gaps using it when you added your
                        >comment to that sequence!

                        No, I didn't.


                        >Don't you like surprises?
                        >;-)


                        It depends, of course. I don't like being surprised by finding out that an
                        error in my program caused it to produce incorrect results, which happens
                        all too often. :-( ;-)




                        +---------------------------------------------------------+
                        | Jud McCranie                                             |
                        | Programming Achieved with Structure, Clarity, And Logic  |
                        +---------------------------------------------------------+



                      • Phil Carmody
                        Message 11 of 23 , Dec 27, 2002
                          --- "jim_fougeron <jfoug@...>" <jfoug@...> wrote:
                          > >> example, consider primes < 2^32.
                          > >>
                          > >> Storage method File size, rounded (bytes)
                          > >>
                          > >> 32-bit unsigned 813,000,000
                          > >> Bit-vector 268,000,000
                          > >> Semi-gaps 203,000,000
                          > >
                          > > 268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21,
                          > 27,....
                          >
                          > True Phil, but wasting 8 bits on each gap is also unfair. I have
                          > done a compression in CPAPSieve using 5 bits per gap (and escape
                          > sequences for gaps that are too large for 5 bits). The above
                          > 203mb should also be stated as 126 MB. More efficiency can be
                          > obtained than my algorithm can give, but my algorithm was very
                          > very fast (and btw, it is better compression than your mod-210
                          > wheel method)

                          I typo'd (did 10/11 rather than 6/7); the 210-wheel is 122,713,352 bytes.
                          The 2310-wheel (which is the first one I'd actually treat as a wheel;
                          anything smaller is just bit-twiddling) would be 111,557,593.
                          Hmmm, actually, I could probably do 2310 just as bit-twiddling, but I
                          wouldn't want to.
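                          (That is, ceil(2^32 * 48 / (210*8)) = 122,713,352 bytes and
                          ceil(2^32 * 480 / (2310*8)) = 111,557,593 bytes.)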

                          Ignore my 210-wheel in the pfgw source, that was just proof of concept,
                          and left as pretty dumb code (fewer surprises that way). Its loop can
                          be unrolled into something that's not much longer than the current loop
                          by working a word at a time rather than a byte at a time. On the alpha
                          or using MMX, the whole loop, all 6 iterations of it, could be replaced
                          by about 8 instructions!

                          Don't get me wrong, for repeated sequential access, then gaps is the way to
                          go. I'm more of a random-access merchant myself, and for one-off sequential
                          access I just use my tweaked version of Bernstein's primegen. i.e. I almost
                          never have a table of primes. Horses for courses.

                          Phil


                          =====
                          I Pledge Allegiance to the flag
                          That appears on my Desktop Startup Screen.
                          And to the Monopoly for which it Stands:
                          One Operating System over all, inescapable,
                          With Freedom and Privacy for none. -- Telecommando on /.

                        • jim_fougeron <jfoug@kdsi.net>
                          Message 12 of 23 , Dec 27, 2002
                            --- In primenumbers@yahoogroups.com, Phil Carmody <thefatphil@y...>
                            wrote:
                            >--- Jud McCranie <judmccr@b...> wrote:
                            >> At 12:10 PM 12/27/2002 -0800, Phil Carmody wrote:
                            >>
                            >>>268MB?!?!?!?! Unfair! you're reserving space for 3, 9, 15, 21,
                            >> 27,....
                            >>
                            >> These are ones I actually use. I know that it isn't hard to make
                            >> it twice as small, but that adds complication in the program.
                            >> Anyhow, my point was that gaps will eventually beat a bit vector
                            >> for any fixed set of moduli.
                            >
                            >For some as-yet-unknown value of 'eventually', yes.
                            >Where do you think that my 48/210 bitmap will be beaten by semi-gaps,
                            >size-wise?

                            Pretty low.

                            The size is very small where the gaps method (coded correctly with
                            escapes) beats your mod-30 wheel method. It only takes 5 bits to
                            encode 98.5% of the primes under 100,000,000. The number of bits
                            needed for different "ranges" is this:

                            p=1048575, #primes=82024 Gaps>30=4652 Gap>60=196
                            so at 4 bit level, requires 82024/2+4652/2+196/2+C bytes using
                            4bit/prime (with escapes) (or about 43436 bytes)
                            and 82024*5/8+196*5/8+C bytes using 5bit/prime or about 51387 bytes.
                            Wheel 30 needs 34952 bytes

                            p=100139007 #primes=5769024 Gaps>30=808721 Gaps>60=91065
                            so at 4 bit level, requires 5769024/2+808721/2+91065/2+C bytes using
                            4bit/prime (with escapes) (or about 3334405 bytes)
                            and 5769024*5/8+91065*5/8+C bytes using 5bit/prime or about 3662556
                            bytes.
                            Wheel 30 needs 3337967 bytes

                            p=2000158719 #primes=98229799 Gaps>30=19249713 Gaps>60=3178956
                            so at 4 bit level, requires 98229799/2+19249713/2+3178956/2+C bytes
                            using 4bit/prime (with escapes) (or about 60329234 bytes)
                            and at 5 bit level, 98229799*5/8+(c*3178956*5/8) (c>1) bytes
                            using 5bit/prime or about 63380472 bytes.
                            Wheel 30 needs 66671958 bytes.

                            Note the above numbers for the 4-bit and 5-bit gap compressions are
                            low. They do NOT take into account the need for multiple escapes
                            to handle larger gaps; however, the size is relatively close (at least
                            for the 5-bit version). The 4-bit variant calculation above does take
                            into account the computation for a second escape, capable of handling
                            all gaps up to 60 (not 62).

                            As you can see, a quality "variable-sized" (i.e. using escapes) gap
                            compression can do very well.
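                            As a sketch of what "with escapes" means here (this shows the general
                            shape only, not the actual CPAPSieve format): each prime gets one
                            5-bit code, semi-gaps of 1..31 are stored directly, and code 0 is
                            reserved as an escape meaning "add 31 and read another code".

                            #include <stdint.h>
                            #include <stddef.h>

                            /* Encode semi-gaps (gap/2 between successive odd primes) as 5-bit codes:
                               1..31 are literal semi-gaps, 0 is an escape meaning "add 31 and keep
                               reading".  One code per byte here; a real format would pack 8 codes
                               into 5 bytes.  Returns the number of codes written. */
                            size_t encode_gaps_5bit(const uint32_t *semigap, size_t n, uint8_t *codes)
                            {
                                size_t out = 0;
                                for (size_t i = 0; i < n; i++) {
                                    uint32_t g = semigap[i];
                                    while (g > 31) { codes[out++] = 0; g -= 31; }
                                    codes[out++] = (uint8_t)g;
                                }
                                return out;
                            }

                            /* inverse: accumulate escapes until a literal code arrives */
                            size_t decode_gaps_5bit(const uint8_t *codes, size_t ncodes, uint32_t *semigap)
                            {
                                size_t out = 0;
                                uint32_t carry = 0;
                                for (size_t i = 0; i < ncodes; i++) {
                                    if (codes[i] == 0) { carry += 31; continue; }
                                    semigap[out++] = carry + codes[i];
                                    carry = 0;
                                }
                                return out;
                            }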

                            In my CPAPSieve output file format, I chose a 5-bit gap compression
                            with escapes. This choice was made due to the fact that gaps in
                            the heavily trial factored ranges up around 1000 digits or more,
                            are very frequently over a gap of 30 (but not as often over a gap
                            of 62). I ran many tests, and used the 5-bit gap compression as
                            being very fast, and pretty optimal for compression rate.

                            Jim.
                          • p_jobling <p_jobling@hotmail.com>
                            Message 13 of 23 , Dec 28, 2002
                              --- In primenumbers@yahoogroups.com, "Markus <flames@u...>"
                              <flames@u...> wrote:
                              > You know NewPGen would work a LOT better if it used some of these
                              > ideas. For k*2^n+-1 searches, if you knew which k gave candidates
                              > divisible by 3 you could hold 33% more values in the bitmap; toss
                              > in 5, 7, 11 and you


                              I know, and I half implemented it a while ago, and I just need some
                              time to get it finished. There are some issues, it is not completely
                              straightforward. Hopefully I will find some time and get it finished
                              soon.

                              Regards,

                              Paul.
                            • Ben Bradley
                              Message 14 of 23 , Dec 28, 2002
                                This is the best thread I've read here in a while (at least to me). I've read references to a 'wheel' in reference to sieves/bitmaps before, but didn't understand the concept. Phil's description is great.

                                At 06:26 AM 12/27/02 -0800, Phil Carmody wrote:
                                >--- Peter Boos <peter_boos@...> wrote:
                                >>
                                >> This might be an easy question, I think.
                                >> Ehm, I am thinking of writing a program that should do
                                >> something with primes; well, the point is I wonder what
                                >> is the optimum way of storing primes.

                                >> Well, my question is, what would be a better method for
                                >> storing primes than this binary way?

                                I wrote some C code years ago to do this, it's still on my website at this page:
                                <http://www.mindspring.com/~benbradley/number_theory.html>
                                I'm looking at the actual code, and I see my goshawful cryptic notes in the comments. But at least the code that reads the file format has some specific documentation on it.
                                My format encodes prime gaps into four bits and is quite efficient for primes up to 2^32. I looked at similar encodings using 3 and 5 bits, and 4 bits is the most efficient for this range, but with not-too-much-larger primes, where longer gaps are more common, 5 bits becomes more efficient. I also forget exactly how this compares with storing an odd-number-only bitmap (which I used in a sieve to generate the primes), except that this is a lot more compact.

                                > ... { excellent description snipped }

                                >This is called using a "wheel". The wheel in this case has circumference 30

                                > ...

                                >It's possible to use larger wheels,

                                For comparison, I've added a percentage of table size vs. a bitmap of all even and odd numbers:

                                >2 -> 1 from 2 1/2 = 50%
                                >2.3 -> 2 from 6 2/6 = 33%
                                >2.3.5 -> 8 from 30 8/30 = 27%
                                >2.3.5.7 -> 48 from 210 48/210 = 23%
                                >2.3.5.7.11 -> 480 from 2310 480/2310 = 21%
                                >
                                >
                                >Note, however, that it rarely makes sense to store tables of primes on
                                >the hard disk, as in most computer languages it's almost always
                                >just as fast to calculate them on the fly. Hard disks are slow compared
                                >to CPUs, on the whole.

                                I've read that repeatedly over the years, but it didn't seem right to me. But some thought shows that it depends on the number of primes, and it's certainly true for a smaller table of primes.
                                Sieving is the fastest method of finding large numbers of consecutive primes, but still for a search range of N numbers, the time is (IIRC) proportional to N^2. Reading or writing an already-calculated list is (roughly) proportional to N. So with larger lists, reading from disk can be faster than calculating. I recall my code taking (on my P200) about a half hour to sieve primes below 10^8 or so (I forget the exact numbers - I should have documented these observations), but only five minutes to encode/write or read/decode the file of primes stored as 4-bit gaps.
                                It just now occurs to me that I'm already using a wheel of circumference 2 in calculating and storing the gaps. A larger wheel would lower the average gap and only reduce the number of instances where a 4-bit encoding is insufficient, which would not be a significant reduction in file size. However, since the gaps are smaller with larger wheels, using 3-bit gaps might be more efficient (for primes less than 2^32 or some such value).
                                Certainly for huge ranges (I recall reading (on primepages.org?) that all primes less than 10^15 or so have been calculated), such a method (or something even more efficient) would be mandatory.

                                >Information-theoretically, all the primes can be stored in only a couple of
                                >hundred bits, assuming that the primitives like division are atomic.
                                >If you can describe the primes in a line of text, then that proves that the
                                >information content is < 80*7 bits.
                                >"Natural numbers with no proper divisors greater than 1" = 385 bits

                                For the sake of argument, someone who can decode that surely must know that it generates all prime numbers, so it can be reduced to:

                                "All prime numbers"

                                :)

                                >"{p in|N:!Eq in|N,1<q<p:q|p}" = 196 bits

                                I recall seeing and using an APL program to generate primes that was one line (less than 80 characters, if you call each greek letter used in APL one character) long. It was in Byte, Creative Computing or some other such magazine in the late '70's. The program was 'elegant' but inefficient and impractical. To get primes less than n, it generated an n*n array, and due to memory limitations would only generate the primes less than 120 or so on the mainframe I used at the time.

                                >> A happy and peacefull newyear to all of you ! ! !
                                >
                                >Thank you; and to you.
                                >
                                >Phil

                                Ditto's.
                                >
                                >=====
                                >I Pledge Allegiance to the flag
                                >That appears on my Desktop Startup Screen.
                                >And to the Monopoly for which it Stands:
                                >One Operating System over all, inescapable,
                                >With Freedom and Privacy for none. -- Telecommando on /.

                                ---
                                http://mindspring.com/~benbradley
                              • Jud McCranie
                                Message 15 of 23 , Dec 28, 2002
                                  At 01:43 PM 12/28/2002 -0500, Ben Bradley wrote:

                                  > >Note, however, that it rarely makes sense to store tables of primes on
                                  > >the hard disk, as in most computer languages it's almost always
                                  > >just as fast to calculate them on the fly. Hard disks are slow compared
                                  > >to CPUs, on the whole.
                                  >
                                  > I've read that repeatedly over the years, but it didn't seem right to me.

                                  It is in Knuth, vol 3, but it isn't always true. It depends on the
                                  hardware, software, and how far you are going.


                                  > Sieving is the fastest method of finding large numbers of consecutive
                                  > primes, but still for a search range of N numbers, the time is (IIRC)
                                  > proportional to N^2.

                                  No, it isn't that bad - a little worse than linear.



                                  +---------------------------------------------------------+
                                  | Jud McCranie                                             |
                                  | Programming Achieved with Structure, Clarity, And Logic  |
                                  +---------------------------------------------------------+
                                • Peter Boos
                                  Message 16 of 23 , Dec 30, 2002
                                    Yep, right!

                                    My hardware isn't that great, it's a Pentium 70.
                                    So when I want to do something with a lot of numbers
                                    this costs me lots of time; for example, checking the
                                    first 100,000 primes takes quite a while.
                                    Changing a program a bit and then waiting again
                                    frustrates. So finally I made a prime algorithm in
                                    which I can use a file to start with and then, only if
                                    needed, continue finding primes by calculation.
                                    So just for representation I don't have to calculate
                                    a large amount of primes again; this saves me time.

                                    Now, since I deal with this prime number file, I just
                                    wonder how to store the primes with the least amount of disk
                                    space.

                                    I do have enough HD space for my goals, but it makes
                                    me wonder: when one starts to think about it, you
                                    might ask, for example, if I had x MB, how many primes
                                    could at most be stored in it?

                                    --- Jud McCranie <judmccr@...> wrote:
                                    > At 01:43 PM 12/28/2002 -0500, Ben Bradley wrote:
                                    >
                                    > > >Note, however, that it rarely makes sense to store tables of primes on
                                    > > >the hard disk, as in most computer languages it's almost always
                                    > > >just as fast to calculate them on the fly. Hard disks are slow compared
                                    > > >to CPUs, on the whole.
                                    > >
                                    > > I've read that repeatedly over the years, but it didn't seem right to me.
                                    >
                                    > It is in Knuth, vol 3, but it isn't always true. It depends on the
                                    > hardware, software, and how far you are going.
                                    >
                                    > > Sieving is the fastest method of finding large numbers of consecutive
                                    > > primes, but still for a search range of N numbers, the time is (IIRC)
                                    > > proportional to N^2.
                                    >
                                    > No, it isn't that bad - a little worse than linear.
                                    >
                                    > +---------------------------------------------------------+
                                    > | Jud McCranie                                             |
                                    > | Programming Achieved with Structure, Clarity, And Logic  |
                                    > +---------------------------------------------------------+


                                  • Jud McCranie
                                    Message 17 of 23 , Dec 30, 2002
                                      At 01:47 AM 12/30/2002 -0800, Peter Boos wrote:
                                      >Yep, right!
                                      >
                                      >My hardware isn't that great, it's a Pentium 70.
                                      >So when I want to do something with a lot of numbers
                                      >this costs me lots of time; for example, checking the
                                      >first 100,000 primes takes quite a while.

                                      It should take very little time to generate the first 100,000 primes with a
                                      sieve of Eratosthenes. Are you doing it a harder way?
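                                      For instance, a throwaway sieve like this (nothing optimised about it,
                                      and the limit is just hard-coded from the fact that the 100,000th prime
                                      is 1,299,709) runs in well under a second on anything vaguely modern:

                                      #include <stdio.h>
                                      #include <stdlib.h>

                                      #define LIMIT 1300000   /* the 100,000th prime is 1,299,709 */

                                      int main(void)
                                      {
                                          unsigned char *composite = calloc(LIMIT + 1, 1);
                                          if (!composite) return 1;
                                          long count = 0;
                                          for (long i = 2; i <= LIMIT; i++) {
                                              if (composite[i]) continue;
                                              count++;                        /* i is prime */
                                              for (long long j = (long long)i * i; j <= LIMIT; j += i)
                                                  composite[j] = 1;
                                          }
                                          printf("%ld primes below %d\n", count, LIMIT + 1);
                                          free(composite);
                                          return 0;
                                      }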


                                      >Now since i deal with this prime number file I just
                                      >wonder how to store them with the least amount of disk
                                      >space.

                                      I don't know what the absolute minimum space to store a list of primes is,
                                      but there will be a trade-off - the more compact the file, probably the
                                      more processing it is going to take to get the numbers out.

                                      If I'm going over 2^32 I generate them with a sieve. Less than 2^32 I
                                      usually read them from a file. I have three types of file: the primes as
                                      32-bit unsigned integers (the largest but easiest to deal with), the gaps
                                      between successive primes, and a bit-vector. The last two take less space
                                      and are good for various things, depending on what you need.




                                      +---------------------------------------------------------+
                                      | Jud McCranie                                             |
                                      | Programming Achieved with Structure, Clarity, And Logic  |
                                      +---------------------------------------------------------+


