Loading ...
Sorry, an error occurred while loading the content.

Calculate first 3.2 trillion primes in 39 hours on standard PC

Expand Messages
  • James Firth
    Hi, Forgive the intrusion as I m a physicist with an interest in efficient computing rather than a mathematician who has studied number theory. In order to
    Message 1 of 7 , Mar 4, 2013
    View Source
    • 0 Attachment
      Hi,

      Forgive the intrusion as I'm a physicist with an interest in efficient computing rather than a mathematician who has studied number theory.

      In order to benchmark system performance I've been playing with an algorithm I created to calculate primes and was surprised by the results and wondered how it compared to other methods and implementations.

      I calculated the first 3.2 trillion primes in 38 hours and 32 minutes on standard intel home PC (4-cores, 4GHz, Linux, mean memory usage using mean of ~300MB of memory).  In this pass I merely calculated pi(x) rather than output the data as I don't have 25 terabytes of disk space to hand.  My results for pi(x) are correct according to other published data for the number of primes up to 1x10^14.


      A little research after the fact shows my method is a multi-threaded variant on a sieve of Eratosthenes with an additional sieve to filter multiples of the first 8 integers coprime with 2.

      A bit of tinkering shows the algorithm scales reasonably well; calculation time increases by around 15% as x increases by 10-fold but memory usage creaps up to keep the algorithm efficient at high values of x.  Also there is a uint64 limit as I haven't used a big number library.

      Since this is my first stab I wondered what I could realistically aim for in terms of cycle time for finding primes with realtively low values of x (<1.8x10^19)?

      fdj
    • Jack Brennen
      Take a look here: http://code.google.com/p/primesieve/ Looks like you can extrapolate that implementation to get the first 3.2 trillion primes in about 5-6
      Message 2 of 7 , Mar 4, 2013
      View Source
      • 0 Attachment
        Take a look here: http://code.google.com/p/primesieve/

        Looks like you can extrapolate that implementation to get the first
        3.2 trillion primes in about 5-6 hours on an Intel i7-920, assuming
        sufficient RAM available.

        I'm not sure RAM is that much of a limiting factor though; I've
        written fast sieves that sieved in what I called "slices", and
        could easily do sieves of this size with much less RAM than today's
        machines commonly have available.

        Generally, as you chase speed, you end up tuning the algorithm
        parameters to minimize cache misses -- cache misses are the biggest
        impediment to speed.


        On 3/4/2013 2:20 PM, James Firth wrote:
        > Hi,
        >
        > Forgive the intrusion as I'm a physicist with an interest in efficient computing rather than a mathematician who has studied number theory.
        >
        > In order to benchmark system performance I've been playing with an algorithm I created to calculate primes and was surprised by the results and wondered how it compared to other methods and implementations.
        >
        > I calculated the first 3.2 trillion primes in 38 hours and 32 minutes on standard intel home PC (4-cores, 4GHz, Linux, mean memory usage using mean of ~300MB of memory). In this pass I merely calculated pi(x) rather than output the data as I don't have 25 terabytes of disk space to hand. My results for pi(x) are correct according to other published data for the number of primes up to 1x10^14.
        >
        >
        > A little research after the fact shows my method is a multi-threaded variant on a sieve of Eratosthenes with an additional sieve to filter multiples of the first 8 integers coprime with 2.
        >
        > A bit of tinkering shows the algorithm scales reasonably well; calculation time increases by around 15% as x increases by 10-fold but memory usage creaps up to keep the algorithm efficient at high values of x. Also there is a uint64 limit as I haven't used a big number library.
        >
        > Since this is my first stab I wondered what I could realistically aim for in terms of cycle time for finding primes with realtively low values of x (<1.8x10^19)?
        >
        > fdj
        >
        >
        > ------------------------------------
        >
        > Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
        > The Prime Pages : http://primes.utm.edu/
        >
        > Yahoo! Groups Links
        >
        >
        >
        >
        >
        >
      • James Firth
        Thanks Jack, I ll take a look. RAM becomes an issue in my implementation at least because as x gets large then if the sieve slice is too small one wastes
        Message 3 of 7 , Mar 4, 2013
        View Source
        • 0 Attachment
          Thanks Jack, I'll take a look.

          RAM becomes an issue in my implementation at least because as x gets large then if the sieve 'slice' is too small one wastes cycles looping through values of iy where there are no values of ix in range (ix, iy being cartesians on the sieve grid).

          Increase sieve slice size removes waste and improves efficiency by manyfold.

          I created a Javascript visualisation here:
          http://www.daltonfirth.co.uk/tuts/js/prime-finder/prime-finder.html

          jdf



          [Non-text portions of this message have been removed]
        • Phil Carmody
          ... I think the results of a willy-waving contest nearly a decade ago between Terje Mathisen and James Van Buskirk on c.l.a.x, concluded that about an order of
          Message 4 of 7 , Mar 5, 2013
          View Source
          • 0 Attachment
            Jack Brennen wrote:
            > Looks like you can extrapolate that implementation to get the first
            > 3.2 trillion primes in about 5-6 hours on an Intel i7-920, assuming
            > sufficient RAM available.

            I think the results of a willy-waving contest nearly a decade ago between Terje Mathisen and James Van Buskirk on c.l.a.x, concluded that about an order of magnitude faster than that speed should be possible. If you're James, that is. Tomas e Silva's sieve uses pretty much the same algorithm, but he hasn't optimised it quite as much. (James is an extreme optimiser, very few people can optimise as much!)
            http://www.ieeta.pt/~tos/software/prime_sieve.html

            Phil
          • Ben Buhrow
            ... Nearly two orders of magnitude is possible. YAFU and primesieve both compute pi(x) to 3.2e12 using the sieve of Eratosthenes in about 30 minutes on my
            Message 5 of 7 , Mar 6, 2013
            View Source
            • 0 Attachment
              --- In primenumbers@yahoogroups.com, Phil Carmody <thefatphil@...> wrote:
              >
              > Jack Brennen wrote:
              > > Looks like you can extrapolate that implementation to get the first
              > > 3.2 trillion primes in about 5-6 hours on an Intel i7-920, assuming
              > > sufficient RAM available.
              >
              > I think the results of a willy-waving contest nearly a decade ago between Terje Mathisen and James Van Buskirk on c.l.a.x, concluded that about an order of magnitude faster than that speed should be possible. If you're James, that is. Tomas e Silva's sieve uses pretty much the same algorithm, but he hasn't optimised it quite as much. (James is an extreme optimiser, very few people can optimise as much!)
              > http://www.ieeta.pt/~tos/software/prime_sieve.html
              >
              > Phil
              >

              Nearly two orders of magnitude is possible. YAFU and primesieve both compute pi(x) to 3.2e12 using the sieve of Eratosthenes in about 30 minutes on my machine (a pretty fast server). They are both threadable too - with 4 threads YAFU takes about 8 minutes.

              - ben.
            • Ben Buhrow
              ... Apologies, I was confusing pi(1e14) ~= 3.2e12 with pi(3.2e12). After running some experiments, it looks like the two programs I mentioned will take about
              Message 6 of 7 , Mar 6, 2013
              View Source
              • 0 Attachment
                --- In primenumbers@yahoogroups.com, "Ben Buhrow" <bbuhrow@...> wrote:

                >
                > Nearly two orders of magnitude is possible. YAFU and primesieve both compute pi(x) to 3.2e12 using the sieve of Eratosthenes in about 30 minutes on my machine (a pretty fast server). They are both threadable too - with 4 threads YAFU takes about 8 minutes.
                >
                > - ben.
                >

                Apologies, I was confusing pi(1e14) ~= 3.2e12 with pi(3.2e12). After running some experiments, it looks like the two programs I mentioned will take about 18 hrs for pi(1e14), so more like a factor of 2.
              • James Firth
                Thank you Ben. I have already optimised down to 18 hours for pi(1e14) however primesieve is consistently running at least 4 times faster on my machine. I am
                Message 7 of 7 , Mar 6, 2013
                View Source
                • 0 Attachment
                  Thank you Ben.

                  I have already optimised down to 18 hours for pi(1e14) however primesieve is consistently running at least 4 times faster on my machine.

                  I am aware that I might be using an odd method to stitch together my sieve segments.  Some Javascript here (when you click on the demo link) explains how I have calculated my boundaries:
                  http://www.daltonfirth.co.uk/tuts/js/prime-finder/prime-finder.html

                  I am not seeing the efficiency improvements the primesieve author gets when the sieve memory fits into the L2 cache; I will probably have to switch to Tomás Oliveira e Silva's bucket method.

                  I had a hunch my method, which allows starting on a prime's square for starting values greater than seg_start/start_val, might be efficient, but alas...

                  I might try improving the storage compression, I'm currently using modulo 30, but I doubt I'll find another 4x optimisation in my code.

                  JDF

                  --- In primenumbers@yahoogroups.com, "Ben Buhrow"  wrote:

                  >
                  > Nearly two orders of magnitude is possible.  YAFU and primesieve both compute pi(x) to 3.2e12 using the sieve of Eratosthenes in about 30 minutes on my machine (a pretty fast server).  They are both threadable too - with 4 threads YAFU takes about 8 minutes.
                  >
                  > - ben.
                  >

                  Apologies, I was confusing pi(1e14) ~= 3.2e12 with pi(3.2e12).  After running some experiments, it looks like the two programs I mentioned will take about 18 hrs for pi(1e14), so more like a factor of 2.
                Your message has been successfully submitted and would be delivered to recipients shortly.