Loading ...
Sorry, an error occurred while loading the content.

Re: [PrimeNumbers] single-->double multiply?

Expand Messages
  • Jack Brennen
    Slight fix and optimization: asm( mulq %3 : =a (F), =d (G) : %0 (a), rm (b)); Using %0 allows the compiler to swap the position of a and b if that s
    Message 1 of 7 , Jan 4, 2012
    • 0 Attachment
      Slight fix and optimization:

      asm("mulq %3" :
      "=a" (F), "=d" (G) :
      "%0" (a), "rm" (b));

      Using "%0" allows the compiler to swap the position of a and b
      if that's more efficient. Using "rm" instead of "g" prohibits
      the compiler from trying to multiply by an immediate value if
      b happens to be a constant.




      On 1/4/2012 12:54 PM, Jack Brennen wrote:
      > Something like this should work for x86-64 gcc (all variables
      > are uint64_t):
      >
      > asm("mulq %3" :
      > "=a" (F), "=d" (G) :
      > "0" (a), "g" (b));
      >
      > That should multiply a*b and put the 128-bit result into G (high half)
      > and F (low half).
      >
      > It won't work on any other architecture or compiler though...
      >
      >
      >
      > On 1/4/2012 12:13 PM, WarrenS wrote:
      >> Let a and b be 64-bit unsigned integers.
      >>
      >> Suppose I want to compute a*b
      >> which has 128 bits and stick the lower& upper halves into two 64-bit integers F,G.
      >>
      >> How do I do it? Is there a compiler builtin that will access hardware capability
      >> to do this?
      >>
      >>
      >>
      >>
      >>
      >> ------------------------------------
      >>
      >> Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
      >> The Prime Pages : http://www.primepages.org/
      >>
      >> Yahoo! Groups Links
      >>
      >>
      >>
      >>
      >>
      >
      >
      >
      > ------------------------------------
      >
      > Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
      > The Prime Pages : http://www.primepages.org/
      >
      > Yahoo! Groups Links
      >
      >
      >
      >
      >
    • Chuck Lasher
      Regarding the use of the multiply, I concur with exploiting the hardware wherever possible. There is also a technique which is hardware & compiler portable
      Message 2 of 7 , Jan 4, 2012
      • 0 Attachment
        Regarding the use of the multiply, I concur with exploiting the
        hardware wherever possible.

        There is also a technique which is hardware & compiler portable if not
        using gcc on the x86 platform (with or w/o sse)

        We all know it as 'Montgomery Math', named after mathematician Peter
        Montgomery (Microsoft Research)

        Not only does the technique allow portability (even to a GPU where 128
        bits is not available in the SDK),
        but 128 bit math on a 32 or 64 bit OS w/ or w/o GCC.

        but you will also find 'Montgomery Modular Reduction'. This allows 32
        -> 64 -> 128 bit and variations on it

        for just about anything. The 'cost' of this is the extra step to add
        the carry bit during the multiply IFF it is set.

        It's a great way to handle modular reductions and/or multiplies when
        searching for prime numbers.


        Please do look it up and I think you'll be pleased at the versatility of
        the techniques credited to Peter.

        Please pay attention to the 'high half' and 'low half' and you'll find
        even a 16 bit machine can do it if you overlay
        the union {} / struct {} definitions and keep them properly aligned.
        a * b (both 64 bit) can be done in 3 steps on any hardware / OS
        even if you want that 128 bit result.

        We use it all the time. I wish I could give you an example of working
        code, but don't have any here to paste (C code).

        Chuck


        -----
        No virus found in this message.
        Checked by AVG - www.avg.com
        Version: 2012.0.1901 / Virus Database: 2109/4722 - Release Date: 01/04/12
      • WarrenS
        Related question: If we have two 64-bit unsigned integers a,b and wish to divide the double-length a*2^64 + b by a single 64-bit unsigned integer c to get
        Message 3 of 7 , Jan 14, 2012
        • 0 Attachment
          Related question:

          If we have two 64-bit unsigned integers a,b and wish to divide the
          double-length
          a*2^64 + b
          by a single 64-bit unsigned integer c to get single-length
          quotient=q and remainder=r...
          how do we do it using hardware?

          And what happens if the quotient q does not fit in a single-length?
        • Warren Smith
          sorry, I m not much of a programmer. Last question re multiply, got a nice answer from Brennan saying how to within C program, to call the assembler to do the
          Message 4 of 7 , Jan 14, 2012
          • 0 Attachment
            sorry, I'm not much of a programmer. Last question re multiply, got a nice
            answer from Brennan saying how to within C program, to call the assembler
            to do the job.

            I assume there is similar answer here.

            You did that, but did not explain how to do it in C. In C you need to
            speak both C and assembler and understand how to get them to talk, but yes.

            Really if the C compiler writers had a smidgen of brains, they'd have
            supplied a hook so that everybody could easily do these things without
            needing to deal with assembler themselves.


            [Non-text portions of this message have been removed]
          • Paul Leyland
            ... What kind of hardware would you like to use? It s pretty easy to do it with almost any FPGA but if you want an ASIC it s likely to cost you serious money.
            Message 5 of 7 , Jan 14, 2012
            • 0 Attachment
              > how do we do it using hardware?

              What kind of hardware would you like to use? It's pretty easy to do it
              with almost any FPGA but if you want an ASIC it's likely to cost you
              serious money. It's also pretty easy to do with an abacus and an
              instruction manual, but likely to be rather slow.

              In other words, what's your real question?

              Paul
            Your message has been successfully submitted and would be delivered to recipients shortly.