Loading ...
Sorry, an error occurred while loading the content.

single-->double multiply?

Expand Messages
  • WarrenS
    Let a and b be 64-bit unsigned integers. Suppose I want to compute a*b which has 128 bits and stick the lower & upper halves into two 64-bit integers F,G. How
    Message 1 of 7 , Jan 4, 2012
    • 0 Attachment
      Let a and b be 64-bit unsigned integers.

      Suppose I want to compute a*b
      which has 128 bits and stick the lower & upper halves into two 64-bit integers F,G.

      How do I do it? Is there a compiler builtin that will access hardware capability
      to do this?
    • Jack Brennen
      Something like this should work for x86-64 gcc (all variables are uint64_t): asm( mulq %3 : =a (F), =d (G) : 0 (a), g (b)); That should multiply a*b
      Message 2 of 7 , Jan 4, 2012
      • 0 Attachment
        Something like this should work for x86-64 gcc (all variables
        are uint64_t):

        asm("mulq %3" :
        "=a" (F), "=d" (G) :
        "0" (a), "g" (b));

        That should multiply a*b and put the 128-bit result into G (high half)
        and F (low half).

        It won't work on any other architecture or compiler though...



        On 1/4/2012 12:13 PM, WarrenS wrote:
        > Let a and b be 64-bit unsigned integers.
        >
        > Suppose I want to compute a*b
        > which has 128 bits and stick the lower& upper halves into two 64-bit integers F,G.
        >
        > How do I do it? Is there a compiler builtin that will access hardware capability
        > to do this?
        >
        >
        >
        >
        >
        > ------------------------------------
        >
        > Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
        > The Prime Pages : http://www.primepages.org/
        >
        > Yahoo! Groups Links
        >
        >
        >
        >
        >
      • Jack Brennen
        Slight fix and optimization: asm( mulq %3 : =a (F), =d (G) : %0 (a), rm (b)); Using %0 allows the compiler to swap the position of a and b if that s
        Message 3 of 7 , Jan 4, 2012
        • 0 Attachment
          Slight fix and optimization:

          asm("mulq %3" :
          "=a" (F), "=d" (G) :
          "%0" (a), "rm" (b));

          Using "%0" allows the compiler to swap the position of a and b
          if that's more efficient. Using "rm" instead of "g" prohibits
          the compiler from trying to multiply by an immediate value if
          b happens to be a constant.




          On 1/4/2012 12:54 PM, Jack Brennen wrote:
          > Something like this should work for x86-64 gcc (all variables
          > are uint64_t):
          >
          > asm("mulq %3" :
          > "=a" (F), "=d" (G) :
          > "0" (a), "g" (b));
          >
          > That should multiply a*b and put the 128-bit result into G (high half)
          > and F (low half).
          >
          > It won't work on any other architecture or compiler though...
          >
          >
          >
          > On 1/4/2012 12:13 PM, WarrenS wrote:
          >> Let a and b be 64-bit unsigned integers.
          >>
          >> Suppose I want to compute a*b
          >> which has 128 bits and stick the lower& upper halves into two 64-bit integers F,G.
          >>
          >> How do I do it? Is there a compiler builtin that will access hardware capability
          >> to do this?
          >>
          >>
          >>
          >>
          >>
          >> ------------------------------------
          >>
          >> Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
          >> The Prime Pages : http://www.primepages.org/
          >>
          >> Yahoo! Groups Links
          >>
          >>
          >>
          >>
          >>
          >
          >
          >
          > ------------------------------------
          >
          > Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
          > The Prime Pages : http://www.primepages.org/
          >
          > Yahoo! Groups Links
          >
          >
          >
          >
          >
        • Chuck Lasher
          Regarding the use of the multiply, I concur with exploiting the hardware wherever possible. There is also a technique which is hardware & compiler portable
          Message 4 of 7 , Jan 4, 2012
          • 0 Attachment
            Regarding the use of the multiply, I concur with exploiting the
            hardware wherever possible.

            There is also a technique which is hardware & compiler portable if not
            using gcc on the x86 platform (with or w/o sse)

            We all know it as 'Montgomery Math', named after mathematician Peter
            Montgomery (Microsoft Research)

            Not only does the technique allow portability (even to a GPU where 128
            bits is not available in the SDK),
            but 128 bit math on a 32 or 64 bit OS w/ or w/o GCC.

            but you will also find 'Montgomery Modular Reduction'. This allows 32
            -> 64 -> 128 bit and variations on it

            for just about anything. The 'cost' of this is the extra step to add
            the carry bit during the multiply IFF it is set.

            It's a great way to handle modular reductions and/or multiplies when
            searching for prime numbers.


            Please do look it up and I think you'll be pleased at the versatility of
            the techniques credited to Peter.

            Please pay attention to the 'high half' and 'low half' and you'll find
            even a 16 bit machine can do it if you overlay
            the union {} / struct {} definitions and keep them properly aligned.
            a * b (both 64 bit) can be done in 3 steps on any hardware / OS
            even if you want that 128 bit result.

            We use it all the time. I wish I could give you an example of working
            code, but don't have any here to paste (C code).

            Chuck


            -----
            No virus found in this message.
            Checked by AVG - www.avg.com
            Version: 2012.0.1901 / Virus Database: 2109/4722 - Release Date: 01/04/12
          • WarrenS
            Related question: If we have two 64-bit unsigned integers a,b and wish to divide the double-length a*2^64 + b by a single 64-bit unsigned integer c to get
            Message 5 of 7 , Jan 14, 2012
            • 0 Attachment
              Related question:

              If we have two 64-bit unsigned integers a,b and wish to divide the
              double-length
              a*2^64 + b
              by a single 64-bit unsigned integer c to get single-length
              quotient=q and remainder=r...
              how do we do it using hardware?

              And what happens if the quotient q does not fit in a single-length?
            • Warren Smith
              sorry, I m not much of a programmer. Last question re multiply, got a nice answer from Brennan saying how to within C program, to call the assembler to do the
              Message 6 of 7 , Jan 14, 2012
              • 0 Attachment
                sorry, I'm not much of a programmer. Last question re multiply, got a nice
                answer from Brennan saying how to within C program, to call the assembler
                to do the job.

                I assume there is similar answer here.

                You did that, but did not explain how to do it in C. In C you need to
                speak both C and assembler and understand how to get them to talk, but yes.

                Really if the C compiler writers had a smidgen of brains, they'd have
                supplied a hook so that everybody could easily do these things without
                needing to deal with assembler themselves.


                [Non-text portions of this message have been removed]
              • Paul Leyland
                ... What kind of hardware would you like to use? It s pretty easy to do it with almost any FPGA but if you want an ASIC it s likely to cost you serious money.
                Message 7 of 7 , Jan 14, 2012
                • 0 Attachment
                  > how do we do it using hardware?

                  What kind of hardware would you like to use? It's pretty easy to do it
                  with almost any FPGA but if you want an ASIC it's likely to cost you
                  serious money. It's also pretty easy to do with an abacus and an
                  instruction manual, but likely to be rather slow.

                  In other words, what's your real question?

                  Paul
                Your message has been successfully submitted and would be delivered to recipients shortly.