## single-->double multiply?

Expand Messages
• Let a and b be 64-bit unsigned integers. Suppose I want to compute a*b which has 128 bits and stick the lower & upper halves into two 64-bit integers F,G. How
Message 1 of 7 , Jan 4, 2012
• 0 Attachment
Let a and b be 64-bit unsigned integers.

Suppose I want to compute a*b
which has 128 bits and stick the lower & upper halves into two 64-bit integers F,G.

How do I do it? Is there a compiler builtin that will access hardware capability
to do this?
• Something like this should work for x86-64 gcc (all variables are uint64_t): asm( mulq %3 : =a (F), =d (G) : 0 (a), g (b)); That should multiply a*b
Message 2 of 7 , Jan 4, 2012
• 0 Attachment
Something like this should work for x86-64 gcc (all variables
are uint64_t):

asm("mulq %3" :
"=a" (F), "=d" (G) :
"0" (a), "g" (b));

That should multiply a*b and put the 128-bit result into G (high half)
and F (low half).

It won't work on any other architecture or compiler though...

On 1/4/2012 12:13 PM, WarrenS wrote:
> Let a and b be 64-bit unsigned integers.
>
> Suppose I want to compute a*b
> which has 128 bits and stick the lower& upper halves into two 64-bit integers F,G.
>
> How do I do it? Is there a compiler builtin that will access hardware capability
> to do this?
>
>
>
>
>
> ------------------------------------
>
> Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
> The Prime Pages : http://www.primepages.org/
>
>
>
>
>
>
• Slight fix and optimization: asm( mulq %3 : =a (F), =d (G) : %0 (a), rm (b)); Using %0 allows the compiler to swap the position of a and b if that s
Message 3 of 7 , Jan 4, 2012
• 0 Attachment
Slight fix and optimization:

asm("mulq %3" :
"=a" (F), "=d" (G) :
"%0" (a), "rm" (b));

Using "%0" allows the compiler to swap the position of a and b
if that's more efficient. Using "rm" instead of "g" prohibits
the compiler from trying to multiply by an immediate value if
b happens to be a constant.

On 1/4/2012 12:54 PM, Jack Brennen wrote:
> Something like this should work for x86-64 gcc (all variables
> are uint64_t):
>
> asm("mulq %3" :
> "=a" (F), "=d" (G) :
> "0" (a), "g" (b));
>
> That should multiply a*b and put the 128-bit result into G (high half)
> and F (low half).
>
> It won't work on any other architecture or compiler though...
>
>
>
> On 1/4/2012 12:13 PM, WarrenS wrote:
>> Let a and b be 64-bit unsigned integers.
>>
>> Suppose I want to compute a*b
>> which has 128 bits and stick the lower& upper halves into two 64-bit integers F,G.
>>
>> How do I do it? Is there a compiler builtin that will access hardware capability
>> to do this?
>>
>>
>>
>>
>>
>> ------------------------------------
>>
>> Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
>> The Prime Pages : http://www.primepages.org/
>>
>>
>>
>>
>>
>>
>
>
>
> ------------------------------------
>
> Unsubscribe by an email to: primenumbers-unsubscribe@yahoogroups.com
> The Prime Pages : http://www.primepages.org/
>
>
>
>
>
>
• Regarding the use of the multiply, I concur with exploiting the hardware wherever possible. There is also a technique which is hardware & compiler portable
Message 4 of 7 , Jan 4, 2012
• 0 Attachment
Regarding the use of the multiply, I concur with exploiting the
hardware wherever possible.

There is also a technique which is hardware & compiler portable if not
using gcc on the x86 platform (with or w/o sse)

We all know it as 'Montgomery Math', named after mathematician Peter
Montgomery (Microsoft Research)

Not only does the technique allow portability (even to a GPU where 128
bits is not available in the SDK),
but 128 bit math on a 32 or 64 bit OS w/ or w/o GCC.

but you will also find 'Montgomery Modular Reduction'. This allows 32
-> 64 -> 128 bit and variations on it

for just about anything. The 'cost' of this is the extra step to add
the carry bit during the multiply IFF it is set.

It's a great way to handle modular reductions and/or multiplies when
searching for prime numbers.

Please do look it up and I think you'll be pleased at the versatility of
the techniques credited to Peter.

Please pay attention to the 'high half' and 'low half' and you'll find
even a 16 bit machine can do it if you overlay
the union {} / struct {} definitions and keep them properly aligned.
a * b (both 64 bit) can be done in 3 steps on any hardware / OS
even if you want that 128 bit result.

We use it all the time. I wish I could give you an example of working
code, but don't have any here to paste (C code).

Chuck

-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.1901 / Virus Database: 2109/4722 - Release Date: 01/04/12
• Related question: If we have two 64-bit unsigned integers a,b and wish to divide the double-length a*2^64 + b by a single 64-bit unsigned integer c to get
Message 5 of 7 , Jan 14, 2012
• 0 Attachment
Related question:

If we have two 64-bit unsigned integers a,b and wish to divide the
double-length
a*2^64 + b
by a single 64-bit unsigned integer c to get single-length
quotient=q and remainder=r...
how do we do it using hardware?

And what happens if the quotient q does not fit in a single-length?
• sorry, I m not much of a programmer. Last question re multiply, got a nice answer from Brennan saying how to within C program, to call the assembler to do the
Message 6 of 7 , Jan 14, 2012
• 0 Attachment
sorry, I'm not much of a programmer. Last question re multiply, got a nice
answer from Brennan saying how to within C program, to call the assembler
to do the job.

I assume there is similar answer here.

You did that, but did not explain how to do it in C. In C you need to
speak both C and assembler and understand how to get them to talk, but yes.

Really if the C compiler writers had a smidgen of brains, they'd have
supplied a hook so that everybody could easily do these things without
needing to deal with assembler themselves.

[Non-text portions of this message have been removed]
• ... What kind of hardware would you like to use? It s pretty easy to do it with almost any FPGA but if you want an ASIC it s likely to cost you serious money.
Message 7 of 7 , Jan 14, 2012
• 0 Attachment
> how do we do it using hardware?

What kind of hardware would you like to use? It's pretty easy to do it
with almost any FPGA but if you want an ASIC it's likely to cost you
serious money. It's also pretty easy to do with an abacus and an
instruction manual, but likely to be rather slow.

In other words, what's your real question?

Paul
Your message has been successfully submitted and would be delivered to recipients shortly.