;16 bit right shifting multiply which is faster for smaller operands
; credit to http://map.grauw.nl/articles/mult_div_shifts.php#lrmultr
; bottom bit of multiplier is one so add multiplicand to product
ghi regarg2 ;check for all bits of multiplier shifted out
bnz $$mulrshft ;nope, continue
glo regarg2 ;check bottom byte
shl2 regarg1 ;shift multiplicand left 1
$$mulrdone: ;here the product is in retval
Date: Sun, 13 Oct 2013 11:25:27 -0500
Subject: Re: [cosmacelf] 23 and counting with much improved division
Some musings on multiplication.
My preferred algorithm does a 16X16 multiply and returns a 32 bit
result. But that really isn't required for a C compiler which always
throws away the upper half of the result. That has some consequences.
By trading a loop counter which costs 3 instructions for a test to see
if the multiplier is zero that costs 4 instructions you get faster
execution in the general case. The reason is simple. A multiply of
256X256 overflows the 16 bit result. The results of an overflow are not
specified so it doesn't matter what happens.
By adding an extra test you can make sure that you are as fast as
possible. The key is that you want the smaller number to be the
multiplier since this controls the number of loop iterations. Check it
at the beginning and swap the operands as required. Once you do the swap
the multiplier is guaranteed to be less than 256 unless the programmer
has passed arguments that result in an overflow. In which case who cares?
The result is that the number of loop iterations is never more than 8.
If you don't like the cost of a full blown compare, just test the upper
byte of the multiplier. If it is zero, proceed as usual. If not, swap
the multiplier and multiplicand. This will be slower in some cases where
both are less than 256 and differ by factor of 2 or more.
David W. Schultz
Returned for Regrooving