Loading ...
Sorry, an error occurred while loading the content.

14493RE: [cosmacelf] 23 and counting with much improved division

Expand Messages
  • bill rowe
    Oct 13, 2013
    • 0 Attachment
      David: Thanks for the thoughts.  My current mult routine is below.  It doesn't have an explicit loop counter but it does perform better if the multiplier is smaller.  I could implement the swap in the preamble if, say, the top of the multiplier is not zero.  That seems less likely to make things worse.  

      I do note, looking at the code again, that I could shed a couple of instructions by pulling the 1st multiplier shift up out of the loop so that i either initialize the product with zero or the multiplicand.  I'll think about it.

      ;16 bit right shifting multiply which is faster for smaller operands
      ; credit to http://map.grauw.nl/articles/mult_div_shifts.php#lrmultr

      _mulU2: ;retval(product)=regarg1(multiplicand)*regarg2(multiplier)
            ld2z retval
            shru2 regarg2
            bnf $$mulrnoadd
      ; bottom bit of multiplier is one so add multiplicand to product
            alu2 retval,retval,regarg1,add,adc
            ghi regarg2 ;check for all bits of multiplier shifted out
            bnz $$mulrshft ;nope, continue
            glo regarg2 ;check bottom byte
            bz $$mulrdone
            shl2 regarg1 ;shift multiplicand left 1
            br $$mulrlp
      $$mulrdone: ;here the product is in retval

      To: cosmacelf@yahoogroups.com
      From: david.schultz@...
      Date: Sun, 13 Oct 2013 11:25:27 -0500
      Subject: Re: [cosmacelf] 23 and counting with much improved division

      Some musings on multiplication.

      My preferred algorithm does a 16X16 multiply and returns a 32 bit
      result. But that really isn't required for a C compiler which always
      throws away the upper half of the result. That has some consequences.

      By trading a loop counter which costs 3 instructions for a test to see
      if the multiplier is zero that costs 4 instructions you get faster
      execution in the general case. The reason is simple. A multiply of
      256X256 overflows the 16 bit result. The results of an overflow are not
      specified so it doesn't matter what happens.

      By adding an extra test you can make sure that you are as fast as
      possible. The key is that you want the smaller number to be the
      multiplier since this controls the number of loop iterations. Check it
      at the beginning and swap the operands as required. Once you do the swap
      the multiplier is guaranteed to be less than 256 unless the programmer
      has passed arguments that result in an overflow. In which case who cares?

      The result is that the number of loop iterations is never more than 8.

      If you don't like the cost of a full blown compare, just test the upper
      byte of the multiplier. If it is zero, proceed as usual. If not, swap
      the multiplier and multiplicand. This will be slower in some cases where
      both are less than 256 and differ by factor of 2 or more.

      David W. Schultz
      Returned for Regrooving

    • Show all 7 messages in this topic