## 14491RE: [cosmacelf] 23 and counting with much improved division

Expand Messages
• Oct 11, 2013
Thanks very much David.  I had not thought to look at that.

I am currently experimenting with yet another division algorithm that is quicker for smaller operands but worse for larger numbers of bits(and negative numbers).  For the division in dhrystone the operands are generally both less than ten and this cuts the calculation down to under 100 instructions(!).  It's a bit cheat-y but the worst cases are under 450 instructions so I may leave this as the default algorithm.  It's a bit messy to read my code but the paper ref'd in the credits is clear. Interestingly, I did not try to leave any of the operands on the stack since they all change during the calculation but I note that your non-restoring algorithm does leave one on the stack even though it gets shifted.  I may see if that would help here.

The multiplication algorithm follows and again it's at its best with small positive numbers which is pretty real-world friendly.  It's much simpler and shorter than the division but the source code link is given for reference:

;this is a divisor shifting algorithm which is faster for smaller operands
;credit to http://research.microsoft.com/pubs/70645/tr-2008-141.pdf
_divu2: ;retval=regarg1(x or dividend)/regarg2(y or divisor)
;uses R14(rwork) as temp, R10 to hold remainder, R11 to save original divisor y0
testsub: macro reg1,reg2 ;test subtraction of reg2 from reg1
;result in D, rwork.hi, DF
glo reg2
str sp
glo reg1
sm
phi rwork
ghi reg2
str sp
ghi reg1
smb
endm
cpy2 R10,regarg1
ld2z retval
cpy2 R11,regarg2
dec sp
testsub regarg1,regarg2
bnf \$\$computequot ;DF=0 means it didn't fit
phi regarg1 ;regarg1=regarg1-regarg2
ghi rwork
plo regarg1
\$\$again: ;this is the divisor doubling phase
testsub regarg1,regarg2
bnf \$\$computequot ;df=0 means it didn't fit
phi regarg1 ;regarg1=regarg1-regarg2
ghi rwork
plo regarg1
shl2 regarg2 ;y=y+y
br \$\$again

\$\$computequot: ;here we're computing the quotient
testsub R10,regarg2
bnf \$\$testexit
phi R10 ;complete the subtraction
ghi rwork
plo R10
inc retval
\$\$testexit:
ghi R11
sm ;top of regarg2 is still on stack
bnz \$\$ney0y
glo regarg2
str sp
glo R11
sm ;test low order bytes
bz \$\$out ;if = we're done
\$\$ney0y:
shl2 retval ;double quotient
shrU2 regarg2 ;halve divisor
br \$\$computequot ;continue
\$\$out:
inc sp ;release work area
;here the quotient is in retval, remainder in R10
cpy2 regarg2,R10
cretn ;and we're done

;16 bit right shifting multiply which is faster for smaller operands
; credit to http://map.grauw.nl/articles/mult_div_shifts.php#lrmultr
_mulU2: ;retval(product)=regarg1(multiplicand)*regarg2(multiplier)
ld2z retval
\$\$mulrlp:
shru2 regarg2
; bottom bit of multiplier is one so add multiplicand to product
ghi regarg2 ;check for all bits of multiplier shifted out
bnz \$\$mulrshft ;nope, continue
glo regarg2 ;check bottom byte
bz \$\$mulrdone
\$\$mulrshft:
shl2 regarg1 ;shift multiplicand left 1
br \$\$mulrlp
\$\$mulrdone: ;here the product is in retval
cretn

> To: cosmacelf@yahoogroups.com
> From: david.schultz@...
> Date: Thu, 10 Oct 2013 19:01:51 -0500
> Subject: Re: [cosmacelf] 23 and counting with much improved division
>
> On 10/08/2013 10:26 AM, bill rowe wrote:
> > The Rhinestone Edition of LCC1802 has now soared past the 23
> > Dhrystones/second mark with a score of 23.75. The most recent big
> > improvement came from replacing the 16 bit division routine. I've been
> > using routines I cribbed from Ted Rossin's C compiler(thanks Ted!) but I
> > never *really* understood the algorithms. I had a hard hard look at the
> > division routine and re-did it to make better use of the 1802. The
> > routine repeatedly subtracts the divisor from a shifted dividend so I
> > was able to leave the divisor in storage rather than repeatedly sourcing
> > it from a register and storing it each time - the inspiration came from
> > looking at a 6502 implementation. Division is still a painful 399
> > instructions excluding call/return overhead but that's down from 588 the
> > old way. There's still some register shuffling at the end that can
> > maybe be cleaned up but this is a major win.
>
> This tickled a memory so I dug into the archive and found the code for
> 32 bit non-restoring division that I sent you.
>
> After hacking it down to 16 bits I compared it to your code.
>
> The shift is effectively identical:
>
> dloop: inc TEMPP ; Adjust index to point at LSB
>
> glo Quo ; shift quotient left
> shlc
> plo Quo
> ghi Quo
> shlc
> phi Quo
>
> ldn TEMPP ; shift remainder
> shlc
> stxd
> ldn TEMPP
> shlc
> str TEMPP
>
> inc TEMPP ; back to LSB
>
> as is the subtraction:
>
> glo Div ; do subtraction
> sd
> stxd
> ghi Div
> sdb
> str TEMPP
>
> The difference is that this is a non-restoring algorithm. Instead of
> performing the subtraction and then backing out of it, the NR algorithm
> performs either an addition or a subtraction at each step. The code for
> this is:
>
> glo Quo ; check lsb
> ani 1
>
> Plus an additional branch at the end of the subtraction code.
>
> Add three more instructions for the loop counter and that is 24
> instructions/loop if a subtraction is performed (because of the extra
> branch) or 23 if addition is used. Call it 23.5. 16 loops gives 376
> instructions. Plus some more at the end to shift in the last bit of the
> quotient (6) and to correct the remainder as required. 3 to check plus 6
> more to correct if needed. For a total of 388 instructions on average.
>
> Your code uses 23 instructions/loop if the restore can be skipped and 27
> if it can't. Call it 25/loop * 16 is 400.
>
> So the non-restoring version is slightly faster.
>
> After looking at the two versions for a bit I think I can shave another
> instruction per loop off that by reversing the subtract operation. Thus
> saving one instruction to adjust the pointer. Make that 372 instructions
> on average.
>
>
> --
> David W. Schultz
> Returned for Regrooving
>
>
>
>
> ------------------------------------
>
> ========================================================
> Questions? Check the FAQ at http://www.cosmacelf.com/forumfaq.html
> Visit the COSMAC ELF website at http://www.cosmacelf.comYahoo! Groups Links
>
> <*> To visit your group on the web, go to:
> http://groups.yahoo.com/group/cosmacelf/
>
>
> <*> To change settings online go to:
> http://groups.yahoo.com/group/cosmacelf/join
> (Yahoo! ID required)
>
> <*> To change settings via email:
> cosmacelf-digest@yahoogroups.com
> cosmacelf-fullfeatured@yahoogroups.com
>
> <*> To unsubscribe from this group, send an email to:
> cosmacelf-unsubscribe@yahoogroups.com
>
> <*> Your use of Yahoo! Groups is subject to:
> http://info.yahoo.com/legal/us/yahoo/utos/terms/
>
• Show all 7 messages in this topic