- Oct 11, 2013Thanks very much David. I had not thought to look at that.I am currently experimenting with yet another division algorithm that is quicker for smaller operands but worse for larger numbers of bits(and negative numbers). For the division in dhrystone the operands are generally both less than ten and this cuts the calculation down to under 100 instructions(!). It's a bit cheat-y but the worst cases are under 450 instructions so I may leave this as the default algorithm. It's a bit messy to read my code but the paper ref'd in the credits is clear. Interestingly, I did not try to leave any of the operands on the stack since they all change during the calculation but I note that your non-restoring algorithm does leave one on the stack even though it gets shifted. I may see if that would help here.The multiplication algorithm follows and again it's at its best with small positive numbers which is pretty real-world friendly. It's much simpler and shorter than the division but the source code link is given for reference:;this is a divisor shifting algorithm which is faster for smaller operands;credit to http://research.microsoft.com/pubs/70645/tr-2008-141.pdf_divu2: ;retval=regarg1(x or dividend)/regarg2(y or divisor);uses R14(rwork) as temp, R10 to hold remainder, R11 to save original divisor y0testsub: macro reg1,reg2 ;test subtraction of reg2 from reg1;result in D, rwork.hi, DFglo reg2str spglo reg1smphi rworkghi reg2str spghi reg1smbendmcpy2 R10,regarg1ld2z retvalcpy2 R11,regarg2dec sptestsub regarg1,regarg2bnf $$computequot ;DF=0 means it didn't fitphi regarg1 ;regarg1=regarg1-regarg2ghi rworkplo regarg1$$again: ;this is the divisor doubling phasetestsub regarg1,regarg2bnf $$computequot ;df=0 means it didn't fitphi regarg1 ;regarg1=regarg1-regarg2ghi rworkplo regarg1shl2 regarg2 ;y=y+ybr $$again$$computequot: ;here we're computing the quotienttestsub R10,regarg2bnf $$testexitphi R10 ;complete the subtractionghi rworkplo R10inc retval$$testexit:ghi R11sm ;top of regarg2 is still on stackbnz $$ney0yglo regarg2str spglo R11sm ;test low order bytesbz $$out ;if = we're done$$ney0y:shl2 retval ;double quotientshrU2 regarg2 ;halve divisorbr $$computequot ;continue$$out:inc sp ;release work area;here the quotient is in retval, remainder in R10cpy2 regarg2,R10cretn ;and we're done;16 bit right shifting multiply which is faster for smaller operands; credit to http://map.grauw.nl/articles/mult_div_shifts.php#lrmultr_mulU2: ;retval(product)=regarg1(multiplicand)*regarg2(multiplier)ld2z retval$$mulrlp:shru2 regarg2bnf $$mulrnoadd; bottom bit of multiplier is one so add multiplicand to productalu2 retval,retval,regarg1,add,adc$$mulrnoadd:ghi regarg2 ;check for all bits of multiplier shifted outbnz $$mulrshft ;nope, continueglo regarg2 ;check bottom bytebz $$mulrdone$$mulrshft:shl2 regarg1 ;shift multiplicand left 1br $$mulrlp$$mulrdone: ;here the product is in retvalcretn> To: cosmacelf@yahoogroups.com

> From: david.schultz@...

> Date: Thu, 10 Oct 2013 19:01:51 -0500

> Subject: Re: [cosmacelf] 23 and counting with much improved division

>

> On 10/08/2013 10:26 AM, bill rowe wrote:

> > The Rhinestone Edition of LCC1802 has now soared past the 23

> > Dhrystones/second mark with a score of 23.75. The most recent big

> > improvement came from replacing the 16 bit division routine. I've been

> > using routines I cribbed from Ted Rossin's C compiler(thanks Ted!) but I

> > never *really* understood the algorithms. I had a hard hard look at the

> > division routine and re-did it to make better use of the 1802. The

> > routine repeatedly subtracts the divisor from a shifted dividend so I

> > was able to leave the divisor in storage rather than repeatedly sourcing

> > it from a register and storing it each time - the inspiration came from

> > looking at a 6502 implementation. Division is still a painful 399

> > instructions excluding call/return overhead but that's down from 588 the

> > old way. There's still some register shuffling at the end that can

> > maybe be cleaned up but this is a major win.

>

> This tickled a memory so I dug into the archive and found the code for

> 32 bit non-restoring division that I sent you.

>

> After hacking it down to 16 bits I compared it to your code.

>

> The shift is effectively identical:

>

> dloop: inc TEMPP ; Adjust index to point at LSB

>

> glo Quo ; shift quotient left

> shlc

> plo Quo

> ghi Quo

> shlc

> phi Quo

>

> ldn TEMPP ; shift remainder

> shlc

> stxd

> ldn TEMPP

> shlc

> str TEMPP

>

> inc TEMPP ; back to LSB

>

> as is the subtraction:

>

> glo Div ; do subtraction

> sd

> stxd

> ghi Div

> sdb

> str TEMPP

>

> The difference is that this is a non-restoring algorithm. Instead of

> performing the subtraction and then backing out of it, the NR algorithm

> performs either an addition or a subtraction at each step. The code for

> this is:

>

> glo Quo ; check lsb

> ani 1

> bdf doadd

>

> Plus an additional branch at the end of the subtraction code.

>

> Add three more instructions for the loop counter and that is 24

> instructions/loop if a subtraction is performed (because of the extra

> branch) or 23 if addition is used. Call it 23.5. 16 loops gives 376

> instructions. Plus some more at the end to shift in the last bit of the

> quotient (6) and to correct the remainder as required. 3 to check plus 6

> more to correct if needed. For a total of 388 instructions on average.

>

> Your code uses 23 instructions/loop if the restore can be skipped and 27

> if it can't. Call it 25/loop * 16 is 400.

>

> So the non-restoring version is slightly faster.

>

> After looking at the two versions for a bit I think I can shave another

> instruction per loop off that by reversing the subtract operation. Thus

> saving one instruction to adjust the pointer. Make that 372 instructions

> on average.

>

>

> --

> David W. Schultz

> http://home.earthlink.net/~david.schultz

> Returned for Regrooving

>

>

>

>

> ------------------------------------

>

> ========================================================

> Questions? Check the FAQ at http://www.cosmacelf.com/forumfaq.html

> Visit the COSMAC ELF website at http://www.cosmacelf.comYahoo! Groups Links

>

> <*> To visit your group on the web, go to:

> http://groups.yahoo.com/group/cosmacelf/

>

> <*> Your email settings:

> Individual Email | Traditional

>

> <*> To change settings online go to:

> http://groups.yahoo.com/group/cosmacelf/join

> (Yahoo! ID required)

>

> <*> To change settings via email:

> cosmacelf-digest@yahoogroups.com

> cosmacelf-fullfeatured@yahoogroups.com

>

> <*> To unsubscribe from this group, send an email to:

> cosmacelf-unsubscribe@yahoogroups.com

>

> <*> Your use of Yahoo! Groups is subject to:

> http://info.yahoo.com/legal/us/yahoo/utos/terms/

> - << Previous post in topic Next post in topic >>