14494 Dhrystone Before and After
Oct 14, 2013

The Rhinestone Compiler isn't quite ready for prime time but the results are pretty good. The very earliest run scored 13.33 Dhrystones/second, or 7500 instructions per pass at a nominal 100,000 instructions/sec. The most recent run scored 26.8, executing 3731 instructions per pass. The 26.8 compares favorably with the 36 score of the 6502 in an Apple IIe, since the 1802 is running well below its rated speed.

By far the biggest improvements came from improving support routines rather than directly optimizing the emitted code.
- The compiler now generates inline shifts and adds for multiplications by small constants, and the multiplication routine has been optimized for small operands (which are common);
- The division routine was re-coded to better use 1802 instructions and, again, expedite smaller divides;
- Two common string routines (copy and compare) were re-written in assembly.

Changing these routines pulled out thousands of instructions from each pass.

The peephole optimizer runs over the code after it's been emitted by the compiler and before it's assembled, looking for simple changes such as combining multiple accesses to the same storage location or eliminating unnecessary register loads. By comparison, the peephole rules pulled out 500-600 instructions.

At the same time, the object module got about 10-20% smaller so overall I'm calling this a win.
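For anyone who hasn't seen the shift-and-add trick, here is a rough sketch of the two multiplication ideas mentioned above. It is written in Python for readability and is not the LCC1802 code. The function names and the 16-bit width are assumptions made for illustration only.

```python
def shift_add_terms(constant):
    """Return shift amounts such that sum(x << s) equals x * constant."""
    if constant < 0:
        raise ValueError("sketch handles non-negative constants only")
    shifts = []
    bit = 0
    while constant:
        if constant & 1:          # this bit contributes x << bit
            shifts.append(bit)
        constant >>= 1
        bit += 1
    return shifts


def multiply_by_constant(x, constant):
    """Multiply using only shifts and adds, as inline code might."""
    return sum(x << s for s in shift_add_terms(constant))


def multiply_early_exit(a, b, width=16):
    """Shift-and-add multiply that stops when the multiplier runs out of bits.

    Returns (product truncated to `width` bits, loop iterations used).
    The width of 16 is an assumption, not taken from the compiler.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if a < b:                     # use the smaller operand as the multiplier
        a, b = b, a
    product = 0
    steps = 0
    while b:                      # small multipliers leave the loop early
        if b & 1:
            product = (product + a) & mask
        a = (a << 1) & mask
        b >>= 1
        steps += 1
    return product, steps


print(shift_add_terms(10))            # -> [1, 3], i.e. x*10 = (x<<1) + (x<<3)
print(multiply_by_constant(7, 10))    # -> 70
print(multiply_early_exit(1000, 3))   # -> (3000, 2)
```

The point of the early exit is that a general routine would otherwise loop once per bit of the word no matter how small the operands are. With a multiplier of 3, the loop above runs twice.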
The current version of LCC1802 is posted here https://sites.google.com/site/lcc1802/
Subject: Dhrystone Benchmark LCC1802
Date: Sun, 15 Sep 2013 17:09:06 -0400

As I work on the C compiler I've kept my eyes open for a simple benchmark to add to the test suite and as a target for optimization.

The Dhrystone integer benchmark dates from the 1980s and is not much used anymore, but I was able to find the source, which is written in C: http://www.netlib.org/benchmark/dhry-c

It is very well documented and fairly easy to understand at a high level. It includes integer math, procedure calls, and string manipulation. It ends with a printout of predicted and actual results so you can tell it worked.

Compiling it for the 1802, I only had to comment out the parts that call the operating system for time information. Other than that it compiled and ran without a hitch. (so Yay!)

My initial run took about 7.5 seconds for 100 passes, or 13.333 Dhrystones/second. This compares to a classic VAX 11/780, which did 1757 Dhrystones/sec, i.e. the VAX score is about 132 times the 1802's.

I think the VAX executed 500,000 instructions per sec vs 100,000 for the MC, so the VAX did 500000/1757=284 VAX instructions per pass and the 1802 is doing about 7500. Frankly, looking at the benchmark code, 284 seems like an impossibly small number for the work being done, but 7500 must leave some room for improvement.

I've tried doing quick and dirty profiling by interrupting the code running on the emulator to see where it is and try to find hot spots. Nothing very obvious showed up, so I think this will be a long slog rather than any sort of magic bullet. In the end I'd be amazed if I could cut the run time in half, but that seems like a goal. At any rate, I'll have a good solid, directed look at the code being generated and have a bunch of test cases for the peephole optimizer.
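The arithmetic behind those figures is simple enough to check with a few lines. This is only a sketch in Python: the function names are made up here, and the instruction rates are the nominal figures quoted above (the VAX rate in particular is a guess in the post, not a measured value).

```python
def dhrystones_per_sec(passes, seconds):
    """Benchmark score: passes completed per second of run time."""
    return passes / seconds


def instructions_per_pass(instr_per_sec, score):
    """Instructions spent on one Dhrystone pass at a given instruction rate."""
    return instr_per_sec / score


# Figures quoted in the post; the instruction rates are nominal estimates.
score_1802 = dhrystones_per_sec(passes=100, seconds=7.5)
score_vax = 1757

print(f"1802 score:         {score_1802:.3f} Dhrystones/sec")
print(f"1802 instr/pass:    {instructions_per_pass(100_000, score_1802):.0f}")
print(f"VAX instr/pass:     {instructions_per_pass(500_000, score_vax):.1f}")
print(f"VAX / 1802 scores:  {score_vax / score_1802:.1f}")
```

Keeping the calculation in this form makes it easy to plug in a new run time after each compiler change and see how many instructions per pass were saved.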