Dhrystone Benchmark LCC1802

  • bill rowe
    Message 1 of 13, Sep 15, 2013
      As I work on the C compiler I've kept my eyes open for a simple benchmark to add to the test suite and as a target for optimization.

      The Dhrystone integer benchmark dates from the 80's and is not much used anymore but I was able to find the source which is written in C.  http://www.netlib.org/benchmark/dhry-c 
      It is very well documented and fairly easy to understand at a high level.  It includes integer math, procedure calls, and string manipulation.  It ends with a printout of predicted and actual results so you can tell it worked.

      Compiling it for the 1802 I only had to comment out the parts that call the operating system for time information. Other than that it compiled and ran without a hitch. (so Yay!)

      My initial run took about 7.5 seconds for 100 passes, or about 13.3 Dhrystones/second.  This compares to a classic VAX 11/780 which did 1757 Dhrystones/sec, i.e. the VAX score is about 132 times the 1802's.

      I think the VAX executed 500,000 instructions per sec vs 100,000 for the MC, so the VAX did 500000/1757=284 VAX instructions per pass and the 1802 is doing about 7500.  Frankly, looking at the benchmark code, 284 seems like an impossibly small number for the work being done, but 7500 must leave some room for improvement.
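The per-pass estimate above is just instruction rate divided by Dhrystones per second. A minimal sketch in C, assuming the instruction rates guessed in the post (500,000/sec for the VAX, 100,000/sec for the 1802); the function name is only illustrative:

```c
/* Instructions executed per benchmark pass, given an instruction rate
   (instructions per second) and a score (Dhrystones per second). */
double instructions_per_pass(double instr_per_sec, double dhrystones_per_sec)
{
    return instr_per_sec / dhrystones_per_sec;
}
```

Calling `instructions_per_pass(500000.0, 1757.0)` gives the VAX figure, and `instructions_per_pass(100000.0, 100.0 / 7.5)` gives the 1802 figure for 100 passes in 7.5 seconds. Both results are only as good as the guessed instruction rates.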

      I've tried doing quick and dirty profiling by interrupting the code running on the emulator to see where it is and try to find hot spots.  Nothing very obvious showed up, so I think this will be a long slog rather than any sort of magic bullet.  In the end I'd be amazed if I could cut the run time in half, but that seems like a goal.  At any rate, I'll have a good, solid, directed look at the code being generated and have a bunch of test cases for the peephole optimizer.


    • bill rowe
      Message 2 of 13, Sep 28, 2013
        I've done a bit more looking at the Dhrystone benchmark, and I found a table of results for vintage processors at http://www.dunnington.u-net.com/public/dhrystone.c

        I'm glad to have found the table but I'm a bit taken aback by the results.  My first pass with the LCC1802 compiler ran just over 13 Dhrystones/sec.  The lowest performance in the table was 36 for the 1 MHz 6510 in a Commodore 64.  The PC/XT class machines give results in the 300-400 range - call it 30X mine.  There's some hell's brew of clock speed, clocks per instruction, instruction complexity and compiler cleverosity keeping me in the cellar!  If anyone is able to run one of the vintage C compilers below I'd love to get a look at the assembly output.

        I've also started profiling the code generated by LCC1802 in my simulator.  There are indeed a bunch of hot-spots.  

        Each pass through the benchmark executes 7150 1802 instructions.  Almost 1/3 of them are in the unsigned integer multiply, which is used mostly in subscript calculation.  A lot of those are multiplications by smallish constants (50 and 100).  I can probably improve those dramatically either by implementing a 16x8-bit multiply or by doing shifts and adds.  The latter might get pretty bulky.
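To make the two options concrete, here is a rough sketch in C of both ideas: constant multiplies by 50 and 100 broken into shifts and adds, and a general 16x8 shift-and-add multiply. This is not LCC1802's actual runtime code; the function names and decompositions (50 = 32+16+2, 100 = 64+32+4) are only illustrative, and results wrap at 16 bits as they would with a 16-bit int.

```c
#include <stdint.h>

/* x * 50 = x*32 + x*16 + x*2, truncated to 16 bits */
uint16_t mul50(uint16_t x)
{
    return (uint16_t)((x << 5) + (x << 4) + (x << 1));
}

/* x * 100 = x*64 + x*32 + x*4, truncated to 16 bits */
uint16_t mul100(uint16_t x)
{
    return (uint16_t)((x << 6) + (x << 5) + (x << 2));
}

/* General 16x8 shift-and-add multiply: the loop runs at most 8 times
   because the multiplier is only 8 bits wide. */
uint16_t mul16x8(uint16_t a, uint8_t b)
{
    uint16_t result = 0;
    while (b != 0) {
        if (b & 1)
            result = (uint16_t)(result + a);  /* add the shifted multiplicand */
        a = (uint16_t)(a << 1);
        b >>= 1;
    }
    return result;
}
```

The constant versions trade code size for speed (each constant needs its own sequence), which is the bulk concern mentioned above; the 16x8 loop is one shared routine that exits early for small multipliers.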

        The next big hitter is the string compare (946 instructions or 13%) which is written in C and should be easy to fix.

        After that is divide (582 or 8%), which is only used once and which I probably can't do much about.

        Call/Return are next - soaking up 10% of the cycles (686 inst) for 27 uses.  I can imagine using direct SEP instructions instead of SCRT at the bottom level but I don't know if I'm clever enough to actually do it.

        There's a block copy routine that's only used once but chews up 281 instructions (4%) moving 39 bytes.

        After that it's down to random C instructions generally executed only once per pass.  I've got some good ideas from having a hard look at the code.
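As a cross-check on the percentages quoted above, here is a small C sketch that tabulates the routines for which the post gives an instruction count (the multiply is only described as "almost 1/3", so it is left out). The struct and function names are just for illustration.

```c
#include <stddef.h>

struct hot_spot {
    const char *name;
    unsigned instructions;   /* instructions per benchmark pass */
};

/* Counts as quoted in the post, out of 7150 instructions per pass. */
static const struct hot_spot hot_spots[] = {
    { "string compare", 946 },
    { "divide",         582 },
    { "call/return",    686 },
    { "block copy",     281 },
};

enum { PASS_TOTAL = 7150 };

/* Share of one pass, in percent, for a given instruction count. */
double percent_of_pass(unsigned count)
{
    return 100.0 * (double)count / (double)PASS_TOTAL;
}

/* Sum of the instruction counts listed in the table above. */
unsigned listed_total(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof hot_spots / sizeof hot_spots[0]; i++)
        sum += hot_spots[i].instructions;
    return sum;
}
```

The four listed routines total 2495 instructions, roughly 35% of a pass; together with the multiply that accounts for about two thirds of the run.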

        So I'm a bit more confident that I can reach my goal of cutting the time in half but I'm still stung by those 8088 results.
        *----------------DHRYSTONE VERSION 1.0 RESULTS BEGIN--------------------------
         *
         * MACHINE      MICROPROCESSOR  OPERATING       COMPILER        DHRYSTONES/SEC.
         * TYPE                         SYSTEM                          NO REG  REGS
         * --------------------------   ------------    -----------     ---------------
         * Commodore 64 6510-1MHz       C64 ROM         C Power 2.8       36      36
         * IBM PC/XT    8088-4.77Mhz    PC/IX           cc               271     294
         * IBM PC/XT    8088-4.77Mhz    COHERENT 2.3.43 MarkWilliams cc  296     317
         * IBM PC/XT    8088-4.77Mhz    Venix/86 2.0    cc               297     324
         * IBM PC       8088-4.77Mhz    MSDOS 2.0       b16cc 2.0        310     340
         * PC/XT        8088-4.77Mhz    Venix/86 SYS V  cc               339     377
         * IBM PC       8088-4.77Mhz    MSDOS 2.0       CI-C86 2.20M     390     390
         * IBM PC/XT    8088-4.77Mhz    PCDOS 2.1       Wizard 2.1       367     403
         * IBM PC/XT    8088-4.77Mhz    PCDOS 3.1       Lattice 2.15     403     403 @
         * IBM PC       8088-4.77Mhz    PCDOS 3.1       Datalight 1.10   416     416
         * IBM PC       NEC V20-4.77Mhz MSDOS 3.1       MS 3.1           387     420
         * IBM PC/XT    8088-4.77Mhz    PCDOS 2.1       Microsoft 3.0    390     427
         * IBM PC       NEC V20-4.77Mhz MSDOS 3.1       MS 3.1 (186)     393     427
        
         * VAX 11/780   -               UNIX 5.2        cc              1515    1562



      • Egan Ford
        Message 3 of 13, Sep 28, 2013
          Hi Bill,

          Another Bill (Bill Buckels) has recently been benchmarking Dhrystone on the Apple II and C64 (both 1MHz 6502) with both Aztec C and cc65.  His results, links, etc... are posted at comp.sys.apple2.programmer and comp.sys.cbm.

          I've been doing my own comparison of all popular 8-bit processors found in hobby and personal computers, as they were actually available in those machines (i.e. MHz, platform bottlenecks, etc...).  I didn't want to use something like Dhrystone with a C compiler because the variability of performance on the same platform with different compilers was too great.  So I am using assembly for all benchmarks and doing my best to exploit every advantage each processor has.  It's my assertion that, for the processors I selected, assembly was used for most titles.

          In my results, the 8088 running on a real 4.77MHz PC is 7.5x faster than a 1 MHz 6502 (Apple II), not the 10x the Dhrystone results report; however, in my testing the 8088/C:8088/ASM vs. 6502/C:6502/ASM ratios favor the 8088 compilers.  There are a lot of factors I am trying to track in this comparison (e.g. overlapping fetch and execute or decode) to explain the results.  So far I've completed one benchmark (Pi to 1000 digits) on the following processors: 8008, 8080, 8085, Z80, 6502, 6800, 6809, 68008, 65816, 8088, TMS9900.  The 1802 is the last processor on my list and I have not had the time to write the code.  I've got a cross assembler set up and have written my own cycle counting simulator for the 1802, now I just need time.

          My rules require I leverage all processors to their full potential; some have huge advantages, e.g. 6809, 8088, 68008 have hardware multiplication and some instructions can operate on 16 or 32 bits, requiring fewer instruction fetches.  The 8088 and 68008 also have HW div.  Division is used a lot for computing Pi.  So, I decided to add a 2nd benchmark for comparison that does not use mult or div (Sieve of Eratosthenes).  Based on some simple instruction timing loops I expect the 4.77 MHz 8088 to only be 2.6x faster than the 1 MHz 6502.

          I expect the 1802 @ 1.76 MHz (speed available in popular 1802 systems) to perform poorly compared to the 6502.  The 6502 can fetch, decode, and execute instructions in 2-7 cycles, averaging 3.08 cycles/instruction in my Pi benchmark.  The 1802 requires 16 or 24 cycles/instruction.  Unless I misread something, I am expecting the 1.76MHz 1802 to run at 1/3 the speed of the 1MHz 6502.
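The 1/3 estimate follows from dividing each clock rate by its clocks per instruction. A minimal sketch in C, assuming every 1802 instruction takes the 16-clock best case and that one 1802 instruction does about as much work as one 6502 instruction (both are simplifications, and the function name is only illustrative):

```c
/* Instructions per second for a clock in Hz and an average
   clocks-per-instruction figure. */
double instr_per_sec(double clock_hz, double clocks_per_instr)
{
    return clock_hz / clocks_per_instr;
}
```

With these figures, `instr_per_sec(1760000.0, 16.0)` is 110,000 instructions/sec for the 1802 and `instr_per_sec(1000000.0, 3.08)` is about 325,000 for the 6502, a ratio near 0.34. Any 24-clock instructions in the mix would push the 1802 lower still.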

          I'm not surprised by the 8088 Dhrystone results given its overlapping fetch/execute architecture, 16-bit registers, faster clock, and hardware multiplication (32-bit results).  Mult takes time on the 8088, and while it's cranking away the next 4 instruction bytes are pulled in parallel into an on-chip prefetch queue, saving costly fetch time.  The 6502 has a primitive version of this where the instruction decode and the fetch for the next instruction happen in parallel, hence shorter cycle times.  The 6800, 6809, and 68008 also work the same way.  The 8080 and Z80 do the same during execution for a certain class of instructions.  AFAIK, the 1802, like the 8008, has no overlapping operations.  IIRC, neither does the TMS9900.

          Thanks for all your work on LCC1802 and this benchmark.  I love reading about it and look forward to testing it myself in my simulator.  I am still amazed that LCC1802 exists!

          Cheers,

          Egan


        • bill rowe
          Message 4 of 13, Sep 28, 2013
            Wow, excellent info. Thanks.

            I did not know about overlap in the 8088 or 6502; I just thought instructions needed more or fewer cycles.

            Besides the instruction execution ratio, the 1802 suffers badly in arbitrary storage access, which happens a lot in C. Having a lot of registers helps but only goes so far.

            The 1802 is such a delight to write code for that it cries out for a blazing clock speed at least internally.

            I'm off to look at the Apple Dhrystone results and I'd like to see your work as well. Is it posted somewhere?

            Thanks again.


          • Egan Ford
            Message 5 of 13, Sep 28, 2013
              > I'm off to look at the apple dhrystone results and i'd like to see your work
              > as well. Is it posted somewhere?

              No, I am still working on it. What was a weekend project to right a
              wrong re: a 6502 and 8088 comparison made in another blog some time ago
              turned into a > 1 year project. The more I learn, the more I want to
              explore. I took a new job last May and have been too busy for a lot of
              retro fun. I expect around the end of the year things will get
              quieter and I'll be able to finish my work and post it to
              retrocompute.org.
            • ajparent1
              Message 6 of 13, Sep 28, 2013

                VAX 11/780 was 1 MIPS or very close to that.  It was the definition of 1 VUP.  That measure was for a multiuser/multitasking system, unlike the PC running a 4.77 MHz single thread/task OS.  Later generations were faster.

                Clock speed is meaningless, as the 1802 is 8/16/24 clocks per instruction and at 5V typically maxed out at 3.2 MHz, often slower.  The 8088 needed a minimum of 4 clocks per instruction, and up to as many as a repeat prefix tied to the next instruction required.  However, the 8088 is a CISC machine with far faster stack and call/return than the 1802.

                The 8088 loads registers faster.
                The 8088 has a far more extensive push/pop stack.
                The 8088 has faster call/return.
                The 8088 has 16-bit add/subtract with 16-bit registers (it's a 16-bit machine with an 8-bit external bus).

                And the 8088, even in CMOS versions, draws far more power in the process.

                Personally I think an 8080 or Z80 would be a fairer comparison.  Neither has a multiply or divide, but both do have faster stack handling and call/return.  I'd add that the 6502 is also a good comparison, as it was an amazingly fast 8-bitter.



                Allison



              • ajparent1
                Message 7 of 13, Sep 28, 2013

                  In general I've found the 1802 to be slow, not because of clock speed but because of the general architecture.  While it has a stack mechanism, it is geared (for lack of a better word) toward 8-bit operations, and only through the D register.

                  So the cost of storing a 16-bit register is high, and equally high for restoring it.  The other pain point for speed is that D is the pass-through for loading or saving the 16-bit registers, which makes loading literals expensive.  The SCRT routine is good, but its register usage is high and the number of cycles needed for a call or return is very high.  Many of the 8-bit micros are tough to do math on, but the lowly 8080 has three 16-bit registers that can be added for a 16-bit result and shifted as well, which makes room for optimizations.


                  This makes C programming hard: the code is difficult to optimize, and the addressing modes are primitive for a language designed on and for a very rich two-address architecture (the PDP-11).


                  It hurts that many 1802 systems run at 2 MHz or less, but total clock counts are high, and even at 3.2 MHz it's not a speed demon.  At that speed the instruction rate is 400K/200K/133K I/S for 1/2/3 cycle instructions.
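As a rough check on those rates: the 1802 uses eight clock pulses per machine cycle, and its instructions take two or three machine cycles, so the 400K figure is really the machine-cycle rate. A minimal sketch of the arithmetic follows; the function name is made up for illustration.

```python
CLOCKS_PER_MACHINE_CYCLE = 8  # 1802: eight clock pulses per machine cycle


def instructions_per_second(clock_hz, machine_cycles):
    """Rate if every instruction took `machine_cycles` machine cycles."""
    return clock_hz / (CLOCKS_PER_MACHINE_CYCLE * machine_cycles)


for cycles in (1, 2, 3):
    rate = instructions_per_second(3_200_000, cycles)
    print(f"{cycles} machine cycle(s): {rate / 1000:.0f}K per second")
```

At a 1.6 MHz clock the same formula gives 100K per second for two-cycle instructions, which lines up with the nominal 100,000 instructions/sec used elsewhere in this thread.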


                  For the 1802, the net overhead of a complex, recursive action involving many routines is high.  Again, the total instruction counts for common programming tasks are often very high.


                  An example.

                  The 8085 (not 8088) at 6.144 MHz (the slow part) has instructions that take 2 to 22 clocks, which puts its average speed, based on the core common instructions, around or above 500K I/S.  The 6502 is at least that fast or faster.  The key is how many instructions you can do per second, and then how many instructions are needed to do basic things.  To save the accumulator and flags on the 8085 takes only 12 cycles, and 10 to restore them.  An 8080 call is 18 cycles and a return is 10.  Those figures cover both the instruction and the action.  Even if we slow down the 8085, we see that the number of clocks needed to do many things is lower.


                  Most of the later CPUs like the Z80, 65012, 6809 were far more 16-bit oriented, or could accommodate 16-bit operations, including addressing, far more easily.


                  That is not to say the 1802 is silly or useless, only that its architecture is optimized for a set of problems that were solvable only at low power, and its simplicity allowed a CMOS implementation.  CMOS 8085 and Z80 parts came many years later.  The only other CMOS CPU was the Intersil 6100 (a PDP-8 on a chip), and that also had its weirdness.  Both are extremely interesting and fun for doing tiny systems, with programming on a sheet of paper.  They both also teach what a computer needs to do to be useful, as well as how it works inside.



                  Allison







                  ---In cosmacelf@yahoogroups.com, <cosmacelf@yahoogroups.com> wrote:

                  I've done a bit more looking at the Dhrystone benchmark, and I found a table of results for vintage processors at http://www.dunnington.u-net.com/public/dhrystone.c

                  I'm glad to have found the table but I'm a bit taken aback by the results.  My first pass with the LCC1802 compiler ran just over 13 Dhrystones/sec.  The lowest performance in the table was 36 for the 1 MHz 6510 in a Commodore 64.  The PC/XT class machines give results in the 300-400 range, call it 30X mine.  There's some hell's brew of clock speed, clocks per instruction, instruction complexity and compiler cleverosity keeping me in the cellar!  If anyone is able to run one of the vintage C compilers below, I'd love to get a look at the assembly output.

                  I've also started profiling the code generated by LCC1802 in my simulator.  There are indeed a bunch of hot-spots.  

                  Each pass through the benchmark executes 7150 1802 instructions.  Almost 1/3 of them are in the unsigned integer multiply, which is used mostly in subscript calculation.  Many of those are multiplications by smallish constants (50 and 100).  I can probably improve them dramatically, either by implementing a 16x8-bit multiply or by doing shifts and adds.  The latter might get pretty bulky.
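To make the shifts-and-adds idea concrete, here is a minimal sketch in Python rather than 1802 assembly. The function names are hypothetical and this is not the code LCC1802 emits; it only shows how a constant such as 50 or 100 breaks down into a few shifted copies of the operand, with the result wrapped to 16 bits.

```python
def shift_add_terms(k):
    """Shift amounts needed for constant k (one per set bit)."""
    return [bit for bit in range(k.bit_length()) if (k >> bit) & 1]


def mul_by_constant(x, k, width=16):
    """Multiply x by k using only shifts and adds, wrapping at `width` bits."""
    mask = (1 << width) - 1
    result = 0
    for shift in shift_add_terms(k):
        result = (result + (x << shift)) & mask
    return result


print(shift_add_terms(50))   # 50  = 2 + 16 + 32
print(shift_add_terms(100))  # 100 = 4 + 32 + 64
print(mul_by_constant(7, 50), mul_by_constant(7, 100))
```

Both constants need only three shifted terms, which is why inline code can beat a general multiply loop here. The bulk comes from doing each 16-bit shift and add one byte at a time on the 1802.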

                  The next big hitter is the string compare(946 instructions or 13%) which is written in C and should be easy to fix.

                  After that is divide(582 or 8%) which is only used once and I probably can't do much about.

                  Call/Return are next, soaking up 10% of the cycles (686 instructions) for 27 uses.  I can imagine using direct SEP instructions instead of SCRT at the bottom level, but I don't know if I'm clever enough to actually do it.

                  There's a block copy routine that's only used once but chews up 281 instructions(4%) moving 39 bytes.

                  After that it's down to random C instructions generally executed only once per pass.  I've got some good ideas from having a hard look at the code.
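The percentages quoted above can be re-derived from the instruction counts. A small sketch, using only the counts given in this message (the multiply has no exact count, so it is left out):

```python
INSTRUCTIONS_PER_PASS = 7150

# Instruction counts per pass as quoted in the message above.
hot_spots = {
    "string compare": 946,
    "call/return": 686,
    "divide": 582,
    "block copy": 281,
}


def share(count, total=INSTRUCTIONS_PER_PASS):
    """Percentage of one benchmark pass spent in a routine."""
    return 100.0 * count / total


for name, count in sorted(hot_spots.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {count:5d}  {share(count):4.1f}%")

# Assuming the 686 instructions are spread evenly over the 27 uses.
print(f"call/return cost per use: {686 / 27:.1f} instructions")
```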

                  So I'm a bit more confident that I can reach my goal of cutting the time in half but I'm still stung by those 8088 results.
                  *----------------DHRYSTONE VERSION 1.0 RESULTS BEGIN--------------------------
                   *
                   * MACHINE      MICROPROCESSOR  OPERATING       COMPILER        DHRYSTONES/SEC.
                   * TYPE                         SYSTEM                          NO REG  REGS
                   * --------------------------   ------------    -----------     ---------------
                   * Commodore 64 6510-1MHz       C64 ROM         C Power 2.8       36      36
                   * IBM PC/XT    8088-4.77Mhz    PC/IX           cc               271     294
                   * IBM PC/XT    8088-4.77Mhz    COHERENT 2.3.43 MarkWilliams cc  296     317
                   * IBM PC/XT    8088-4.77Mhz    Venix/86 2.0    cc               297     324
                   * IBM PC       8088-4.77Mhz    MSDOS 2.0       b16cc 2.0        310     340
                   * PC/XT        8088-4.77Mhz    Venix/86 SYS V  cc               339     377
                   * IBM PC       8088-4.77Mhz    MSDOS 2.0       CI-C86 2.20M     390     390
                   * IBM PC/XT    8088-4.77Mhz    PCDOS 2.1       Wizard 2.1       367     403
                   * IBM PC/XT    8088-4.77Mhz    PCDOS 3.1       Lattice 2.15     403     403 @
                   * IBM PC       8088-4.77Mhz    PCDOS 3.1       Datalight 1.10   416     416
                   * IBM PC       NEC V20-4.77Mhz MSDOS 3.1       MS 3.1           387     420
                   * IBM PC/XT    8088-4.77Mhz    PCDOS 2.1       Microsoft 3.0    390     427
                   * IBM PC       NEC V20-4.77Mhz MSDOS 3.1       MS 3.1 (186)     393     427
                  
                   * VAX 11/780   -               UNIX 5.2        cc              1515    1562





                • Lee Hart
                  Message 8 of 13 , Sep 28, 2013
                    > In general I've found the 1802 to be slow. Not because of clock speed
                    > but the general architecture...

                    I agree. The 1802 is slow; but that's because it wasn't optimized for
                    speed. It was optimized for simplicity and low power consumption. Its
                    architecture was chosen accordingly.

                    Try figuring out how many Dhrystones per milliwatt it gets, compared to
                    these other CPUs. :-)

                    --
                    Ingenuity gets you through times of no money better than money
                    will get you through times of no ingenuity. -- Terry Pratchett
                    --
                    Lee A. Hart, http://www.sunrise-ev.com/LeesEVs.htm
                  • bill_rowe@rogers.com
                    Message 9 of 13 , Sep 29, 2013
                      And remember that for most purposes where I use an olduino/arduino, the 1802 even at 1.6 MHz is plenty fast. If you're processing button pushes, sensor readings, or blinking lights, millisecond response times are fine.

                      My reason for using the Dhrystone benchmark isn't so much to compete with other processors (well, that too, of course) but to have an objective standard to measure my compiler improvements against.

                    • ajparent1
                      Message 10 of 13 , Sep 29, 2013

                        For most control apps it's plenty fast enough.  Try to treat it like a run-of-the-mill computer with an OS and you feel the overhead.  Still fun to do, and at 3.2 MHz it's not all that bad.  Then again, I have little interest myself in the 1861 (which requires the 1.7 MHz clock).  When I do video it's more dots.  Then again, using a self-refreshing 240x128 panel makes for decent low-res graphics and text without pain.


                        I think the Dhrystone result can be better.  Look at the register loads, stack ops, and especially the subroutine call/return.  If you're doing any display during the calcs, that will be costly too.


                        It's an odd critter.  There are few CPUs around that are that odd.


                        Allison




                      • bill rowe
                        Message 11 of 13 , Oct 14, 2013
                          The Rhinestone Compiler isn't quite ready for prime time, but the results are pretty good.  The very earliest run scored 13.33 Dhrystones/second: 7500 instructions per pass at a nominal 100,000 instructions/sec.  The most recent run scored 26.8, executing 3731 instructions per pass.  The 26.8 compares favorably with the 36 scored by the 6502 in an Apple IIe, since the 1802 is running well below its rated speed.
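The before and after scores follow directly from the instruction counts and the nominal instruction rate. A quick sketch of that arithmetic, with a function name chosen just for illustration:

```python
NOMINAL_IPS = 100_000  # nominal 1802 instructions per second, as stated above


def dhrystones_per_second(instructions_per_pass, ips=NOMINAL_IPS):
    """Benchmark passes completed per second at a given instruction rate."""
    return ips / instructions_per_pass


before = dhrystones_per_second(7500)
after = dhrystones_per_second(3731)
print(f"before: {before:.2f}/s  after: {after:.1f}/s  speedup: {after / before:.2f}x")
```

The ratio comes out just over 2x, so the earlier goal of cutting the run time in half was met.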

                          By far the biggest improvements came from improving support routines rather than directly optimizing the emitted code.  
                          -The compiler now generates inline shifts and adds for multiplications by small constants and the multiplication routine has been optimized for small operands(which are common);
                          -The division routine was re-coded to better use 1802 instructions and, again, expedite smaller divides;
                          -Two common string routines(copy and compare) were re-written in assembly;
                          Changing these routines pulled out thousands of instructions from each pass.

                          The peephole optimizer runs over the code after it's been emitted by the compiler and before it's assembled, looking for simple changes such as combining multiple accesses to the same storage location or eliminating unnecessary register loads.  By comparison, the peephole rules pulled out 500-600 instructions.
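For anyone who hasn't seen a peephole pass, here is a toy sketch of one rule of the "unnecessary register load" kind. It is an illustration only, not taken from LCC1802's actual rule set. It relies on the fact that PLO Rn and PHI Rn leave D unchanged, so a GLO Rn or GHI Rn of the same register straight afterwards reloads a value D already holds. The tuple representation is made up, and the sketch assumes the dropped instruction is not a branch target.

```python
# Instruction that writes half of a register from D, mapped to the
# instruction that would read that same half back into D.
REDUNDANT_RELOAD = {"PLO": "GLO", "PHI": "GHI"}


def peephole(instructions):
    """Drop a GLO/GHI that immediately follows a PLO/PHI of the same register."""
    out = []
    for op, operand in instructions:
        if out:
            prev_op, prev_operand = out[-1]
            if REDUNDANT_RELOAD.get(prev_op) == op and prev_operand == operand:
                continue  # D already holds this value, so skip the reload
        out.append((op, operand))
    return out


code = [("LDI", "0x32"), ("PLO", "R8"), ("GLO", "R8"), ("STR", "R2"), ("GHI", "R8")]
print(peephole(code))
```

A real optimizer works on the emitted assembly text and has to respect labels, but the matching idea is the same: look at a small window of adjacent instructions and replace it with something shorter.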

                          At the same time, the object module got about 10-20% smaller so overall I'm calling this a win.

                          The current version of LCC1802 is posted here https://sites.google.com/site/lcc1802/



                        • bill rowe
                          Message 12 of 13 , Oct 25, 2013
                            I've loaded the latest version of lcc1802 to https://sites.google.com/site/lcc1802/the-rhinestone-compiler and the source code is now online at https://code.google.com/p/lcc1802/

                            The biggest changes to this version are speedups driven by the Dhrystone benchmark.  The production compiler now scores 27 Dhrystones/second, up from 13 for the previous version.

                            I would appreciate anyone with an interest trying it out.  It works fine on my Windows 7 and Vista systems and passes all my self tests.  The optimizer is not turned on by default (it needs the command line option -O), but the basic code emitted by the compiler and the support routines are notably better even without it.






                          • Egan Ford
                            Message 13 of 13 , Oct 27, 2013
                              Bill, Congrats!  I look forward to playing with it.





