RE: [fpga-cpu] Fpga and Cpu cores

  • Jan Gray
    Message 1 of 3 , Sep 10, 2000
      A list of some FPGA CPU cores is at www.fpgacpu.org/links.html.

      To date I have not seen a published performance comparison chart. Were
      anyone to publish a performance chart, it would be instructive to state:

      * performance on some standard set of benchmarks. SPEC or EEMBC would be
      nice except these are (IIRC) not inexpensive to do. Dhrystone would be
      better than nothing, I suppose. Benchmarks consisting of company-written
      inner loops whose source is unavailable are useless except for
      demonstrating peak frequency.
      * size of core (logic cells and block RAMs)
      * time to run each of the benchmarks (what really matters)
      * harmonic mean time averages over the benchmarks
      * frequency (over worst case temp and voltage conditions) (interesting but
      not comparable)
      * instructions per clock (interesting but not comparable)
      * host system, including detailed description of memory subsystem latencies
      * measured peak power and total energy required to run the benchmarks
      * whether the data are simulated or measured on real machines
      * how the core was prepared -- straight compilation of shipped source or, at
      the other extreme, manually tweaked in the FPGA Editor?
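
      The summary figures in the list above could be computed along these
      lines. This is a minimal sketch: the benchmark names and times are
      made-up placeholders, not measurements of any core, and reading
      "harmonic mean" as the harmonic mean of per-benchmark rates (runs per
      second) is an assumption. Under that reading, the harmonic mean of the
      rates is simply the reciprocal of the arithmetic mean of the run times.

```python
from statistics import harmonic_mean, mean

# Hypothetical run times in seconds; placeholders, not real measurements.
times_s = {"bench_a": 2.0, "bench_b": 4.0, "bench_c": 8.0}


def summarize(times):
    """Return (harmonic mean of rates in runs/s, arithmetic mean time in s)."""
    values = list(times.values())
    rates = [1.0 / t for t in values]  # one run of each benchmark per t seconds
    return harmonic_mean(rates), mean(values)


hm_rate, mean_time = summarize(times_s)
print(f"harmonic mean rate: {hm_rate:.4f} runs/s")
print(f"mean run time:      {mean_time:.4f} s")
```

      Reporting the per-benchmark times alongside any such average keeps the
      "what really matters" numbers visible.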

      For a while to come, expect apples-to-oranges data that warrant considerable
      skepticism. Company #1 will present simulated results for their core for
      their fastest speed grade parts (expensive unobtainium), running entirely
      on-chip, with programs and data in on-chip block RAM, on their best case
      inner loops. Company #2 will present measured results in the context of a
      real low-cost system using last year's slowest-speed grade device, running
      standard benchmark programs out of external RAM.

      As an inadequate starting point, see
      www.fpgacpu.org/usenet/a-x-announce.html (size comparisons),
      www.fpgacpu.org/xsoc/xr16.html (xr16 in SpartanXL-4: "25 MHz" - "40 MHz"),
      www.fpgacpu.org/xsoc2/log.html ("60 MHz" so far in Virtex-4),
      http://www.xilinx.com/products/logicore/alliance/arc/risc_processor.pdf (ARC
      basecase at "41 MHz" in VirtexE-8), and
      http://www.altera.com/document/ds/ds_excnios.pdf (Nios at "up to 50 MIPS and
      50 MHz", presumably in fastest 20KE). A while back Damjan Lampret of
      opencores.org claimed >100 MHz simulated frequencies in the fastest speed
      grade of VirtexE (if I recall correctly) but I haven't seen any more recent
      data now that their OR1K core is further along.

      All this clock frequency data is inadequate for serious comparison purposes
      because almost no one is stating instructions-per-clock data and, more
      importantly, because the instruction sets are not comparable. (#1's
      instructions may well do 30% more work per instruction than #2's.) For
      example, some 16-bit instruction word architectures (like xr16) require 2
      instructions to form a 16-bit constant whereas some 32-bit instruction word
      designs need only one. As another example, I can design a stack machine
      (even a Java machine) that really screams, frequency- and IPC-wise, but if
      it requires four instructions to fetch 2 local variables, add them, and
      store the result to another local, it could underperform a 3-operand RISC
      machine with twice the cycle time.
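
      The stack-machine example can be worked through numerically. This is a
      minimal sketch with hypothetical numbers: the instruction counts follow
      the paragraph above (four stack instructions versus one three-operand
      RISC instruction for c = a + b), while the 10 ns cycle time and the one
      cycle per instruction on both machines are assumptions chosen only for
      illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    instructions: int  # instructions needed for c = a + b
    cpi: float         # cycles per instruction (assumed)
    cycle_ns: float    # clock period in nanoseconds (assumed)

    def time_ns(self) -> float:
        # time = instruction count x cycles per instruction x cycle time
        return self.instructions * self.cpi * self.cycle_ns


# Stack machine: push a; push b; add; pop c
stack = Machine("stack", instructions=4, cpi=1.0, cycle_ns=10.0)
# RISC: add c, a, b  (twice the cycle time, i.e. half the clock frequency)
risc = Machine("risc", instructions=1, cpi=1.0, cycle_ns=20.0)

for m in (stack, risc):
    print(f"{m.name}: {m.time_ns():.0f} ns")
```

      Under these assumptions the stack machine needs 40 ns and the RISC
      machine 20 ns for the same statement, even though the stack machine
      runs at twice the clock frequency.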

      Caveat emptor!

      Jan Gray
      Gray Research LLC
    • bfranchuk@jetnet.ab.ca
      Message 2 of 3 , Sep 21, 2000
        --- In fpga-cpu@egroups.com, "Jan Gray" <jsgray@a...> wrote:
        > A list of some FPGA CPU cores is at www.fpgacpu.org/links.html.
        > To date I have not seen a published performance comparison chart.
        > Were anyone to publish a performance chart it would be instructive to
        > state performance on some standard set of benchmarks.

        Part of the problem is that the architecture design is also a large
        unknown in benchmarking. So many CPUs, so few details.

        For example, here is a model of what I believe
        to be the FASTEST CPU possible: a THREE-stroke single-address
        computing engine.
        1) Fetch the current instruction. (Memory access)
        2) Calculate the effective address of the memory operand.
        3) Load/store the memory data. (Memory access)

        Since benchmark results are never looked at, there is no
        need to perform any ALU operations on the data. Since
        nobody wants to admit an instruction takes 3 cycles, there
        is strong pressure to make things look like they take 1 cycle
        or less per instruction. RISK... um, RISC machines try to
        make step 3 vanish by having lots of registers on chip.
        Pipelining of instructions tries to make step 2 vanish.
        Caching makes step 1 vanish. Harvard machines (what RISCs want to
        be) do steps 1 and 3 at once. Immediate operations save steps 2 and 3.
        Forth machines ignore step 2. And so forth with other computer
        designs. CPU designers have to guess at what is the best design
        for the real world, and that can be hard. I hope this model will
        help with the apples and oranges of computing.
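
        The model can be sketched in a few lines. This is only an
        illustration: which steps each technique hides is taken from the
        paragraph above, but the one-cycle-per-step cost and all names in
        the code are assumptions, and real machines hide these steps only
        part of the time. The Harvard case (steps 1 and 3 overlapped) is
        not modelled.

```python
# The three steps ("strokes") of the single-address model above.
STEPS = ("fetch", "effective_address", "memory_operand")

# Steps each technique tries to hide, per the description above.
# The keys are illustrative labels, not standard terminology.
HIDES = {
    "registers": {"memory_operand"},                      # RISC: step 3
    "pipelining": {"effective_address"},                  # step 2
    "caching": {"fetch"},                                 # step 1
    "immediate": {"effective_address", "memory_operand"}, # steps 2 and 3
    "forth": {"effective_address"},                       # step 2
}


def visible_cycles(*techniques):
    """Cycles per instruction still visible, assuming 1 cycle per step."""
    hidden = set()
    for name in techniques:
        hidden |= HIDES[name]
    return sum(1 for step in STEPS if step not in hidden)


print("baseline:", visible_cycles())
print("caching only:", visible_cycles("caching"))
print("registers + pipelining + caching:",
      visible_cycles("registers", "pipelining", "caching"))
```

        With everything hidden the model reports zero visible cycles, which
        is the "1 cycle or less per instruction" appearance described above.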
        Ben.
        ----
        "We do not inherit our time on this planet from our parents...
        We borrow it from our children."
        "Luna family of Octal Computers"
        http://www.jetnet.ab.ca/users/bfranchuk