
Re: [SeattleRobotics] how much time single instruction takes?

  • Brian Dean
    Message 1 of 19, Nov 1, 2003
      On Fri, Oct 31, 2003 at 11:58:40PM -0500, David VanHorn wrote:

      > >As far as I know Atmel AVR is using Harvard
      > >Architecture, RISC and Pipelining to reach that speed.
      > >Could anybody explain more detail about this?
      >
      > I don't know that it's pipelined.

      I think you are right - this document says it is not:

      http://www.atmel.com/dyn/resources/prod_documents/atmelavr.PDF

      It says: "For all intents and purposes, the CPU has no pipeline. It
      retrieves both source operands, executes the instruction, and stores
      the result in a single clock cycle. Branch latency is one clock for
      taken branches. All operations are register-to-register; the chip
      follows a strict load/store model."
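The title question then comes down to arithmetic: if a single-cycle instruction really finishes in one clock, its duration is just the clock period, and a taken branch (one extra clock of latency, per the quote above) costs two periods. A minimal sketch, assuming a hypothetical helper `instr_time_ns` and an example 16 MHz clock that is not a figure from this thread:

```c
#include <assert.h>

/* Hypothetical helper: how long an instruction needing `cycles` clock
   cycles takes on a core clocked at `f_clk_hz`, in nanoseconds.
   Assumes one clock per cycle, as the quoted Atmel document describes. */
static double instr_time_ns(unsigned cycles, double f_clk_hz)
{
    assert(f_clk_hz > 0.0);
    return (double)cycles * 1e9 / f_clk_hz;
}
```

At the assumed 16 MHz, `instr_time_ns(1, 16e6)` gives 62.5 ns for a single-cycle instruction such as `add`, and `instr_time_ns(2, 16e6)` gives 125 ns for a taken branch.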

      I recall hearing something similar at a recent Atmel seminar.

      Cheers,
      -Brian
      --
      Brian Dean, bsd@...
      BDMICRO - Maker of the MAVRIC ATmega128 Dev Board
      http://www.bdmicro.com/
    • Larry Barello
      Message 2 of 19, Nov 1, 2003
        In the AVR all 32 registers are complete ALUs. So there is no shuffling of
        data: just operate on the register and you are done. That is how they get 1
        cycle instructions. Leading edge: read instruction; falling edge: execute
        instruction (e.g. add, sub, mov, etc.). That is why load/store instructions
        and branches take 1-2 extra cycles, since they have to go outside the core
        for data (branch address or SRAM data).

        Contrast that with the PIC's four clocks per machine cycle (per the PIC16
        manual): read instruction, fetch operands, do it, store results. Most PIC
        instructions are one machine cycle, but, again, go outside the core and it
        is 2 cycles (e.g. branches). I am guessing that, clock for clock, good
        Assembly or C coders will get roughly 4x the performance out of an AVR
        that they get out of a PIC.
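The 4x guess follows from counting oscillator clocks: if a PIC16 spends four clocks per machine cycle and the AVR one, the same instruction mix costs four times as many clocks on the PIC. A rough sketch, assuming a hypothetical `struct instr_mix` and that both chips execute the same mix of one- and two-cycle instructions (real code for the two architectures will differ):

```c
#include <assert.h>

/* Hypothetical instruction mix for a code fragment. */
struct instr_mix {
    unsigned long one_cycle;  /* e.g. add, sub, mov */
    unsigned long two_cycle;  /* e.g. taken branches */
};

/* Oscillator clocks needed for the mix on a core that spends
   `clocks_per_cycle` clocks per machine cycle: 1 for the AVR,
   4 for a PIC16, as described above. */
static unsigned long clocks_for_mix(struct instr_mix m, unsigned clocks_per_cycle)
{
    assert(clocks_per_cycle > 0);
    return clocks_per_cycle * (m.one_cycle + 2UL * m.two_cycle);
}
```

For 90 single-cycle instructions plus 10 taken branches, the model gives 110 clocks with `clocks_per_cycle = 1` and 440 with `clocks_per_cycle = 4`. The 4x ratio is built into the model, so it is only as good as the same-mix assumption.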

        I think folks should get over the Microchip marketing of the PIC as a simple
        processor. If Atmel counted instructions the way Microchip does, the AVR
        would have only ~30. The PIC is NOT simple, just archaic. On the other
        hand, the PDP-11 was complex, with a beautiful and elegant instruction set.
        So simple you could program and read it in octal.

        Microchip does have a couple of things on Atmel: an early market lead and a
        huge support system. In fact, their chips are totally suitable for what most
        small systems need. Plus you can get OTP chips very cheaply, which for
        toys is everything.

        However, for hobby robotic work where one-off designs are common and where
        the development cost is everything (i.e. our time spent designing, coding
        and debugging) I still think the AVR wins hands-down.

        Cheers!

      • Kipton Moravec
        Message 3 of 19, Nov 1, 2003
          I am a big 8051 fan, and am starting to switch over to the Atmel AVR also.

          As someone who produces boards for a living, the only downside to the AVR
          is single source.

          I agree with almost everything Larry writes. However, there is a slight mistake.

          A register is not an ALU. An ALU is an Arithmetic/Logic Unit. Registers
          are like memory locations. In the Atmel AVR there are 32 registers and 1 ALU.

          The ALU performs math functions like add, subtract, and (usually) increment,
          and logic operations like or, and, not. The output of the ALU usually goes
          back to one of the registers. During the calculation, it also updates the
          status register: the Z, C, N, V, and H flags.
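To make the flag updates concrete, here is a software model of an 8-bit add that returns the result and sets the Z, C, N, V and H flags listed above. It is a sketch written from the usual definitions of those flags (C: carry out of bit 7, H: carry out of bit 3, V: two's-complement overflow); the names `alu_flags` and `alu_add` are hypothetical, and the AVR status register has further flags (such as S) that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag set: the five status bits mentioned above. */
struct alu_flags {
    bool z; /* result is zero */
    bool c; /* carry out of bit 7 */
    bool n; /* bit 7 of the result (negative in two's complement) */
    bool v; /* two's-complement overflow */
    bool h; /* half carry: carry out of bit 3 */
};

/* Model of an 8-bit ADD: returns rd + rr (mod 256) and updates *f. */
static uint8_t alu_add(uint8_t rd, uint8_t rr, struct alu_flags *f)
{
    assert(f);
    unsigned sum = (unsigned)rd + rr;  /* 9-bit sum; bit 8 is the carry */
    uint8_t r = (uint8_t)sum;

    f->c = sum > 0xFFu;
    f->h = ((rd & 0x0Fu) + (rr & 0x0Fu)) > 0x0Fu;
    f->z = (r == 0);
    f->n = (r & 0x80u) != 0;
    /* Overflow: both operands had the same sign, but the result's sign differs. */
    f->v = ((~(rd ^ rr) & (rd ^ r)) & 0x80u) != 0;
    return r;
}
```

For example, 0x7F + 0x01 returns 0x80 with N, V and H set, while 0xFF + 0x01 returns 0x00 with Z, C and H set.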

          Most of the ALU operations take 1 clock cycle. To save hardware and make
          the chip smaller and cheaper, they made the multiply a 2-clock
          operation. Most cheap microcontrollers do not have a hardware multiply, and
          an assembler routine to do it can take 40 to 100 machine cycles. So a
          hardware multiply is a big advantage over doing it in software, even if it
          takes 2 clocks.
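For a sense of what the software alternative involves, here is the classic shift-and-add method for an unsigned 8x8-to-16-bit multiply, written in C rather than assembler. Each of the eight passes tests a bit, possibly adds, and shifts two values, so a hand-coded version plausibly runs to tens of machine cycles, in line with the estimate above. This is an illustrative sketch (the name `soft_mul8` is hypothetical), not any vendor's library routine.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: unsigned 8-bit a * b into a 16-bit product. */
static uint16_t soft_mul8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;           /* a, shifted left once per bit of b */

    for (int bit = 0; bit < 8; bit++) {
        if (b & 1u)
            product += addend;     /* add the partial product for this bit */
        addend <<= 1;
        b >>= 1;
    }
    assert(b == 0);                /* every bit of the multiplier was consumed */
    return product;
}
```

`soft_mul8(255, 255)` returns 65025 after eight passes through the loop, work that a hardware multiplier finishes in the 2 clocks mentioned above.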

          The advantage of a register is that it is hard-wired to the ALU, so two
          registers can be the inputs and one register can be the output. There is no
          "put the address on the bus, wait for the data lines to get the data, and
          then present it to the ALU". This way it can be done in 1 clock cycle.
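The register-versus-memory difference can be shown with a toy cycle-counting model: `ADD Rd, Rr` reads both operands from the register file and writes the result back in one cycle, while an operand in SRAM first has to be brought in by a separate load. The sketch assumes a 2-cycle `LD`, as on classic ATmega parts; `toy_avr`, `add_rr` and `ld` are hypothetical names, and the 256-byte SRAM is an arbitrary size.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: register file, a small data SRAM, and a cycle counter. */
struct toy_avr {
    uint8_t reg[32];       /* r0..r31, wired straight to the ALU */
    uint8_t sram[256];     /* stand-in for data memory (assumed size) */
    unsigned long cycles;  /* running cycle count */
};

/* ADD Rd, Rr: both operands come from the register file. */
static void add_rr(struct toy_avr *m, unsigned d, unsigned r)
{
    assert(d < 32 && r < 32);
    m->reg[d] = (uint8_t)(m->reg[d] + m->reg[r]);
    m->cycles += 1;
}

/* LD Rd, addr: a memory operand needs its own instruction (assumed 2 cycles). */
static void ld(struct toy_avr *m, unsigned d, uint8_t addr)
{
    assert(d < 32);
    m->reg[d] = m->sram[addr];
    m->cycles += 2;
}
```

In this model, adding two registers costs 1 cycle, while adding a value that lives in SRAM costs 3 (2 for `ld`, then 1 for `add_rr`).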

          Kip




        • Larry Barello
          Message 4 of 19, Nov 1, 2003
            Well, you learn something new every day. A couple of years ago at an Atmel
            seminar they said each register was a full ALU. It was probably a
            simplification to get across the idea of single-cycle instructions.

          • andi
            Message 5 of 19, Nov 2, 2003

              In my opinion there is a pipeline for executing
              instructions from program memory, though it may differ
              from a pipeline in the CPU, because the ATmega32
              datasheet (page 6) says:
              "In order to maximize performance and parallelism, the
              AVR uses a Harvard architecture–with separate memories
              and buses for program and data. Instructions in the
              program memory are executed with a single level
              pipelining. While one instruction is being executed,
              the next instruction is pre-fetched from the program
              memory. This concept enables instructions to be
              executed in every clock cycle."

              and the AT90S8535 datasheet (page 7) says:
              "The AVR uses a Harvard architecture concept-with
              separate memories and buses for program and data. The
              program memory is executed with a two stage pipeline.
              While one instruction is being executed, the next
              instruction is pre-fetched from the program memory.
              This concept enables instructions to be executed in
              every clock cycle."
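The prefetch described in both datasheets can be illustrated with a simple cycle count. If every instruction needs one fetch step and one execute step, doing them back to back costs two clocks per instruction; overlapping the next fetch with the current execute brings that down to one clock per instruction after a single fill clock, and a taken branch costs one extra clock because the prefetched instruction is discarded. This is a simplified model under those assumptions, with hypothetical function names, not Atmel's description of the hardware.

```c
#include <assert.h>

/* No overlap: fetch, then execute, strictly one after the other. */
static unsigned long clocks_no_prefetch(unsigned long n)
{
    return 2 * n;
}

/* Two-stage pipeline: the fetch of instruction i+1 overlaps the execution
   of instruction i. After one clock to fill the pipeline, one instruction
   completes per clock; each taken branch discards the prefetched
   instruction and costs one refill clock. */
static unsigned long clocks_with_prefetch(unsigned long n, unsigned long taken_branches)
{
    assert(taken_branches <= n);
    if (n == 0)
        return 0;
    return 1 + n + taken_branches;
}
```

For 100 instructions, `clocks_no_prefetch` gives 200 clocks while `clocks_with_prefetch(100, 0)` gives 101, or 111 with 10 taken branches. Under this model, the timing a programmer sees (one clock per instruction, one extra for a taken branch) is the same as in the "no pipeline" description quoted earlier in the thread.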

              regards,


              Andi

            • Kenneth Maxon
              Message 6 of 19, Nov 2, 2003
                Yes, looking carefully at the architecture and the decisions made (internal
                code space vs. external, etc.) and then looking at how the firmware is
                implemented in an FPGA, it becomes evident that the unit is internally
                clocked much faster and that a pipelined architecture is being used. The
                additional hit taken by a branch instruction in the wrong direction
                (opposite of that predicted) is a telltale sign of this as well. Where
                people may be getting confused is that, due to the architecture decisions
                and the fact that they have no software interaction with the
                ultra-simplistic cache, users (for the most part) cannot see that it is
                there.


                -Kenneth
                (Unit 3's in trouble and it's scared out of its wits) -Geddy Lee