Loading ...
Sorry, an error occurred while loading the content.
 

[self-interest] compact encoding

Expand Messages
  • Jecel Assumpcao Jr
    I decided to check out if the compact encoding I proposed a while ago (with two bits per instruction mixed in the the literals) was a good idea or not. So
    Message 1 of 3 , Aug 28, 1999
      I decided to check out if the compact encoding I proposed a while
      ago (with two bits per instruction mixed in the the literals) was
      a good idea or not.

      So first I got me a list of all the methods in the system (actually,
      their mirrors):

      enumerating all enumerate asList copyFilteredBy: [|:e|
      e isReflecteeMethod]

      This yields a list of 26320 elements in the normal Snapshot. Then,
      in this list, I evaluated the following expression to see how
      the coding would effect each method:

      | d <- dictionay copyRemoveAll |
      do: [ | :m. delta <- -4 |

      "delta starts out with -4 since we will no longer need a
      bytecode object for each method, and each bytevector object
      has 4 words of overhead"

      delta: delta - (m codes size /+ 4).

      "here we also discount the bytes themselves, coverting them
      to a rounded up number of words"

      delta: delta + (m codes size /+ 15).

      "on the other hand, the instructions now add words to the
      literal vector. But now we pack them 15 to a word"

      m byteCodesDo: [ | :bci. :op. :lit |
      delta: delta + 1.

      "we suppose that every single instruction will need
      exactly one literal. Note that in the case of pushSelf
      that originally didn't need a literal, now we have
      changed it to implicitSend 'self' which does need a
      literal"

      op = bytecodeFormat opcodes resendOp ifTrue: [delta: delta+1].

      "by converting resends to use primitives, we will need
      an extra literal for them"

      op = bytecodeFormat opcodes return ifTrue: [delta: delta-1].

      "return instructions don't need a literal, so we correct
      our mistake here"
      ].
      delta: delta - m literals size.

      "since we want to know how many words are added to the literal
      vector, we have to discount how many were in there before"

      d at: delta Put: (d at: delta IfAbsent: 0) + 1.

      "count yet another method with delta amount of memory
      words added"
      ].
      d

      So now we have a 70 element dictionary that has as keys the number of
      words that my enconding scheme would add to Self and as values the
      number of methods that will increase by that amount. Some simple
      expressions will reveal that:

      number of methods that increase or stay the same: 959
      memory increase due to these methods: 8200 words
      number of methods that decrease in size: 25361
      memory decrease due to these methods: 102380 words

      total memory gain due to new encoding: 94180 words

      Since that is almost half a megabyte, it might turn out to be
      a good idea after all.

      Another space saving measure (I haven't tested this one to see
      if it is any good) would be to allow singleton objects to be
      their own maps. Only when a new object is copied for the first
      time would we bother to separate it from the map. That might
      save some memory for all these methods.

      -- Jecel
    • Stefan Matthias Aust
      Hi! ... I think, this was your encoding: 00 NON_LOCAL_RETURN 01 PUSH_NEXT_LITERAL 10 SEND_NEXT_LITERAL 11 SELF_SEND_NEXT_LITERAL (or end of method marker,
      Message 2 of 3 , Aug 29, 1999
        Hi!

        Jecel Assumpcao Jr wrote:

        >I decided to check out if the compact encoding I proposed a while
        >ago (with two bits per instruction mixed in the the literals) was
        >a good idea or not.

        I think, this was your encoding:

        00 NON_LOCAL_RETURN
        01 PUSH_NEXT_LITERAL
        10 SEND_NEXT_LITERAL
        11 SELF_SEND_NEXT_LITERAL (or end of method marker, if no more
        literals are available)

        Instructions and literals both stored in one 32-bit word array.

        I don't remember what your idea was when you've more than 15 instructions,
        but I think, you could use the following algorithm:

        You actually don't need the "00" instruction. Non-local returns only occur
        in block objects and are always the last instruction. You could encode in
        the block object whether it's a block with "^" non-local return or a normal
        block. Now you could use "00" as mark for the end of the instruction
        stream. If you now reach the 16th instruction and it's no "00", you will
        use the word following the last used literal as source for the next
        instructions. Here's some (pseudo) Java-code to illustrate this. Note that
        instructions are stored "the wrong way":

        int ip = 0; // instruction pointer
        int lp = 0; // literal pointer
        loop:while (true) {
        int iw = method.words[ip];
        for (int n=0; n<16; n++) {
        int i = iw & 3; // instruction
        iw >>>= 2; // always shifts-in 0 and doesn't propagate sign bit
        switch (i) {
        case 0: break loop;
        case 1:
        int literal = method.worlds[++lp];
        //...
        //...
        }
        ip = ++lp;
        }
        }
        if (method.hasNonLocalReturn()) // ...
        else //...

        >------------------------------------------------------------------------

        However, I wonder how your idea compares to this variant (which I wanted to
        use for my mySelf interpreter):

        The idea is to minimize the number of literal references, which need 4
        bytes each time. One need to identify the 63 most common literals,
        selectors and/or primitive selectors. These are directly encoded. Only
        otherwise a literal vector is used, encoded into the byte vector similar to
        the above idea.

        Each instruction is byte-sized, which also helps to efficiently interpret
        them, I think. Everything is stored in 32 bit words. Upto four
        instructions are followed by zero or more (at most four) literal
        references. Each instruction encodes a type in 2 bits and carries a 6 bit
        argument.

        0 special instruction
        - normal return
        - non-local return
        - push self
        - push special literal (nil, true, false, -1, 0, 1, 2, 10)
        and these are useful for interpreters
        which optimize a few control flow methods...
        - drop stack
        - dup stack
        - conditional branch
        - unconditional branch
        - push next block
        and still 48 instructions free!

        1 push literal
        - push next literal
        - push special literal (1..63)

        2 stack send
        - send next literal
        - send special literal (1..63)
        (probably things like #traits, #at:, #clone, #copy, #size, etc.)

        3 implicit self send
        - send next literal
        - send special literal (1..63)
        (same as with 2 send)

        Either of the return instructions end the method's (or block's) execution.
        This means, the shortest possible code is "push self", "normal return".

        The method "clone = (cloneSize: 10)" whould be encoded as "implicit self
        send"+#cloneSize:, "push special"+10, "normal return" - still one word
        (assuming that cloneSize: is one of the special literals). Jecels encoding
        would need two words - as mine would without the special literal encoding.

        It's difficult for me to guess what would save more space.

        In any case, I agree that Jecel's encoding is an improvement over the old
        system.

        >------------------------------------------------------------------------

        >Another space saving measure (I haven't tested this one to see
        >if it is any good) would be to allow singleton objects to be
        >their own maps. Only when a new object is copied for the first
        >time would we bother to separate it from the map. That might
        >save some memory for all these methods.

        I don't think that this saving is worth the additional time. I don't know
        the original self system, but I'd guess that oddballs aren't that common.


        bye


        bye
        --
        Stefan Matthias Aust // Bevor wir fallen, fallen wir lieber auf.
      • Jecel Assumpcao Jr
        ... Great idea! It is interesting that this was what I did in my 1984 bytecodeless Smalltalk (http://www.lsi.usp.br/~jecel/st84.txt). Your C code for parsing
        Message 3 of 3 , Aug 29, 1999
          Stefan Matthias Aust wrote:
          > [suggestion that non_local_return instruction isn't needed]

          Great idea! It is interesting that this was what I did in my
          1984 bytecodeless Smalltalk (http://www.lsi.usp.br/~jecel/st84.txt).

          Your C code for parsing this encoding show that it isn't very
          complicated.

          >[alternate encoding]
          > 0 special instruction
          > - normal return
          > - non-local return
          > - push self
          > - push special literal (nil, true, false, -1, 0, 1, 2, 10)
          > and these are useful for interpreters
          > which optimize a few control flow methods...
          > - drop stack
          > - dup stack
          > - conditional branch
          > - unconditional branch
          > - push next block
          > and still 48 instructions free!

          Once I thought of making the bytecodes complete enough so you
          could actually write primitives in them. It was a pretty neat
          idea (and would make porting much easier), but some primitives
          (like one to manipulate the MMU in the processor) would still
          require special treatment, so I gave up on this.

          > 1 push literal
          > - push next literal
          > - push special literal (1..63)

          Here are the ten most popular literals (as counted by references
          from bytecodes):

          3872 'ifTrue:'
          3388 ','
          2520 'e'
          2466 '='
          2292 'value'
          1714 'copy'
          1692 's'
          1667 'value:'
          1550 'm'
          1518 0
          1429 'fb'

          I was very surprised not to see 'i' among these :-)

          > 2 stack send
          > - send next literal
          > - send special literal (1..63)
          > (probably things like #traits, #at:, #clone, #copy, #size, etc.)
          >
          > 3 implicit self send
          > - send next literal
          > - send special literal (1..63)
          > (same as with 2 send)

          Adding up the number of times that the 63 most popular literals
          appear in the literal frames (not quite the same as the count
          above, but the results are very close) we would be saving a total
          of 43903 words with your encoding. Not bad at all!

          > Either of the return instructions end the method's (or block's) execution.
          > This means, the shortest possible code is "push self", "normal return".
          >
          > The method "clone = (cloneSize: 10)" whould be encoded as "implicit self
          > send"+#cloneSize:, "push special"+10, "normal return" - still one word
          > (assuming that cloneSize: is one of the special literals). Jecels encoding
          > would need two words - as mine would without the special literal encoding.

          I think you got the order of your bytecodes wrong - you first
          have to push the arguments on the stack and *then* send the
          message.

          > >[singleton objects could be their own maps]
          >
          > I don't think that this saving is worth the additional time. I don't know
          > the original self system, but I'd guess that oddballs aren't that common.

          Every single method in a system is a differente *type* of object
          and has its own map, so there are quite a few "oddballs". I'll
          explain in more detail when I answer your other emails tomorrow.

          -- Jecel
        Your message has been successfully submitted and would be delivered to recipients shortly.