Loading ...
Sorry, an error occurred while loading the content.

Re: [self-interest] bytecodes

Expand Messages
  • Jecel Assumpcao Jr
    ... Exactly. Urs did several comparisons between ParcPlace Smalltalk and the Self NIC (the Non Inlining Compiler, and also speculated on a Self interpreter) in
    Message 1 of 9 , Nov 1, 2000
    • 0 Attachment
      Marko Mikulicic wrote:
      > Jecel Assumpcao Jr wrote:
      > > where lexical level 0, slot 1 is 'z', level 1 slot 0 is 'a' and level 2
      > > slot 2 is 'tmp'.
      >
      > ah. I understand. This is only to allow fast indexing through local slots, without
      > literals.

      Exactly. Urs did several comparisons between ParcPlace Smalltalk and
      the Self NIC (the Non Inlining Compiler, and also speculated on a Self
      interpreter) in his thesis. He saw that the main performance difference
      between the NIC and an interpreter is the use of message sends for
      local slot access (page 31). These new bytecodes fix this.

      He also found out that the NIC would be 2.5 times faster if it inlined
      the "if" and "while" messages and blocks like Smalltalk does. The
      branch bytecodes would fix this (if they were used).

      > But does it depend too much on the implementation ?
      > In some paper I have readed that the lowest level at wich Self code can access is
      > bytecodes and

      Yes - there are messages to mirrors that let you see the bytecodes and
      literal vectors. And since they are regular Self objects, you could (in
      theory) change them from Self code. There are primitives that will let
      you dump the compiled code (on the Sparc, for example) to the standard
      output, but Self can't "see" this or change it in any way.

      > in some other paper that the system tries hard to mantain the illusion that what
      > is going on is what the user expects.
      > For example, dynamic deoptimization and debugging. The original bytecode was quite
      > abstract and looked good
      > also when one thought of a block as a real object ... the bytecodes was so
      > abstract that they didn't depend on the lexical context
      > of the block.

      Right - using normal inheritance (delegation for the picky people) as
      variable scopes was simply brilliant, even if dynamically scoped Lisp
      (and Logo) had something very similar.

      > I think that maybe for the interpreter there could be two levels of bytecode, one
      > created by the parser (the standard set) and
      > one deeper, closer to the interpreter (just in time interpreting). Of course there
      > is a waste of space (like for machine code) but at least
      > I don't break compatibility with older snapshots or Self programs wich accessed
      > bytecodes (emulators,...) only because I want to make my VM
      > portable.
      > What do you think ?

      Urs suggested something like this. You could include PICs and other
      complications in this second set of bytecodes. The advantage of such a
      solution over the current NIC is that it could easily be ported to any
      machine with a C++ compiler.

      It would be great if someone were to do this.

      > Why should the compiler try to analyze the dataflow (quite expensive) only to save
      > a couple of words on the stack ?

      The compiler has to follow the dataflow to optimize register usage,
      find out the types of the expressions and so on. As long as it is doing
      these things, it might as well notice expressions which aren't used and
      free the storage (register or stack) they are taking up.

      > Cannot find any links regarding Pep.

      The OOPSLA99 workshop and TPOS97 papers:

      http://www.sun.com/research/kanban/oopsla-vm-wkshp.pdf

      http://www.sunlabs.com/research/java-topics/pubs/97-pep.ps

      > > Though this encoding can have repeated literals, it actually uses up
      > > less memory than the normal one.
      >
      > Very good idea.
      > Hope you didn't patent it :-)

      No. I decided to not even patent the Self-in-hardware architecture I
      developed.

      > Can you tell me how less memory uses.
      >
      > the example above would be:
      > (0 = push literal, 1 = send, 2 = selfSend, 3 =
      >
      > 0: selfSend, 0
      > 1: selfSend, 1
      > 2: push, 2
      > 3: selfSend, 3
      > ----------
      > 1*3 = 3 bytes

      Bytecode 2 should be a "send", not a "push". And I counted 4 bytecodes
      above instead of 3.

      > 1: 'z'
      > 2: 'a'
      > 3: '+'
      > 4: 'tmp:'
      > ------
      > 4*4 = 16 bytes
      >
      > total: 19 bytes versus yours 5*4 = 20 bytes => you loose 1 byte :-)

      Don't forget to add 4 more words for the bytevector (header word, map
      pointer, size, bytes pointer). I came out 16 bytes ahead!

      > I imagine that for a bit longer methods there is a gain of 1 byte (at least for 16
      > codes).
      > But every duplicate eats 4 bytes!

      See the details in my 28 Aug 1999 message to this list:

      http://www.egroups.com/message/self-interest/299

      I saved 400KB.

      > I'm very interested to see what the real-world says.
      > Have you implemented this on something that can read the self4 world ?

      Not yet, though it would be trivial to write a Self method to save the
      image in this format.

      > Very interesting! Can you please also tell me the mean size of the literals (I
      > suppose the "mean literals" above
      > is the number of literals per method) to estimate the percentual saving of 3.66
      > bytes per method ?

      You save 4 bytes per pointer that you no longer have in the literal
      vector plus the size of the literal (only once for each different
      literal) but only if that string is not used for anything else in the
      system. If you have a local slot called 'a', you might still need the
      cannonical string 'a' for something else (in this case this is sure to
      happen since this string also stands for what would be the character $a
      in Smalltalk). It seems very complicated to calculate how much you will
      save by eliminating the literals.

      > I thought to stay more or less compatible but I think I won't follow that.
      > Maybe it won't be bad if we could share the images between OpenSelf and Self/R.

      That would be great, but as long as we can read in each other's textual
      Self source it would be a good start too.

      > What have you implemented until now, Jacel ?

      Nothing since tinySelf 1, unfortunately. I have been doing a lot of
      designing, however.

      -- Jecel
    • Marko Mikulicic
      ... And the PICS ? Can they be implemented with the new bytecode ? What is the impact of PICS then ? ... but the inlining compiler does that job good, right ?
      Message 2 of 9 , Nov 2, 2000
      • 0 Attachment
        Jecel Assumpcao Jr wrote:

        > Marko Mikulicic wrote:
        > > Jecel Assumpcao Jr wrote:
        > > > where lexical level 0, slot 1 is 'z', level 1 slot 0 is 'a' and level 2
        > > > slot 2 is 'tmp'.
        > >
        > > ah. I understand. This is only to allow fast indexing through local slots, without
        > > literals.
        >
        > Exactly. Urs did several comparisons between ParcPlace Smalltalk and
        > the Self NIC (the Non Inlining Compiler, and also speculated on a Self
        > interpreter) in his thesis. He saw that the main performance difference
        > between the NIC and an interpreter is the use of message sends for
        > local slot access (page 31). These new bytecodes fix this.

        And the PICS ? Can they be implemented with the new bytecode ?
        What is the impact of PICS then ?

        >
        >
        > He also found out that the NIC would be 2.5 times faster if it inlined
        > the "if" and "while" messages and blocks like Smalltalk does. The
        > branch bytecodes would fix this (if they were used).

        but the inlining compiler does that job good, right ?
        Branch bytecodes are IMHO bad, because the parser statycally gives meaning to a message
        (ifTrue:...)

        Perhaps hinting the compiler with such bytecodes could help. The compiler generates
        code
        for two versions of the expression: one inlined for true/false objects, and one that
        uses
        normal send. This analysis could be done by the compiler but for an interpreter it
        could be a gain
        without paying a big price for flexibility.

        >
        >
        > > I think that maybe for the interpreter there could be two levels of bytecode, one
        > > created by the parser (the standard set) and
        > > one deeper, closer to the interpreter (just in time interpreting). Of course there
        > > is a waste of space (like for machine code) but at least
        > > I don't break compatibility with older snapshots or Self programs wich accessed
        > > bytecodes (emulators,...) only because I want to make my VM
        > > portable.
        > > What do you think ?
        >
        > Urs suggested something like this. You could include PICs and other
        > complications in this second set of bytecodes. The advantage of such a
        > solution over the current NIC is that it could easily be ported to any
        > machine with a C++ compiler.
        >
        > It would be great if someone were to do this.
        >

        I will try it in openself. The framework is similar to that for machine code. I'll have
        to
        plug-in an interpreter.

        >
        >
        > Bytecode 2 should be a "send", not a "push". And I counted 4 bytecodes
        > above instead of 3.

        Sorry.

        >
        >
        > > 1: 'z'
        > > 2: 'a'
        > > 3: '+'
        > > 4: 'tmp:'
        > > ------
        > > 4*4 = 16 bytes
        > >
        > > total: 19 bytes versus yours 5*4 = 20 bytes => you loose 1 byte :-)
        >
        > Don't forget to add 4 more words for the bytevector (header word, map
        > pointer, size, bytes pointer). I came out 16 bytes ahead!

        You can encode bytecodes and literals in one vector (using an additional word for bcode
        size,
        and the padding max. 3 bytes) so you come out with 4+(0..3) bytes :-)
        For methods with repetitive literals (many "ifTrue:"..., matematical operations, ....)
        the old format is better, but I agree that for the general case your encoding can be
        better.
        I think only the real statistical data can judge.

        >
        >
        > > I imagine that for a bit longer methods there is a gain of 1 byte (at least for 16
        > > codes).
        > > But every duplicate eats 4 bytes!
        >
        > See the details in my 28 Aug 1999 message to this list:
        >
        > http://www.egroups.com/message/self-interest/299
        >
        > I saved 400KB.
        >

        Nice

        >
        >
        >
        > > I thought to stay more or less compatible but I think I won't follow that.
        > > Maybe it won't be bad if we could share the images between OpenSelf and Self/R.
        >
        > That would be great, but as long as we can read in each other's textual
        > Self source it would be a good start too.

        Sure, it's good. My parser is really bad :-)
        Maybe a parser written in Self would be good. Bootstrapped using a bad parser is
        possible.
        We would have then the same parser and 100% source compatibility,
        every effort would be shared and it would a good example of Self's flexibility.
        Can Mango do the job ?

        _____
        Marko
      • Jecel Assumpcao Jr
        Marko, ... Sorry about the confusion - the comments from Urs are from 1993 while the new bytecodes are from 1996 (I think), so he was thinking generically and
        Message 3 of 9 , Nov 3, 2000
        • 0 Attachment
          Marko,
          > And the PICS ? Can they be implemented with the new bytecode ?
          > What is the impact of PICS then ?

          Sorry about the confusion - the comments from Urs are from 1993 while
          the new bytecodes are from 1996 (I think), so he was thinking
          generically and not about what is in Self 4.1.2.

          Note that PICs are not implemented in the Mac port of Self. My guess is
          that their main performance impact is in providing type information for
          the SIC (which is also not implemented on the Mac).

          > > [inline "if" and "while" messages]
          >
          > but the inlining compiler does that job good, right ?

          It sure does!

          > Branch bytecodes are IMHO bad, because the parser statycally gives
          > meaning to a message (ifTrue:...)

          I don't like this about Smalltalk. Self 1.0 also had this problem
          (there were a set of messages the compiler knew about and did a good
          job on), but now all messages get equal treatment.

          > You can encode bytecodes and literals in one vector (using an additional word for bcode
          > size, and the padding max. 3 bytes) so you come out with 4+(0..3) bytes :-)

          You want an object that is a mix of normal vector and bytevectors like
          most Smalltalks used for their method objects? You can't pack the
          bytecodes in a normal vector since you only have 30 bits in a smallInt
          (not enough for four bytes).

          > > That would be great, but as long as we can read in each other's textual
          > > Self source it would be a good start too.
          >
          > Sure, it's good. My parser is really bad :-)
          > Maybe a parser written in Self would be good. Bootstrapped using a bad parser is
          > possible.

          Good idea. There is already a very outdated parser that is used as a
          benchmark in Self. It is not very well written, however (just like
          tinySelf 0 and, perhaps, yours), and would have to be extensively
          updated to understand annotations and other Self 4 things.

          > We would have then the same parser and 100% source compatibility,
          > every effort would be shared and it would a good example of Self's flexibility.

          You have to be careful about what interfaces the parser uses (sets of
          messages mirrors need to understand).

          > Can Mango do the job ?

          I don't think so - Mango needs a structured grammar that it can
          translate into a LR(1), LALR(1) or SLR(1) parse table (I once used to
          know what these were ;-) and you can't write one for Self. The
          explicit receiver expressions are likely to be the main problem, though
          the abuse of the "." token might also make it impossible to use Mango.

          On the other hand, I did try to write a Mango grammar file for Self 3.0
          (to be translated into a SLR(1) parse table) in 1996. I can't remember
          why I didn't finish it, but know I was stuck in the lexical part. You
          can get it at http://www.merlintec.com/s3.grm

          -- Jecel
        Your message has been successfully submitted and would be delivered to recipients shortly.