Loading ...
Sorry, an error occurred while loading the content.

Re: [antlr-interest] nodes, hidden tokens, garbage collection

Expand Messages
  • John Allen Green
    ... Regarding Boehm GC: From what I ve read - yes, it should handle any circular structures fine. I think a bigger question about Boehm GC is: is it as
    Message 1 of 6 , Apr 4, 2002
    • 0 Attachment
      --On 04/04/2002 11:53 AM +0200 Ric Klaren wrote:
      > Well I was planning to do some major surgery anyway ;)
      >
      > First step custom would be adding a custom new/delete. Then problably do
      > or a subclassing trick, or some template trick to select the desired
      > memory pool (and/or the Boehm GC, I'm btw not sure how well the Boehm GC
      > handles circular things...). Lately I've been experimenting with a memory
      > pool implementation found in the Efficient C++ book from Bulka and Mayhew.

      Regarding Boehm GC: From what I've read - yes, it should handle any
      circular structures fine. I think a bigger question about Boehm GC is: is
      it as portable as Antlr/C++ currently is?

      I think that for Antlr's purposes, a memory pool makes more sense than a
      full blown GC anyway. (Er, as long as it is implemented allowing Antlr
      users to easily create multiple memory pools - one for each tree required
      in memory.) With GC, there's the overhead of scanning the memory to find
      unreachable objects. With a memory pool - well - when you're done, you're
      done, right?

      It's been a while since I read about customizing new/delete in C++. I hope
      you're not talking about a global replacement for new/delete, but just
      building custom new/delete specifically for the objects which should be
      stored in memory pools...? I think I read somewhere that globally replacing
      new/delete is "evil".

      Is there an online link to that book's memory pool implementation, or do I
      have to make a trip to the bookstore? :-)

      > At least RefAST would dissappear it's causing most 'problems' (or kludges)
      > and indeed it would become a plain old pointer. RefToken is less
      > troublesome (no heterogenous tokens and stuff like that).

      If the overhead of a garbage collector is added anyway... it might not make
      sense to have the overhead of ref counting for anything. I'd like to deal
      with RefToken as well, at least eventually, because of the problems with
      hiddenTokens.

      > Can't you use your own flavour of hidden tokens? I just had a small look
      > at the current implementation and I think it can be changed a little to
      > solve the reference counting problem. (basically play dirty in a few
      > places and store pointers to the objects in stead of using the RefToken)

      Wouldn't be too tough, I don't think. I'd have to hack or replace
      CommonHiddenStreamToken, as well as CommonHiddenTokenFilter, which refers
      to CommonHiddenStreamToken. Storing dumb pointers instead of storing
      RefTokens is asking for trouble down the road though, I think. For one, you
      would quickly find that you need to pass RefToken to various functions, and
      a dumb pointer does you no good.

      > Or do it less dirty and use a single linked list of visible tokens and in
      > the visible tokens you store a vector (or single linked list) with all
      > hidden tokens before/after the visible one. I don't see why a double
      > linked list is necessary there (probably there because of direct port
      > from java).

      I thought about using a vector of backlinks instead of a double-linked AST
      structure. However, in designing it, I found that there was a fair bit less
      data overhead involved in just putting the backlinks directly into the AST.

      > Another possibility is adding a custom new/delete to the hidden token
      > implementation. Allocate them from a pool attach the pool to the
      > lexer/parser and delete the pool with the lexer/parser object.
      >
      > If you switch to another scheme of reference counting (you btw got a
      > reference on that style of ref counting?) you may have to do some antlr
      > surgery.

      A link to that style of ref counting:
      http://agora.cubik.org/wiki/view/Main/ReferenceCounting
      ...follow the link to "acyclic refcounting". Unfortunately, it's more a
      description of the motivation than it is a description of the
      implementation.

      To get me up and running quickly, I've identified where my problems are and
      have come up with a solution which should work for me, at least for the
      short term. It's certainly not as ideal as getting rid of the refcounting,
      but it'll do for now.

      There were two problems:
      1. With my double linked AST, cyclic dependencies would prevent a dropped
      branch from deleting itself. (It would get missed in my "back unlink"
      cleanup function.)
      2. With the current implementation of the hidden tokens, there was a
      problem with figuring out when to delete them. Given nodes N1 and N2,
      hidden tokens H1 and H2:
      N1 -> H1 <=> H2 <- N2
      If N1 is dropped, how do I know if H1 and H2 are to be dropped as well? It
      depends on whether N2 still exists or not, but there's no link from H2 to
      N2, so there's no easy way to find out.

      My workaround: Don't drop/dereference (what do you call it?!) any nodes. If
      a node is to be removed from the tree, move it to -another- tree, which
      I've called "looseEnds". Then, at cleanup time, clean up the main tree as
      well as the looseEnds tree. (Obviously I've already written functions for
      cleaning up back links in the AST and for cleaning up the hidden tokens
      when AST is deleted.)

      Regards,
      John
    • Ric Klaren
      Hi, ... Boehm is pretty portable. At least to common systems. Solaris and many i386 flavoured unixes. Dunno if there s a win32 port. ... 100% agree. Boehm is
      Message 2 of 6 , Apr 5, 2002
      • 0 Attachment
        Hi,

        On Thu, Apr 04, 2002 at 11:58:40AM -0800, John Allen Green wrote:
        > Regarding Boehm GC: From what I've read - yes, it should handle any
        > circular structures fine. I think a bigger question about Boehm GC is: is
        > it as portable as Antlr/C++ currently is?

        Boehm is pretty portable. At least to common systems. Solaris and many i386
        flavoured unixes. Dunno if there's a win32 port.

        > I think that for Antlr's purposes, a memory pool makes more sense than a
        > full blown GC anyway. (Er, as long as it is implemented allowing Antlr
        > users to easily create multiple memory pools - one for each tree required
        > in memory.) With GC, there's the overhead of scanning the memory to find
        > unreachable objects. With a memory pool - well - when you're done, you're
        > done, right?

        100% agree. Boehm is nice if you already use it in you app though.

        > It's been a while since I read about customizing new/delete in C++. I hope
        > you're not talking about a global replacement for new/delete, but just
        > building custom new/delete specifically for the objects which should be
        > stored in memory pools...? I think I read somewhere that globally replacing
        > new/delete is "evil".

        No was talking about the specific objects. The global one is usually only
        necessary for memory debugging tools and tricks.

        > Is there an online link to that book's memory pool implementation, or do I
        > have to make a trip to the bookstore? :-)

        I diligently typed it in from the book. No online version afaik. I'll send
        it to you. On note of interest there's a slightly more advanced memory pool
        in the Loki library from the Modern C++ Design book from Andrei
        Alexandrescu, the source should be at:
        (http://www.awl.com/cseng/titles/0-201-70431-5/) this book really rocks.
        That one also support multithreading etc.

        > > At least RefAST would dissappear it's causing most 'problems' (or kludges)
        > > and indeed it would become a plain old pointer. RefToken is less
        > > troublesome (no heterogenous tokens and stuff like that).
        >
        > If the overhead of a garbage collector is added anyway... it might not make
        > sense to have the overhead of ref counting for anything. I'd like to deal
        > with RefToken as well, at least eventually, because of the problems with
        > hiddenTokens.

        It may in some point depend on how many objects are created and the exact
        scheme of garbage collection (if any) you use. For the tokens garbage
        collection would be quite prudent :), for the AST's you might drop garbage
        collection (depending on what you are parsing)

        > > Can't you use your own flavour of hidden tokens? I just had a small look
        > > at the current implementation and I think it can be changed a little to
        > > solve the reference counting problem. (basically play dirty in a few
        > > places and store pointers to the objects in stead of using the RefToken)
        >
        > Wouldn't be too tough, I don't think. I'd have to hack or replace
        > CommonHiddenStreamToken, as well as CommonHiddenTokenFilter, which refers
        > to CommonHiddenStreamToken.

        Why hack/replace just use your own ;)

        > Storing dumb pointers instead of storing RefTokens is asking for trouble
        > down the road though, I think. For one, you would quickly find that you
        > need to pass RefToken to various functions, and a dumb pointer does you no
        > good.

        It depends a little for the preserveWhiteSpace example you might get away
        with it. Real app probably more difficult indeed.

        > > Or do it less dirty and use a single linked list of visible tokens and in
        > > the visible tokens you store a vector (or single linked list) with all
        > > hidden tokens before/after the visible one. I don't see why a double
        > > linked list is necessary there (probably there because of direct port
        > > from java).
        >
        > I thought about using a vector of backlinks instead of a double-linked AST
        > structure. However, in designing it, I found that there was a fair bit less
        > data overhead involved in just putting the backlinks directly into the AST.

        Hmm if you know how many backlinks you have you may try a dirty trick with
        a overallocated class with an array as last attribute. I used that memory
        pool to do that old C trick in C++.

        > http://agora.cubik.org/wiki/view/Main/ReferenceCounting

        Thanks!

        > My workaround: Don't drop/dereference (what do you call it?!) any nodes. If
        > a node is to be removed from the tree, move it to -another- tree, which
        > I've called "looseEnds". Then, at cleanup time, clean up the main tree as
        > well as the looseEnds tree. (Obviously I've already written functions for
        > cleaning up back links in the AST and for cleaning up the hidden tokens
        > when AST is deleted.)

        Sounds good.

        Cheers,

        Ric
        --
        -----+++++*****************************************************+++++++++-------
        ---- Ric Klaren ----- klaren@... ----- +31 53 4893722 ----
        -----+++++*****************************************************+++++++++-------
        Innovation makes enemies of all those who prospered under the old
        regime, and only lukewarm support is forthcoming from those who would
        prosper under the new. --- Niccol� Machiavelli
      Your message has been successfully submitted and would be delivered to recipients shortly.