
RE: [antlr-interest] RE: C++ stack overflow

  • Ernest Pasour
    Dec 1, 2000
      I've tracked down why my parse ran so slowly, and it did turn out to be the code in ASTFactory. Or my use of the code anyway.

      RefAST ASTFactory::make(ANTLR_USE_NAMESPACE(std)vector<RefAST> nodes)
      {
          if ( nodes.size()==0 )
              return RefAST(nullASTptr);
          RefAST root = nodes[0];
          RefAST tail = RefAST(nullASTptr);
          if (root) {
              root->setFirstChild(RefAST(nullASTptr)); // don't leave any old pointers set
          }
          // link in children
          for (unsigned int i=1; i<nodes.size(); i++) {
              if ( !nodes[i] ) continue; // ignore null nodes
              if ( !root ) {
                  // Set the root and set it up for a flat list
                  root = tail = nodes[i];
              }
              else if ( !tail ) {
                  root->setFirstChild(nodes[i]);
                  tail = root->getFirstChild();
              }
              else {
                  tail->setNextSibling(nodes[i]);
                  tail = tail->getNextSibling();
              }
              // Chase tail to last sibling
              *** while (tail->getNextSibling()) {
                  tail = tail->getNextSibling();
              }
          }
          return root;
      }

      The last "while" loop (marked with the asterisks) walks all the way to the end of the list. So my spec file had a loop in it like the following:

      JSPTokenList:
          {
              RefAST tagList= #(nullAST);
          }
          ...
          (
              TAG:aTag
              {#tagList = #(nullAST, #tagList, #TAG);}
          )*
          ...

      This effectively appends each new tag to the tagList as it is found. Every time it hits the while loop in the ASTFactory code above, it walks down the entire existing list before appending the node (an n-squared algorithm). I rewrote this code to keep track of the tail node explicitly and used setNextSibling to append new items to the tail. That cut my parse time on a large file from 90 seconds to 12.
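
      For illustration, a sketch of what the rewritten rule could look like, reusing the action syntax from the fragment above (the exact revised rule is not shown in this thread, so the names and placement here are assumptions, not the actual grammar):

      JSPTokenList:
          {
              RefAST tagList = nullAST;   // head of the sibling list
              RefAST tagTail = nullAST;   // remembered tail, so each append is O(1)
          }
          ...
          (
              TAG:aTag
              {
                  if (!tagList) {
                      tagList = tagTail = #TAG;       // first tag starts the list
                  } else {
                      tagTail->setNextSibling(#TAG);  // append at the remembered tail
                      tagTail = #TAG;                 // no walk down the whole list
                  }
              }
          )*
          ...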

      As far as I can tell, the linked nodes just won't work very well in a large list.

      Ernest

      -----Original Message-----
      From: Ernest Pasour [mailto:sasecp@...]
      Sent: Tuesday, November 28, 2000 10:14 AM
      To: 'antlr-interest@egroups.com'; 'Ric Klaren'
      Subject: RE: [antlr-interest] RE: C++ stack overflow


      Okay, after a little digging, I modified my subclass of LLkParser to have its own destructor that pre-deletes the tokens iteratively. I happen to know the structure of this token tree, but I could write a generic tree walker that builds a list of items to delete. And I guess I don't actually delete the tokens; I just cut them off the list and let them delete themselves.

      I also experience a lot of slowness when I run my parse. I suspect that every time I add a node to the end of the list, it walks the entire list.

      Ernest

      JSPParser::~JSPParser()
      {
          antlr::AST *root=getAST();
          if (root!=NULL)
          {
              antlr::AST *firstChild=root->getFirstChild();
              if (firstChild!=NULL)
              {
                  // Collect every sibling of the (flat) token list.
                  CPtrArray tokenArray;
                  while (firstChild!=NULL)
                  {
                      tokenArray.Add(firstChild);
                      firstChild=firstChild->getNextSibling();
                  }

                  // Unlink the siblings from back to front so that the
                  // reference-counted destruction never has to recurse
                  // down a long sibling chain.
                  for (int i=tokenArray.GetSize()-1;i>=0;i--)
                  {
                      antlr::AST *token=(antlr::AST *)tokenArray[i];
                      if (token!=NULL)
                      {
                          token->setNextSibling(nullAST);
                      }
                  }
              }
          }
      }

      -----Original Message-----
      From: Ernest Pasour [mailto:sasecp@...]
      Sent: Tuesday, November 28, 2000 7:33 AM
      To: 'Ric Klaren'
      Cc: 'antlr-interest@egroups.com'
      Subject: [antlr-interest] RE: C++ stack overflow


      I was probably a bit unclear about my problem, so I'll try to expand on it a little.

      1. The parse completes with no problems. That is, the problem is not that the generated parsing code hits a stack overflow while parsing a file with nested elements.

      2. My parse tree is not deep at all--in fact, it's almost completely flat. My resulting parse tree looks like (for example):

      root
      |
      <html> - <body> - <%@taglib uri="/taglibs/arch.tld" prefix="blah"> - <h1> - header text - </h1> - <blah:pieChart param1="name"/> - </body> - </html>

      So my "tree" is basically just a linked list of tokens. The bigger my file, the longer the list. So increasing the stack size isn't really a viable option.

      My theory is that when I delete my LLkParser instance, it deletes the root, which in turn deletes its children. And each child deletes its sibling. But I believe that the delete occurs in a recursive manner such that the call to delete <html> is still on the stack while the call to delete <h1> is being processed. This means that the longer the token list, the more stack space gets used during the delete.
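
      As a rough illustration only (a stripped-down model, not ANTLR's actual ASTRefCount or node code), the shape of the problem is:

      struct Node {
          Node* firstChild;
          Node* nextSibling;

          Node() : firstChild(0), nextSibling(0) {}
          ~Node() {
              delete firstChild;   // recurses into the children
              delete nextSibling;  // recurses along the siblings: one stack
                                   // frame per node, so a long flat token
                                   // list can exhaust the stack
          }
      };

      The reference-counted wrappers add a level of indirection, but if the sibling-to-sibling destruction happens recursively in this fashion, the stack usage grows with the length of the list, which would match the behavior described here.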

      I don't really want to rearchitect at this point. I'd like to maybe modify the library code for LLkParser (or maybe elsewhere; I'm not sure exactly how the reference counting works) to delete the entire list of tokens instead of depending on the reference counting behavior.

      Thanks,
      Ernest

      -----Original Message-----
      From: Ric Klaren [mailto:klaren@...]
      Sent: Tuesday, November 28, 2000 4:06 AM
      To: Ernest Pasour
      Cc: 'antlr-interest@egroups.com'
      Subject: Re: C++ stack overflow


      Hi,

      On Mon, Nov 27, 2000 at 04:03:12PM -0500, Ernest Pasour wrote:
      > I'm using 2.7.1a2. I'm hitting a stack overflow with a large input file
      > when I try to delete the stream of tokens from the parser.

      I would recommend upgrading to 2.7.1; this a2 is a prerelease version.
      (But that aside)

      > My parser basically just makes a stream of tokens that I interpret in my
      > C++ code. When I try to delete the token stream, I get a stack overflow
      > that looks like below. The parse itself completes without error.
      <snip>
      > CommonJSPASTNode is my node class. It looks like the way the ASTRef and
      > ASTRefCount are defined that they don't return until all siblings are
      > deleted.

      They don't actually delete things until reference counts drop to zero.
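
      Roughly, the idea behind this style of reference counting is the following (a simplified sketch only, not ANTLR's actual ASTRefCount implementation; the class name here is made up):

      // Hypothetical simplified wrapper, not ANTLR's ASTRefCount.
      template<class T>
      class CountedRef {
          T*   ptr;
          int* count;
      public:
          explicit CountedRef(T* p = 0) : ptr(p), count(new int(1)) {}
          CountedRef(const CountedRef& other) : ptr(other.ptr), count(other.count) { ++*count; }
          ~CountedRef() {
              if (--*count == 0) {   // last reference gone:
                  delete ptr;        // only now is the node actually destroyed
                  delete count;
              }
          }
          T* operator->() const { return ptr; }
      private:
          CountedRef& operator=(const CountedRef&);  // assignment omitted in this sketch
      };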

      > Can anybody confirm that this is correct?

      Well, it depends... Is the generated parse tree very deep, deep enough to
      warrant a stack overflow (how big is the stack on your OS/arch anyway?
      Can't you increase it with a compiler option or system call?)? Or does it
      always happen? How do you build your trees, automatically or manually?

      Why does this happen when you delete a TokenStream? TokenStreams are pretty
      far away from the generated parse tree.

      > Does anybody know if this can be rewritten as an iterative delete?

      Theoretically, yes. But it very much depends on what you have built around
      antlr's support code, and you might run into trouble in the future.

      If you post some more details I might have more ideas.

      Cheers,

      Ric
      --
      -----+++++*****************************************************+++++++++-------
      ---- Ric Klaren ----- klaren@... ----- +31 4893722 ----
      -----+++++*****************************************************+++++++++-------
      Why don't we just invite them to dinner and massacre them all when they're
      drunk? You heard the man. There's seven hundred thousand of them.
      Ah? ... So it'd have to be something simple with pasta, then.
      --- From: Interesting Times by Terry Pratchet
      -----+++++*****************************************************+++++++++-------