67525 Re: bucketbrigades with html filter

  • Jeff Ambrosino
    Oct 18, 2005
      Yikes indeed :)

      I should have clarified that in our app we don't actually process
      embedded tags. Our app lets users mangle the source HTML using RegEx,
      and since users can (and often do) perform filtering like
      s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/, we need to buffer
      it all.
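To illustrate why a pattern that spans the whole body forces buffering, here is a minimal sketch in Python (a stand-in for the Perl substitution quoted above, not the app's actual code). The function names, the sample page, and the split point are all hypothetical, and `re.DOTALL` is an assumption so that `.` also matches newlines.

```python
import re

# Python analogue of: s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/
# re.DOTALL is an assumption; without it "." would stop at newlines.
BODY_RE = re.compile(r"(<body>)(.*?)(</body>)", re.DOTALL)

def center_body(html: str) -> str:
    """Wrap the body contents in <center>...</center>."""
    return BODY_RE.sub(r"\1<center>\2</center>\3", html)

def filter_per_chunk(chunks):
    """Apply the substitution to each chunk independently."""
    return "".join(center_body(c) for c in chunks)

def filter_buffered(chunks):
    """Collect every chunk first, then apply the substitution once."""
    return center_body("".join(chunks))

# Hypothetical page, split so that <body> and </body> land in different chunks.
page = "<html><body>\n<p>hello</p>\n</body></html>"
chunks = [page[:20], page[20:]]

print(filter_per_chunk(chunks) == page)  # True: no chunk holds both tags
print(filter_buffered(chunks))
```

Applied chunk by chunk, the pattern never sees both tags at once and leaves the page untouched; only the buffered version performs the rewrite.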

      The main benefit I see in the Apache::Clean approach is that it's less
      memory intensive for large content/pages... and of course it's very
      good as an example of how to manipulate content from within a filter.
      Aside from that, you're "cleaning" many times on smaller bits of
      content vs. once if you buffer the entire page. I'm curious where you
      think the performance tradeoffs are for once-vs-many when the average
      page size is 65kb. In my specific case, even if we could operate on
      chunks, I wager that there's more overhead in running many regexes vs.
      one big one. And I suppose one has to take into account the
      per-invocation buffer size (1kb in Apache::Clean) as well as typical
      bucket sizes... (8000 bytes?)
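For a rough sense of scale, the sizes mentioned above can be turned into invocation counts. This is only back-of-the-envelope arithmetic in Python, assuming "65kb" means 65 * 1024 bytes and taking the guessed 8000-byte bucket size at face value; the `pieces` helper is hypothetical, and none of this says which approach is faster.

```python
import math

PAGE_BYTES = 65 * 1024   # average page size, reading "65kb" as 65 KiB (assumption)
READ_BYTES = 1024        # per-invocation buffer size cited for Apache::Clean
BUCKET_BYTES = 8000      # typical bucket size guessed in the post

def pieces(total: int, size: int) -> int:
    """Number of fixed-size pieces needed to cover `total` bytes."""
    return math.ceil(total / size)

reads_per_page = pieces(PAGE_BYTES, READ_BYTES)      # 65
buckets_per_page = pieces(PAGE_BYTES, BUCKET_BYTES)  # 9

print(reads_per_page, buckets_per_page)
```

So the comparison is roughly 65 small cleaning passes (or 9 bucket-sized ones) against a single pass over the fully buffered page.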


      On 10/18/05, Geoffrey Young <geoff@...> wrote:
      > > The way to deal with this is to buffer as much content as you need
      > > (maybe the whole page) and then do your work on the buffer.
      > yikes!
      > right. see
      > http://search.cpan.org/~geoff/Apache-Clean-2.00_7/