
24351 RE: [Clip] Advice to add 'Tabs' to data.

  • John Shotsky
    Feb 14, 2014

      Thank you! One of the things I have been meaning to get a better handle on is exactly this. With this example, I should be able to apply it to my cases.

       

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

       

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke@...
      Sent: Friday, February 14, 2014 12:55
      To: ntb-clips@yahoogroups.com
      Subject: RE: [Clip] Advice to add 'Tabs' to data.

       

       

      --In ntb-clips@yahoogroups.com, <jshotsky@...> wrote:

       

      > One of the things I have been doing is enclosing multiple paren
      > phrases inside an (?= phrase), so that it won't capture, but I
      > wonder if it is captured anyway.

      No, parentheses (that is, a 'group') inside a Lookaround are captured too. Test...

      ^!Set %Cheeses%=Cheddar
      ^!Find "\b^%Cheeses%\b(?=(\x20cheese))" RS
      ^!Info ^$GetReSubstrings$

      against 'Cheddar cheese'. The output will be ' cheese' (including the leading space matched by \x20), which is captured by the group inside the Lookahead Assertion.
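      To try the same thing outside NoteTab, here is a rough Python sketch using the standard re module. It assumes Python's engine is a fair stand-in for NoteTab's here (both capture groups inside lookarounds), and the variable cheese merely plays the role of %Cheeses%. It also shows the non-capturing (?:...) form, which is what avoids the capture.

```python
import re

text = "Cheddar cheese"
cheese = "Cheddar"  # stand-in for the clip variable %Cheeses%

# A capturing group inside a lookahead still records what it matched,
# even though the lookahead itself consumes no characters.
m = re.search(r"\b" + re.escape(cheese) + r"\b(?=(\x20cheese))", text)
print(m.group(0))        # overall match: Cheddar
print(repr(m.group(1)))  # group inside the lookahead: ' cheese'

# With (?:...) inside the lookahead (or no group at all), nothing is captured.
m2 = re.search(r"\b" + re.escape(cheese) + r"\b(?=(?:\x20cheese))", text)
print(m2.groups())       # ()
```

      Note that the overall match is still only 'Cheddar'; the lookahead checks for ' cheese' without consuming it, but the group inside it is filled in all the same.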

      BTW: I think it is sometimes rather difficult to find out whether the speed of clip execution depends on the search pattern, the clip code or the way the clip consumes memory. Certain patterns, for example, can cause serious stack problems and lead to wrong results (cf. my message #22824 of June 20, 2012).

      Also, some clips are rather slow because the RegEx pattern causes a lot of backtracking. For example, take this line...

      101101010001011101110011101110001000100';'abababab!';

      and duplicate it until you have 10,000 lines. Now run the following clip against those lines:

      ^!Find "[01]+.*(aa|bb)" WR

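      For me, the clip needs almost a minute to find out that there is no match, that is, that no line contains 'aa' or 'bb' after a run of 0s and 1s. However, the problem is not in NT or the clip code but in the RegEx. Whenever the final (aa|bb) fails, the engine gives back digits from [01]+ one at a time and retries .* from every new position. The trick is to suppress that backtracking, because it isn't actually needed here: 'aa' and 'bb' can never be part of a run of 0s and 1s, so giving back digits can never produce a match. With an Atomic Group...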

      ^!Find "(?>[01]+).*(aa|bb)" WR

      the job is done in two seconds.
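      The comparison can be reproduced as a rough Python sketch. Assumptions: the standard re module stands in for NoteTab's engine, and only 300 copies of the line are used (a hypothetical reduced sample) so it finishes quickly, so absolute times will differ from the clip's. Python only gained (?>...) in version 3.11; on older versions the sketch falls back to the well-known emulation (?=([01]+))\1, which works because a lookahead is not re-entered by backtracking once it succeeds.

```python
import re
import sys
import time

# The sample line from the message; it contains no 'aa' or 'bb' at all.
LINE = "101101010001011101110011101110001000100';'abababab!';"

# Hypothetical reduced sample (the message used 10,000 lines).
text = "\n".join([LINE] * 300)

SLOW = re.compile(r"[01]+.*(aa|bb)")
if sys.version_info >= (3, 11):
    # Python 3.11+ supports atomic groups, matching the clip's pattern directly.
    FAST = re.compile(r"(?>[01]+).*(aa|bb)")
else:
    # Older versions: capture the whole digit run in a lookahead, then consume
    # exactly that text with a backreference, so it is never given back.
    FAST = re.compile(r"(?=([01]+))\1.*(aa|bb)")


def timed_search(pattern, s):
    """Return (match object or None, elapsed seconds) for one search."""
    start = time.perf_counter()
    found = pattern.search(s)
    return found, time.perf_counter() - start


slow_match, slow_t = timed_search(SLOW, text)
fast_match, fast_t = timed_search(FAST, text)
print(f"plain:  match={slow_match}  {slow_t:.3f}s")
print(f"atomic: match={fast_match}  {fast_t:.3f}s")
```

      Both searches find nothing, as in the clip, while the plain pattern should take noticeably longer, and the gap widens as more lines are added. Because 'a' and 'b' are never digits, the atomic version finds exactly the same matches on lines that do contain 'aa' or 'bb'. On Python 3.11+, the possessive quantifier [01]++ is an equivalent shorthand for (?>[01]+).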

      Regards,
      Flo
