String.patternMatch tests, not quite what I expected

  • Seth Dillingham
    Message 1 of 7 , Dec 29, 2004
      I mentioned that I added the startingAt parameter to string.patternMatch.

      Just ran some tests. Darn.

      It's a LOT faster, there's no doubt about that. My assumption, though,
      was that it would also be somewhat faster than re.match because of the
      extra complexity of the internals of the re (PCRE) verbs.

      I was wrong.

      The test script creates a 25 KB string: 1023 'a' characters followed
      by a single 'b', repeated 25 times.

      The test extracts a list of indices where 'b' occurs. There are three
      different tests: one for 'manual' looping through the string, one
      using string.patternMatch, and one using re.match. Each test ran 25
      times so I'd have big enough numbers to compare. (I also kept the
      lists produced by each, and compared them, to make sure I was getting
      identical results).

      Manual Loop: 1710 ticks

      string.patternMatch: 7 ticks

      re.match: 8 ticks

      (Um, wow. 200+ times faster for this particular test.)

      A couple ticks in each test were used for updating the lists of
      indices. Take those out and the "kernelized" version is more like 300
      times faster.

      No matter how I changed the variables (number of loops, size of input
      string, frequency of the 'b'), string.patternMatch was always
      marginally faster. If I tweaked the parameters just right I could make
      string.patternMatch almost twice as fast, but it was never anything
      greatly significant.
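
For readers without Frontier, the comparison above can be approximated in Python. This is only a sketch under assumptions, not the attached test script: str.find stands in for string.patternMatch with a starting index, Python's re module stands in for the PCRE-based re.match, indices are 0-based, and every function name here is made up for illustration. Absolute timings will not correspond to the tick counts reported above.

```python
import re
import time


def build_input(run=1023, repeats=25):
    # 1023 'a' characters followed by a single 'b', repeated 25 times.
    return ("a" * run + "b") * repeats


def indices_manual(s, ch="b"):
    # "Manual" loop: inspect every character in script code.
    found = []
    for i in range(len(s)):
        if s[i] == ch:
            found.append(i)
    return found


def indices_find(s, ch="b"):
    # Plain substring search, restarting just past each hit
    # (the role played by the new starting-index parameter).
    found = []
    ix = s.find(ch)
    while ix != -1:
        found.append(ix)
        ix = s.find(ch, ix + 1)
    return found


def indices_regex(s, ch="b"):
    # Regex search with a precompiled pattern and a start position.
    pattern = re.compile(re.escape(ch))
    found = []
    m = pattern.search(s)
    while m:
        found.append(m.start())
        m = pattern.search(s, m.start() + 1)
    return found


def time_it(fn, s, runs=25):
    # Run fn(s) several times; return elapsed seconds and the last result.
    result = None
    start = time.perf_counter()
    for _ in range(runs):
        result = fn(s)
    return time.perf_counter() - start, result
```

Calling time_it on each of the three functions with s = build_input() gives a rough equivalent of the three timings, and the returned lists can be compared to confirm that all three approaches find the same 25 positions.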

      I still think it's worth checking in, at least to the
      as-yet-non-existent "edgy" branch. It makes sense to have it. For one
      thing, it's much easier to use than re.match when you're looking for
      characters that PCRE thinks are 'meta', which is something
      newbies will appreciate.
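
The point about 'meta' characters can be illustrated with a small Python sketch. It is an analogy only: str.find plays the part of a plain string match, Python's re module plays the part of the PCRE verbs, and the helper names are hypothetical. A literal search needs no escaping, while a regex search silently matches the wrong thing unless the search text is escaped first.

```python
import re


def find_literal(s, needle, start=0):
    # Plain substring search: every character in needle is taken literally.
    return s.find(needle, start)


def find_regex_unescaped(s, needle, start=0):
    # What a newcomer might write: needle is interpreted as a pattern,
    # so '.' matches any character.
    m = re.compile(needle).search(s, start)
    return m.start() if m else -1


def find_regex_escaped(s, needle, start=0):
    # Correct regex version: escape metacharacters before compiling.
    m = re.compile(re.escape(needle)).search(s, start)
    return m.start() if m else -1
```

With s = "price: 1x50 or 1.50" and the search text "1.50", the literal and escaped searches agree, while the unescaped regex stops early at "1x50".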

      Seth

      p.s. My test script is attached, though you won't be able to run it
      without the change to string.patternMatch().
    • Marc Barrot
      Message 2 of 7 , Jan 1, 2005
        "frontierkernel@yahoogroups.com" <frontierkernel@yahoogroups.com> wrote:

        > I mentioned that I added the startingAt parameter to string.patternMatch.
        > It's a LOT faster, there's no doubt about that. My assumption, though, was
        > that it would also be somewhat faster than re.match because of the extra
        > complexity of the internals of the re (PCRE) verbs.
        > I was wrong.

        Hi Seth

        To me, adding a 'startingAt' parameter to string.patternMatch is a most
        valuable extension. I can't count the times I wished I had the kernel
        sources available to just bake it in :-)

        Ok, now I have the sources, but since you're comparing string.patternMatch
        and re.match while paving the way for the next 'advanced' version of
        Frontier, would you spare the time to add an optional flUsePcre flag to
        string.patternMatch (defaulting to false)? When true, string.patternMatch
        would compile its first 'pattern' parameter to a pcre on the fly, and use
        the PCRE library.

        That would make Usertalk code simpler to write and read when there is no
        need to reuse the same compiled pcre several times.

        Just an on the fly feature request :-)

        Back to lurking...

        Marc



        -----------
        Marc Barrot
        Precision IT Management, Inc
      • Seth Dillingham
        Message 3 of 7 , Jan 1, 2005
          On 1/1/05, Marc Barrot said:

          >That would make Usertalk code simpler to write and read when there
          >is no need to reuse the same compiled pcre several times.

          Actually, I think this would rather "un-simplify"
          string.patternMatch. It's an old verb that's been around for a long
          time and does its job very well.

          Since *nobody* is using it for regular expression matches right now
          -- which is definitely a more advanced use than plain string
          matching -- I don't really see the point of adding an optional
          parameter. The verb would be more complicated, and anybody who
          wants to do regular expressions already has the pcre verbs.

          One-off regular expression matching can be done like this:

          re.match( re.compile( "your pattern" ), s, startingAt )

          which isn't really much more difficult than:

          string.patternMatch( "your pattern", s, startingAt, true )

          Right?
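
To make the comparison concrete, here is a hedged Python sketch of the two call shapes. Nothing here is Frontier's actual implementation: pattern_match is a hypothetical stand-in for string.patternMatch with the proposed flag (called use_regex here), regex_match is a stand-in for re.match, and both return 0-based indices or -1. The sketch shows that the flagged version would be little more than a wrapper around compile-then-match.

```python
import re


def regex_match(compiled, s, ix=0):
    # Stand-in for re.match: search s with a compiled pattern from ix.
    m = compiled.search(s, ix)
    return m.start() if m else -1


def pattern_match(pattern, s, ix=0, use_regex=False):
    # Stand-in for string.patternMatch with the proposed optional flag.
    if use_regex:
        # The flag just forwards to compile + match.
        return regex_match(re.compile(pattern), s, ix)
    # Default behaviour: plain substring search from ix.
    return s.find(pattern, ix)
```

For any pattern p, pattern_match(p, s, ix, True) and regex_match(re.compile(p), s, ix) return the same value, so the flag saves a few characters at the call site at the cost of a second code path inside the string verb.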

          Six (seven, eight...) years ago I was pestering UserLand for
          optional parameters all over the place, and I probably would have
          been in favor of this change because it seemed to make my immediate
          task a little simpler.

          But... I don't know. Now that I'm into the code, I'm developing a
          reticence to make changes. Call it a "respect" for this old horse.
          I have no problem teaching him some new tricks, or showing him how
          to do the old tricks in a better/faster way, but I'm reluctant to
          start making him more, ah... complicated.

          That's what it is. Efforts to "simplify" by adding optional
          parameters usually just make things more complicated. New users,
          especially, would forever have to look at string.patternMatch and
          wonder "do I want flUsePcre to be true or not? I don't know! Should
          I guess? Should I ask somebody?" Then when he asks, we tell him
          that setting flUsePcre to true opens a wormhole through space and
          time from that string verb to both the re.match and re.compile
          verbs.

          He'll understand it eventually, but then he'll ask for some
          optional parameters to string.patternMatch to match the optional
          parameters in re.match, to which we'll all just say, "Why don't you
          just use re.match()?"

          So there's a "Mail From the Future" for you, Marc. The future
          developers of Frontier, yourself included, all want to know why you
          won't just use re.match() now, and they want to tell you that
          sometimes adding "simple" optional parameters to old verbs actually
          makes things a little more complicated.

          They also want you to know that they still haven't solved that spam
          problem, and in fact they sent some of it through for you.

          Seth
        • Marc Barrot
          Message 4 of 7 , Jan 2, 2005
            "frontierkernel@yahoogroups.com" <frontierkernel@yahoogroups.com> wrote:

            > One-off regular expression matching can be done like this:
            >
            > re.match( re.compile( "your pattern" ), s, startingAt )
            >
            > which isn't really much more difficult than:
            >
            > string.patternMatch( "your pattern", s, startingAt, true )

            You may have a point. I reckon it's a matter of personal coding style.

            Anyway, there's one thing to be said about open source code: I can always
            make the modification to my own copy and see what I like most :-)

            Can you please email me your version of string.patternMatch if it's not
            committed anywhere yet?

            Thanks

            Marc


            -----------
            Marc Barrot
            Precision IT Management, Inc
            info at prec-it.com
          • Andre Radke
            Message 5 of 7 , Jan 2, 2005
              At 11:45 -0500 01.01.2005, Seth Dillingham wrote:
              >On 1/1/05, Marc Barrot said:
              >
              >>That would make Usertalk code simpler to write and read when there
              >>is no need to reuse the same compiled pcre several times.
              >
              >Actually, I think this would rather "un-simplify"
              >string.patternMatch. It's an old verb that's been around for a long
              >time and does its job very well.

              I agree that a flag for enabling pcre matching would *not* simplify things.

              While I was at UserLand, we added an optional flCaseSensitive flag to
              string.replace and string.replaceAll, defaulting to true to preserve
              previous behaviour. I think we should consider adding the same flag
              to string.patternMatch.
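
A rough Python sketch of what such a flag could look like on a pattern-matching helper follows. The function name and the case_sensitive parameter are illustrative stand-ins for string.patternMatch and flCaseSensitive, not the kernel's code; the default of True keeps the existing behaviour for callers that never pass the flag.

```python
def pattern_match_ci(pattern, s, ix=0, case_sensitive=True):
    # Hypothetical helper: returns the 0-based index of pattern in s at or
    # after ix, or -1 if it is not found.
    if not case_sensitive:
        # Lower-casing both sides is adequate for ASCII text; some Unicode
        # characters change length when lower-cased, which would shift indices.
        pattern, s = pattern.lower(), s.lower()
    return s.find(pattern, ix)
```

Existing calls such as pattern_match_ci("B", "aaab") keep failing to match as before, and only callers that opt in with case_sensitive=False see the new behaviour.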

              Seth, how did you arrive at the name "startingAt" for the name of the
              new optional parameter? Other string verbs use "ix" for parameters
              indicating an index into the string. Should we use "ix" in this
              context, too?

              -Andre
            • Seth Dillingham
              Message 6 of 7 , Jan 2, 2005
                On 1/2/05, Andre Radke said:

                >While I was at UserLand, we added an optional flCaseSensitive flag
                >to string.replace and string.replaceAll, defaulting to true to
                >preserve previous behaviour. I think we should consider adding the
                >same flag to string.patternMatch.

                Sounds good to me.

                Anybody else want to consider this one (out loud)?

                >Seth, how did you arrive at the name "startingAt" for the name of
                >the new optional parameter? Other string verbs use "ix" for
                >parameters indicating an index into the string. Should we use "ix"
                >in this context, too?

                I didn't really think about what to call it. startingAt was the
                first name that came to mind, and I didn't check to see what
                the other verbs use.

                ix works for me. If you want me to change it, I'll do so, no
                problem.

                I see that re.match uses ix, which doesn't surprise me at all. It's
                not as clear, but it's somewhat more consistent.

                regex.match, which you're probably familiar with ;-), uses "start."

                Shall I check it in with ix?

                Seth
              • Philippe Martin
                Message 7 of 7 , Jan 2, 2005
                  At 14:58 -0500 2/01/05, Seth Dillingham wrote:
                  >>While I was at UserLand, we added an optional flCaseSensitive flag
                  >>to string.replace and string.replaceAll, defaulting to true to
                  >>preserve previous behaviour. I think we should consider adding the
                  >>same flag to string.patternMatch.
                  >
                  >Sounds good to me.

                  Sounds good to me too.

                  >ix works for me. If you want me to change it, I'll do so, no
                  >problem.

                  I'm also in favor of consistency.

                  Philippe
                  --
                  ______________________________________________________________________
                  Philippe (Flip) MARTIN mailto:flip@...
                  http://flip.macrobyte.net http://www.Free-Conversant.com