
Re: [Clip] Question about regex coding efficiency

  • Axel Berger
    Message 1 of 9 , Jan 7, 2010
      John Shotsky wrote:
      > However, I do have processes that take minutes to complete on
      > much smaller files, so I'm going to have to investigate
      > where all the time is consumed.

      One thing I have noticed is ^!InsertSelect and ^!InsertText on a large
      file. For the first fifth the clip flies and then it gets progressively
      slower until towards the end you can easily say the words "copy" and
      "paste" in time with the actions. It seems that somehow the whole file
      gets processed for each small change near the end. One thing I have not
      tried yet - I usually run that clip on small files and use the big one
      to make coffee - is reading from one file and writing to another. Of
      course that would involve reworking the whole clip from scratch.

      Axel
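A rough Python sketch of the "read from one, write to another" idea mentioned above. It is not clip code and says nothing certain about NoteTab's internals; it only assumes that an in-place insert forces the whole document to be copied, which is what Python strings do. The function names are made up for illustration.

```python
def insert_in_place(text, marker, addition):
    """Insert `addition` after every `marker`, rebuilding the whole text each time."""
    pos = 0
    while True:
        hit = text.find(marker, pos)
        if hit == -1:
            return text
        end = hit + len(marker)
        # This slice-and-join copies the entire document for one small change.
        text = text[:end] + addition + text[end:]
        pos = end + len(addition)


def insert_streaming(text, marker, addition):
    """Same result, but the input is only read and the output is built separately."""
    pieces = []
    pos = 0
    while True:
        hit = text.find(marker, pos)
        if hit == -1:
            pieces.append(text[pos:])
            return "".join(pieces)
        end = hit + len(marker)
        pieces.append(text[pos:end])   # copy only the part not yet emitted
        pieces.append(addition)
        pos = end


if __name__ == "__main__":
    doc = "line\n" * 5
    print(insert_streaming(doc, "\n", "> ") == insert_in_place(doc, "\n", "> "))
```

Both functions give the same output; the second touches each character of the input once, so the work per insertion does not grow with the size of the file.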
    • John Shotsky
      Message 2 of 9 , Jan 7, 2010
        I have found slowdowns in loops too, and I tracked them down to the Find always starting at the same place but ending
        further down in the file each time. It is somewhat difficult to catch that issue, but when you force each pass to set a
        new start point and end point, it quits slowing down near the end. For an obvious example, if you selected the whole
        document between each loop transit, it would have to look farther and farther down the file for each task to process,
        which would slow things down.



        Regards,

        John
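To put a number on the effect described above, here is a small Python sketch (not clip code; the function name and the cost measure are assumptions for illustration). It replaces every occurrence of a string and counts how many characters each search skips before its match, once restarting from the top every time and once resuming after the previous hit.

```python
def replace_counting(text, old, new, resume):
    """Replace every `old` with `new`; return (result, characters skipped by the searches).

    Assumes `new` does not contain `old`, otherwise the restart variant never ends.
    """
    skipped = 0
    start = 0
    while True:
        hit = text.find(old, start)
        if hit == -1:
            return text, skipped
        skipped += hit - start          # distance the search travelled to reach this match
        text = text[:hit] + new + text[hit + len(old):]
        # Either continue right after the replacement or go back to the top.
        start = hit + len(new) if resume else 0


if __name__ == "__main__":
    sample = "ab" * 1000
    _, restart_cost = replace_counting(sample, "b", "c", resume=False)
    _, resume_cost = replace_counting(sample, "b", "c", resume=True)
    print(restart_cost, resume_cost)
```

With matches spread evenly through the text, the restart variant's cost grows with the square of the number of matches, while the resume variant's cost grows linearly, which fits the "fast at first, crawling near the end" pattern.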



      • diodeom
        Message 3 of 9 , Jan 11, 2010
          (Sorry: re-post - attempting to maintain line breaks)

          Right under my renunciations below, marked with <rant> for your skipping convenience, you'll find a (clip) approach to comparing the performance of two script variants down to hundredths of a second, by capturing time via DOS.

          <rant>As far as I know (I hope to be wrong), NoteTab provides "natively" no greater time-stamping accuracy than full seconds (in ^$GetDate()$); it doesn't have something like ^$GetTime()$ and milliseconds available. This is not to say that I miss it. While I love obsessing over "different ways to skin a RegEx" like, I believe, every other normal person exposed to it, I also don't mind a bit these clip+data sets which consume enough time to allow me to pause and smell the ^%expletive% roses.</rant>

          <rant>I'd like some day to see Code Efficiency in a broader term: getting things accomplished fast enough with the lowest overall time and effort commitment, commensurately with a given skill level, the heck with elegance. As it is though, instead of being a means to an end, CE looms over me as this narcissistic, jealous, overbearing deity, insatiable of reverence and endless offerings of head-scratching; worse yet, one who's terribly short on delivery of promised rewards.</rant>

          ; Compare execution time of two script variants | 20100110 Dio
          ; Nominal accuracy: 1/100 sec | Max execution time: 60 min/variant

          ; Trivial outline:
          ; Capture %1Time%
          ;:::::E:X:E:C:U:T:E: First Variant
          ; Capture %2Time%
          ;:::::E:X:E:C:U:T:E: Second Variant
          ; Capture %3Time%
          ; Calculate/evaluate/display results

          ; Capture system time in mm;ss.cc format, where cc stands for centiseconds (and semicolon will later serve as delimiter)
          ; Just before FIRST VARIANT
          ^!Set %1Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

          ; <======S=T=A=R=T== FIRST VARIANT (or this mock 'timewaster' instead:)
          ^!Info FIRST SCRIPT VARIANT^%nl%Let some time pass prior to clicking OK
          ; <====F=I=N=I=S=H== FIRST VARIANT

          ; Capture FIRST finish / SECOND start time
          ^!Set %2Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

          ; <======S=T=A=R=T== SECOND VARIANT (or this mock 'timewaster' instead:)
          ^!Info SECOND SCRIPT VARIANT^%nl%Let some time pass prior to clicking OK
          ; <====F=I=N=I=S=H== SECOND VARIANT

          ; Capture finish time of SECOND VARIANT
          ^!Set %3Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

          ; Workaround (StrCopy) to free variables from their source-imposed formatting constraints and permit calculations
          ; Also, set arrays to enable manipulation of minutes and seconds
          ^!SetArray %1Time%=^$StrCopy(^%1Time%;1;8)$
          ^!SetArray %2Time%=^$StrCopy(^%2Time%;1;8)$
          ^!SetArray %3Time%=^$StrCopy(^%3Time%;1;8)$
          ; If for either variant there are more start than finish minutes (i.e. if script execution spanned the turn of an hour), add 60 (full range cycle) to the latter
          ^!If ^%1Time1%>^%2Time1% ^!Inc %2Time1% 60
          ^!If ^%2Time1%>^%3Time1% ^!Inc %3Time1% 60
          ; Convert all values to centiseconds, store totals while ditching arrays
          ^!Set %1Time%=^$Calc((^%1Time1%*6000)+(^%1Time2%*100))$
          ^!Set %2Time%=^$Calc((^%2Time1%*6000)+(^%2Time2%*100))$
          ^!Set %3Time%=^$Calc((^%3Time1%*6000)+(^%3Time2%*100))$
          ; Store execution times (of First and Second Variants) in 'recycled' variables (%1Time% and %2Time%)
          ^!Set %1Time%=^$Calc(^%2Time%-^%1Time%)$
          ^!Set %2Time%=^$Calc(^%3Time%-^%2Time%)$

          ; Compare results, proceed to the appropriate label
          ^!If ^%2Time%>^%1Time% FirstFaster
          ^!If ^%1Time%>^%2Time% SecondFaster ELSE Equal
          ; Present the results (speed difference, ratio, improvement percentage, totals per variant)
          :FirstFaster
          ;=L=O=N=G==L=I=N=E: START
          ^!Info FIRST VARIANT completed ^$Calc((^%2Time%-^%1Time%)/100;2)$ sec earlier^%nl%or ^$Calc(^%2Time%/^%1Time%)$ times faster than the SECOND one^%nl%( a performance improvement of ^$Calc(((^%2Time%-^%1Time%)*100)/^%2Time%)$ % )^%nl%^%nl%Execution time (min:sec.cc)^%nl%First variant: ^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$ Second variant: ^$Calc((^%2Time%/100)DIV60)$:^$Calc((^%2Time%/100)MOD60)$.^$StrCopyRight(^%2Time%;2)$
          ;=L=O=N=G==L=I=N=E: END
          ^!Goto Done
          :SecondFaster
          ;=L=O=N=G==L=I=N=E: START
          ^!Info SECOND VARIANT completed ^$Calc((^%1Time%-^%2Time%)/100;2)$ sec earlier^%nl%or ^$Calc(^%1Time%/^%2Time%)$ times faster than the FIRST one^%nl%( a performance improvement of ^$Calc(((^%1Time%-^%2Time%)*100)/^%1Time%)$ % )^%nl%^%nl%Execution time (min:sec.cc)^%nl%First variant: ^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$ Second variant: ^$Calc((^%2Time%/100)DIV60)$:^$Calc((^%2Time%/100)MOD60)$.^$StrCopyRight(^%2Time%;2)$
          ;=L=O=N=G==L=I=N=E: END
          ^!Goto Done
          :Equal
          ;=L=O=N=G==L=I=N=E: START
          ^!Info Oddly, there is no measurable difference^%nl%in the execution time (^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$) between the variants.
          ;=L=O=N=G==L=I=N=E: END
          :Done
          ; Housekeeping
          ^!ClearVariable %1Time%
          ^!ClearVariable %2Time%
          ^!ClearVariable %3Time%
          ; End
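The arithmetic in the clip above (split mm;ss.cc at the semicolon, add 60 minutes when a run crosses the turn of an hour, convert to centiseconds, subtract) can be sketched in Python as follows. This is an illustration of the same steps, not a replacement for the clip; the function names are made up, and the zero-padding in the formatter is my own addition.

```python
def to_centiseconds(stamp):
    """Convert an 'mm;ss.cc' stamp to whole centiseconds."""
    minutes, seconds = stamp.split(";")
    return round(int(minutes) * 6000 + float(seconds) * 100)


def elapsed_centiseconds(start, finish):
    """Elapsed time between two stamps, allowing for one turn of the hour."""
    start_min = int(start.split(";")[0])
    finish_min = int(finish.split(";")[0])
    finish_cs = to_centiseconds(finish)
    if start_min > finish_min:
        # Finish minutes wrapped past 59, so add a full 60-minute cycle.
        finish_cs += 60 * 6000
    return finish_cs - to_centiseconds(start)


def format_elapsed(cs):
    """Render centiseconds as min:sec.cc."""
    minutes, rest = divmod(cs, 6000)
    seconds, centis = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{centis:02d}"


if __name__ == "__main__":
    print(format_elapsed(elapsed_centiseconds("58;30.25", "01;10.75")))
```

As in the clip, only the minutes are compared, so a run that lasts a full hour or more cannot be told apart from a shorter one.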

          --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
          >
          > I often use alternation and/or character classes to do the same thing. I wonder if anyone knows which runs fastest?
          >
          >
          >
          > As an example:
          >
          > In a replace statement, I am looking for whole numbers, simple fractions, or complex fractions followed by a space and a
          > lower case letter. There are two ways to code this:
          >
          > (\d+|\d\/\d+|\d+\x20\d\/\d+)(\x20[a-z])
          >
          > Or
          >
          > ([\d\/\x20]+)(\x20[a-z])
          >
          >
          >
          > And, if I need to support decimals as well, I'd need to add another term to the first method, but only add a period to
          > the second method.
          >
          >
          >
          > Because the files I process have thousands of lines, I know that I could speed up the processing if I rewrote such
          > statements in the most efficient manner. In some cases, the alternation is the only way it will work because I'm looking
          > for something specific, but I'd use the character class method when possible, if it is deemed faster.
          >
          >
          >
          > Any thoughts?
          >
          > Is there a way to time just a single clip to get a certain answer? Let's say I set both replace statements up in a
          > clipbook and simply replace $1 and $2 in each one, not actually changing anything, so they'd both see the same data.
          > Could it be made to show the two times at the end of the run in an info box?
          >
          >
          >
          > Thanks,
          >
          > John
          >
          >
          >
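For comparison, the experiment proposed in the quoted question (run both replace statements with $1 and $2 put straight back, so the data never changes, then show the two times) looks like this as a Python sketch. Python's re module is a different engine from the one in NoteTab, so the numbers will not carry over; only the method does. The sample text and the function name are invented for the example.

```python
import re
import timeit

# The two patterns from the question, with \d as the digit class.
ALTERNATION = re.compile(r"(\d+|\d\/\d+|\d+\x20\d\/\d+)(\x20[a-z])")
CHAR_CLASS = re.compile(r"([\d\/\x20]+)(\x20[a-z])")


def time_variants(text, repeats=20):
    """Time an identity replacement with each pattern; returns seconds per variant."""
    return {
        "alternation": timeit.timeit(
            lambda: ALTERNATION.sub(r"\1\2", text), number=repeats
        ),
        "char_class": timeit.timeit(
            lambda: CHAR_CLASS.sub(r"\1\2", text), number=repeats
        ),
    }


if __name__ == "__main__":
    # Hypothetical recipe-style lines: whole number, simple and complex fraction.
    sample = "2 cups flour\n1/4 tsp salt\n2 1/2 cups milk\n" * 200
    for name, seconds in time_variants(sample).items():
        print(f"{name}: {seconds:.4f} s")
```

Because each replacement writes the captured groups back unchanged, both variants see identical input on every pass, which is the condition the question asks for.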