
Re: Question about regex coding efficiency

  • diodeom
    Message 1 of 9 , Jan 11, 2010
      (Sorry: re-post - attempting to maintain line breaks)

      Right under my renunciations below, marked with <rant> for your skipping convenience, you'll find a (clip) approach to comparing the performance of two script variants down to hundredths of a second, by capturing the time via DOS.

      <rant>As far as I know (I hope to be wrong), NoteTab provides "natively" no greater time-stamping accuracy than full seconds (in ^$GetDate()$); it doesn't have something like ^$GetTime()$ and milliseconds available. This is not to say that I miss it. While I love obsessing over "different ways to skin a RegEx" like, I believe, every other normal person exposed to it, I also don't mind a bit these clip+data sets which consume enough time to allow me to pause and smell the ^%expletive% roses.</rant>

      <rant>I'd like some day to see Code Efficiency in a broader term: getting things accomplished fast enough with the lowest overall time and effort commitment, commensurately with a given skill level, the heck with elegance. As it is though, instead of being a means to an end, CE looms over me as this narcissistic, jealous, overbearing deity, insatiable of reverence and endless offerings of head-scratching; worse yet, one who's terribly short on delivery of promised rewards.</rant>

      ; Compare execution time of two script variants | 20100110 Dio
      ; Nominal accuracy: 1/100 sec | Max execution time: 60 min/variant

      ; Trivial outline:
      ; Capture %1Time%
      ;:::::E:X:E:C:U:T:E: First Variant
      ; Capture %2Time%
      ;:::::E:X:E:C:U:T:E: Second Variant
      ; Capture %3Time%
      ; Calculate/evaluate/display results

      ; Capture system time in mm;ss.cc format, where cc stands for centiseconds (and semicolon will later serve as delimiter)
      ; Just before FIRST VARIANT
      ^!Set %1Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

      ; <======S=T=A=R=T== FIRST VARIANT (or this mock 'timewaster' instead:)
      ^!Info FIRST SCRIPT VARIANT^%nl%Let some time pass prior to clicking OK
      ; <====F=I=N=I=S=H== FIRST VARIANT

      ; Capture FIRST finish / SECOND start time
      ^!Set %2Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

      ; <======S=T=A=R=T== SECOND VARIANT (or this mock 'timewaster' instead:)
      ^!Info SECOND SCRIPT VARIANT^%nl%Let some time pass prior to clicking OK
      ; <====F=I=N=I=S=H== SECOND VARIANT

      ; Capture finish time of SECOND VARIANT
      ^!Set %3Time%=^$GetDosOutput(echo %time:~3,2%;%time:~6,2%.%time:~9,2%)$

      ; Workaround (StrCopy) to free variables from their source-imposed formatting constraints and permit calculations
      ; Also, set arrays to enable manipulation of minutes and seconds
      ^!SetArray %1Time%=^$StrCopy(^%1Time%;1;8)$
      ^!SetArray %2Time%=^$StrCopy(^%2Time%;1;8)$
      ^!SetArray %3Time%=^$StrCopy(^%3Time%;1;8)$
      ; If for either variant there are more start than finish minutes (i.e. if script execution spanned the turn of an hour), add 60 (full range cycle) to the latter
      ^!If ^%1Time1%>^%2Time1% ^!Inc %2Time1% 60
      ^!If ^%2Time1%>^%3Time1% ^!Inc %3Time1% 60
      ; Convert all values to centiseconds, store totals while ditching arrays
      ^!Set %1Time%=^$Calc((^%1Time1%*6000)+(^%1Time2%*100))$
      ^!Set %2Time%=^$Calc((^%2Time1%*6000)+(^%2Time2%*100))$
      ^!Set %3Time%=^$Calc((^%3Time1%*6000)+(^%3Time2%*100))$
      ; Store execution times (of First and Second Variants) in 'recycled' variables (%1Time% and %2Time%)
      ^!Set %1Time%=^$Calc(^%2Time%-^%1Time%)$
      ^!Set %2Time%=^$Calc(^%3Time%-^%2Time%)$

      ; Compare results, proceed to the appropriate label
      ^!If ^%2Time%>^%1Time% FirstFaster
      ^!If ^%1Time%>^%2Time% SecondFaster ELSE Equal
      ; Present the results (speed difference, ratio, improvement percentage, totals per variant)
      :FirstFaster
      ;=L=O=N=G==L=I=N=E: START
      ^!Info FIRST VARIANT completed ^$Calc((^%2Time%-^%1Time%)/100;2)$ sec earlier^%nl%or ^$Calc(^%2Time%/^%1Time%)$ times faster than the SECOND one^%nl%( a performance improvement of ^$Calc(((^%2Time%-^%1Time%)*100)/^%2Time%)$ % )^%nl%^%nl%Execution time (min:sec.cc)^%nl%First variant: ^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$ Second variant: ^$Calc((^%2Time%/100)DIV60)$:^$Calc((^%2Time%/100)MOD60)$.^$StrCopyRight(^%2Time%;2)$
      ;=L=O=N=G==L=I=N=E: END
      ^!Goto Done
      :SecondFaster
      ;=L=O=N=G==L=I=N=E: START
      ^!Info SECOND VARIANT completed ^$Calc((^%1Time%-^%2Time%)/100;2)$ sec earlier^%nl%or ^$Calc(^%1Time%/^%2Time%)$ times faster than the FIRST one^%nl%( a performance improvement of ^$Calc(((^%1Time%-^%2Time%)*100)/^%1Time%)$ % )^%nl%^%nl%Execution time (min:sec.cc)^%nl%First variant: ^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$ Second variant: ^$Calc((^%2Time%/100)DIV60)$:^$Calc((^%2Time%/100)MOD60)$.^$StrCopyRight(^%2Time%;2)$
      ;=L=O=N=G==L=I=N=E: END
      ^!Goto Done
      :Equal
      ;=L=O=N=G==L=I=N=E: START
      ^!Info Oddly, there is no measurable difference^%nl%in the execution time (^$Calc((^%1Time%/100)DIV60)$:^$Calc((^%1Time%/100)MOD60)$.^$StrCopyRight(^%1Time%;2)$) between the variants.
      ;=L=O=N=G==L=I=N=E: END
      :Done
      ; Housekeeping
      ^!ClearVariable %1Time%
      ^!ClearVariable %2Time%
      ^!ClearVariable %3Time%
      ; End
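
For readers who want to check the clip's arithmetic outside NoteTab, here is a minimal Python sketch of the same steps: split an "mm;ss.cc" stamp at the semicolon, add 60 minutes when the run crossed the turn of an hour, convert to centiseconds, and subtract. The function names (to_centiseconds, elapsed_cs, format_cs) are made up for illustration and are not part of NoteTab; like the clip, the sketch assumes a run shorter than 60 minutes.

```python
def to_centiseconds(stamp: str) -> tuple[int, int]:
    """Split an 'mm;ss.cc' stamp into (minutes, centiseconds within the minute)."""
    minutes, seconds = stamp.split(";")
    return int(minutes), round(float(seconds) * 100)


def elapsed_cs(start: str, finish: str) -> int:
    """Elapsed centiseconds between two stamps taken less than 60 minutes apart."""
    start_min, start_cs = to_centiseconds(start)
    finish_min, finish_cs = to_centiseconds(finish)
    # Same rule as the clip: more start than finish minutes means
    # the run spanned the turn of an hour, so add a full 60-minute cycle.
    if start_min > finish_min:
        finish_min += 60
    return (finish_min * 6000 + finish_cs) - (start_min * 6000 + start_cs)


def format_cs(total_cs: int) -> str:
    """Render centiseconds as min:sec.cc, zero-padding seconds and centiseconds."""
    minutes, rest = divmod(total_cs, 6000)
    seconds, centis = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{centis:02d}"


if __name__ == "__main__":
    first = elapsed_cs("59;58.50", "00;01.25")   # run crossing the hour
    second = elapsed_cs("00;01.25", "00;02.00")
    print("First variant: ", format_cs(first))
    print("Second variant:", format_cs(second))
```

Working in whole centiseconds keeps every later step (difference, ratio, percentage) in integer territory until the final display, which is the same design choice the clip makes.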

      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > I often use alternation and/or character classes to do the same thing. I wonder if anyone knows which runs fastest?
      >
      >
      >
      > As an example:
      >
      > In a replace statement, I am looking for whole numbers, simple fractions, or complex fractions followed by a space and a
      > lower case letter. There are two ways to code this:
      >
      > (\d+|\d\/\d+|\d+\x20\d\/\d+)(\x20[a-z])
      >
      > Or
      >
      > ([\d\/\x20]+)(\x20[a-z])
      >
      >
      >
      > And, if I need to support decimals as well, I'd need to add another term to the first method, but only add a period to
      > the second method.
      >
      >
      >
      > Because the files I process have thousands of lines, I know that I could speed up the processing if I rewrote such
      > statements in the most efficient manner. In some cases, the alternation is the only way it will work because I'm looking
      > for something specific, but I'd use the character class method when possible, if it is deemed faster.
      >
      >
      >
      > Any thoughts?
      >
      > Is there a way to time just a single clip to get a certain answer? Let's say I set both replace statements up in a
      > clipbook and simply replace $1 and $2 in each one, not actually changing anything, so they'd both see the same data.
      > Could it be made to show the two times at the end of the run in an info box?
      >
      >
      >
      > Thanks,
      >
      > John
      >
      >
      >
      > [Non-text portions of this message have been removed]
      >
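
As a rough cross-check of John's question outside NoteTab, the two patterns can also be timed in Python. This is only a sketch under several assumptions: the bare `d` in the quoted patterns is taken to mean `\d`; Python's `re` engine is not the regex engine NoteTab uses, so the ranking may not carry over; and the sample lines are invented. Note too that the two patterns are not equivalent: the character class also accepts strings such as "1//2" that the alternation rejects.

```python
import re
import timeit

# The two variants from the quoted message, with "d" read as "\d" (assumption).
ALTERNATION = re.compile(r"(\d+|\d\/\d+|\d+\x20\d\/\d+)(\x20[a-z])")
CHAR_CLASS = re.compile(r"([\d\/\x20]+)(\x20[a-z])")

# Invented sample data standing in for a file with thousands of lines.
SAMPLE = "\n".join(
    ["2 cups flour", "1/2 teaspoon salt", "1 1/2 cups milk", "Bake until golden"] * 1000
)


def time_pattern(pattern, text, repeats=5):
    """Best-of-N wall time (seconds) for an identity replace over the text."""
    # Replacing with \1\2 changes nothing, so both variants see the same data.
    return min(
        timeit.repeat(lambda: pattern.sub(r"\1\2", text), number=1, repeat=repeats)
    )


def compare(text=SAMPLE):
    """Return the best time per variant, keyed by a descriptive name."""
    return {
        "alternation": time_pattern(ALTERNATION, text),
        "char_class": time_pattern(CHAR_CLASS, text),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the best of several repeats reduces noise from other processes, which matters when the difference between variants is small.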