Sort & Eliminate Duplicate Lines

  • Ray Shapp
    Message 1 of 23, Jul 29, 2010
      To All,

      Please direct me to a library of sort clips.

      For my current need, I am looking for a clip that will eliminate duplicate
      lines in a text file. It's ok if the clip doesn't do the sort. I can use a
      menu command to prepare the file in ascending or descending sequence before
      running the clip. It would be good if the clip can work with "folded" lines.
      I.e. lines that are a bit longer than the screen display. I normally run
      NoteTab with Word Wrap toggled ON (lines wrap to be visible on screen). If
      necessary, I could toggle Word Wrap OFF whenever I run the clip.

      Most lines in the file are less than 256 characters long. In fact, the
      average length is around 40 characters.

      I am running NoteTab Pro v6.2/fv.

      Thank you for the help.

      Ray Shapp
    • diodeom
      Message 2 of 23, Jul 29, 2010
        Have a look at the function: ^$StrSort("Str";CaseSensitive;Ascending;RemoveDuplicates)$.
        There is no need to worry about the word wrap. Try this:

        ^!Select All
        ^!InsertText ^$StrSort("^$GetSelection$";0;1;1)$
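
For readers who want to check the result outside NoteTab, here is a minimal Python sketch of what the clip above appears to do with those arguments: sort ascending, ignore case, and drop duplicates. The helper name `sort_unique_lines` is hypothetical, and treating lines that differ only in case as duplicates is an assumption about how StrSort behaves when CaseSensitive is 0.

```python
def sort_unique_lines(text, case_sensitive=False, ascending=True):
    """Sort the lines of `text` and drop duplicates (hypothetical helper)."""
    # Comparison key: the line itself, or its case-folded form (assumption)
    key = (lambda s: s) if case_sensitive else str.casefold
    seen = set()
    unique = []
    for line in text.splitlines():
        k = key(line)
        if k not in seen:          # keep only the first line for each key
            seen.add(k)
            unique.append(line)
    unique.sort(key=key, reverse=not ascending)
    return "\n".join(unique)

sample = "pear\napple\nPear\napple\nfig"
print(sort_unique_lines(sample))
```

Word wrap plays no role here either, since the text is split on real line breaks only.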
      • Axel Berger
        Message 3 of 23, Jul 29, 2010
          Ray Shapp wrote:
          > It's ok if the clip doesn't do the sort.

          I take it you don't care about the original sorting being restored
          either, do you? Could you supply a sample to play with?

          Axel
        • Axel Berger
          Message 4 of 23, Jul 29, 2010
            diodeom wrote:
            ^$StrSort("^$GetSelection$";0;1;1)$

            Oh, heck, so simple - and I would have gone and built it all step by step
            myself.
          • Ray Shapp
            Message 5 of 23, Jul 29, 2010
              Hi Axel,

              Thanks for the quick reply.

              There is really nothing special about the original text file except that
              some of the lines are duplicated. Some lines are not duplicated.

              The original sequence just happens to be random, but before I run the clip,
              I can easily sort the file into ascending order, and when I am done, I will
              keep the file in ascending order.

              Here below is a prototype of the file. Imagine the line of all "a" to
              actually be comprised of a series of numerals, letters, punctuation, and
              spaces. The line of all "b" is a different series of numerals, letters,
              punctuation, and spaces, and so forth.

              The file size is now only about 100 lines long, but it may grow to about a
              thousand lines in a year's time as I append new data. I append data several
              times per week, and some of the lines in the data could be exact duplicates
              of lines that are already in the file.

              My impression is that I saw a clip library at one time that contained about
              six or eight kinds of sorts. If I remember correctly, one of them eliminated
              duplicate lines while sorting.

              Thanks for the help.

              Ray Shapp

              ***prototype of file follows***

              aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
              aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
              bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
              bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
              cccccccccccccccccccccccccccccccccccccccccccccccc
              ddddddddddddddddddddddddddd
              ddddddddddddddddddddddddddd
              eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
              fffffffffffffff
              fffffffffffffff
              ggggggggggggggggggggggggggggggg
              ggggggggggggggggggggggggggggggg


            • diodeom
              Message 6 of 23, Jul 29, 2010
                Ray Shapp <rayshapp@...> wrote:
                >
                > Please direct me to a library of sort clips.
                >

                NoteSort library can be downloaded from here:
                http://www.notetab.com/libraries.php?cat=misc
              • Don
                Message 7 of 23, Jul 29, 2010
                  However, there is a built-in function on the menu.
                  First, be sure under View > Options > Tools that you have remove
                  duplicates checked -- once set, it stays set (or vice versa) until you
                  change it.

                  Now go to Modify > Lines > Sort.

                  you don't need no stinking clip ;-)

                  Speaking of wordwrap, does wordwrap have any effect on sort lines?

                • Ray Shapp
                  Message 8 of 23, Jul 29, 2010
                    Hi Don and diodeom (your name?)

                    Before I had asked my initial question, I was sure NTP did have a built-in
                    way to eliminate duplicates, but I just couldn't find it in options. I must
                    have skipped right past it when I scanned the "Tools" tab. Thanks for the
                    quick fix.

                    Thanks also for reference to the NoteSort library.

                    In my quick tests with Word Wrap ON versus OFF, I found no difference in
                    sorting behavior.

                    Problem solved -- thanks again!

                    Ray Shapp

                  • Art Kocsis
                    Message 9 of 23, Jul 30, 2010
                      Hi Ray,

                      Although you have received lots of good tips on using a clip for your problem,
                      at the risk of putting my foot in my mouth, did you consider the obvious? -
                      Setting the sort option to remove duplicates and to put the sort and select all
                      tools at the top of your shortcut menu?

                      View | Options | Tools | Sort Removes Duplicates [check box]
                      View | Options | Shortcut Menu | Select All [check box & move]
                      View | Options | Shortcut Menu | Sort Ascending [check box & move]

                      Then two right clicks anywhere in your document will accomplish your task
                      with a minimum of mouse movement and hunting for icons/menu items to click.

                      Personally, I try to use the built-in functions/tools as much as possible to
                      minimize the clutter of too many clips. I have about three dozen favorite clips
                      on my clipbar already. Using the shortcut menu takes less time and effort than
                      clicking on a clipbar icon, even if the clipbar real estate weren't so precious.

                      Just in case you weren't aware of this.

                      Art

                    • diodeom
                      Message 10 of 23, Jul 30, 2010
                        Axel Berger <Axel-Berger@...> wrote:
                        >
                        > I take it you don't care about the original sorting being restored
                        >

                        If the order of sentences were to be preserved, it actually becomes an interesting task to tinker with. Here is one take:

                        ^!Replace "(^.++)\R\K((\1(\R|\Z))+|(?s)(.+?)\1(\R|\Z))" >> "$5" WARSI
                        ^!IfError End Else Skip_-1
                      • flo.gehrke
                        Message 11 of 23, Jul 30, 2010
                          --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:

                          > If the order of sentences were to be preserved, it actually becomes
                          > an interesting task to tinker with. Here is one take:
                          >
                          > ^!Replace "(^.++)\R\K((\1(\R|\Z))+|(?s)(.+?)\1(\R|\Z))" >> "$5" WARSI
                          > ^!IfError End Else Skip_-1

                          diodeom,

                          It's working fine. That alternation, however, might cause a lot of backtracking. So another idea could be to split it as follows:

                          ^!Replace "^([^\r\n]+)(\R|\Z)\1" >> "$1" AWRS
                          ^!Replace "^([^\r\n]+)(\R|\Z)\X+\K(\1(\R|\Z))" >> "" AWRS
                          ^!IfError End Else Skip_-1

                          Also, I'm cautious with Possessive Quantifiers ;-)

                          Tested with...

                          Bertha
                          Eleonore
                          Dorothy
                          Eleonore
                          Bertha
                          Charles
                          Anthony
                          Anthony
                          Donald
                          Anthony
                          Charles
                          Charles

                          Reducing it to...

                          Bertha
                          Eleonore
                          Dorothy
                          Charles
                          Anthony
                          Donald


                          Regards,
                          Flo
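
The reduction shown in the test above can be cross-checked outside NoteTab. This is a minimal Python sketch, not a translation of either clip: it keeps the first occurrence of each line and drops later repeats, using the same sample names. The helper name `dedupe_keep_order` is hypothetical, and the comparison is case-sensitive, which may differ from a clip run with the ignore-case option.

```python
def dedupe_keep_order(lines):
    """Return lines with later repeats removed, keeping first occurrences."""
    # dict keys keep insertion order (Python 3.7+), so they act as an ordered set
    return list(dict.fromkeys(lines))

names = ["Bertha", "Eleonore", "Dorothy", "Eleonore", "Bertha", "Charles",
         "Anthony", "Anthony", "Donald", "Anthony", "Charles", "Charles"]
print("\n".join(dedupe_keep_order(names)))
```

Each line is looked at once, so there is no backtracking to worry about in this form.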
                        • John Shotsky
                          Message 12 of 23, Jul 30, 2010
                            Flo,

                            Could you explain the purpose of \X in your example? From what I can tell, it's only used with
                            Unicode.

                            Regards,
                            John

                          • diodeom
                            Message 13 of 23, Jul 30, 2010
                              "flo.gehrke" <flo.gehrke@...> wrote:
                              >
                              > --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@> wrote:
                              >
                              > > If the order of sentences were to be preserved, it actually becomes
                              > > an interesting task to tinker with. Here is one take:
                              > >
                              > > ^!Replace "(^.++)\R\K((\1(\R|\Z))+|(?s)(.+?)\1(\R|\Z))" >> "$5" WARSI
                              > > ^!IfError End Else Skip_-1
                              >
                              > diodeom,
                              >
                              > It's working fine. That alternation, however, might cause a lot of backtracking. So another idea could be to split it as follows:
                              >
                              > ^!Replace "^([^\r\n]+)(\R|\Z)\1" >> "$1" AWRS
                              > ^!Replace "^([^\r\n]+)(\R|\Z)\X+\K(\1(\R|\Z))" >> "" AWRS
                              > ^!IfError End Else Skip_-1
                              >
                              > Also, I'm cautious with Possessive Quantifiers ;-)
                              >

                              Interesting approach. It looks to me, though, like your \X+ grabs as much as it can swallow (including the match), all the way to the end of the file, before spitting it back out to fit the pattern and settling for just the stuff before \1. Ironically, that's actually a very good illustration of backtracking. Tested on 2000 lines, it took an entire 18 seconds. When I made your \X+? non-greedy, the same number of lines took just 1.48 seconds. (Still, the original alternation performed at 1.31.)
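
The greedy versus non-greedy difference can be made visible with a small Python sketch. It is only an analogy: Python's `re` module has no \X, \K or \R, so `(?s:.+)` stands in for \X+, `\n` for \R, and a second capture group marks the repeat that the clip would delete. The sketch does not measure speed; it only shows which repeat each form ends up at.

```python
import re

text = "A\nx\nA\ny\nA\nz\n"

# Greedy: .+ runs to the end of the text, then backs up to the LAST repeat.
greedy = re.compile(r"^([^\r\n]+)\n(?s:.+)^(\1\n)", re.MULTILINE)
# Lazy: .+? grows one character at a time and stops at the NEAREST repeat.
lazy = re.compile(r"^([^\r\n]+)\n(?s:.+?)^(\1\n)", re.MULTILINE)

print(greedy.search(text).start(2))  # 8: the third "A"
print(lazy.search(text).start(2))    # 4: the second "A"
```

On a long file the greedy form has to give back nearly everything it swallowed whenever the repeat sits close to the original line, which fits the timing gap reported above.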
                            • flo.gehrke
                              Message 14 of 23, Jul 30, 2010
                                @John

                                > Flo,
                                >
                                > Could you explain the purpose of \X in your example? From what I
                                > can tell, it's only used with Unicode.

                                You are right, it's "an extended Unicode sequence". Nevertheless, it also matches all characters up to dec 255 including NL. So I think it might be used as an alternative to the dot-all option (?s).


                                @diodeom

                                > When I made your \X+? non-greedy, the same number of lines took
                                > just 1.48 seconds. (Still, the original alternation
                                > performed at 1.31.)

                                There you are! So, obviously, my proposal is no improvement.

                                Flo
                              • diodeom
                                Message 15 of 23, Jul 31, 2010
                                  "flo.gehrke" <flo.gehrke@...> wrote:
                                  >
                                  > > performed at 1.31.)
                                  >
                                  > There you are! So, obviously, my proposal is no improvement.
                                  >

                                  I wouldn't necessarily look at speed as the criterion; I submit that it's rather silly to split hairs over mere centiseconds anyway.

                                  It probably doesn't help that the initial Replace removes only the first back-to-back duplicate -- instead of any number of them. In addition, it runs only once. Should there be three or more identical lines in a row, it won't get rid of all the extras. And after the second Replace does its cycles, there are still some dupes left. (It looks like this issue wouldn't flag itself in your test because your sample lines have a maximum of only two sequential repetitions.) Maybe employing the first Replace over and over (e.g. by Skip_-2 instead of Skip_-1) could be one way to solve it (at some expense of efficiency, though).
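
The point about three or more identical lines in a row can be reproduced with a rough Python stand-in for the first Replace. This is a sketch under assumptions: `\n` replaces \R, a lookahead `(?=\n|\Z)` is added so that only whole lines are compared, and the helper name `collapse_runs` is hypothetical.

```python
import re

# Collapse one back-to-back pair of identical lines per match.
pair = re.compile(r"^([^\r\n]+)\n\1(?=\n|\Z)", re.MULTILINE)

text = "a\na\na\nb"

# A single pass removes only one of the two extra copies.
once = pair.sub(r"\1", text)
print(repr(once))  # 'a\na\nb'

def collapse_runs(s):
    """Repeat the pair replacement until nothing changes any more."""
    prev = None
    while s != prev:
        prev, s = s, pair.sub(r"\1", s)
    return s

print(repr(collapse_runs(text)))  # 'a\nb'
```

Looping until the text stops changing plays the same role as jumping back to the first Replace in the clip.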

                                  It appears that \Z in the first Replace, as well as the first \Z of the second Replace, serve no purpose -- there is no match possible past the end of the text. I also wonder what's the advantage in favoring the notation "[^\r\n]" over just "." in this context.

                                  I sense that there are easily better ways to handle this deduplicating task than how our drafts approach it. It's just rather challenging for me to assume another angle once my lazy mind sets into one groove. That's why I appreciate what this list provides: a chance to get a fresh perspective from the grooves of others. :)
                                • Axel Berger
                                  Message 16 of 23, Jul 31, 2010
                                    diodeom wrote:
                                    > I wouldn't necessarily look at speed as the criterion;

                                    You're right of course. Though liking to split hairs I do have this
                                    issue: There is one clip of mine that usually takes a few seconds, but
                                    when run on a file with thousands of entries easily enables me to go and
                                    make coffee and still be back before it's finished. I'd love to find a
                                    way to speed that one up.

                                    Axel
                                  • John Shotsky
                                    Message 17 of 23, Jul 31, 2010
                                      I have a clip library that processes thousands of lines also. My library contains over 17,000 lines
                                      including comments, which are plentiful. There is lots of branching based on what is found in the
                                      files, and it can take over 5 minutes to run on a given source. I'm running it on a 3.2 GHz P4, and I
                                      can watch the temperature shoot up as it processes.

                                      What I'd like to see here is a discussion of the techniques that are fast and/or slow. I'm sure I
                                      have many clips that I could rewrite for speed, but I don't really know what to look for.

                                      Regards,
                                      John

                                    • diodeom
                                      Message 18 of 23, Jul 31, 2010
                                        Axel, John,

                                        I gather it's not easy to publish any code snippets for communal scrutiny (or nitpicking :), especially the ones suspected of inefficiencies. Then, even if choosing the fishy fragments among thousands of lines were feasible, stripping them beforehand of the stuff we'd rather keep private could be yet another time- & effort-prohibitive undertaking, possibly even wrecking the functionality altogether. Is there another way, though, to poll the collective wits?

                                        I suppose the viable alternative of "blindly" volunteering some personally resolved trouble spots and gotchas could be the start of the discussion John suggests; however, I imagine that just guessing what might be of use to others makes for a rather fragile motivation.

                                        What could I share? Well, sorry -- another sad thought. :) I rarely dare to look at my older clips, because instead of just slapping some polish here and there over their crudeness, I'm immediately sucked into a complete rewrite; an act often enjoyable, yes, but also the one I don't always have enough time for... to complete.


                                      • diodeom
                                        Message 19 of 23 , Jul 31, 2010
                                          In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                                          > (...)
                                          > I'm sure I have many clips that I could rewrite for speed, but I
                                          > don't really know what to look for.
                                          >

                                          I think this could be a useful (if obvious) hint: divide and conquer.

                                          Place in the clip a number of back-to-back pairs of timing dinglets. They would pop up an Info box with a measure of how long the snippet they frame took to execute. After taking notice/closing the box the clip would resume with another timed section. Once the offending larger fragment is located, the trouble spot could be further narrowed down and eventually pinpointed with gradually closer-and-closer spaced timing commands.
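For readers who want to try the same divide-and-conquer idea outside NoteTab, here is a minimal sketch in Python rather than clip code. The `timed` helper, the `timings` dictionary and the three section labels are made up for illustration; the sections just stand in for whatever fragments of a clip one would frame with timing commands.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical store: section label -> elapsed seconds


@contextmanager
def timed(label):
    """Frame a snippet and record how long it took to run."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def slowest(results):
    """Return the label of the section that took the longest."""
    return max(results, key=results.get)


# Three back-to-back timed sections (stand-ins for fragments of a clip).
with timed("load"):
    lines = [f"line {i % 50}" for i in range(20_000)]

with timed("dedupe"):
    unique = list(dict.fromkeys(lines))

with timed("join"):
    text = "\n".join(unique)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
print("slowest section:", slowest(timings))
```

Once the slowest section is known, the same frames can be moved inside it and spaced more closely, which is the narrowing-down step described above.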
                                        • flo.gehrke
                                          Message 20 of 23 , Jul 31, 2010
                                            --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                            >
                                            > I wouldn't necessarily look at speed as the criterion...

                                            diodeom,

                                            Thanks for the effort you take in this issue!

                                            I still wonder if we couldn't make that job simpler. So if you are willing to test another version -- have a look at this...

                                            ^!SetScreenUpdate Off
                                            ^!Replace "^([^\r\n]+)(\X+?)?\R\K\1(\R|\Z)" >> "" AWRS
                                            ^!IfError End Else Skip_-1

                                            I've tested it with 100,000 lines, and it did a good job within a few seconds. I suppose you have better tools than I do for testing that -- I'm using just NT and nothing else.
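For anyone who wants to experiment with the same idea outside NoteTab, here is a rough Python approximation of the clip, not a translation of it. Python's `re` has no `\K`, `\X` or `\R`, so this sketch captures the text that `\K` would keep and writes it back, and it spells out the line breaks as `\r?\n`. It also insists on whole-line matches, which is an assumption about the intent. The names `DUPLICATE` and `remove_duplicates` are invented for the example.

```python
import re

# Group 1 is one complete line; group 2 is everything up to and including the
# line break before the next whole-line repetition of group 1.
DUPLICATE = re.compile(
    r"^([^\r\n]+)((?:\r?\n[^\r\n]*)*?\r?\n)\1(?:\r?\n|\Z)",
    re.MULTILINE,
)


def remove_duplicates(text):
    """Delete later copies of each line, one match per pass, until none remain."""
    while True:
        # Keep groups 1 and 2 (the part \K protects in the clip); drop the repeat.
        text, count = DUPLICATE.subn(r"\1\2", text, count=1)
        if count == 0:
            return text


print(remove_duplicates("Bertha\nEleonore\nDorothy\nEleonore\nBertha\nCharles\n"))
```

Like the clip, this removes one duplicate per pass and starts over, so it is meant for small experiments; it rebuilds the whole string on every pass and will crawl on very large inputs.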

                                            Regards,
                                            Flo


                                            P.S. I took those 12 lines from my first post...

                                            Bertha
                                            Eleonore
                                            Dorothy
                                            Eleonore
                                            Bertha
                                            Charles
                                            Anthony
                                            Anthony
                                            Donald
                                            Anthony
                                            Charles
                                            Charles

                                            and multiplied them up to 100,000 lines.
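A test file like this can also be generated with a few lines of Python, which gives a known-good answer to check any clip against. This is only a sketch: how the 12 lines were actually multiplied is not stated, and since 100,000 is not a multiple of 12, cutting the repetition off at exactly 100,000 lines is an assumption. `build_sample` and `expected_result` are hypothetical helper names.

```python
from itertools import cycle, islice

# The 12 sample lines from the post, in their original order.
NAMES = [
    "Bertha", "Eleonore", "Dorothy", "Eleonore", "Bertha", "Charles",
    "Anthony", "Anthony", "Donald", "Anthony", "Charles", "Charles",
]


def build_sample(total_lines):
    """Repeat the 12 names until exactly total_lines lines exist."""
    return list(islice(cycle(NAMES), total_lines))


def expected_result(lines):
    """First occurrence of each line, in original order (dicts keep insertion order)."""
    return list(dict.fromkeys(lines))


sample = build_sample(100_000)
print(len(sample), "lines reduce to", expected_result(sample))
```

Writing `"\n".join(sample)` to a file gives input for a clip, and whatever the clip leaves behind should match `expected_result(sample)` if it keeps the first occurrence of each line.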
                                          • flo.gehrke
                                            Message 21 of 23 , Jul 31, 2010
                                              --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:

                                              > I still wonder if we couldn't make that job simpler...

                                              Sorry, there's an unnecessary \R in here. That's all...

                                              ^!SetScreenUpdate Off
                                              ^!Replace "^([^\r\n]+)(\X+?)?\K\1(\R|\Z)" >> "" AWRS
                                              ^!IfError End Else Skip_-1

                                              Flo
                                            • diodeom
                                              Message 22 of 23 , Jul 31, 2010
                                                "flo.gehrke" <flo.gehrke@...> wrote:
                                                >
                                                > --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@> wrote:
                                                > >
                                                > > I wouldn't necessarily look at speed as the criterion...
                                                >
                                                > diodeom,
                                                >
                                                > Thanks for the effort you take in this issue!
                                                >
                                                > I still wonder if we couldn't make that job simpler. So if you are willing to test another version -- have a look at this...
                                                >
                                                > ^!SetScreenUpdate Off
                                                > ^!Replace "^([^\r\n]+)(\X+?)?\R\K\1(\R|\Z)" >> "" AWRS
                                                > ^!IfError End Else Skip_-1
                                                >
                                                > I've tested it with 100,000 lines, and it did a good job within a few seconds. I suppose you have better tools than I do for testing that -- I'm using just NT and nothing else.
                                                >


                                                NoteTab isn't big on sub-second values, so I'm letting it ask (via clip) the DOS Time function to help out. If you're interested, the timing clip I'm using is posted here, right under Sheri's:

                                                http://tech.groups.yahoo.com/group/ntb-clips/message/20468

                                                And I apply it like this:

                                                ^!Clip TimeIt
                                                ^!Repl... <pattern>
                                                ^!IfError End Else Skip_-1
                                                :End
                                                ^!Clip TimeIt

                                                I certainly like the conciseness of your new rendition. It helped me to this hairy realization: it's all lovely if we're testing on extremely short and extremely repetitious line samples, no matter how many, but what if any of our non-greedy patterns has to sniff through looong pages of lengthy lines before it finds (or not) the desired match? As I envision it, the poor Regex engine is told to be modest, so it timidly looks at one character at a time and checks if the match is made immediately after. If not, it advances its capture by another single character and checks for the subsequent presence of the match again. And on and on. An exact reverse of the "spitting out" that greedy captures have to go through. No big deal if the sought-after "Bertha" is just twenty-some characters further, but what if it takes thousands and thousands of characters before one can be spotted or "found to be absent"? :) It ought to cost some time.

                                                Well, I pasted just 100 unique lines (averaging about 125 chars long) twice into a test pile (to get a hundred single repetitions, each exactly a hundred lines apart) and ran clips on it. The previous two patterns had to slave for 11 - 16 seconds to complete; the last one I cut short (after seeing in the status bar what it took per match).

                                                It might be junk science, but I'm thinking that having a match exactly 100 lines away (in the 200-line long file) could be used to show how "forward-" and backtracking compare in draining the resources -- when having the same average distance to span. When I run your previous clip on this text in both variations, with greedy and non-greedy \X, the speed results are pretty similar, as expected.
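A test pile of this shape is easy to rebuild for anyone who wants to repeat the comparison. The Python sketch below makes 100 unique lines of roughly 125 characters and appends the same block again, so every line has exactly one duplicate 100 lines further down. The random filler text, the numeric prefix that guarantees uniqueness, the length range and the name `build_test_pile` are all assumptions; the original lines are not shown in the thread.

```python
import random
import string


def build_test_pile(unique_count=100, avg_length=125, seed=0):
    """Return unique_count unique lines followed by the same lines again."""
    rng = random.Random(seed)  # fixed seed so the pile is reproducible
    alphabet = string.ascii_lowercase + " "
    block = []
    for i in range(unique_count):
        length = rng.randint(avg_length - 25, avg_length + 25)
        body = "".join(rng.choice(alphabet) for _ in range(length))
        block.append(f"{i:03d} {body}")  # numeric prefix keeps every line unique
    return block + block


pile = build_test_pile()
print(len(pile), "lines,", len(set(pile)), "unique")
```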

                                                After all that fun, I'd say it's only fair to try out a no-nonsense conventional loop... I've been trying to avoid:

                                                ^!Set %n%=0
                                                :Loop
                                                ^!Inc %n%
                                                ^!If ^%n%>^$GetTextLineCount$ End
                                                ^!Jump ^%n%
                                                ^!Set %line%=^$GetParagraph(^%n%)$
                                                ^!Replace "^p^%line%" >> "" AS
                                                ^!Goto Loop

                                                ... to find out that it needs only about a third of the time of the nearest challenger... on either test file. :)
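The logic of that loop, sketched in Python for readers who want to check results outside NoteTab: visit line n, delete every later copy of it, move on to line n+1, and stop when n runs past the (shrinking) line count. This is an illustration, not the clip itself. It compares whole lines, which is an assumption about the intended behavior, since the clip searches for the line's text preceded by a line break. The function name is made up.

```python
def remove_later_duplicates(lines):
    """Visit each line in turn and delete all later lines identical to it."""
    lines = list(lines)      # work on a copy
    n = 0
    while n < len(lines):    # the line count shrinks as duplicates are removed
        current = lines[n]
        # Keep everything up to and including line n; drop later identical lines.
        lines[n + 1:] = [later for later in lines[n + 1:] if later != current]
        n += 1
    return lines


names = [
    "Bertha", "Eleonore", "Dorothy", "Eleonore", "Bertha", "Charles",
    "Anthony", "Anthony", "Donald", "Anthony", "Charles", "Charles",
]
print(remove_later_duplicates(names))
```

Each pass touches only the text after the current line, so the work per line is bounded by what is left of the file.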
                                              • Ray Shapp
                                                Message 23 of 23 , Apr 11, 2011
                                                  Hi Art,

                                                  I'm just now going back through some unopened mail that was in a folder from
                                                  an older computer. Your suggestion about using the right-click "short cut
                                                  menu" is a good one. (I thought it was called the "context menu".) The
                                                  original problem was solved in July, but I'm using your suggestion now for
                                                  that problem and for other purposes too.

                                                  Thank you.

                                                  Ray Shapp


                                                  On Fri, Jul 30, 2010 at 7:24 AM, Art Kocsis <artkns@...> wrote:

                                                  >
                                                  >
                                                  > Hi Ray,
                                                  >
                                                  > Although you have received lots of good tips on using a clip for your problem,
                                                  > at the risk of putting my foot in my mouth, did you consider the obvious:
                                                  > setting the sort option to remove duplicates and putting the Sort and Select All
                                                  > tools at the top of your shortcut menu?
                                                  >
                                                  > View | Options | Tools | Sort Removes Duplicates [check box]
                                                  > View | Options | Shortcut Menu | Select All [check box & move]
                                                  > View | Options | Shortcut Menu | Sort Ascending [check box & move]
                                                  >
                                                  > Then two right clicks anywhere in your document will accomplish your task
                                                  > with a minimum of mouse movement and hunting for icons/menu items to click.
                                                  >
                                                  > Personally, I try to use the built-in functions/tools as much as possible to
                                                  > minimize the clutter of too many clips. I have about three dozen favorite clips
                                                  > on my clipbar already. Using the shortcut menu takes less time and effort than
                                                  > clicking on a clipbar icon, even if the clipbar real estate wasn't so precious.
                                                  >
                                                  > Just in case you weren't aware of this.
                                                  >
                                                  > Art
                                                  >
                                                  > At 07-29-10 15:51, you wrote:
                                                  > >To All,
                                                  >
                                                  > >
                                                  > >Please direct me to a library of sort clips.
                                                  > >
                                                  > >For my current need, I am looking for a clip that will eliminate duplicate
                                                  > >lines in a text file. It's ok if the clip doesn't do the sort. I can use a
                                                  > >menu command to prepare the file in ascending or descending sequence before
                                                  > >running the clip. It would be good if the clip can work with "folded" lines.
                                                  > >I.e. lines that are a bit longer than the screen display. I normally run
                                                  > >NoteTab with Word Wrap toggled ON (lines wrap to be visible on screen). If
                                                  > >necessary, I could toggle Word Wrap OFF whenever I run the clip.
                                                  > >
                                                  > >Most lines in the file are less than 256 characters long. In fact, the
                                                  > >average length is around 40 characters.
                                                  > >
                                                  > >I am running NoteTab Pro v6.2/fv.
                                                  > >
                                                  > >Thank you for the help.
                                                  > >
                                                  > >Ray Shapp
                                                  >
                                                  >
                                                  >


                                                  [Non-text portions of this message have been removed]