Advice to add 'Tabs' to data.

  • Robin Chapple
    Message 1 of 18, Feb 13, 2014
      I am not a 'guru' but a 'user' who appreciates your help.

      I need a clip to add a tab to a large list of data that starts with a
      date in the format:

      1958-59

      This will give me a CSV file to import into an Access database.

      Many thanks,

      Robin Chapple
    • Axel Berger
      Message 2 of 18, Feb 13, 2014
        Robin Chapple wrote:
        > a clip to add a tab

        Your description leaves a lot of room for guessing and for guessing wrong,
        but this might just be what you're looking for:

        ^!Replace "^(\d{4}-\d{2})" >> "$1\t" WRASTI

        If not then, as always, give us a sample of the data before and after, i.e. what
        they are and what you want them to become.

        Axel
      • Don
        Message 3 of 18, Feb 13, 2014
          Give us before and after ... where does the tab go?

          replace command with regex?
          replace
          (^\d{4}-\d{2})
          with
          $1\t

          See if that is what you wanted Robin.


        • Robin Chapple
          Message 4 of 18, Feb 13, 2014

            Thanks Axel and my apology for scant information.

            The existing is:

            1996-97 Graeme Beck      Rotary Club of Benalla
            1997-98 Geoff McIlvenna          Rotary Club of Preston
            1998-99 Neville Miles    Rotary Club of Kyabram
            1999-2000 Terry Grant    Rotary Club of Sunbury

            I need it to be

            1996-97          Graeme Beck      Rotary Club of Benalla
            1997-98          Geoff McIlvenna          Rotary Club of Preston
            1998-99          Neville Miles    Rotary Club of Kyabram
            1999-2000        Terry Grant      Rotary Club of Sunbury

            Your first suggestion did not allow for the date format varying, and it gave me this:

            1998-99   Neville Miles   Rotary Club of Kyabram
            1999-20   00 Terry Grant   Rotary Club of Sunbury

            I hope that this helps.

            Regards,

            Robin Chapple

          • Don
            Message 5 of 18, Feb 13, 2014
              Give this a whirl:

              ^!Replace "^(\d+-\d+) +" >> "$1\t" WRASTI

              I assume that the part between name and club is already as you want it
              to be. Otherwise try the following:

              ^!Replace "^(\d+-\d+) +(.*?) +" >> "$1\t$2\t" WRASTI
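For readers who want to see why the fixed-width pattern split "1999-2000" while the `\d+-\d+` version does not, here is a minimal sketch in Python rather than NoteTab clip code. The helper name `add_tab` and the two sample rows are only illustrative, and Python's `re` module stands in for NoteTab's regex engine, so treat it as an approximation of the clip's behaviour.

```python
import re

# Two sample rows from the thread: one short date, one long date.
rows = [
    "1996-97 Graeme Beck",
    "1999-2000 Terry Grant",
]

def add_tab(pattern: str, line: str) -> str:
    """Replace the match with capture group 1 followed by a tab."""
    return re.sub(pattern, r"\1\t", line)

# First attempt: exactly four digits, a hyphen, exactly two digits.
fixed_width = r"^(\d{4}-\d{2})"
# Revised attempt: any run of digits on both sides, and consume the spaces.
flexible = r"^(\d+-\d+) +"

broken = [add_tab(fixed_width, r) for r in rows]
fixed = [add_tab(flexible, r) for r in rows]

for line in broken + fixed:
    print(repr(line))
```

The fixed-width pattern stops after "1999-20", so the tab lands inside the year. The flexible pattern takes the whole date and also swallows the spaces that follow it.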


            • John Shotsky
              Message 6 of 18, Feb 13, 2014

                Slightly smaller and faster…

                ^!Replace "^\d+-\d+\K\x20+" >> "\t" ARWS

                 

                Regards,
                John
                RecipeTools Web Site: http://recipetools.gotdns.com/
                John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

                 


              • Robin Chapple
                Message 7 of 18, Feb 13, 2014
                  Many thanks Don,

                  That gave the required result.

                  I do need the name and club separated, but that is already covered and
                  I do not need the combined clip.

                  Again, many thanks,

                  Robin

                • Robin Chapple
                  Message 8 of 18, Feb 13, 2014

                    Thanks John,

                    I will keep the clip and compare next time.

                    Cheers,

                    Robin


                  • Don
                    Message 9 of 18, Feb 13, 2014
                      John's is actually smarter, Robin. \K avoids the need for the () and
                      the $1, since it matches the date but leaves it out of the text being
                      replaced. Then \x20 is just a "better" way to say space, since you can
                      actually see it.


                      On 2/13/2014 7:01 PM, Robin Chapple wrote:
                      > ^!Replace "^\d+-\d+\K\x20+" >> "\t" ARWS
                    • Ian NTnerd
                      Message 10 of 18, Feb 14, 2014
                        I don't get to read much of this list, but this answer was worth it. Now I understand \K. Great.

                        Neat solution. Look out, \K, here I come.

                        Ian



                      • John Shotsky
                        Message 11 of 18, Feb 14, 2014

                          In the work I do, I often need to just insert something, or just delete something, or replace something between two other items. For these, there is no need to use capturing parens, because you aren't 'replacing' what isn't being changed. The example this started with shows replacing space(s) following a character combination. But sometimes you need to perform an action only when the proper thing follows the target characters. For those, you write it as below and add a lookahead term like (?=the following term). I put nothing after this segment: it is not part of the replaced text, but it does define the end point of the search. Say, in the example below, that the replacement should happen only when a capital letter follows:

                          ^!Replace "^\d+-\d+\K\x20+(?=[A-Z])" >> "\t" ARWS

                          Again, nothing is captured, which saves time. Some spreadsheets are very large, and anything that can be done to speed up actions is helpful. My clip library contains 50,000 lines counting comments, and runs against files of up to 2 MB. I can not only go get a cup of coffee, I can drink it and get a refill while I wait.

                          So one important rule I have made for myself is to not capture ANYTHING that doesn't HAVE to be captured, and thus not use $1 any more than is actually needed, because each capture and replace takes time, which adds up. My next pet peeve is loops. Of course we have to have them, but some things are done more quickly than others depending on how they are implemented.

                          For example, there are different ways to capitalize the first letter of a line. You can use a Find, then IfError, then some form of toolbar or other function to convert the found character to a capital letter. But there is a MUCH faster way:

                          ^!Replace "^a" >> "A" ARWS

                          ^!Replace "^b" >> "B" ARWS

                          etc.

                          This gets the exact same job performed using 26 replaces in a called clip. No loop is required, and it is over 30 times faster, generally. Again, nothing is captured, only requisite letters are changed. When you run large libraries on large files, you have an opportunity to see which functions take the longest. All of my subroutines have a banner like "Now converting fractions to decimals", and the screen update is turned off. So I know what it is doing and can spot overly long subroutines easily. I figure out why they take so much time, and often come to this forum to find solutions for faster ways to do things. That's how I picked up that capitalization 'trick', which saved gobs of time. I use that to capitalize proper words in the same way, but using the (?=) method:

                          ^!Replace "\bn(?=otepad|otetab|ovember)" >> "N" ARWS

                          One replace for each letter, and all of the words that are to be capitalized are contained in a lookahead, which consumes nothing. If any of these could repeat in a single line, you'd have to add a loop:

                          ^!IfError Next Else Skip_-1

                          But even with that loop, no capturing is involved, so it is faster than the typical 'find, IfError' loop method.
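As a rough illustration of the lookahead trick outside NoteTab, the same pattern can be tried in Python. This is only a sketch: the function name `capitalize_n_words` and the sample sentence are made up for the demonstration, and `re.sub` replaces every occurrence in one pass, so it says nothing about whether a NoteTab loop is needed.

```python
import re

# Same pattern as the clip: a word-initial "n", but only when the rest of
# one of the listed words follows. The lookahead consumes nothing, so only
# the single letter is replaced.
WORDS_N = r"\bn(?=otepad|otetab|ovember)"

def capitalize_n_words(text: str) -> str:
    """Capitalize the leading 'n' of the listed words and nothing else."""
    return re.sub(WORDS_N, "N", text)

sample = "notepad and notetab in november, not nothing"
result = capitalize_n_words(sample)
print(result)
```

Words such as "not" and "nothing" are left alone because the text after their "n" does not satisfy the lookahead.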

                          Sometimes, I have to insert a word when it is determined that the word does NOT exist following a given condition. For example, if I want the word 'cheese' to follow the word Cheddar, but only want to insert it when it is missing, I use this:

                          ^!Replace "\b(^%Cheeses%)\b\K(?! cheese)" >> " cheese" AIRSW

                          I store the names of the cheeses in a variable called %Cheeses%. If it is determined that the name of a cheese is NOT followed by the word cheese, it is inserted. Again, nothing is captured, stored or replaced, only a word inserted in position when missing.
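The insert-when-missing idea can be sketched in Python as well. Python's standard `re` module has no `\K`, so this version re-emits the matched name and appends the word instead. The cheese list, the variable names and the sample sentence are all assumptions made for the example, not John's actual data.

```python
import re

# Hypothetical stand-in for the %Cheeses% variable.
cheeses = ["Cheddar", "Gouda"]
alternatives = "|".join(map(re.escape, cheeses))

# A cheese name as a whole word that is NOT followed by " cheese".
# (?:...) groups the alternatives without capturing them.
pattern = re.compile(rf"\b(?:{alternatives})\b(?! cheese)", re.IGNORECASE)

def add_missing_cheese(text: str) -> str:
    """Append ' cheese' after any listed name that lacks it."""
    return pattern.sub(lambda m: m.group(0) + " cheese", text)

sample = "Grate the Cheddar, then add Gouda cheese."
result = add_missing_cheese(sample)
print(result)
```

Running it a second time on its own output changes nothing, because every name is then already followed by " cheese".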

                          NoteTab with Regex is really amazing. Nothing else even comes close, at any price.

                          I hope my examples above will inspire others to experiment with some of the more advanced techniques - there is almost nothing that can't be handled, given a set of starting conditions and ending conditions desired.

                           

                          Regards,
                          John
                          RecipeTools Web Site: http://recipetools.gotdns.com/
                          John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

                           


                           

                        • flo.gehrke
                          Message 12 of 18, Feb 14, 2014
                            ---In ntb-clips@yahoogroups.com, <jshotsky@...> wrote:
                            > For example, if I want the word 'cheese' to follow the word
                            > Cheddar, but only want to insert it when it is missing, I use
                            > this:
                            >
                            > ^!Replace "\b(^%Cheeses%)\b\K(?! cheese)" >> " cheese" AIRSW
                            >
                            > I store the names of the cheeses in a variable called
                            > %Cheeses%. If it is determined that the name of a cheese is NOT
                            > followed by the word cheese, it is inserted. Again, nothing is
                            > captured, stored or replaced, only a word inserted in position
                            > when missing.

                            There's a misunderstanding in this statement. "The use of \K does not interfere with the setting of captured substrings (Help on RegEx)." So your subpattern '(^%Cheeses%)' will capture the match anyway. You can find this out with the test...

                            ^!Set %Cheeses%=Cheddar
                            ^!Find "\b(^%Cheeses%)\b\K(?! cheese)" RS
                            ^!Info ^$GetReSubstrings$

                            which will output 'Cheddar' where 'cheese' is missing.

                            If we talk of superfluous captures we should also mention superfluous parens. In your pattern, for example, there's no need to write '\b(^%Cheeses%)\b'. Writing '\b^%Cheeses%\b' would work as well -- without capturing anything.

                            If there is any need to group a subpattern you could avoid capturing by using a non-capturing group like '(?:^%Cheeses%)'.
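The difference between the three spellings can be checked with a short Python sketch. Python's `re` is used here as a stand-in for NoteTab's engine, and the literal "Cheddar" replaces the clip variable, so this only illustrates the general regex behaviour.

```python
import re

text = "Cheddar on toast"

# Plain parentheses capture, whether or not the replacement uses the group.
capturing = re.search(r"\b(Cheddar)\b(?! cheese)", text)
# (?:...) groups without capturing.
non_capturing = re.search(r"\b(?:Cheddar)\b(?! cheese)", text)
# No parentheses at all: nothing to capture.
bare = re.search(r"\bCheddar\b(?! cheese)", text)

print(capturing.groups())      # captured substrings of the first pattern
print(non_capturing.groups())  # empty tuple
print(bare.groups())           # empty tuple
```

All three patterns match the same text. Only the first one stores a substring.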

                            Regards,
                            Flo

                          • Axel Berger
                            Message 13 of 18, Feb 14, 2014
                              John Shotsky wrote:
                              > and runs against files of up to 2 mb.

                              Mine go to something like 140 MB and I also run very many Replaces in a
                              row. NT Clips are so fast I don't really care and write them for legibility
                              and maintainability. The one exception is InsertText in a loop, but for
                              anything else I just don't care if it's a bit faster or not.

                              Axel
                            • John Shotsky
                              Message 14 of 18, Feb 14, 2014

                                I wasn't explicit about how my variables are stored, but they are not stored with parens, only with vertical bars and \b where needed INSIDE the variable. Then, when used, I add the parens at use time. Often, I add extra words along with the variable inside the parens. I have thought about changing that, but so far have not found it to be better. Perhaps some more testing will show it as viable. So yes, it is captured, but is not replaced.

                                I will try your suggestion though, as a learning exercise. I'm always open to learning better ways to do things.

                                One of the things I have been doing is enclosing multiple paren phrases inside an (?= phrase), so that it won't capture, but I wonder if it is captured anyway.

                                Replace ….(?= (this|that) (the other this|the other that).*$) Thoughts?

                                Regards,
                                John
                                RecipeTools Web Site: http://recipetools.gotdns.com/
                                John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

                                 


                              • flo.gehrke
                                Message 15 of 18, Feb 14, 2014
                                  ---In ntb-clips@yahoogroups.com, <jshotsky@...> wrote:

                                  > One of the things I have been doing is enclosing multiple paren
                                  > phrases inside an (?= phrase), so that it won't capture, but I
                                  > wonder if it is captured anyway.

                                  Yes, it is captured anyway: parens (a 'group') inside a lookaround still capture. Test...

                                  ^!Set %Cheeses%=Cheddar
                                  ^!Find "\b^%Cheeses%\b(?=(\x20cheese))" RS
                                  ^!Info ^$GetReSubstrings$

                                  against 'Cheddar cheese'. The output will be 'cheese' that is captured with the group inside the Lookahead Assertion.
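The same test can be reproduced in Python as a quick cross-check. This sketch assumes Python's `re` behaves like NoteTab's engine on this point, and it uses the literal "Cheddar" in place of the clip variable. Note that here the captured text includes the leading space matched by `\x20`.

```python
import re

subject = "Cheddar cheese"

# A capturing group inside a lookahead: the lookahead consumes nothing,
# but the group still records what it matched.
with_group = re.search(r"\bCheddar\b(?=(\x20cheese))", subject)

# The same lookahead with a non-capturing group records nothing.
without_group = re.search(r"\bCheddar\b(?=(?:\x20cheese))", subject)

print(repr(with_group.group(0)))  # the overall match
print(repr(with_group.group(1)))  # what the group inside the lookahead saw
print(without_group.groups())
```

The overall match is the same in both cases. Only the stored substrings differ.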

                                  BTW: I think sometimes it's rather difficult to find out whether the speed of clip execution depends on the search pattern, the clip code or the way it consumes memory. Certain patterns, for example, could cause serious stack problems and lead to wrong results -- cf my message #22824 of June 20, 2012.

                                  Also there can be clips which are rather slow because the RegEx pattern causes a lot of backtracking. For example, take this line...

                                  101101010001011101110011101110001000100';'abababab!';

                                  and multiply it to 10,000 lines. Now run the following clip against those lines:

                                  ^!Find "[01]+.*(aa|bb)" WR

                                  For me, the clip needs almost a minute to find out that there is no match, that is, that no line contains 'aa' or 'bb' after a run of 0s and 1s. However, the problem is not in NT or the clip code but in the RegEx. The trick is to suppress the backtracking because it isn't actually needed here. With an Atomic Group...

                                  ^!Find "(?>[01]+).*(aa|bb)" WR

                                  the job is done in two seconds.
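For anyone who wants to experiment with the idea outside NoteTab, here is a small Python sketch. Python versions before 3.11 do not accept `(?>...)`, so the sketch emulates the atomic group with a lookahead plus backreference, `(?=([01]+))\1`, which is a swapped-in technique and not the pattern from the clip. It checks only that both patterns agree on the result and makes no claim about timings in NoteTab.

```python
import re

# The sample line from the message above.
LINE = "101101010001011101110011101110001000100';'abababab!';"

# Plain version: [01]+ may give digits back one at a time when the
# rest of the pattern fails, which multiplies the work.
plain = re.compile(r"[01]+.*(aa|bb)")

# Atomic-like version: the lookahead grabs the whole digit run into
# group 1, and \1 then consumes exactly that text. Once a lookahead has
# succeeded it is not re-entered, so the run cannot be shortened.
atomic_like = re.compile(r"(?=([01]+))\1.*(aa|bb)")

def has_match(pattern: re.Pattern, line: str) -> bool:
    """True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None

for name, pat in (("plain", plain), ("atomic-like", atomic_like)):
    print(name, has_match(pat, LINE), has_match(pat, LINE + "aa"))
```

Both versions report no match for the sample line and a match once "aa" is appended. The emulation adds one extra capture group, so the `(aa|bb)` alternative becomes group 2 in the second pattern.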

                                  Regards,
                                  Flo

                                • John Shotsky
                                  Message 16 of 18, Feb 14, 2014

                                    Thank you! One of the things I have been meaning to get a better handle on is exactly this. With this example, I should be able to apply it to my cases.

                                     

                                    Regards,
                                    John
                                    RecipeTools Web Site: http://recipetools.gotdns.com/
                                    John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

                                     

                                    From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke@...
                                    Sent: Friday, February 14, 2014 12:55
                                    To: ntb-clips@yahoogroups.com
                                    Subject: RE: [Clip] Advice to add 'Tabs' to data.

                                     

                                     

                                    --In ntb-clips@yahoogroups.com, <jshotsky@...> wrote:

                                     

                                    > One of the things I have been doing is enclosing multiple paren
                                    > phrases inside an (?= phrase), so that it won't capture, but I
                                    > wonder if it is captured anyway.

                                    No, parens (a 'group') inside a Lookaround capture as well. Test...

                                    ^!Set %Cheeses%=Cheddar
                                    ^!Find "\b^%Cheeses%\b(?=(\x20cheese))" RS
                                    ^!Info ^$GetReSubstrings$

                                    against 'Cheddar cheese'. The output will be 'cheese' that is captured with the group inside the Lookahead Assertion.
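
The same behaviour can be checked outside NoteTab. This is a small Python sketch, not clip code; the variable `cheeses` simply stands in for the clip variable %Cheeses%, and it assumes Python's `re` treats groups inside a lookahead the way the clip engine does.

```python
import re

cheeses = "Cheddar"  # stands in for the clip variable %Cheeses%
pattern = re.compile(r"\b" + re.escape(cheeses) + r"\b(?=(\x20cheese))")

m = pattern.search("Cheddar cheese")
print(repr(m.group(0)))  # text consumed by the overall match
print(repr(m.group(1)))  # text captured by the group inside the lookahead
```

The overall match is only 'Cheddar', because the lookahead consumes nothing, yet group 1 is still filled. Note that the captured text includes the leading space matched by \x20.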


                                  • Don
                                    Message 17 of 18 , Feb 14, 2014
                                      Atomic groups are something still a little past my grasp, though I use
                                      them when I experience slowdowns. I don't always understand why or when.
                                      \K ... I'm working on it.
                                      Lookaheads and so forth I have down pretty well now.

                                      On 2/14/2014 3:54 PM, flo.gehrke@... wrote:
                                      > For me, the clip needs almost a minute to find out that there is no match. That is, that there is no line that ends with 'aa' or 'bb'. However, the problem is not in NT or the clip code but in the RegEx. The trick is to suppress the backtracking because, actually, it isn't needed here. With an Atomic Group...
                                      >
                                      > ^!Find "(?>[01]+).*(aa|bb)" WR
                                    • flo.gehrke
                                      Message 18 of 18 , Feb 14, 2014
                                        --In ntb-clips@yahoogroups.com, <don@...> wrote:

                                        > Atomic groups are something still a little past my grasp, though I use
                                        > them when I experience slowdowns. I don't always understand why or
                                        > when. \K ... I'm working on it. Lookaheads and so forth I have down
                                        > pretty well now.

                                        Well, Don, in short, an Atomic Group says: "Don't look back if you don't achieve a match!" (similar to Possessive Quantifiers and certain Verbs).

                                        On the other hand: Be careful! Backtracking is absolutely necessary where the RegEx Engine must test alternations or options.

                                        Example: Given a string of 10,000 lines like...

                                        10110101000101110111001110111000100010011

                                        The following clip uses an Atomic Group in order to test if there is any line that ends with '11'...

                                        ^!Find "^(?>[01]+)11$" WR

                                        The result will be "No match!" even though all 10,000 lines end with '11'.

                                        Why?

                                        Because the Engine doesn't look back, '[01]+' has already consumed every character of the line, including the final '11'. When the Engine then tests for '11' at the end of the line, nothing is left to match, and the Atomic Group won't give any characters back. So the test fails, and there is no match.
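
The pitfall can be reproduced with a short Python sketch (again not clip code). Since `(?>...)` is only accepted by Python's `re` from version 3.11 on, the Atomic Group is emulated here with a lookahead plus a named backreference. The helper `atomic` and the group name `bits` are illustrative names, and the lookbehind variant at the end is just one possible workaround, not something suggested in this thread.

```python
import re

def atomic(subpattern, name):
    """Emulate (?>subpattern): capture greedily in a lookahead, then
    consume exactly the captured text via a backreference."""
    return "(?=(?P<%s>%s))(?P=%s)" % (name, subpattern, name)

LINE = "10110101000101110111001110111000100010011"

# Ordinary greedy quantifier: backtracking hands the final '11' back.
greedy = re.compile(r"^[01]+11$")
# Atomic version: '[01]+' keeps everything, so '11' can never match.
atomic_re = re.compile("^" + atomic(r"[01]+", "bits") + "11$")
# Possible workaround: keep the atomic group, check the ending by lookbehind.
lookbehind_re = re.compile("^" + atomic(r"[01]+", "bits") + "(?<=11)$")

print(greedy.search(LINE) is not None)
print(atomic_re.search(LINE) is not None)
print(lookbehind_re.search(LINE) is not None)
```

Only the middle pattern fails, which matches the "No match!" result described above.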

                                        Regards,
                                        Flo