
Re: [NTS] Trying to perfect RegExp to match various numbers

  • Sheri
    Message 1 of 14 , Mar 29, 2011
      Hi Don,

      Let me know if you have a specific question about my patterns, I don't
      see anything there that should be hard to follow. If a subpattern starts
      with ?: it makes it non-capturing, in case that threw you.

      On 3/29/2011 11:27 AM, Don wrote:
      >
      > ^.*?\K\t([0-9,]+)\t(.*\K)
      > I get nothing found. why does my last \K not work for me?
      >
      > Thanks for helping me understand both what you are doing and my inferior
      > attempt.

      \K defines a split point in the pattern. Text matched before
      the \K is discarded. So if a pattern ends with \K, the most it could
      match would be the empty string that follows all the stuff that's been
      discarded.

      I think the only time it might make sense to have more than one \K in a
      pattern would be if they were parts of different alternatives (where
      alternatives are separated by vertical bars).
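A rough way to see the split-point behaviour outside NoteTab is sketched below in Python. Python's standard `re` module has no `\K`, so this sketch emulates it with a named group called `keep` (a name chosen here purely for illustration) placed where the `\K` would go. The sample line is made up to resemble the tab-separated data under discussion.

```python
import re

line = "File\t67,817\tId.ppt"

# PCRE form: ^.*?\t\K[0-9,]+  (text before \K is matched but not reported).
# Emulation: put the part after the split point in a named group.
kept = re.search(r"^.*?\t(?P<keep>[0-9,]+)", line).group("keep")

# A pattern that ends in \K has nothing after the last split point.
# Emulation: an empty group at the very end of the pattern.
trailing = re.search(r"^.*?\t([0-9,]+)\t(.*)(?P<keep>)", line).group("keep")

print(repr(kept))      # '67,817'
print(repr(trailing))  # ''
```

The second search succeeds, but the kept part is the empty string, which is consistent with the explanation above of why a trailing \K reports nothing useful.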

      Regards,
      Sheri
    • mycroftj
      Message 2 of 14 , Mar 30, 2011
        I'm terribly sorry for not being clear. I branched into quite a few directions at once.

        The goal was to pad all numbers in a document with spaces or zeros so they would sort correctly.

        Instead of writing a clip, I thought it might be done in one mighty regexp replace (777->000777 and 77->000077) but that would involve calculating lengths and taking decimal points into consideration so I don't see how that can be done. OR CAN IT?
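A fixed replacement string cannot compute lengths, but regex engines that accept a callback for the replacement can do the padding in one pass. The following is a minimal Python sketch, not a NoteTab clip. It assumes a target width of 6 (taken from the 777->000777 example), handles whole numbers only, and the function name `pad_integers` is hypothetical.

```python
import re

WIDTH = 6  # assumed target width, from the 777 -> 000777 example


def pad_integers(text: str, width: int = WIDTH) -> str:
    """Left-pad every standalone run of digits with zeros to `width`."""
    # (?<!\S) and (?!\S) require whitespace or a text edge on each side,
    # so digits embedded in tokens such as "56968.mbs" are left alone.
    return re.sub(r"(?<!\S)\d+(?!\S)", lambda m: m.group().zfill(width), text)


print(pad_integers("777"))                 # 000777
print(pad_integers("File 705 56968.mbs"))  # File 000705 56968.mbs
```

Decimal points, signs and thousands separators would need extra rules in the callback; they are deliberately left out of this sketch.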

        I DID want to learn how to pick out just the numbers and Sheri seems to have hit on that perfectly with
        (?:^|\s)\K[\+\-]?[0-9,\.]+(?=\s|$)
        which picks out just the numbers so I can manipulate the selected text in a clip. (Thanks again, Sheri!)
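For readers who want to try the same idea outside NoteTab, here is a hedged Python sketch. Python's `re` module lacks `\K`, so the leading `(?:^|\s)\K` is swapped for the lookbehind `(?<!\S)` and the trailing `(?=\s|$)` for `(?!\S)`; this is assumed to be equivalent for data like the test lines in this thread, not a verified one-to-one translation.

```python
import re

# Assumed Python counterpart of (?:^|\s)\K[\+\-]?[0-9,\.]+(?=\s|$):
# (?<!\S) = preceded by whitespace or the start of the text,
# (?!\S)  = followed by whitespace or the end of the text.
NUMBER = re.compile(r"(?<!\S)[+-]?[0-9,.]+(?!\S)")

sample = """12345
xxx456
456xx
www.45.67.hmt2
-12,345
+12,345.01
there are 0 lines
1 or -2 more."""

matches = NUMBER.findall(sample)
print(matches)  # ['12345', '-12,345', '+12,345.01', '0', '1', '-2']
```

Like the original pattern, this also accepts a token made up only of commas or periods, so stricter validation may still be needed.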

        The lines containing FILE were from actual data, whereas the rest was just a made-up assortment of numbers I was experimenting with.

        Regexps are one of the most useful things I've stumbled upon in years but SO frustrating. I'll keep trying and I really do appreciate the help from everyone.

        Joy


        --- In ntb-scripts@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
        >
        > After reading through this several times, I could not determine the actual goal. To get good assistance, you should
        > provide the starting data, the result that is wanted, and the rules you want, as well as identifying any data you don't
        > want.
        >
        > So, is the goal to sort? By numbers only? Padded numbers? Or is the actual data not of interest in your message? It
        > seems that the first part of the message and the last part are not on the same subject to me.
        >
        > Regards,
        > John
        >
        >
        > From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On Behalf Of mycroftj
        > Sent: Monday, March 28, 2011 16:10
        > To: ntb-scripts@yahoogroups.com
        > Subject: [NTS] Trying to perfect RegExp to match various numbers
        >
        >
        > My ultimate goal was to create a script that (left) space or zero pads numbers to a fixed length for sorting.
        >
        > Actual data looks something like
        >
        > File 67817 Id.ppt
        > File 691037 20dat.sys
        > File 69870 Lock.doc
        > File 705 56968.mbs
        > File 70537 Jil.xls
        > File 71168 Gas.jpg
        >
        > I then became interested in trying to find a regexp that will match numbers surrounded by BOL, EOL, spaces and tabs with
        > signs, decimal point and commas optional.
        >
        > In the following test data, the numbers starting with 123 should be matched as well as the integers 0, 1 and 2.
        > xxx456 and 456xx should NOT be matched.
        >
        > I have something that mostly works but it also matches x456 and the t2 in hmt2. WHY IS THAT?
        > It also misses 0, 1 and 2 although it does (correctly) pick up -2.
        >
        > I put in the caret because it was not matching numbers at the start of a line. Is that how it's done?
        >
        > Thanks for your help. I ordered the Regular Expressions Cookbook today. Hope it's as good as the reviews say!
        >
        > Joy
        >
        > What I have so far [^\s][\-\+]?\d+,*\d*\.?\d*(?=\s)
        >
        >
        > 12345
        > 12345.678
        > 1234567.90
        > xxx456
        > 456xx
        >
        > xx456,123.34
        > www.45.67.hmt2
        >
        > 12,345
        > -12,345
        > 12,345.
        > +12,345.01
        >
        > 12345
        > +123,45.678
        > 1234567.90
        >
        > there are 0 lines
        > 1 or 2 more.
        >
        > 12345
        > -12345.678
        > 1234567.90
        > xxx456
        > +456xx
        >
        > xx456.34
        > www.45.67.hmt2
        >
        > +12345
        >
        > 12345
        > -12345.678
        > 1234567.90
        > xxx456
        > 456xx
        >
        > there are 0 lines. The zero should match as should the following one and negative two.
        > 1 or -2 more.
        >
        >
        >
        > [Non-text portions of this message have been removed]
        >
      • Eb
        Message 3 of 14 , Mar 30, 2011
          I recall a post by Diodeom in the Clips group, with a bit of razzle-dazzle, that might do what you want. Perhaps Dio would know what I'm talking about?

          I do not remember the topic, but I believe it had to do with sorting a table of numbers, numerically, even though the numbers were left-justified.

          Cheers

        • Alec Burgess
          Message 4 of 14 , Mar 30, 2011
            cc ntb-clips (see note at end)

            On 2011-03-30 15:21, mycroftj wrote:
            > I'm terribly sorry for not being clear. I branched into quite a few
            > directions at once.
            >
            > The goal was to pad all numbers in a document with spaces or zeros so
            > they would sort correctly.
            >
            > Instead of writing a clip, I thought it might be done in one mighty
            > regexp replace (777->000777 and 77->000077) but that would involve
            > calculating lengths and taking decimal points into consideration so I
            > don't see how that can be done. OR CAN IT?
            >
            >
            The following will enforce 5 digits (zero-padded) before an optional
            decimal point and 4 after:

            H=test B3-30 leading / trailing zeros
            ; Alec Burgess 2011-03-30
            ; currently enforces 5 digits before (optional) decimal and 4 after
            ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2===0000" rwais
            ^!replace "===" >> "" rwais
            ^!replace "\b0*(\d{5})\.(\d{4})0*\b" >> "$1.$2" rwais

            Note - I wanted the first replace to be just "00000$1.$20000" but I
            haven't figured out how to prevent clip replace from confusing $2 with
            $20 (i.e. a non-existent 20th or 20000th sub-pattern).

            Does anyone know how to do this? As is, just make sure "===" is a
            string which does not exist in the input.

            Note - adding the line ^!replace "\.0+\b" >> "" rwais to the above will
            remove an all-zero decimal part, together with its decimal point.

            sample input
            1
            123
            123.
            123.1
            123.123

            resulting output
            00001.0000
            00123.0000
            00123.0000.
            00123.1000
            00123.1230
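The three replacements can be traced step by step in another regex engine. Below is a rough Python rendering of the clip, offered as a sketch rather than a drop-in equivalent: the function name `pad_number_line` is hypothetical, NoteTab's `rwais` options are not modelled, and `$1`/`$2` become `\1`/`\2`.

```python
import re


def pad_number_line(line: str) -> str:
    """Apply the clip's three replacements, in order, to one line."""
    # Pass 1: prepend five zeros and append four; the "===" placeholder
    # keeps the second group reference apart from the literal zeros.
    line = re.sub(r"\b(\d+)\.?(\d*)\b", r"00000\1.\2===0000", line)
    # Pass 2: drop the placeholder.
    line = line.replace("===", "")
    # Pass 3: trim surplus zeros, keeping 5 digits before the point and 4 after.
    return re.sub(r"\b0*(\d{5})\.(\d{4})0*\b", r"\1.\2", line)


samples = ["1", "123", "123.", "123.1", "123.123"]
results = [pad_number_line(s) for s in samples]
print(results)
# ['00001.0000', '00123.0000', '00123.0000.', '00123.1000', '00123.1230']
```

In this sketch a number with more than five digits before the decimal point is not matched by the third pass and keeps its extra leading zeros, which fits the "currently enforces 5 digits" comment in the clip.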

            btw: the ntb-clips group would be a better place for this discussion than
            ntb-scripts. As originally intended, ntb-scripts was for discussion of
            things like using Perl and JavaScript in clip code. Its readership is
            much smaller than that of ntb-clips, though I assume everyone who follows
            ntb-scripts also follows ntb-clips :-)

            Regards ... Alec (buralex@gmail& WinLiveMess - alec.m.burgess@skype)
          • Eb
            Message 5 of 14 , Apr 1 1:08 PM
              Alec,

              I'm not sure this will work, but the variable ought to break up the output pattern:

              ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais


              Cheers


              Eb

              --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
              ...
              > Does anyone know how to do this? As is just make sure "===" is any
              > string which does not exist in the input.
            • Alec Burgess
              Message 6 of 14 , Apr 1 2:45 PM
                On 2011-04-01 16:08, Eb wrote:
                >
                >
                > I'm not sure this will work, but the variable ought to break up the
                > output pattern:
                >
                > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
                It does allow the $2 to be substituted but ^%empty% does not appear to
                get translated.
                I get results like this:
                45.6 ==> 0000045.6^%empty%0000
                --
                Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
              • Eb
                Message 7 of 14 , Apr 4 6:18 AM
                  Ok, try this (hex code '\x30' for the first zero):

                  $2\x30000

                  Eb


                  --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                  >
                  > > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
                  > It does allow the $2 to be substituted but ^%empty% does not appear to
                  > get translated.
                  > I get results like this:
                  > 45.6 ==> 0000045.6^%empty%0000
                  > --
                  > Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
                  >
                • Alec Burgess
                  Message 8 of 14 , Apr 4 2:07 PM
                    Thanks Eb - \x30 works.
                    When I was messing around with this I had tried the same thing, but I
                    realize now that I was trying the (meaningless) uppercase \X30 instead
                    of the correct \x30. Oops! :-[
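The same "group reference followed by a literal digit" ambiguity exists in other regex dialects, each with its own way of ending the reference. As a point of comparison only, here is how it looks in Python, where the delimiter is the `\g<2>` form rather than a hex escape. This is a sketch of Python's behaviour and says nothing about what NoteTab itself accepts.

```python
import re

pattern = r"\b(\d+)\.?(\d*)\b"

# "\g<2>" ends the group reference explicitly, so the zeros that follow
# are literal text rather than extra digits of the group number.
padded = re.sub(pattern, r"00000\g<1>.\g<2>0000", "45.6")

# Without the delimiter, "\20" is read as a reference to group 20,
# which this pattern does not have, so Python rejects the template.
try:
    re.sub(pattern, r"\1.\20", "45.6")
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False

print(padded)        # 0000045.60000
print(ambiguous_ok)  # False
```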

                    On 2011-04-04 09:18, Eb wrote:
                    > Ok, try this (hex code '\x30' for the first zero):
                    >
                    > $2\x30000
                    >
                    > Eb
                    >
                    > --- In ntb-scripts@yahoogroups.com
                    > <mailto:ntb-scripts%40yahoogroups.com>, Alec Burgess <buralex@...> wrote:
                    > >
                    > > > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
                    > > It does allow the $2 to be substituted but ^%empty% does not appear to
                    > > get translated.
                    > > I get results like this:
                    > > 45.6 ==> 0000045.6^%empty%0000

                    --
                    Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)

