Re: [NTS] Trying to perfect RegExp to match various numbers

  • mycroftj
    Message 1 of 14 , Mar 30, 2011
      I'm terribly sorry for not being clear. I branched into quite a few directions at once.

      The goal was to pad all numbers in a document with spaces or zeros so they would sort correctly.

      Instead of writing a clip, I thought it might be done in one mighty regexp replace (777->000777 and 77->000077) but that would involve calculating lengths and taking decimal points into consideration so I don't see how that can be done. OR CAN IT?

      I DID want to learn how to pick out just the numbers and Sheri seems to have hit on that perfectly with
      (?:^|\s)\K[\+\-]?[0-9,\.]+(?=\s|$)
      which picks out just the numbers so I can manipulate the selected text in a clip. (Thanks again, Sheri!)
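A rough Python port of that pattern, for anyone who wants to try it outside NoteTab. This is only a sketch: Python's `re` module has no `\K`, so the lookbehind `(?<!\S)` stands in for `(?:^|\s)\K` and `(?!\S)` stands in for `(?=\s|$)`. The names `NUMBER`, `find_numbers` and `pad_integers` are made up for illustration, and the padding step only handles plain digit runs, not signs, commas or decimals.

```python
import re

# Assumption: (?<!\S) replaces "(?:^|\s)\K" and (?!\S) replaces "(?=\s|$)",
# because Python's re module does not support \K.
NUMBER = re.compile(r"(?<!\S)[+-]?[0-9,.]+(?!\S)")


def find_numbers(text):
    """Return every whitespace-delimited number found in text."""
    return NUMBER.findall(text)


def pad_integers(text, width=6):
    """Zero-pad plain digit runs to `width`; leave every other match as is."""
    def repl(match):
        token = match.group(0)
        return token.zfill(width) if token.isdigit() else token
    return NUMBER.sub(repl, text)


print(find_numbers("12345 xxx456 456xx -12,345 www.45.67.hmt2 0 lines"))
print(pad_integers("File 705 56968.mbs"))
```

The replacement callback is what makes the "one mighty replace" possible here: the length calculation happens in code rather than in the pattern. Note that the character class also accepts a lone "," or "." as a "number", the same as the original pattern.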

      The lines containing FILE were from actual data, whereas the rest was just a made-up assortment of numbers I was experimenting with.

      Regexps are one of the most useful things I've stumbled upon in years but SO frustrating. I'll keep trying and I really do appreciate the help from everyone.

      Joy


      --- In ntb-scripts@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > After reading through this several times, I could not determine the actual goal. To get good assistance, you should
      > provide the starting data, the result that is wanted, and the rules you want, as well as identifying any data you don't
      > want.
      >
      > So, is the goal to sort? By numbers only? Padded numbers? Or is the actual data not of interest in your message? It
      > seems that the first part of the message and the last part are not on the same subject to me.
      >
      > Regards,
      > John
      >
      >
      > From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On Behalf Of mycroftj
      > Sent: Monday, March 28, 2011 16:10
      > To: ntb-scripts@yahoogroups.com
      > Subject: [NTS] Trying to perfect RegExp to match various numbers
      >
      >
      > My ultimate goal was to create a script that (left) space or zero pads numbers to a fixed length for sorting.
      >
      > Actual data looks something like
      >
      > File 67817 Id.ppt
      > File 691037 20dat.sys
      > File 69870 Lock.doc
      > File 705 56968.mbs
      > File 70537 Jil.xls
      > File 71168 Gas.jpg
      >
      > I then became interested in trying to find a regexp that will match numbers surrounded by BOL, EOL, spaces and tabs with
      > signs, decimal point and commas optional.
      >
      > In the following test data, the numbers starting with 123 should be matched as well as the integers 0, 1 and 2.
      > xxx456 and 456xx should NOT be matched.
      >
      > I have something that mostly works but it also matches x456 and the t2 in hmt2. WHY IS THAT?
      > It also misses 0, 1 and 2 although it does (correctly) pick up -2.
      >
      > I put in the caret because it was not matching numbers at the start of a line. Is that how it's done?
      >
      > Thanks for your help. I ordered the Regular Expressions Cookbook today. Hope it's as good as the reviews say!
      >
      > Joy
      >
      > What I have so far [^\s][\-\+]?\d+,*\d*\.?\d*(?=\s)
      >
      >
      > 12345
      > 12345.678
      > 1234567.90
      > xxx456
      > 456xx
      >
      > xx456,123.34
      > www.45.67.hmt2
      >
      > 12,345
      > -12,345
      > 12,345.
      > +12,345.01
      >
      > 12345
      > +123,45.678
      > 1234567.90
      >
      > there are 0 lines
      > 1 or 2 more.
      >
      > 12345
      > -12345.678
      > 1234567.90
      > xxx456
      > +456xx
      >
      > xx456.34
      > www.45.67.hmt2
      >
      > +12345
      >
      > 12345
      > -12345.678
      > 1234567.90
      > xxx456
      > 456xx
      >
      > there are 0 lines. The zero should match as should the following one and negative two.
      > 1 or -2 more.
      >
      >
      >
      > [Non-text portions of this message have been removed]
      >
    • Eb
      Message 2 of 14 , Mar 30, 2011
        I recall a post by Diodeom in the Clips group, with a bit of razzle-dazzle, that might do what you want. Perhaps Dio would know what I'm talking about?

        I do not remember the topic, but I believe it had to do with sorting a table of numbers, numerically, even though the numbers were left-justified.

        Cheers

      • Alec Burgess
        Message 3 of 14 , Mar 30, 2011
          cc ntb-clips (see note at end)

          On 2011-03-30 15:21, mycroftj wrote:
          > I'm terribly sorry for not being clear. I branched into quite a few
          > directions at once.
          >
          > The goal was to pad all numbers in a document with spaces or zeros so
          > they would sort correctly.
          >
          > Instead of writing a clip, I thought it might be done in one mighty
          > regexp replace (777->000777 and 77->000077) but that would involve
          > calculating lengths and taking decimal points into consideration so I
          > don't see how that can be done. OR CAN IT?
          >
          >
          The following will enforce 5 digits (zero-padded) before the optional
          decimal point and 4 after it.

          H=test B3-30 leading / trailing zeros
          ; Alec Burgess 2011-03-30
          ; currently enforces 5 digits before (optional) decimal and 4 after
          ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2===0000" rwais
          ^!replace "===" >> "" rwais
          ^!replace "\b0*(\d{5})\.(\d{4})0*\b" >> "$1.$2" rwais

          Note - I wanted the first replace to be just "00000$1.$20000" but I
          haven't figured out how to prevent clip replace from confusing $2 with
          $20 (i.e. a non-existent 20th or 20000th sub-pattern).

          Does anyone know how to do this? As it stands, just make sure "===" is
          a string which does not occur in the input.

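          Note - adding the line ^!replace "\.0+\b" >> "" rwais to the above will
          remove an all-zero decimal part together with its decimal point. (With
          0* instead of 0+ the pattern would also match the bare point in
          00123.1000, since \b holds between "." and "1".)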

          sample input
          1
          123
          123.
          123.1
          123.123

          resulting output
          00001.0000
          00123.0000
          00123.0000.
          00123.1000
          00123.1230
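For comparison, here is a hedged Python port of the three replaces above. It is a sketch only: `pad_fixed` and its `sentinel` parameter are names invented for this example, and Python's `re` is assumed to treat `\b`, `\d` and the greedy quantifiers the same way the clip's regex engine does for this input.

```python
import re


def pad_fixed(text, sentinel="==="):
    """Pad numbers to 5 integer digits and 4 decimal digits (clip port)."""
    # Step 1: over-pad on both sides; the sentinel keeps the reference to
    # group 2 apart from the zeros that follow it.
    text = re.sub(r"\b(\d+)\.?(\d*)\b",
                  r"00000\1.\2" + sentinel + "0000", text)
    # Step 2: drop the sentinel again (it must not occur in the input).
    text = text.replace(sentinel, "")
    # Step 3: trim back to exactly 5 digits before and 4 after the point.
    return re.sub(r"\b0*(\d{5})\.(\d{4})0*\b", r"\1.\2", text)


sample = "1\n123\n123.\n123.1\n123.123"
print(pad_fixed(sample))
```

On the sample input this reproduces the output listed above, including the stray trailing point on the third line. Values with more than five integer digits are not trimmed by step 3 and stay in their over-padded form.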

          btw: ntb-clips group would be a better place for this discussion than
          ntb-scripts. As originally intended ntb-scripts was for discussion of
          things like using Perl and JavaScript in clip code. Its readership is
          much less than the ntb-clips though I assume everyone who follows
          ntb-scripts also follows ntb-clips :-)

          Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
        • Eb
          Message 4 of 14 , Apr 1 1:08 PM
            Alec,

            I'm not sure this will work, but the variable ought to break up the output pattern:

            ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais


            Cheers


            Eb

            --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
            ...
            > Does anyone know how to do this? As is just make sure "===" is any
            > string which does not exist in the input.
          • Alec Burgess
            Message 5 of 14 , Apr 1 2:45 PM
              On 2011-04-01 16:08, Eb wrote:
              >
              >
              > I'm not sure this will work, but the variable ought to break up the
              > output pattern:
              >
              > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
              It does allow the $2 to be substituted but ^%empty% does not appear to
              get translated.
              I get results like this:
              45.6 ==> 0000045.6^%empty%0000
              --
              Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
            • Eb
              Message 6 of 14 , Apr 4 6:18 AM
                Ok, try this (hex code '\x30' for the first zero):

                $2\x30000

                Eb


              • Alec Burgess
                Message 7 of 14 , Apr 4 2:07 PM
                  Thanks Eb - \x30 works.
                  When I was messing around with this I had tried the same thing, but I
                  realize now that I was using the (meaningless) uppercase \X30 instead
                  of the correct \x30. Ooops! :-[
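The same "$2 followed by a zero" ambiguity exists in other regex flavours. As an illustration only (this is Python, not clip code, and `over_pad` is a made-up name), Python's `re` module lets the replacement string end the group reference explicitly with `\g<2>`, which plays the role that `\x30` plays in the clip:

```python
import re

PATTERN = re.compile(r"\b(\d+)\.?(\d*)\b")


def over_pad(text):
    """First replace of the clip, written without a sentinel string."""
    # \g<2> closes the group reference, so the zeros that follow cannot be
    # read as part of the group number.
    return PATTERN.sub(r"00000\g<1>.\g<2>0000", text)


print(over_pad("45.6"))
```

For the input 45.6 this gives 0000045.60000, i.e. the result the clip's first replace was meant to produce.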


                  --
                  Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)

