
Re: [NTS] Trying to perfect RegExp to match various numbers

  • Eb
    Message 1 of 14 , Mar 30 2:42 PM
      I recall a post by Diodeom in the Clips group, with a bit of razzle-dazzle, that might do what you want. Perhaps Dio would know what I'm talking about?

      I do not remember the topic, but I believe it had to do with sorting a table of numbers, numerically, even though the numbers were left-justified.

      Cheers

      --- In ntb-scripts@yahoogroups.com, "mycroftj" <mycroftj@...> wrote:
      >
      > I'm terribly sorry for not being clear. I branched into quite a few directions at once.
      >
      > The goal was to pad all numbers in a document with spaces or zeros so they would sort correctly.
      >
      > Instead of writing a clip, I thought it might be done in one mighty regexp replace (777->000777 and 77->000077) but that would involve calculating lengths and taking decimal points into consideration so I don't see how that can be done. OR CAN IT?
      >
      > I DID want to learn how to pick out just the numbers and Sheri seems to have hit on that perfectly with
      > (?:^|\s)\K[\+\-]?[0-9,\.]+(?=\s|$)
      > which picks out just the numbers so I can manipulate the selected text in a clip. (Thanks again, Sheri!)
      >
      > The lines containing FILE were from actual data, whereas the rest was just a made-up assortment of numbers I was experimenting with.
      >
      > Regexps are one of the most useful things I've stumbled upon in years but SO frustrating. I'll keep trying and I really do appreciate the help from everyone.
      >
      > Joy
      >
      >
      > --- In ntb-scripts@yahoogroups.com, "John Shotsky" <jshotsky@> wrote:
      > >
      > > After reading through this several times, I could not determine the actual goal. To get good assistance, you should
      > > provide the starting data, the result that is wanted, and the rules you want, as well as identifying any data you don't
      > > want.
      > >
      > > So, is the goal to sort? By numbers only? Padded numbers? Or is the actual data not of interest in your message? It
      > > seems that the first part of the message and the last part are not on the same subject to me.
      > >
      > > Regards,
      > > John
      > >
      > >
      > > From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On Behalf Of mycroftj
      > > Sent: Monday, March 28, 2011 16:10
      > > To: ntb-scripts@yahoogroups.com
      > > Subject: [NTS] Trying to perfect RegExp to match various numbers
      > >
      > >
      > > My ultimate goal was to create a script that (left) space or zero pads numbers to a fixed length for sorting.
      > >
      > > Actual data looks something like
      > >
      > > File 67817 Id.ppt
      > > File 691037 20dat.sys
      > > File 69870 Lock.doc
      > > File 705 56968.mbs
      > > File 70537 Jil.xls
      > > File 71168 Gas.jpg
      > >
      > > I then became interested in trying to find a regexp that will match numbers surrounded by BOL, EOL, spaces and tabs with
      > > signs, decimal point and commas optional.
      > >
      > > In the following test data, the numbers starting with 123 should be matched as well as the integers 0, 1 and 2.
      > > xxx456 and 456xx should NOT be matched.
      > >
      > > I have something that mostly works but it also matches x456 and the t2 in hmt2. WHY IS THAT?
      > > It also misses 0, 1 and 2 although it does (correctly) pick up -2.
      > >
      > > I put in the caret because it was not matching numbers at the start of a line. Is that how it's done?
      > >
      > > Thanks for your help. I ordered the Regular Expressions Cookbook today. Hope it's as good as the reviews say!
      > >
      > > Joy
      > >
      > > What I have so far [^\s][\-\+]?\d+,*\d*\.?\d*(?=\s)
      > >
      > >
      > > 12345
      > > 12345.678
      > > 1234567.90
      > > xxx456
      > > 456xx
      > >
      > > xx456,123.34
      > > www.45.67.hmt2
      > >
      > > 12,345
      > > -12,345
      > > 12,345.
      > > +12,345.01
      > >
      > > 12345
      > > +123,45.678
      > > 1234567.90
      > >
      > > there are 0 lines
      > > 1 or 2 more.
      > >
      > > 12345
      > > -12345.678
      > > 1234567.90
      > > xxx456
      > > +456xx
      > >
      > > xx456.34
      > > www.45.67.hmt2
      > >
      > > +12345
      > >
      > > 12345
      > > -12345.678
      > > 1234567.90
      > > xxx456
      > > 456xx
      > >
      > > there are 0 lines. The zero should match as should the following one and negative two.
      > > 1 or -2 more.
      > >
      > >
      > >
      > > [Non-text portions of this message have been removed]
      > >
      >
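
On the "WHY IS THAT?" question in the quoted message: the leading [^\s] in the first attempt consumes one non-whitespace character before the digits, which is why x456 and t2 match while a lone 0 or 1 (preceded by a space) does not. The sketch below is Python rather than a NoteTab clip, offered only as an illustration of matching standalone numbers and padding them in one pass. Python's re module has no \K, so lookarounds stand in for it. The names NUMBER, find_numbers and zero_pad_numbers are hypothetical, and the rule that a number must start with a digit is an assumption, not something from the thread.

```python
import re

# Assumption: a number is an optional sign, then a digit, then any mix of
# digits, commas and dots, delimited by whitespace or a line boundary.
# (?<!\S) stands in for (?:^|\s)\K and (?!\S) for (?=\s|$).
NUMBER = re.compile(r"(?<!\S)[+-]?\d[\d,.]*(?!\S)")


def find_numbers(text):
    """Return every standalone number found in text."""
    return NUMBER.findall(text)


def zero_pad_numbers(text, width=6):
    """Left-pad the integer part of each standalone number with zeros."""

    def pad(match):
        token = match.group(0)
        # Keep a leading sign out of the padded part.
        sign = token[0] if token[0] in "+-" else ""
        # Split off the fraction so only the integer part is padded.
        whole, dot, frac = token[len(sign):].partition(".")
        return sign + whole.zfill(width) + dot + frac

    return NUMBER.sub(pad, text)


if __name__ == "__main__":
    print(find_numbers("there are 0 lines\n1 or -2 more.\nxxx456 456xx"))
    print(zero_pad_numbers("File 705 56968.mbs"))
```

Because the replacement is computed by a function, the length calculation that a plain replacement string cannot do happens in code instead of in the pattern.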
    • Alec Burgess
      Message 2 of 14 , Mar 30 3:30 PM
        cc ntb-clips (see note at end)

        On 2011-03-30 15:21, mycroftj wrote:
        > I'm terribly sorry for not being clear. I branched into quite a few
        > directions at once.
        >
        > The goal was to pad all numbers in a document with spaces or zeros so
        > they would sort correctly.
        >
        > Instead of writing a clip, I thought it might be done in one mighty
        > regexp replace (777->000777 and 77->000077) but that would involve
        > calculating lengths and taking decimal points into consideration so I
        > don't see how that can be done. OR CAN IT?
        >
        >
        The following will enforce 5 digits (zero-padded) before the optional
        decimal point and 4 after:

        H=test B3-30 leading / trailing zeros
        ; Alec Burgess 2011-03-30
        ; currently enforces 5 digits before (optional) decimal and 4 after
        ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2===0000" rwais
        ^!replace "===" >> "" rwais
        ^!replace "\b0*(\d{5})\.(\d{4})0*\b" >> "$1.$2" rwais

        Note - I wanted the first replace to be just "00000$1.$20000" but I
        haven't figured out how to prevent clip replace from confusing $2 with
        $20 (i.e. a non-existent 20th or 20000th sub-pattern).

        Does anyone know how to do this? As is, just make sure "===" is any
        string which does not exist in the input.

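        Note - adding the line ^!replace "\.0*(?!\d)" >> "" rwais to the above will
        remove the decimal point and its zero padding when the decimal part is
        all zeros. The (?!\d) lookahead keeps values such as 00123.1000 intact; a
        trailing \b would not, since \b also matches between "." and "1".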

        sample input
        1
        123
        123.
        123.1
        123.123

        resulting output
        00001.0000
        00123.0000
        00123.0000.
        00123.1000
        00123.1230
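
For anyone who wants to try the same three-step idea outside NoteTab, here is a rough Python port of the clip above. It is only a sketch: the function name pad_numbers is made up for illustration, and Python's re.sub stands in for ^!replace, so option handling such as "rwais" is not reproduced.

```python
import re


def pad_numbers(text):
    """Pad each number to 5 integer digits and 4 decimal digits."""
    # Step 1: surround every number with surplus zeros. The "===" marker
    # keeps group 2 apart from the literal zeros, as in the clip.
    text = re.sub(r"\b(\d+)\.?(\d*)\b", r"00000\1.\2===0000", text)
    # Step 2: drop the marker (assumed never to occur in the input).
    text = text.replace("===", "")
    # Step 3: trim the surplus so exactly 5 + 4 digits remain.
    return re.sub(r"\b0*(\d{5})\.(\d{4})0*\b", r"\1.\2", text)


if __name__ == "__main__":
    print(pad_numbers("1\n123\n123.\n123.1\n123.123"))
```

Run on the sample input above, it prints the same five lines as the resulting output, including the stray trailing dot on the third line.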

        btw: the ntb-clips group would be a better place for this discussion than
        ntb-scripts. As originally intended, ntb-scripts was for discussion of
        things like using Perl and JavaScript in clip code. Its readership is
        much smaller than that of ntb-clips, though I assume everyone who follows
        ntb-scripts also follows ntb-clips :-)

        Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
      • Eb
        Message 3 of 14 , Apr 1, 2011
          Alec,

          I'm not sure this will work, but the variable ought to break up the output pattern:

          ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais


          Cheers


          Eb

          --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
          ...
          > Does anyone know how to do this? As is just make sure "===" is any
          > string which does not exist in the input.
        • Alec Burgess
          Message 4 of 14 , Apr 1, 2011
            On 2011-04-01 16:08, Eb wrote:
            >
            >
            > I'm not sure this will work, but the variable ought to break up the
            > output pattern:
            >
            > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
            It does allow the $2 to be substituted but ^%empty% does not appear to
            get translated.
            I get results like this:
            45.6 ==> 0000045.6^%empty%0000
            --
            Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
          • Eb
            Message 5 of 14 , Apr 4, 2011
              Ok, try this (hex code '\x30' for the first zero):

              $2\x30000

              Eb


              --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
              >
              > > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
              > It does allow the $2 to be substituted but ^%empty% does not appear to
              > get translated.
              > I get results like this:
              > 45.6 ==> 0000045.6^%empty%0000
              > --
              > Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
              >
            • Alec Burgess
              Message 6 of 14 , Apr 4, 2011
                Thanks Eb - \x30 works.
                When I was messing around with this I had tried the same thing, but
                I realize now that I was trying (the meaningless) uppercase \X30
                instead of the correct \x30. Ooops ! :-[

                On 2011-04-04 09:18, Eb wrote:
                > Ok, try this (hex code '\x30' for the first zero):
                >
                > $2\x30000
                >
                > Eb
                >
                > --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                > >
                > > > ^!replace "\b(\d+)\.?(\d*)\b" >> "00000$1.$2^%empty%0000" rwais
                > > It does allow the $2 to be substituted but ^%empty% does not appear to
                > > get translated.
                > > I get results like this:
                > > 45.6 ==> 0000045.6^%empty%0000

                --
                Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
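
The same "$2 followed by a literal zero" ambiguity turns up in other regex engines as well. As an illustration only (this is Python, not NoteTab clip syntax), Python's re module lets the replacement string end a group reference explicitly with \g<2>, so neither a marker string nor a hex escape is needed there. The names PATTERN and pad_with_zeros are hypothetical.

```python
import re

# Same search pattern as in the clip: integer part, optional dot, fraction.
PATTERN = re.compile(r"\b(\d+)\.?(\d*)\b")


def pad_with_zeros(text):
    """Wrap each number in surplus zeros using one replacement string."""
    # \g<2> closes the group reference, so the zeros that follow are
    # literal text rather than part of a group number.
    return PATTERN.sub(r"00000\g<1>.\g<2>0000", text)


if __name__ == "__main__":
    print(pad_with_zeros("45.6"))  # 0000045.60000
```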

