
RE: [Clip] Re: Find Command

  • John Shotsky
    Message 1 of 23 , Jul 23, 2011
      Yes, you're right of course - I use it everywhere, and was THINKING about \w instead of \b!

      Regards,
      John

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke
      Sent: Saturday, July 23, 2011 04:01
      To: ntb-clips@yahoogroups.com
      Subject: [Clip] Re: Find Command


      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > Lucas,
      > \b simply means not a letter or number. It can be any
      > punctuation, space, CR, etc. Any character except a letter or
      > number.
      >
      Sorry, John. This is probably not the best definition and could seriously confuse a beginner.

      '\b' is an assertion that doesn't represent any character but signifies a word border, i.e., a position of zero length
      where a word character is preceded or followed by a non-word character. It never "can be any punctuation, space, CR,
      etc".

      For example: The pattern '99\b' when run against '99!' will match '99' but not the exclamation mark '!'.

      '99!\b' when matched against '99! ' (note the space after '!') will match nothing at all -- neither the '99' nor any
      punctuation -- because there is no word border between '!' and the space. Patterns like that could lead to unintended
      results (cf message #21679).
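
The two '99' examples can be reproduced outside NoteTab. The following is a minimal sketch that assumes Python's `re` module treats `\b` the same way as NoteTab's regex engine does for these ASCII cases; the variable names are made up for illustration.

```python
import re

# '99\b' against '99!': the border sits between '9' and '!',
# so only '99' is matched and the '!' is left out.
m1 = re.search(r"99\b", "99!")
print(m1.group(0) if m1 else None)

# '99!\b' against '99! ': '!' and the space are both non-word characters,
# so there is no border after '!' and the whole pattern fails.
m2 = re.search(r"99!\b", "99! ")
print(m2)

# \b on its own consumes nothing: the match has zero length.
m3 = re.search(r"\b", "99!")
print(m3.span())
```

The last search shows why '\b' cannot "be" a punctuation mark or a space: the span it returns starts and ends at the same position.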

      Regards,
      Flo



    • Lucas
      Message 2 of 23 , Jul 23, 2011
        Hi Flo,

        Thanks for the explanation about \b, now i get it.

        About what am i doing really:

        I'm working on a project here to indent/organize my COBOL source code.
        Because COBOL is all about patterns for example:

        All the reserved words 'TO' of the code should be in column 40.
        All the reserved words 'DIVISION' of the code should be in column 40 also.

        And so on...
        The use of 'TO' is like this:

        MOVE WRK-VAR TO WRK-DISPLAY-VAR.

        But it is possible that the programmer uses 'TO' as part of a variable name, like:

        WRK-VAR-TO or TO-VAR

        Those MUST NOT be aligned.

        And the DIVISION can have a . (dot) after it OR not.
        Like this:

        'PROCEDURE DIVISION.'
        'PROCEDURE DIVISION USING CP-VAR'.

        In both occasions the 'DIVISION' MUST be in column 40.

        And this is what I have done, and it is working perfectly thanks to you guys:

        ^!Find "(\R| )(TO)(\R| )" TISR2

        ^!Find "(\R| )(DIVISION)(\R| |\.)" TISR2
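
The two Find patterns above can also be tried outside NoteTab. This is only a sketch under some assumptions: Python's `re` has no `\R`, so a hypothetical stand-in `NL` is used for it, `re.IGNORECASE` plays the role of the 'I' option, and reading group 2 plays the role of the '2' option. The helper name `find_word` is invented for this example.

```python
import re

# Stand-in for \R (any line break), which Python's re does not support.
NL = r"(?:\r\n|\n|\r)"

TO_PATTERN = re.compile(rf"({NL}| )(TO)({NL}| )", re.IGNORECASE)
DIVISION_PATTERN = re.compile(rf"({NL}| )(DIVISION)({NL}| |\.)", re.IGNORECASE)

def find_word(pattern, text):
    """Return the (start, end) span of group 2 for every match in text."""
    return [m.span(2) for m in pattern.finditer(text)]

# 'TO' between spaces is found; only the word itself is reported.
print(find_word(TO_PATTERN, "MOVE WRK-VAR TO WRK-DISPLAY-VAR."))

# 'TO' inside hyphenated names is not found.
print(find_word(TO_PATTERN, "MOVE WRK-VAR-TO TO-VAR"))

# 'TO' at the start of a continuation line is found via the line break.
print(find_word(TO_PATTERN, "MOVE WRK-VAR-VERY-VERY-VERY-BIG\nTO WRK-VAR-DISPLAY"))

# 'DIVISION' is found with or without the trailing dot.
print(find_word(DIVISION_PATTERN, "PROCEDURE DIVISION."))
print(find_word(DIVISION_PATTERN, "PROCEDURE DIVISION USING CP-VAR"))
```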


        Now, \R represents a line break, right?

        For the 'TO' Find it is also needed, because there can be cases like this:

        ' MOVE WRK-VAR-VERY-VERY-VERY-BIG
        TO WRK-VAR-DISPLAY'

        This should be like this:

        ' MOVE WRK-VAR-VERY-VERY-VERY-BIG
                                               TO WRK-VAR-DISPLAY'

        But that won't happen with DIVISION; it should never be at the start of a line.

        Anyway, I have already written the whole code to move the words to the right column; my doubt was only about the ^!Find command.

        Thanks,
        Lucas.

        --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
        >
        > --- In ntb-clips@yahoogroups.com, "Lucas" <lucas.jfelix@> wrote:
        > >
        > > Hello again,
        > >
        > > Exactly John!
        > >
        > > The cenario i got here is...
        >
        >
        > Hi Lucas,
        >
        > I would like to understand why you try to ...
        >
        > > select the 'TO' but only when it is arround spaces and only
        > > the word 'TO', it shouldnt select the spaces aswell.
        >
        > Assuming that you browse a document and search that pattern, I wonder what makes the difference between selecting the 'TO' but not the enclosing spaces? Anyway, it's a match, isn't it?
        >
        > What's next after matching the 'TO'? What's the use of this proceeding?
        >
        > Probably, we could find another appropriate solution if you could shed some light on this question. Thanks...
        >
        > Flo
        >
      • Eb
        Message 3 of 23 , Jul 23, 2011
          Lucas,

          Below is a comment on your matching "to.", and a suggestion for some tests you might want to make.

          Sorry, but your requirements are confusing. You started out with the statement that matching "-TO-" instead of just "TO" was undesirable, then you settled for a search pattern, that would never match a "TO" inside hyphens. Furthermore, you later wondered why "\bto\b" would include the hyphen in a match (below).


          I suggest you paste the following string into a new document, and try the different search patterns proposed on the string, to see which matches you want, and don't want:

          The Plain search (non-regexp) does not find words adjacent to $ or #. This is either an esoteric feature, or a bug.

          !to @to %to ^to &to *to (to)to-to+to =to [to {to ]to }to |to \to ;to :to 'to "to,to.to /to <to >to ?to to

          Plain text search for WHOLE words misses the following (Feature? or Bug?)
          #to $to

          Both plain text and \bto\b miss the following:
          "_to "


          ^!Find "to" CIS
          ^!Find "\bto\b" RIS
          ^!Find "(\R| )(TO)(\R| )" RSTI2
          ;the last of these three find commands severely restricts what it can find. This is desirable, if you want to exclude all other delimiters around "TO".


          SCROLL DOWN

          --- In ntb-clips@yahoogroups.com, "Lucas" <lucas.jfelix@...> wrote:
          > The cenario i got here is:
          > Select the "TO" but only when it is arround spaces and only the word "TO", it shouldnt select the spaces aswell.


          > For some reason that i havent understand yet, ^!Find "\bTO\b" RIS
          > also selects "TO" when it is has dot (.) or (-) arround the TO!

          This implies that your dot "." and hyphen "-" may be characters other than the ones in the standard ANSI set.

          Character "." should have character code "46"
          Character "-" should have character code "45"

          Here is a little clip to test for the proper character code:

          H="Get Char Code"
          ^!Set %c%=^$GetSelection$
          ^!Set %d%=^$CharToDec(^%c%)$
          ^!Info [l]Character "^%c%" has character code "^%d%"
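
For comparison, the same check can be sketched in Python. The helper name `char_code` is made up for this example, and the en dash is included only as one possible look-alike; whether such a character is actually present in Lucas's file is not known.

```python
def char_code(ch):
    """Return the decimal code of a single character."""
    return ord(ch)

# The standard dot and hyphen.
for ch in ".-":
    print(f'Character "{ch}" has character code "{char_code(ch)}"')

# A look-alike such as the en dash reports a different code,
# which is how a non-standard "hyphen" would show up.
print(char_code("\u2013"))
```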


          If your character code comes out the same as above, then perhaps you should reinstall NoteTab. Sometimes some file NoteTab uses gets corrupted, and NoteTab develops funny quirks.


          Cheers


          Eb
        • flo.gehrke
          Message 4 of 23 , Jul 23, 2011
            --- In ntb-clips@yahoogroups.com, "Lucas" <lucas.jfelix@...> wrote:
            >
            > Hi Flo,
            >
            > Thanks for the explanation about \b, now i get it.
            >
            > About what am i doing really:
            >
            > I'm working on a project here to indent/organize my COBOL
            > source code....

            Lucas,

            Obviously, it's less complicated to deal with 'DIVISION'. So let's have another look at 'TO'.

            The clip...

            ^!Find "(\R| )(TO)(\R| )" TISR2

            seems to be OK for you. Nevertheless, you may consider the following clip (to make spaces more visible they are written in Hex \x20)...

            ^!Find "(\n|\x20)(TO)\x20" SR2

            Going by your explanation, the '\R' in the third parentheses seems inconsistent, since you are searching for a 'TO' that is enclosed in spaces. So it will never be followed immediately by a line break (CRLF). You could certainly omit that '\R'. You could even omit the third parentheses altogether since, with the '2' option, you select the second substring only.

            I wonder why you choose the 'I' option. If 'TO' is always written in upper case letters you shouldn't ignore the case.

            The 'T' option is unnecessary here because you don't want to find 'TO' within longer words but as a whole word only. Also, it makes no sense to combine RegEx and 'T' because a RegEx doesn't match whole words only anyway unless you define word borders.

            Note that the RegEx wouldn't match a 'TO' at the very start of the subject string because that position is not preceded by a line break (CRLF or LF) or a space.
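
The points about case, the third parentheses, and the start of the subject can be checked with a short sketch. It assumes Python's `re` as a stand-in for NoteTab's engine; `ANCHORED` is a hypothetical variant that is not part of the clip above and only shows one way to cover the start of the subject.

```python
import re

# Same pattern as in the clip: line feed or space, then 'TO', then a space.
# No IGNORECASE flag, so only upper-case 'TO' is accepted.
FLO_PATTERN = re.compile(r"(\n|\x20)(TO)\x20")

print(FLO_PATTERN.search("MOVE A TO B").group(2))  # the second substring only
print(FLO_PATTERN.search("move a to b"))           # lower case is not matched
print(FLO_PATTERN.search("TO B"))                  # nothing precedes 'TO'

# Hypothetical variant: also accept the very start of the subject string.
ANCHORED = re.compile(r"(\A|\n|\x20)(TO)\x20")
print(ANCHORED.search("TO B").span(2))
```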

            So far, we've discussed how to find 'TO'. However, I still can't see how you indent the 'TO' to column 40 once you've found it. Are you doing this manually? Maybe a perfect clip could automate this with something like...

            ^!Jump Doc_Start
            :Search
            ^!Find "(\n|\x20)(TO)\x20" RS2
            ^!IfError End
            ; Next one long line
            ^!Replace "^$GetSelection$" >> "^$StrFill(^%Space%;40)$^$GetSelection$" HS
            ^!Goto Search
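
To make the goal of the alignment step concrete, here is a rough Python sketch of moving a standalone 'TO' so that it starts in column 40. It is not a translation of the clip above. The function name `align_to`, the 1-based column convention, and the fallback for lines that are already too long are all assumptions made for illustration.

```python
import re

# A standalone 'TO': at the start of the line or after a space, followed by a space.
TO_RE = re.compile(r"(?:^|\x20)(TO)\x20")
TARGET_COL = 40  # 1-based column, as in Lucas's description

def align_to(line, col=TARGET_COL):
    """Pad the text before the first standalone 'TO' so 'TO' starts in column col."""
    m = TO_RE.search(line)
    if m is None:
        return line  # no standalone 'TO' on this line
    head = line[:m.start(1)].rstrip()
    tail = line[m.start(1):]
    if len(head) >= col - 1:
        # Assumption: if there is no room, keep a single separating space.
        return head + " " + tail
    return head.ljust(col - 1) + tail

print(align_to("    MOVE WRK-VAR TO WRK-DISPLAY-VAR."))
print(align_to("TO WRK-VAR-DISPLAY"))
print(align_to("    WRK-VAR-TO TO-VAR"))  # left unchanged
```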

            Regards,
            Flo
          • flo.gehrke
            Message 5 of 23 , Jul 24, 2011
              --- In ntb-clips@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
              >
              > Lucas,
              >
              > Below is a comment on your matching "to.", and a suggestion for some tests you might want to make.
              > (...)
              > The Plain search (non-regexp) does not find words adjacent
              > to $ or #. This is either an esoteric feature, or a bug.
              > (...)
              > Plain text search for WHOLE words misses the following (Feature?
              > or Bug?)...

              Eb,

              Thanks for this documentation! It's extremely important to note these problems.

              I can't see any rule behind this either. My vague impression is that the experts still cannot agree on defining word characters. For example...

              > Both plain text and \bto\b miss the following: "_to "

              '_' is matched with '\w', i.e. it's interpreted as a word character. So there is no word border between '_' and 'to', and, consequently, '\bto\b' doesn't match.
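
The underscore case is easy to verify. The sketch below assumes Python's `re`, where '_' is likewise part of `\w`; the helper name `whole_word_hit` is invented for this example. It covers the RegEx side only. The plain (non-RegEx) whole-word search in NoteTab that misses '#to' and '$to' follows its own rules and is not modelled here.

```python
import re

def whole_word_hit(sample, word="to"):
    """True if word occurs in sample with a word border on both sides."""
    return re.search(rf"\b{re.escape(word)}\b", sample) is not None

# '_' is a word character, so there is no border between '_' and 'to'.
print(re.fullmatch(r"\w", "_") is not None)

for sample in ("_to ", "-to ", ".to ", "#to ", "$to "):
    print(repr(sample), "match" if whole_word_hit(sample) else "no match")
```

Hyphen and dot are non-word characters, so a border exists next to them and '\bto\b' matches there, which is the behaviour Lucas observed with '\bTO\b'.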

              Below you will find two "Test Whole Words" clips which might help to check conditions like that when creating clips.

              The first clip produces a table with test data like...

              045 -word
              046 .word
              047 /word

              It starts with the decimal values from ANSI 32 to 255 followed by a string that combines the ANSI character with 'word'.

              Having produced the table, run the second clip on that table. It prompts you to edit the search term 'word'. For example, add word borders '\bword\b' or use RegEx like '[[:punct:]]word'. But always choose a pattern that matches 'word' in any way! Next, edit the search options (R,I,C,T, etc).

              After OK, the clip will display what has been matched with your criteria and what has not been matched. For example, it confirms Eb's result that a plain (non-RegEx) search (options CS whole words only) will miss '#word' or '$word' but will match '%word' or '&word'.

              Regards,
              Flo


              First clip

              ; Create a table from ANSI 32 to 255
              ^!Set %Dec%=31
              :Start
              ^!Inc %Dec%
              ^!If ^%Dec% > 255 End
              ^!If ^%Dec% <= 99 Next Else Skip
              ^!Set %Dec%=0^%Dec%
              ^!InsertText ^%Dec%^%Space%^$DecToChar(^%Dec%)$word^%NL%
              ^!Goto Start
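
For readers who want the same test table outside NoteTab, here is a minimal Python sketch. It assumes that "ANSI" means the Windows-1252 code page; the few byte values that are undefined in that code page are shown as a replacement character. The function name `ansi_table` is made up for this example.

```python
def ansi_table(first=32, last=255, word="word"):
    """Build lines like '045 -word' for every code from first to last."""
    rows = []
    for code in range(first, last + 1):
        # Assumption: ANSI = Windows-1252; undefined bytes become U+FFFD.
        ch = bytes([code]).decode("cp1252", errors="replace")
        rows.append(f"{code:03d} {ch}{word}")
    return rows

table = ansi_table()
print("\n".join(table[13:16]))  # the rows for codes 45, 46 and 47
```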

              Second clip (to be run on the ANSI table)

              ; Paste complete ANSI-table to clipboard
              ^!SetClipboard ^$GetText$
              ^!Jump Doc_Start
              ^!SetWizardWidth=70
              ^!SetWizardLabel Enter Search Criteria
              ^!Set %SearchStr%=^?{Edit Search String (in quotes):="word"}; %Options%=^?{Choose Options=RTCIS}
              ^!IfRegexOk "^%SearchStr%" Next Else Message

              :Search
              ^!Find "^%SearchStr%" ^%Options%
              ^!IfError Out
              ^!Set %Match%=^$GetLine$
              ^!Append %True%=^%Match%^%NL%
              ^!If ^$GetRow$=^$GetTextLineCount$ Out
              ^!Jump Select_End
              ^!Goto Search

              :Out
              ^!Toolbar New Document
              ^!InsertText ^%True%
              ^!Jump Doc_End
              ; Insert complete ANSI table
              ^!Paste
              ^!Select All
              ^$StrSort("^$GetSelection$";0;1;0)$
              ; Subtract ANSI list minus list %True% (=Not found)
              ^!Replace "(^[^\r\n]+)\r\n(\1)(\R|\Z)" >> "" WARS
              ; Output and format result
              ^!Jump Doc_Start
              ^!InsertText Not found^P^$StrFill(-;9)$^P
              ^!Jump Doc_End
              ^!InsertText ^P^PWords found^P^$StrFill(-;11)$^P^%True%
              ^!ClearVariable %True%
              ^!Goto End

              :Message
              ^!Info Error in RegEx
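
The idea of the second clip, sorting the table lines into "found" and "not found" for a given pattern, can be sketched in a few lines of Python. This is an illustration only: it uses Python's `re` instead of NoteTab's Find options, restricts the table to codes 32 to 126 to keep it short, and the name `split_by_match` is hypothetical.

```python
import re

def split_by_match(lines, pattern, flags=0):
    """Return (found, missed): lines the pattern matches and lines it does not."""
    rx = re.compile(pattern, flags)
    found = [ln for ln in lines if rx.search(ln)]
    missed = [ln for ln in lines if not rx.search(ln)]
    return found, missed

# Small test table in the same layout as the first clip.
table = [f"{code:03d} {chr(code)}word" for code in range(32, 127)]

found, missed = split_by_match(table, r"\bword\b")
print("Not found")
print("\n".join(missed))
print()
print("Words found")
print("\n".join(found))
```

With '\bword\b' the "Not found" list consists of the lines whose extra character is a letter, a digit, or the underscore, since no word border separates those from 'word'.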
            • Eb
              Message 6 of 23 , Jul 27, 2011
                Hi Flo,

                I'll put your clips in my RegExp test suite.

                Thanks,


                Eb