
Re: find a specific word along with surrounding words, etc.

  • diodeom
    Message 1 of 6 , Sep 5, 2011
      "KenH" <kenfhill84083@...> wrote:
      >
      > I need to be able to find a word and its surrounding words, up to 3 in front and 3 following. For example, in the sentences below I'd like to search for 'aid' in the source file and get the results file, including the number at the beginning of the line. Any suggestions? (I once had a small smattering of NoteTab clip knowledge but I've let it lapse to practically zero now. I've spent several hours trying to bone up but so far only dismal failure. I searched this group but could not find what I need -- maybe my group searching is rusty too.)
      >
      > source file:
      > 1. Now all good people should come to the aid of their party.
      > 2. Can't we all just get along?
      > 3. First aid is important to know.
      >
      > result file:
      > 1. come to the aid of their party
      > 3. First aid is important to
      >

      To account for possible capitalization, punctuation, hyphenated or apostrophized words, the term appearing at the beginning or end of a sentence, and its (presumably disqualifying) mid-word instances (e.g. "said" or "aide"), as in:

      4. It's a well-meant aid; however, it seems rather futile.
      5. Aid them, and they'll quadruple their numbers of needy.
      6. Yeah, I said it with some conviction.
      7. Let the effin' bleeding-heart cavalry come to their aid.

      ...one (of many) alternatives for a pattern meeting your stated needs could be:

      ^(\d+\. ).*?((([\w'-]+)([\pP ]+)){0,3}\b[Aa]id\b)(((?5)(?4)){0,3})

      ...where the first set of parentheses ($1) captures the line number, dot and space; the second outer set ($2) captures up to three preceding words with their separators, plus your sample term at word boundaries; and the third outer set ($6) captures up to three following words with their separators (by reusing subpatterns 4 and 5 of the second set of parentheses in reverse order).
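For anyone who wants to experiment with the word-based pattern outside NoteTab, here is a rough Python sketch. It is an approximation, not a port: Python's re module supports neither \pP nor the (?4)/(?5) subroutine calls, so the separator class is assumed to be "any run of non-word characters except newline" and the two pieces are simply written out twice. The names WORD, SEP, SAMPLE and words_around are made up for this illustration.

```python
import re

# Assumed stand-ins for subpatterns 4 and 5 of the PCRE pattern above.
WORD = r"[\w'-]+"   # a word, allowing apostrophes and dashes
SEP = r"[^\w\n]+"   # approximation of [\pP ]+ (Python's re has no \pP)

PATTERN = re.compile(
    r"^(\d+\. ).*?"                           # group 1: line number, dot, space
    rf"((?:{WORD}{SEP}){{0,3}}\b[Aa]id\b)"    # group 2: up to 3 words + the term
    rf"((?:{SEP}{WORD}){{0,3}})",             # group 3: up to 3 following words
    re.MULTILINE,
)

SAMPLE = "\n".join([
    "1. Now all good people should come to the aid of their party.",
    "2. Can't we all just get along?",
    "3. First aid is important to know.",
    "4. It's a well-meant aid; however, it seems rather futile.",
    "5. Aid them, and they'll quadruple their numbers of needy.",
    "6. Yeah, I said it with some conviction.",
    "7. Let the effin' bleeding-heart cavalry come to their aid.",
])


def words_around(text):
    """Return one 'number + context' string per line that contains the term."""
    return [m.group(1) + m.group(2) + m.group(3) for m in PATTERN.finditer(text)]


for hit in words_around(SAMPLE):
    print(hit)
```

Lines 2 and 6 are skipped because they contain no standalone "aid"; the lazy .*? is what pushes the start of the captured context as far left as the three-word limit allows.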

      In the following clips the %s%earch and %r%eplacement patterns are set apart as variables for clarity:

      ;(start long line)
      ^!Set %s%=^(\d+\. ).*?((([\w'-]+)([\pP ]+)){0,3}\b[Aa]id\b)(((?5)(?4)){0,3})
      ;(end long line)
      ^!Set %r%=$1$2$6
      ^!SetClipboard ^$GetDocListAll("^%s%";"^%r%\r\n")$
      ^!Toolbar Paste New

      You might find it much more appealing though to collect context chunks by setting the maximum number of characters -- instead of words -- trimmed at word boundaries before and after the term:

      ^!Set %s%=^(\d+\. ).*?((\b\w.{0,17})?\b[Aa]id\b)((.{0,17}\w\b)?)
      ^!Set %r%=$1$2$4
      ^!SetClipboard ^$GetDocListAll("^%s%";"^%r%\r\n")$
      ^!Toolbar Paste New
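The character-based pattern uses only constructs that Python's re module also understands, so it can be tried there more or less verbatim. The sketch below is only an illustration under that assumption; the function name chars_around, the width parameter (standing in for the hard-coded 17) and SAMPLE are invented here and are not part of the clip.

```python
import re

SAMPLE = "\n".join([
    "1. Now all good people should come to the aid of their party.",
    "2. Can't we all just get along?",
    "3. First aid is important to know.",
    "4. It's a well-meant aid; however, it seems rather futile.",
    "5. Aid them, and they'll quadruple their numbers of needy.",
    "6. Yeah, I said it with some conviction.",
    "7. Let the effin' bleeding-heart cavalry come to their aid.",
])


def chars_around(text, width=17):
    """Collect the term plus at most width+1 characters on each side,
    trimmed so that each chunk starts and ends on a whole word."""
    pattern = re.compile(
        r"^(\d+\. ).*?"                                # group 1: line number
        rf"((\b\w.{{0,{width}}})?\b[Aa]id\b)"          # group 2: lead-in + term
        rf"((.{{0,{width}}}\w\b)?)",                   # group 4: trailing chunk
        re.MULTILINE,
    )
    # Same selection as the clip's replacement string $1$2$4.
    return [m.group(1) + m.group(2) + m.group(4) for m in pattern.finditer(text)]


for hit in chars_around(SAMPLE):
    print(hit)
```

Raising or lowering width widens or narrows the context window without touching the rest of the pattern.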
    • KenH
      Message 2 of 6 , Sep 5, 2011
        Every time I see an elegant solution like this I am impressed. Works great. Thanks.

      • diodeom
        Message 3 of 6 , Sep 5, 2011
          After removing one needless grouping:

          ^!Set %s%=^(\d+\. ).*?((\b\w.{0,17})?\b[Aa]id\b(.{0,17}\w\b)?)
          ^!Set %r%=$1$2
        • KenH
          Message 4 of 6 , Sep 5, 2011
            Thanks again. I ran across something in my research and tried it in this code. It seems to work but maybe I'll run into side effects somewhere down the line?

            [Aa]id -> (?i)aid
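One way to see what that substitution changes is to compare the two forms on a few inputs. In PCRE an inline (?i) applies from where it appears to the end of the enclosing group, and since the rest of this pattern contains no literal letters, the visible difference should be limited to the term itself. The Python sketch below is only an approximation of that: it uses the scoped form (?i:aid), because recent Python versions reject a bare (?i) in the middle of a pattern. The names narrow, broad, samples and hits are made up for the illustration.

```python
import re

narrow = re.compile(r"\b[Aa]id\b")     # only "aid" and "Aid"
broad = re.compile(r"\b(?i:aid)\b")    # any mix of upper and lower case

samples = ["first aid", "First Aid", "FIRST AID", "she said", "an aId"]


def hits(pattern):
    """Return the sample strings in which the pattern finds a match."""
    return [s for s in samples if pattern.search(s)]


print(hits(narrow))
print(hits(broad))
```

So the case-insensitive form additionally picks up spellings such as "AID", while mid-word instances like "said" are still excluded by the word boundaries.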
