Stripping text up to a given character

  • helvetica_switzerland
    Message 1 of 6 , Jun 7, 2009
      Dear all.
      I use both NoteTab Pro and NoteTab Light with Windows XP.

      I just wrote two scripts to help me make several thousand file entries for a website. The problem is as follows: The script makes four sections of text, all similar but not identical, based on what I have entered into the lines of a question box (metal, text, reference etc).
      The "reference" is pasted in as e.g.
      _neocaesarea_AE30_RecGen_11 (this MUST be in this combination of lower and upper case when I enter it)

      In certain lines of two of the four text sections that are generated, the _neocaesarea_AE30 (= city and size) parts need to be removed and the first part capitalised, printing only RecGen_11 in one of the lines (the entire _neocaesarea_AE30_RecGen_11 is used in another area (for the .jpg, .txt, .th.jpg links)).

      So is there any way that I can tell NoteTab Pro to, e.g., strip everything up to and including the third _ character?
      Because the city names (e.g. _neocaesarea) are anything from 4 to 15 letters long, I cannot use the "delete using number of characters" function.
      Many thanks for any suggestions.
    • Don - HtmlFixIt.com
      Message 2 of 6 , Jun 7, 2009
        Very easy to do. You want to use Notetab 5.x or later (current is 6.1 I
        think). You will use regex.

        I didn't fully follow (a few lines of "before" and "after" would have
        helped), so we may have to play a little, but to take out everything up
        to and including the third _ on each line we use this:

        1. click on search and replace
        2. choose regex
        3. be sure you are at the top of the document
        4. type this in the search box (minus the quotes): ".*_.*_.*_"
        5. hit replace all

        If you do it often do it with a clip (include the quotes this time):
        ^!Replace ".*_.*_.*_" >> "" RAWS

        Clips are generally discussed on a different list, but since this was an
        easy one I put it there, but no further discussion of it should occur here.

      • Sheri
        Message 3 of 6 , Jun 7, 2009
          --- In notetab@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
          >
          > to take out everything up to and including the third _ on each
          > line we use this:
          >
          > 1. click on search and replace
          > 2. choose regex
          > 3. be sure you are at the top of the document
          > 4. type this in the search box (minus the quotes): ".*_.*_.*_"
          > 5. hit replace all
          >
          > If you do it often do it with a clip (include the quotes this time):
          > ^!Replace ".*_.*_.*_" >> "" RAWS
          >
          > Clips are generally discussed on a different list, but since this
          > was an easy one I put it there, but no further discussion of it
          > should occur here.

          Hi. It doesn't matter if there are only 3 underscores on a line, but with more than 3 that pattern (whether used in the dialog or from a clip) would match everything up to and including the LAST underscore on each line. It needs some non-greedy indicators (question marks). Also, to match from the beginning of each line, it needs to start with a caret. Without the caret, Replace All could go on to match a second group of three underscores further along the same line. Any of the following would match through the third underscore on each line.

          ^.*?_.*?_.*?_
          or
          ^(.*?_){3}
          or
          ^(?:.*?_){3}
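
The difference between these patterns and the greedy one can be checked outside NoteTab. What follows is only a minimal sketch: it assumes Python's `re` module behaves like NoteTab's regex engine for these simple patterns, and the six-underscore sample line is made up for illustration.

```python
import re

# Sample reference string taken from the first message in this thread.
ref = "_neocaesarea_AE30_RecGen_11"

# Greedy pattern: each .* grabs as much as it can, so the match
# runs through the LAST underscore on the line.
greedy = re.sub(r".*_.*_.*_", "", ref)

# Non-greedy and anchored: the match stops at the third underscore.
# re.MULTILINE makes ^ match at the start of every line.
lazy = re.sub(r"^(?:.*?_){3}", "", ref, flags=re.MULTILINE)

# Hypothetical line with six underscores, to show why the caret matters
# when every match in the text is replaced.
line = "a_b_c_d_e_f_g"
unanchored = re.sub(r".*?_.*?_.*?_", "", line)
anchored = re.sub(r"^.*?_.*?_.*?_", "", line, flags=re.MULTILINE)

print(greedy)      # 11
print(lazy)        # RecGen_11
print(unanchored)  # g
print(anchored)    # d_e_f_g
```

Without the caret, the second group of three underscores on the same line is also removed, which is the effect described above.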

          Regards,
          Sheri
        • Don - HtmlFixIt.com
          Message 4 of 6 , Jun 7, 2009
            True ... exactly the kind of thing a sample would have shown, I guess ...
            it's early in the morning.
            I like your middle one: repeat the same thing 3 times, very efficient.
            I don't quite get the last one, however. What does that ?: (at least I
            think it's a question mark and a colon) do?

          • Sheri
            Message 5 of 6 , Jun 7, 2009
              --- In notetab@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
              >
              > true ... exactly the kind of thing a sample would show I guess
              > ... early in the morning I like your middle one -- repeat same
              > time 3 very efficient ... I don't quite get the last one however
              > ... what does that ?: (at least I think it's a questionmark
              > colon) do?

              Normally the parenthesized part of the pattern would be captured, in this case as $1. Question mark colon says don't bother to capture. It's more efficient to disable the capture when it is unused, and in this case (because of the repetition) the capture would be useless anyway. Without the question mark colon, $1 would exist, but it would contain only the last of the three matches of .*?_
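
The capture behaviour can be seen in a small sketch. It uses Python's `re` module as a stand-in, on the assumption that it treats repeated groups the same way as NoteTab's engine; `group(1)` plays the role of $1.

```python
import re

# Sample reference string taken from the first message in this thread.
ref = "_neocaesarea_AE30_RecGen_11"

# Capturing group repeated three times: only the last repetition is kept.
capturing = re.match(r"^(.*?_){3}", ref)

# Non-capturing group: same overall match, but nothing is stored.
non_capturing = re.match(r"^(?:.*?_){3}", ref)

print(capturing.group(0))       # _neocaesarea_AE30_
print(capturing.group(1))       # AE30_
print(non_capturing.group(0))   # _neocaesarea_AE30_
print(non_capturing.groups())   # ()
```

Both forms match the same text, so either works for the replacement; the non-capturing form simply skips storing a value that would not be useful here.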

              Regards,
              Sheri
            • helvetica_switzerland
              Message 6 of 6 , Jun 8, 2009
                Thanks Don and Sheri
                I need to upgrade my NoteTab Pro. I have the trusty old 4.95 version. I hope my upgrade is free!