need help with regex doing a ^!Replace command

  • mfrascinella@comcast.net
    Message 1 of 5 , Nov 6, 2007
      Hi,

      I can't quite figure out how to use regex to replace tags that surround some text.

      I have this situation:

      <p><span class="subhead12">Sun., Nov. 11, midday
      <br>Business Meeting</span>

      and I want it to look like this:

      <li>Sun., Nov. 11, midday - <strong>Business Meeting</strong></li>

      I figured out how to replace the tags on the first line but how do I replace the second line using regex?

      I tried this and get an error that no replacements were made.

      ^!Replace "\n<br>*[a-z0-9]</span>" >> " - <strong>*[a-z0-9]</strong>" IAR

      I thought that the expression [a-z0-9] would find the characters between the tags and keep them in the replaced text. Even the online help doesn't help much. I guess I need to see more complete examples.

      This should be easy for your regex experts.

      Yours,

      Michael F
      *************************
    • Sheri
      Message 2 of 5 , Nov 6, 2007
        --- In ntb-clips@yahoogroups.com, mfrascinella@... wrote:

        > ^!Replace "\n<br>*[a-z0-9]</span>" >> " - <strong>*[a-z0-9]</strong>" IAR

        I assume you have NoteTab 5.5. Regex was somewhat different in 4.95.

        > I thought that the expression [a-z0-9] would find the characters
        > between the tags and keep them in the replaced text. Even the online
        > help doesn't help much. I guess I need to see more complete examples.

        [a-z0-9] will find one such character. Note that your text has a space
        between Business and Meeting, so even if you allowed for multiple
        matching characters (e.g., by putting a plus sign after the character
        class), it still wouldn't match.

        You start by matching a single linefeed. While a file is loaded as a
        document, linebreaks consist of a carriage return followed by a
        linefeed. If your replacement worked, you would be getting rid of the
        linefeed but keeping the carriage return. \R matches various entire
        linebreaks (or you could use \r\n).
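To illustrate the linebreak point, here is a minimal sketch in Python rather than NoteTab clip code. It assumes the text uses carriage return plus linefeed line breaks, as described above. Python's re module has no \R, so the sketch spells out \r\n.

```python
import re

# Sample text with a carriage return + linefeed line break.
text = "midday\r\n<br>Business Meeting</span>"

# Matching only the linefeed leaves the carriage return behind.
only_lf = re.sub(r"\n<br>", " - ", text)

# Matching the full CR+LF pair removes the whole line break.
full_break = re.sub(r"\r\n<br>", " - ", text)

print(repr(only_lf))
print(repr(full_break))
```

The first result still contains a stray \r before the dash; the second does not.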

        Your expression says: "match a linefeed followed by <br followed by
        zero or more >s, followed by an alphanumeric character followed by
        </span>"
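The reading above can be checked with a small Python sketch (not NoteTab itself; Python's re treats these constructs the same way). The IGNORECASE flag is an assumption, used only so that the capital B is not the reason for the failure.

```python
import re

# The second line of the sample, preceded by a CR+LF line break.
text = "Sun., Nov. 11, midday\r\n<br>Business Meeting</span>"

# The pattern from the question: one alphanumeric, then </span>.
original = r"\n<br>*[a-z0-9]</span>"
print(re.search(original, text, re.IGNORECASE))

# '>*' means "zero or more '>'", so both of these match in full.
print(bool(re.fullmatch(r"<br>*", "<br")))
print(bool(re.fullmatch(r"<br>*", "<br>>>")))

# Adding '+' is still not enough: the space is not in the class.
with_plus = r"\n<br>*[a-z0-9]+</span>"
print(re.search(with_plus, text, re.IGNORECASE))

# Allowing a space inside the class is what makes it match.
with_space = r"\n<br>*[a-z0-9 ]+</span>"
print(re.search(with_space, text, re.IGNORECASE).group(0))
```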

        Your replacement string says replace above matches with
        " - <strong>*[a-z0-9]</strong>" literally.

        On the replacement side, you seem to be trying to write another
        regular expression. The replacement string should consist of literal
        text, backreferences to sets of parentheses in the regular expression,
        and possibly some escaped digits or hex codes.

        To get backreferences into the replacement string you need to have
        some parentheses in your regular expression (the match side of the
        replacement command). To put them in the replacement string, count
        sets of parentheses from left to right, and put the number after a
        dollar sign in the replacement text.
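As a rough sketch of how backreferences work, here is the same idea in Python. One difference to keep in mind: Python writes a backreference as \1 where the replacement text described above uses $1. The sample strings are only illustrations.

```python
import re

line = "<br>Business Meeting</span>"

# One set of parentheses, so the captured text is group 1.
result = re.sub(r"<br>([^<]+)</span>", r"<strong>\1</strong>", line)
print(result)

# Groups are counted by their opening parenthesis, left to right.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Business Meeting")
print(swapped)
```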

        Hope it helps. I don't see anything about the </li> in your
        replacement text. You should probably be doing the whole thing as one
        expression and replacement.

        Something like this:

        ^!Replace "<p><span class="subhead12">([^\r\n]+)\r\n *<br>([^<\r\n]+)</span>" >> "<li>$1 - <strong>$2</strong></li>" RAS
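For anyone who wants to try the pattern outside NoteTab, here is a hedged Python equivalent of the one-step replacement. It assumes the two source lines are separated by a carriage return plus linefeed, and it uses \1 and \2 where the clip uses $1 and $2.

```python
import re

# The two lines from the original question, joined by CR+LF.
source = '<p><span class="subhead12">Sun., Nov. 11, midday\r\n<br>Business Meeting</span>'

# Same regular expression as in the ^!Replace command.
pattern = r'<p><span class="subhead12">([^\r\n]+)\r\n *<br>([^<\r\n]+)</span>'

result = re.sub(pattern, r"<li>\1 - <strong>\2</strong></li>", source)
print(result)
```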

        Regards,
        Sheri
      • mfrascinella@comcast.net
        Message 3 of 5 , Nov 13, 2007
          Sheri,

          [Somehow my reply from last week never got posted.]

          Thanks for the code examples. They worked great. But I cannot quite
          figure out how you built the regex statement.

          ([^\r\n]+)\r\n *<br>([^<\r\n]+)

          It looks like it is searching for the start of a line (^) but is that
          column 1 in the file or something else? Then it searches for a CRLF,
          then zero or more spaces and the <br> tag. Then it looks like it
          repeats some of this code and ends with the plus sign for one or more
          matches of something.

          I read through the online help but am still unclear as to how one
          builds this kind of a regex string.

          But your code does look like a good model for replacing tags around
          text but allowing you to keep the same text.

          Yours,

          Michael F.


        • Sheri
          Message 4 of 5 , Nov 13, 2007
            --- In ntb-clips@yahoogroups.com, mfrascinella@... wrote:
            >
            > Sheri,
            >
            > [Somehow my reply from last week never got posted.]
            >
            > Thanks for the code examples. They worked great. But I cannot quite
            > figure out how you built the regex statement.
            >
            > ([^\r\n]+)\r\n *<br>([^<\r\n]+)
            >
            > It looks like it is searching for the start of a line (^) but is
            > that column 1 in the file or something else? Then it searches for
            > a CRLF, then zero or more spaces and the <br> tag. Then it looks
            > like it repeats some of this code and ends with the plus sign for
            > one or more matches of something.
            >
            > I read through the online help but am still unclear as to how one
            > builds this kind of a regex string.
            >
            > But your code does look like a good model for replacing tags around
            > text but allowing you to keep the same text.


            Hi Michael,

            When a character class (i.e., characters in square brackets) starts
            with a ^, the ^ has a different meaning: it negates the character
            class. That means it matches any character except the ones listed
            in the class. In the above, [^\r\n] says any character except a
            carriage return or line feed. The plus outside the character class
            makes it match one or more such characters. The parentheses around
            that put the part that matches that subpattern into substring 1,
            which can be extracted in the replacement text with $1. After that
            subpattern, it matches a carriage return followed by a line feed
            followed by zero or more spaces followed by <br>. Then it matches
            one or more characters that are not a left angle bracket, carriage
            return, or line feed and captures them as substring 2.
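The piece-by-piece reading above can be sketched in Python (an illustration only, not NoteTab; the sample text with three spaces before <br> is made up to exercise the " *" part).

```python
import re

# Made-up sample: CR+LF line break, then a few spaces before <br>.
text = "Sun., Nov. 11, midday\r\n   <br>Business Meeting</span>"

# A negated class on its own: runs of anything except CR or LF.
pieces = re.findall(r"[^\r\n]+", "first\r\nsecond")
print(pieces)

# The middle of the pattern, with its two capturing groups.
m = re.search(r"([^\r\n]+)\r\n *<br>([^<\r\n]+)", text)
first = m.group(1)   # substring 1: the text before the line break
second = m.group(2)  # substring 2: the text after <br>, up to the next <
print(first)
print(second)
```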

            You could put a ^ at the start of the pattern; it might be more
            optimized because then it would be considered anchored, e.g.:

            ^([^\r\n]+)\r\n *<br>([^<\r\n]+)
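A small Python sketch of the anchored variant follows. Python needs the re.MULTILINE flag for ^ to match at the start of each line; whether NoteTab treats ^ that way by default is an assumption here. The intro line in the sample is made up.

```python
import re

# Made-up sample with an extra line before the one of interest.
text = "Some intro line\r\nSun., Nov. 11, midday\r\n<br>Business Meeting</span>"

unanchored = re.compile(r"([^\r\n]+)\r\n *<br>([^<\r\n]+)")
anchored = re.compile(r"^([^\r\n]+)\r\n *<br>([^<\r\n]+)", re.MULTILINE)

plain_groups = unanchored.search(text).groups()
anchored_match = anchored.search(text)

print(plain_groups)
print(anchored_match.groups())
```

Both versions capture the same two substrings on this sample. The anchor only restricts where a match may begin, so the engine can skip start positions in the middle of a line.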

            Regards,
            Sheri
          • m_frascinella
            Message 5 of 5 , Nov 14, 2007
              Sheri,

              Aha, now I get it. Every character is loaded with meaning. Now that I
              see it explained piece by piece, I see how the search pattern is built
              and replaced. Thanks for the example and for the detailed explanation.
              It goes a long way in enabling me to understand how to use regex.

              For safe keeping, I put your explanation inside the clip that uses it
              (for handy reference).

              Thanks.

              Michael F.

