
find/replace in tags

  • Don Strack
    Message 1 of 6 , Oct 5, 2008
      How do I strip the "class" contents of any and all tags, as below:

      change this:

      <td class="[whatever]">[whatever]</td>

      to this:

      <td>[whatever]</td>

      Don Strack
    • Rudolf Horbas
      Message 2 of 6 , Oct 5, 2008
        Hi Don,

        > How do I strip the "class" contents of any and all tags, as below:
        >
        > change this:
        >
        > <td class="[whatever]">[whatever]</td>
        >
        > to this:
        >
        > <td>[whatever]</td>

        Simple. (I assume you have [whatever1] and [whatever2])

        --------------------------------------------------
        Reg. Exp.: \<td class\="[^"]*"\>([^<]*)\</td\>
        Replace with: <td>$1</td>

        [ ] Case Sensitive [x] Regular Exp.
        --------------------------------------------------
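For anyone who wants to try the same find/replace outside NoteTab, here is a minimal sketch in Python. Python's `re` module is an assumption here (the thread itself only uses NoteTab's dialog), the function name and sample rows are made up for illustration, and Python writes the back-reference as `\1` where NoteTab uses `$1`. The backslashes before `<`, `>` and `=` are dropped because Python does not need them.

```python
import re

# Rudi's pattern without the extra backslashes. IGNORECASE mirrors the
# unchecked "Case Sensitive" box in the dialog above.
TD_WITH_CLASS = re.compile(r'<td class="[^"]*">([^<]*)</td>', re.IGNORECASE)

def strip_td_class(html: str) -> str:
    """Replace <td class="...">text</td> with <td>text</td>."""
    return TD_WITH_CLASS.sub(r"<td>\1</td>", html)

# Hypothetical sample rows, not taken from Don's files.
plain = '<tr><td class="odd">Utah</td><td class="even">1869</td></tr>'
nested = '<td class="odd"><a href="u.html">link</a></td>'

print(strip_td_class(plain))
print(strip_td_class(nested))
```

The second sample comes back unchanged: `([^<]*)` refuses to cross a `<`, so a cell that contains another tag (such as an `a href`) is never matched by this pattern.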

        Does this help?

        I suggest you dive into Regular Expressions. The NTP help file contains
        some hints, and you might find some links to tutorials here:
        http://en.wikipedia.org/wiki/Regular_expression

        They're fun, and extremely useful!

        Rudi
      • Don Strack
        Message 3 of 6 , Oct 6, 2008
          > --------------------------------------------------
          > Reg. Exp.: \<td class\="[^"]*"\>([^<]*)\</td\>
          > Replace with: <td>$1</td>
          >
          > [ ] Case Sensitive [x] Regular Exp.
          > --------------------------------------------------
          >
          > Does this help?
          >
          > I suggest you dive into Regular Expressions. The NTP help file
          > contains
          > some hints, and you might find some links to tutorials here:
          > http://en.wikipedia.org/wiki/Regular_expression

          I guess I asked the wrong question. What I meant to ask was:

          How do I strip everything from inside the <td> tags. I tried the above
          string, I'm not able to figure out what to leave in and what to take out.

          I see what the \ before the various characters does, to maintain the
          <td> and </td>, and the " part of the class=, but what do the other
          parts do?

          [^"]*

          and

          ([^<]*)

          Whatever I try, it either finds nothing, or selects everything between
          the first <td> and the last </td>.

          I've inherited over 100 files that were autogenerated over the
          years by at least three different html editors. Most of the files are
          simple HTML tables. I have several <td> tags with an "a href", among
          other helpful bloat. I'm using Dreamweaver and NoteTab Pro to clean up
          the mess, and prepare them all for proper CSS implementation.

          I've stared at the help file in NTP for what seems like hours, and
          tried numerous different character combinations. Also, I've been to
          several web sites with tutorials about regular expressions. I simply
          don't see what I need to do to get them to work.

          Don Strack
        • Don Strack
          Message 4 of 6 , Oct 6, 2008
            > --------------------------------------------------
            > Reg. Exp.: \<td class\="[^"]*"\>([^<]*)\</td\>
            > Replace with: <td>$1</td>

            Find, with Reg Exp: <td\b[^>]*>(.*?)</td>

            Replace: <td>$1</td>

            Of course, it also replaced every <td></td> tag combination (5600 of
            them, in just one file), but that's okay.
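A rough Python equivalent of the find/replace Don settled on, offered as a sketch only. The `re` module, the function name and the sample string are assumptions for illustration, and `\1` stands in for NoteTab's `$1`. Using `subn` also returns the replacement count, which shows why already-clean `<td></td>` pairs get counted too.

```python
import re

# <td, a word boundary, any attributes up to the closing >, then the
# shortest possible cell content up to the next </td>.
ANY_TD = re.compile(r"<td\b[^>]*>(.*?)</td>")

def strip_td_attributes(html: str) -> tuple[str, int]:
    """Return the cleaned text and the number of replacements made."""
    return ANY_TD.subn(r"<td>\1</td>", html)

# Hypothetical sample: one bloated cell and one cell that is already clean.
sample = '<td class="x" width="20"><a href="u.html">link</a></td><td></td>'

cleaned, count = strip_td_attributes(sample)
print(cleaned)
print(count)
```

Both cells are reported as replaced, although the second one is rewritten to exactly what it already was.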

            Don Strack
          • Marcelo de Castro Bastos
            Message 5 of 6 , Oct 6, 2008
              Interviewed by CNN on 6/10/2008 21:39, Don Strack told the world:
              >> --------------------------------------------------
              >> Reg. Exp.: \<td class\="[^"]*"\>([^<]*)\</td\>
              >> Replace with: <td>$1</td>
              >>
              >> [ ] Case Sensitive [x] Regular Exp.
              >> --------------------------------------------------
              >>
              >> Does this help?
              >>
              >> I suggest you dive into Regular Expressions. The NTP help file
              >> contains
              >> some hints, and you might find some links to tutorials here:
              >> http://en.wikipedia.org/wiki/Regular_expression
              >>
              >
              > I guess I asked the wrong question. What I meant to ask was:
              >
              > How do I strip everything from inside the <td> tags. I tried the above
              > string, I'm not able to figure out what to leave in and what to take out.
              >
              > I see what the \ before the various characters does, to maintain the
              > <td> and </td>, and the " part of the class=, but what do the other
              > parts do?
              >
              > [^"]*
              >
              > and
              >
              > ([^<]*)
              >
              >
              OK, let's take one detail at a time.
              Square brackets define a "class" of characters. Meaning that they match
              any character that belongs to that class.
              The "^" as the first character inside the square brackets INVERTS the
              definition of the class. Thus, [^"] means "any character EXCEPT double
              quotes".
              The asterisk means "0 or more occurrences of the thing that comes
              before". So, [^"]* will match any string that does NOT contain a double
              quote -- that is, it will match everything from the point it begins
              until it finds a double quote (the double quote will "stay outside" of
              the match, so to speak).

              The parentheses assign whatever the pattern inside them matched to a variable.
              So, the second example will find a string that ends just before the next
              HTML tag, and assign it to a variable. Then, in the "replace" field, you
              can use a token (like $1) to insert back that string.
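The pieces described above can be poked at one by one. This is a small sketch using Python's `re` module (an assumption; NoteTab's own engine is what the thread is about), with a made-up fragment of a table cell as input.

```python
import re

# Hypothetical fragment of a table cell.
text = 'class="header">Title</td>'

# [^"]* matches zero or more characters that are NOT a double quote,
# so from the start of the text it stops just before the first quote.
up_to_quote = re.match(r'[^"]*', text).group(0)

# Parentheses capture what the pattern inside them matched; the
# surrounding quotes are matched but stay outside the captured group.
attribute_value = re.search(r'"([^"]*)"', text).group(1)

# ([^<]*) captures everything up to, but not including, the next tag.
cell_text = re.search(r'>([^<]*)<', text).group(1)

print(up_to_quote)
print(attribute_value)
print(cell_text)
```

The captured groups are what a replacement token like `$1` (or `\1` in Python) puts back into the output.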

              This is explained in the "Help on Regular Expressions" in the NoteTab
              Help menu. There are also lots of places around the Web where you can
              find more about regular expressions.


              Marcelo
              -=-=-
              Jury: Twelve people who decide who has the better lawyer
              * TagZilla 0.066 on Seamonkey 1.1.12
            • Alec Burgess
              Message 6 of 6 , Oct 6, 2008
                Don Strack (donstrack@...) wrote (in part) (on 2008-10-06 at
                20:39):
                > Whatever I try, it either finds nothing, or selects everything between
                > the first <td> and the last </td>

                Don: In case you are still struggling after Rudolf's and Marcelo's posts:

                Sounds like you might be getting bitten by regex "greedy" mode.
                Though I'm not sure how it would get from first to last unless the html
                code is all in one line. You can control whether a . (dot) can span more
                than one line by putting either (?s) (span multiple lines) or (?-s)
                (span ONLY one line) at the beginning of your regular expression. If
                neither is specified, (?-s) is implied. So:

                * (?s)<td>.*</td>
                    o will find from the first <td> to the LAST </td> in the
                      entire file (greedy mode)
                * (?s)<td>.*?</td>
                    o will find from the first <td> to the NEXT </td>, even if
                      it's on the next or following lines (non-greedy mode)
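The difference between the two modes is easy to see on a two-line sample. This sketch assumes Python's `re` module, where `(?s)` has the same meaning and, as in the description above, a dot does not cross line breaks unless `(?s)` is given. The sample text is made up.

```python
import re

# Hypothetical two-cell sample spread over two lines.
html = "<td>one</td>\n<td>two</td>"

# Greedy with (?s): .* runs to the end of the text, then backs up to
# the LAST </td>, so both cells are swallowed in one match.
greedy = re.search(r"(?s)<td>.*</td>", html).group(0)

# Non-greedy with (?s): .*? stops at the NEXT </td>.
lazy = re.search(r"(?s)<td>.*?</td>", html).group(0)

# Greedy without (?s): the dot cannot cross the line break, so the
# match is confined to the first line.
one_line = re.search(r"<td>.*</td>", html).group(0)

print(repr(greedy))
print(repr(lazy))
print(repr(one_line))
```

If every cell sits on a single line of the file, the greedy form without `(?s)` still runs from the first `<td>` on that line to the last `</td>` on it, which fits the "selects everything" symptom.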

                Your original question was:
                > How do I strip the "class" contents of any and all tags, as below:
                >
                > change this:
                >
                > <td class="[whatever]">[whatever]</td>
                >
                > to this:
                >
                > <td>[whatever]</td>
                >
                i.e. class can appear in tags other than just <td>, and (I assume) any
                other attributes are to be left unchanged. CORRECT?

                I *think* this should work:
                ^!replace "\bclass=".*?" >> "" rwsai
                In English: match from a word boundary (\b) followed by literal class="
                to the very next occurrence of quote (.*?") and replace everything
                matched by nothing.
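To check the idea away from NoteTab, here is a sketch of the same pattern in Python. The `re` module, the names `CLASS_ATTR` and `CLASS_ATTR_TIDY`, and the sample row are all assumptions for illustration. The second pattern is a variant that is not part of the clip above: it also swallows the whitespace in front of the attribute, so no stray space is left inside the tag.

```python
import re

# The pattern from the clip: word boundary, class=", then the shortest
# run of characters up to the next double quote.
CLASS_ATTR = re.compile(r'\bclass=".*?"', re.IGNORECASE)

# Hypothetical variant: also remove the whitespace before the attribute.
CLASS_ATTR_TIDY = re.compile(r'\s+class="[^"]*"', re.IGNORECASE)

# Hypothetical sample row with another attribute that should survive.
sample = '<tr class="row"><td class="odd" width="20">x</td></tr>'

as_posted = CLASS_ATTR.sub("", sample)
tidied = CLASS_ATTR_TIDY.sub("", sample)

print(as_posted)
print(tidied)
```

Neither pattern checks that it is inside a tag, so text such as class="x" appearing in the body of a page would be removed as well.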

                If this doesn't work and/or you can't make it work with some tweaking,
                could you take one old file, get it to the state it was in before you
                wanted to apply this fix, manually change a copy to what you want the
                result to be, and zip the two files together as before.html and
                after.html?

                I created a folder in our yahoo-group files:
                http://tech.groups.yahoo.com/group/ntb-html/files/HelpMePlease/ where
                you can post the zip.

                --
                Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)



