[Clip] Re: font buster

  • Charles A. Brannon
    Message 1 of 5, Jan 28, 2000
      OUCH! The previous post was mangled by egroups, so here's an attempt to
      correct it...

      --- <PRIVATE@...> wrote:
      > so - could you explain </*FONT[^>]*> to me?
      > I see two possibilities.
      > what I first thought was this:
      > The wild card character * renders the presence of the / optional
      > so that either <F or </F fit

      You're right. The asterisk indicates zero (0) or more of the preceding
      character. This is actually a kludge because it would also match the
      incorrectly coded <//Font> as well as <FONT> and </FONT>. This is
      unlikely to happen and would be ignored by your browser anyway, so I
      figure "good enough for our purposes"!
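
      To see the "zero or more slashes" behaviour outside NoteTab, here is a
      small Python sketch. It assumes Python's re module treats this simple
      pattern the same way NoteTab does; the names FONT_TAG and
      matches_whole are made up for the illustration.

```python
import re

# Assumption: Python's re module handles this pattern like NoteTab's regexp.
# re.IGNORECASE stands in for the clip's "I" option.
FONT_TAG = re.compile(r"</*FONT[^>]*>", re.IGNORECASE)

def matches_whole(text):
    """Return True if the entire string is a single FONT tag match."""
    return FONT_TAG.fullmatch(text) is not None

# "/*" allows zero, one, or many slashes, so the miscoded <//Font> passes too.
samples = ["<FONT>", "</FONT>", "<//Font>", "<B>"]
results = {s: matches_whole(s) for s in samples}
```

      The first three samples should match and <B> should not, which is the
      kludge described above.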

      > the [^>]
      > loses me completely
      >
      > *> and this looks like a normal wildcard * but here
      > no / precedes it so it functions as wildcard without the
      > slash so my above second thought is not as likely as I
      > thought! Could you clarify?

      The square brackets ('[' & ']') indicate specific characters allowed
      for that position (a class, in regexp jargon). This is sort of like the
      question mark ('?') wildcard on the DOS command line, only more
      specific. In this case, I am telling NoteTab that any character that is
      NOT ('^') a right angle bracket ('>') is a match. The trailing asterisk
      indicates, as above, that any number of these "anything but '>'"
      characters are valid.
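
      A rough way to see why the class matters, again in Python rather than
      NoteTab (an assumption: the two engines agree on this pattern). The
      sample line and variable names are invented for the illustration, and
      the ".*" variant is shown only for contrast in Python.

```python
import re

line = "<FONT size=2>keep me</FONT>"

# [^>]* can never cross a '>', so each tag is matched on its own.
per_tag = re.findall(r"</*FONT[^>]*>", line, re.IGNORECASE)

# For contrast: a greedy ".*" runs on to the LAST '>' on the line,
# swallowing the text between the tags as well.
greedy = re.findall(r"</*FONT.*>", line, re.IGNORECASE)
```

      Here per_tag holds the two tags separately, while greedy holds the
      whole line as one match, which is exactly what we do not want to
      delete.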

      The sum effect of this is that this expression will match any of these
      _phrases_ any number of times on a given line:

      <FONT>
      </FONT>
      </////////////FONT garbagegarbagegarbagegarbagegarbagegarbage>
      <FONT code>
      </FONT code>
      <FONT code morecode this="THAT">
      </FONT code morecode this="THAT">

      but not the entire phrase

      '<FONT>don't delete words between brackets on same line</FONT>'

      which, BTW, would change as follows:

      '<FONT>don't delete words between brackets on same line</FONT>'
      ' don't delete words between brackets on same line '

      > Thanks,
      > PRIVATE SENDER

      When it comes to code *I* have written, one must NEVER thank me in
      advance ... 8^)

      Keeping in mind that NoteTab's implementation of regexp is very
      limited, be sure to check out the following address for a (slightly)
      less pitiful explanation:

      http://hill.ucs.ualberta.ca/Documentation/Info/by-chapter/gawk-3.0.3
      (chapter 4: gawk_5.html#SEC25)

      or, more appropriately, but with less explanation, search NoteTab help
      for 'Regular Expressions' or browse thusly:
      Contents | Reference | Dialog Boxes | Regular Expression

      [ THE ORIGINAL, IN ALL ITS UGLINESS, WITH LABEL NAME CHANGED TO 'LOOP' ]

      --- 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ---

      H="remove font tags"

      :LOOP
      ; (/*) remove open AND close tags
      ; ([^>]*>) STOP AT FIRST CLOSING BRACKET <===
      ; (I) font AND FONT tags
      ; (R) </*FONT[^>]*> is a regexp
      ; (S) without user interaction
      ; (W) whole document (start from top)
      ^!Replace "</*FONT[^>]*>" >> "" IRSW
      ; Don't repeat after last tag is removed
      ^!IfError exit
      ; Could be one more; do it again.
      ^!Goto LOOP

      --- 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ------ 8< ---
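
      For anyone who wants to try the same loop outside NoteTab, here is a
      hedged Python sketch of the idea. It assumes that each ^!Replace call
      in the clip removes one match per pass (which is why the clip loops);
      strip_font_tags and the sample text are hypothetical.

```python
import re

# Assumption: Python's re module handles this pattern like NoteTab's regexp.
FONT_TAG = re.compile(r"</*FONT[^>]*>", re.IGNORECASE)

def strip_font_tags(text):
    """Remove one FONT tag per pass until a pass finds nothing to replace."""
    passes = 0
    while True:
        text, replaced = FONT_TAG.subn("", text, count=1)
        if replaced == 0:   # plays the role of ^!IfError exit
            return text, passes
        passes += 1         # plays the role of ^!Goto LOOP

cleaned, passes = strip_font_tags('<FONT face="Arial">Hello</FONT> world')
```

      With the sample above, cleaned comes out as 'Hello world' after two
      passes. In Python you could drop the loop and call FONT_TAG.sub("",
      text) once; the loop is kept only to mirror the clip.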