
matchit patterns in HTML plugin (was: how to erase/substitute HTML tags)

  • Benji Fisher
    Message 1 of 7, May 11, 2004
      On Sun, May 09, 2004 at 10:46:12PM +0300, Tuomo Latto wrote:
      > On Fri, 7 May 2004 09:20:31 -0400, Benji Fisher <benji@...> wrote:
      > >>I doubt I can add much to that but it might be a good idea to use
      > >>the i flag to make the substitution case insensitive. Remember, both
      > >>cases are valid. (http://www.w3.org/TR/html4/cover.html)
      > >>Also to remove false positives (e.g. BR when deleting all B's)
      > >>it is probably a good idea to also match the whitespace after the
      > >>tag itself.
      > >>
      > >>With these we get, for example:
      > >>:%s/<\/\?font\(\s[^>]*\)*>//gi
      > >
      > > Have a look at the patterns for matchit.vim in
      > >$VIMRUNTIME/ftplugin/html.vim . I think the pattern there for matching
      > >tags is very robust. If it can be improved, let us know!
      > I'm not much of a Vim wizard and really only use Vim as a relatively
      > simple text editor with indenting, syntax highlighting and regexp
      > stuff. Also, I probably won't need the fixes for the problems.
      > However, at least in Vim 6.2 the pattern would seem to suffer from
      > some of the problems I resolved with the corrections above
      > (note that there's a problem with line endings in the middle).

      Which problems? In the matchit patterns, case is handled by a
      separate variable. By "line endings in the middle" do you mean
      <tag open="here"
      close="next line">
      constructions? That is dealt with by the '>\|$' bit. For convenience,
      here is the pattern from ftplugin/html.vim:

      let b:match_words = '<:>,' .
      \ '<\@<=[ou]l[^>]*\%(>\|$\):<\@<=li>:<\@<=/[ou]l>,' .
      \ '<\@<=\([^/][^ \t>]*\)[^>]*\%(>\|$\):<\@<=/\1>'
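
      To see how the '>\|$' alternative behaves, the opening-tag half of the
      generic pattern can be tried against single lines with matchstr(). This
      is only a sketch: the variable name and the sample strings are made up
      for illustration, and matchit itself applies the pattern to buffer
      text rather than to strings.

```vim
" Opening-tag half of the generic pattern from ftplugin/html.vim.
" (open_pat is a hypothetical name used only for this experiment.)
let open_pat = '<\@<=\([^/][^ \t>]*\)[^>]*\%(>\|$\)'

" Tag closed on the same line: the match ends at the '>'.
echo matchstr('<tag open="here">', open_pat)

" Tag continued on the next line: the match ends at end of line ('$').
echo matchstr('<tag open="here"', open_pat)

" Closing tag: [^/] refuses the slash, so nothing matches.
echo matchstr('</tag>', open_pat)
```

      The first two calls should return a non-empty match and the third an
      empty string, which is why an unfinished first line still counts as an
      opening tag.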

      > I did find these problems:
      > 1. <LI> tags can not have any attributes (including inline style)
      > or they won't get recognized

      That is a good point. (And scrolling up one paragraph from the
      link you gave below shows that it is legal.) Perhaps we should replace
      'li>' (matches "li" followed by literal ">") with 'li\>' (matches "li"
      followed by word boundary).
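
      The difference between the two spellings can be checked directly with
      the =~? operator (case-insensitive match). The sample tags below are
      invented for the test, and the result of '\>' depends on 'iskeyword',
      so this assumes a setting where letters are keyword characters and
      spaces are not.

```vim
" old_li is the pattern currently in the plugin, new_li the proposed one.
" Both variable names are hypothetical.
let old_li = '<\@<=li>'
let new_li = '<\@<=li\>'

echo '<li>' =~? old_li                      " 1: bare tag matches
echo '<li style="color: red">' =~? old_li   " 0: attributes break the match
echo '<li style="color: red">' =~? new_li   " 1: word boundary allows attributes
echo '<link rel="stylesheet">' =~? new_li   " 0: "li" inside "link" is rejected
```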

      > 2. Definition lists are not handled as lists
      > (http://www.w3.org/TR/html4/struct/lists.html#h-10.3)

      We could add a line, modeled on the one for <ol> and <ul>:

      \ '<\@<=dl\>[^>]*\%(>\|$\):<\@<=d[td]\>:<\@<=/dl>,' .

      > 3. There are no elements beginning with UL or OL
      > other than the list definitions (at least in HTML 4.01)
      > so I'm not sure if there's much harm in this...
      > Nevertheless, e.g. mistyping a structure like <OLE><LI></LI>...</OL>
      > finds <OLE> as a start tag for the OL element.
      > This of course makes inline styles and attributes work for
      > the start tag.

      Ouch! Yes, we definitely need to replace '[ou]l[^>]*' with
      '[ou]l\>[^>]*' (enforce a word boundary after "ol" or "ul").
      I already made that change in the line for definition lists.
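
      A quick check of the <OLE> case, using the opening-tag part of the
      list pattern with and without the word boundary. The variable names
      and sample tags are assumptions made for this sketch.

```vim
" loose is the opening pattern as shipped, strict adds the '\>'.
let loose  = '<\@<=[ou]l[^>]*\%(>\|$\)'
let strict = '<\@<=[ou]l\>[^>]*\%(>\|$\)'

echo '<OLE>' =~? loose             " 1: the typo is taken for a list start
echo '<OLE>' =~? strict            " 0: rejected, "E" follows "OL" directly
echo '<ol class="nav">' =~? strict " 1: attributes still work
```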

      > I don't think I know enough about matchit.vim to fix these
      > but changing them to something like the above might do the trick.
      > Oh btw, can the XML plugin handle these cases?

      So far, I only see a problem with the special patterns for handling
      lists, and these are not used in ftplugin/xml.vim . The ftplugin for
      xhtml reads in the one for html, so we only have to fix it in one place.
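
      Putting the three changes from this thread together, the setting
      might look like the sketch below. Where to put it is an assumption: a
      personal override such as ~/.vim/after/ftplugin/html.vim should work
      until the distributed ftplugin is updated. The loaded_matchit guard
      and b:match_ignorecase follow what the stock ftplugins do as far as I
      know, but check them against your copy of html.vim.

```vim
" Sketch only: b:match_words with 'li\>', '[ou]l\>' and the new <dl> line.
" Line continuation needs 'cpo' without the C flag, so save and reset it.
let s:save_cpo = &cpo
set cpo&vim

if exists("loaded_matchit")
  " Case is handled here, not inside the patterns.
  let b:match_ignorecase = 1
  let b:match_words = '<:>,' .
    \ '<\@<=[ou]l\>[^>]*\%(>\|$\):<\@<=li\>:<\@<=/[ou]l>,' .
    \ '<\@<=dl\>[^>]*\%(>\|$\):<\@<=d[td]\>:<\@<=/dl>,' .
    \ '<\@<=\([^/][^ \t>]*\)[^>]*\%(>\|$\):<\@<=/\1>'
endif

let &cpo = s:save_cpo
unlet s:save_cpo
```

      With this in place, % on a <dl> tag should cycle through its <dt> and
      <dd> tags to </dl>, in the same way it already does for <ol> and <ul>.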

      Thanks for the feedback!

      HTH --Benji Fisher