
Re: Bug report : Spell checking doesn't know about HTML entities

  • A.J.Mechelynck
    Message 1 of 5, Mar 23, 2007
      François Pinard wrote:
      > [Bram Moolenar]
      >> Tony Mechelynck wrote:
      >>> In languages using accented letters, the Vim spell checker doesn't
      >>> recognise HTML entities (in HTML text) [...]
      >> You'll have to check if using & and ; in the middle of a word is
      >> causing trouble. Adding them to word characters will probably create
      >> different problems.
      > Character entities come from the old days when people were still trying to
      > salvage the 8th bit of each byte, on communication channels, to convey
      > byte parity. And also, whatever justification people may invent, to
      > protect their laziness about using tools able to do more than ASCII.

      They also bypass compatibility problems for users who have to upload HTML
      pages to servers where they don't control the headers that will be sent with
      the HTML. (Yes, now I know about the BOM and the META
      HTTP-EQUIV="Content-Type" tag, but the books I have about HTML don't mention
      the former at all, and only mention the latter without explaining it.)
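
      A minimal Python sketch of the two workarounds mentioned above, assuming a
      UTF-8 page. The helper names `to_ascii_html` and `build_utf8_page` are
      hypothetical: the first turns every non-ASCII character into a numeric
      character reference, so the file is plain ASCII whatever headers the server
      sends; the second keeps the characters and declares the encoding inside the
      file, with both a BOM (added by Python's `utf-8-sig` codec) and a META
      HTTP-EQUIV tag. Which declaration wins if the server's own Content-Type
      header disagrees is not covered here.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then replace every non-ASCII character
    with a numeric character reference such as &#233;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def build_utf8_page(title: str, body: str) -> bytes:
    """Return a UTF-8 page that announces its encoding twice:
    a byte-order mark (added by the 'utf-8-sig' codec) and a META tag."""
    page = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body>\n"
        f"<p>{html.escape(body)}</p>\n"
        "</body></html>\n"
    )
    return page.encode("utf-8-sig")


if __name__ == "__main__":
    print(to_ascii_html("Déjà vu"))  # D&#233;j&#224; vu
    print(build_utf8_page("Déjà vu", "Déjà vu")[:3])  # the BOM bytes
```

      The bytes from `build_utf8_page` would be written with `open(path, "wb")`;
      the ASCII route trades readability of the source for independence from
      whatever the server declares.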

      Even now, email channels aren't guaranteed to be able to convey 8-bit text
      other than by downgrading it to 7-bit by means of conversion schemes like
      quoted-printable or base64: some servers are 8-bit-compliant, others still
      aren't. In the email I get, I sometimes notice that the body has been
      "autoconverted" between 8-bit, quoted-printable and base64 by my ISP's
      routers, with no apparent rule governing such behaviour.
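
      To illustrate the downgrading described above, here is a small Python
      sketch using the standard `quopri` and `base64` modules (the helpers
      `is_7bit` and `downgrade` are hypothetical names). Both transfer encodings
      turn the 8-bit UTF-8 bytes of an accented word into pure 7-bit ASCII and
      decode back losslessly. It shows only the encodings themselves, not how any
      particular mail server decides to apply them.

```python
import base64
import quopri


def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits (values 0-127)."""
    return all(b < 0x80 for b in data)


def downgrade(text: str) -> dict:
    """Encode text as UTF-8, then apply both 7-bit transfer encodings."""
    raw = text.encode("utf-8")  # 8-bit: 'é' becomes the two bytes C3 A9
    return {
        "8bit": raw,
        "quoted-printable": quopri.encodestring(raw),
        "base64": base64.b64encode(raw),
    }


if __name__ == "__main__":
    forms = downgrade("déjà vu")
    for name, data in forms.items():
        print(name, "7-bit" if is_7bit(data) else "8-bit", data)
    # Both transfer encodings decode back to the original 8-bit bytes.
    assert quopri.decodestring(forms["quoted-printable"]) == forms["8bit"]
    assert base64.b64decode(forms["base64"]) == forms["8bit"]
```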

      > One property of character entities which is apparently not so well known
      > (or maybe that property was withdrawn since then) is that the semicolon
      > is optional. It is only mandatory where ambiguity would otherwise arise
      > (for example, when a letter follows, a fairly common case after all).

      That property is not part of the present rules; it is obsolete and deprecated:
      "ce n'est pas la règle, c'est une tolérance" (it is not the rule, only
      something tolerated). It is only recognised for backward compatibility;
      IIUC, it does not apply to XHTML. The semicolon has of course always been
      mandatory when the entity is immediately followed by a letter or a semicolon
      (or by a digit, but that is rarer).
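
      The semicolon rule can be made concrete with a simplified, hypothetical
      SGML-style decoder in Python. It handles only named entities, assumes names
      made of ASCII letters and digits, and looks them up in the standard
      `html.entities.name2codepoint` table. In lenient mode the name runs as far
      as name characters allow and a following ";" is optional, which is why the
      semicolon becomes necessary when a letter or digit follows; strict mode
      requires it everywhere, as XHTML does. The last lines show why a naive
      spell checker that splits words on "&" and ";" sees fragments, not words.

```python
import re
from html.entities import name2codepoint

# Entity name: a letter followed by letters/digits (simplified; real SGML
# names may also contain '.' and '-'). Numeric references are not handled.
STRICT = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
LENIENT = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def decode_entities(text: str, strict: bool = True) -> str:
    """Replace known named entities by their characters; leave others as-is."""
    pattern = STRICT if strict else LENIENT

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown name: keep the original text

    return pattern.sub(replace, text)


if __name__ == "__main__":
    print(decode_entities("d&eacute;j&agrave; vu"))             # déjà vu
    print(decode_entities("caf&eacute au lait"))                # unchanged in strict mode
    print(decode_entities("caf&eacute au lait", strict=False))  # café au lait
    print(decode_entities("&eacutex", strict=False))            # unknown name 'eacutex', kept
    print(decode_entities("&eacute;;", strict=False))           # é; (first ';' ends the entity)

    # Word splitting before and after decoding, as a naive spell checker might do it.
    raw = "d&eacute;j&agrave; vu"
    print(re.findall(r"\w+", raw))                   # ['d', 'eacute', 'j', 'agrave', 'vu']
    print(re.findall(r"\w+", decode_entities(raw)))  # ['déjà', 'vu']
```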

      > I presume that if software (or people) generating HTML were sparing
      > those semicolons wherever they may be spared, a lot of other software
      > would break, and we would get a riot against people following standards :-).

      I suppose that's why the most recent standards require the semicolons.

      Best regards,
      Everything is worth precisely as much as a belch, the difference being
      that a belch is more satisfying.
      -- Ingmar Bergman