Re: Bug report : Spell checking doesn't know about HTML entities
- François Pinard wrote:
> [Bram Moolenaar]
>> Tony Mechelynck wrote:
>>> In languages using accented letters, the Vim spell checker doesn't
>>> recognise HTML entities (in HTML text) [...]
>> You'll have to check if using & and ; in the middle of a word is
>> causing trouble. Adding them to word characters will probably create
>> different problems.
> Character entities come from the old time people were still trying to
> salvage the 8th bit of each byte, on communication channels, to convey
> byte parity. And also, whatever justification people may invent, to
> protect their laziness about using tools able to do more than ASCII.
They also bypass compatibility problems for users who have to upload HTML
pages to servers where they don't control the headers which will be sent with
the HTML. (Yes, now I know about the BOM and the META
HTTP-EQUIV="Content-Type" tag, but the former isn't mentioned and the latter
is only mentioned but not explained, in the books I have about HTML.)
Even now, email channels aren't guaranteed to be able to convey 8-bit text
other than by downgrading it to 7-bit by means of conversion schemes like
quoted-printable or base64: some servers are 8-bit-compliant, others still
aren't. In the email I get, I sometimes notice that the body has been
"autoconverted" between 8-bit, quoted-printable and base64 by my ISP's
routers, with no apparent rule to such behaviour.
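
To make the "downgrading" concrete, here is a minimal Python sketch using the standard quopri and base64 modules. The sample text and the helper is_7bit() are only illustrations of mine, and Latin-1 is an assumed charset; this is not what any particular mail server does internally.

```python
import base64
import quopri

# Sample text with accented letters; Latin-1 is assumed here so that
# each accented letter becomes a single byte with the 8th bit set.
text = "café à la crème"
raw_8bit = text.encode("latin-1")

# The two conversion schemes mentioned above.
qp = quopri.encodestring(raw_8bit)
b64 = base64.b64encode(raw_8bit)


def is_7bit(data: bytes) -> bool:
    """Hypothetical helper: True if no byte has its 8th bit set."""
    return all(byte < 128 for byte in data)


print(is_7bit(raw_8bit), is_7bit(qp), is_7bit(b64))
print(qp)
print(b64)

# Both schemes are reversible, so nothing is lost in transit.
assert quopri.decodestring(qp) == raw_8bit
assert base64.b64decode(b64) == raw_8bit
```

Quoted-printable leaves the ASCII part readable and only escapes the 8-bit bytes, whereas base64 re-encodes everything, which is why the former is usually preferred for mostly-ASCII text.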
> One property of character entities which is apparently not so well known
> (or maybe that property was withdrawn since then) is that the semicolon
> is optional. It is only mandatory where ambiguity would otherwise arise
> (for example, when a letter follows, a fairly common case after all).
That property is not part of the present rules; it is obsolete and deprecated:
"ce n'est pas la règle, c'est une tolérance" (it is not the rule, merely
something tolerated). It is only recognised for
downward compatibility; IIUC, it does not apply to XHTML. The semicolon has of
course always been mandatory when the entity is immediately followed by a
letter or semicolon (or by a digit, but that is rarer).
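
As an illustration of that tolerance, here is a small Python sketch using the standard html module. Note the assumptions: html.unescape() follows the HTML 5 rules, which may differ in detail from the rules discussed above, and decode_entities() is just a name I made up for what a spell checker's preprocessing step could look like; it is not anything Vim does.

```python
import html
from html.entities import html5


def decode_entities(text: str) -> str:
    """Hypothetical preprocessing step: turn entities into real characters."""
    return html.unescape(text)


# Normal case: every entity is terminated by a semicolon.
with_semicolon = decode_entities("caf&eacute; cr&egrave;me")

# Tolerated case: the semicolon is omitted and a space follows.
without_semicolon = decode_entities("caf&eacute au lait")

print(with_semicolon)
print(without_semicolon)

# The html5 table lists some older names both with and without the
# semicolon, while other names exist only in the terminated form.
print("eacute" in html5, "eacute;" in html5)
print("hellip" in html5, "hellip;" in html5)
```

So even a lenient parser only accepts the unterminated form for a limited set of names; everything else is left as literal text.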
> I presume that if software (or people) generating HTML were sparing
> those semicolons wherever they may be spared, a lot of other software
> would break, we would get a riot against people following standards :-).
I suppose that's why the most recent standards require the semicolons.
Everything is worth precisely as much as a belch, the difference being
that a belch is more satisfying.
-- Ingmar Bergman