
Re: [rss-public] HTML and Escaped Text (was Re: textInput)

  • Sam Ruby
    Message 1 of 16 , Feb 10, 2006
      rcade wrote:
      > --- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
      >
      >>Poor attempt at humor: didn't you once study journalism?
      >
      > Ouch. :-).

      ;-)

      > Here's another try:
      >
      > 1. Item description is the only element defined in the RSS
      > specification that contains HTML. All HTML markup MUST be escaped with
      > HTML entities or enclosed within a CDATA block.
      >
      > 2. All other element and attribute text is plain text.
      >
      > 3. Aggregators MAY collapse whitespace and line breaks in character data.
      >
      > 4. Elements that contain character data MUST NOT have child elements.

      Perhaps. Or maybe not. I'm not sure.

      Let me try to approach it by peeling back the layers.

      First, let's put on the back burner for the moment the HTML entities and
      CDATA discussion. And let's also accept for the moment points 3 and 4
      (although they only apply to elements in RSS feeds that are not in a
      namespace, or to elements defined in this specification, or however you
      want to word it).

      What we now have is:

      1. Item description is the only element defined in the RSS
      specification that contains HTML.

      2. All other element and attribute text is plain text.

      Now let me get all Strunk and White on you, omit needless words and all
      that. Only elements... all other elements... what's that all about?
      Next pass:

      1. Item description contains HTML.

      2. All other element and attribute text is plain text.

      Now let's apply parallel sentence structure. Either change both to "is"
      or both to "contains". Otherwise people will attempt to read something
      into the difference in wording. Example:

      1. Item description text is HTML.

      2. All other element and attribute text is plain text.

      If we end up there, I am happy. It is clear that item description is
      not, or does not ever contain, plain text. Everything else is, or
      always contains, plain text.

      I can then update the Feed Validator to check to make sure that every
      < in an item description is immediately followed by the name of an
      HTML tag, and together we can educate the Yahoo!s of the world that
      while what they wrote may have been consistent with the definition in
      1999, it just doesn't quite cut it in 2006.
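
To make the proposed check concrete, here is a minimal sketch of how such a test might look. It is not the Feed Validator's actual code: the function name, the regular expression, and the deliberately short KNOWN_TAGS set are all assumptions made for illustration. The input is the description text as the XML parser reports it, that is, after entities and CDATA have already been resolved.

```python
import re

# Hypothetical, deliberately incomplete list of HTML tag names.
# A real check would need the full set of element names it accepts.
KNOWN_TAGS = {
    "a", "b", "blockquote", "br", "code", "div", "em", "h1", "h2", "h3",
    "i", "img", "li", "ol", "p", "pre", "span", "strong", "ul",
}

# Match every "<", optionally followed by "/" and a tag-like name.
LT_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?")


def suspicious_lt_offsets(description):
    """Return offsets of each "<" not immediately followed by a known tag name.

    Note: this simple sketch also flags HTML comments and doctypes,
    because "<!" is not followed by a tag name.
    """
    offsets = []
    for match in LT_PATTERN.finditer(description):
        name = match.group(2)
        if name is None or name.lower() not in KNOWN_TAGS:
            offsets.append(match.start())
    return offsets
```

An empty result means every "<" looks like the start of an HTML tag. A non-empty result suggests the description was written as plain text, for example "1 < 2", and would be misread by aggregators that treat it as HTML.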

      Better yet, the updated RSS 2.0 spec will be in line with the existing
      behavior of all the aggregators that are in widespread use today, as I
      can find *NOBODY* that deals with the possibility that item descriptions
      might be plain text.

      Oh, and as far as the CDATA goes, the easiest thing to do is to define
      element text and attribute text in terms of the infoset, as that means
      that you don't need to worry about anything in Appendix D. [1]
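
A small illustration of why this sidesteps the CDATA question: CDATA section boundaries are among the things the infoset does not report, so an escaped description and a CDATA-wrapped one carry the same character content. The sketch below uses Python's xml.etree.ElementTree as a stand-in for an infoset-level view. The two sample documents and the helper name are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same description, once with entity escaping and once inside CDATA.
ESCAPED = "<description>&lt;b&gt;Hi&lt;/b&gt; &amp; bye</description>"
CDATA = "<description><![CDATA[<b>Hi</b> & bye]]></description>"


def element_text(xml_source):
    """Parse a one-element document and return its character content."""
    return ET.fromstring(xml_source).text


# Both forms yield identical text once parsed; the consumer cannot
# tell which serialization the publisher chose.
assert element_text(ESCAPED) == element_text(CDATA)
```

If the spec defines element text at this level, a rule such as "item description text is HTML" applies to the parsed string, and the choice between escaping and CDATA becomes a serialization detail.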

      - Sam Ruby

      [1] http://www.w3.org/TR/xml-infoset/#omitted