Is xml:lang valid in an RSS Feed? Extended Attributes?

  • bobwyman
    Message 1 of 4, Jan 20, 2004
      The RSS V2.0 specification supports only a single <language>
      element in the channel and provides no means to flag language use at
      the item level. This causes difficulties for sites such as ours at
      http://weblogs.pubsub.com/, where we construct custom, synthetic
      feeds in response to user "subscriptions" that filter the 100K or so
      blogs that we read daily. The problem is that if users say they are
      interested in items of "any language", we will create a
      mixed-language feed. We could use the ISO-defined language code
      "mul" (multiple) to fill the <language> tag; however, RSS V2.0
      doesn't provide a mechanism to identify the language used in
      specific items of a multi-language RSS file.

      XML V1.0 defines an xml:lang attribute that can be used to declare
      language usage at just about any level of granularity within an XML
      document. Given this, one might wonder why RSS bothers to define its
      own <language> tag, but we'll avoid that discussion.

      The RSS V2.0 specification speaks of extensibility in terms of
      adding support for new elements. However, the spec is silent on the
      subject of attribute usage. Given that the spec provides no guidance
      on this issue, let me ask: Is a feed that uses the xml:lang attribute
      on its elements considered to be "RSS V2.0 compliant"?

      bob wyman
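
A minimal sketch of the kind of feed the question describes, assuming Python's standard xml.etree.ElementTree module. The channel title, link, description, and item titles are made-up placeholders, and the channel-level "mul" value is the option floated above rather than something the RSS spec endorses. It only shows that xml:lang can be attached to each <item> and that the result is still well-formed XML.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical matches for an "any language" subscription: (title, language tag).
ITEMS = [("Hello world", "en"), ("Bonjour le monde", "fr"), ("Hallo Welt", "de")]


def build_feed(items):
    """Build a mixed-language RSS 2.0 document with xml:lang on every item."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Mixed-language results"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Synthetic feed"
    # Channel-level marker for "multiple languages" (an assumption, see above).
    ET.SubElement(channel, "language").text = "mul"
    for title, lang in items:
        # The per-item language is carried by the standard xml:lang attribute.
        item = ET.SubElement(channel, "item", {XML_LANG: lang})
        ET.SubElement(item, "title").text = title
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(ITEMS)
print(feed_xml)
```

Whether an aggregator does anything with the attribute is a separate matter; the sketch says nothing about RSS V2.0 compliance.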
    • Mark Fletcher
      Message 2 of 4, Jan 21, 2004
        bobwyman wrote:

        > The RSS V2.0 specification speaks of extensibility in terms of
        > adding support for new elements. However, the spec is silent on the
        > subject of attribute usage. Given that the spec provides no guidance
        > on this issue, let me ask: Is a feed which uses xml:lang attribute in
        > its elements considered to be "RSS V2.0 compliant?"

        I can't answer that question, but I can pose and answer two related
        questions. Do any aggregators support xml:lang? Not that I'm aware of.
        But the more important question is: are the majority of feeds accurately
        labeled in terms of language? In our experience, the answer is
        unfortunately a resounding no.

        With our Bloglines search feeds, we display 'translate' links next to
        matches that are from feeds in a different language than what the user
        has specified as their language. Unfortunately, in our experience,
        because most feeds aren't labeled with the correct language (it's either
        missing, or specified as English even when that's incorrect), we don't
        display a translate link for most matches.


        Mark
        --
        Mark Fletcher
        Bloglines
        http://www.bloglines.com
      • houghtoa
        Message 3 of 4, Jan 21, 2004
          --- In RSS2-Support@yahoogroups.com, "bobwyman" <bob@w...> wrote:
          > The RSS V2.0 specification only supports a single <language>
          > element in the channel and provides no means to flag language use
          > at the item level. This causes difficulties for sites such as ours
          > at http://weblogs.pubsub.com/ where we provide a service of
          > constructing custom, synthetic feeds in response to user
          > "subscriptions" that are used to filter the 100K or so blogs that
          > we read daily. The problem is that if users say that they are
          > interested in items of "any language", we will create a
          > mixed-language feed. We could use the ISO defined language code of
          > "mul" (multiple) to fill the <language> tag, however, RSS V2.0
          > doesn't provide a mechanism to identify the language being used in
          > specific items of a multi-language rss file.
          > XML V1.0 defines a xml:lang attribute that can be used to define
          > language usage at just about any level of granularity within an XML
          > document. Given this, one might wonder why RSS bothers to define
          > it's own <language> tag -- but we'll avoid that discussion.
          > The RSS V2.0 specification speaks of extensibility in terms of
          > adding support for new elements. However, the spec is silent on the
          > subject of attribute usage. Given that the spec provides no
          > guidance on this issue, let me ask: Is a feed which uses xml:lang
          > attribute in its elements considered to be "RSS V2.0 compliant?"

          I have run into this problem as well. The language element,
          according to the RSS 2.0 specification, is optional, so you can omit
          it in the case of a multilanguage feed. I thought that somewhere the
          RSS 2.0 specification said it could be extended, but that the
          extensions must belong to a namespace. Since xml:lang is an
          attribute available to *any* XML grammar, i.e. it is defined in the
          xml: namespace, you can use it on the appropriate RSS 2.0 elements.

          Looking at the language element in the RSS 2.0 specification, it
          appears that you cannot use the ISO 639-2/B or ISO 639-2/T codes for
          the language element. The RSS 2.0 specification says to use the
          Netscape codes or the W3C codes, and the W3C codes are based on
          RFC1766, which is based upon ISO 639-1. So my interpretation is that
          you cannot use "mul" (multiple) or "und" (undetermined) from
          ISO 639-2.

          But this raises the question that I think you are alluding to: what
          will RSS aggregators do with a missing language element or with the
          use of xml:lang? Most aggregators will use an XML parser to access
          the feed, and the XML parser *should* automatically understand
          xml:lang, since the xml: prefix needs no declaration. I don't think
          using xml:lang will be a problem, and in my experience with
          producing multiple-language feeds, it hasn't been so far.

          Andy.
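
As a rough illustration of the parser point above, here is a sketch using Python's standard xml.etree.ElementTree. The sample feed and the helper name item_languages are invented for the example. The rule that xml:lang takes precedence over the channel's <language> element is an assumption of this sketch, not something the RSS 2.0 specification states.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical feed: note that the xml: prefix is never declared.
FEED = """<rss version="2.0">
  <channel>
    <title>Mixed</title>
    <language>en</language>
    <item xml:lang="fr"><title>Bonjour</title></item>
    <item><title>Hello</title></item>
  </channel>
</rss>"""


def item_languages(feed_xml):
    """Return (title, language) for each item in the feed.

    Assumed precedence: xml:lang on the item, then xml:lang inherited from
    <channel> or <rss>, then the channel's <language> element, else None.
    An empty xml:lang value is treated the same as a missing one.
    """
    root = ET.fromstring(feed_xml)
    results = []
    for channel in root.iter("channel"):
        inherited = channel.get(XML_LANG) or root.get(XML_LANG)
        fallback = channel.findtext("language")
        for item in channel.findall("item"):
            lang = item.get(XML_LANG) or inherited or fallback
            results.append((item.findtext("title"), lang))
    return results


languages = item_languages(FEED)
print(languages)
```

For the sample feed, the first item resolves to "fr" from its own attribute and the second falls back to the channel's "en". The parser accepts the undeclared xml: prefix, but acting on the value is still up to the aggregator.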
        • Bob Wyman
          Message 4 of 4, Jan 21, 2004
            houghtoa wrote:
            > The RSS 2.0 specification says to use the Netscape
            > codes or W3C codes and the W3C codes are based on RFC1766 which is
            > based upon ISO 639-1. So my interpretation is that you cannot
            > use "mul" (multiple) or "und" (undetermined) from ISO 639-2.
            But the W3C documents point to RFC1766 as well as to what was
            then an Internet Draft and is now RFC3066. The reference to RFC3066 is
            indirect. Nonetheless, I think it is clear that it is intended. To
            see this, follow the footnote in the W3C document [1]. You'll find:

            [RFC1766]
            "Tags for the Identification of Languages", H. Alvestrand, March
            1995.
            RFC1766 is expected to be updated by
            http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tags-v2-00.txt,
            currently a work in progress.

            Given this, I think it is fair to say that RSS V2.0 delegates
            the definition of language tags to W3C, and W3C delegated that task
            to IETF, which incorporates the tags from both ISO 639-1 and
            ISO 639-2 while adding some additional tags and establishing a
            procedure to register more. Given this, it really does seem to me
            that "mul" and "und" are legal language values for RSS V2.0 as
            written. Is this reasonable given the additional explanation above?

            bob wyman

            [1] http://www.w3.org/TR/REC-html40/references.html#ref-RFC1766
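
To make the RFC3066 reading concrete, here is a small sketch in Python. It checks only the tag syntax from RFC3066 (a primary subtag of 1 to 8 letters, then optional hyphen-separated subtags of 1 to 8 letters or digits) and classifies the primary subtag by length, following the RFC's rule that 2-letter subtags come from ISO 639-1 and 3-letter subtags from ISO 639-2. The function name classify and the label strings are invented here, and the sketch does not verify that a code is actually assigned in either ISO list.

```python
import re

# RFC3066 syntax: 1-8 letters, then optional "-" plus 1-8 letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def classify(tag):
    """Say how RFC3066 would interpret the primary subtag (syntax only)."""
    if not RFC3066_TAG.fullmatch(tag):
        return "invalid"
    primary = tag.split("-")[0].lower()
    if primary == "i":
        return "IANA-registered"
    if primary == "x":
        return "private use"
    if len(primary) == 2:
        return "ISO 639-1"
    if len(primary) == 3:
        return "ISO 639-2"
    # Other lengths are well-formed but have no assigned meaning in RFC3066.
    return "unassigned"


for tag in ("en", "en-US", "mul", "und", "x-klingon", "en_US"):
    print(tag, "->", classify(tag))
```

Under this reading, "mul" and "und" are well-formed tags that RFC3066 interprets as ISO 639-2 codes. Whether RSS V2.0 really inherits RFC3066 through the W3C reference is the open question in this thread, and the sketch does not settle it.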