Non Latin1 charsets (draft-holtman-http-negotiation-00.txt)

  • Nickolay Saukh
    Message 1 of 8, Mar 1, 1996
      4.2 Accept-Charset

      I think the last sentence of the first paragraph should be written as
      "The ISO-8859-1 character set can be assumed to be acceptable
      to all user agents.". Rationale: per the HTTP/1.1 draft
      (section 3.7.1), an entity body without an explicit charset can only be
      US-ASCII or ISO-8859-1. Thus any conforming user agent must
      be able to handle ISO-8859-1.

      4.6 Alternates

      Can the media-type contain a charset? Is this a valid example?

      Alternates: {"TheProject.fr.html" 1.0
      {type "text/html"} {language "fr"}},
      {"TheProject.en.html" 1.0
      {type "text/html"} {language "en"}},
      {"TheProject.ru.html" 1.0
      {type "text/html;charset=iso-8859-5"} {language "ru"}},
      {"/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
      {type "text/html;charset=koi8-r"} {language "ru"}}

      5.1 Reactive negotiation

      If two alternates differ by charset only, how does one
      specify the preferred one?

      Thanks
    • Mirsad Todorovac
      Message 2 of 8, Mar 1, 1996
        >
        > 4.2 Accept-Charset
        >
        > I think last sentence of first paragraph should be written as
        > "The ISO-8859-1 character set can be assumed to be acceptable
        > to all user agents.". Rationale: per HTTP/1.1 draft
        > (section 3.7.1) entity body without explicit charset can be
        > US-ASCII only or ISO-8859-1. Thus any conforming user agent must
        > be able to handle ISO-8859-1.
        >
        > 4.6 Alternates
        >
        > Can media-type contain charset? Is this a valid exmaple?
        >
        > Alternates: {"TheProject.fr.html" 1.0
        > {type "text/html"} {language "fr"}},
        > {"TheProject.en.html" 1.0
        > {type "text/html"} {language "en"}},
        > {"TheProject.ru.html" 1.0
        > {type "text/html;charset=iso-8859-5"} {language "ru"}}
        > ("/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
        > {type "text/html;charset=koi8-r"} language "ru"}}
        >
        > 5.1 Reactive negotiation
        >
        > If two alternates are differ by charset only, how
        > specify preferred one?

        By the quality factor. This has been promoted in the Lynx browser, and
        I haven't been able to see Netscape or NCSA Mosaic support this. -- Mirsad

        --
        | Mirsad Todorovac |
        | Faculty of Electrical Engineering and Computing |
        | University of Zagreb |
        | Unska 3, Zagreb, Croatia 10000 |
        | |
        | e-mail: mirsad.todorovac@... |
      • Nickolay Saukh
        Message 3 of 8, Mar 1, 1996
          > > If two alternates are differ by charset only, how
          > > specify preferred one?
          >
          > By the quality factor. This has been promoted in Lynx browser, and
          > I havent been able to see Netscape and Mosiac NCSA support this. -- Mirsad

          There is NO quality factor for Accept-Charset in current draft.
        • Mirsad Todorovac
          Message 4 of 8, Mar 1, 1996
            >
            > > > If two alternates are differ by charset only, how
            > > > specify preferred one?
            > >
            > > By the quality factor. This has been promoted in Lynx browser, and
            > > I havent been able to see Netscape and Mosiac NCSA support this. -- Mirsad
            >
            > There is NO quality factor for Accept-Charset in current draft.
            >

            Yeah, I remembered the thread. However, after reviewing it I was no
            closer to understanding why. (I've noted who started the thread 'Charsets
            revisited'.)

            It seems to me to be part of the HTTP problem space, because it is decided here
            which document to transfer. To those whose native language/encoding is
            en.us/us-ascii it may seem irrelevant, but to all the others it is a
            matter of high importance. The only workaround here seems to be CGI scripts
            which select the language/charset to send, yet they still need to know what
            the client side wants.

            So, IMVHO there should be a way to specify the preferred language/encoding, with
            quality factors (which fit into the current scheme for the Accept: header), or by
            means of some other method.

            E.g. I want a document in my native language/encoding;
            if there isn't one, I'd be happy with native language/us-ascii,
            and the fallback would be en.us/us-ascii. -- Mirsad



            --
            | Mirsad Todorovac |
            | Faculty of Electrical Engineering and Computing |
            | University of Zagreb |
            | Unska 3, Zagreb, Croatia 10000 |
            | |
            | e-mail: mirsad.todorovac@... |
          • Daniel DuBois
            Message 5 of 8, Mar 1, 1996
              At 02:04 PM 3/1/96 +0100, Mirsad Todorovac wrote:
              > Alternates: {"TheProject.fr.html" 1.0
              > {type "text/html"} {language "fr"}},
              > {"TheProject.en.html" 1.0
              > {type "text/html"} {language "en"}},
              > {"TheProject.ru.html" 1.0
              > {type "text/html;charset=iso-8859-5"} {language "ru"}}
              > ("/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
              > {type "text/html;charset=koi8-r"} language "ru"}}

              Charsets are not to appear in MIME type tags in URI/Alternates headers.
              They have their own slot. The above examples would be:
              > Alternates: {"TheProject.fr.html" 1.0
              > {type "text/html"} {language "fr"}},
              > {"TheProject.en.html" 1.0
              > {type "text/html"} {language "en"}},
              > {"TheProject.ru.html" 1.0
              > {type "text/html"} {charset "iso-8859-5"} {language "ru"}}

              The /cgi-bin/xlate variant would be invalid under the anti-spoofing content
              negotiation clause because the prefixes don't match. Something more valid
              would be:
              > {"TheProject.ru2.html" 1.0
              > {type "text/html"} {charset "koi8-r"} {language "ru"}}

              >> There is NO quality factor for Accept-Charset in current draft.

              >So, IMVHO there should be a way to specify prefered language/encoding, with
              >quality factors (which fit into current scheme for Accept: header), or by
              >means of some other method.
              >Eg. I want document in my native language/encoding,
              >if there isn't one, I'd be happy with native language/us-ascii,

              There is a mechanism for specifying qualities on language, just not on
              charset. So you could ask for "Accept-Language: native-language; ql=1.0,
              en; ql=.7" and "Accept-Charset: native-charset" (which implicitly includes
              iso-8859-1).

              If that is not sufficient, we now have Koen's method for reactive
              negotiation, by which you will be able to pick precisely which variant you
              want. That might be useful if you receive the language you wanted, but
              not the charset you wanted. (You receive TheProject.ru.html, but you realize
              there was also a TheProject.ru2.html, so you ask for it by name.)
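
The membership-only reading of Accept-Charset described in this message can be sketched in Python. This is only an illustration: the function names `acceptable_charsets` and `is_acceptable`, and the choice to add iso-8859-1 implicitly, are assumptions taken from the wording above rather than from any draft text.

```python
def acceptable_charsets(header_value, implicit=("iso-8859-1",)):
    """Return the set of charsets a client accepts (no quality factors).

    `implicit` models the claim that iso-8859-1 is always acceptable;
    that default is an assumption of this sketch.
    """
    names = {item.strip().lower() for item in header_value.split(",") if item.strip()}
    return names | {name.lower() for name in implicit}


def is_acceptable(charset, header_value):
    """True if `charset` is listed in the header or is implicitly accepted."""
    return charset.lower() in acceptable_charsets(header_value)
```

With this reading a client can say which charsets it handles, but has no way to rank them against each other.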
              -----
              Dan DuBois, Software Animal http://www.spyglass.com/~ddubois/
              Download a totally free copy of the Spyglass Web Server today!
              http://www.spyglass.com/products/server_download.html
            • Nickolay Saukh
              Message 6 of 8, Mar 1, 1996
                > Charsets are not to appear in mime type tags in URI/Alternates headers.
                > They ave their own slot.

                Holtman paper says (4.6 Alternates):

                The type, language, encoding, and length attributes of an
                alternate description refer to their Content-* header
                counterparts.

                Content-Type carries the charset for text/html entities (for iso-8859-1
                it is implicit). Where is the separate slot for charset? Is it an extension
                postponed till HTTP/1.2?

                > If that is not sufficient, we now that we have Koen's method for reactive
                > negotiation, by which you will be able to precisely pick which varaint you
                > want, which might be useful if you recieve the the language you wanted, but
                > not the charset you wanted. (You recieve TheProject.ru.html, but you realize
                > there was also a TheProject.ru2.html, so you ask for it by name.)

                An example

                Accept-Language: ru, *;q=0
                Accept-Charset: iso-8859-5, koi8-r, unicode-1-1-utf8

                The server has alternates in all of these charsets. By the current papers all my
                alternates have the same quality factor. With which charset would I
                receive the document, if any, with preemptive negotiation? In what order
                will the alternates be presented to the user agent for reactive negotiation? Per
                the drafts the order is significant, because the first alternate is the
                best one. Why not have a quality factor for charset? Like this:

                Accept-Language: ru, *;q=0
                Accept-Charset: koi8-r, iso-8859-5;q=0.8, *;q=0
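
To make the proposal concrete, here is a minimal Python sketch of how a server might read an Accept-Charset header if it did carry quality factors. The draft under discussion defines no such factors, so the grammar assumed here (comma-separated entries, an optional `;q=` parameter, `*` as a wildcard, a default of 1.0) and the helper names `parse_q_list` and `charset_quality` are hypothetical.

```python
def parse_q_list(header_value):
    """Parse 'name;q=0.8, other' into (name, q) pairs in header order."""
    result = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # assumed default when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value.strip())
        result.append((name.strip().lower(), q))
    return result


def charset_quality(charset, header_value):
    """Return the q value for a charset, falling back to '*' and then 0.0."""
    prefs = dict(parse_q_list(header_value))
    charset = charset.lower()
    if charset in prefs:
        return prefs[charset]
    return prefs.get("*", 0.0)
```

For the header `koi8-r, iso-8859-5;q=0.8, *;q=0`, `charset_quality("iso-8859-5", ...)` gives 0.8 and any unlisted charset gives 0.0, which is the ranking the example above is asking for.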
              • Koen Holtman
                Message 7 of 8, Mar 2, 1996
                  Nickolay Saukh:
                  >
                  >4.2 Accept-Charset
                  >
                  >I think last sentence of first paragraph should be written as
                  >"The ISO-8859-1 character set can be assumed to be acceptable
                  >to all user agents.".

                  There was a long discussion about ISO-8859-1 versus US-ASCII recently,
                  and I must admit that I did not read all messages in that discussion.
                  My impression at the end was that most people wanted US-ASCII to stay
                  as the character set which can be assumed to be acceptable to all user
                  agents.

                  > Rationale: per HTTP/1.1 draft
                  >(section 3.7.1) entity body without explicit charset can be
                  >US-ASCII only or ISO-8859-1.

                  Yes.

                  >Thus any conforming user agent must
                  >be able to handle ISO-8859-1.

                  No, that is not a correct inference. It would make sense for every
                  user agent to be able to handle all entity bodies without an explicit
                  charset, but Section 3.7.1 does not require it.

                  >4.6 Alternates
                  >
                  >Can media-type contain charset? Is this a valid exmaple?
                  >
                  >Alternates: {"TheProject.fr.html" 1.0
                  > {type "text/html"} {language "fr"}},
                  > {"TheProject.en.html" 1.0
                  > {type "text/html"} {language "en"}},
                  > {"TheProject.ru.html" 1.0
                  > {type "text/html;charset=iso-8859-5"} {language "ru"}}
                  > ("/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
                  > {type "text/html;charset=koi8-r"} language "ru"}}

                  Yes. Contrary to what Daniel DuBois said in this thread,

                  {type "text/html;charset=iso-8859-5"}

                  is indeed the way to denote the charset.

                  This mirrors use of the Content-Type header, which specifies the MIME
                  type and optionally the charset. Note that we do not have a
                  Content-Charset header, but that we _do_ have an Accept-Charset
                  header. I believe that this asymmetry was caused by early versions of
                  HTTP trying to inherit as much semantics as possible from the MIME
                  specifications. As far as I know, it is too late to fix it now.
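
As an illustration of the point that the charset travels inside the type value, the following Python sketch splits such a value into the media type and its parameters. The helper name `split_media_type` is made up for this example, and the parsing is deliberately simplified (it does not handle quoted strings containing semicolons).

```python
def split_media_type(value):
    """Split 'text/html;charset=koi8-r' into ('text/html', {'charset': 'koi8-r'})."""
    media_type, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        key, sep, val = param.partition("=")
        if sep:  # skip anything that is not a key=value pair
            parsed[key.strip().lower()] = val.strip().strip('"').lower()
    return media_type.lower(), parsed
```

A value with no charset parameter simply yields an empty parameter dictionary, leaving the caller to apply whatever default it assumes.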

                  Also, contrary to what Daniel DuBois said,

                  > ("/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
                  > {type "text/html;charset=koi8-r"} language "ru"}}

                  is a legal alternate description. What the anti-spoofing clause (the
                  origin server restriction) in Section 5.2 of draft-holtman says is
                  that origin servers may not return this alternate in a preemptive
                  negotiation response. This means that, if this alternate is the best
                  one, the origin server should send a reactive negotiation response,
                  which causes the client to retrieve the best alternate with a direct
                  request on /cgi-bin/xlate?koi8-r+TheProject.ru.html.

                  >5.1 Reactive negotiation
                  >
                  >If two alternates are differ by charset only, how
                  >specify preferred one?

                  The service author can specify the preferred one using the source
                  quality factors in the Alternates header:

                  {"notpreferred.html" 0.9 {type "text/html;charset=iso-8859-5"}}
                  {"preferred.html" 1.0 {type "text/html;charset=koi8-r"}}

                  or by the order in which the alternates are listed:

                  {"preferred.html" 1.0 {type "text/html;charset=koi8-r"}}
                  {"notpreferred.html" 1.0 {type "text/html;charset=iso-8859-5"}}

                  So it is up to the service author to decide for you which charset of
                  the ones you accept would give you the best results. The decision
                  made is reflected in the Alternates header.
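
The two ways of expressing the author's preference can be sketched in Python as follows. This is a simplified model and not the draft's selection algorithm: each alternate is reduced to a hypothetical `(uri, source_quality, charset)` tuple, and the function name `best_alternate` is invented for the example.

```python
def best_alternate(alternates, accepted_charsets):
    """Pick the acceptable alternate with the highest source quality.

    alternates: list of (uri, source_quality, charset) tuples in header order.
    Ties are resolved in favour of the alternate listed first.
    """
    accepted = {name.lower() for name in accepted_charsets}
    candidates = [alt for alt in alternates if alt[2].lower() in accepted]
    if not candidates:
        return None
    # max() returns the first maximal element, so header order breaks ties.
    return max(candidates, key=lambda alt: alt[1])[0]


by_quality = [
    ("notpreferred.html", 0.9, "iso-8859-5"),
    ("preferred.html", 1.0, "koi8-r"),
]
by_order = [
    ("preferred.html", 1.0, "koi8-r"),
    ("notpreferred.html", 1.0, "iso-8859-5"),
]
print(best_alternate(by_quality, ["iso-8859-5", "koi8-r"]))
print(best_alternate(by_order, ["iso-8859-5", "koi8-r"]))
```

In both lists the client's set of accepted charsets only filters the candidates; the ranking among them comes entirely from the server side.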

                  You, as the user of a user agent, cannot express a preference for one
                  charset over another; you can only say which ones you can handle.
                  There are no quality factors in the Accept-Charset header.

                  This means that the HTTP/1.1 draft spec assumes that if a user agent
                  puts a charset in its Accept-Charset header, it can handle this
                  charset perfectly, not just through some lossy on-the-fly filter. If
                  anything lossy happens, it must be done at the server side, and be
                  reflected in the Alternates header.

                  I don't know if this assumption of being able to handle perfectly all
                  charsets included in the Accept-Charset header is correct for all
                  current browsers. If it is not, we would have to decide if a) the
                  current browsers need to be improved, or b) the draft spec needs to be
                  extended. I would go for a), though I realize that this puts browsers
                  that don't use a bitmapped screen, like Lynx, in a difficult position.

                  Koen.
                • Albert Lunde
                  Message 8 of 8 , Mar 2, 1996
                    >
                    > Nickolay Saukh:
                    > >
                    > >4.2 Accept-Charset
                    > >
                    > >I think last sentence of first paragraph should be written as
                    > >"The ISO-8859-1 character set can be assumed to be acceptable
                    > >to all user agents.".
                    >
                    > There was a long discussion about ISO-8859-1 versus US-ASCII recently,
                    > and I must admit that I did not read all messages in that discussion.
                    > My impression at the end was that most people wanted US-ASCII to stay
                    > as the character set which can be assumed to be acceptable to all user
                    > agents.

                    I wouldn't say I _want_ this.

                    One could argue that US-ASCII is the default character set in MIME
                    mail, but historically, prior versions of the HTTP/HTML spec have
                    specified ISO-8859-1 as the default character set for the Web.

                    What I recall someone suggesting recently was that because of the
                    (IMHO broken) state of current practice, one couldn't safely
                    state that user agents actually were defaulting to ISO-8859-1.

                    Independent of this line of argument, the discussion of internationalization
                    that went into the HTML 2.0 RFC established the requirement that
                    all HTML interpreters should use an SGML document character set
                    that included at least the characters of ISO-8859-1. This
                    doesn't say anything much about the encoding, but it tends
                    to imply that support for ISO-8859-1 is a practical requirement
                    to comply with the HTML 2.0 spec.