
can charsets be quoted.

  • Melman, Howard
    Message 1 of 7 , Feb 7, 2001
      Is this legal:

      Content-Type: text/html; charset="iso-8859-1"

      Specifically are the double quotes around the charset value
      legal? I assume the intent is that they are, but I believe
      the spec as written doesn't allow for them. I know others
      (like WebDAV) assume you can use double quotes, and I know
      it's legal in MIME (see below)

      In RFC 2616 14.17 Content-Type refers you to 3.7 on media
      types. 3.7 defines media-type as:

      media-type = type "/" subtype *( ";" parameter )

      and refers you to 3.6 to define parameter. 3.6 says:

      Parameters are in the form of attribute/value pairs.

      parameter = attribute "=" value
      attribute = token
      value = token | quoted-string

      so the values can be a token or a quoted-string, great, it
      seems that charset values can be quoted. BUT the last
      paragraph of 3.7.1 says:

      The "charset" parameter is used with some media types to define the
      character set (section 3.4) of the data. When no explicit charset
      parameter is provided by the sender, media subtypes of the "text"
      type are defined to have a default charset value of "ISO-8859-1" when
      received via HTTP. Data in character sets other than "ISO-8859-1" or
      its subsets MUST be labeled with an appropriate charset value. See
      section 3.4.1 for compatibility problems.

      Specifically referring us to section 3.4 for the definition
      of the charset parameter. 3.4 defines charset as:

      HTTP character sets are identified by case-insensitive tokens. The
      complete set of tokens is defined by the IANA Character Set registry
      [19].

      charset = token

      And "token" doesn't allow quotes. Shouldn't this be:

      charset = token | quoted-string

      or else, doesn't the spec disallow quotes around charset
      values? Or should section 3.4 not offer a BNF for charset
      at all in which case it would be clear that it's just
      another parameter and therefore the value is token or
      quoted-string? Or, at least, section 3.4 should say that
      this BNF is semantic and that quotes around token are used
      to delimit the parameter (see below).

      If you're trying to figure out what the spec says for
      charset values, and you turn to section 3.4 since it defines
      charsets, in its current form, you get a very different
      notion of what's allowed than I think is intended.

      Howard


      MIME's view of things, as best as I can find, is RFC 2045 section 5.1:

      Note that the value of a quoted string parameter does not include the
      quotes. That is, the quotation marks in a quoted-string are not a
      part of the value of the parameter, but are merely used to delimit
      that parameter value. In addition, comments are allowed in
      accordance with RFC 822 rules for structured header fields. Thus the
      following two forms

      Content-type: text/plain; charset=us-ascii (Plain text)

      Content-type: text/plain; charset="us-ascii"

      are completely equivalent.
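The equivalence that RFC 2045 describes can be checked with a short sketch. It uses the MIME parser in Python's standard `email` package, not an HTTP implementation, so it only illustrates the MIME reading of the header; the helper name `charset_of` is made up for this example.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, without quotes.

    Hypothetical helper for illustration; it relies on Python's MIME
    parser, which follows the RFC 2045 view that the quotation marks
    only delimit the parameter value.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() removes the delimiting quotes by default (unquote=True).
    return msg.get_param("charset")


unquoted = charset_of("text/plain; charset=us-ascii")
quoted = charset_of('text/plain; charset="us-ascii"')
print(unquoted, quoted, unquoted == quoted)
```

Both forms yield the same parameter value, which is the behavior the quoted MIME text calls "completely equivalent". Whether a given HTTP client behaves the same way is a separate question.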
    • Joris Dobbelsteen
      Message 2 of 7 , Feb 7, 2001
        Interesting problem

        >-----Original Message-----
        >From: Melman, Howard [mailto:Howard@...]
        >Sent: Wednesday, 07 February 2001 17:46
        >To: HTTP Working Group
        >Subject: can charsets be quoted.
        >
        >
        >
        >Is this legal:
        >
        > Content-Type: text/html; charset="iso-8859-1"
        >
        >Specifically are the double quotes around the charset value
        >legal? I assume the intent is that they are, but I believe
        >the spec as written doesn't allow for them. I know others
        >(like WebDAV) assume you can use double quotes, and I know
        >it's legal in MIME (see below)
        >
        >In RFC 2616 14.17 Content-Type refers you to 3.7 on media
        >types. 3.7 defines media-type as:
        >
        > media-type = type "/" subtype *( ";" parameter )
        >
        >and refers you to 3.6 to define parameter. 3.6 says:
        >
        > Parameters are in the form of attribute/value pairs.
        >
        > parameter = attribute "=" value
        > attribute = token
        > value = token | quoted-string
        >

        Up to this point everything seems fine.


        >so the values can be a token or a quoted-string, great, it
        >seems that charset values can be quoted. BUT the last
        >paragraph of 3.7.1 says:
        >
        > The "charset" parameter is used with some media types to define the
        > character set (section 3.4) of the data. When no explicit charset
        > parameter is provided by the sender, media subtypes of the "text"
        > type are defined to have a default charset value of "ISO-8859-1" when
        > received via HTTP. Data in character sets other than "ISO-8859-1" or
        > its subsets MUST be labeled with an appropriate charset value. See
        > section 3.4.1 for compatibility problems.
        >
        >Specifically referring us to section 3.4 for the definition
        >of the charset parameter. 3.4 defines charset as:
        >
        > HTTP character sets are identified by case-insensitive tokens. The
        > complete set of tokens is defined by the IANA Character Set registry
        > [19].
        >
        > charset = token
        >
        >And "token" doesn't allow quotes. Shouldn't this be:
        >
        > charset = token | quoted-string

        Well, the charset rule isn't explicitly tied to the parameter value.
        If it were, the spec would have contained something like:
        value = charset | token | quoted-string

        So I expect that what you are doing is all right.

        >
        >or else, doesn't the spec disallow quotes around charset
        >values? Or should section 3.4 not offer a BNF for charset
        >at all in which case it would be clear that it's just
        >another parameter and therefore the value is token or
        >quoted-string? Or, at least, section 3.4 should say that
        >this BNF is semantic and that quotes around token are used
        >to delimit the parameter (see below).
        >
        >If you're trying to figure out what the spec says for
        >charset values, and you turn to section 3.4 since it defines
        >charsets, in its current form, you get a very different
        >notion of what's allowed than I think is intended.
        >
        >Howard
        >
        >
        >MIME's view of things, as best as I can find, is RFC 2045 section 5.1:
        >
        > Note that the value of a quoted string parameter does not include the
        > quotes. That is, the quotation marks in a quoted-string are not a
        > part of the value of the parameter, but are merely used to delimit
        > that parameter value. In addition, comments are allowed in
        > accordance with RFC 822 rules for structured header fields. Thus the
        > following two forms
        >
        > Content-type: text/plain; charset=us-ascii (Plain text)
        >
        > Content-type: text/plain; charset="us-ascii"
        >
        > are completely equivalent.
        >

        HTTP takes much of its design from MIME, so you can probably use the
        quoted-string form and remain compliant with the spec.

        However, I don't know whether client implementations support it. I
        expect they will, though I'm not sure and have no way to test this.
        That is the real issue with things like this.

        The only server I checked (HEAD http://www.freebsd.com/ HTTP/1.1)
        returned the value of the "charset" parameter without quotes.
        I would recommend simply not using them, just in case...



        - Joris
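The kind of spot check described above (send a HEAD request, then look at how the server writes the charset parameter) could be sketched as follows. The function names `charset_form` and `fetch_content_type` are hypothetical, and the regular expression is a deliberately lenient assumption that tolerates whitespace around "=", not a full RFC 2616 parser.

```python
import http.client
import re

# Lenient match for a charset parameter; group 1 is set only when the
# value starts with a double quote. This is an illustrative assumption,
# not the grammar from the spec.
_CHARSET_RE = re.compile(r';\s*charset\s*=\s*(")?', re.IGNORECASE)


def charset_form(content_type):
    """Classify a Content-Type value as 'quoted', 'token', or 'absent'."""
    if content_type is None:
        return "absent"
    match = _CHARSET_RE.search(content_type)
    if match is None:
        return "absent"
    return "quoted" if match.group(1) else "token"


def fetch_content_type(host, path="/"):
    """Send a HEAD request and return the raw Content-Type header, if any."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Content-Type")
    finally:
        conn.close()
```

Calling `charset_form(fetch_content_type(host))` for a handful of hosts would show which form servers actually send. It says nothing about what clients accept, which is the open question raised above.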
      • Melman, Howard
        Message 3 of 7 , Feb 8, 2001
          On Wednesday Feb 7, 2001, Paul Leach wrote:

          > I believe that the word token is being used two different ways.
          > This way:
          > value = token | quoted-string
          > and in this sentence:
          > HTTP character sets are identified by case-insensitive tokens.
          >
          > The first one is a formal ABNF definition, the second is not. I.e.,
          > there was no intent to say that char-set IDs have to be "tokens" as that
          > is specified in the HTTP ABNF.
          >
          > I would say that it is perfectly legal for them to be quoted.

          I agree. But I think the spec could be clearer in two ways.
          I think the ABNF should be removed from 3.4 (since it's not
          intended) and I think the word "token" should be replaced
          with the word "name" in all cases in section 3.4. The IANA
          does not refer to charset tokens, but rather charset names.
          See RFC 1700 or
          http://www.isi.edu/in-notes/iana/assignments/character-sets

          Howard
        • Larry Masinter
          Message 4 of 7 , Feb 9, 2001
            Hopefully this will get on the "errata" page...

            > -----Original Message-----
            > From: Melman, Howard [mailto:Howard@...]
            > Sent: Thursday, February 08, 2001 10:49 AM
            > To: HTTP WG
            > Cc: Joris Dobbelsteen; Paul Leach; Melman, Howard
            > Subject: RE: can charsets be quoted.
            >
            >
            >
            > On Wednesday Feb 7, 2001, Paul Leach wrote:
            >
            > > I believe that the word token is being used two different ways.
            > > This way:
            > > value = token | quoted-string
            > > and in this sentence:
            > > HTTP character sets are identified by case-insensitive tokens.
            > >
            > > The first one is a formal ABNF definition, the second is not. I.e.,
            > > there was no intent to say that char-set IDs have to be "tokens" as that
            > > is specified in the HTTP ABNF.
            > >
            > > I would say that it is perfectly legal for them to be quoted.
            >
            > I agree. But I think the spec could be clearer in two ways.
            > I think the ABNF should be removed from 3.4 (since it's not
            > intended) and I think the word "token" should be replaced
            > with the word "name" in all cases in section 3.4. The IANA
            > does not refer to charset tokens, but rather charset names.
            > See RFC 1700 or
            > http://www.isi.edu/in-notes/iana/assignments/character-sets
            >
            > Howard
            >
          • Roy T. Fielding
            Message 5 of 7 , Feb 9, 2001
              > Hopefully this will get on the "errata" page...

              Why? The spec is correct. It takes a great deal of imagination
              to believe that the use of the word token in the text should somehow imply
              that the HTTP syntax excludes a quoted-string. Any token can appear inside
              a quoted string.

              ....Roy
            • Melman, Howard
              Message 6 of 7 , Feb 9, 2001
                On Friday Feb 9, 2001, Roy T. Fielding wrote:

                > > Hopefully this will get on the "errata" page...
                >
                > Why? The spec is correct. It takes a great deal of imagination
                > to believe that the use of the word token in the text should somehow imply
                > that the HTTP syntax excludes a quoted-string. Any token can appear inside
                > a quoted string.

                Perhaps, but it's not just the word "token" in text. There
                seems to be an ABNF rule in the section which as near as I
                can tell adds no value to the description and does add
                confusion. Below is the text of section 3.4:

                Howard

                ================================================================
                3.4 Character Sets

                HTTP uses the same definition of the term "character set" as that
                described for MIME:

                The term "character set" is used in this document to refer to a
                method used with one or more tables to convert a sequence of octets
                into a sequence of characters. Note that unconditional conversion in
                the other direction is not required, in that not all characters may
                be available in a given character set and a character set may provide
                more than one sequence of octets to represent a particular character.
                This definition is intended to allow various kinds of character
                encoding, from simple single-table mappings such as US-ASCII to
                complex table switching methods such as those that use ISO-2022's
                techniques. However, the definition associated with a MIME character
                set name MUST fully specify the mapping to be performed from octets
                to characters. In particular, use of external profiling information
                to determine the exact mapping is not permitted.

                Note: This use of the term "character set" is more commonly
                referred to as a "character encoding." However, since HTTP and
                MIME share the same registry, it is important that the terminology
                also be shared.

                HTTP character sets are identified by case-insensitive tokens. The
                complete set of tokens is defined by the IANA Character Set registry
                [19].

                charset = token

                Although HTTP allows an arbitrary token to be used as a charset
                value, any token that has a predefined value within the IANA
                Character Set registry [19] MUST represent the character set defined
                by that registry. Applications SHOULD limit their use of character
                sets to those defined by the IANA registry.

                Implementors should be aware of IETF character set requirements [38]
                [41].
              • Larry Masinter
                Message 7 of 7 , Feb 12, 2001
                  "Although HTTP allows an arbitrary token to be used as a charset value"

                  It would be useful to clarify that HTTP uses charset in two contexts:
                  within an Accept-Charset request header (in which the charset value
                  is an unquoted token) and as the value of a parameter in a Content-type
                  header (within a request or response), in which case the parameter
                  value of the charset parameter may be quoted.

                  Larry
                  --
                  http://larry.masinter.net
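The two contexts described above can be illustrated with a minimal sketch of the RFC 2616 `token` and `quoted-string` rules. The names `is_token` and `param_value` are invented for this example, and the quoted-string pattern is a simplification (it does not handle line folding or check for control characters).

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC 2616, section 2.2)
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# Simplified quoted-string: text between double quotes, where a backslash
# escapes the character that follows it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def is_token(text):
    """True if text is a bare token, as an Accept-Charset value must be."""
    return TOKEN.fullmatch(text) is not None


def param_value(raw):
    """Return the value of a parameter written as token | quoted-string."""
    if is_token(raw):
        return raw
    match = QUOTED.fullmatch(raw)
    if match is None:
        raise ValueError("neither a token nor a quoted-string: %r" % raw)
    # Drop the backslash from each quoted-pair; the quotes are delimiters
    # and are not part of the value.
    return re.sub(r"\\(.)", r"\1", match.group(1))


print(is_token("iso-8859-1"))
print(is_token('"iso-8859-1"'))
print(param_value('"iso-8859-1"') == param_value("iso-8859-1"))
```

Under this reading the quoted form fails the `token` check that applies to Accept-Charset, yet as a Content-Type parameter it carries the same value as the unquoted form.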