
Re: [caplet] Am I paranoid enough?

  • Mike Samuel
    Message 1 of 7 , Feb 17, 2009
      2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
      > Mike Samuel wrote:
      >> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
      >>> Suppose that S is a Unicode string in which each character matches
      >>> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
      >>> not containing ("&" followed by a character not matching AmpFollower).
      >>> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
      >>> an attacker.
      >>>
      >>> ValidChar :: one of
      >>> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
      >>> [\u0020-\u007E]
      >>> [\u00A0-\u00AC]
      >>> [\u00AE-\u05FF]
      >>> [\u0604-\u06DC]
      >>> [\u06DE-\u070E]
      >>> [\u0710-\u17B3]
      >>> [\u17B6-\u200A]
      >>> [\u2010-\u2027]
      >>> [\u202F-\u205F]
      >>> [\u2070-\uD7FF]
      >>
      >> So no surrogates?
      >
      > Correct. They're not characters (or even "noncharacters").
      >
      >>> [\uE000-\uFDCF]
      >>> [\uFDF0-\uFEFE]
      >>> [\uFF00-\uFFEF]
      >>
      >> Why include FFEF?
      >
      > It's unassigned, and there's no particular reason to exclude it.
      > (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
      > for "special" characters.)

      Isn't it the reflection of fffe, the byte-order mark?
      This is probably a very minor issue, but if one part of a parser
      naively delegates to another parser that mistakenly treats its content
      as a byte string instead of code units, the presence of a BOM might
      cause the delegatee to misinterpret content when something that looks
      like a BOM appears at the beginning of a chunk of embedded language.
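A minimal sketch of the ValidChar test, assuming the ranges quoted above are transcribed faithfully. The names `VALID_RANGES`, `is_valid_char` and `all_valid` are illustrative and do not come from the thread. Python strings iterate by code point rather than by UTF-16 code unit, so supplementary-plane characters are rejected here simply because they fall outside every listed range.

```python
# Inclusive code point ranges copied from the quoted ValidChar list.
VALID_RANGES = [
    (0x0009, 0x000A), (0x000D, 0x000D),   # TAB, LF, CR
    (0x0020, 0x007E), (0x00A0, 0x00AC), (0x00AE, 0x05FF),
    (0x0604, 0x06DC), (0x06DE, 0x070E), (0x0710, 0x17B3),
    (0x17B6, 0x200A), (0x2010, 0x2027), (0x202F, 0x205F),
    (0x2070, 0xD7FF),                     # stops before the surrogates
    (0xE000, 0xFDCF), (0xFDF0, 0xFEFE), (0xFF00, 0xFFEF),
]


def is_valid_char(ch: str) -> bool:
    """Return True if the single character ch falls in a listed range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VALID_RANGES)


def all_valid(s: str) -> bool:
    """Return True if every character of s matches ValidChar."""
    return all(is_valid_char(ch) for ch in s)
```

With these ranges, U+FFEF is accepted, while U+FEFF and U+FFFE both fall into gaps between ranges and are rejected.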


      >>> AmpFollower :: one of
      >>> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
      >>> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
      >>> // single quote, backslash, space, TAB, LF, CR
      >>>
      >>> (ValidChar excludes format control characters, and some other
      >>> characters known to be mishandled by browsers. AmpFollower is
      >>> intended to exclude characters that can start an entity reference.)
      >>>
      >>> S is inserted between "<script>" and "</script>" in a place where a
      >>> <script> tag is allowed in an otherwise valid HTML document, or
      >>> between "<script><![CDATA[" and "]]></script>" in a place where a
      >>> <script> tag is allowed in an otherwise valid XHTML document.
      >>> The HTML or XHTML document starts with a correct <!DOCTYPE or
      >>> <?xml declaration respectively, and is encoded as well-formed
      >>> UTF-8.
      >>>
      >>> Are these restrictions sufficient to ensure that the embedded
      >>> script is interpreted as it would have been if referenced from
      >>> an external file, foiling any attempts of browsers to collude
      >>> with the attacker in misparsing it?
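The subsequence and AmpFollower restrictions quoted above could be checked roughly as follows. This is a sketch only: `FORBIDDEN`, `AMP_FOLLOWERS` and `passes_markup_checks` are hypothetical names, and rejecting a trailing "&" is an assumption, since the quoted wording does not say what happens when "&" is the last character of S.

```python
# Subsequences that S must not contain, per the quoted restrictions.
FORBIDDEN = ("<!", "</", "]]>")

# AmpFollower: = ( + - ! ~ " / digits, single quote, backslash,
# space, TAB, LF, CR.
AMP_FOLLOWERS = set("=(+-!~\"/0123456789'\\ \t\n\r")


def passes_markup_checks(s: str) -> bool:
    """Check the forbidden subsequences and the '&' follower rule."""
    if any(seq in s for seq in FORBIDDEN):
        return False
    for i, ch in enumerate(s):
        if ch != "&":
            continue
        # Assumption: a trailing '&' is rejected, because the next
        # character in the document would be '<' or ']'.
        if i + 1 >= len(s) or s[i + 1] not in AMP_FOLLOWERS:
            return False
    return True
```

Note that under the quoted AmpFollower list, `a&&b` is rejected because the first "&" is followed by another "&", while `a &= 1` and `a & b` pass.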
      >>
      >> You may still be subject to encoding attacks. I'm sure there are
      >> valid scripts that look like UTF-7, so if the script appears in the
      >> first 1024B, you might need to make sure it's preceded by a <meta>
      >> element specifying an encoding, and/or use the XML prologue form that
      >> specifies an encoding.
      >
      > Right; I covered that in a follow-up. Is including a UTF-8 BOM at the
      > start sufficient for all browsers (that is, are there any browsers
      > in which a <meta> tag or other content sniffing can override an
      > explicit initial UTF-8 BOM, in either HTML or XHTML)?

      Ah cool. I don't know the answer to that question.


      > HTML5 section 8.2.2.1 seems to indicate that "if the transport layer
      > specifies an encoding" (i.e. presumably the charset specified in
      > a Content-Type header), then that should override a BOM. That's
      > irritating, because it means that you have to assume that the server
      > gets the Content-Type right, *as well as* including a BOM for the
      > browsers in which Content-Type doesn't override sniffing
      > (Internet Explorer, at least), and for the case where the document
      > is read from a local file.

      Yeah. I think the best thing to do is to use a fairly standard
      encoding like UTF-8, and make sure the XML prologue, <meta
      http-equiv="content-type">, and headers all agree.
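As a rough illustration of "all agree", the sketch below collects the encoding named by the Content-Type header, a leading UTF-8 BOM, a <meta> element and an XML prologue, and checks that every declaration found is the expected one. The helper names are hypothetical, the regular expressions are deliberately loose and are not a real HTML or XML parser, and limiting the scan to the first 1024 bytes is an assumption based on the sniffing window mentioned earlier in this message.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
TOKEN = r"([\w.:-]+)"  # loose pattern for an encoding label


def encoding_declarations(content_type, body):
    """Return a dict of the encodings declared by each mechanism."""
    found = {}
    m = re.search(r"charset\s*=\s*\"?" + TOKEN, content_type or "", re.I)
    if m:
        found["header"] = m.group(1).lower()
    if body.startswith(UTF8_BOM):
        found["bom"] = "utf-8"
    # Assumption: only the first 1024 bytes are inspected.
    head = body[:1024].decode("ascii", errors="replace")
    m = re.search(r"<meta[^>]*charset\s*=\s*[\"']?" + TOKEN, head, re.I)
    if m:
        found["meta"] = m.group(1).lower()
    m = re.search(r"<\?xml[^>]*encoding\s*=\s*[\"']" + TOKEN, head, re.I)
    if m:
        found["xml"] = m.group(1).lower()
    return found


def declarations_agree(content_type, body, expected="utf-8"):
    """True if at least one declaration exists and all match expected."""
    found = encoding_declarations(content_type, body)
    return bool(found) and all(v == expected for v in found.values())
```

A check like this only detects disagreement in what the server sends; it says nothing about how a particular browser resolves a conflict.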

      I don't think that you can do much about file hosting services that go
      out of their way to specify a whacky encoding. Putting a BOM at the
      front will help hosting services that make a genuine effort.


      > --
      > David-Sarah Hopwood ⚥
      >
      >
    • David-Sarah Hopwood
      Message 2 of 7 , Feb 18, 2009
        Mike Samuel wrote:
        > 2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
        >> Mike Samuel wrote:
        >>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
        >>>> ValidChar :: one of
        [...]
        >>>> [\uFF00-\uFFEF]
        >>> Why include FFEF?
        >> It's unassigned, and there's no particular reason to exclude it.
        >> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
        >> for "special" characters.)
        >
        > Isn't it the reflection of fffe, the byte-order mark?

        No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter,
        so already excluded from ValidChar.
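The relationship between the two code points can be sketched in a few lines. The helper names `byte_swap` and `is_noncharacter` are illustrative; the noncharacter test assumes the Unicode definition (U+FDD0 to U+FDEF, plus the last two code points of every plane).

```python
def byte_swap(cp: int) -> int:
    """Swap the two bytes of a 16-bit code unit."""
    return ((cp & 0xFF) << 8) | (cp >> 8)


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+xFFFE / U+xFFFF in any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


BOM = "\uFEFF"
big_endian = BOM.encode("utf-16-be")     # bytes FE FF
little_endian = BOM.encode("utf-16-le")  # bytes FF FE
```

A decoder that reads FF FE while assuming big-endian order sees U+FFFE, which is how a byte-order mismatch is detected. U+FFEF is neither the BOM nor a noncharacter.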

        (Thought you'd spotted something I'd missed for a second, there.)

        --
        David-Sarah Hopwood ⚥
      • Mike Samuel
        Message 3 of 7 , Feb 18, 2009
          2009/2/18 David-Sarah Hopwood <david.hopwood@...>:
          > Mike Samuel wrote:
          >> 2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
          >>> Mike Samuel wrote:
          >>>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
          >>>>> ValidChar :: one of
          > [...]
          >>>>> [\uFF00-\uFFEF]
          >>>> Why include FFEF?
          >>> It's unassigned, and there's no particular reason to exclude it.
          >>> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
          >>> for "special" characters.)
          >>
          >> Isn't it the reflection of fffe, the byte-order mark?
          >
          > No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter,
          > so already excluded from ValidChar.

          Ah, quite right.

          > (Thought you'd spotted something I'd missed for a second, there.)
          >
          > --
          > David-Sarah Hopwood ⚥