
1601 Re: [json] Re: JSON and the Unicode Standard

  • Tatu Saloranta
    Feb 26, 2011
      On Fri, Feb 25, 2011 at 8:01 PM, johne_ganz <john.engelhart@...> wrote:
      > --- In json@yahoogroups.com, Tatu Saloranta <tsaloranta@...> wrote:
      > I have not seen a JSON implementation / parser that does such normalization.
      > On the other hand, I very strongly suspect that whether or not such normalization is taking place is not up to the writer of that parser.  In my particular case (JSONKit, for Objective-C), I pass the parsed JSON String to the NSString class to instantiate an object.
      > I have ZERO control over what and how NSString interprets or manipulates the parsed JSON String that finally becomes the instantiated object that is ostensibly the same as the original JSON String used to create it.  It could be that NSString decides that the instantiated object is always converted to its precomposed form.  Objective-C is flexible enough that someone might decide to swizzle in some logic at run time that forces all strings to be precomposed before being handed off to the main NSString instantiation method.

      Ok. But in this case, would the JSON specification itself help a lot? I
      understand that this is problematic, in that different platforms can
      choose different defaults (and possibly opaque handling).

      > I don't have a particular opinion on the matter one way or the other, other than to highlight the point that in many practical, real-world situations, whether or not such things take place may not be under the control of the JSON parser.
      > I also suspect that it's one of those things that most people haven't really given a whole lot of consideration to: they just hand the parsed string over to "the Unicode string handling code", and that's that. Most people may not realize that such string handling code may subtly alter the original Unicode text as a result (a la precomposing the string).

      Right. And if the specification says nothing, it can uncover real
      complexities and ambiguities.

      >> to tackle such complexity). While it would seem wrong to punt the
      >> issue, there is the practical question of whether a full solution would
      >> matter.
      > I can guarantee you that the practical question of whether a full solution would matter will be answered the first time someone exploits it in a security vulnerable way that results in a major security fiasco.

      I would be interested in how you would see this leading to security
      issues, outside of the problems that platform-specific String handling
      has. Or are you equally concerned in general about parser implementation
      quality (which is understandable), above and beyond the question of what
      the JSON specification says? At least to me it would seem more likely
      that issues would be outside the realm of the core specification itself.

      > Then it will be with 20/20 hindsight, and the question will be "Why didn't anyone address (this behavior) that allowed two keys that were not bit for bit identical, but became identical after converting them to their precomposed form, and the security checks allowed the decomposed form through because it assumed that everything was in precomposed form?"

      I can see how this can be problematic from the side of applications that
      make assumptions about uniqueness. And also that it is important that
      parsers clearly define how they handle things -- not all parsers
      necessarily even check for uniqueness of identical byte patterns, much
      less after normalization (and I think this is even allowed by the spec,
      i.e. uniqueness checks are not mandated).
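
The key-collision scenario discussed above can be sketched in a few lines. This is only an illustration, assuming Python's standard `json` and `unicodedata` modules; the helper `normalize_keys` and the sample key "café" are hypothetical and not taken from any parser mentioned in this thread.

```python
import json
import unicodedata

# "é" as one precomposed code point vs. "e" followed by a combining acute accent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

doc = json.dumps({precomposed: "first", decomposed: "second"})
parsed = json.loads(doc)

# Python's json module does no normalization, so both keys survive parsing:
# they render the same but differ code point for code point.
print(len(parsed))


def normalize_keys(obj, form="NFC"):
    """Hypothetical helper: normalize every key and report any collisions."""
    out = {}
    collisions = []
    for key, value in obj.items():
        norm = unicodedata.normalize(form, key)
        if norm in out:
            collisions.append(norm)
        out[norm] = value  # a later key silently overwrites an earlier one
    return out, collisions


normalized, collisions = normalize_keys(parsed)
print(len(normalized), collisions)
```

A check that runs before normalization sees two distinct keys, while code that runs after it sees one, which is the mismatch the quoted paragraph worries about.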

      So in a way, it would be useful to have a few more concrete examples of
      known practical issues. The links below may give some insight -- but it
      would seem that they are typically platform specific, which makes it
      even harder to find shared solutions, or to recommend best practices.

      > Unfortunately, the use of Unicode coupled with the fact that most JSON implementations are dependent on external code for their Unicode support means that this is an extremely non-trivial issue.  I can't think of a simple solution to the problem at the moment, other than it exists.
      > You really ought to read:
      > http://www.unicode.org/faq/security.html
      > http://www.unicode.org/reports/tr36/#Canonical_Represenation
      > Microsoft Security Bulletin (MS00-078): Patch Available for 'Web Server Folder Traversal' Vulnerability (http://www.microsoft.com/technet/security/bulletin/MS00-078.mspx, http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2000-0884)
      > Creating Arbitrary Shellcode In Unicode Expanded Strings (http://www.net-security.org/article.php?id=144)
      > There's a long history of "Those little Unicode details aren't really important" causing huge security problems later on.
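
As a rough illustration of the non-canonical UTF-8 problem that links like these deal with, here is a small sketch assuming Python's built-in strict UTF-8 decoder. The byte pair `C0 AF` is a commonly cited overlong (non-shortest-form) encoding of "/"; `naive_decode_two_bytes` is a hypothetical lenient decoder written only for this example, not code from any of the cited reports.

```python
overlong_slash = b"\xc0\xaf"   # non-shortest-form two-byte encoding of U+002F
canonical_slash = b"\x2f"      # the only valid UTF-8 encoding of "/"


def strict_decode(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def naive_decode_two_bytes(data: bytes) -> str:
    """Hypothetical lenient decoder: combines the payload bits of a two-byte
    sequence without rejecting overlong forms."""
    return chr(((data[0] & 0x1F) << 6) | (data[1] & 0x3F))


print(strict_decode(canonical_slash))
print(strict_decode(overlong_slash))
print(naive_decode_two_bytes(overlong_slash))
```

A filter that scans raw bytes for "/" before a lenient decoder runs would miss the overlong form, which is the same "check before canonicalization" gap as in the normalization case.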

      Thank you. While I had heard about issues with respect to
      non-canonical UTF-8 code sequences (which were discussed as having such
      issues), I admit I had not heard much about issues regarding

      -+ Tatu +-