
Re: [json] string vs. Unicode objects; json-py

  • Jim Washington
    Message 1 of 2, Oct 6, 2005
      Peter Ring wrote:

      >Is there a way to ensure that key names and string values are
      >represented as unicode objects by a JSON reader?
      >
      >Specifically, I'm using json-py in a context where key names and string
      >values from JSON files will be used in an XML application. Using
      >json-py, JSON strings arbitrarily become Python strings (with an implied
      >utf-8 encoding) or Python Unicode objects. A little demonstration (if
      >there's some garbage in the example, it's supposed to be euro signs):
      >
      ># -*- coding: utf-8 -*-
      >import json
      >
      >print json.read("""{
      > "ACCESS": [ "Access", "€cess", "\u20access" ],
      > "€CESS": [ "Access", "€cess", "\u20access" ],
      > "\u20acCESS": [ "Access", "€cess", "\u20access" ]
      > }""")
      >
      >Depending on the occurrence of '\uxxxx' characters in key names or
      >strings, the reader creates plain ol' strings (implied utf-8) or Unicode
      >objects:
      >
      >{'ACCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
      >'\xe2\x82\xacCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
      >u'\u20acCESS': ['Access', '\xe2\x82\xaccess', u'\u20access']}
      >
      >This is a PITA:
      >- The keys '\xe2\x82\xacCESS' and u'\u20acCESS' differ, though the
      >character values (Unicode code points) are identical; one is an (implied
      >utf-8 encoded) Python string, the other a Python Unicode object.
      >- In the application, the keys and string values will be consumed by XML
      >routines that live and breathe Unicode. Encoding issues should be
      >handled at the border to the system.
      >
      >Should I patch json-py to emit strings only as Unicode objects? Why not
      >always use Unicode objects?
      >
      >
      >
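For comparison (an editorial aside, not part of the 2005 thread): Python 3's standard-library `json` module behaves the way Peter asks. Every JSON string, key or value, decodes to a `str` (Unicode) object, so the literal euro sign and the `\u20ac` escape yield the same key. Because the document then contains a duplicate key, the module keeps the value seen last. This is a minimal sketch using only `json.loads`.

```python
import json

# Peter's document: the euro sign written literally and as a \u20ac escape.
# The doubled backslashes put a literal \u20ac escape into the JSON text.
doc = '''{
  "ACCESS": ["Access", "€cess", "\\u20access"],
  "€CESS": ["Access", "€cess", "\\u20access"],
  "\\u20acCESS": ["Access", "€cess", "\\u20access"]
}'''

data = json.loads(doc)
# Every key and string value is a str; "€CESS" and "\u20acCESS" are one key.
print(data)
```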
      Hi, Peter

      You have made an interesting case for the idea that strings read from
      JSON in Python should be Python Unicode objects.

      For the time being, I think your plan for patching json-py to do what
      you need is the right answer for your situation, particularly if your
      project is time-sensitive. I do not have much say-so on json-py,
      but minjson.py is mine, and I think your solution is worth looking into,
      and might just be part of the proper answer to the question of "handling
      unicode."
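A lighter-weight alternative to patching the reader is to normalize its output once, at the border of the system, by decoding every byte string as UTF-8. The sketch below assumes the implied encoding really is UTF-8; the helper name `to_unicode` is hypothetical and not part of json-py or minjson.py. It is written in Python 3, where `bytes` stands in for Python 2's `str`.

```python
def to_unicode(obj, encoding="utf-8"):
    """Recursively turn byte strings in a decoded JSON structure into text.

    Hypothetical helper; assumes every byte string is in `encoding`.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Keys are normalized too, so the byte-string key and the Unicode key
        # for the euro sign collapse into one; the value seen last wins.
        return {to_unicode(k, encoding): to_unicode(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_unicode(item, encoding) for item in obj]
    return obj  # numbers, booleans and None are left unchanged


# Peter's mixed output, with Python 3 bytes standing in for Python 2 str.
mixed = {
    b'ACCESS': [b'Access', b'\xe2\x82\xaccess', '\u20access'],
    b'\xe2\x82\xacCESS': [b'Access', b'\xe2\x82\xaccess', '\u20access'],
    '\u20acCESS': [b'Access', b'\xe2\x82\xaccess', '\u20access'],
}
clean = to_unicode(mixed)
print(clean)
```

In Python 2 the same idea applies with `str` in place of `bytes` and `unicode` as the result type. Doing the decode in one place keeps encoding concerns at the edge, so the XML routines downstream only ever see Unicode.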

      Thanks,

      -Jim Washington