string vs. Unicode objects; json-py

  • Peter Ring
    Message 1 of 2 , Oct 6, 2005
      Is there a way to ensure that key names and string values are
      represented as unicode objects by a JSON reader?

      Specifically, I'm using json-py in a context where key names and string
      values from JSON files will be used in an XML application. Using
      json-py, JSON strings arbitrarily become Python strings (with an implied
      utf-8 encoding) or Python Unicode objects. A little demonstration (if
      there's some garbage in the example, it's supposed to be euro signs):

      # -*- coding: utf-8 -*-
      import json

      print json.read("""{
      "ACCESS": [ "Access", "€cess", "\u20access" ],
      "€CESS": [ "Access", "€cess", "\u20access" ],
      "\u20acCESS": [ "Access", "€cess", "\u20access" ]
      }""")

      Depending on the occurrence of '\uxxxx' characters in key names or
      strings, the reader creates plain ol' strings (implied utf-8) or Unicode
      objects:

      {'ACCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
      '\xe2\x82\xacCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
      u'\u20acCESS': ['Access', '\xe2\x82\xaccess', u'\u20access']}

      This is a PITA:
      - The keys '\xe2\x82\xacCESS' and u'\u20acCESS' differ, though the
      character values (Unicode code points) are identical; one is a (implied
      utf-8 encoded) Python string, the other a Python Unicode object.
      - In the application, the keys and string values will be consumed by XML
      routines that live and breathe Unicode. Encoding issues should be
      handled at the border to the system.
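
A minimal sketch of the mismatch described above and of one way to normalize it after parsing. It is written for Python 3, where `bytes` stands in for the Python 2 plain string; the helper name `to_unicode` and the `mixed` sample are illustrative assumptions, not part of json-py.

```python
def to_unicode(obj, encoding="utf-8"):
    """Recursively decode byte strings in a parsed JSON-like structure.

    Hypothetical helper, not json-py code. Assumes every byte string
    is encoded in `encoding` (UTF-8 by default).
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Keys that decode to the same text collapse into one entry;
        # the entry seen last wins.
        return {to_unicode(k, encoding): to_unicode(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_unicode(item, encoding) for item in obj]
    return obj


# Same shape as the reader output shown above: byte strings mixed with text.
mixed = {
    b"ACCESS": [b"Access", b"\xe2\x82\xaccess", "\u20access"],
    b"\xe2\x82\xacCESS": [b"Access", b"\xe2\x82\xaccess", "\u20access"],
    "\u20acCESS": [b"Access", b"\xe2\x82\xaccess", "\u20access"],
}

# Before normalization the two euro keys are distinct dictionary keys.
distinct_before = b"\xe2\x82\xacCESS" != "\u20acCESS"

normalized = to_unicode(mixed)
```

After normalization the two euro keys become a single text key, so a lookup no longer depends on how the key happened to be written in the JSON source.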

      Should I patch json-py to emit strings only as Unicode objects? Why not
      always use Unicode objects?

      Kind regards
      Peter Ring
    • Jim Washington
      Message 2 of 2 , Oct 6, 2005
        Peter Ring wrote:

        >Is there a way to ensure that key names and string values are
        >represented as unicode objects by a JSON reader?
        >
        >Specifically, I'm using json-py in a context where key names and string
        >values from JSON files will be used in an XML application. Using
        >json-py, JSON strings arbitrarily become Python strings (with an implied
        >utf-8 encoding) or Python Unicode objects. A little demonstration (if
        >there's some garbage in the example, it's supposed to be euro signs):
        >
        ># -*- coding: utf-8 -*-
        >import json
        >
        >print json.read("""{
        > "ACCESS": [ "Access", "€cess", "\u20access" ],
        > "€CESS": [ "Access", "€cess", "\u20access" ],
        > "\u20acCESS": [ "Access", "€cess", "\u20access" ]
        > }""")
        >
        >Depending on the occurrence of '\uxxxx' characters in key names or
        >strings, the reader creates plain ol' strings (implied utf-8) or Unicode
        >objects:
        >
        >{'ACCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
        >'\xe2\x82\xacCESS': ['Access', '\xe2\x82\xaccess', u'\u20access'],
        >u'\u20acCESS': ['Access', '\xe2\x82\xaccess', u'\u20access']}
        >
        >This is a PITA:
        >- The keys '\xe2\x82\xacCESS' and u'\u20acCESS' differ, though the
        >character values (Unicode code points) are identical; one is a (implied
        >utf-8 encoded) Python string, the other a Python Unicode object.
        >- In the application, the keys and string values will be consumed by XML
        >routines that live and breathe Unicode. Encoding issues should be
        >handled at the border to the system.
        >
        >Should I patch json-py to emit strings only as Unicode objects? Why not
        >always use Unicode objects?
        >
        >
        >
        Hi, Peter

        You have made an interesting case for the idea that strings read from
        JSON in python should be python unicode objects.

        For the time being, I think your plan for patching json-py to do what
        you need is the right answer for your situation, particularly if your
        project is time sensitive. I do not have much say-so on json-py,
        but minjson.py is mine, and I think your solution is worth looking into,
        and might just be part of the proper answer to the question of "handling
        unicode."
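
One way to sketch the "always Unicode" behaviour discussed in this thread is to decode the input once at the system boundary and parse only text. The example below uses the standard-library `json` module of Python 3 as a stand-in; it is neither json-py nor minjson.py, and the function name `read_json_bytes` is an assumption made for illustration.

```python
import json


def read_json_bytes(raw, encoding="utf-8"):
    """Decode raw bytes first, then parse, so every string is text.

    Hypothetical wrapper; assumes the input really is in `encoding`.
    """
    return json.loads(raw.decode(encoding))


# A UTF-8 document mixing a literal euro sign with the \u20ac escape.
raw = (
    '{"ACCESS": ["Access", "\u20access", "\\u20access"],'
    ' "\u20acCESS": ["Access", "\u20access", "\\u20access"]}'
).encode("utf-8")

data = read_json_bytes(raw)
```

With this approach a literal euro sign and its `\u20ac` escape produce equal strings, so the key mismatch from the original example cannot arise.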

        Thanks,

        -Jim Washington