Loading ...
Sorry, an error occurred while loading the content.

Sorting the keys of formatted (pretty printed) JSON output

Expand Messages
  • Arthur Blake
    I have written a lot of applications that make heavy use of JSON data. In fact, I often store large amounts of JSON data in configuration files instead of
    Message 1 of 7 , Apr 9 5:04 PM
    View Source
    • 0 Attachment
      I have written a lot of applications that make heavy use of JSON data.
      In fact, I often store large amounts of JSON data in configuration
      files instead of using XML.

      When I have machine generated JSON data that is stored in files, I
      like to have the JSON data be sorted alphabetically by the keys in the
      output.

      It's much easier to find what you are looking for quickly this way
      when viewing or editing the data (especially when the amount of data
      is large.)

      Because there is no defined order of the keys in JSON, it doesn't
      matter what order you put them in. So we might as well put it in an
      order that makes it more human readable in contexts where the data is
      meant to be viewed or edited by humans.

      This also aids greatly for debugging JSON-RPC transmissions by
      including logging statements on the server side that displays the JSON
      data both pretty printed and with the keys sorted alphabetically (we
      do this in the jabsorb library but only when logging at DEBUG level.)

      If anyone wants to make use of this, I've added this additional option
      to the main org.json Java library published on json.org.

      You can view my modified version of the library which is checked in
      here: http://svn.jabsorb.org/websvn/jabsorb/trunk/src/org/json/?rev=0&sc=0

      (this is a copy of the Java files from org.json with minor
      modifications to JSONObject.java and JSONArray.java, everything else
      is stock code from json.org)

      A new overloaded toString method takes an additional parameter "sort",
      a boolean which if true will sort all the keys in the pretty printed
      output, recursively.

      I hope that Doug will consider adding this change to the main org.json
      library because I think many others could use and benefit from the
      change.


      Regards

      Arthur Blake
    • Tatu Saloranta
      One question: wouldn t it be easier to just use TreeMap for storing fields, instead of HashMap, if sorting is desired? -+ Tatu +-
      Message 2 of 7 , Apr 9 9:29 PM
      View Source
      • 0 Attachment
        One question: wouldn't it be easier to just use TreeMap for storing
        fields, instead of HashMap, if sorting is desired?

        -+ Tatu +-

        On Wed, Apr 9, 2008 at 5:04 PM, Arthur Blake <arthur.blake@...> wrote:
        >
        >
        >
        >
        >
        >
        > I have written a lot of applications that make heavy use of JSON data.
        > In fact, I often store large amounts of JSON data in configuration
        > files instead of using XML.
        >
        > When I have machine generated JSON data that is stored in files, I
        > like to have the JSON data be sorted alphabetically by the keys in the
        > output.
        >
        > It's much easier to find what you are looking for quickly this way
        > when viewing or editing the data (especially when the amount of data
        > is large.)
        >
        > Because there is no defined order of the keys in JSON, it doesn't
        > matter what order you put them in. So we might as well put it in an
        > order that makes it more human readable in contexts where the data is
        > meant to be viewed or edited by humans.
        >
        > This also aids greatly for debugging JSON-RPC transmissions by
        > including logging statements on the server side that displays the JSON
        > data both pretty printed and with the keys sorted alphabetically (we
        > do this in the jabsorb library but only when logging at DEBUG level.)
        >
        > If anyone wants to make use of this, I've added this additional option
        > to the main org.json Java library published on json.org.
        >
        > You can view my modified version of the library which is checked in
        > here: http://svn.jabsorb.org/websvn/jabsorb/trunk/src/org/json/?rev=0&sc=0
        >
        > (this is a copy of the Java files from org.json with minor
        > modifications to JSONObject.java and JSONArray.java, everything else
        > is stock code from json.org)
        >
        > A new overloaded toString method takes an additional parameter "sort",
        > a boolean which if true will sort all the keys in the pretty printed
        > output, recursively.
        >
        > I hope that Doug will consider adding this change to the main org.json
        > library because I think many others could use and benefit from the
        > change.
        >
        > Regards
        >
        > Arthur Blake
        >
      • Arthur Blake
        ... That was my first thought when thinking about how to do it, as well. That would make sense if you wanted to always keep and use the JSON sorted. In one of
        Message 3 of 7 , Apr 10 5:13 AM
        View Source
        • 0 Attachment
          On Thu, Apr 10, 2008 at 12:29 AM, Tatu Saloranta <tsaloranta@...> wrote:
          >
          > One question: wouldn't it be easier to just use TreeMap for storing
          > fields, instead of HashMap, if sorting is desired?
          >
          > -+ Tatu +-

          That was my first thought when thinking about how to do it, as well.

          That would make sense if you wanted to always keep and use the JSON
          sorted. In one of my main use cases (JSON-RPC debug output on the
          server side) I only want to sort the existing (non-sorted) JSON at
          print out time because the pretty printing is a special case (when
          debugging). I didn't want to transmit the JSON sorted (this would be
          an unneeded and unwanted side effect even though it is probably
          harmless.) I wanted to keep the JSON in the same format that it would
          have been in if I wasn't sorting. Thus I sort on the fly only when
          pretty printing (using a TreeSet view of the keys.)

          There would be a penalty for keeping a TreeMap versus a HashMap.
          HashMap obviously performs better (constant time put & get) for when
          you don't care about sorting. In practice I haven't noticed a speed
          difference for even very large JSON, but I'm sure it's there (TreeMap
          is O(log n) on gets and puts.) The same penalty exists for when
          sorting on the fly like I'm doing using a TreeSet, but it only comes
          in to play when pretty printing sorted (yes the downside is if you
          print the same object out multiple times, you re-sort each time.)

          It's a trade off doing it either way, and it would be quite easy to
          switch the HashMap used to a TreeMap in JSONObject if you desired to
          just always sort your JSON (or provide a sort mode switch when the
          JSONObject is created that makes it use a TreeMap versus a HashMap.)
        • Tatu Saloranta
          ... I understand this, and it s a valid point. But I was thinking more along the lines of deferring initialization of the Map, and switching between the two
          Message 4 of 7 , Apr 10 10:04 AM
          View Source
          • 0 Attachment
            On Thu, Apr 10, 2008 at 5:13 AM, Arthur Blake <arthur.blake@...> wrote:
            >
            > On Thu, Apr 10, 2008 at 12:29 AM, Tatu Saloranta <tsaloranta@...>
            > wrote:
            > >
            > > One question: wouldn't it be easier to just use TreeMap for storing
            > > fields, instead of HashMap, if sorting is desired?
            > >
            > > -+ Tatu +-
            >
            > That was my first thought when thinking about how to do it, as well.
            >
            > That would make sense if you wanted to always keep and use the JSON
            > sorted. In one of my main use cases (JSON-RPC debug output on the
            ...
            > There would be a penalty for keeping a TreeMap versus a HashMap.
            > HashMap obviously performs better (constant time put & get) for when

            I understand this, and it's a valid point. But I was thinking more
            along the lines of deferring initialization of the Map, and switching
            between the two if/as necessary. That is, it'd be a property of Json
            object, to minimize sorting overhead for cases where it's not needed.

            At any rate, I just wanted to mention it as an option, just to make
            sure it was considered. And that seems to have been the case.

            -+ Tatu +-
          • John Cowan
            ... Actually, it isn t. Even though the javadoc claims that get and put are constant time, a moment s reflection shows that this cannot be so. HashMap is
            Message 5 of 7 , Apr 10 10:57 AM
            View Source
            • 0 Attachment
              Tatu Saloranta scripsit:

              > > There would be a penalty for keeping a TreeMap versus a HashMap.
              > > HashMap obviously performs better (constant time put & get) for when
              >
              > I understand this, and it's a valid point.

              Actually, it isn't. Even though the javadoc claims that get and put
              are constant time, a moment's reflection shows that this cannot be so.

              HashMap is implemented (in OpenJDK, and almost certainly in earlier
              versions) using an array of specialized objects that implement Map.Entry.
              Each element of the array is one hash bucket, and the Entry objects form
              a linked list. So average get/put time is approximately N/c*2, where c
              is the capacity (initially 16 unless specified otherwise), which is O(N).
              The hope is that N/c*2 is always small enough that the performance looks
              like O(1), but as N grows, c is increased using a full rehash, which is
              again O(N) and is independent of c.

              So I'd say using TreeMap throughout, with its guaranteed O(log N)
              performance, is a perfectly reasonable thing to do; it makes debugging
              easier, it is prettier, and it guarantees getting the same output on
              the same input.

              --
              On the Semantic Web, it's too hard to prove John Cowan cowan@...
              you're not a dog. --Bill de hOra http://www.ccil.org/~cowan
            • kriszyp
              ... when ... Map.Entry. ... O(N). ... Not true. Full rehashes are indeed O(N), but they take place at a frequency proportional to 1/c because each time a
              Message 6 of 7 , Apr 10 11:32 AM
              View Source
              • 0 Attachment
                --- In json@yahoogroups.com, John Cowan <cowan@...> wrote:
                >
                > Tatu Saloranta scripsit:
                >
                > > > There would be a penalty for keeping a TreeMap versus a HashMap.
                > > > HashMap obviously performs better (constant time put & get) for
                when
                > >
                > > I understand this, and it's a valid point.
                >
                > Actually, it isn't. Even though the javadoc claims that get and put
                > are constant time, a moment's reflection shows that this cannot be so.
                >
                > HashMap is implemented (in OpenJDK, and almost certainly in earlier
                > versions) using an array of specialized objects that implement
                Map.Entry.
                > Each element of the array is one hash bucket, and the Entry objects form
                > a linked list. So average get/put time is approximately N/c*2, where c
                > is the capacity (initially 16 unless specified otherwise), which is
                O(N).
                > The hope is that N/c*2 is always small enough that the performance looks
                > like O(1), but as N grows, c is increased using a full rehash, which is
                > again O(N) and is independent of c.

                Not true. Full rehashes are indeed O(N), but they take place at a
                frequency proportional to 1/c because each time a rehash takes place
                the size is doubled, so the rehashes become farther apart the larger c
                gets. Therefore the average rehash time for a put operation is
                O(N*1/c) which averages to constant time. Both get and put only have
                to iterate through their own bucket, so as long as the hash method is
                reasonable, both get and put are indeed O(1), constant time operations.
                Kris
              • Douglas Crockford
                ... Excellent suggestion. I have modified JSON.Object.
                Message 7 of 7 , Apr 10 7:35 PM
                View Source
                • 0 Attachment
                  --- In json@yahoogroups.com, "Arthur Blake" <arthur.blake@...> wrote:
                  > When I have machine generated JSON data that is stored in files, I
                  > like to have the JSON data be sorted alphabetically by the keys in the
                  > output.
                  >
                  > It's much easier to find what you are looking for quickly this way
                  > when viewing or editing the data (especially when the amount of data
                  > is large.)
                  >
                  > Because there is no defined order of the keys in JSON, it doesn't
                  > matter what order you put them in. So we might as well put it in an
                  > order that makes it more human readable in contexts where the data is
                  > meant to be viewed or edited by humans.

                  Excellent suggestion. I have modified JSON.Object.
                Your message has been successfully submitted and would be delivered to recipients shortly.