Re: Universal Binary JSON Specification
- I was hoping to get some feedback on a few changes I have planned for the Universal Binary JSON Specification (http://ubjson.org) Draft 9 before I made them official.
The group's help has been critical thus far in shaping the spec and I would certainly appreciate the guidance once again.
The spec is currently at Draft 8, which I believe you have all seen; the last additions were the compact STRING, HUGE, ARRAY and OBJECT types (using 1-byte lengths instead of 4-byte) and streaming support for unbounded containers.
Coming from that current place, I am working on the following changes that I would like feedback on:
1. Remove the concept of ARRAY or OBJECT (container types) having a length argument at all and simply define the two container types with a beginning [A] or [O] and an ending [E] marker currently used for "unbounded containers".
Alex, the author behind the simpleubjson implementation, pointed out that since the container length argument doesn't convey any useful information besides tracking the scope, it was redundant to support both the [E]-terminated containers and containers with a child element count.
I should have removed the length argument from the container types in Draft 8 when I added streaming support, but it didn't dawn on me at the time.
2. As a result of #1, remove the compact ARRAY and OBJECT representations ([a] and [o] lowercased markers) - they are unnecessary now.
3. There are 2 variable-length data types in UBJSON that define lengths: STRING and HUGE.
Currently the lengths are defined as either an int8 or an int32 value depending on the type-marker used (so there are duplicate definitions for STRING and HUGE just like there were for ARRAY and OBJECT).
The change I am proposing here for both clarity and implementation simplicity/normalization is to make the 'length' argument of the STRING and HUGE types one of the Universal Binary JSON integer numeric types: int8, int16 or int32.
So instead of:
you would have:
[S][i][7,168 bytes representing a string...]
The cost is the added byte for the length's type marker, but the win is support for any length from what fits in an int8 (up to 127 bytes) to what fits in an int32 (about 2.1 billion bytes), as well as spec and implementation simplification, which I think are big wins.
There is some contention here though with the existing numeric types that I'd like some feedback on to make sure I am not talking to myself inside an echo-chamber here, namely:
* no int64 length support, (REASON), not every platform plays nice with 64-bit. The lack of universal 64-bit number support was exactly what brought about the creation of the HUGE type. A value that is not universally supported cannot be part of the core spec, because then some platforms could not decode the format contents correctly. (WORKAROUND) just break the data payload into an array of multiple STRINGs or HUGEs.
* signed length values, (REASON), numeric types in UBJSON are all signed. This makes working with them in languages like Java straightforward and easy to grasp. It also makes the APIs straightforward for any libraries implementing it. Trying to support unsigned values gets you into a world of pain where UBJSON Strings can actually be 4GB runs of characters but Java's String as well as Java's arrays can only be signed 32-bit int in size. (WORKAROUND) same as above, break the payload into 2GB chunks.
I am aware of these limitations and do not think they are show-stoppers, given that the proposed changes are so simple to support on all platforms, but I wanted to get some verification from other smart people in case I missed the boat here on something.
4. As a result of #3, remove the compact STRING and HUGE representations ([s] and [h] lowercased markers) - they are unnecessary now.
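To make items 1 and 3 concrete, here is a rough Python sketch of how an encoder might look under the proposal. It is only an illustration, not the reference implementation: the marker bytes ('B', 'i', 'I' for int8, int16, int32, plus 'S', 'A', 'E') and the big-endian byte order are assumptions made for the example, and the function names are made up.

```python
import struct

# Marker bytes are assumptions for illustration, not quoted from the spec.
STRING, ARRAY, END = b"S", b"A", b"E"
INT8, INT16, INT32 = b"B", b"i", b"I"


def encode_length(n):
    """Encode a non-negative length as the smallest signed integer type that fits."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 0x7F:
        return INT8 + struct.pack(">b", n)
    if n <= 0x7FFF:
        return INT16 + struct.pack(">h", n)
    if n <= 0x7FFFFFFF:
        return INT32 + struct.pack(">i", n)
    raise ValueError("length exceeds int32; split the payload into chunks")


def encode_string(data):
    """Encode raw bytes as [S][length type marker][length][bytes]."""
    return STRING + encode_length(len(data)) + data


def encode_chunked(data, chunk_size=0x7FFFFFFF):
    """Workaround for oversized payloads: an [A]...[E] array of STRING chunks."""
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return ARRAY + b"".join(encode_string(c) for c in chunks) + END
```

Note that the array carries no element count; the [E] marker alone closes it. A real chunker for text would also need to avoid splitting a multi-byte UTF-8 character across two chunks, which this sketch does not attempt.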
These are the changes currently being analyzed for the Draft 9 spec. I think the re-simplification of the spec (after its growth between Draft 4 and Draft 8) is a big win here, with minimal changes to existing implementations.
I think the re-use of the numeric types as lengths is a big win for implementation logic as well. I know it would remove duplication and complexity for me in the Java API, as well as in the Python impl, which would be nice.
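As a rough illustration of that point, a decoder could use one integer-reading routine both for numeric values and for STRING lengths, and read containers until it hits [E] without tracking a count. This Python sketch assumes the marker bytes 'B', 'i', 'I' for int8, int16, int32, the markers 'S', 'A', 'E', and big-endian integers; those choices and the function names are hypothetical, not taken from the spec.

```python
import struct

# Marker bytes are assumptions for illustration, not quoted from the spec.
STRING, ARRAY, END = b"S", b"A", b"E"
INT_FORMATS = {b"B": ">b", b"i": ">h", b"I": ">i"}  # int8, int16, int32


def read_int(buf, pos):
    """Read [marker][big-endian signed integer]; return (value, next position)."""
    fmt = INT_FORMATS[buf[pos:pos + 1]]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)


def read_value(buf, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    marker = buf[pos:pos + 1]
    if marker in INT_FORMATS:
        return read_int(buf, pos)
    if marker == STRING:
        # The length is just another integer value, so read_int is reused.
        length, pos = read_int(buf, pos + 1)
        if length < 0:
            raise ValueError("negative string length")
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if marker == ARRAY:
        items, pos = [], pos + 1
        # No element count: keep reading children until the [E] marker.
        while buf[pos:pos + 1] != END:
            if pos >= len(buf):
                raise ValueError("array is missing its [E] marker")
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos + 1
    raise ValueError("unsupported marker: %r" % marker)
```

For example, read_value(b"ASB\x02hiAB\x07EE") walks a two-element array whose second element is a nested array, with scope tracked purely by the recursion and the [E] markers.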
Thank you all for your time, I appreciate it.
P.S.> If you have OT suggestions for the spec I am eager to hear them here or you can email me at ubjson@... if you'd rather talk privately.
- On Mon, Feb 20, 2012 at 9:42 AM, rkalla123 <rkalla@...> wrote:
> Stephan,For what it is worth, I also consider support for only signed values a
> No problem; your feedback is still very applicable and much appreciated.
> The additional view-point on the signed/unsigned issue was exactly what I was hoping for. My primary goal has always been simplicity and I know at least from the Java world, going with unsigned values would have made the impl distinctly *not* simple (and an annoying API).
> So I am glad to get some validation there that I am not alienating every other language at the cost of Java.
-+ Tatu +-