Universal Binary JSON Specification
- Hey Guys,
I am currently working on what I hope to be a 1:1 binary JSON specification (no custom data types supported like BSON or BJSON, just binary representations of the core JSON spec) and would appreciate a few extra eyes on it if anyone had interest in reading through the 2nd draft:
The specification is fundamentally very simple, essentially breaking down to a single construct that is used throughout:
[1-byte ASCII marker indicating type][4-byte int32 size][binary data]
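To make the construct above concrete, here is a minimal Python sketch of writing and reading one such chunk. The post does not state the byte order of the size field or list the actual marker characters, so the big-endian layout, the "S" marker, and the names encode_chunk and decode_chunk are assumptions for illustration only, not part of the draft.

```python
import struct

def encode_chunk(marker: str, payload: bytes) -> bytes:
    """Build [1-byte ASCII marker][4-byte int32 size][binary data]."""
    if len(marker) != 1 or ord(marker) > 127:
        raise ValueError("marker must be a single ASCII character")
    # ">i" = big-endian signed 32-bit int (byte order is an assumption here).
    return marker.encode("ascii") + struct.pack(">i", len(payload)) + payload

def decode_chunk(buf: bytes, offset: int = 0):
    """Read one chunk at offset; return (marker, payload, next_offset)."""
    marker = chr(buf[offset])
    (size,) = struct.unpack_from(">i", buf, offset + 1)
    start = offset + 5  # 1 marker byte + 4 size bytes
    payload = buf[start:start + size]
    return marker, payload, start + size

# "S" is a hypothetical marker used only for this example.
chunk = encode_chunk("S", b"hello")
marker, payload, next_offset = decode_chunk(chunk)
```

Because the size comes before the data, a reader knows exactly how many bytes to consume and never has to scan for a terminator.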
My goal for the spec has been to strike the right balance between simplicity and verbosity in the binary data so that it can be parsed quickly. I also kept a close eye on defining streaming-friendly constructs (no scanning for null-terminators) as well as removing duplicated information wherever it was unnecessary.
The spec currently defines 7 data types and the 2 collection types in JSON:
The only difference from JSON is that "Number" is broken out into int32, int64 and double types, so that values can be parsed as efficiently as possible in Java, C, C#, Python, Erlang, PHP and any other language that distinguishes between several numeric types.
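As a rough illustration of that split, the sketch below picks the narrowest of the three numeric types for a given value and packs it. The type labels, the big-endian byte order, and the function name pack_number are my own assumptions for the example; the draft may choose differently.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def pack_number(value):
    """Return (type_name, packed_bytes) for a JSON number."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is not a JSON number.
        raise TypeError("booleans are not numbers")
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "int32", struct.pack(">i", value)
        if INT64_MIN <= value <= INT64_MAX:
            return "int64", struct.pack(">q", value)
        raise OverflowError("integer does not fit in int64")
    if isinstance(value, float):
        return "double", struct.pack(">d", value)
    raise TypeError("unsupported number type: %r" % type(value))

small = pack_number(1)        # fits in 4 bytes
large = pack_number(2**31)    # needs 8 bytes
real = pack_number(1.5)       # 8-byte IEEE 754 double
```

A reader on the other side can then unpack the value straight into the matching native type instead of parsing digits out of text.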
Using the test-data from the popular JVM-Serializer Benchmark project (https://github.com/eishay/jvm-serializers/wiki) in Java, I show a 28% reduction in file size along with a 73% faster deserialization step as compared to Jackson's ObjectMapper.
NOTE: Jackson is currently the fastest JSON parsing library in Java. However, I believe there are ways to make the ObjectMapper run faster that I just haven't tuned yet, so 73% is likely not representative of the final numbers.
My goals are to gun for (on average) a 30% reduction in file size and 50% faster serialization/deserialization processing.
For what it is worth, I *expect* serialization/deserialization to run on par with Java manual encoding/decoding which is labeled "java-M" in the JVM Serialization Benchmark results (3rd fastest).
Any feedback, corrections or suggestions are all appreciated. Once I get some more refinements into the spec and more eyes on it to ensure there are no glaring omissions, everything will be published to http://ubjson.org and cultivated for the betterment of programmers everywhere. There is nothing at the domain yet.
My wish is to contribute this back to the developer community leveraging JSON as a companion spec if the community deems it beneficial.
- On Mon, Feb 20, 2012 at 9:42 AM, rkalla123 <rkalla@...> wrote:
> Stephan, For what it is worth, I also consider support for only signed values a
> No problem; your feedback is still very applicable and much appreciated.
> The additional view-point on the signed/unsigned issue was exactly what I was hoping for. My primary goal has always been simplicity and I know at least from the Java world, going with unsigned values would have made the impl distinctly *not* simple (and an annoying API).
> So I am glad to get some validation there that I am not alienating every other language at the cost of Java.
-+ Tatu +-