Re: Universal Binary JSON Specification
The reasons I specified UTF-8 were (1) to be consistent with the rest of the spec (i.e. you don't have to change your thinking just because you are parsing a specific construct type) and (2) because UTF-8 stores ASCII characters as 1 byte per char, so we aren't wasting any more space than ASCII encoding would in this case (as you pointed out, the required chars are all < 127).
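To make point (2) concrete, here is a minimal Python sketch. The sample value and the `NUMERIC_CHARS` name are only for illustration and are not part of the spec; the point is simply that a numeric string encodes to the same bytes in UTF-8 and in ASCII.

```python
# Characters a numeric literal can contain: digits, signs, exponent markers, decimal point.
NUMERIC_CHARS = "0123456789+-eE."

sample = "-1.5e+10"  # hypothetical value, chosen only for illustration

utf8_bytes = sample.encode("utf-8")
ascii_bytes = sample.encode("ascii")

# Every required character is below 128, so UTF-8 uses exactly one byte for each.
assert all(ord(ch) < 128 for ch in NUMERIC_CHARS)
assert utf8_bytes == ascii_bytes

print(len(sample), len(utf8_bytes))  # prints: 8 8
```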
As for the encoding you mentioned, you are absolutely right: something custom would be much more compact, but that goes against the first rule of fig... binary spec :) (simplicity)
I am really trying to nail that middle ground right between "this binary spec is so complex, but optimized, that I question my own existence" and "this text format is so slow to convert that I want to take up crab fishing".
--- In email@example.com, Patrick Maupin <pmaupin@...> wrote:
> I like the idea of "encoded generic number". But one of the purposes
> of a binary encoding is space savings, so I don't think unicode fits
> into that framework.
> If I understand correctly, you only need to encode the 10 digits, and
> "+", "-", "e", "E", and "."
> That's only 15 items, and 4 bits can represent 16 distinct values.
> That leaves you one value left over for an end marker.
> You could easily pack two characters into every byte, and could define
> 0 to be the end marker, and define that if there are an odd number of
> characters, then the number will be followed by 3 nybbles of 0, and if
> there is an even number of characters, then the number will be
> followed by 2 nybbles of 0. In that case, the end of number marker
> will always be a byte of zero, which you can always scan for with any
> C library.
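For reference, the packing scheme quoted above could be sketched roughly as follows in Python. This is only an illustration of the idea, not anything from the spec: the symbol-to-code assignment (codes 1 to 15 in the order shown) and the names `pack_number` and `unpack_number` are assumptions; the quoted message only fixes 0 as the end marker.

```python
# Code 0 is reserved as the end marker, so the 15 symbols map to codes 1..15.
# The ordering of the symbols is an assumption made for this sketch.
SYMBOLS = "0123456789+-eE."
ENCODE = {ch: i + 1 for i, ch in enumerate(SYMBOLS)}
DECODE = {code: ch for ch, code in ENCODE.items()}


def pack_number(text: str) -> bytes:
    """Pack a numeric string two characters per byte, ending in a zero byte."""
    nibbles = [ENCODE[ch] for ch in text]  # raises KeyError on an unsupported character
    # Odd character count: pad with 3 zero nibbles; even count: pad with 2.
    # Either way the total is even and the final byte is 0x00.
    nibbles += [0] * (3 if len(nibbles) % 2 else 2)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_number(data: bytes) -> str:
    """Decode nibbles until the first zero nibble (the end marker)."""
    chars = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            if code == 0:
                return "".join(chars)
            chars.append(DECODE[code])
    raise ValueError("missing end marker")


packed = pack_number("-1.5e+10")
assert unpack_number(packed) == "-1.5e+10"
assert len(packed) == 5        # 8 characters -> 4 data bytes + 1 terminator byte
assert packed.index(0) == 4    # the only zero byte is the terminator
```

So an 8-character number takes 5 bytes here versus 8 bytes as UTF-8 text (before any length prefix). The saving is real, but it comes with a second lookup table and padding rules that every implementation would have to get right.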
- On Mon, Feb 20, 2012 at 9:42 AM, rkalla123 <rkalla@...> wrote:
> Stephan,
> For what it is worth, I also consider support for only signed values a
> No problem; your feedback is still very applicable and much appreciated.
> The additional viewpoint on the signed/unsigned issue was exactly what I was hoping for. My primary goal has always been simplicity, and I know, at least from the Java world, that going with unsigned values would have made the impl distinctly *not* simple (and given it an annoying API).
> So I am glad to get some validation there that I am not alienating every other language for the sake of Java.
-+ Tatu +-