Re: [json] Re: Universal Binary JSON Specification
- Hey all,
It might be worth taking a peek at the AMF0 and AMF3 protocols. The
AMF3 protocol makes a distinction between integer and number data. Taken
from http://osflash.org/documentation/amf3 :
Integer-data is probably the single most used item in AMF3. To save space
it is an integer that can be 1-4 bytes long. The first bit of each of the
first three bytes determines if the next byte is included (1) in this
integer-data or not (0). The last byte, if present, is read completely
(8 bits). The first bits are then removed from the first three bytes and
the remaining bits concatenated to form a big-endian integer.
The integer has a maximum of 29 bits (3*7+8) and a value range of
-268435456 (int.MIN_VALUE>>3) to 268435455 (int.MAX_VALUE>>3).
The integer is negative if it is the full 29 bits long and the first bit
is set (1). This uses two's complement notation and is therefore identical
to normal signed integer behaviour. So if you read the integer into a 32
bit integer, all you will need to do is extend the sign.
0011 0101 = 53
1000 0001 0101 0100 = 212
1000 0110 1100 1010 0011 1111 = 107839
1111 1111 1111 1111 1111 1111 1111 1111 = -1
1100 0000 1000 0000 1000 0000 0000 0000 = -268435456
1011 1111 1111 1111 1111 1111 1111 1111 = 268435455
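To make the rule quoted above concrete, here is a rough Python sketch of a decoder and encoder for that 29-bit integer. It is only my reading of the description (continuation flag in the top bit of the first three bytes, all 8 bits of a fourth byte, sign taken from bit 28 when all four bytes are present). The function names decode_u29 and encode_u29 are made up for illustration and are not part of any AMF library.

```python
from typing import Tuple


def decode_u29(data: bytes) -> Tuple[int, int]:
    """Decode one 29-bit integer; return (value, bytes_consumed)."""
    value = 0
    for i in range(3):
        byte = data[i]
        # Low 7 bits are payload; the top bit says "another byte follows".
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, i + 1
    # A fourth byte, if present, contributes all 8 bits.
    value = (value << 8) | data[3]
    # Only a full 29-bit value can be negative: extend the sign from bit 28.
    if value & 0x10000000:
        value -= 0x20000000
    return value, 4


def encode_u29(value: int) -> bytes:
    """Encode a signed integer in the range -2**28 .. 2**28 - 1."""
    if not -0x10000000 <= value <= 0x0FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    value &= 0x1FFFFFFF  # two's complement, truncated to 29 bits
    if value < 0x80:
        return bytes([value])
    if value < 0x4000:
        return bytes([0x80 | (value >> 7), value & 0x7F])
    if value < 0x200000:
        return bytes([0x80 | (value >> 14),
                      0x80 | ((value >> 7) & 0x7F),
                      value & 0x7F])
    return bytes([0x80 | (value >> 22),
                  0x80 | ((value >> 15) & 0x7F),
                  0x80 | ((value >> 8) & 0x7F),
                  value & 0xFF])
```

With this sketch, decode_u29(bytes([0x81, 0x54])) returns (212, 2). Note that negative numbers always take the full four bytes, which is the price of this particular scheme.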
On 22-9-2011 16:33, Don Owens wrote:
> I forgot to add that encoders should only use the big number format if the
> number is too big to fit in int64 (or int32, depending on which will be the
> largest in the spec) or a double. That way, if a decoder can't handle a
> number larger than int64 anyway, it does not need to implement decoding of
> big numbers -- you don't want a number that will fit in an int32 put into a
> big number format anyway.
> On Thu, Sep 22, 2011 at 7:15 AM, Don Owens <don@...> wrote:
> > Yes, that is what I was getting at. But see comments embedded.
> > On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...> wrote:
> >> Don,
> >> I see your point. The way I understand it is that this would require 2 new
> >> data types, effectively BigInt and BigDecimal.
> >> So say something along these lines:
> >> bigint - marker 'G'
> >> [G][129 big-endian ordered bytes representing a BigInt]
> > It should be mentioned that they are signed ints, but doing two's
> > complement and such is probably too much work. Maybe just specify that the
> > first bit always represents the sign (0 for no sign, 1 for minus).
> >> bigdouble - marker 'W'
> >> [W][222 big-endian ordered bytes representing a BigDecimal]
> > BigDecimal should probably be renamed to something like BigFloat, since
> > decimal is ambiguous (used to mean base-10 and floating point). I'm less
> > familiar with large floating point, but I think a floating point number
> > should consist of a sign bit plus two integers (one for the
> > mantissa/significand and one for the exponent). In the interest of space
> > savings, I think the sign bit should just be included in the exponent and
> > order things so they look similar to the IEEE 754 spec, e.g.,
> > [W][3 big-endian ordered bytes (where first bit is sign bit) of
> > exponent][222 big-endian ordered bytes of mantissa]
> >> Thoughts?
> > In terms of the documentation, I think the big integers and floats should
> > be qualified with a "should implement" instead of a "must implement", since,
> > as others have mentioned, not every encoder and decoder will be able to
> > handle these. I think this matches JSON implementations well. If an
> > encoder does not handle large numbers, it could just throw an error, just as
> > it should throw an error now if an oversized number is encountered in JSON.
> > The same goes for the decoder side. If there is no good way to represent a
> > large number in the language you are working in, throw an error
> > that the number is too large.
> > Have you looked into using variable-length integers for length
> > If you have a lot of short strings (or big numbers, etc.) in your data,
> > these could significantly reduce your space usage (at the cost of more
> > complexity for the developer and CPU). There should be a balance between
> > space efficiency and complexity. Thoughts?
> >> --- In email@example.com, Don Owens <don@...> wrote:
> >> >
> >> > I've seen very large numbers used in JSON. In Perl, that can be represented
> >> > as a Math::BigInt object. And that is the way I have implemented it in my
> >> > JSON module for Perl (JSON::DWIW). Python has arbitrary length integers
> >> > built-in. For my own language that I'm working on, I'm using libgmp in C to
> >> > handle arbitrary length integers.
> >> >
> >> > JSON is used as a data exchange format. I want to be able to do a
> >> > roundtrip, e.g., Python -> encoded -> Python with native integers (of
> >> > arbitrary length in this case). In JSON, this just works, as far as the
> >> > encoding is concerned. I see the need for this in any binary JSON as
> >> > well. If a large number is represented as a string, then on the decoding
> >> > side, you don't know if that was a number or a string (just because it looks
> >> > like a number doesn't mean that the sender means it's a number). If, when
> >> > decoding JSON, the library can't handle large numbers, it has to throw an
> >> > error anyway. The same should go for binary JSON.
> >> >
> >> > ./don
> > --
> > Don Owens
> > don@... <mailto:don%40regexguy.com>
[Non-text portions of this message have been removed]
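For what it's worth, the sign-bit big integer idea from the quoted thread could look roughly like the Python sketch below. Several things here are my assumptions and not anything the thread settled: the one-byte length prefix after the 'G' marker, the function names encode_bigint and decode_bigint, and int64 as the largest fixed-width type. It also follows the suggestion that an encoder should refuse the big format for anything that already fits in a fixed-width integer.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def encode_bigint(value: int) -> bytes:
    """Encode as b'G' + length byte + big-endian magnitude with a sign bit.

    The length byte is a placeholder assumption, not an agreed layout.
    """
    if INT64_MIN <= value <= INT64_MAX:
        raise ValueError("fits in int64; use a fixed-width type instead")
    magnitude = abs(value)
    # Reserve one extra bit so the top bit of the first byte is free
    # to act as the sign flag (0 = positive, 1 = negative).
    nbytes = (magnitude.bit_length() + 1 + 7) // 8
    if nbytes > 255:
        raise ValueError("too large for a one-byte length prefix")
    body = bytearray(magnitude.to_bytes(nbytes, "big"))
    if value < 0:
        body[0] |= 0x80
    return b"G" + bytes([nbytes]) + bytes(body)


def decode_bigint(data: bytes) -> int:
    """Inverse of encode_bigint; raises ValueError on malformed input."""
    if len(data) < 2 or data[:1] != b"G":
        raise ValueError("not a big integer value")
    nbytes = data[1]
    body = bytearray(data[2:2 + nbytes])
    if nbytes == 0 or len(body) != nbytes:
        raise ValueError("truncated big integer")
    negative = bool(body[0] & 0x80)
    body[0] &= 0x7F  # clear the sign flag before reading the magnitude
    magnitude = int.from_bytes(body, "big")
    return -magnitude if negative else magnitude
```

A decoder in a language without arbitrary-length integers could stop after reading the marker and throw a "number too large" error, which is the behaviour suggested in the thread.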
- On Mon, Feb 20, 2012 at 9:42 AM, rkalla123 <rkalla@...> wrote:
> Stephan,
> For what it is worth, I also consider support for only signed values a
> No problem; your feedback is still very applicable and much appreciated.
> The additional view-point on the signed/unsigned issue was exactly what I was hoping for. My primary goal has always been simplicity and I know at least from the Java world, going with unsigned values would have made the impl distinctly *not* simple (and an annoying API).
> So I am glad to get some validation there that I am not alienating every other language at the cost of Java.
-+ Tatu +-