Need for "abstract data model" to support JSON, web services ("useful parts of w3c schema", not validation)

  • Tatu Saloranta
    Message 1 of 8 , Apr 25 8:58 AM
      One thing that is currently missing (AFAIK) from JSON stack is the
      equivalent of useful parts of W3C Schema ("xml schema").
      I know there is a JSON Schema effort underway already, but if I
      understand things correctly, its focus is on validation aspects, not
      on data typing or modeling.
      Although W3C Schema can be used for validation (as can DTDs and
      RelaxNG, the latter of which does this better), to me the main value
      of Schema is as an abstract data typing/model.

      What I mean by this is that the only real advantage of, say, SOAP over
      a similar but simpler JSON+REST approach is that of having a
      language-agnostic data model that can be used for full object binding
      and serialization, including code generation if need be. You can
      (painfully) define such a model, and then generate or bind conforming
      data to objects.
      This helps in interoperability as you can describe data content in
      language+platform independent way, but with ability to get specific
      bindings reliably.
      With JSON you can already do data mapping/binding quite well, except
      for one area of problems: that of handling polymorphism (inheritance).
      Although that can be worked around on language-by-language basis
      (adding "class" element in Maps), there is no generic way to do it.

      Another thing that I feel most existing Schema languages get wrong is
      the false goal of having to be expressed in format being described
      (XML schema written in xml). Instead, notation absolutely should be a
      DSL: right tool for the job. This is especially crucial for JSON
      because of its simplicity: trying to shoehorn a definition language in
      JSON seems unnecessarily cumbersome.
      It is ok to have a one-to-one mapping between optimal DSL and JSON (or
      xml) if need be -- RelaxNG shows a good way of doing that with its
      compact (non-xml) notation, and equivalent XML serialization.

      So I would be interested in finding a solution for this problem -- it
      seems like the last missing selling point for JSON-WS, to be used for
      the use case of external entities communication over such an interface
      (less needed for internal integration where more concrete interfaces,
      client libs etc, can be used).

      One more thing: defining such a notation might not be extremely
      difficult -- since validation is NOT the main focus, range & size
      limitations could be omitted, or kept very simple; more important is
      JSON-primitive/object-inheritance data-typing and structural
      definitions.
      Validation aspects can be handled separately; or as an add-on.

      -+ Tatu +-
    • John Cowan
      Message 2 of 8 , Apr 27 10:14 AM
        Tatu Saloranta scripsit:

        > One thing that is currently missing (AFAIK) from JSON stack is the
        > equivalent of useful parts of W3C Schema ("xml schema").

        Very interesting. I've been thinking about this a bit, and here's a
        sketch of what I've come up with. I'm assuming that the target language
        looks like C++, Java, or C#: that is, objects have statically typed
        members, there exists a subtyping relationship between types such that
        a subtype has all the members of its supertypes, and Null (the type of
        null) is a subtype of every object type. Note that it doesn't matter
        whether the language provides single or multiple inheritance.

        In XML, data binding is based on the element name (for DTD-based systems)
        or on the element name plus the types of ancestor elements (for W3C XML
        Schema based systems). JSON objects have no convenient analogue of the
        element name, which is not only in a distinguished place in the element,
        but is up front, making data binding on the fly practical. We could just
        require a "name" key in each object, but that would be IMHO against the
        spirit of JSON.

        So let's instead write down for each JSON object type a predicate that
        identifies it. Here's a tentative list of primitive predicates:

        isNull(keyName)
        isUndefined(keyName)
        isBoolean(keyName)
        isTrue(keyName)
        isFalse(keyName)
        isNumeric(keyName)
        hasNumericValue(keyName, number)
        hasStringValue(keyName, number)
        hasParentTypes(typename, ...)

        The hasParentTypes predicate accepts an arbitrary number of type names,
        which are in order the type of the parent object, the grandparent object, ....
        This predicate is required for WXS-equivalent capability; omitting
        it gives DTD-equivalent capability. In addition, the usual and, or,
        and not operators can be applied to construct complex predicates from
        these primitives. The type-assignment engine would have to validate
        that no object can be assigned to more than one independent type.
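
A minimal sketch of how a few of the primitive predicates above might look, written in Python purely for illustration. The function names, the `all_of`/`any_of`/`negate` combinators, and the `assign_type` helper are assumptions of this sketch, not part of any proposal in the thread. `hasParentTypes` is left out because it needs the types of enclosing objects, and the uniqueness check here runs per object at binding time rather than statically over the type definitions.

```python
from typing import Any, Callable, Dict

JsonObject = Dict[str, Any]
Predicate = Callable[[JsonObject], bool]


def is_null(key: str) -> Predicate:
    # The key is present and its value is JSON null.
    return lambda obj: key in obj and obj[key] is None


def is_undefined(key: str) -> Predicate:
    # The key is absent from the object.
    return lambda obj: key not in obj


def is_boolean(key: str) -> Predicate:
    return lambda obj: isinstance(obj.get(key), bool)


def is_numeric(key: str) -> Predicate:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return lambda obj: (isinstance(obj.get(key), (int, float))
                        and not isinstance(obj.get(key), bool))


def has_string_value(key: str, value: str) -> Predicate:
    return lambda obj: isinstance(obj.get(key), str) and obj[key] == value


def all_of(*preds: Predicate) -> Predicate:
    return lambda obj: all(p(obj) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda obj: any(p(obj) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda obj: not pred(obj)


def assign_type(obj: JsonObject, types: Dict[str, Predicate]) -> str:
    # Exactly one independent type may claim the object.
    matches = [name for name, pred in types.items() if pred(obj)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching type, got {matches}")
    return matches[0]


# Hypothetical types, told apart by structure rather than by a "name" key.
TYPES = {
    "Circle": all_of(is_numeric("radius"), is_undefined("side")),
    "Square": all_of(is_numeric("side"), is_undefined("radius")),
}
```

With these definitions, `assign_type({"radius": 1.5}, TYPES)` picks `Circle`, while an object carrying both `radius` and `side` matches neither type and is rejected.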

        In addition to the predicate, each type also has a set of key->type
        mappings, which will be validated during type assignment. The type in
        such a mapping can be a primitive type (boolean, number), an object
        type, an array type, or undefined. So if a type specifies that key
        x is boolean, then it must be so. If it specifies that key y is of
        type z, then we make sure the value of key y matches the predicate
        for type z or any of the subtypes of z, or is null. Arrays come in
        four flavors: general arrays, boolean arrays, numeric arrays, and
        arrays-of-object-type-x. Undefined means that the key-value pair is
        not represented in the object model.
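
The key->type check described above could be sketched roughly as follows. The string notation for type specs (`"boolean"`, `"array:number"`, and so on) is invented for this example, and treating a missing key as an error is an assumption; the text does not say how absent keys should be handled. Only the declared type's predicate is tested for object-typed keys: if subtype predicates are ANDed with their supertype's, any subtype instance also satisfies the supertype predicate.

```python
from typing import Any, Callable, Dict

Predicate = Callable[[Dict[str, Any]], bool]


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def value_matches(value: Any, spec: str, predicates: Dict[str, Predicate]) -> bool:
    if spec == "boolean":
        return isinstance(value, bool)
    if spec == "number":
        return is_number(value)
    if spec == "array":  # general array, elements unchecked
        return isinstance(value, list)
    if spec.startswith("array:"):  # boolean, numeric, or object-typed array
        element_spec = spec[len("array:"):]
        return isinstance(value, list) and all(
            value_matches(v, element_spec, predicates) for v in value)
    # Otherwise spec names an object type: null is always acceptable.
    if value is None:
        return True
    return isinstance(value, dict) and predicates[spec](value)


def check_mappings(obj: Dict[str, Any], mappings: Dict[str, str],
                   predicates: Dict[str, Predicate]) -> Dict[str, Any]:
    bound = {}
    for key, spec in mappings.items():
        if spec == "undefined":
            continue  # the key is not represented in the object model
        if key not in obj:  # assumption: a declared key must be present
            raise ValueError(f"missing key {key!r}")
        if not value_matches(obj[key], spec, predicates):
            raise ValueError(f"key {key!r} does not match type {spec!r}")
        bound[key] = obj[key]
    return bound


# Hypothetical object type and mappings.
PREDICATES = {"Point": lambda o: is_number(o.get("x")) and is_number(o.get("y"))}
MAPPINGS = {"visible": "boolean", "center": "Point",
            "weights": "array:number", "comment": "undefined"}
```

Here `check_mappings` returns only the keys that take part in the object model, so a `"comment"` entry in the input is silently dropped.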

        We must have a lattice of subtype-supertype mappings. For convenience,
        it's probably best to specify in the mapping language one or more
        supertypes for each type, with the understanding that the key->type
        mappings of the supertype(s) are also assumed for the subtype. This is
        just a notational convenience, but OO programmers expect it. In addition,
        the subtype predicate as written is logically ANDed with the supertype
        predicate(s) to form the effective predicate for the subtype. This
        hopefully preserves the Liskov substitution principle for the bound types.
        This is the only exception to the rule that type predicates must be
        independent.
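
One way to picture the supertype rule, as a hedged sketch: each type carries its own predicate and mappings, and the effective versions are derived by walking the supertypes. `TypeDef`, `most_specific`, and the `Shape`/`Circle` types are hypothetical names for this example. Choosing the most specific matching type is one reading of how the "only exception" to predicate independence could be resolved; it is not spelled out in the message.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Predicate = Callable[[Dict[str, Any]], bool]


@dataclass
class TypeDef:
    name: str
    predicate: Predicate
    mappings: Dict[str, str] = field(default_factory=dict)
    supertypes: List["TypeDef"] = field(default_factory=list)

    def effective_predicate(self, obj: Dict[str, Any]) -> bool:
        # Own predicate ANDed with the effective predicate of every supertype.
        return self.predicate(obj) and all(
            s.effective_predicate(obj) for s in self.supertypes)

    def effective_mappings(self) -> Dict[str, str]:
        # Supertype mappings are inherited; own mappings are applied last.
        merged: Dict[str, str] = {}
        for s in self.supertypes:
            merged.update(s.effective_mappings())
        merged.update(self.mappings)
        return merged

    def is_subtype_of(self, other: "TypeDef") -> bool:
        return any(s is other or s.is_subtype_of(other) for s in self.supertypes)


def most_specific(obj: Dict[str, Any], types: List[TypeDef]) -> TypeDef:
    matches = [t for t in types if t.effective_predicate(obj)]
    # Drop any match that is a supertype of another match.
    leaves = [t for t in matches if not any(u.is_subtype_of(t) for u in matches)]
    if len(leaves) != 1:
        raise ValueError(f"ambiguous or no type: {[t.name for t in leaves]}")
    return leaves[0]


shape = TypeDef("Shape", lambda o: "id" in o, {"id": "number"})
circle = TypeDef("Circle", lambda o: "radius" in o, {"radius": "number"}, [shape])
```

An object with both `id` and `radius` satisfies the predicates of `Shape` and `Circle`, and `most_specific` resolves it to `Circle`; an object with only `radius` matches neither, because the inherited `Shape` predicate fails.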

        Does this look like what you had in mind?

        --
        Her he asked if O'Hare Doctor tidings sent from far     John Cowan
        coast and she with grameful sigh him answered that      http://ccil.org/~cowan
        O'Hare Doctor in heaven was. Sad was the man that word  cowan@...
        to hear that him so heavied in bowels ruthful. All
        she there told him, ruing death for friend so young,    James Joyce, Ulysses
        algate sore unwilling God's rightwiseness to withsay.   "Oxen of the Sun"
      • John David Duncan
        Message 3 of 8 , Apr 27 10:36 AM
          Tatu,

          I don't really understand XML schema or WSDL (which I think you're
          referring to), so I could be quite off target here ...

          but maybe Google Protocol Buffers fit the problem?

          http://code.google.com/apis/protocolbuffers/docs/proto.html

          JD



          On Apr 25, 2009, at 8:58 AM, Tatu Saloranta wrote:

          > One thing that is currently missing (AFAIK) from JSON stack is the
          > equivalent of useful parts of W3C Schema ("xml schema").
          > I know there is a JSON Schema effort underway already, but if I
          > understand things correctly, its focus on validation aspects, and not
          > data typing or modeling .
          > Although w3c Schema can be used for that (as can DTDs and RelaxNG,
          > latter of which does this better), to me the main value of Schema is
          > as abstract data typing/model.
          >
          > What I mean by this is that the only real advantage of, say, SOAP over
          > similar simpler JSON+Rest approach is that of having language-agnostic
          > data model that can be used for full object binding and serialization;
          > including code generation if need be. You can (painfully) define such
          > model, and then generate or bind conforming data to objects.
          > This helps in interoperability as you can describe data content in
          > language+platform independent way, but with ability to get specific
          > bindings reliably.
          > With JSON you can already do data mapping/binding quite well, except
          > for one area of problems: that of handling polymorphism (inheritance).
          > Although that can be worked around on language-by-language basis
          > (adding "class" element in Maps), there is no generic way to do it.
          >
          > Another thing that I feel most existing Schema languages get wrong is
          > the false goal of having to be expressed in format being described
          > (XML schema written in xml). Instead, notation absolutely should be a
          > DSL: right tool for the job. This is especially crucial for JSON
          > because of its simplicity: trying to shoehorn a definition language in
          > JSON seems unnecessarily cumbersome.
          > It is ok to have a one-to-one mapping between optimal DSL and JSON (or
          > xml) if need be -- RelaxNG shows a good way of doing that with its
          > compact (non-xml) notation, and equivalent XML serialization.
          >
          > So I would be interested in finding a solution for this problem -- it
          > seems like the last missing selling point for JSON-WS, to be used for
          > the use case of external entities communication over such an interface
          > (less needed for internal integration where more concrete interfaces,
          > client libs etc, can be used).
          >
          > One more thing: defining such a notation might not be extremely
          > difficult -- since validation is NOT the main focus, range & size
          > limitations could be omitted, or kept very simple; more important is
          > JSON-primitive/object-inheritance data-typing and structural
          > definitions.
          > Validation aspects can be handled separately; or as an add-on.
          >
          > -+ Tatu +-
        • Tatu Saloranta
          Message 4 of 8 , Apr 27 11:17 AM
            On Mon, Apr 27, 2009 at 10:14 AM, John Cowan <cowan@...> wrote:
            > Tatu Saloranta scripsit:
            >
            >> One thing that is currently missing (AFAIK) from JSON stack is the
            >> equivalent of useful parts of W3C Schema ("xml schema").
            >
            > Very interesting. I've been thinking about this a bit, and here's a
            > sketch of what I've come up with. I'm assuming that the target language
            > looks like C++, Java, or C#: that is, objects have statically typed

            Exactly, since more dynamic languages can use "duck typing", or more
            loose conversions. It's static languages that need extra help.

            > members, there exists a subtyping relationship between types such that
            > a subtype has all the members of its supertypes, and Null (the type of
            > null) is a subtype of every object type. Note that it doesn't matter

            Yes. I don't think nulls should be typed; and probably only limitation there

            > whether the language provides single or multiple inheritance.

            Hmmh. I have to think about that a bit -- you may be right. I have
            only considered single-inheritance case so far.

            > In XML, data binding is based on the element name (for DTD-based systems)
            > or on the element name plus the types of ancestor elements (for W3C XML
            > Schema based systems). JSON objects have no convenient analogue of the
            > element name, which is not only in a distinguished place in the

            True. But part of it is just "off by one" problem -- there's no name
            at root level, but beyond this, there are field names that can be
            mapped to schema contextually.

            One thing I forgot to mention is that I am effectively using Java
            objects as the schema: there, get/set methods and fields do have
            names that you can match to JSON object property names (with
            overriding by annotations or other configuration).
            The same cannot be done with an explicit schema, however; there the
            problem has to be resolved differently, as you explain.

            > but is up front, making data binding on the fly practical. We could just
            > require a "name" key in each object, but that would be IMHO against the
            > spirit of JSON.

            Agreed, that seems wrong.

            > So let's instead write down for each JSON object type a predicate that
            > identifies it. Here's a tentative list of primitive predicates:
            >
            > isNull(keyName)

            Would this mean it has to be null, or that it is nullable (allowsNull)?

            > isUndefined(keyName)

            does this mean "any type"? Sort of fallback, xml "any" type.

            > isBoolean(keyName)
            > isTrue(keyName)
            > isFalse(keyName)
            > isNumeric(keyName)

            perhaps also isInteger/integral?

            > hasNumericValue(keyName, number)
            > hasStringValue(keyName, number)

            (string instead of 'number')

            > hasParentTypes(typename, ...)
            >
            > The hasParentTypes predicate accepts an arbitrary number of type names,
            > which are in order the type of the parent object, the grandparent object, ....
            > This predicate is required for WXS-equivalent capability; omitting
            > it gives DTD-equivalent capability. In addition, the usual and, or,
            > and not operators can be applied to construct complex predicates from
            > these primitives. The type-assignment engine would have to validate
            > that no object can be assigned to more than one independent type.

            Yes.

            > In addition to the predicate, each type also has a set of key->type
            > mappings, which will be validated during type assignment. The type in
            ...
            > it's probably best to specify in the mapping language one or more
            > supertypes for each type, with the understanding that the key->type
            > mappings of the supertype(s) are also assumed for the subtype. This is
            > just a notational convenience, but OO programmers expect it. In addition,
            > the subtype predicate as written is logically ANDed with the supertype
            > predicate(s) to form the effective predicate for the subtype. This
            > hopefully preserves the Liskov substitution principle for the bound types.
            > This is the only exception to the rule that type predicates must be
            > independent.
            >
            > Does this look like what you had in mind?

            It sounds good so far, although I need to read this with more thought.
            You have thought more deeply about hard problems than I have, I think. :)

            Some additional constraints, compared to other schema work that I
            thought of were:

            - Not allowing union types that are not mappable to OO ("value can be
            either an array or boolean"): this just means there are legal JSON
            constructs for which no strict can be defined.
            This is in line with the idea of "schema" being geared towards
            mapping to/from OO instances, not for constraining general JSON
            content.

            -+ Tatu +-
          • Tatu Saloranta
            Message 5 of 8 , Apr 27 11:20 AM
              On Mon, Apr 27, 2009 at 10:36 AM, John David Duncan
              <john.david.duncan@...> wrote:
              > Tatu,
              >
              > I don't really understand XML schema or WSDL (which I think you're
              > referring to), so I could be quite off target here ...
              >
              > but maybe Google Protocol Buffers fit the problem?
              >
              > http://code.google.com/apis/protocolbuffers/docs/proto.html

              I don't think it is directly applicable: in PB, the schema has a
              much more fundamental role, being mandatory for operating anything
              and for generating code (unless I'm mistaken). So in a way it is a
              strict schema-first format. I am hoping for something that allows
              both schema-first and code-first approaches, somewhat similar to
              XML Schema in that respect.

              But maybe some aspects of PB might be useful: syntax, other ideas
              regarding versioning?

              -+ Tatu +-
            • John Cowan
              Message 6 of 8 , Apr 27 11:35 AM
                Tatu Saloranta scripsit:

                > > isNull(keyName)
                >
                > Would this mean it has to be null, or that it is nullable (allowsNull)?

                All these predicates are on the JSON object itself, because they are used
                before the object's type is known. So isNull("foo") means that there
                is a key "foo" in the JSON object whose value is null, and likewise
                with all the other primitives.

                > > isUndefined(keyName)
                >
                > does this mean "any type"? Sort of fallback, xml "any" type.

                No, it means that the specified key does not exist. I use the name
                "undefined" for cultural compatibility with JavaScript. There would be
                little point in an isAnyType predicate, because it would match anything
                *except* a missing key.

                > > isNumeric(keyName)
                >
                > perhaps also isInteger/integral?

                That's reasonable. In that case it would be useful to be able to map
                keys to integer as well, and perhaps standardized subtypes of integer too.

                > - [U]nion types that are not mappable to OO ("value can be
                > either an array or boolean"): this just means there are legal JSON
                > constructs for which no strict can be defined.

                Multiple inheritance is a subset of this (doesn't handle primitives).
                If type foobar is a subtype of types foo and bar, then effectively it
                is a union of them.

                One question: should the predicates and the maps do on-the-fly conversion?
                For example, if we specify a predicate of hasNumericValue("foo", 0),
                does "foo": "0" match, or do we require "foo": 0? Likewise, when
                mapping, if the map specifies bar->String and the object contains
                "bar": 123, does the bar field get "123", or is that an error?
                These questions are independent. I'd favor doing the conversions.
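
To make the two questions concrete, here is a small sketch in which conversion is a switch on both the predicate and the mapping step. The `lenient` flag and the function names are assumptions of this example; with `lenient=True` the behaviour follows the "do the conversions" preference stated above.

```python
import json
from typing import Any, Callable, Dict

Predicate = Callable[[Dict[str, Any]], bool]


def has_numeric_value(key: str, number: float, lenient: bool = True) -> Predicate:
    def predicate(obj: Dict[str, Any]) -> bool:
        if key not in obj:
            return False
        value = obj[key]
        if isinstance(value, bool):  # true/false are never numbers here
            return False
        if isinstance(value, (int, float)):
            return value == number
        if lenient and isinstance(value, str):
            # Lenient mode: "0" is allowed to match 0.
            try:
                return float(value) == number
            except ValueError:
                return False
        return False
    return predicate


def bind_string(value: Any, lenient: bool = True) -> str:
    # Maps a JSON value onto a String-typed field.
    if isinstance(value, str):
        return value
    if lenient and isinstance(value, (int, float)) and not isinstance(value, bool):
        return json.dumps(value)  # 123 becomes "123"
    raise TypeError(f"cannot bind {value!r} to a string field")
```

Because the two switches are separate, a binding could convert while mapping but still match predicates strictly, or the other way round.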

                --
                You let them out again, Old Man Willow!                John Cowan
                What you be a-thinking of? You should not be waking!   cowan@...
                Eat earth! Dig deep! Drink water! Go to sleep!
                Bombadil is talking.                                   http://ccil.org/~cowan
              • Kris Zyp
                Message 7 of 8 , Apr 27 2:25 PM
                  Tatu Saloranta wrote:
                  >
                  >
                  > One thing that is currently missing (AFAIK) from JSON stack is the
                  > equivalent of useful parts of W3C Schema ("xml schema").
                  > I know there is a JSON Schema effort underway already, but if I
                  > understand things correctly, its focus on validation aspects, and not
                  > data typing or modeling .
                  > Although w3c Schema can be used for that (as can DTDs and RelaxNG,
                  > latter of which does this better), to me the main value of Schema is
                  > as abstract data typing/model.

                  JSON Schema can certainly be used for that purpose; I am using JSON
                  Schema for typing in Dojo and Persevere, and not just for validating
                  existing JSON structures.

                  > What I mean by this is that the only real advantage of, say, SOAP over
                  > similar simpler JSON+Rest approach is that of having language-agnostic
                  > data model that can be used for full object binding and serialization;

                  I am not sure if this is exactly what you are talking about, but we
                  have been discussing using JSON Schema as a way to define hyperlink
                  properties in JSON structures to provide interoperability in RESTful JSON:
                  -------------------------------------

                  > including code generation if need be. You can (painfully) define such
                  > model, and then generate or bind conforming data to objects.
                  > This helps in interoperability as you can describe data content in
                  > language+platform independent way, but with ability to get specific
                  > bindings reliably.
                  > With JSON you can already do data mapping/binding quite well, except
                  > for one area of problems: that of handling polymorphism (inheritance).
                  > Although that can be worked around on language-by-language basis
                  > (adding "class" element in Maps), there is no generic way to do it.

                  The "extends" attribute of JSON Schema is designed specifically to meet
                  the needs of describing data structures with inheritance. Is this
                  insufficient for your polymorphic description needs?
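
As a rough illustration of using "extends" for data typing, the sketch below flattens a chain of schemas into one property map. It assumes the parent schema is held inline under "extends" and that a subtype's properties are simply the parent's plus its own; this is a simplification for illustration, not a statement of JSON Schema's normative semantics. The `animal` and `dog` schemas and the `effective_properties` helper are hypothetical.

```python
from typing import Any, Dict

Schema = Dict[str, Any]


def effective_properties(schema: Schema) -> Dict[str, Any]:
    # Collect inherited properties first so the subtype can refine them.
    merged: Dict[str, Any] = {}
    parent = schema.get("extends")
    if parent is not None:
        merged.update(effective_properties(parent))
    merged.update(schema.get("properties", {}))
    return merged


animal: Schema = {"properties": {"name": {"type": "string"}}}
dog: Schema = {"extends": animal,
               "properties": {"breed": {"type": "string"}}}
```

A code generator could use such a flattened view to emit a `Dog` class with both `name` and `breed` members, or keep the chain to emit `Dog` as a subclass of `Animal`.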

                  > Another thing that I feel most existing Schema languages get wrong is
                  > the false goal of having to be expressed in format being described
                  > (XML schema written in xml). Instead, notation absolutely should be a
                  > DSL: right tool for the job. This is especially crucial for JSON
                  > because of its simplicity: trying to shoehorn a definition language in
                  > JSON seems unnecessarily cumbersome.
                  > It is ok to have a one-to-one mapping between optimal DSL and JSON (or
                  > xml) if need be -- RelaxNG shows a good way of doing that with its
                  > compact (non-xml) notation, and equivalent XML serialization.
                  >
                  > So I would be interested in finding a solution for this problem -- it
                  > seems like the last missing selling point for JSON-WS, to be used for
                  > the use case of external entities communication over such an interface
                  > (less needed for internal integration where more concrete interfaces,
                  > client libs etc, can be used).

                  "JSON-WS" hints more at RPC type communication, which is what SMD is
                  designed for (which uses JSON Schema):


                  > One more thing: defining such a notation might not be extremely
                  > difficult -- since validation is NOT the main focus, range & size
                  > limitations could be omitted, or kept very simple; more important is
                  > JSON-primitive/object-inheritance data-typing and structural
                  > definitions.
                  > Validation aspects can be handled separately; or as an add-on.

                  JSON Schema can be as simple as the subset you want to use. {} is a
                  valid schema (although not particularly useful). We have discussed
                  formally defining a subset of JSON Schema for more traditional
                  data-typing style constraints, but after some discussion of how easy
                  it is to ignore non-relevant parts of the spec, it seemed like kind
                  of a silly exercise.
                  Of course, if you want to define a non JSON-based data type modeling
                  system, that is indeed a completely different matter and exercise. Or
                  maybe I am missing what you are really after.

                  Kris
                • Tatu Saloranta
                  Message 8 of 8 , Apr 27 10:55 PM
                    On Mon, Apr 27, 2009 at 2:25 PM, Kris Zyp <kriszyp@...> wrote:
                    >
                    > Tatu Saloranta wrote:
                    >>
                    >> One thing that is currently missing (AFAIK) from JSON stack is the
                    >> equivalent of useful parts of W3C Schema ("xml schema").
                    ...
                    >
                    > JSON Schema can certainly be used for that purpose, I am using JSON
                    > Schema for typing in Dojo and Persevere, and not just for validating
                    > existing JSON structures.

                    Ok. I did notice 'extends' property (but only after sending email),
                    which when combined with other pieces should allow for defining type
                    structures?

                    >> What I mean by this is that the only real advantage of, say, SOAP over
                    >> similar simpler JSON+Rest approach is that of having language-agnostic
                    >> data model that can be used for full object binding and serialization;
                    >
                    > I am not sure if this exactly what you are talking, but we have been
                    > discussing using JSON Schema as a way to define hyperlink properties in
                    > JSON structures to provide interoperability in RESTful JSON:

                    Basically I am thinking of supporting both a schema-first model
                    (generating classes/stubs from a definition) and a code-first one
                    (annotating code to produce a schema).
                    And for Java, perhaps generating bean-validation annotations in the
                    schema-first case.

                    Perhaps what would be nice would be more along the lines of providing
                    more convenient syntax to express already defined json schema model.

                    ...
                    > The "extends" attribute of JSON Schema is designed specifically to meet
                    > the needs of describing data structures with inheritance. Is this
                    > insufficient for your polymorphic description needs?

                    Probably not -- initially I just missed it. :-/

                    It does however need to be coupled with some definition of how to
                    include type identifier in json content ("#type" field? standard name,
                    or perhaps configurable with schema). Is there something in Json
                    schema to define this? Or if not, could it be added?
                    The type identifier is needed to deserialize an instance of a
                    specific sub-type: on serialization the full type is known, but on
                    deserialization only the declared type is known, so the full type
                    needs to be available from the json content (this can be optimized
                    out for leaf types).
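
A small sketch of the type-identifier idea, using the "#type" field name floated above. The `Animal`/`Dog`/`Cat` classes, the `REGISTRY` lookup, and the helper names are all hypothetical. The serializer always writes the identifier for simplicity; the deserializer falls back to the declared type when the field is missing, which is what lets a writer omit it for leaf types.

```python
import json
from dataclasses import asdict, dataclass
from typing import Type

TYPE_KEY = "#type"  # could equally be configurable per schema


@dataclass
class Animal:
    name: str


@dataclass
class Dog(Animal):
    breed: str = ""


@dataclass
class Cat(Animal):
    indoor: bool = False


# Maps type identifiers found in JSON content to concrete classes.
REGISTRY = {cls.__name__: cls for cls in (Animal, Dog, Cat)}


def serialize(obj: Animal) -> str:
    # On serialization the full runtime type is known, so it is written out.
    data = asdict(obj)
    data[TYPE_KEY] = type(obj).__name__
    return json.dumps(data)


def deserialize(text: str, declared: Type[Animal] = Animal) -> Animal:
    data = json.loads(text)
    # Without a type identifier, only the declared type is available.
    type_name = data.pop(TYPE_KEY, declared.__name__)
    cls = REGISTRY[type_name]
    if not issubclass(cls, declared):
        raise TypeError(f"{type_name} is not a subtype of {declared.__name__}")
    return cls(**data)
```

Round-tripping `Dog("Rex", "collie")` through `serialize` and `deserialize` yields a `Dog` again even though the declared type is only `Animal`.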

                    ...
                    > "JSON-WS" hints more at RPC type communication, which is what SMD is
                    > designed for (which uses JSON Schema):

                    True, I should have clarified that. This would be the data/type model
                    that is useful (and sort of required) for WS, but specifically not the
                    web service definition itself (no end points, operations etc defined).
                    Like XML Schema part that Soap/WSDL build on.

                    ...
                    > JSON Schema can as simple as the subset you want to use. {} is a valid
                    > schema (although not particularly useful). We have discussed formally
                    > defining a subset of JSON Schema for more traditional data-typing style
                    > constraints, but when after some discussion of how easy it is to ignore
                    > non-relevant parts of the spec, it seemed like kind of a silly exercise.
                    > Of course, if you want to define a non JSON-based data type modeling
                    > system, that is indeed a completely different matter and exercise. Or
                    > maybe I am missing what you are really after.

                    Ok, thanks. Sub-setting could definitely work. I need to read json
                    schema proposal with some more thought. I am interested in syntax
                    part, as well as some limitations on what kinds of schemas would be
                    allowed -- certain kinds of unions might not be representable on OO
                    side.

                    Also: is the use of json as format for schema a fundamental goal? To
                    me it seems that there are benefits to using more compact and
                    expressive notation, but not much benefit from using JSON (i.e.
                    complexity is not within parsing of schema but in processing it).
                    I don't mind having a json serialization (RelaxNG - like "dual"
                    model), but really like the idea of having something more optimal for
                    the domain.

                    -+ Tatu +-