
Re: [json] Need for "abstract data model" to support JSON, web services ("useful parts of w3c schema", not validation)

  • John Cowan
    Message 1 of 8 , Apr 27, 2009
      Tatu Saloranta scripsit:

      > One thing that is currently missing (AFAIK) from JSON stack is the
      > equivalent of useful parts of W3C Schema ("xml schema").

      Very interesting. I've been thinking about this a bit, and here's a
      sketch of what I've come up with. I'm assuming that the target language
      looks like C++, Java, or C#: that is, objects have statically typed
      members, there exists a subtyping relationship between types such that
      a subtype has all the members of its supertypes, and Null (the type of
      null) is a subtype of every object type. Note that it doesn't matter
      whether the language provides single or multiple inheritance.

      In XML, data binding is based on the element name (for DTD-based systems)
      or on the element name plus the types of ancestor elements (for W3C XML
      Schema based systems). JSON objects have no convenient analogue of the
      element name, which is not only in a distinguished place in the element,
      but is up front, making data binding on the fly practical. We could just
      require a "name" key in each object, but that would be IMHO against the
      spirit of JSON.

      So let's instead write down for each JSON object type a predicate that
      identifies it. Here's a tentative list of primitive predicates:

      isNull(keyName)
      isUndefined(keyName)
      isBoolean(keyName)
      isTrue(keyName)
      isFalse(keyName)
      isNumeric(keyName)
      hasNumericValue(keyName, number)
      hasStringValue(keyName, number)
      hasParentTypes(typename, ...)

      The hasParentTypes predicate accepts an arbitrary number of type names,
      which are in order the type of the parent object, the grandparent object, ....
      This predicate is required for WXS-equivalent capability; omitting
      it gives DTD-equivalent capability. In addition, the usual and, or,
      and not operators can be applied to construct complex predicates from
      these primitives. The type-assignment engine would have to validate
      that no object can be assigned to more than one independent type.
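A minimal Python sketch of how these primitives and the and/or/not operators might compose. It assumes each predicate is a function from a parsed JSON object (a dict) to a boolean, and that isNull means "the key is present and its value is null", as clarified later in this thread. The combinator names `all_of`, `any_of`, and `negate` are hypothetical, and hasParentTypes is left out because it needs the ancestor context.

```python
from typing import Any, Callable, Dict

JsonObject = Dict[str, Any]
Predicate = Callable[[JsonObject], bool]


def is_null(key: str) -> Predicate:
    # Key is present and its value is JSON null.
    return lambda obj: key in obj and obj[key] is None


def is_undefined(key: str) -> Predicate:
    # Key is absent from the object.
    return lambda obj: key not in obj


def is_boolean(key: str) -> Predicate:
    return lambda obj: isinstance(obj.get(key), bool)


def is_true(key: str) -> Predicate:
    return lambda obj: obj.get(key) is True


def is_false(key: str) -> Predicate:
    return lambda obj: obj.get(key) is False


def is_numeric(key: str) -> Predicate:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return lambda obj: (isinstance(obj.get(key), (int, float))
                        and not isinstance(obj.get(key), bool))


def has_numeric_value(key: str, number: float) -> Predicate:
    numeric = is_numeric(key)
    return lambda obj: numeric(obj) and obj[key] == number


def has_string_value(key: str, text: str) -> Predicate:
    return lambda obj: isinstance(obj.get(key), str) and obj[key] == text


# Hypothetical names for the "and", "or", and "not" operators.
def all_of(*preds: Predicate) -> Predicate:
    return lambda obj: all(p(obj) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda obj: any(p(obj) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda obj: not pred(obj)


# Two illustrative, mutually exclusive type predicates.
is_point = all_of(is_numeric("x"), is_numeric("y"), is_undefined("radius"))
is_circle = all_of(is_numeric("x"), is_numeric("y"), is_numeric("radius"))
```

Because `is_point` requires "radius" to be absent and `is_circle` requires it to be numeric, no object can satisfy both, which is the independence property the type-assignment engine would have to verify.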

      In addition to the predicate, each type also has a set of key->type
      mappings, which will be validated during type assignment. The type in
      such a mapping can be a primitive type (boolean, number), an object
      type, an array type, or undefined. So if a type specifies that key
      x is boolean, then it must be so. If it specifies that key y is of
      type z, then we make sure the value of key y matches the predicate
      for type z or any of the subtypes of z, or is null. Arrays come in
      four flavors: general arrays, boolean arrays, numeric arrays, and
      arrays-of-object-type-x. Undefined means that the key-value pair is
      not represented in the object model.
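The key->type check could be sketched along these lines in Python. The type-name strings ("boolean", "number[]", and so on) and the function name `conforms` are hypothetical. Object types and arrays of an object type are omitted because they need the type predicates, and treating a missing key as a failure is an assumption rather than something stated above.

```python
from typing import Any, Callable, Dict


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical names for the primitive types and three of the array flavors.
CHECKS: Dict[str, Callable[[Any], bool]] = {
    "boolean": lambda v: isinstance(v, bool),
    "number": is_number,
    "array": lambda v: isinstance(v, list),
    "boolean[]": lambda v: isinstance(v, list)
    and all(isinstance(e, bool) for e in v),
    "number[]": lambda v: isinstance(v, list) and all(is_number(e) for e in v),
}


def conforms(obj: Dict[str, Any], mapping: Dict[str, str]) -> bool:
    """Return True if every mapped key in obj has a value of the mapped type."""
    for key, type_name in mapping.items():
        if type_name == "undefined":
            # The pair is not represented in the object model, so skip it.
            continue
        if key not in obj or not CHECKS[type_name](obj[key]):
            return False
    return True
```

For instance, `conforms({"x": True, "n": [1, 2.5]}, {"x": "boolean", "n": "number[]"})` holds, while an object with `"x": 1` fails the same mapping.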

      We must have a lattice of subtype-supertype mappings. For convenience,
      it's probably best to specify in the mapping language one or more
      supertypes for each type, with the understanding that the key->type
      mappings of the supertype(s) are also assumed for the subtype. This is
      just a notational convenience, but OO programmers expect it. In addition,
      the subtype predicate as written is logically ANDed with the supertype
      predicate(s) to form the effective predicate for the subtype. This
      hopefully preserves the Liskov substitution principle for the bound types.
      This is the only exception to the rule that type predicates must be
      independent.
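One way to picture the supertype rule is the Python sketch below, in which a type's effective predicate is its own predicate ANDed with those of its supertypes, and its effective mappings are the inherited mappings plus its own. The class name `TypeDef`, its fields, and the Shape/Circle example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

JsonObject = Dict[str, Any]


@dataclass
class TypeDef:
    name: str
    predicate: Callable[[JsonObject], bool]
    mappings: Dict[str, str] = field(default_factory=dict)
    supertypes: Tuple["TypeDef", ...] = ()

    def effective_predicate(self, obj: JsonObject) -> bool:
        # Own predicate ANDed with the effective predicate of each supertype.
        return self.predicate(obj) and all(
            s.effective_predicate(obj) for s in self.supertypes
        )

    def effective_mappings(self) -> Dict[str, str]:
        # Supertype mappings are assumed for the subtype; own entries win.
        merged: Dict[str, str] = {}
        for s in self.supertypes:
            merged.update(s.effective_mappings())
        merged.update(self.mappings)
        return merged


shape = TypeDef(
    "Shape",
    lambda o: "x" in o and "y" in o,
    {"x": "number", "y": "number"},
)
circle = TypeDef(
    "Circle",
    lambda o: "radius" in o,
    {"radius": "number"},
    (shape,),
)
```

Any object accepted by `circle.effective_predicate` is also accepted by `shape.effective_predicate`, which is what lets a bound Circle stand in wherever a Shape is expected.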

      Does this look like what you had in mind?

      --
      John Cowan    cowan@...    http://ccil.org/~cowan
      Her he asked if O'Hare Doctor tidings sent from far
      coast and she with grameful sigh him answered that
      O'Hare Doctor in heaven was. Sad was the man that word
      to hear that him so heavied in bowels ruthful. All
      she there told him, ruing death for friend so young,
      algate sore unwilling God's rightwiseness to withsay.
              --James Joyce, Ulysses, "Oxen of the Sun"
    • John David Duncan
      Message 2 of 8 , Apr 27, 2009
        Tatu,

        I don't really understand XML schema or WSDL (which I think you're
        referring to), so I could be quite off target here ...

        but maybe Google Protocol Buffers fit the problem?

        http://code.google.com/apis/protocolbuffers/docs/proto.html

        JD



        On Apr 25, 2009, at 8:58 AM, Tatu Saloranta wrote:

        > One thing that is currently missing (AFAIK) from JSON stack is the
        > equivalent of useful parts of W3C Schema ("xml schema").
        > I know there is a JSON Schema effort underway already, but if I
        > understand things correctly, its focus is on validation aspects, and not
        > data typing or modeling.
        > Although w3c Schema can be used for that (as can DTDs and RelaxNG,
        > latter of which does this better), to me the main value of Schema is
        > as abstract data typing/model.
        >
        > What I mean by this is that the only real advantage of, say, SOAP over
        > similar simpler JSON+Rest approach is that of having language-agnostic
        > data model that can be used for full object binding and serialization;
        > including code generation if need be. You can (painfully) define such
        > model, and then generate or bind conforming data to objects.
        > This helps in interoperability as you can describe data content in
        > language+platform independent way, but with ability to get specific
        > bindings reliably.
        > With JSON you can already do data mapping/binding quite well, except
        > for one area of problems: that of handling polymorphism (inheritance).
        > Although that can be worked around on language-by-language basis
        > (adding "class" element in Maps), there is no generic way to do it.
        >
        > Another thing that I feel most existing Schema languages get wrong is
        > the false goal of having to be expressed in format being described
        > (XML schema written in xml). Instead, notation absolutely should be a
        > DSL: right tool for the job. This is especially crucial for JSON
        > because of its simplicity: trying to shoehorn a definition language in
        > JSON seems unnecessarily cumbersome.
        > It is ok to have a one-to-one mapping between optimal DSL and JSON (or
        > xml) if need be -- RelaxNG shows a good way of doing that with its
        > compact (non-xml) notation, and equivalent XML serialization.
        >
        > So I would be interested in finding a solution for this problem -- it
        > seems like the last missing selling point for JSON-WS, to be used for
        > the use case of external entities communication over such an interface
        > (less needed for internal integration where more concrete interfaces,
        > client libs etc, can be used).
        >
        > One more thing: defining such a notation might not be extremely
        > difficult -- since validation is NOT the main focus, range & size
        > limitations could be omitted, or kept very simple; more important is
        > JSON-primitive/object-inheritance data-typing and structural
        > definitions.
        > Validation aspects can be handled separately; or as an add-on.
        >
        > -+ Tatu +-
      • Tatu Saloranta
        Message 3 of 8 , Apr 27, 2009
          On Mon, Apr 27, 2009 at 10:14 AM, John Cowan <cowan@...> wrote:
          > Tatu Saloranta scripsit:
          >
          >> One thing that is currently missing (AFAIK) from JSON stack is the
          >> equivalent of useful parts of W3C Schema ("xml schema").
          >
          > Very interesting. I've been thinking about this a bit, and here's a
          > sketch of what I've come up with. I'm assuming that the target language
          > looks like C++, Java, or C#: that is, objects have statically typed

          Exactly, since more dynamic languages can use "duck typing", or more
          loose conversions. It's static languages that need extra help.

          > members, there exists a subtyping relationship between types such that
          > a subtype has all the members of its supertypes, and Null (the type of
          > null) is a subtype of every object type. Note that it doesn't matter

          Yes. I don't think nulls should be typed; and probably only limitation there

          > whether the language provides single or multiple inheritance.

          Hmmh. I have to think about that a bit -- you may be right. I have
          only considered single-inheritance case so far.

          > In XML, data binding is based on the element name (for DTD-based systems)
          > or on the element name plus the types of ancestor elements (for W3C XML
          > Schema based systems). JSON objects have no convenient analogue of the
          > element name, which is not only in a distinguished place in the

          True. But part of it is just "off by one" problem -- there's no name
          at root level, but beyond this, there are field names that can be
          mapped to schema contextually.

          One thing I forgot to mention is that I am effectively using Java
          objects as the schema: and there get/set methods and fields do have
          names that you can match to JSON object property names (with
          overriding by annotations or other configuration).
          The same cannot be done with an explicit schema, however; the problem
          has to be resolved differently, as you explain.

          > but is up front, making data binding on the fly practical. We could just
          > require a "name" key in each object, but that would be IMHO against the
          > spirit of JSON.

          Agreed, that seems wrong.

          > So let's instead write down for each JSON object type a predicate that
          > identifies it. Here's a tentative list of primitive predicates:
          >
          > isNull(keyName)

          Would this mean it has to be null, or that it is nullable (allowsNull)?

          > isUndefined(keyName)

          does this mean "any type"? Sort of fallback, xml "any" type.

          > isBoolean(keyName)
          > isTrue(keyName)
          > isFalse(keyName)
          > isNumeric(keyName)

          perhaps also isInteger/integral?

          > hasNumericValue(keyName, number)
          > hasStringValue(keyName, number)

          (string instead of 'number')

          > hasParentTypes(typename, ...)
          >
          > The hasParentTypes predicate accepts an arbitrary number of type names,
          > which are in order the type of the parent object, the grandparent object, ....
          > This predicate is required for WXS-equivalent capability; omitting
          > it gives DTD-equivalent capability. In addition, the usual and, or,
          > and not operators can be applied to construct complex predicates from
          > these primitives. The type-assignment engine would have to validate
          > that no object can be assigned to more than one independent type.

          Yes.

          > In addition to the predicate, each type also has a set of key->type
          > mappings, which will be validated during type assignment. The type in
          ...
          > it's probably best to specify in the mapping language one or more
          > supertypes for each type, with the understanding that the key->type
          > mappings of the supertype(s) are also assumed for the subtype. This is
          > just a notational convenience, but OO programmers expect it. In addition,
          > the subtype predicate as written is logically ANDed with the supertype
          > predicate(s) to form the effective predicate for the subtype. This
          > hopefully preserves the Liskov substitution principle for the bound types.
          > This is the only exception to the rule that type predicates must be
          > independent.
          >
          > Does this look like what you had in mind?

          It sounds good so far, although I need to read this with more thought.
          You have thought more deeply about hard problems than I have, I think. :)

          Some additional constraints, compared to other schema work that I
          thought of were:

          - Not allowing union types that are not mappable to OO ("value can be
          either an array or boolean"): this just means there are legal JSON
          constructs for which no strict can be defined.
          This is in line with the idea of "schema" being geared towards
          mapping to/from OO instances, not for constraining general JSON
          content.

          -+ Tatu +-
        • Tatu Saloranta
          Message 4 of 8 , Apr 27, 2009
            On Mon, Apr 27, 2009 at 10:36 AM, John David Duncan
            <john.david.duncan@...> wrote:
            > Tatu,
            >
            > I don't really understand XML schema or WSDL (which I think you're
            > referring to), so I could be quite off target here ...
            >
            > but maybe Google Protocol Buffers fit the problem?
            >
            > http://code.google.com/apis/protocolbuffers/docs/proto.html

            I don't think it is directly applicable: in PB, the schema has a much
            more fundamental role, being mandatory for operating anything and for
            generating code (unless I'm mistaken). So in a way it is a strict
            schema-first format. I am hoping for something that allows both
            schema- and code-first approaches, somewhat similar to XML Schema in
            that respect.

            But maybe some aspects of PB might be useful: syntax, other ideas
            regarding versioning?

            -+ Tatu +-
          • John Cowan
            Message 5 of 8 , Apr 27, 2009
              Tatu Saloranta scripsit:

              > > isNull(keyName)
              >
              > Would this mean it has to be null, or that it is nullable (allowsNull)?

              All these predicates are on the JSON object itself, because they are used
              before the object's type is known. So isNull("foo") means that there
              is a key "foo" in the JSON object whose value is null, and likewise with all
              the other primitives.

              > > isUndefined(keyName)
              >
              > does this mean "any type"? Sort of fallback, xml "any" type.

              No, it means that the specified key does not exist. I use the name
              "undefined" for cultural compatibility with JavaScript. There would be
              little point in an isAnyType predicate, because it would match anything
              *except* a missing key.

              > > isNumeric(keyName)
              >
              > perhaps also isInteger/integral?

              That's reasonable. In that case it would be useful to be able to map
              keys to integer as well, and perhaps standardized subtypes of integer too.

              > - [U]nion types that are not mappable to OO ("value can be
              > either an array or boolean"): this just means there are legal JSON
              > constructs for which no strict can be defined.

              Multiple inheritance is a subset of this (doesn't handle primitives).
              If type foobar is a subtype of types foo and bar, then effectively it
              is a union of them.

              One question: should the predicates and the maps do on-the-fly conversion?
              For example, if we specify a predicate of hasNumericValue("foo", 0),
              does "foo": "0" match, or do we require "foo": 0? Likewise, when
              mapping, if the map specifies bar->String and the object contains
              "bar": 123, does the bar field get "123", or is that an error?
              These questions are independent. I'd favor doing the conversions.
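The two questions can be made concrete with a short Python sketch in which a hypothetical `lenient` flag switches conversion on or off, independently for the predicate and for the mapping. The function names and the flag are illustrative only.

```python
from typing import Any, Callable, Dict


def has_numeric_value(
    key: str, number: float, lenient: bool = True
) -> Callable[[Dict[str, Any]], bool]:
    """Predicate: key holds the given number (or, if lenient, a string of it)."""

    def check(obj: Dict[str, Any]) -> bool:
        if key not in obj:
            return False
        value = obj[key]
        if isinstance(value, bool):
            return False  # JSON true/false are not numbers
        if isinstance(value, (int, float)):
            return value == number
        if lenient and isinstance(value, str):
            try:
                return float(value) == number  # "0" is converted to 0.0
            except ValueError:
                return False
        return False

    return check


def bind_string(value: Any, lenient: bool = True) -> str:
    """Mapping step for a key declared as String."""
    if isinstance(value, str):
        return value
    if lenient and isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)  # 123 becomes "123"
    raise TypeError(f"expected a string, got {type(value).__name__}")
```

With conversion enabled, `has_numeric_value("foo", 0)` accepts both `{"foo": 0}` and `{"foo": "0"}`, and `bind_string(123)` yields `"123"`; with `lenient=False` the string form is rejected by the predicate and the number raises an error in the mapping.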

              --
              John Cowan    cowan@...    http://ccil.org/~cowan
              You let them out again, Old Man Willow!
              What you be a-thinking of? You should not be waking!
              Eat earth! Dig deep! Drink water! Go to sleep!
              Bombadil is talking.
            • Kris Zyp
              Message 6 of 8 , Apr 27, 2009
                Tatu Saloranta wrote:
                >
                >
                > One thing that is currently missing (AFAIK) from JSON stack is the
                > equivalent of useful parts of W3C Schema ("xml schema").
                > I know there is a JSON Schema effort underway already, but if I
                > understand things correctly, its focus is on validation aspects, and not
                > data typing or modeling.
                > Although w3c Schema can be used for that (as can DTDs and RelaxNG,
                > latter of which does this better), to me the main value of Schema is
                > as abstract data typing/model.

                JSON Schema can certainly be used for that purpose, I am using JSON
                Schema for typing in Dojo and Persevere, and not just for validating
                existing JSON structures.

                > What I mean by this is that the only real advantage of, say, SOAP over
                > similar simpler JSON+Rest approach is that of having language-agnostic
                > data model that can be used for full object binding and serialization;

                I am not sure if this is exactly what you are talking about, but we have been
                discussing using JSON Schema as a way to define hyperlink properties in
                JSON structures to provide interoperability in RESTful JSON:
                -------------------------------------

                > including code generation if need be. You can (painfully) define such
                > model, and then generate or bind conforming data to objects.
                > This helps in interoperability as you can describe data content in
                > language+platform independent way, but with ability to get specific
                > bindings reliably.
                > With JSON you can already do data mapping/binding quite well, except
                > for one area of problems: that of handling polymorphism (inheritance).
                > Although that can be worked around on language-by-language basis
                > (adding "class" element in Maps), there is no generic way to do it.

                The "extends" attribute of JSON Schema is designed specifically to meet
                the needs of describing data structures with inheritance. Is this
                insufficient for your polymorphic description needs?

                > Another thing that I feel most existing Schema languages get wrong is
                > the false goal of having to be expressed in format being described
                > (XML schema written in xml). Instead, notation absolutely should be a
                > DSL: right tool for the job. This is especially crucial for JSON
                > because of its simplicity: trying to shoehorn a definition language in
                > JSON seems unnecessarily cumbersome.
                > It is ok to have a one-to-one mapping between optimal DSL and JSON (or
                > xml) if need be -- RelaxNG shows a good way of doing that with its
                > compact (non-xml) notation, and equivalent XML serialization.
                >
                > So I would be interested in finding a solution for this problem -- it
                > seems like the last missing selling point for JSON-WS, to be used for
                > the use case of external entities communication over such an interface
                > (less needed for internal integration where more concrete interfaces,
                > client libs etc, can be used).

                "JSON-WS" hints more at RPC type communication, which is what SMD is
                designed for (which uses JSON Schema):


                > One more thing: defining such a notation might not be extremely
                > difficult -- since validation is NOT the main focus, range & size
                > limitations could be omitted, or kept very simple; more important is
                > JSON-primitive/object-inheritance data-typing and structural
                > definitions.
                > Validation aspects can be handled separately; or as an add-on.

                JSON Schema can be as simple as the subset you want to use. {} is a valid
                schema (although not particularly useful). We have discussed formally
                defining a subset of JSON Schema for more traditional data-typing style
                constraints, but after some discussion of how easy it is to ignore
                non-relevant parts of the spec, it seemed like kind of a silly exercise.
                Of course, if you want to define a non JSON-based data type modeling
                system, that is indeed a completely different matter and exercise. Or
                maybe I am missing what you are really after.

                Kris
              • Tatu Saloranta
                Message 7 of 8 , Apr 27, 2009
                  On Mon, Apr 27, 2009 at 2:25 PM, Kris Zyp <kriszyp@...> wrote:
                  >
                  > Tatu Saloranta wrote:
                  >>
                  >> One thing that is currently missing (AFAIK) from JSON stack is the
                  >> equivalent of useful parts of W3C Schema ("xml schema").
                  ...
                  >
                  > JSON Schema can certainly be used for that purpose, I am using JSON
                  > Schema for typing in Dojo and Persevere, and not just for validating
                  > existing JSON structures.

                  Ok. I did notice 'extends' property (but only after sending email),
                  which when combined with other pieces should allow for defining type
                  structures?

                  >> What I mean by this is that the only real advantage of, say, SOAP over
                  >> similar simpler JSON+Rest approach is that of having language-agnostic
                  >> data model that can be used for full object binding and serialization;
                  >
                  > I am not sure if this is exactly what you are talking about, but we have been
                  > discussing using JSON Schema as a way to define hyperlink properties in
                  > JSON structures to provide interoperability in RESTful JSON:

                  Basically I am thinking of supporting both a schema-first model
                  (generating classes/stubs from a definition) and a code-first model
                  (annotating code to produce a schema).
                  And for Java, perhaps generating bean-validation annotations for the
                  schema-first case.

                  Perhaps what would be nice is a more convenient syntax for
                  expressing an already-defined JSON Schema model.

                  ...
                  > The "extends" attribute of JSON Schema is designed specifically to meet
                  > the needs of describing data structures with inheritance. Is this
                  > insufficient for your polymorphic description needs?

                  Probably not -- initially I just missed it. :-/

                  It does however need to be coupled with some definition of how to
                  include type identifier in json content ("#type" field? standard name,
                  or perhaps configurable with schema). Is there something in Json
                  schema to define this? Or if not, could it be added?
                  The type is needed to deserialize a specific subtype instance: on
                  serialization the full type is known, but on deserialization only the
                  declared type is known, so the full type needs to be available from the
                  JSON content (this can be optimized out for leaf types).
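A small Python sketch of what an in-band type identifier could look like. The "#type" key follows the name floated above, while the `Animal`/`Dog` classes, the `REGISTRY` table, and both function names are hypothetical. When the identifier is absent, the sketch falls back to the declared type.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class Animal:
    name: str


@dataclass
class Dog(Animal):
    breed: str = ""


TYPE_KEY = "#type"  # hypothetical; could be made configurable per schema
REGISTRY: Dict[str, type] = {"Animal": Animal, "Dog": Dog}


def serialize(obj: Animal) -> str:
    # On serialization the full runtime type is known, so write it out.
    return json.dumps({TYPE_KEY: type(obj).__name__, **vars(obj)})


def deserialize(text: str, declared: type = Animal) -> Animal:
    # On deserialization only the declared type is known; the identifier in
    # the content selects the concrete subtype to instantiate.
    data = json.loads(text)
    type_name = data.pop(TYPE_KEY, declared.__name__)
    cls = REGISTRY[type_name]
    if not issubclass(cls, declared):
        raise TypeError(f"{type_name} is not a subtype of {declared.__name__}")
    return cls(**data)
```

Round-tripping `Dog("Rex", "collie")` through `serialize` and `deserialize` gives back a `Dog`, even though the caller only asked for an `Animal`.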

                  ...
                  > "JSON-WS" hints more at RPC type communication, which is what SMD is
                  > designed for (which uses JSON Schema):

                  True, I should have clarified that. This would be the data/type model
                  that is useful (and sort of required) for WS, but specifically not the
                  web service definition itself (no end points, operations etc defined).
                  Like XML Schema part that Soap/WSDL build on.

                  ...
                  > JSON Schema can be as simple as the subset you want to use. {} is a valid
                  > schema (although not particularly useful). We have discussed formally
                  > defining a subset of JSON Schema for more traditional data-typing style
                  > constraints, but after some discussion of how easy it is to ignore
                  > non-relevant parts of the spec, it seemed like kind of a silly exercise.
                  > Of course, if you want to define a non JSON-based data type modeling
                  > system, that is indeed a completely different matter and exercise. Or
                  > maybe I am missing what you are really after.

                  Ok, thanks. Sub-setting could definitely work. I need to read json
                  schema proposal with some more thought. I am interested in syntax
                  part, as well as some limitations on what kinds of schemas would be
                  allowed -- certain kinds of unions might not be representable on OO
                  side.

                  Also: is the use of json as format for schema a fundamental goal? To
                  me it seems that there are benefits to using more compact and
                  expressive notation, but not much benefit from using JSON (i.e.
                  complexity is not within parsing of schema but in processing it).
                  I don't mind having a json serialization (RelaxNG - like "dual"
                  model), but really like the idea of having something more optimal for
                  the domain.

                  -+ Tatu +-