Re: [json] Need for "abstract data model" to support JSON, web services ("useful parts of w3c schema", not validation)

  • Tatu Saloranta
    Message 1 of 8, Apr 27, 2009
      On Mon, Apr 27, 2009 at 10:14 AM, John Cowan <cowan@...> wrote:
      > Tatu Saloranta scripsit:
      >
      >> One thing that is currently missing (AFAIK) from the JSON stack is the
      >> equivalent of useful parts of W3C Schema ("xml schema").
      >
      > Very interesting. I've been thinking about this a bit, and here's a
      > sketch of what I've come up with. I'm assuming that the target language
      > looks like C++, Java, or C#: that is, objects have statically typed

      Exactly, since more dynamic languages can use "duck typing" or looser
      conversions. It's static languages that need extra help.

      > members, there exists a subtyping relationship between types such that
      > a subtype has all the members of its supertypes, and Null (the type of
      > null) is a subtype of every object type. Note that it doesn't matter

      Yes. I don't think nulls should be typed; and probably only limitation there

      > whether the language provides single or multiple inheritance.

      Hmmh. I have to think about that a bit -- you may be right. I have
      only considered single-inheritance case so far.

      > In XML, data binding is based on the element name (for DTD-based systems)
      > or on the element name plus the types of ancestor elements (for W3C XML
      > Schema based systems). JSON objects have no convenient analogue of the
      > element name, which is not only in a distinguished place in the

      True. But part of it is just an "off by one" problem -- there's no name
      at the root level, but beyond that, there are field names that can be
      mapped to the schema contextually.

      One thing I forgot to mention is that I am effectively using Java
      objects as the schema: their get/set methods and fields have names
      that you can match to JSON object property names (with overriding by
      annotations or other configuration).
      The same cannot be done with an explicit schema, however; the problem
      has to be resolved differently, as you explain.

      > but is up front, making data binding on the fly practical. We could just
      > require a "name" key in each object, but that would be IMHO against the
      > spirit of JSON.

      Agreed, that seems wrong.

      > So let's instead write down for each JSON object type a predicate that
      > identifies it. Here's a tentative list of primitive predicates:
      >
      > isNull(keyName)

      Would this mean it has to be null, or that it is nullable (allowsNull)?

      > isUndefined(keyName)

      Does this mean "any type"? A sort of fallback, like the XML "any" type?

      > isBoolean(keyName)
      > isTrue(keyName)
      > isFalse(keyName)
      > isNumeric(keyName)

      perhaps also isInteger/integral?

      > hasNumericValue(keyName, number)
      > hasStringValue(keyName, number)

      (string instead of 'number')

      > hasParentTypes(typename, ...)
      >
      > The hasParentTypes predicate accepts an arbitrary number of type names,
      > which are in order the type of the parent object, the grandparent object, ....
      > This predicate is required for WXS-equivalent capability; omitting
      > it gives DTD-equivalent capability. In addition, the usual and, or,
      > and not operators can be applied to construct complex predicates from
      > these primitives. The type-assignment engine would have to validate
      > that no object can be assigned to more than one independent type.

      Yes.

      > In addition to the predicate, each type also has a set of key->type
      > mappings, which will be validated during type assignment. The type in
      ...
      > it's probably best to specify in the mapping language one or more
      > supertypes for each type, with the understanding that the key->type
      > mappings of the supertype(s) are also assumed for the subtype. This is
      > just a notational convenience, but OO programmers expect it. In addition,
      > the subtype predicate as written is logically ANDed with the supertype
      > predicate(s) to form the effective predicate for the subtype. This
      > hopefully preserves the Liskov substitution principle for the bound types.
      > This is the only exception to the rule that type predicates must be
      > independent.
      >
      > Does this look like what you had in mind?

      It sounds good so far, although I need to read this with more thought.
      You have thought more deeply about hard problems than I have, I think. :)
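      To make John's proposal concrete, here is a minimal Python sketch (all
      names are hypothetical, not from any existing library). Each type's
      effective predicate is its own predicate ANDed with its supertypes',
      the supertypes' key->type mappings are inherited, and assignment keeps
      only the most specific matching types. Note that it checks the "no
      object matches two independent types" rule per object at run time,
      whereas the proposal asks the engine to validate it for the schema as
      a whole.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Set

JsonObject = Dict[str, Any]


class TypeDef:
    """One bindable type: its predicate as written, its supertypes, and its
    own key->type mappings (all names here are hypothetical)."""

    def __init__(self, name: str, predicate: Callable[[JsonObject], bool],
                 supertypes: Sequence["TypeDef"] = (),
                 mappings: Optional[Dict[str, str]] = None):
        self.name = name
        self.predicate = predicate
        self.supertypes = list(supertypes)
        self.mappings = dict(mappings or {})

    def matches(self, obj: JsonObject) -> bool:
        # Effective predicate: own predicate ANDed with every supertype's.
        return self.predicate(obj) and all(s.matches(obj) for s in self.supertypes)

    def all_mappings(self) -> Dict[str, str]:
        # Supertype mappings are assumed for the subtype; in this sketch an
        # entry declared on the subtype replaces an inherited one.
        merged: Dict[str, str] = {}
        for s in self.supertypes:
            merged.update(s.all_mappings())
        merged.update(self.mappings)
        return merged

    def ancestor_names(self) -> Set[str]:
        names: Set[str] = set()
        for s in self.supertypes:
            names.add(s.name)
            names |= s.ancestor_names()
        return names


def assign_type(obj: JsonObject, types: Sequence[TypeDef]) -> Optional[TypeDef]:
    matched = [t for t in types if t.matches(obj)]
    # A supertype always matches alongside its subtype, so keep only the most
    # specific matches; anything left over must be a single type.
    specific = [t for t in matched
                if not any(t.name in other.ancestor_names() for other in matched)]
    if len(specific) > 1:
        raise ValueError("object matches independent types: "
                         + ", ".join(sorted(t.name for t in specific)))
    return specific[0] if specific else None


shape = TypeDef("Shape", lambda o: "kind" in o, mappings={"kind": "String"})
circle = TypeDef("Circle", lambda o: "radius" in o, [shape], {"radius": "Number"})
rect = TypeDef("Rect", lambda o: "width" in o, [shape], {"width": "Number"})
types = [shape, circle, rect]

print(assign_type({"kind": "c", "radius": 1}, types).name)  # Circle
print(circle.all_mappings())  # {'kind': 'String', 'radius': 'Number'}
```

      With multiple inheritance the same rule applies: an object matching
      both foo and bar, but not their common subtype foobar, is rejected as
      ambiguous.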

      Some additional constraints I thought of, compared to other schema
      work, were:

      - Not allowing union types that are not mappable to OO ("value can be
      either an array or boolean"): this just means there are legal JSON
      constructs for which no strict type can be defined.
      This is in line with the idea of "schema" being geared towards
      mapping to/from OO instances, not towards constraining general JSON
      content.

      -+ Tatu +-
    • Tatu Saloranta
      Message 2 of 8, Apr 27, 2009
        On Mon, Apr 27, 2009 at 10:36 AM, John David Duncan
        <john.david.duncan@...> wrote:
        > Tatu,
        >
        > I don't really understand XML schema or WSDL (which I think you're
        > referring to), so I could be quite off target here ...
        >
        > but maybe Google Protocol Buffers fit the problem?
        >
        > http://code.google.com/apis/protocolbuffers/docs/proto.html

        I don't think it is directly applicable: in PB, the schema has a much
        more fundamental role, being mandatory for any operation and for
        generating code (unless I'm mistaken). So in a way it is a strictly
        schema-first format. I am hoping for something that allows both
        schema-first and code-first approaches, somewhat similar to XML Schema
        in that respect.

        But maybe some aspects of PB could be useful: its syntax, or its ideas
        regarding versioning?

        -+ Tatu +-
      • John Cowan
        Message 3 of 8, Apr 27, 2009
          Tatu Saloranta scripsit:

          > > isNull(keyName)
          >
          > Would this mean it has to be null, or that it is nullable (allowsNull)?

          All these predicates are on the JSON object itself, because they are used
          before the object's type is known. So isNull("foo") means that there
          is a key "foo" in the JSON object and its value is null, and likewise
          with all the other primitives.

          > > isUndefined(keyName)
          >
          > does this mean "any type"? Sort of fallback, xml "any" type.

          No, it means that the specified key does not exist. I use the name
          "undefined" for cultural compatibility with JavaScript. There would be
          little point in an isAnyType predicate, because it would match anything
          *except* a missing key.

          > > isNumeric(keyName)
          >
          > perhaps also isInteger/integral?

          That's reasonable. In that case it would be useful to be able to map
          keys to integer as well, and perhaps standardized subtypes of integer too.
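          A minimal Python sketch of these primitives under the semantics just
          described: predicates look at the JSON object before its type is
          known, isNull requires the key to be present with a null value, and
          isUndefined requires the key to be absent. The snake_case names, the
          ancestor-list argument used for hasParentTypes, and the isInteger
          variant are illustrative assumptions, not a defined API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: a predicate sees the JSON object (a dict) and the
# type names already assigned to its ancestors, nearest parent first.
Predicate = Callable[[Dict[str, Any], List[str]], bool]


def _is_number(value: Any) -> bool:
    # Python's bool is a subclass of int, so rule it out explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_null(key: str) -> Predicate:
    # Key present AND its value is JSON null (not merely "nullable").
    return lambda obj, anc: key in obj and obj[key] is None


def is_undefined(key: str) -> Predicate:
    # Key absent altogether, as with JavaScript's "undefined".
    return lambda obj, anc: key not in obj


def is_boolean(key: str) -> Predicate:
    return lambda obj, anc: isinstance(obj.get(key), bool)


def is_true(key: str) -> Predicate:
    return lambda obj, anc: obj.get(key) is True


def is_false(key: str) -> Predicate:
    return lambda obj, anc: obj.get(key) is False


def is_numeric(key: str) -> Predicate:
    return lambda obj, anc: _is_number(obj.get(key))


def is_integer(key: str) -> Predicate:
    # The suggested refinement: an int, not a float or a bool.
    return lambda obj, anc: _is_number(obj.get(key)) and isinstance(obj[key], int)


def has_numeric_value(key: str, number: float) -> Predicate:
    return lambda obj, anc: _is_number(obj.get(key)) and obj[key] == number


def has_string_value(key: str, string: str) -> Predicate:
    return lambda obj, anc: isinstance(obj.get(key), str) and obj[key] == string


def has_parent_types(*type_names: str) -> Predicate:
    # In order: the parent's type, the grandparent's type, ...
    return lambda obj, anc: list(anc[:len(type_names)]) == list(type_names)


def and_(*ps: Predicate) -> Predicate:
    return lambda obj, anc: all(p(obj, anc) for p in ps)


def or_(*ps: Predicate) -> Predicate:
    return lambda obj, anc: any(p(obj, anc) for p in ps)


def not_(p: Predicate) -> Predicate:
    return lambda obj, anc: not p(obj, anc)


# Example: an object nested directly inside an "Order", with a numeric
# "qty" and no "discount" key at all.
line_item = and_(is_numeric("qty"), is_undefined("discount"), has_parent_types("Order"))
print(line_item({"qty": 3}, ["Order", "Customer"]))         # True
print(line_item({"qty": 3, "discount": None}, ["Order"]))   # False
```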

          > - [U]nion types that are not mappable to OO ("value can be
          > either an array or boolean"): this just means there are legal JSON
          > constructs for which no strict can be defined.

          Multiple inheritance is a subset of this (doesn't handle primitives).
          If type foobar is a subtype of types foo and bar, then effectively it
          is a union of them.

          One question: should the predicates and the maps do on-the-fly conversion?
          For example, if we specify a predicate of hasNumericValue("foo", 0),
          does "foo": "0" match, or do we require "foo": 0? Likewise, when
          mapping, if the map specifies bar->String and the object contains
          "bar": 123, does the bar field get "123", or is that an error?
          These questions are independent. I'd favor doing the conversions.
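          One possible sketch of keeping the two conversion questions
          independent, with a separate leniency switch for predicate matching
          and for field binding (Python; the function names and the choice of
          which conversions count as lenient are assumptions):

```python
from typing import Any, Dict


class ConversionError(ValueError):
    pass


def to_number(value: Any, lenient: bool) -> float:
    if isinstance(value, bool):
        raise ConversionError("a boolean is not a number")
    if isinstance(value, (int, float)):
        return value
    if lenient and isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            raise ConversionError(f"not numeric: {value!r}") from None
    raise ConversionError(f"expected a number, got {value!r}")


def to_string(value: Any, lenient: bool) -> str:
    if isinstance(value, str):
        return value
    if lenient and isinstance(value, (int, float)) and not isinstance(value, bool):
        # Naive rendering; Python and JavaScript format some floats differently.
        return str(value)
    raise ConversionError(f"expected a string, got {value!r}")


def has_numeric_value(obj: Dict[str, Any], key: str, number: float,
                      lenient_match: bool = True) -> bool:
    # Predicate side: does "foo": "0" satisfy hasNumericValue("foo", 0)?
    if key not in obj:
        return False
    try:
        return to_number(obj[key], lenient_match) == number
    except ConversionError:
        return False


def bind_string_field(obj: Dict[str, Any], key: str, lenient_bind: bool = True) -> str:
    # Mapping side: with bar->String, does "bar": 123 bind as "123"?
    return to_string(obj[key], lenient_bind)


print(has_numeric_value({"foo": "0"}, "foo", 0))                       # True
print(has_numeric_value({"foo": "0"}, "foo", 0, lenient_match=False))  # False
print(repr(bind_string_field({"bar": 123}, "bar")))                    # '123'
```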

          --
          You let them out again, Old Man Willow! John Cowan
          What you be a-thinking of? You should not be waking! cowan@...
          Eat earth! Dig deep! Drink water! Go to sleep!
          Bombadil is talking. http://ccil.org/~cowan
        • Kris Zyp
          Message 4 of 8, Apr 27, 2009
            Tatu Saloranta wrote:
            >
            >
            > One thing that is currently missing (AFAIK) from the JSON stack is the
            > equivalent of useful parts of W3C Schema ("xml schema").
            > I know there is a JSON Schema effort underway already, but if I
            > understand things correctly, its focus is on validation aspects, and
            > not data typing or modeling.
            > Although W3C Schema can be used for that (as can DTDs and RelaxNG,
            > the latter of which does this better), to me the main value of Schema
            > is as a means of abstract data typing/modeling.

            JSON Schema can certainly be used for that purpose; I am using JSON
            Schema for typing in Dojo and Persevere, and not just for validating
            existing JSON structures.

            > What I mean by this is that the only real advantage of, say, SOAP over
            > a similar, simpler JSON+REST approach is that of having a
            > language-agnostic data model that can be used for full object binding
            > and serialization;

            I am not sure if this is exactly what you are talking about, but we
            have been discussing using JSON Schema as a way to define hyperlink
            properties in JSON structures to provide interoperability in RESTful JSON:

            > including code generation if need be. You can (painfully) define such a
            > model, and then generate or bind conforming data to objects.
            > This helps interoperability, as you can describe data content in a
            > language- and platform-independent way, but with the ability to get
            > specific bindings reliably.
            > With JSON you can already do data mapping/binding quite well, except
            > for one area of problems: that of handling polymorphism (inheritance).
            > Although that can be worked around on a language-by-language basis
            > (adding a "class" element in Maps), there is no generic way to do it.

            The "extends" attribute of JSON Schema is designed specifically to meet
            the needs of describing data structures with inheritance. Is this
            insufficient for your polymorphic description needs?
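            A rough Python sketch of how a binder might use "extends" to
            compute the full set of properties a subtype carries, treating
            schemas as plain dicts. It merges only "properties" and is not a
            validator; it also says nothing about which subtype a given JSON
            instance belongs to.

```python
from typing import Any, Dict


def effective_properties(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collect "properties" along the "extends" chain, base schemas first.

    Only property declarations are merged; every other JSON Schema keyword
    is ignored, so this is not a validator.
    """
    bases = schema.get("extends", [])
    if isinstance(bases, dict):  # a single base schema
        bases = [bases]
    merged: Dict[str, Any] = {}
    for base in bases:
        merged.update(effective_properties(base))
    merged.update(schema.get("properties", {}))
    return merged


animal = {"type": "object",
          "properties": {"name": {"type": "string"}}}
dog = {"type": "object",
       "extends": animal,
       "properties": {"breed": {"type": "string"}}}

print(sorted(effective_properties(dog)))  # ['breed', 'name']
```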

            > Another thing that I feel most existing schema languages get wrong is
            > the false goal of having to be expressed in the format being described
            > (XML Schema written in XML). Instead, the notation absolutely should be
            > a DSL: the right tool for the job. This is especially crucial for JSON
            > because of its simplicity: trying to shoehorn a definition language
            > into JSON seems unnecessarily cumbersome.
            > It is OK to have a one-to-one mapping between an optimal DSL and JSON
            > (or XML) if need be -- RelaxNG shows a good way of doing that with its
            > compact (non-XML) notation and equivalent XML serialization.
            >
            > So I would be interested in finding a solution for this problem -- it
            > seems like the last missing selling point for JSON-WS, for the use case
            > of communication between external entities over such an interface
            > (it is less needed for internal integration, where more concrete
            > interfaces, client libs, etc. can be used).

            "JSON-WS" hints more at RPC type communication, which is what SMD is
            designed for (which uses JSON Schema):


            > One more thing: defining such a notation might not be extremely
            > difficult -- since validation is NOT the main focus, range & size
            > limitations could be omitted or kept very simple; more important are
            > JSON-primitive/object-inheritance data typing and structural
            > definitions.
            > Validation aspects can be handled separately, or as an add-on.

            JSON Schema can be as simple as the subset you want to use. {} is a
            valid schema (although not particularly useful). We have discussed
            formally defining a subset of JSON Schema for more traditional
            data-typing style constraints, but after some discussion of how easy
            it is to ignore irrelevant parts of the spec, it seemed like kind of a
            silly exercise.
            Of course, if you want to define a non-JSON-based data type modeling
            system, that is indeed a completely different matter and exercise. Or
            maybe I am missing what you are really after.

            Kris
          • Tatu Saloranta
            Message 5 of 8, Apr 27, 2009
              On Mon, Apr 27, 2009 at 2:25 PM, Kris Zyp <kriszyp@...> wrote:
              >
              > Tatu Saloranta wrote:
              >>
              >> One thing that is currently missing (AFAIK) from the JSON stack is the
              >> equivalent of useful parts of W3C Schema ("xml schema").
              ...
              >
              > JSON Schema can certainly be used for that purpose; I am using JSON
              > Schema for typing in Dojo and Persevere, and not just for validating
              > existing JSON structures.

              OK. I did notice the 'extends' property (but only after sending my
              email), which, combined with other pieces, should allow for defining
              type structures?

              >> What I mean by this is that the only real advantage of, say, SOAP over
              >> a similar, simpler JSON+REST approach is that of having a
              >> language-agnostic data model that can be used for full object binding
              >> and serialization;
              >
              > I am not sure if this is exactly what you are talking about, but we
              > have been discussing using JSON Schema as a way to define hyperlink
              > properties in JSON structures to provide interoperability in RESTful JSON:

              Basically, I am thinking of supporting both the schema-first model
              (generating classes/stubs from the definition) and code-first
              (annotating code to produce the schema).
              And for Java, perhaps generating Bean Validation annotations in the
              schema-first case.

              Perhaps what would be nice is something more along the lines of a
              more convenient syntax for expressing the already-defined JSON Schema
              model.

              ...
              > The "extends" attribute of JSON Schema is designed specifically to meet
              > the needs of describing data structures with inheritance. Is this
              > insufficient for your polymorphic description needs?

              Probably not -- initially I just missed it. :-/

              It does, however, need to be coupled with some definition of how to
              include a type identifier in JSON content (a "#type" field? A standard
              name, or perhaps one configurable via the schema?). Is there something
              in JSON Schema to define this? If not, could it be added?
              The type is needed to deserialize a specific subtype instance: on
              serialization the full type is known, but on deserialization only the
              declared type is, so the full type needs to be available from the JSON
              content (this can be optimized out for leaf types).
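              A minimal Python sketch of the idea, assuming a "#type" property,
              a hypothetical registry keyed by class name, and a declared type
              passed in by the caller. The writer adds the identifier only when
              the declared type has subtypes; the reader falls back to the
              declared type when the identifier is missing.

```python
import json
from typing import Any, Dict

TYPE_FIELD = "#type"  # hypothetical name, one of the options floated above


class Animal:
    def __init__(self, name: str):
        self.name = name


class Dog(Animal):
    def __init__(self, name: str, breed: str):
        super().__init__(name)
        self.breed = breed


# Hypothetical registry from type identifier to class.
REGISTRY: Dict[str, type] = {cls.__name__: cls for cls in (Animal, Dog)}


def serialize(obj: Any, declared: type) -> Dict[str, Any]:
    data = dict(vars(obj))
    # The writer knows the full runtime type. A declared type with no
    # subclasses (a leaf) cannot be ambiguous, so the identifier is omitted.
    if declared.__subclasses__():
        data[TYPE_FIELD] = type(obj).__name__
    return data


def deserialize(data: Dict[str, Any], declared: type) -> Any:
    fields = dict(data)
    type_id = fields.pop(TYPE_FIELD, None)
    # The reader only knows the declared type; the identifier, if present,
    # selects the concrete subtype.
    cls = REGISTRY[type_id] if type_id is not None else declared
    if not issubclass(cls, declared):
        raise TypeError(f"{type_id} is not a subtype of {declared.__name__}")
    return cls(**fields)


text = json.dumps(serialize(Dog("Rex", "lab"), Animal))
print(text)  # {"name": "Rex", "breed": "lab", "#type": "Dog"}
pet = deserialize(json.loads(text), Animal)
print(type(pet).__name__, pet.breed)  # Dog lab
```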

              ...
              > "JSON-WS" hints more at RPC type communication, which is what SMD is
              > designed for (which uses JSON Schema):

              True, I should have clarified that. This would be the data/type model
              that is useful (and sort of required) for WS, but specifically not the
              web service definition itself (no endpoints, operations, etc. defined),
              like the XML Schema part that SOAP/WSDL build on.

              ...
              > JSON Schema can be as simple as the subset you want to use. {} is a
              > valid schema (although not particularly useful). We have discussed
              > formally defining a subset of JSON Schema for more traditional
              > data-typing style constraints, but after some discussion of how easy
              > it is to ignore irrelevant parts of the spec, it seemed like kind of a
              > silly exercise.
              > Of course, if you want to define a non-JSON-based data type modeling
              > system, that is indeed a completely different matter and exercise. Or
              > maybe I am missing what you are really after.

              OK, thanks. Subsetting could definitely work. I need to read the JSON
              Schema proposal more carefully. I am interested in the syntax part, as
              well as in some limitations on what kinds of schemas would be allowed
              -- certain kinds of unions might not be representable on the OO side.

              Also: is the use of JSON as the format for the schema a fundamental
              goal? To me it seems that there are benefits to using a more compact
              and expressive notation, but not much benefit from using JSON (i.e.
              the complexity is not in parsing the schema but in processing it).
              I don't mind having a JSON serialization (a RelaxNG-like "dual"
              model), but I really like the idea of having something more optimal
              for the domain.

              -+ Tatu +-