Loading ...
Sorry, an error occurred while loading the content.

Collection primitives (aka "hash tables")

Expand Messages
  • David Hull
    First of all, hi. I m the Dave from TIBCO who will be attending the F2F in Almaden. Though I ve been involved with SOAP at TIBCO for about a year now, I
    Message 1 of 1 , May 30, 2002
    • 0 Attachment
      First of all, hi. I'm the "Dave from TIBCO" who will be attending the F2F in
      Almaden. Though I've been involved with SOAP at TIBCO for about a year now, I
      haven't picked up the soapbuilders discussion until just recently. I have,
      however, just plowed through the last month's archived messages, so I have at
      least a bit of context.

      I agree it would be good to have a richer set of collection primitives than just
      "array". I would even say that arrays are more an implementation primitive than
      a data modelling primitive, and except in numerical applications like matrix
      algebra, they probably have less business going out on the wire than maps do.
      And if you want to be strict about it, arrays are a special case of map anyway.

      I also don't care for the use of "hash table". After all, if you want to put a
      hash table out on the wire, you should put out an array of buckets, plus some
      indication of the hash function.

      So without further ado, here are the collection primitives I'd like to see. How
      to get them into the appropriate standards is a totally different story. Please
      be patient as I give rudimentary defintions for concepts we all know inside and
      out. I'm trying to avoid ambiguity, not teach basic CS to a clearly advanced
      audience.

      List/sequence

      A list is an ordered colleciton with duplicates permitted. By "ordered" I mean
      if you echo back a list, you need to echo back the elements in the same order
      you got them. By nature, every useful wire format supports this one way or
      another, and literal XML and Soap Encoding are no exceptions. In fact, list is
      the only real wire format primitive for collections. Anything else has to be
      represented as a list.

      Set

      A set is an unordered collection with no duplicates permitted. As was pointed
      out, if you echo back a set, you need not echo the elements in the same order
      they came in, as long as you echo back every element that came in, and no
      others. Two lists represent the same set iff they contain the same elements.
      So [2 3 5 7 11] and [11 5 7 3 2] both represent the set {2 3 5 7 11}. For that
      matter, so would [2 2 3 3 5 5 7 7 11 11 11], but there is little to be gained by
      this, and serializers probably SHOULD (possibly MUST) eliminate duplicates on
      the wire.

      Relation/dataset/M:N relation/multimap

      A relation is just a set of ordered pairs (if you're a mathematician) or tuples
      (if you're a hacker). So nothing new is needed here.

      Map/table/1:N relation/"hash table"

      A map is a special kind of relation: It's a set of ordered pairs, where no two
      elements share the same first component. That is, {(1, 2) (2, 4)(3, 6)} is a
      valid map, as is {(1, 2)(2, 2)(3, 2)}, but {(1, 2)(1, 3)(2, 4)(3, 6)} isn't. So
      it's not enough to say that a map is just a list (or array, or even set) of
      (name, value) pairs. A serializer MUST NOT produce a map with two different
      values for the same name.

      1:1 relation/1:1 map

      A 1:1 relation is a set of ordered pairs where not only the first but the
      second component is unique.

      That's it. It's basically the RDB primitives plus list (which is not really
      primitive semantically, but has a special status on the wire). Rather than
      special-case 1:1 and 1:N relations, i's probably better just to allow uniqueness
      constraints on any component(s) of the element type.

      So the metadata to specify are

      Is it a list or a set?
      If it's a set, which if any components of the elements need to be unique

      XSD appears to have a concept of uniqueness of keys. I'm not sure if it has a
      concept of "order doesn't matter as long as the same elements are present."
      Soap Encoding, as it stands, has neither.

      Some collections I would not include as primitives are

      Bags

      A bag is an unordered list with duplicates permitted. Equivalently, it's a map
      to the natural numbers. I don't see bags mentioned outside computer science
      texts, and they're easily built if needed.

      Priority queues

      A priority queue is a set, together with an ordering (Comaprator), from which
      you can extract the least element. On the wire, it's just a set, with the
      comparator generally implied by the element type, though it could be supplied
      explicitly (e.g., by a class name). In any case, nothing special needs to be
      done to put a priority queue on the wire.

      Arrays

      OK, they're already there, and they're not going away. But dense arrays work
      fine as lists, possibly with a length hint included, and for my money it's a lot
      nicer to represent a sparse array as {(1, x)(1000000, y)} than to mess around
      with position attributes and such. Nested arrays are no different from nested
      anything else, and again, you could include hints akin to the range and
      dimensions in Soap Encoding array types. The point is, unlike sets and maps,
      arrays don't require any special semantics on the wire. A dense array should be
      echoed back in order, like a list. A sparse array can be echoed back like any
      other map, and has the same restriction of unique first element as does any
      other map.
    Your message has been successfully submitted and would be delivered to recipients shortly.