Re: [rng-users] Lets standardize PI for associating Relax NG schema with XML document

  • Henri Sivonen
    Message 1 of 96 , Jul 3, 2005
      On Jul 3, 2005, at 00:42, B Tommie Usdin wrote:

      > At 11:14 PM +0200 7/2/05, Jirka Kosek wrote:
      >> Primary motivation (although not stated clearly) for my proposal was
      >> not validation, but guided editing of XML document. Describing
      >> complex validation is out of scope of my proposal, something much
      >> more powerful like NRL could be used.
      > But if there is a "standard" there will be pressure to use it for
      > everything conceivable, whether it is appropriate or not.

      I agree. I think it is a desirable feature that the RELAX NG validation
      process takes two *independent* inputs: the schema and the document.
      (Mentioned also by James Clark in the famous IETF post:

      I can see three main cases here:

      1) Apps that want to check their input in an off-the-shelf manner
      2) Quality assurance tools
      3) Editors with autocomplete/error highlighting

      In case 1) an application receives input from an outside source and
      cannot trust that the outside source produces correct output (correct
      in the sense that the receiving application works properly when using
      it as input). In order to avoid hand-coding checks for all the possible
      error situations, the developer of the application decides to embed a
      RELAX NG validator and an appropriate schema. Then the hand-coded
      part of the application can trust that anything it sees conforms to the

      If the input can smuggle in its own rules the way DOCTYPE and
      schemaLocation allow it to do, the app can no longer trust the
      validation stage, which defeats the whole point of embedding the
      validator. Therefore, I think a PI for the input to specify its own
      schema is totally wrong considering case 1).

      In case 2) a user has a document (not necessarily created by the user
      him/herself) and is interested in the syntactic correctness of the
      document. If the document is allowed to define the rules, the user is
      getting the answer to the question "Does this document conform to the
      grammar it sets for itself?"

      http://validator.w3.org/ works like this. It gives you a little badge
      of validity to show off, but it doesn't tell you if the internal subset
      was used to introduce radically different home grown rules than what
      the "This document is valid FooML" message implies. All you know is
      that whoever produced the document managed to adhere to his/her own
      rules. Then what? The rules could be anything.

      http://hsivonen.iki.fi/validator/ - being a RELAX NG validator - works
      differently. It allows the user to pose the (in my opinion much more
      useful) question "Does this document conform to this grammar?" It does
      not give out a badge, but after the validation the user knows what
      schema the document did or did not conform to. I think RELAX NG-based
      QA tools would regress to a less useful level if the user of a QA tool
      only knew that the document is internally consistent without knowing
      whether it adheres to the particular grammar the user is interested in.
      Therefore, I think a PI for the input to specify its own schema would
      harm case 2).

      I agree that in case 3) it is desirable to use a RELAX NG schema for
      editing assistance. However, I think such use is a private matter
      between the user and his/her editor and, therefore, it is not necessary
      to expose such private editing method details to whoever subsequently
      receives the document. Moreover, the schema repository is likely to be
      local, so the most obvious references, i.e. installation-specific file
      system paths, would be useless to others, making the PI useful only
      privately. OTOH, registering common identifiers for schemas and
      abstracting away the file paths would probably be overkill, and for
      the same effort you could use some configurable association method that
      does not contaminate the document.

      Also, having to contaminate the document itself with editing
      process-specific artifacts can be a sign of a design flaw in the
      editor. In the common cases, the schema could be bound to the root
      namespace or to the filename extension (as is customary with
      programming language-specific syntax highlighting in text editors).
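
      A minimal sketch of such an out-of-band association, assuming a
      hypothetical editor-side configuration: the mapping tables, schema
      paths and the names pick_schema and root_namespace are invented for
      illustration and do not come from any existing editor. The lookup
      reads only the root namespace and the filename, so a PI in the
      document has no say in which schema is chosen.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical editor configuration (assumption): the user, not the
# document, decides which schema goes with which vocabulary.
SCHEMA_BY_NAMESPACE = {
    "http://www.w3.org/1999/xhtml": "schemas/xhtml.rng",
    "http://docbook.org/ns/docbook": "schemas/docbook.rng",
}
SCHEMA_BY_EXTENSION = {
    ".xhtml": "schemas/xhtml.rng",
    ".dbk": "schemas/docbook.rng",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def pick_schema(filename, xml_text):
    """Choose a schema path from the root namespace, else the file extension.

    Processing instructions in the document are never consulted.
    Returns None when no association is configured.
    """
    ns = root_namespace(xml_text)
    if ns in SCHEMA_BY_NAMESPACE:
        return SCHEMA_BY_NAMESPACE[ns]
    ext = os.path.splitext(filename)[1].lower()
    return SCHEMA_BY_EXTENSION.get(ext)
```

      The returned path would then be handed to whatever RELAX NG
      validator the editor embeds, as the second, independent input.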

      Since case 3) seems more like a private issue, I think central
      endorsement of a standard PI is not necessary for case 3).

      BTW, I think DOCTYPE and schemaLocation are design bugs, because they
      foil the point of cases 1) and 2).

      Henri Sivonen
    • Robin Berjon
      Message 96 of 96 , Jul 31, 2005
        MURATA Makoto (FAMILY Given) wrote:
        > If the camp trying to standardize PIs is not the majority
        > of the RELAX NG community, I do not think that PIs will take off.
        > Here is my understanding of the current status. Please let me know
        > if I misinterpret somebody.
        > For schema-associating PIs
        > Jirka Kosek
        > Robin Berjon
        > George Cristian Bina

        Sorry to answer so late, I've been on vacation. Just for the record, if
        you are going to be counting heads in the RNG community, I think I would
        be best counted as "neutral". My take on this is that a schema PI is
        just as bad an idea as a stylesheet PI, which is to say that it's most
        of the time a very bad idea (and in the absence of a processing model,
        dreadfully underspecified in its interactions with other specs at that),
        but *if* people are going to be doing it anyway (as seems to be the
        case) then I would prefer that there is a standard made by people who
        understand the issues and limitations of this approach rather than ad
        hoc proprietary options made by people who are probably smart and
        probably understand some of the problems, but won't benefit from the
        head-banging that some form of community standard would get (or rather,
        is getting).

        Robin Berjon
        Senior Research Scientist
        Expway, http://expway.com/