On Jul 3, 2005, at 00:42, B Tommie Usdin wrote:
> At 11:14 PM +0200 7/2/05, Jirka Kosek wrote:
>> Primary motivation (although not stated clearly) for my proposal was
>> not validation, but guided editing of XML document. Describing
>> complex validation is out of scope of my proposal, something much
>> more powerful like NRL could be used.
> But if there is a "standard" there will be pressure to use it for
> everything conceivable, whether it is appropriate or not.
I agree. I think it is a desirable feature that the RELAX NG validation
process takes two *independent* inputs: the schema and the document.
(James Clark also mentioned this in the famous IETF post.)
I can see three main cases here:
1) Apps that want to check their input in an off-the-shelf manner
2) Quality assurance tools
3) Editors with autocompletion and error highlighting
In case 1) an application receives input from an outside source and
cannot trust that the outside source produces correct output (correct
in the sense that the receiving application works properly when using
it as input). To avoid hand-coding checks for every possible error
situation, the developer of the application decides to embed a
RELAX NG validator and an appropriate schema. The hand-coded part of
the application can then trust that anything it sees conforms to the
schema.
If the input can smuggle in its own rules the way DOCTYPE and
schemaLocation allow it to do, the app can no longer trust the
validation stage, which defeats the whole point of embedding the
validator. Therefore, I think a PI for the input to specify its own
schema is totally wrong considering case 1).
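To make case 1) concrete, here is a minimal Python sketch of the arrangement where the application, not the input, picks the grammar. It assumes a hypothetical toy schema format (`APP_SCHEMA`, a dict of allowed child elements per element) standing in for a real RELAX NG schema and validator, and a hypothetical `<?schema ...?>` PI in the input. It is not RELAX NG; the only point is that `validate()` takes the schema as a separate argument and never reads one out of the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema chosen by the application. It stands in for a real
# RELAX NG schema: the expected root element plus, for each element name,
# the set of child element names it may contain.
APP_SCHEMA = {
    "root": "order",
    "children": {
        "order": {"item", "note"},
        "item": set(),
        "note": set(),
    },
}


def validate(document: str, schema: dict) -> list:
    """Check `document` against `schema`; the two are independent inputs.

    Nothing in the document (PIs, DOCTYPE, schemaLocation attributes) is
    consulted to decide which rules apply: the caller supplies them.
    """
    root = ET.fromstring(document)
    if root.tag != schema["root"]:
        return [f"unexpected root element <{root.tag}>"]
    errors = []
    allowed = schema["children"]
    for parent in root.iter():  # every element, in document order
        permitted = allowed.get(parent.tag)
        if permitted is None:
            errors.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in permitted:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


# The input tries to name its own rules with a hypothetical PI; it is ignored.
untrusted = '<?schema href="my-own-rules.rng"?><order><item/><script/></order>'
print(validate(untrusted, APP_SCHEMA))
# → ['<script> not allowed inside <order>', 'undeclared element <script>']
```

Whatever rules the input claims to follow, it is checked against `APP_SCHEMA`. The stdlib parser also drops the prolog PI when building the tree, but the design does not rely on that: the schema simply never comes from the document.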
In case 2) a user has a document (not necessarily created by the user
him/herself) and is interested in the syntactic correctness of the
document. If the document is allowed to define the rules, the user is
getting the answer to the question "Does this document conform to the
grammar it sets for itself?"
A validator that honors the document's own DOCTYPE works like this. It gives you a little badge
of validity to show off, but it doesn't tell you whether the internal subset
was used to introduce radically different home-grown rules from what
the "This document is valid FooML" message implies. All you know is
that whoever produced the document managed to adhere to his/her own
rules. Then what? The rules could be anything.
A RELAX NG validator works
differently. It allows the user to pose the (in my opinion much more
useful) question "Does this document conform to this grammar?" It does
not give out a badge, but after the validation the user knows what
schema the document did or did not conform to. I think RELAX NG-based
QA tools would regress to a less useful level if the user of a QA tool
only knew that the document is internally consistent without knowing
whether it adheres to the particular grammar the user is interested in.
Therefore, I think a PI for the input to specify its own schema would
harm case 2).
I agree that in case 3) it is desirable to use a RELAX NG schema for
editing assistance. However, I think such use is a private matter
between the user and his/her editor and, therefore, it is not necessary
to expose such private editing method details to whoever subsequently
receives the document. Moreover, the schema repository is likely to be
local, so the most obvious references, i.e. installation-specific file
system paths, would be useless to others, making the PI useful only
privately. OTOH, registering common identifiers for schemas and
abstracting away the file paths would probably be overkill, and for
the same effort you could use some configurable association method that
does not contaminate the document.
Also, having to contaminate the document itself with editing
process-specific artifacts can be a sign of a design flaw in the
editor. In the common cases, the schema could be bound to the namespace
of the root element or to the filename extension (as is customary with
programming language-specific syntax highlighting in text editors).
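A minimal Python sketch of such a configurable association for case 3). It assumes hypothetical user-local lookup tables (`SCHEMAS_BY_NAMESPACE`, `SCHEMAS_BY_EXTENSION`) with placeholder schema paths, and is not any particular editor's mechanism. The editor checks the namespace of the root element first and falls back to the filename extension, so the documents themselves carry no schema reference.

```python
import io
import os
import xml.etree.ElementTree as ET
from typing import IO, Optional, Union

# Hypothetical user-local configuration. The paths are placeholders for
# wherever the schemas live on this machine; none of this is written into
# the documents being edited.
SCHEMAS_BY_NAMESPACE = {
    "http://www.w3.org/1999/xhtml": "/home/me/schemas/xhtml.rng",
    "http://docbook.org/ns/docbook": "/home/me/schemas/docbook.rng",
}
SCHEMAS_BY_EXTENSION = {
    ".xhtml": "/home/me/schemas/xhtml.rng",
    ".dbk": "/home/me/schemas/docbook.rng",
}


def root_namespace(source: Union[str, IO[bytes]]) -> Optional[str]:
    """Return the namespace URI of the root element, or None if it has none.

    Stops at the first start tag, so the rest of the document is not parsed.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag.startswith("{"):
            return elem.tag[1:].split("}", 1)[0]
        return None
    return None


def schema_for(filename: str, content: Optional[IO[bytes]] = None) -> Optional[str]:
    """Pick an editing schema: root namespace first, then filename extension."""
    source = content if content is not None else filename
    try:
        ns = root_namespace(source)
    except (ET.ParseError, OSError):
        ns = None  # e.g. a non-XML file, or one broken before its root start tag
    if ns in SCHEMAS_BY_NAMESPACE:
        return SCHEMAS_BY_NAMESPACE[ns]
    ext = os.path.splitext(filename)[1].lower()
    return SCHEMAS_BY_EXTENSION.get(ext)


# Usage, with in-memory content instead of real files:
page = io.BytesIO(b'<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
print(schema_for("page.xml", page))  # → /home/me/schemas/xhtml.rng
print(schema_for("book.dbk", io.BytesIO(b"<book/>")))  # → /home/me/schemas/docbook.rng
```

Because the tables live with the editor installation, the file paths can stay installation-specific without leaking into documents sent to others.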
Since case 3) seems more like a private issue, I think central
endorsement of a standard PI is not necessary for case 3).
BTW, I think DOCTYPE and schemaLocation are design bugs, because they
foil the point of cases 1) and 2).