Hello,

I am wrestling with a choice and would like to ask for opinions. When
validating XML instance documents against a W3C XML Schema, I can either use
@xsi:schemaLocation and rely on it as a hint, or I can infer which schema to
apply from some other piece of information in the document. I believe one
argument against using @xsi:schemaLocation is that the consuming application
is arguably in a better position than the producer to determine which schema
to apply. This is especially true when a document could be valid against
multiple schemas. In my scenario a document is valid against at most one of
the candidate schemas, but I do not want to discount this argument. Another
argument against it is that it is defined as only a hint and that not all
tools support it, although the tools I use do.

My question is: if I did not use or provide @xsi:schemaLocation,
what options are there for determining the schema? I will almost certainly
be using an OASIS catalog, so I expect it to play a role in the decision. One
option I have considered is using the namespace URI of the root element as a
sort of public identifier for the catalog resolver, but this has limited
support in “off-the-shelf” parsing solutions. For example, Xerces (Java)
supports it through its XNI-based XMLCatalogResolver class, but the standard
SAX EntityResolver and EntityResolver2 interfaces do not expose namespaces.

The piece that is bugging me a little is that, regardless of
the means of determining the schema, it feels like an extra step (a pass, or
at least a look into the document) is required before the actual parse.
Relying on @xsi:schemaLocation feels much more like relying on a DOCTYPE for a
DTD, in that the hint is recognized during the main parsing step, represented
by a standard API call such as xmlreader.parse(…), even if that call makes a
few passes internally.

I could even remove the notion of XSD here and ask the same
question if I were validating against one of several RELAX NG schemas. Since
RELAX NG has no standardized equivalent of @xsi:schemaLocation that lets the
instance document say “validate me against this schema”, it feels like a
pre-pass would be needed there too. The Oxygen editor uses a processing
instruction to indicate which RNG file to use for validation, but I am unsure
whether it first makes one pass to read the PI and another to validate, or
whether it validates in a single pass.

Am I missing anything here? I appreciate any comments,
alternatives, etc.

Thanks, I appreciate it!

Matt

PS: My scenario involves collections of heterogeneous content types, so each
document could be of any one of several schema types (but valid against only
one). As a result, I cannot pre-parse (or regex) the first document in a
collection and assume that all documents in that collection are the same.
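A pre-pass does not have to read the whole document, and it can be run per document, which matters for heterogeneous collections. Below is a minimal sketch using the JDK's StAX API: it stops at the first start tag and returns the root element's namespace URI, which can then serve as a lookup key. The class name and the schemaFor helper are hypothetical, and the Map is only a stand-in for an OASIS catalog lookup, not a catalog API. The real parse still has to reopen or buffer the input afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: sniffs the root namespace without parsing the whole document. */
public class RootNamespaceSniffer {

    /** Returns the root element's namespace URI ("" if none), reading only up to the root start tag. */
    public static String rootNamespace(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process a DOCTYPE during the sniff; we only need the first start tag.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Skip the prolog (XML declaration, comments, PIs) until the root start tag.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    return ns == null ? "" : ns;
                }
            }
            throw new XMLStreamException("No root element found");
        } finally {
            reader.close(); // frees the reader; does not close the underlying stream
        }
    }

    /** Assumption: a plain Map stands in for an OASIS catalog (namespace URI -> schema location). */
    public static String schemaFor(String xml, Map<String, String> catalog) throws XMLStreamException {
        String ns = rootNamespace(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String location = catalog.get(ns);
        if (location == null) {
            throw new IllegalArgumentException("No schema registered for namespace '" + ns + "'");
        }
        return location;
    }
}
```

Because only the prolog and the root start tag are read, the cost of the pre-pass is small even for large documents, but it is still a second open of the input.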
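On the extra-pass concern: with SAX, the schema can be chosen when the root element's startElement event arrives, and validation can continue in the same pass, because javax.xml.validation.ValidatorHandler is itself a ContentHandler. Below is a minimal sketch assuming the root namespace URI identifies the schema. The class and method names are hypothetical, and the Map of compiled schemas is only a stand-in for a catalog lookup. Prefix mappings reported before the root element are buffered and replayed to the validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: picks a schema at the root start tag and validates in the same SAX pass. */
public class NamespaceDispatchingValidator extends DefaultHandler {
    private final Map<String, Schema> schemasByNamespace; // stand-in for a catalog lookup
    private final List<String[]> pendingPrefixes = new ArrayList<>();
    private Locator locator;
    private ValidatorHandler validator; // created lazily when the root element is seen

    public NamespaceDispatchingValidator(Map<String, Schema> schemasByNamespace) {
        this.schemasByNamespace = schemasByNamespace;
    }

    @Override public void setDocumentLocator(Locator locator) { this.locator = locator; }

    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        if (validator == null) pendingPrefixes.add(new String[] {prefix, uri}); // root not seen yet
        else validator.startPrefixMapping(prefix, uri);
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (validator == null) {
            Schema schema = schemasByNamespace.get(uri);
            if (schema == null) throw new SAXException("No schema registered for namespace '" + uri + "'");
            validator = schema.newValidatorHandler(); // no ErrorHandler set: validation errors are thrown
            if (locator != null) validator.setDocumentLocator(locator);
            validator.startDocument();
            for (String[] p : pendingPrefixes) validator.startPrefixMapping(p[0], p[1]); // replay
        }
        validator.startElement(uri, localName, qName, atts);
    }

    @Override public void endElement(String uri, String localName, String qName) throws SAXException {
        validator.endElement(uri, localName, qName);
    }

    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        if (validator != null) validator.characters(ch, start, length);
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (validator != null) validator.ignorableWhitespace(ch, start, length);
    }

    @Override public void endPrefixMapping(String prefix) throws SAXException {
        if (validator != null) validator.endPrefixMapping(prefix);
    }

    @Override public void endDocument() throws SAXException {
        if (validator != null) validator.endDocument();
    }

    /** Parses and validates in one pass; throws SAXException if the document is invalid. */
    public static void validate(InputSource source, Map<String, Schema> schemas)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // startElement must receive namespace URIs
        factory.newSAXParser().parse(source, new NamespaceDispatchingValidator(schemas));
    }

    /** Convenience: compiles a W3C XML Schema held in a string. */
    public static Schema compile(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }
}
```

Since the decision is made per document, this also suits heterogeneous collections. The same deferred-handler idea could key off a processing instruction instead, since SAX reports PIs in the prolog before the root startElement; for RELAX NG it would additionally need a JAXP SchemaFactory that supports RELAX NG, which the JDK's built-in one does not.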