Validation against XML Schema (XSD)

Hello,

I am wrestling with a choice and would like to ask for opinions. When validating XML instance documents against a W3C XML Schema, I can either use @xsi:schemaLocation and rely on it as a hint, or infer which schema to apply from some other piece of information in the document. I believe one of the arguments against using @xsi:schemaLocation is that the consuming application is arguably in a better position than the producer to determine which schema to apply. This is especially true in situations where a document could be valid against multiple schemas. In my scenario a document is either valid or not, but I do not want to discount this argument. Another argument against it is that it is defined as only a hint and that not all tools support it, although in my case the tools do.

My question is: if I did not use/provide @xsi:schemaLocation, what are some suggested options and means to determine the schema? I will almost certainly be using a catalog (OASIS), so I believe this will play a role in the decision. One option I have considered is using the namespace URI of the root element as a sort of public identifier that the catalog resolver could look up, but this has limited support in “off-the-shelf” parsing solutions. For example, Xerces (Java) supports this through its (XNI) XMLCatalogResolver class, but the standard SAX EntityResolver and EntityResolver2 interfaces do not expose/report namespaces.
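To make the namespace-as-identifier idea concrete, here is a minimal sketch using the standard StAX API rather than Xerces-specific classes. It reads only as far as the root start tag and looks the namespace up in a plain map. The class name, the `urn:example:*` namespaces, the schema paths, and the map itself are assumptions made up for illustration; in a real setup an OASIS catalog lookup would presumably take the place of the map.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RootNamespacePeek {

    // Hypothetical namespace-to-schema table; a catalog lookup would replace it.
    static final Map<String, String> SCHEMA_BY_NAMESPACE = Map.of(
        "urn:example:invoice", "schemas/invoice.xsd",
        "urn:example:order", "schemas/order.xsd");

    // Reads up to the root start tag and returns its namespace URI,
    // or null when the root element is in no namespace.
    static String rootNamespace(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // The peek should not fetch DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        String ns = reader.getNamespaceURI();
                        return (ns == null || ns.isEmpty()) ? null : ns;
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        }
    }

    // Returns the schema location registered for the root namespace, or null.
    static String schemaFor(String xml) {
        String ns = rootNamespace(xml);
        // Map.of(...) maps reject null keys, so guard before the lookup.
        return ns == null ? null : SCHEMA_BY_NAMESPACE.get(ns);
    }
}
```

Because the reader stops at the first start tag, the peek touches only the prolog and the root element's attributes, not the whole document.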

The piece that is bugging me a little is that, regardless of the means of determining the schema, it feels like an extra step/pass/look-into-the-document is required before the actual parse of the document. Relying on @xsi:schemaLocation feels much more like relying on a DOCTYPE for a DTD in that it is recognized during the main parsing step represented by a standard API call (e.g. xmlreader.parse(…)) (even if that call does a few passes itself).
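For comparison, the two-step flow described above might look roughly like the following with the standard `javax.xml.validation` API. This is only a sketch under assumptions: the `urn:example:note` namespace and the inlined schema are hypothetical, and the document is assumed to be buffered as a byte array so that it can be read twice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class TwoStepValidation {

    // Hypothetical schema, inlined so that the sketch is self-contained.
    static final String NOTE_NS = "urn:example:note";
    static final String NOTE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:note' elementFormDefault='qualified'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Step 1: pre-pass that stops at the root start tag.
    static String rootNamespace(byte[] document) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new ByteArrayInputStream(document));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getNamespaceURI();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        }
    }

    // Step 2: the actual validating parse, using the schema chosen in step 1.
    static boolean isValid(byte[] document) {
        if (!NOTE_NS.equals(rootNamespace(document))) {
            return false; // no schema known for this document type
        }
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(NOTE_XSD)));
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new ByteArrayInputStream(document)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost of the extra step here is mostly the buffering; the pre-pass itself ends as soon as the root element is seen.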

I could even remove the notion of XSD here and ask the same question if I were validating against one of multiple RelaxNG schemas. Since RNG has no standardized equivalent of @xsi:schemaLocation that allows the instance document to say “validate me against this schema”, it feels like a pre-pass would be needed here too. The Oxygen editor uses a processing instruction to indicate which RNG file it should use for validation, but I am unsure whether the implementation first does a pass to get the PI and then another to validate, or whether it is able to validate in a single pass.
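A pre-pass for a processing instruction could be sketched with StAX along the same lines. This says nothing about how Oxygen actually implements it; the class name and the PI target passed in are assumptions for illustration, and the caller would still have to parse the pseudo-attributes out of the returned data.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PrologPiPeek {

    // Returns the data of the first processing instruction with the given
    // target that appears before the root element, or null if there is none.
    static String prologPiData(String xml, String target) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.PROCESSING_INSTRUCTION
                            && target.equals(reader.getPITarget())) {
                        return reader.getPIData();
                    }
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        return null; // prolog is over, stop reading
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        }
    }
}
```

Since such a PI sits in the prolog, the pre-pass never needs to read past the root start tag.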

Am I missing anything here? I would appreciate any comments, alternatives, etc. Thanks!

Matt

PS: My scenario involves collections of heterogeneous content types, so each document could be of one of several schema types (but valid against only one). The effect is that I cannot do a pre-parse (or regex) on the first document of a collection and assume that all docs in that collection are the same.