OASIS Mailing List Archives


validation against xml schema (xsd)



I am wrestling with a choice and would like to ask for opinions.  When validating XML instance documents against a W3C XML Schema, I can either use @xsi:schemaLocation and rely on it as a hint, or I can infer which schema to apply from some other piece of information in the document.  I believe one of the arguments against using @xsi:schemaLocation is that the consuming application should arguably be in a better position than the producer to determine which schema to apply.  This is especially true in situations where a document could be valid against multiple schemas.  In my scenario each document is meant to be valid against exactly one schema, but I do not want to discount this argument.  Another argument against it is that it is defined as only a hint and that not all tools support it, although in my case the tools do support it.


My question is: if I did not use/provide @xsi:schemaLocation, what are some suggested ways to determine the schema?  I will almost certainly be using an OASIS XML Catalog, so I believe this will play a role in the decision.  One option I have considered is using the namespace URI of the root element as a sort of public identifier that the catalog resolver could look up, but this has limited support in “off-the-shelf” parsing solutions.  For example, Xerces (Java) supports this through its XNI-based XMLCatalogResolver class, but the standard SAX EntityResolver and EntityResolver2 interfaces do not expose/report namespaces.
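To make the namespace-as-identifier option concrete, here is a minimal sketch using only the JAXP APIs bundled with the JDK (StAX and javax.xml.validation).  It reads just far enough to see the root element's namespace URI and then picks a schema from a lookup table.  The class name NamespaceDispatch, the in-memory table, and the urn:example:a namespace are assumptions made for illustration; in practice the table would presumably be populated from the catalog rather than from strings.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class NamespaceDispatch {
    // Hypothetical namespace-to-schema table; a real one would be fed from the catalog.
    private final Map<String, Schema> schemas = new HashMap<>();

    // Compile the schema once and remember it under the given namespace URI.
    void register(String namespaceUri, String xsdText) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        schemas.put(namespaceUri, factory.newSchema(new StreamSource(new StringReader(xsdText))));
    }

    // Pre-pass: read only as far as the root start tag and return its namespace URI.
    static String rootNamespace(String xml) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    return ns == null ? "" : ns;
                }
            }
            throw new IllegalArgumentException("document has no root element");
        } finally {
            reader.close();
        }
    }

    // Main pass: validate against whichever schema the root namespace selects.
    boolean isValid(String xml) throws Exception {
        Schema schema = schemas.get(rootNamespace(xml));
        if (schema == null) {
            return false; // no schema registered for this namespace
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handling throws on the first validity error
        }
    }

    public static void main(String[] args) throws Exception {
        NamespaceDispatch dispatch = new NamespaceDispatch();
        dispatch.register("urn:example:a",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:a' elementFormDefault='qualified'>"
            + "<xs:element name='a' type='xs:int'/></xs:schema>");
        System.out.println(dispatch.isValid("<a xmlns='urn:example:a'>1</a>"));
        System.out.println(dispatch.isValid("<a xmlns='urn:example:a'>x</a>"));
    }
}
```

Because the compiled Schema objects are cached per namespace, the per-document cost is the short peek at the root element plus the validating parse, which would also suit a collection where each document may be of a different type.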


The piece that is bugging me a little is that, regardless of how the schema is determined, it feels like an extra step/pass/look into the document is required before the actual parse of the document.  Relying on @xsi:schemaLocation feels much more like relying on a DOCTYPE for a DTD, in that it is recognized during the main parsing step represented by a standard API call (e.g. xmlreader.parse(…)), even if that call does a few passes itself.
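For comparison, this is roughly what the single-call, hint-driven route can look like with the JAXP validation API.  It is a sketch under assumptions, not a recommendation: the no-argument SchemaFactory.newSchema() yields a Schema whose validators locate schemas from the hints in the instance, and an LSResourceResolver set on the validator gets a chance to answer those lookups (it is handed the namespace URI as well as the hinted system identifier).  The class name, the in-memory XSD constant, and the urn:example:a namespace are made up for the example; a catalog lookup would take the place of the constant.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

final class HintDrivenValidation {
    // Hypothetical schema text, standing in for whatever a catalog lookup would return.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:a' elementFormDefault='qualified'>"
        + "<xs:element name='a' type='xs:int'/></xs:schema>";

    static boolean isValid(String xml) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        // No-argument newSchema(): schemas are located from hints found in the instance.
        Validator validator = factory.newSchema().newValidator();

        // The resolver sees the namespace URI and the hinted location of each lookup.
        validator.setResourceResolver((type, namespaceUri, publicId, systemId, baseUri) -> {
            if (!"urn:example:a".equals(namespaceUri)) {
                return null; // fall back to the default resolution
            }
            LSInput input = ls.createLSInput();
            input.setStringData(XSD);    // serve the schema from memory
            input.setSystemId(systemId); // keep the hinted identifier for error messages
            return input;
        });

        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<a xmlns='urn:example:a'"
            + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:schemaLocation='urn:example:a http://example.org/a.xsd'>1</a>";
        System.out.println(isValid(doc));
    }
}
```

The hinted URL http://example.org/a.xsd is never fetched in this sketch, since the resolver answers first; that is the same division of labour a catalog would provide.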


I could even remove the notion of XSD here and ask the same question if I were validating against one of multiple RelaxNG schemas.  Since RNG does not have a standardized equivalent of @xsi:schemaLocation that allows the instance document to say “validate me against this schema”, it feels like a pre-pass would be needed here too.  The Oxygen editor uses a processing instruction to indicate which RNG file it should use for validation, but I am unsure whether the implementation first does a pass to get the PI and then another to validate, or whether it is able to validate in a single pass.
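On the processing-instruction approach: a PI placed in the prolog is reported before the root start tag, so a streaming reader can pick it up without going any deeper into the document.  The sketch below shows that with StAX.  The PI target schema-hint and its href pseudo-attribute are hypothetical names chosen for the example; they are not the format Oxygen actually uses, and this says nothing about how Oxygen implements it.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class PrologSchemaHint {
    // Hypothetical pseudo-attribute: href="..." or href='...' inside the PI data.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the href from the first matching PI in the prolog, or null if there is none.
    static String schemaHint(String xml, String piTarget) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    return null; // the prolog is over; stop before reading any content
                }
                if (event == XMLStreamConstants.PROCESSING_INSTRUCTION
                        && piTarget.equals(reader.getPITarget())) {
                    String data = reader.getPIData();
                    Matcher m = HREF.matcher(data == null ? "" : data);
                    if (m.find()) {
                        return m.group(2);
                    }
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?schema-hint href='doc.rng'?><doc/>";
        System.out.println(schemaHint(doc, "schema-hint"));
    }
}
```

Whether this counts as a separate pass is partly a matter of taste: the look-ahead stops at the root start tag, so a buffered stream could be rewound and handed to the validator without rereading the source.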


Am I missing anything here?  I would appreciate any comments, alternatives, etc.  Thanks!




PS:  My scenario involves collections of heterogeneous content types, so each document could be of one of several schema types (but valid against only one).  The effect is that I cannot do a pre-parse (or regex) on the first document of a collection and assume that all docs in that collection are the same.







Copyright 1993-2007 XML.org. This site is hosted by OASIS