RE: [xml-dev] validation against xml schema (xsd)


From: "Johnson, Matthew C. (LNG-HBE)" <Matthew.C.Johnson@lexisnexis.com>
To: "George Cristian Bina" <george@oxygenxml.com>
Date: Fri, 6 Mar 2009 08:04:35 -0500

George,

Thanks very much for this information and for your thoughts.  They will
be useful!

Matt

> -----Original Message-----
> From: George Cristian Bina [mailto:george@oxygenxml.com]
> Sent: Thursday, March 05, 2009 4:00 PM
> To: Johnson, Matthew C. (LNG-HBE)
> Cc: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] validation against xml schema (xsd)
> 
> Hi Matt,
> 
> You can do a first parse and stop once you reach the root element, for
> instance by throwing an exception on the first startElement callback.
> That will give you enough information about the document to determine
> the schema to use. While you do this parse you can buffer what the
> parser reads and then start the validation feeding the parser with the
> buffered content and then the remaining content of your document. You
> can find an example of this in Jing, see the AutoSchemaReader and the
> RewindableReader and RewindableInputStream classes:
> 
> http://code.google.com/p/jing-trang/source/browse/trunk/mod/validate/src/main/com/thaiopensource/validate/auto/AutoSchemaReader.java
> http://code.google.com/p/jing-trang/source/browse/trunk/mod/validate/src/main/com/thaiopensource/validate/auto/RewindableReader.java
> http://code.google.com/p/jing-trang/source/browse/trunk/mod/validate/src/main/com/thaiopensource/validate/auto/RewindableInputStream.java
> 
> Best Regards,
> George
> --
> George Cristian Bina
> <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
> 
> 
> Johnson, Matthew C. (LNG-HBE) wrote:
> > Hello,
> >
> > I am wrestling with a choice and would like to ask for opinions.  In
> > validating XML instance documents against a W3C XML Schema instance,
> > I can either use @xsi:schemaLocation and rely on it as a hint, or I
> > can infer which schema to apply using some other piece of information
> > from the document.  I believe one of the arguments against using
> > @xsi:schemaLocation is that the consuming application should arguably
> > be in a better position to determine which schema to apply than the
> > producer.  This is especially true in situations where a document
> > could be valid against multiple schemas.  My scenario is that a
> > document is either valid or not, but I do not want to discount this
> > argument.  Another argument against is that it is defined as only a
> > hint and that not all tools support it, although in my case, the
> > tools do support it.
> >
> > My question is, if I did not use/provide @xsi:schemaLocation, what
> > are some suggested options and means to determine the schema?  I will
> > almost certainly be using a catalog (OASIS) so I believe this will
> > play a role in the decision.  One option I have considered is using
> > the namespace URI of the root element as a sort of public identifier
> > that could be used by the catalog resolver, but this has limited
> > support in "off-the-shelf" parsing solutions.  For example, Xerces
> > (Java) supports this through their (XNI) XMLCatalogResolver class but
> > standard SAX EntityResolver(2) does not expose/report namespaces.
> >
> > The piece that is bugging me a little is that, regardless of the
> > means of determining the schema, it feels like an extra
> > step/pass/look-into-the-document is required before the actual parse
> > of the document.  Relying on @xsi:schemaLocation feels much more like
> > relying on a DOCTYPE for a DTD in that it is recognized during the
> > main parsing step represented by a standard API call (e.g.
> > xmlreader.parse(...)) (even if that call does a few passes itself).
> >
> > I could even remove the notion of XSD here and ask the same question
> > if I were validating against one of multiple RelaxNG schemas.  Since
> > RNG does not have the standardized equivalent of @xsi:schemaLocation
> > that allows the instance document to say "validate me to this
> > schema", it feels like a pre-pass would be needed here too.  The
> > Oxygen editor uses a processing instruction to indicate which RNG
> > file it should use for validation, but I am unsure whether the
> > implementation first does a pass to get the PI and then another to
> > validate or whether it is able to validate in a single pass.
> >
> > Am I missing anything here?  I appreciate any comments,
> > alternatives, etc.  Thanks, I appreciate it!
> >
> > Matt
> >
> > PS:  My scenario involves collections of heterogeneous content types
> > so each document could be of one of several schema types (but only
> > valid to one).  The effect is that I could not rely on doing a
> > pre-parse (or regex) on the first of a collection and assume that all
> > docs in that collection are the same.
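
For readers of the archive, the approach George describes (a first parse
that is aborted at the root element, with the bytes read so far buffered
and replayed for the real validation) could look roughly like the sketch
below. It is not the Jing code he links to. It uses only the standard
JAXP and javax.xml.validation APIs, and the names SchemaSniffer,
RecordingStream, RootFound and validateByRootNamespace are made up for
illustration. It also assumes the caller has already compiled one Schema
per root namespace URI, which stands in for the catalog lookup Matt
mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Jing or Xerces.
final class SchemaSniffer {

    /** Thrown on purpose to stop the first parse at the root element. */
    static final class RootFound extends SAXException {
        final String namespaceUri;
        RootFound(String namespaceUri) {
            super("root element reached");
            this.namespaceUri = namespaceUri;
        }
    }

    /** Remembers every byte the parser pulls so it can be replayed later. */
    static final class RecordingStream extends InputStream {
        private final InputStream source;
        private final ByteArrayOutputStream seen = new ByteArrayOutputStream();

        RecordingStream(InputStream source) { this.source = source; }

        @Override public int read() throws IOException {
            int b = source.read();
            if (b >= 0) seen.write(b);
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = source.read(buf, off, len);
            if (n > 0) seen.write(buf, off, n);
            return n;
        }

        // The parser may close its input when the parse is aborted;
        // ignore that so the unread remainder stays available.
        @Override public void close() { }

        /** Buffered bytes first, then whatever the source still holds. */
        InputStream replay() {
            return new SequenceInputStream(
                    new ByteArrayInputStream(seen.toByteArray()), source);
        }
    }

    /** First pass: parse only as far as the root start tag. */
    static String sniffRootNamespace(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) throws SAXException {
                    throw new RootFound(uri); // abort: we have what we need
                }
            });
        } catch (RootFound found) {
            return found.namespaceUri;
        }
        throw new IllegalStateException("document has no root element");
    }

    /** Second pass: validate the replayed stream against the chosen schema. */
    static String validateByRootNamespace(InputStream in,
            Map<String, Schema> schemasByNamespace) throws Exception {
        RecordingStream recorder = new RecordingStream(in);
        String ns = sniffRootNamespace(recorder);
        Schema schema = schemasByNamespace.get(ns);
        if (schema == null) {
            throw new SAXException("no schema registered for namespace '" + ns + "'");
        }
        schema.newValidator().validate(new StreamSource(recorder.replay()));
        return ns;
    }
}
```

A caller would build the map once (for example with
SchemaFactory.newSchema(...) per namespace) and then call
SchemaSniffer.validateByRootNamespace(stream, map) for each document in
a heterogeneous collection. Because each document is sniffed on its own,
nothing is assumed about the other documents in the collection. The same
aborted first pass could also watch for a processing instruction in the
prolog, since SAX reports those before the root element.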

