Christopher R. Maden wrote:
>Surely I am not the first person to try doing this, but I can't seem to
>find any prior art nor any straightforward way to do this.
>I have data that may be arbitrarily large and may conform to arbitrary
>XSDL schemata. Because of the size, I want to process the document as an
>event stream (hence SAX), and I want to make different processing
>decisions based on the declared types from the schema and based on the
>ultimate base types, if there's any type inheritance.
Here's an outline of one way to proceed using Xerces (I've only used
Xerces-J; I don't know if what follows applies to Xerces-P):
It's unclear from your post whether you have all the schemas available
in advance. However, it suffices to have parsed the XSD grammars
relevant to a particular document (into a grammar pool) before doing
what follows.  To find them, you might look at the namespace of the root
element and any xsi:schemaLocation attribute on that element, use a
custom entity resolver, or both, and then fetch the relevant grammar
along with anything it imports or includes.
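As a rough illustration of the "look at the root element" step, here is
a minimal sketch using only the SAX API bundled with the JDK, not
anything Xerces-specific. The class name RootSchemaHints and its fields
are hypothetical names chosen for this example. It reads the root
element's namespace and any xsi:schemaLocation pairs, then abandons the
parse by throwing from startElement().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects schema hints from the root element only. */
class RootSchemaHints extends DefaultHandler {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    String rootNamespace;
    String rootLocalName;
    /** Namespace URI -> schema location hint, in attribute order. */
    final Map<String, String> locations = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        rootNamespace = uri;
        rootLocalName = localName;
        String hint = atts.getValue(XSI, "schemaLocation");
        if (hint != null) {
            // The attribute value is a whitespace-separated list of
            // (namespace, location) pairs.
            String[] tokens = hint.trim().split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i += 2) {
                locations.put(tokens[i], tokens[i + 1]);
            }
        }
        // Only the root start tag is needed, so stop the parse here.
        throw new SAXException("root element seen");
    }

    /** Reads just far enough into the input to see the root start tag. */
    static RootSchemaHints scan(InputSource in) {
        RootSchemaHints hints = new RootSchemaHints();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(in, hints);
        } catch (SAXException e) {
            // If the root was recorded, this is our own early exit.
            if (hints.rootLocalName == null) {
                throw new IllegalArgumentException("could not read root element", e);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read root element", e);
        }
        return hints;
    }
}
```

The locations are only hints supplied by the document author, so an
entity resolver or a local catalogue may still be the better source for
the actual schema files.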
Having found all the grammars, you retrieve the grammar for the root
element's namespace from the pool and convert it to an XSModel (from
the XML Schema API published on the World Wide Web Consortium web
site).  Given the root element's qualified name, you can get its
XSElementDeclaration from the XSModel, from there its type declaration,
and from there the base types. You might also need to look at any
xsi:type attribute on the root element in case the content is specified
by a type derived from the declared type.  If so, the XSModel also
gives you that derived type's definition.  All of this can be done while
handling startElement() for the root element.
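The xsi:type check needs a little care under SAX, because the attribute
value is a prefixed name that has to be resolved against the namespace
declarations in scope. The following is a minimal sketch of that one
step, again using only the JDK's SAX classes. RootTypeOverride and its
fields are hypothetical names, and the sketch deliberately stops at
producing the two qualified names; looking them up in the schema is left
out.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records the root name and any xsi:type on it. */
class RootTypeOverride extends DefaultHandler {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    private final Map<String, String> prefixes = new HashMap<>();
    private boolean rootSeen;
    QName rootName;
    QName xsiType; // stays null when the root carries no xsi:type

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Mappings declared on the root arrive before its startElement().
        if (!rootSeen) prefixes.put(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (rootSeen) return;
        rootSeen = true;
        rootName = new QName(uri, localName);
        String value = atts.getValue(XSI, "type");
        if (value == null) return;
        value = value.trim();
        int colon = value.indexOf(':');
        String prefix = colon < 0 ? "" : value.substring(0, colon);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            if (!prefix.isEmpty()) {
                throw new SAXException("unbound prefix in xsi:type: " + prefix);
            }
            namespace = ""; // unprefixed name with no default namespace
        }
        xsiType = new QName(namespace, value.substring(colon + 1), prefix);
    }

    /** Parses the document and returns the handler with its findings. */
    static RootTypeOverride read(String xml) {
        RootTypeOverride handler = new RootTypeOverride();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return handler;
    }
}
```

The two names are what you would then look up in the schema model: the
root name as an element declaration and the xsi:type name, if present,
as a type definition.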
The problem is harder for elements deeper in the document, whose
association with schema components depends on the details of the
grammar.  The easiest way to handle these would be to
turn on validation and PSVI annotation in your parser, and get the
XSElementDeclaration for any element from the PSVI information.
Probably you would have to access the PSVI from endElement().
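For comparison, here is a sketch of the same idea using the standard
JAXP validation API rather than the Xerces PSVI interfaces described
above. This is a swapped-in technique, not the Xerces approach: a
ValidatorHandler sits between the SAX parser and your own handler, and
its TypeInfoProvider reports the schema type of the current element.
The class name TypedEvents and the "name:type" output format are
hypothetical choices for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: streams a document and reports each element's type. */
class TypedEvents {
    /** Returns one "localName:typeName" entry per element, in document order. */
    static List<String> elementTypes(String xsd, String xml) {
        List<String> seen = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            ValidatorHandler validator = schema.newValidatorHandler();
            TypeInfoProvider types = validator.getTypeInfoProvider();

            // The application handler receives events after validation,
            // so type information is available inside the callback.
            validator.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    TypeInfo type = types.getElementTypeInfo();
                    seen.add(local + ":" + (type == null ? null : type.getTypeName()));
                }
            });

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(validator);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("validation or parsing failed", e);
        }
        return seen;
    }
}
```

TypeInfo also has an isDerivedFrom() method for asking about base
types, though it answers yes or no for a named ancestor instead of
letting you walk the type hierarchy the way the XSModel does.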