Title: Bulk XSD validation in Java
If you choose to tackle this job using Saxon, you could do
it as follows. Create a Saxon Configuration object, which acts as a container
for a schema cache. Then write some kind of listener that listens for the events
indicating a validation request (or do it as an HTTP servlet). When a
request arrives, fire off a new thread to do the validation: you can use the
standard JAXP validation interface, or Saxon's internal interface if you want
more control. If you need to, you can write a URIResolver that interprets the
schemaLocation attribute, giving the option to redirect the URIs. If a document
requests a schema that is already loaded in the cache, then it will be used from
the cache; otherwise it will be fetched from disk, compiled, and added to the cache.
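As a rough illustration of that pattern, here is a minimal sketch that uses only the standard JAXP validation interface. It is not Saxon-specific. The class name BulkValidator and its methods are hypothetical, and a fixed thread pool stands in for "fire off a new thread". It assumes the installed SchemaFactory honours xsi:schemaLocation hints when newSchema() is called with no arguments. Whether compiled schemas are really reused between documents depends on the implementation behind the factory; with Saxon that role is played by the Configuration.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates documents against the schemas they name
// in xsi:schemaLocation, one task per incoming document.
final class BulkValidator {
    // A Schema is thread-safe, so one instance is shared by all tasks.
    private final Schema schema;
    private final ExecutorService pool;

    BulkValidator(int threads) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // No-argument newSchema(): schemas are located from the hints
        // found in each document rather than supplied up front.
        this.schema = factory.newSchema();
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Queue one validation request; the Future reports valid or invalid.
    Future<Boolean> submit(File document) {
        Callable<Boolean> task = () -> isValid(document);
        return pool.submit(task);
    }

    boolean isValid(File document) throws IOException {
        // A Validator is not thread-safe, so each task creates its own.
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(document));
            return true;
        } catch (SAXException e) {
            // The default error handler throws on the first validity error.
            return false;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The listener (or servlet) would call submit() for each arriving document and collect the results from the returned Future objects.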
The only limitations with this approach are that (a) the
cache can only contain one schema component with any given name (which may be a
problem if you have many unrelated no-namespace schemas), and (b) a newly-loaded
schema isn't allowed to "modify" definitions that have already been used.
"Modifying" here includes using xs:redefine, adding to a substitution group,
or extending a complex type. You can usually get around the "modifying"
restrictions by preloading schemas into the cache rather than adding them
incrementally as they are first encountered.
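In plain JAXP terms, preloading roughly corresponds to compiling the known schema documents together before any instance document is validated. The sketch below shows that idea under stated assumptions: SchemaPreloader and its methods are hypothetical names, and it uses the standard SchemaFactory rather than Saxon's own cache, so it illustrates the approach rather than Saxon's internal behaviour.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compiles a known set of schema documents once,
// at startup, instead of loading them as documents first refer to them.
final class SchemaPreloader {

    static Schema preload(List<File> schemaFiles) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source[] sources = new Source[schemaFiles.size()];
        for (int i = 0; i < sources.length; i++) {
            sources[i] = new StreamSource(schemaFiles.get(i));
        }
        // All sources are compiled together into a single Schema.
        return factory.newSchema(sources);
    }

    static boolean isValid(Schema schema, File document) throws IOException {
        try {
            // One Validator per call, because Validators are not thread-safe.
            schema.newValidator().validate(new StreamSource(document));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

Because everything is compiled in one step, redefinitions and substitution-group members are all visible before the first validation, which is the point of preloading.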
I've got a Java process that needs to
continuously validate XML documents according to the W3C schemas they indicate
in their xsi:schemaLocation attributes. The documents arrive at a high rate and
must be processed as quickly as possible. The exact schemas they employ
are not known ahead of time and there may be several of them required to
validate each document.
My question is, what library/libraries are
appropriate in this situation, and how do I tell them to load the required
schema(s) only once? Any advice?