Re: There can be only one! (was are we losing our grammar?)
- From: Rick Jelliffe <ricko@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Thu, 08 Feb 2001 02:14:27 +0800
From: Charles Reitzel <creitzel@mediaone.net>
>In your other posting, you imply that the schema designer can control the
>use of DOM vs. streaming. Am I right in thinking Schematron implements its
>own internal model of the input and that by judicious use of XPath you can
>control how big or small that model becomes?
Not in the XSLT-based implementations of Schematron (nor the Perl and Python
ones, AFAIK)! I think people on the Schematron mailing list are so far quite
happy exploring what you can do when you no longer worship streams. But in a
Java or C++ implementation, for example, there could certainly be rules for
deciding when nodes in a DOM are no longer required for schema validation,
and which nodes never need to be constructed at all.
For example, if the schema were as simple as:
<rule context="/">
<assert test="absynthe"
>The top-level element must be absynthe</assert>
</rule>
I am sure rules could be made to handle this kind of case efficiently. I
want to download Sun's XSLT compiler to see what they do for optimised
implementations (except their dumb pages kept sending me the same page again
and again). It is an interesting area.
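As a rough illustration of how cheaply the one-rule schema above could be
handled, here is a minimal Python sketch that checks the top-level element
from a stream and stops straight away. It is not how any Schematron
implementation works; the function name root_is and its parameters are made
up for this example.

```python
import io
import xml.etree.ElementTree as ET

def root_is(source, expected_name):
    """Return True if the top-level element of `source` is `expected_name`.

    `source` is a binary file-like object. Only the first start-tag event
    is consumed, so no tree is built for the rest of the document.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        # The first "start" event is always the document element.
        return elem.tag == expected_name
    return False

# Usage with two in-memory documents.
ok = root_is(io.BytesIO(b"<absynthe><a/></absynthe>"), "absynthe")
bad = root_is(io.BytesIO(b"<gin><absynthe/></gin>"), "absynthe")
```

A validator that could infer this from the rule's context and test would
never need to construct anything below the root.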
IBM's lazy DOM approach, where branches are only fully parsed and
constructed when they are accessed, would also be a useful option there.
But, again, if one allows that the data may already have been loaded,
pruning is not always important.
XML Schemas has been designed to allow streaming implementations. The
key/uniqueness constraints are the sticking points for this... an
implementation of these would probably use a big hash table of all
candidates rather than maintaining the DOM, though. That would help with
access but would not be much help for pruning.
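To make the hash-table idea concrete, here is a minimal Python sketch of a
streaming uniqueness check. It is a simplification, not XML Schema's actual
identity-constraint machinery, and the names duplicate_keys, item_tag and
key_attr are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def duplicate_keys(source, item_tag, key_attr):
    """Return key values that occur more than once, in document order.

    Elements are emptied as soon as they end, so no full tree is kept,
    but the `seen` set still grows with the number of distinct keys.
    """
    seen = set()
    duplicates = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            key = elem.get(key_attr)
            if key is not None:
                if key in seen:
                    duplicates.append(key)
                else:
                    seen.add(key)
        # Drop this element's children, text and attributes.
        elem.clear()
    return duplicates

doc = b'<items><item id="a"/><item id="b"/><item id="a"/></items>'
dups = duplicate_keys(io.BytesIO(doc), "item", "id")
```

The tree is thrown away as parsing proceeds, but every candidate key has to
stay in the table until the end of the constraint's scope, which is why this
helps lookup more than it helps pruning.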
There may well be some stripped-down profile of XPath defined in the next
few months which would make inferred pruning easier to do: for example, by
only including axes with down-branch and up-ancestor scope.
I think OmniMark may still be the most sophisticated in this regard: it is
defined for streaming use and apparently prunes the partially built tree
once its data will not be accessed by any of the program's location paths
(which are not XPath-based). The OmniMark developers seem to play things
close to the chest about this, perhaps because 10+ years of tuning their
implementation to their language is one of their competitive advantages.
Cheers
Rick Jelliffe