OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Blowout

Rick Jelliffe  wrote:

> But now it occurs to me that perhaps there are some basic questions we can
> ask (of a schema implementer or language designer) which may give a head
> start in the absense of benchmarking:
>  - Are there any innocent-looking structures that explode (or may explode)
> (perhaps this is the same as asking are there any constructs which, when
> used, may have more than an O(n) effect on performance) and what are the
> workarounds (e.g. in XML Schemas case, to detect unbounded particles in
> unbounded choice groups) ?

I don't think there are any explosive constructs in TREX.
In James' reference implementation of TREX (if I understand it
correctly),  it takes approximately O(N) time and space to preprocess
the schema, since this phase amounts to little more than constructing
an abstract syntax tree.   During validation, each start-tag, attribute,
end-tag, and #PCDATA particle can be processed in O(1) amortized time.
(The _first_ time a particular state/symbol pair is encountered, it
takes O(pattern-size) time to compute the next state, but this result is
cached so the next time around it takes constant time.  As the
size of the document approaches infinity, this averages out to O(1)
per input symbol.)

Space requirements are modest as well: storage requirements
include the stack of open elements from the input document,
the AST of the schema, and a lazily-constructed DFA transition
table.  The space requirements are largely independent of
the size of the input document, so TREX is suitable for
large input sets.

Worst-case space requirements involving INTERLEAVE and CONCUR are
also handled efficiently.  An INTERLEAVE pattern containing N element
tokens can lead to a DFA with 2^N states.  However, states are only
constructed as they are needed so the implementation will only use O(2^N)
space for such a pattern if the input document actually contains
sequences for each possible permutation.

I believe that RELAX can be implemented in the same way, though
I don't know if any of the existing implementations do so.

--Joe English