Re: should all XML parsers reject non-deterministic content models?
- From: Joe English <jenglish@flightlab.com>
- To: xml-dev@lists.xml.org
- Date: Sun, 14 Jan 2001 09:58:16 -0800
TAKAHASHI Hideo wrote:
> I understand that the XML 1.0 spec prohibits non-deterministic (or,
> ambiguous) content models (for compatibility, to be precise).
> Are all xml 1.0 compliant xml processing software required to reject
> DTDs with such content models?
No: a processor can ignore the DTD entirely and still be compliant.
And since the prohibition against non-deterministic content models
appears in a non-normative appendix, I would presume that conforming
DTD-aware processors are not required to detect this condition either.
Even in full SGML, ambiguous content models are a "non-reportable
markup error", i.e., parsers are not required to detect this condition.
> Ambiguous content models don't cause any problems when you construct a
> DFA via an NFA. I have heard that there is a way to construct DFAs
> directly from regexps without making an NFA, but that method can't
> handle non-deterministic regular expressions.
There are many, many other ways to validate documents against content
models though. Take a look at James Clark's TREX implementation,
which has no problem with ambiguity, and also efficiently handles
intersection, negation, and interleaving of content models
(the first two of which are *very* expensive in a DFA-based
approach).
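To make the point concrete, here is a minimal sketch (in Python, with
illustrative names of my own choosing) of the derivative-based matching
technique that RELAX NG / TREX-style validators use. It has no trouble
with the classic non-deterministic model ((a, b) | (a, c)), because it
never needs to commit to one branch on seeing 'a':

```python
# A hedged sketch of Brzozowski-derivative matching over element names.
# All class and function names here are illustrative, not TREX's API.
from dataclasses import dataclass

class Expr: pass

@dataclass(frozen=True)
class Empty(Expr): pass          # matches the empty sequence

@dataclass(frozen=True)
class NotAllowed(Expr): pass     # matches nothing

@dataclass(frozen=True)
class Sym(Expr):                 # a single element name
    name: str

@dataclass(frozen=True)
class Choice(Expr):
    a: Expr; b: Expr

@dataclass(frozen=True)
class Seq(Expr):
    a: Expr; b: Expr

@dataclass(frozen=True)
class Star(Expr):
    a: Expr

def nullable(e):
    """Does e match the empty sequence of children?"""
    if isinstance(e, (Empty, Star)): return True
    if isinstance(e, (NotAllowed, Sym)): return False
    if isinstance(e, Choice): return nullable(e.a) or nullable(e.b)
    if isinstance(e, Seq): return nullable(e.a) and nullable(e.b)

def deriv(e, x):
    """The residual expression after consuming element name x."""
    if isinstance(e, (Empty, NotAllowed)): return NotAllowed()
    if isinstance(e, Sym):
        return Empty() if e.name == x else NotAllowed()
    if isinstance(e, Choice):
        return Choice(deriv(e.a, x), deriv(e.b, x))
    if isinstance(e, Seq):
        d = Seq(deriv(e.a, x), e.b)
        return Choice(d, deriv(e.b, x)) if nullable(e.a) else d
    if isinstance(e, Star):
        return Seq(deriv(e.a, x), e)

def matches(e, names):
    for x in names:
        e = deriv(e, x)
    return nullable(e)

# ((a, b) | (a, c)): forbidden by XML 1.0's determinism rule,
# but perfectly tractable here.
model = Choice(Seq(Sym("a"), Sym("b")), Seq(Sym("a"), Sym("c")))
print(matches(model, ["a", "b"]))   # True
print(matches(model, ["a", "c"]))   # True
print(matches(model, ["a", "a"]))   # False
```

Intersection and negation also fall out naturally in this style (a
derivative of an intersection is the intersection of the derivatives),
which is part of why they are cheap here and expensive with DFAs.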
> If you choose that method
> to construct your DFA, you will surely benefit from the rule in XML
> 1.0. But if you choose not to, detecting non-deterministic content
> models becomes an extra job.
But note that detecting ambiguity in XML content models is considerably
simpler than in SGML -- the really difficult part involves '&' groups,
which aren't present in XML.
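For the XML subset (no '&' groups), the check can be sketched with a
Glushkov-style construction: give each name occurrence a unique
position, compute first- and follow-sets, and the model is
deterministic iff no such set contains two distinct positions carrying
the same name. This is my own illustrative sketch, not code from any
particular parser:

```python
# Hedged sketch: checking XML 1.0's determinism ("1-unambiguity") rule.
# Expressions are tuples: ("sym", n), ("seq", a, b), ("choice", a, b),
# ("star", a). All names here are illustrative.

class Counter:
    def __init__(self): self.n = 0
    def fresh(self): self.n += 1; return self.n

def glushkov(e, c):
    """Return (nullable, first, last, follow) with positions (id, name)."""
    kind = e[0]
    if kind == "sym":
        p = (c.fresh(), e[1])
        return False, {p}, {p}, set()
    if kind == "choice":
        na, fa, la, ra = glushkov(e[1], c)
        nb, fb, lb, rb = glushkov(e[2], c)
        return na or nb, fa | fb, la | lb, ra | rb
    if kind == "seq":
        na, fa, la, ra = glushkov(e[1], c)
        nb, fb, lb, rb = glushkov(e[2], c)
        follow = ra | rb | {(l, f) for l in la for f in fb}
        return (na and nb,
                fa | (fb if na else set()),
                lb | (la if nb else set()),
                follow)
    if kind == "star":
        na, fa, la, ra = glushkov(e[1], c)
        return True, fa, la, ra | {(l, f) for l in la for f in fa}

def name_clash(positions):
    """Two distinct positions with the same element name?"""
    seen = {}
    for pos, name in positions:
        if name in seen and seen[name] != pos:
            return True
        seen[name] = pos
    return False

def is_deterministic(e):
    _, first, _, follow = glushkov(e, Counter())
    if name_clash(first):
        return False
    by_src = {}
    for src, tgt in follow:
        by_src.setdefault(src, set()).add(tgt)
    return not any(name_clash(t) for t in by_src.values())

# ((a, b) | (a, c)) is non-deterministic: two 'a' positions in first.
bad  = ("choice", ("seq", ("sym", "a"), ("sym", "b")),
                  ("seq", ("sym", "a"), ("sym", "c")))
# (a, (b | c)) expresses the same language deterministically.
good = ("seq", ("sym", "a"), ("choice", ("sym", "b"), ("sym", "c")))
print(is_deterministic(bad))    # False
print(is_deterministic(good))   # True
```

With '&' groups (SGML) the interleavings blow up the position sets,
which is exactly the hard part that XML's grammar avoids.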
--Joe English
jenglish@flightlab.com