OASIS Mailing List Archives

Re: should all XML parsers reject non-deterministic content models?




TAKAHASHI Hideo wrote:

> I understand that the XML 1.0 spec prohibits non-deterministic (or,
> ambiguous) content models (for compatibility, to be precise).
> Is all XML 1.0 compliant XML processing software required to reject
> DTDs with such content models?

No: a processor can ignore the DTD entirely and still be compliant.
And since the prohibition against non-deterministic content models
appears in a non-normative appendix, I would presume that conforming
DTD-aware processors are not required to detect this condition either.
Even in full SGML, ambiguous content models are a "non-reportable
markup error", i.e., parsers don't need to detect this condition.

> Ambiguous content models don't cause any problems when you construct a
> DFA via an NFA.  I have heard that there is a way to construct DFAs
> directly from regexps without making an NFA, but that method can't
> handle non-deterministic regular expressions.

There are many, many other ways to validate documents against content
models though.  Take a look at James Clark's TREX implementation,
which has no problem with ambiguity, and also efficiently handles
intersection, negation, and interleaving of content models
(the first two of which are *very* expensive in a DFA-based
approach).
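
To illustrate that validation need not go through a DFA at all, here is
a minimal sketch of one such alternative: matching by derivatives of the
content model. It is an illustration only, and is not claimed to be how
TREX itself is implemented. The class and function names (Empty,
NotAllowed, Elem, Seq, Choice, Star, nullable, deriv, matches) are
hypothetical, and the sketch omits the term simplification a practical
validator would need to stay efficient.

```python
from dataclasses import dataclass

# Content-model AST. All names are illustrative, not taken from TREX.
@dataclass(frozen=True)
class Empty:            # matches only the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed:       # matches nothing at all
    pass

@dataclass(frozen=True)
class Elem:             # matches one child element with this name
    name: str

@dataclass(frozen=True)
class Seq:              # (left, right)
    left: object
    right: object

@dataclass(frozen=True)
class Choice:           # (left | right)
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # inner*
    inner: object

def nullable(p):
    """True if the model p accepts the empty sequence."""
    if isinstance(p, (Empty, Star)):
        return True
    if isinstance(p, (NotAllowed, Elem)):
        return False
    if isinstance(p, Seq):
        return nullable(p.left) and nullable(p.right)
    return nullable(p.left) or nullable(p.right)  # Choice

def deriv(p, name):
    """Model for what may follow after consuming one child called name."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Elem):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        # Both branches are followed at once, so ambiguity is harmless.
        return Choice(deriv(p.left, name), deriv(p.right, name))
    if isinstance(p, Star):
        return Seq(deriv(p.inner, name), p)
    # Seq: consume from the left; also from the right if left can be empty.
    first = Seq(deriv(p.left, name), p.right)
    if nullable(p.left):
        return Choice(first, deriv(p.right, name))
    return first

def matches(model, children):
    """Validate a list of child element names against a content model."""
    for name in children:
        model = deriv(model, name)
    return nullable(model)

# The ambiguous model ((a, b) | (a, c)) needs no special treatment.
ambiguous = Choice(Seq(Elem("a"), Elem("b")), Seq(Elem("a"), Elem("c")))
print(matches(ambiguous, ["a", "c"]))
```

Because the derivative of a choice keeps both branches alive, the
matcher never has to decide which "a" it saw, so it does not care
whether the model is deterministic.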


> If you choose that method
> to construct your DFA, you will surely benefit from the rule in XML 1.0.
> But if you choose not to, detecting non-deterministic content models
> becomes an extra job.

But note that detecting ambiguity in XML content models is considerably
simpler than in SGML -- the really difficult part involves '&' groups
which aren't present in XML.
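
As a rough sketch of how simple the check can be without '&' groups:
number each element-name occurrence in the model, compute the usual
first and follow sets of positions, and report non-determinism whenever
two different positions carrying the same name compete in one of those
sets. The tuple encoding of content models and the names analyze and
is_deterministic are assumptions made for this example, and the sketch
covers element content only (no mixed content, EMPTY, or ANY).

```python
from itertools import count

# Content models as nested tuples (a hypothetical encoding):
#   ("elem", name), ("seq", a, b), ("choice", a, b),
#   ("opt", a), ("star", a), ("plus", a)

def analyze(node, ids, sym, follow):
    """Return (nullable, first, last) for node; fill sym and follow."""
    kind = node[0]
    if kind == "elem":
        pos = next(ids)              # a fresh position for this occurrence
        sym[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("opt", "star", "plus"):
        n, first, last = analyze(node[1], ids, sym, follow)
        if kind != "opt":            # repetition loops back to the start
            for p in last:
                follow[p] |= first
        return (n or kind != "plus"), first, last
    n1, f1, l1 = analyze(node[1], ids, sym, follow)
    n2, f2, l2 = analyze(node[2], ids, sym, follow)
    if kind == "choice":
        return (n1 or n2), f1 | f2, l1 | l2
    # seq: whatever ends the left part may be followed by the right part
    for p in l1:
        follow[p] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return (n1 and n2), first, last

def is_deterministic(model):
    """True if no two competing positions carry the same element name."""
    sym, follow = {}, {}
    _, first, _ = analyze(model, count(), sym, follow)
    for positions in [first, *follow.values()]:
        seen = set()
        for p in positions:
            if sym[p] in seen:
                return False
            seen.add(sym[p])
    return True

b, c, d = ("elem", "b"), ("elem", "c"), ("elem", "d")
print(is_deterministic(("choice", ("seq", b, c), ("seq", b, d))))
print(is_deterministic(("seq", b, ("choice", c, d))))
```

The two models at the end are ((b, c) | (b, d)) and its rewritten form
(b, (c | d)); the first is rejected because both "b" positions can
start the content, while the second is accepted.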


--Joe English

  jenglish@flightlab.com