RE: should all XML parsers reject non-deterministic content models?
- From: Danny Ayers <firstname.lastname@example.org>
- To: email@example.com
- Date: Mon, 15 Jan 2001 02:10:16 +0600
I'm afraid I've lost the thread of these arguments - could someone please
give me an easy definition of what is meant by determinism (in the context of
content models), ambiguity (in the context of non-determinism) and how these
are being considered (in the context of conformance and compliance to
Recommendations). A comprehensible (normative) explanation would be
preferred. Are we talking about a processor which knows a closed set of
parameters or one with a default catch-all: "I have no idea what you are on
about?" Expansions of the acronyms DFA and NFA would also be appreciated.
> -----Original Message-----
> From: Joe English [mailto:firstname.lastname@example.org]
> Sent: 14 January 2001 23:58
> To: email@example.com
> Subject: Re: should all XML parsers reject non-deterministic content
> TAKAHASHI Hideo wrote:
> > I understand that the XML 1.0 spec prohibits non-deterministic (or,
> > ambiguous) content models (for compatibility, to be precise).
> > Are all xml 1.0 compliant xml processing software required to reject
> > DTDs with such content models?
> No: a processor can ignore the DTD entirely and still be compliant.
> And since the prohibition against non-deterministic content models
> appears in a non-normative appendix, I would presume that conforming
> DTD-aware processors are not required to detect this condition either.
> Even in full SGML, ambiguous content models are a "non-reportable
> markup error", i.e., parsers don't need to detect this condition.
> > Ambiguous content models don't cause any problems when you construct a
> > DFA via an NFA. I have heard that there is a way to construct DFAs
> > directly from regexps without making an NFA, but that method can't
> > handle non-deterministic regular expressions.
> There are many, many other ways to validate documents against content
> models though. Take a look at James Clark's TREX implementation,
> which has no problem with ambiguity, and also efficiently handles
> intersection, negation, and interleaving of content models
> (the first two of which are *very* expensive in a DFA-based
> > If you choose that method
> > to construct your DFA, you will surely benefit from the rule in XML 1.0.
> > But if you choose not to, detecting non-deterministic content models
> > becomes an extra job.
> But note that detecting ambiguity in XML content models is considerably
> simpler than in SGML -- the really difficult part involves '&' groups
> which aren't present in XML.
> --Joe English
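For anyone else who has lost the thread, here is a rough sketch of one way the determinism check discussed above could be done for XML content models (no '&' groups). It is only an illustration, not how any particular parser does it: it assumes "deterministic" means that, reading element names left to right, there is never a choice between two different occurrences of the same name in the model. The helper names (sym, seq, alt, star, opt, plus, is_deterministic) are made up for this example.

```python
from itertools import count

# Hypothetical constructors for a content model tree.
def sym(name): return ("sym", name)      # an element name
def seq(*parts): return ("seq", parts)   # (a, b, ...)
def alt(*parts): return ("alt", parts)   # (a | b | ...)
def star(x): return ("star", x)          # x*
def opt(x): return ("opt", x)            # x?
def plus(x): return ("plus", x)          # x+

def is_deterministic(model):
    """True if no two competing occurrences in the model share a name."""
    labels = {}   # occurrence number -> element name
    follow = {}   # occurrence number -> occurrences that may come next
    counter = count()

    def walk(node):
        # Returns (nullable, first, last) and fills in labels/follow.
        kind = node[0]
        if kind == "sym":
            p = next(counter)
            labels[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for part in node[1]:
                n, f, l = walk(part)
                for p in last:            # whatever ended so far may continue here
                    follow[p] |= f
                if nullable:              # everything before could be empty
                    first |= f
                last = (last | l) if n else l
                nullable = nullable and n
            return nullable, first, last
        if kind == "alt":
            nullable, first, last = False, set(), set()
            for part in node[1]:
                n, f, l = walk(part)
                nullable = nullable or n
                first |= f
                last |= l
            return nullable, first, last
        # star, opt, plus: one operand
        n, f, l = walk(node[1])
        if kind in ("star", "plus"):      # repetition loops back to the start
            for p in l:
                follow[p] |= f
        return (n or kind in ("star", "opt")), f, l

    def clash(positions):
        names = [labels[p] for p in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(f) for f in follow.values())
```

With this sketch, ((b, c) | (b, d)) is reported as non-deterministic, because after seeing "b" the checker cannot tell which "b" it matched without looking ahead, while the equivalent (b, (c | d)) is fine:

```python
is_deterministic(alt(seq(sym("b"), sym("c")), seq(sym("b"), sym("d"))))  # False
is_deterministic(seq(sym("b"), alt(sym("c"), sym("d"))))                 # True
```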