OASIS Mailing List Archives

 



RE: should all XML parsers reject non-deterministic content models?



Hi,
I'm afraid I've lost the thread of these arguments - could someone please
give me an easy definition of what is meant by determinism (in the context of
content models), ambiguity (in the context of non-determinism) and how these
are being considered (in the context of conformance and compliance to
Recommendations). A comprehensible (normative) explanation would be
preferred. Are we talking about a processor which knows a closed set of
parameters or one with a default catch-all : "I have no idea what you are on
about?" Expansions of the acronyms DFA and NFA would also be appreciated.

Cheers,
Danny.

> -----Original Message-----
> From: Joe English [mailto:jenglish@flightlab.com]
> Sent: 14 January 2001 23:58
> To: xml-dev@lists.xml.org
> Subject: Re: should all XML parsers reject non-deterministic content
> models?
>
>
>
> TAKAHASHI Hideo wrote:
>
> > I understand that the XML 1.0 spec prohibits non-deterministic (or,
> > ambiguous) content models (for compatibility, to be precise).
> > Are all xml 1.0 compliant xml processing software required to reject
> > DTDs with such content models?
>
> No: a processor can ignore the DTD entirely and still be compliant.
> And since the prohibition against non-deterministic content models
> appears in a non-normative appendix, I would presume that conforming
> DTD-aware processors are not required to detect this condition either.
> Even in full SGML, ambiguous content models are a "non-reportable
> markup error", i.e., parsers don't need to detect this condition.
>
> > Ambiguous content models don't cause any problems when you construct a
> > DFA via an NFA.  I have heard that there is a way to construct DFAs
> > directly from regexps without making an NFA, but that method can't
> > handle non-deterministic regular expressions.
>
> There are many, many other ways to validate documents against content
> models though.  Take a look at James Clark's TREX implementation,
> which has no problem with ambiguity, and also efficiently handles
> intersection, negation, and interleaving of content models
> (the first two of which are *very* expensive in a DFA-based
> approach).
>
>
> > If you choose that method
> > to construct your DFA, you will surely benefit from the rule in XML 1.0.
> > But if you choose not to, detecting non-deterministic content models
> > becomes an extra job.
>
> But note that detecting ambiguity in XML content models is considerably
> simpler than in SGML -- the really difficult part involves '&' groups
> which aren't present in XML.
>
>
> --Joe English
>
>   jenglish@flightlab.com
>
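
For readers following the thread: DFA stands for deterministic finite automaton and NFA for nondeterministic finite automaton. A content model such as ((b, c) | (b, d)) is usually called non-deterministic because a parser that sees a "b" cannot tell which of the two "b" positions it matched without looking ahead; (b, (c | d)) accepts the same children without that problem. What follows is a minimal sketch, not taken from the messages above, of one way to detect this for XML-style models (no '&' groups). It assumes the common "position automaton" reading of the rule: a model is flagged when two different positions carrying the same element name can both come first, or can both follow the same position. The tuple-based model encoding and the names glushkov, conflicts and is_deterministic are illustrative only.

```python
from itertools import count

def glushkov(model):
    """Return (first, follow, symbol_of) for a content-model tree.

    Hypothetical encoding: ("sym", name), ("seq", a, b, ...),
    ("alt", a, b, ...), ("opt", a), ("star", a), ("plus", a).
    """
    ids = count()
    symbol_of = {}   # position -> element name
    follow = {}      # position -> positions that may come next

    def link(sources, targets):
        for p in sources:
            follow[p] |= targets

    def visit(node):
        # Returns (nullable, first positions, last positions).
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            symbol_of[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind in ("opt", "star", "plus"):
            nullable, first, last = visit(node[1])
            if kind != "opt":
                link(last, first)          # repetition loops back
            return nullable or kind != "plus", first, last
        parts = [visit(child) for child in node[1:]]
        if kind == "alt":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        # kind == "seq"
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            link(last, f)                  # end of prefix -> start of part
            if nullable:
                first = first | f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last

    _, first, _ = visit(model)
    return first, follow, symbol_of

def conflicts(model):
    """Element names reachable through two competing positions."""
    first, follow, symbol_of = glushkov(model)
    found = set()
    for candidates in [first, *follow.values()]:
        seen = set()
        for p in candidates:
            name = symbol_of[p]
            if name in seen:
                found.add(name)
            seen.add(name)
    return found

def is_deterministic(model):
    return not conflicts(model)

b, c, d = ("sym", "b"), ("sym", "c"), ("sym", "d")
ambiguous = ("alt", ("seq", b, c), ("seq", b, d))   # ((b, c) | (b, d))
rewritten = ("seq", b, ("alt", c, d))               # (b, (c | d))
print(is_deterministic(ambiguous), conflicts(ambiguous))
print(is_deterministic(rewritten))
```

The check never builds a DFA; it only compares the names in each "what can come next" set, which is why it stays simple when '&' groups are absent. A validator that goes through an NFA and subset construction, as discussed above, would accept both models without needing this test.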