Re: should all XML parsers reject non-deterministic content models?
- From: Daniel Veillard <Daniel.Veillard@imag.fr>
- To: "TAKAHASHI Hideo(BSD-13G)" <hideo-t@bisd.hitachi.co.jp>, xml-editor@w3.org
- Date: Sun, 14 Jan 2001 10:04:58 +0100
On Sun, Jan 14, 2001 at 04:42:55PM +0900, TAKAHASHI Hideo(BSD-13G) wrote:
> Hello.
>
> I understand that the XML 1.0 spec prohibits non-deterministic (or
> ambiguous) content models (for compatibility, to be precise).
Note also that this is stated in a non-normative appendix.
> Are all XML 1.0-compliant XML processors required to reject
> DTDs with such content models?
Since it is stated only non-normatively, I don't think this is the
case in theory.
In practice this can be a problem. I recently faced a problem with
a DTD developed at the IETF which was clearly non-deterministic. This
also means that this introduces new classes of XML parsers among the
validating ones:
- those that detect and report non-deterministic content models
- those that validate (correctly or not) using non-deterministic
content models
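For illustration, a hypothetical declaration of the kind that causes the trouble (the element names here are made up, not taken from the IETF DTD mentioned above):

```dtd
<!-- Non-deterministic: after reading an <a>, a validator with only
     one token of lookahead cannot tell which branch it is matching. -->
<!ELEMENT doc ((a, b) | (a, c))>

<!-- The equivalent deterministic rewrite accepts the same documents: -->
<!ELEMENT doc (a, (b | c))>
```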
> Ambiguous content models don't cause any problems when you construct a
> DFA via an NFA. I have heard that there is a way to construct DFAs
> directly from regexps without making an NFA, but that method can't
> handle non-deterministic regular expressions. If you choose that
> method to construct your DFA, you will surely benefit from the rule in
> XML 1.0. But if you choose not to, detecting non-deterministic content
> models becomes an extra job.
I tried to read the Brüggemann-Klein thesis listed in the references and
found it a bit frightening, though very informative. The beginning
of Part I on Document Grammars, for example, makes clear that the SGML
view of unambiguity of content models is really 1-token-lookahead
determinism.
In practice this is a very good rule because it greatly simplifies
validation of a content model. The problem is that grammars
need to be rewritten to conform to it (at least the thesis proves this
is always possible).
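The 1-token-lookahead reading suggests a simple check. Here is a minimal sketch of it (my own illustration, not libxml's actual code; the tuple encoding of content models is made up): build the position sets of the Glushkov construction and flag a model as non-deterministic as soon as any state has two outgoing transitions on the same element name.

```python
from itertools import count

def build(node, pid):
    """Return (nullable, first, last, follow) for a content-model tree.
    Nodes: ('name', n), ('seq', a, b), ('choice', a, b), ('star', a).
    Positions are (id, name) pairs; follow maps id -> set of positions."""
    kind = node[0]
    if kind == 'name':
        p = (next(pid), node[1])
        return False, {p}, {p}, {}
    if kind == 'star':
        _, first, last, follow = build(node[1], pid)
        for (i, _) in last:                 # loop back: last -> first
            follow.setdefault(i, set()).update(first)
        return True, first, last, follow
    n1, f1, l1, fo1 = build(node[1], pid)
    n2, f2, l2, fo2 = build(node[2], pid)
    follow = dict(fo1)
    for k, v in fo2.items():
        follow.setdefault(k, set()).update(v)
    if kind == 'choice':
        return n1 or n2, f1 | f2, l1 | l2, follow
    # 'seq': everything ending the left part can be followed by f2
    for (i, _) in l1:
        follow.setdefault(i, set()).update(f2)
    first = f1 | (f2 if n1 else set())
    last = l2 | (l1 if n2 else set())
    return n1 and n2, first, last, follow

def deterministic(model):
    """True iff no state of the Glushkov automaton has two
    transitions on the same element name (1-token lookahead)."""
    _, first, _, follow = build(model, count())
    def unique_names(posset):
        names = [name for (_, name) in posset]
        return len(names) == len(set(names))
    return unique_names(first) and all(unique_names(s)
                                       for s in follow.values())

nd = ('choice', ('seq', ('name', 'a'), ('name', 'b')),
                ('seq', ('name', 'a'), ('name', 'c')))    # ((a,b)|(a,c))
ok = ('seq', ('name', 'a'),
             ('choice', ('name', 'b'), ('name', 'c')))    # (a,(b|c))
print(deterministic(nd))  # False
print(deterministic(ok))  # True
```

The non-deterministic ((a,b)|(a,c)) fails the check while its rewrite (a,(b|c)) passes, which is exactly the kind of rewriting the thesis shows is always possible.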
> I can see that parsers that allow non-deterministic content models may
> be harmful to the user. The user won't notice that his DTD may be
> rejected by other parsers.
>
> So there seems to be good reason for the XML 1.0 spec to prohibit
> parsers that accept non-deterministic content models. In that case the
> spec not only gives a particular DFA-constructing algorithm a chance
> to be used, but effectively recommends the use of that algorithm.
As usual, such suggestions should also be provided to the spec comment
list, so I'm forwarding this to xml-editor@w3.org,
Daniel
--
Daniel Veillard | Red Hat Network http://redhat.com/products/network/
daniel@veillard.com | libxml Gnome XML toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/