Re: Accepting non-deterministic content models
- From: Lars Marius Garshol <firstname.lastname@example.org>
- To: email@example.com
- Date: Sun, 08 Jul 2001 11:58:59 +0200
* Roger L. Costello
| I believe that the origin of this "problem" is that in the XML spec
| it states that parsers *should* reject non-deterministic content
| models. I am wondering if perhaps "should" would be better replaced
| with "must"?
In my opinion the best fix would be to replace "should" with "must not".
The only reason to reject non-deterministic content models is that
the SGML standard requires it, so XML did the same for backwards
compatibility. This is also why the requirement is so vague: it doesn't
say "should", it says "for compatibility, it is an error if...".
When implementing an XML parser, however, you can build a finite state
automaton representing the content model. This is fairly easy, and
gives you a structure that can be traversed very quickly in order to
validate the contents of an element against its content model.
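A minimal sketch of that automaton approach in Python. It assumes a hypothetical tuple-based AST for element content models and leaves out mixed content, EMPTY and ANY. It uses the Glushkov (position) construction. Rather than determinizing up front, it tracks the set of positions still possible after each child, which is the subset construction done on the fly. This is an illustration only, not xmlproc's actual code.

```python
# Hypothetical content-model AST for this sketch (not xmlproc's representation):
#   ("name", "a")            element type a
#   ("seq", [m1, m2, ...])   m1, m2, ...
#   ("choice", [m1, ...])    m1 | m2 | ...
#   ("opt", m) / ("star", m) / ("plus", m)   m? / m* / m+


class ContentModelAutomaton:
    """Glushkov (position) automaton for an element content model."""

    def __init__(self, model):
        self.symbol = []   # position -> element type name at that leaf
        self.follow = {}   # position -> positions that may come right after it
        self.nullable, self.first, self.last = self._build(model)

    def _build(self, m):
        """Return (nullable, first positions, last positions) for subtree m."""
        kind = m[0]
        if kind == "name":
            pos = len(self.symbol)
            self.symbol.append(m[1])
            self.follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for child in m[1]:
                n, fs, ls = self._build(child)
                for p in last:             # anything that can end the prefix
                    self.follow[p] |= fs   # may be followed by this child's start
                if nullable:
                    first |= fs
                last = (ls | last) if n else ls
                nullable = nullable and n
            return nullable, first, last
        if kind == "choice":
            parts = [self._build(child) for child in m[1]]
            return (any(n for n, _, _ in parts),
                    set().union(*(fs for _, fs, _ in parts)),
                    set().union(*(ls for _, _, ls in parts)))
        if kind in ("opt", "star", "plus"):
            n, fs, ls = self._build(m[1])
            if kind != "opt":              # repetition loops the end back to the start
                for p in ls:
                    self.follow[p] |= fs
            return (n if kind == "plus" else True), fs, ls
        raise ValueError(f"unknown content-model node: {kind!r}")

    def matches(self, children):
        """Check a list of child element names, tracking every live position at once."""
        reached = None                     # None: still in the start state
        candidates = self.first
        for name in children:
            reached = {p for p in candidates if self.symbol[p] == name}
            if not reached:
                return False
            candidates = set().union(*(self.follow[p] for p in reached))
        if reached is None:
            return self.nullable
        return bool(reached & self.last)


# ((a, b) | (a, c)) is non-deterministic in the XML sense, yet validates fine.
ab_or_ac = ContentModelAutomaton(
    ("choice", [("seq", [("name", "a"), ("name", "b")]),
                ("seq", [("name", "a"), ("name", "c")])]))
print(ab_or_ac.matches(["a", "c"]))   # True
print(ab_or_ac.matches(["a"]))        # False
```

Because the simulation follows every live position at once, the ambiguity in `((a, b) | (a, c))` costs nothing: after `a` both positions stay live until the next child decides between them.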
If this approach is followed, you have to do extra work to detect
whether the original content model was "non-deterministic". In fact,
the reason why xmlproc accepts such content models (as Tom Passin
reports) is that I haven't implemented this check yet. I don't think
there's all that much point in doing so, either.
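The "extra work" can be sketched under the same kind of assumptions: a hypothetical tuple-based AST, element content only. The sketch builds the Glushkov first and follow sets. It then reports every element type name that labels two different positions in the first set or in any single follow set. The non-normative appendix on deterministic content models in the XML 1.0 spec describes essentially this test in terms of follow sets. The message says xmlproc has no such check, so this is not its code.

```python
from collections import Counter

# Hypothetical content-model AST for this sketch (element content only):
#   ("name", "a"), ("seq", [m1, ...]), ("choice", [m1, ...]),
#   ("opt", m), ("star", m), ("plus", m)   for  m?  m*  m+


def glushkov_sets(model):
    """Return (element name per position, first set, follow set per position)."""
    symbol, follow = [], {}

    def build(m):
        kind = m[0]
        if kind == "name":
            pos = len(symbol)
            symbol.append(m[1])
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for child in m[1]:
                n, fs, ls = build(child)
                for p in last:             # end of the prefix -> start of this child
                    follow[p] |= fs
                if nullable:
                    first |= fs
                last = (ls | last) if n else ls
                nullable = nullable and n
            return nullable, first, last
        if kind == "choice":
            parts = [build(child) for child in m[1]]
            return (any(n for n, _, _ in parts),
                    set().union(*(fs for _, fs, _ in parts)),
                    set().union(*(ls for _, _, ls in parts)))
        if kind in ("opt", "star", "plus"):
            n, fs, ls = build(m[1])
            if kind != "opt":              # repetition: end loops back to start
                for p in ls:
                    follow[p] |= fs
            return (n if kind == "plus" else True), fs, ls
        raise ValueError(f"unknown content-model node: {kind!r}")

    _, first, _ = build(model)
    return symbol, first, follow


def ambiguous_names(model):
    """Names occurring at two positions in one first/follow set (empty: deterministic)."""
    symbol, first, follow = glushkov_sets(model)
    bad = set()
    for candidates in [first, *follow.values()]:
        counts = Counter(symbol[p] for p in candidates)
        bad |= {name for name, k in counts.items() if k > 1}
    return sorted(bad)


A, B, C = ("name", "a"), ("name", "b"), ("name", "c")
print(ambiguous_names(("choice", [("seq", [A, B]), ("seq", [A, C])])))  # ['a']
print(ambiguous_names(("seq", [A, ("choice", [B, C])])))                # []
print(ambiguous_names(("seq", [("opt", A), A])))                        # ['a']
```

A parser that accepts such models but warns about them would run a check like this once per element declaration and turn a non-empty result into a warning rather than an error.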
Now that we've ended up with a spec that is the way it is, I think the
best course to follow for an implementation is to accept such content
models, but to warn about them. There is nothing problematic about
them, except that they violate SGML backwards compatibility.