Re: deterministic content model?

Gary Stephenson wrote:
> I am currently trying to get my XML processor to correctly detect
> non-deterministic content models - which means I have first to understand
> myself ! <g>
> Is the following content model deterministic?
> (a, (b|c)+, (a|b)?, (a|c)* )?
> The recent posts concerning the Conformance Test suite results imply that
> it _is_ deterministic (see test ibm47v01)

Hi Gary,

No, this content model is not deterministic.  However, I don't believe that
the conformance test ibm47v01 is in error either!

The reason for this apparent contradiction is that XML processors are not
required to analyse a content model to see if it is deterministic *unless*
the instance document contains an element of that type.  This is my
understanding of XML 1.0, section 3.2.1 [1], which reads:

"For compatibility, it is an error if an element in the document can match
more than one occurrence of an element type in the content model."

So, it is only an error if there is an element in the document!?  In the
case of ibm47v01, there is not a "child4" element, so it is my understanding
that the processor does not need to build the DFA for its content model.
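
For what it is worth, the non-determinism of the model in question can be
confirmed mechanically.  The following is only a minimal sketch in Python,
not the code of any particular validator: it numbers each occurrence of an
element name in the model (a position automaton, which is one common way of
doing this) and reports a conflict whenever two different occurrences of the
same name could match at the same point.  The helper names (sym, seq, alt,
opt, star, plus, analyse, conflicts) are invented for this illustration.

```python
from itertools import count

# Tiny AST builders for content models (hypothetical helper names).
def sym(name): return ("sym", name)
def seq(*parts): return ("seq", parts)
def alt(*parts): return ("alt", parts)
def opt(x): return ("opt", x)
def star(x): return ("star", x)
def plus(x): return ("plus", x)

def analyse(node, follow, names, counter):
    """Return (nullable, first, last) for node; fill in follow and names."""
    kind = node[0]
    if kind == "sym":
        pos = next(counter)            # number this occurrence of the name
        names[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for part in node[1]:
            n, f, l = analyse(part, follow, names, counter)
            for p in last:             # whatever could end the prefix ...
                follow[p] |= f         # ... may be followed by this part
            if nullable:               # everything so far could be empty
                first |= f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last
    if kind == "alt":
        nullable, first, last = False, set(), set()
        for part in node[1]:
            n, f, l = analyse(part, follow, names, counter)
            nullable = nullable or n
            first |= f
            last |= l
        return nullable, first, last
    # opt, star, plus: a single operand
    n, f, l = analyse(node[1], follow, names, counter)
    if kind in ("star", "plus"):       # repetition loops back to the start
        for p in l:
            follow[p] |= f
    return (n if kind == "plus" else True), f, l

def conflicts(model):
    """List (origin, name, pos1, pos2) where one name has two candidates."""
    follow, names = {}, {}
    _, first, _ = analyse(model, follow, names, count(1))
    choices = [("start", first)]
    choices += [(f"{names[p]}#{p}", follow[p]) for p in sorted(follow)]
    found = []
    for origin, candidates in choices:
        seen = {}
        for q in sorted(candidates):
            if names[q] in seen:
                found.append((origin, names[q], seen[names[q]], q))
            else:
                seen[names[q]] = q
    return found

# Gary's model: (a, (b|c)+, (a|b)?, (a|c)* )?
model = opt(seq(sym("a"),
                plus(alt(sym("b"), sym("c"))),
                opt(alt(sym("a"), sym("b"))),
                star(alt(sym("a"), sym("c")))))

for origin, name, p1, p2 in conflicts(model):
    print(f"after {origin}: '{name}' could be occurrence {p1} or {p2}")
```

Among other things this reports that, after the first b, a further b could
match either the b in (b|c)+ or the b in (a|b)?, which is exactly the kind
of ambiguity the sentence from 3.2.1 is about.  An empty list from
conflicts() would mean no such ambiguity was found.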

Our XML Validator used to check every content model to see if it was
deterministic.  However, we changed this when we realised the impact it could
have on performance when processing very large DTDs.  Norman Walsh's DocBook
DTD [2] is a good example of a DTD containing many element types, many of
which are not used by individual instance documents.
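
The idea of checking on demand can be sketched roughly as follows.  This is
an illustration only and not our validator's actual code: the class name,
the on_element() hook and the is_deterministic callable are all
hypothetical.  Each content model is examined the first time an element of
that type is seen in the instance, and the result is cached.

```python
class LazyModelChecker:
    """Check a content model only when its element type is first used."""

    def __init__(self, models, is_deterministic):
        self._models = models            # element name -> content model
        self._check = is_deterministic   # callable(model) -> bool (assumed)
        self._results = {}               # cache of checks already made

    def on_element(self, name):
        """Call for each element in the instance; True if the model is OK."""
        if name not in self._results:
            model = self._models.get(name)
            # An undeclared element is a validity problem in its own right,
            # so it is not treated as a determinism failure here.
            self._results[name] = True if model is None else self._check(model)
        return self._results[name]
```

With a DTD the size of DocBook, only the handful of models actually reached
by a given document are ever analysed; a model such as that of "child4" in
ibm47v01 is never looked at if no such element appears.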


Rob Lugt
ElCel Technology

[1] http://www.w3.org/TR/REC-xml#sec-element-content
[2] http://www.docbook.org/