Re: [xml-dev] DTD vs XSD: No Duplicate Types in (Mixed) Content Models
- From: Maik Stührenberg <maik.stuehrenberg@uni-bielefeld.de>
- To: Fourny Ghislain <gfourny@inf.ethz.ch>
- Date: Mon, 31 Jan 2011 15:27:07 +0100
Hello Ghislain,
yes, you are absolutely right, that was a silly copy-and-paste mistake.
Such a content model is of course invalid in DTDs because the comma
operator is not allowed in mixed content. When I change it to
<!ELEMENT a (#PCDATA | b | b)*>
the error message is the expected 'The element type "b" was already
specified in the content model of the element decl "a"' (Xerces).
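For illustration, the No Duplicate Types check can be sketched in a few
lines of Python. This is only a minimal sketch and not how Xerces
implements it: the function name duplicate_mixed_names is made up for
this example, and it assumes the declaration is already syntactically
valid mixed content (so it does not detect the comma problem).

```python
import re
from collections import Counter


def duplicate_mixed_names(decl: str) -> list:
    """Return the element names listed more than once in a mixed-content
    declaration such as '<!ELEMENT a (#PCDATA | b | b)*>'."""
    match = re.search(r"\(\s*#PCDATA([^)]*)\)", decl)
    if match is None:
        raise ValueError("not a mixed-content declaration")
    # Everything after #PCDATA is a '|'-separated list of element names.
    names = [part.strip() for part in match.group(1).split("|") if part.strip()]
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]


print(duplicate_mixed_names('<!ELEMENT a (#PCDATA | b | b)*>'))  # ['b']
print(duplicate_mixed_names('<!ELEMENT a (#PCDATA | b | c)*>'))  # []
```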
It would be interesting to know why XSD processors treat the following
two content models differently:
<xs:element name="a">
<xs:complexType>
<xs:sequence>
<xs:element ref="b"/>
<xs:element ref="b"/>
<xs:element ref="b"/>
<xs:element ref="b" minOccurs="0" maxOccurs="1"/>
<xs:element ref="b" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
and
<xs:element name="a">
<xs:complexType>
<xs:sequence>
<xs:element ref="b" minOccurs="3" maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
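One possible way to picture the difference is XML Schema's Unique
Particle Attribution constraint: in the first model, a fourth b could be
attributed to either of the two optional particles, whereas the second
model has only one particle to attribute it to. The following Python
sketch illustrates this under simplifying assumptions: it only handles a
flat xs:sequence of element particles with finite maxOccurs, each given
as a (name, minOccurs, maxOccurs) tuple, and the function names
candidates and first_upa_conflict are invented for this example. It is
not the algorithm of any particular XSD processor.

```python
from typing import List, Optional, Tuple

Particle = Tuple[str, int, int]  # (element name, minOccurs, maxOccurs)


def candidates(seq: List[Particle], i: int, count: int) -> List[int]:
    """Indices of the particles that may match the next element when
    `count` elements have already been matched against particle i."""
    out = []
    j, c = i, count
    while j < len(seq):
        _, lo, hi = seq[j]
        if c < hi:
            out.append(j)   # particle j can still take one more element
        if c < lo:
            break           # particle j is not yet satisfied, cannot skip it
        j, c = j + 1, 0
    return out


def first_upa_conflict(seq: List[Particle]) -> Optional[Tuple[int, int]]:
    """Return the (zero-based) indices of two particles that compete for
    the same element name in some reachable state, or None."""
    seen = set()
    stack = [(0, 0)]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        i, count = state
        owner = {}
        for j in candidates(seq, i, count):
            name = seq[j][0]
            if name in owner:
                return (owner[name], j)
            owner[name] = j
            stack.append((j, count + 1 if j == i else 1))
    return None


repeated = [("b", 1, 1)] * 3 + [("b", 0, 1)] * 2  # b, b, b, b?, b?
compact = [("b", 3, 5)]                           # b{3,5}
print(first_upa_conflict(repeated))  # (3, 4)
print(first_upa_conflict(compact))   # None
```

Under the same assumptions, a plain sequence of two required b particles
yields no conflict, which would be consistent with the mixed="true"
example being accepted.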
Kind regards,
Maik
On 31.01.11 14:55, Fourny Ghislain wrote:
> Hello Maik,
>
> I only have a quick comment; I hope it helps.
>
> You give the following example of a DTD declaration that is not allowed:
> <!ELEMENT a (#PCDATA | b, b)*>
> <!ELEMENT b EMPTY>
>
> There is one more reason why it is not allowed: the comma is not permitted at all in mixed-content declarations. For example, even (#PCDATA | a, b)* is not allowed, even though a and b are different element names.
>
> My impression is that restricting mixed-content declarations to a repeated choice between #PCDATA and element names is more a DTD limitation than an implementation of deterministic content models. The limitation makes sense if you consider, for example, a document-oriented XML file with formatting tags, and thanks to it the no-duplicate constraint is a low-hanging fruit for ensuring determinism. XML Schema, on the other hand, and unlike DTDs, has the same expressiveness for mixed content as for element-only content, since the only difference is the "mixed" attribute.
>
> Does that make sense?
>
> Kind regards,
> Ghislain
>
>
> On Jan 31, 2011, at 1:56 PM, Maik Stührenberg wrote:
>
>> Hello,
>>
>> first of all, apologies for cross-posting this question to both the
>> xml-dev and the xmlschema-dev list. But since the question is related
>> both to XML DTDs (defined in the XML spec) and XML Schema I hope it's ok.
>> One of my students asked me if it wouldn't be easy to simulate XML
>> Schema's minOccurs and maxOccurs by repeating the same element type
>> (element name) in the content model of its parent element, e.g.:
>>
>> <!ELEMENT a (b, b, b)>
>> <!ELEMENT b EMPTY>
>>
>> would correspond to
>>
>> <xs:element name="a">
>> <xs:complexType>
>> <xs:sequence>
>> <xs:element ref="b" minOccurs="3" maxOccurs="3"/>
>> </xs:sequence>
>> </xs:complexType>
>> </xs:element>
>> <xs:element name="b"/>
>>
>> or, as a 1:1 implementation:
>>
>> <xs:element name="a">
>> <xs:complexType>
>> <xs:sequence>
>> <xs:element ref="b"/>
>> <xs:element ref="b"/>
>> <xs:element ref="b"/>
>> </xs:sequence>
>> </xs:complexType>
>> </xs:element>
>> <xs:element name="b"/>
>>
>> All three solutions were tolerated by the parsers I've tried. For
>> simulating minOccurs="3", maxOccurs="5" one could use the following DTD
>> solution:
>>
>> <!ELEMENT a (b, b, b, b?, b?)>
>> <!ELEMENT b EMPTY>
>>
>> To my surprise, I had no problems with the parsers I've tried. However,
>> such a content model seems not to be allowed in XSD (as expected because
>> of the deterministic content models constraint):
>>
>> <xs:element name="a">
>> <xs:complexType>
>> <xs:sequence>
>> <xs:element ref="b"/>
>> <xs:element ref="b"/>
>> <xs:element ref="b"/>
>> <xs:element ref="b" minOccurs="0" maxOccurs="1"/>
>> <xs:element ref="b" minOccurs="0" maxOccurs="1"/>
>> </xs:sequence>
>> </xs:complexType>
>> </xs:element>
>> <xs:element name="b"/>
>>
>> Of course it would be possible to use <xs:element ref="b" minOccurs="3"
>> maxOccurs="5"/> to construct this content model.
>>
>> In addition, the XML spec states that in mixed content the same element
>> name must not appear more than once (Validity Constraint: No Duplicate
>> Types, XML spec 5th ed., section 3.2.2). Therefore the following DTD
>> construct is not allowed:
>>
>> <!ELEMENT a (#PCDATA | b, b)*>
>> <!ELEMENT b EMPTY>
>>
>> But it seems to be ok in XSD:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> <xs:element name="a">
>> <xs:complexType mixed="true">
>> <xs:sequence>
>> <xs:element ref="b"/>
>> <xs:element ref="b"/>
>> </xs:sequence>
>> </xs:complexType>
>> </xs:element>
>> <xs:element name="b"/>
>> </xs:schema>
>>
>> So, if I understand correctly, the (non-normative) requirement for
>> deterministic content models seems to be only partially implemented
>> (only for mixed content models) in both DTD and XSD, and the two
>> document grammar formalisms differ in how they handle it (assuming that
>> the parsers are correct).
>> The interesting case is an instance with exactly four occurrences of the
>> element b, where my parser cannot know which b in the model is being
>> matched. Is this behaviour expected? Or is the requirement for
>> deterministic content models applied only when different element names
>> are involved (as in the a, (b|c) vs. (a|b),(a|c) example)?
>>
>> Best regards,
>>
>> Maik Stührenberg
>>
>> --
>>
>> Maik Stührenberg, M.A.
>>
>> Universität Bielefeld
>> Fakultät für Linguistik und Literaturwissenschaft
>> Universitätsstraße 25
>> 33615 Bielefeld
>>
>> Telefon: +49 (0)521/106-2534
>> E-Mail: maik.stuehrenberg@uni-bielefeld.de
>>
>> http://www.maik-stuehrenberg.de
>> http://www.xstandoff.net
>>
>>
>
>
--
Maik Stührenberg, M.A.
Universität Bielefeld
Fakultät für Linguistik und Literaturwissenschaft
Universitätsstraße 25
33615 Bielefeld
Telefon: +49 (0)521/106-2534
E-Mail: maik.stuehrenberg@uni-bielefeld.de
http://www.maik-stuehrenberg.de
http://www.xstandoff.net