Re: [xml-dev] Schemas and mixed content with Relax NG and W3C XML Schema
- From: rjelliffe@allette.com.au
- To: "Philippe Poulard" <philippe.poulard@sophia.inria.fr>
- Date: Thu, 17 Jul 2008 00:42:19 +1000 (EST)
> hi,
>
> this is a question about schemas
>
> I know that with DTDs, when a text is allowed with elements, the best we
> can do is to allow it everywhere between other elements that can be
> repeated at any place in the text :
>
> <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
>
> unfortunately, we can't enforce the text to be at a given place :
>
> <person>Mr <firstname>John</firstname><lastname>Doe</lastname></person>
>
> the following DTD is invalid, but explain what we'd like to have :
> <!ELEMENT person (#PCDATA,firstname,lastname)>
>
> I wonder if there are also similar limitations with Relax NG and W3C XML
> Schema and why ?
SGML DTDs do allow that kind of structure.
Unfortunately, it exposed a logical flaw that was very difficult to deal
with. It was called the pernicious mixed content problem.
Say you have a content model like this:
<!ELEMENT person ( (title | #PCDATA) , firstname, lastname)>
where you can either mark up the title as an element or just give it as plain text.
Now we have a document
<person><title>Mr</title><firstname>John</firstname><lastname>Doe</lastname></person>
That is fine.
But now we take that same document and pretty print it.
<person>
<title>Mr</title>
<firstname>John</firstname>
<lastname>Doe</lastname>
</person>
This is invalid! Why? Because the initial whitespace is taken to match the
#PCDATA, and then the <title> element is unexpected.
This problem could happen for all sorts of strange reasons, such as if you
were using a system with automatic line breaking and the start tag for
person was at the end of the line.
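The effect can be reproduced with a small toy check. This is only a sketch under stated assumptions: it is not an SGML parser, the names MODEL, content_sequence and is_valid are made up for illustration, and it simply treats every text node (including whitespace-only ones) as #PCDATA, which is the behaviour described above.

```python
import re
import xml.etree.ElementTree as ET

COMPACT = ("<person><title>Mr</title><firstname>John</firstname>"
           "<lastname>Doe</lastname></person>")

PRETTY = """<person>
<title>Mr</title>
<firstname>John</firstname>
<lastname>Doe</lastname>
</person>"""

# Hypothetical encoding of ((title | #PCDATA), firstname, lastname)
# as a regular expression over a space-separated list of item names.
MODEL = re.compile(r"(title|#PCDATA) firstname lastname")


def content_sequence(xml_text):
    """List the children of the root, recording each text node as '#PCDATA'.

    Assumption for this sketch: whitespace-only text counts as data.
    """
    root = ET.fromstring(xml_text)
    seq = []
    if root.text:                # text before the first child element
        seq.append("#PCDATA")
    for child in root:
        seq.append(child.tag)
        if child.tail:           # text following this child element
            seq.append("#PCDATA")
    return seq


def is_valid(xml_text):
    """True if the child sequence matches the toy content model."""
    return MODEL.fullmatch(" ".join(content_sequence(xml_text))) is not None


if __name__ == "__main__":
    for label, doc in (("compact", COMPACT), ("pretty", PRETTY)):
        print(label, content_sequence(doc), is_valid(doc))
```

In the pretty-printed document the newline after <person> becomes a leading #PCDATA item, so the model's first slot is already used up when <title> arrives.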
So in the end, XML dropped this as too problematic. Only the form
(#PCDATA | a | b | ...)* is allowed, which corresponds to XSD's mixed="true".
However, with RELAX NG it was realized that the problem does not occur for
tokens. So having tokens as well as elements such as
( "Mr" | "Mrs" ), firstname, lastname
will not trigger this problem.
Cheers
Rick Jelliffe