Re: [xml-dev] Schemas and mixed content with Relax NG and W3C XML Schema

> hi,
>
> this is a question about schemas
>
> I know that with DTDs, when a text is allowed with elements, the best we
> can do is to allow it everywhere between other elements that can be
> repeated at any place in the text :
>
> <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
>
> unfortunately, we can't enforce the text to be at a given place :
>
> <person>Mr <firstname>John</firstname><lastname>Doe</lastname></person>
>
> the following DTD is invalid, but explain what we'd like to have :
> <!ELEMENT person (#PCDATA,firstname,lastname)>
>
> I wonder if there are also similar limitations with Relax NG and W3C XML
> Schema and why ?

SGML DTDs do allow that kind of structure.

Unfortunately, it exposed a logical flaw that was very difficult to deal
with. It was called the pernicious mixed content problem.

Say you have a content model like this:
    <!ELEMENT person ( (title | #PCDATA) , firstname, lastname)>
where you can either mark up the title or just have it as plain text.

Now we have a document
    <person><title>Mr</title><firstname>John</firstname><lastname>Doe</lastname></person>

That is fine.

But now we take that same document and pretty print it.

<person>
    <title>Mr</title>
    <firstname>John</firstname>
    <lastname>Doe</lastname>
</person>

This is invalid!  Why? Because the initial whitespace is taken to match the
#PCDATA, and then the <title> element is unexpected.
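
To make the failure concrete, here is a rough Python sketch. It is a toy,
not an SGML parser: it uses the standard xml.etree.ElementTree module to
read the two documents above, treats every run of text (whitespace
included) as #PCDATA, and checks the resulting child sequence against the
content model written as a regular expression. The names MODEL,
child_tokens and is_valid are made up for illustration, and the sketch
ignores SGML's record-boundary rules, which would drop some newlines but
not the indentation.

```python
import re
import xml.etree.ElementTree as ET

# Toy rendering of the content model ((title | #PCDATA), firstname, lastname)
# as a regular expression over space-separated child tokens.
MODEL = re.compile(r"(title|#PCDATA) firstname lastname")


def child_tokens(xml_text):
    """Return the root's children in document order: the tag name for each
    element, and "#PCDATA" for each non-empty run of text. Whitespace-only
    runs count as text, which is the assumption that causes the trouble."""
    root = ET.fromstring(xml_text)
    tokens = []
    if root.text:
        tokens.append("#PCDATA")
    for child in root:
        tokens.append(child.tag)
        if child.tail:
            tokens.append("#PCDATA")
    return tokens


def is_valid(xml_text):
    """True if the child sequence matches the toy content model exactly."""
    return MODEL.fullmatch(" ".join(child_tokens(xml_text))) is not None


COMPACT = ("<person><title>Mr</title><firstname>John</firstname>"
           "<lastname>Doe</lastname></person>")

PRETTY = """<person>
    <title>Mr</title>
    <firstname>John</firstname>
    <lastname>Doe</lastname>
</person>"""

if __name__ == "__main__":
    for name, doc in (("compact", COMPACT), ("pretty", PRETTY)):
        print(name, child_tokens(doc), is_valid(doc))
```

In the pretty-printed case the first token is already #PCDATA, so the
(title | #PCDATA) choice is used up before <title> is seen.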

This problem could happen for all sorts of strange reasons, such as if you
were using a system with automatic line breaking and the start tag for
person was at the end of the line.

So in the end, in XML it was decided to dump this as too problematic. So
only (#PCDATA | ...)* was allowed, which is much the same as XSD's mixed="true".

However, with RELAX NG it was realized that the problem does not occur for
tokens. So having tokens as well as elements such as
   ( "Mr" | "Mrs" ), firstname, lastname
will not trigger this problem.
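
The reason tokens behave differently is whitespace normalization: the
built-in token datatype in RELAX NG compares values after stripping
leading and trailing whitespace and collapsing internal runs to a single
space, so indentation around "Mr" does not change what is matched. A rough
Python sketch of that comparison follows. It only models the string
comparison and is not a RELAX NG validator; normalize_token, matches_title
and ALLOWED_TITLES are hypothetical names used for illustration.

```python
import re

# XML whitespace is space, tab, carriage return and line feed only.
_XML_WS = re.compile(r"[ \t\r\n]+")

# The two token values from the pattern ( "Mr" | "Mrs" ).
ALLOWED_TITLES = {"Mr", "Mrs"}


def normalize_token(text):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return _XML_WS.sub(" ", text).strip(" ")


def matches_title(text):
    """True if the text equals one of the allowed tokens after normalization."""
    return normalize_token(text) in ALLOWED_TITLES


if __name__ == "__main__":
    print(matches_title("Mr"))
    print(matches_title("\n    Mr\n    "))
    print(matches_title("\n    "))
```

Whitespace-only text normalizes to the empty string, which matches neither
token, so pretty printing alone cannot satisfy or break the title slot.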

Cheers
Rick Jelliffe

