Re: Can XML Schemas Support Document Systems (WAS RE: ZDNet Schema article, and hiding complexity within user-friendly products)



At 2001-04-26 01:12, Eric van der Vlist wrote:
>W3C XML Schema has sacrificed flexibility (a quality useful for
>document-centric applications, often authored by a wide range of tools
>including manual editing, but one that data applications can see as a
>defect) to put the emphasis on datatypes (an absolute requirement for
>data applications).
>
>This is best seen in the extremist way W3C XML Schema is forbidding any
>non determinism just to ensure that the datatypes in the PSVI are those
>expected by the schema authors.

Er, I guess that's one way to look at it.  But the non-determinism
rule is one we inherited from XML 1.0 DTDs and SGML DTDs.  I suspect
the developers of ISO 8879 will be surprised to learn that they are
guilty of having sacrificed the needs of document-oriented systems
for the sake of supporting data-oriented systems.

Let me be blunt: some members of the WG do think the non-determinism
rule is dumb in ISO 8879, and regret that the XML WG took it over
into XML (though I certainly remember why, and I don't see how we
could have done it differently), and they argued for an XML Schema
language without the non-determinism rule.

But we failed to persuade the WG that it served no useful purpose.
And anyone who faces the choice between a restricted language and
an unrestricted language, and is not 100% convinced that the restriction
serves no purpose, is likely to say "Well, let's go with the
restriction for now, we can relax it later if we want.  But if
we relax it now, and then discover we were wrong, then it will be
too late to add the restriction in later, because it will break
things."

I invite people to show that the requirement for determinism
(i.e. for LL(1) grammars) can be dropped without hurting anybody,
because then maybe we can get consensus on dropping it in some
future version.
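
For readers who want to experiment with the rule, here is a minimal sketch of one common way to test a content model for determinism: number every element occurrence (a "position"), compute first and follow sets, and reject the model if any such set contains two positions with the same element name. The tuple-based model encoding and the names `analyse` and `is_deterministic` are hypothetical, chosen for illustration; this is not taken from any schema processor.

```python
from itertools import count

# Content models as nested tuples (a hypothetical mini-AST):
#   ("sym", name), ("seq", a, b), ("alt", a, b), ("star", a), ("opt", a)

def analyse(node, ids, follow, label):
    """Return (nullable, first, last) for node; fill in follow and label."""
    kind = node[0]
    if kind == "sym":
        pos = next(ids)              # each occurrence gets its own position
        label[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("star", "opt"):
        _, first, last = analyse(node[1], ids, follow, label)
        if kind == "star":           # a repetition can loop back to its start
            for pos in last:
                follow[pos] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], ids, follow, label)
    n2, f2, l2 = analyse(node[2], ids, follow, label)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    # kind == "seq": whatever ends the left part may be followed by the right
    for pos in l1:
        follow[pos] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last

def is_deterministic(model):
    """True if no first/follow set holds two positions with the same name."""
    follow, label = {}, {}
    _, first, _ = analyse(model, count(), follow, label)
    for positions in [first, *follow.values()]:
        names = [label[p] for p in positions]
        if len(names) != len(set(names)):
            return False
    return True

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
ambiguous = ("alt", ("seq", a, b), ("seq", a, c))   # (a, b) | (a, c)
factored = ("seq", a, ("alt", b, c))                # a, (b | c)
```

Under these assumptions, `is_deterministic(ambiguous)` is False, because an initial `a` could belong to either branch, while `is_deterministic(factored)` is True.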

But interpreting it as an innovation imposed on the document
community by dataheads is just wrong:  if the non-determinism
rule is bad for document processing, it's a self-inflicted wound.

>For example, you can't define a simple and flexible vocabulary where a
>document would have a title, an optional description and any number of
>paragraphs without imposing a relative order to the different elements.

Huh?  It's complicated, but it's doable.

   (p*, ((title, p*, (desc, p*)?) | (desc, p*, title, p*)))
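
A quick way to sanity-check a model of this kind is to map each element name to one letter and compare the resulting regular expression with the stated requirement. The sketch below assumes the encoding t = title, d = desc, p = p, and writes the model so that each p can be matched by only one particle; the function names are hypothetical.

```python
import re
from itertools import product

# Assumed encoding: t = title, d = desc, p = p.
# Model: (p*, ((title, p*, (desc, p*)?) | (desc, p*, title, p*)))
MODEL = re.compile(r"p*(?:tp*(?:dp*)?|dp*tp*)")

def meets_requirement(children):
    """Exactly one title, at most one desc, any number of p, in any order."""
    return children.count("t") == 1 and children.count("d") <= 1

def model_matches_requirement(max_len=6):
    """Compare model and requirement on every child sequence up to max_len."""
    for n in range(max_len + 1):
        for combo in product("tdp", repeat=n):
            s = "".join(combo)
            if bool(MODEL.fullmatch(s)) != meets_requirement(s):
                return False
    return True
```

Calling `model_matches_requirement()` checks every sequence of up to six children, which is only a spot check on short inputs, not a proof.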

>One of the examples I am often using in my trainings and papers is a
>vocabulary where you can either define an element "inline":
>
><character id="character_Peppermint-Patty">
>  <name>Peppermint Patty</name>
>  <since>Aug. 22, 1966</since>
>  <qualification>bold, brash and tomboyish</qualification>
></character>
>
>or by reference:
>
><character ref="character_Peppermint-Patty"/>

You want #CONREF.  If it had worked consistently and interoperably
in SGML, it might be in XML.  But early on, the parsers were not
in agreement on which CONREF examples were valid and which were
invalid, and the people I know who cared about interoperability all
shunned it.

As a workaround, define 'character' as containing

  (char-ref | (name, since, qualification))

But I assume you already know this way to work around the problem.

>The W3C XML Schema working group has widely used this construct in their
>vocabulary and I find it surprising to see that violations such as:
>
><element name="foo" ref="foo" ...
>
>cannot be captured by the schema for W3C XML Schema!

Yes, that one really hurts. Maybe in a future version.

-Michael Sperberg-McQueen
  speaking only for himself