Re: Are we losing out because of grammars?

On Sun, 04 Feb 2001 07:16:00, Rick Jelliffe wrote:
>From: Charles Reitzel <creitzel@mediaone.net>
>>1) Simple things should be simple.  Although subjective, 
>>I think grammars win easily by this measure.
>Aha, have we found someone who thinks XML Schemas is simple?  :-)

yeah, yeah, yeah ... well, at least for basic elements and attributes, it
could be worse ...

>This week I have been looking at XML Schemas for SVG's use of XLink. 
>innocuous usage is very hard to figure out in CR XML Schemas (I think
>something will work itself out) but trivial in DTDs and XML Schemas: so I
>think it is important not to say that what is even simple in one
>grammar-based system is simple in all.

What's a "CR XML Schema"?  I wouldn't expect link checking to be so simple.
Note, because it usually requires random access to document contents, I put
intra-document link checking in a layer over basic grammar checking.
Inter-document link checking in a layer above that.

>> To my mind, such validation is not a schema language requirement, per se.
>So to be able say "element x must have a child y" is a requirement but
>to be able say "document z must have a element y" is not?   Why?

"document z must have a element y" is *not* a basic requirement.  The simple
reason why is processing cost.  Schema designers (users of schema languages)
need predictable behavior from schema validation.  It is downright easy to
implement such checks at the application level, especially after the schema
checks out.
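
As a rough illustration of how small such an application-level check can be,
here is a sketch in Python using the standard xml.etree.ElementTree module.
The function name has_element and the sample documents are made up for this
example; it assumes the document has already passed the grammar checks.

```python
import xml.etree.ElementTree as ET


def has_element(xml_text, tag):
    """Return True if the document contains at least one element named `tag`.

    This needs the whole tree (or at least a full scan), which is why it
    sits above basic grammar checking rather than inside it.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the root and all descendants in document order.
    return next(root.iter(tag), None) is not None


doc_with_y = "<z><x><y/></x></z>"
doc_without_y = "<z><x/></z>"
found = has_element(doc_with_y, "y")
missing = has_element(doc_without_y, "y")
```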

Also, we needed the simple stuff over a year ago.  I think the world is
tired of waiting for basic schema features while every last fine detail of
the complex cases is worked out.  It's getting silly and costing everyone
time, money and hair.  It was fine for committees to haggle for years over
C++, as long as we had C to work with in the meantime.

>Or what about "the twelfth <month> in a <year> has 31 <days>"?  Is that a
>schema requirement? That can be expressed in some grammar languages but not

Not a basic requirement.  I don't know of any simple grammar that would
express that easily.  Rules+DOM are probably needed.  I put this at layer 3.

Of course, date and time types are basic types that should be handled at a
low level by any schema language that supports simple data types (numbers,
date/time, strings).  E.g. users shouldn't have to define calendar rules to
get a date attribute containing "1999-02-29" to fail validation.
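
To show what "handled at a low level" could look like, here is a small
Python sketch that leans on the standard library's calendar knowledge rather
than on user-defined rules.  The function name is_valid_date is made up, and
date.fromisoformat (Python 3.7 or later) stands in for whatever date type a
schema processor would actually provide.

```python
from datetime import date


def is_valid_date(text):
    """Return True if `text` is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
    except ValueError:
        # Raised for malformed strings and for impossible dates alike.
        return False
    return True


not_a_leap_year = is_valid_date("1999-02-29")
leap_year = is_valid_date("2000-02-29")
```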

Also, any application breaking up dates into parts probably does not
represent a generic calendar.  You are in application land again.

>What about "an xlink:locator element should not appear as the top-level
>element of a document"?  Is that a schema requirement?

Not a basic requirement.  A quick scan of the XLink spec indicates to me
that XLink imposes several constraints that involve aggregate operations or
rules (e.g. uniqueness of arcs and constraints on parent element type).  At
least Rules+DOM are needed (layer 3).  Probably a specialized XLink
validator (aka an application) is needed too.
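
For the top-level case alone, an application-level rule might look like the
Python sketch below.  It assumes the XLink 1.0 convention of marking locators
with an xlink:type="locator" attribute in the http://www.w3.org/1999/xlink
namespace; the function name root_is_locator and the sample documents are
invented for illustration, and a real XLink validator would check far more.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def root_is_locator(xml_text):
    """Return True if the document element is a locator-type XLink element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}local".
    return root.get("{%s}type" % XLINK_NS) == "locator"


bad_doc = (
    '<loc xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="locator" xlink:href="a.xml"/>'
)
good_doc = (
    '<links xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">'
    '<loc xlink:type="locator" xlink:href="a.xml"/>'
    "</links>"
)
bad_result = root_is_locator(bad_doc)
good_result = root_is_locator(good_doc)
```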


I read your posting on benchmarking with great interest.  Your point about
comparing implementations and not languages is a good one.  Based on the
broader discussion, a decent first step might be to break schema language
features into two categories: streamable (can use SAX) and random-access
(requires DOM).

Some features might be able to use SAX for some schemas, but will need DOM
in general (e.g. XPath).  A schema language implementor could follow these
steps:

1) load the schema,
2) determine whether any DOM tests exist,
3) load the input document,
   a) applying streamable tests,
   b) if any DOM tests exist, loading the DOM from the stream,
4) apply the DOM tests to the result.
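
The steps above can be sketched in Python as follows.  This is only one
possible reading, not a real schema processor: the "schema" is a plain dict
of test functions (a made-up shape), ElementTree's iterparse stands in for
SAX plus DOM loading, and the two sample tests are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def item_needs_id(tag, attrib, depth):
    """Streamable test: only looks at one start tag at a time."""
    if tag == "item" and "id" not in attrib:
        return "item missing id"
    return None


def unique_ids(root):
    """Random-access test: needs to see every item in the document."""
    seen = set()
    for el in root.iter("item"):
        value = el.get("id")
        if value is not None and value in seen:
            return "duplicate id: " + value
        seen.add(value)
    return None


def validate(schema, source):
    errors = []
    need_dom = bool(schema.get("dom_tests"))          # step 2
    root = None
    depth = 0
    # Step 3: a single pass over the input stream.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            for test in schema.get("stream_tests", []):  # step 3a
                message = test(elem.tag, elem.attrib, depth)
                if message:
                    errors.append(message)
            depth += 1
        else:
            depth -= 1
            if not need_dom:
                elem.clear()      # step 3b: keep the tree only if needed
    if need_dom:                  # step 4
        for test in schema["dom_tests"]:
            message = test(root)
            if message:
                errors.append(message)
    return errors


# Step 1 ("load the schema") is reduced to building this dict by hand.
SCHEMA = {"stream_tests": [item_needs_id], "dom_tests": [unique_ids]}
DOC = b"<list><item id='a'/><item id='a'/><item/></list>"
errors = validate(SCHEMA, io.BytesIO(DOC))
```

When the schema has no DOM tests, the loop discards each element's contents
as soon as its end tag is seen, so memory use stays close to what a pure SAX
pass would need.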

To my mind, step 2) is difficult and perhaps constrained by external
libraries (e.g. XPath).  There may be some schema experts out there who feel
differently.  It might even be possible to load only those nodes in the DOM
required to evaluate the tests in the schema.

take it easy,
Charles Reitzel