OASIS Mailing List Archives



Re: Are we losing out because of grammars?

From: James Clark <jjc@jclark.com>

> If I'm in an "x" element and I get a "y" element with a "z" attribute
> that is a legal lexical representation of an integer, I can't tell
> whether to type that attribute as an "xsd:integer" or an "xsd:string"
> unless I look ahead and see whether it's the last "y" element in
> the "x".   The TREX implementation works on a stream of SAX events, so
> this is a big complication.
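
To make the lookahead problem concrete, here is a minimal sketch in Python.
It assumes a hypothetical pattern in which the "z" attribute of the last
"y" inside "x" is an xsd:integer and the "z" of every earlier "y" is an
xsd:string; the quoted text does not give the actual pattern, and the names
LastYTyper and assign_types are invented for illustration. The point is
only that a streaming consumer must hold each "y" back until the next
event shows whether it was the last one.

```python
import xml.sax


class LastYTyper(xml.sax.ContentHandler):
    """Type the z attribute of each y child of the root x element.

    Assumed rule (hypothetical): the last y carries an xsd:integer,
    every earlier y carries an xsd:string.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.pending = None  # z value of the latest y; its type is not yet known
        self.typed = []      # (type name, lexical value) pairs, in document order

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2 and name == "y":
            if self.pending is not None:
                # Another y followed, so the pending one was not the last.
                self.typed.append(("xsd:string", self.pending))
            self.pending = attrs.get("z")

    def endElement(self, name):
        if self.depth == 1 and name == "x" and self.pending is not None:
            # x is closing, so the pending y really was the last one.
            self.typed.append(("xsd:integer", self.pending))
            self.pending = None
        self.depth -= 1


def assign_types(xml_text):
    handler = LastYTyper()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.typed
```

No type can be reported at the moment a "y" start event arrives; the
decision is always deferred by at least one event, which is the
complication described above.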

So can we say that in TREX a type is a path through the grammar?

> It depends how you restrict the grammar.  If you restrict the grammar as
> much as W3C's schemas, type assignment is significantly simpler than
> validation (since I believe I am correct in saying that for W3C schemas
> the type of an element depends only on its name and the names of its
> parents).

For XML Schemas, it depends on what you call a type. "Nullability" (or
whatever it is called, gawd don't get me started) is spoken of as a
"property" of elements, not of "types".  And xsi:type can override the
type with a compatible one.

> There are many
> applications for which type-assignment is not necessary; I think
> dispatching on the "FQGI" (ie on the name of the element and the names
> of its ancestor elements) is sufficient for many applications.
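
Dispatching on the "FQGI" needs no lookahead at all, since a stack of open
element names is enough. A rough sketch over SAX events follows; the class
name FqgiDispatcher, the function dispatch and the "x/y" path spelling are
assumptions made for this example, not anything taken from TREX.

```python
import xml.sax


class FqgiDispatcher(xml.sax.ContentHandler):
    """Call a handler chosen by the element name plus its ancestor names."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = handlers  # maps paths such as "x/y" to callables
        self.stack = []           # names of the currently open elements
        self.results = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        fqgi = "/".join(self.stack)
        handler = self.handlers.get(fqgi)
        if handler is not None:
            self.results.append(handler(fqgi, dict(attrs.items())))

    def endElement(self, name):
        self.stack.pop()


def dispatch(xml_text, handlers):
    dispatcher = FqgiDispatcher(handlers)
    xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    return dispatcher.results
```

The same element name "y" can be routed to different handlers depending
only on where it sits, and the choice is made as soon as the start event
arrives.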

I think there are three issues: one is how specifically we can identify
elements or attributes in context, the second is what abstractions we use to
express them, the third is what side-effects the abstractions have.  For
example, DTDs have just parent, sibling, group identification; these are
encoded using the grammar abstraction; the grammar abstraction forces us to
decide issues of order, belonging and parent-to-child occurrence even when
they are not relevant.

I wonder, given the existence, deployment and suitability of James' XPath,
why we need to settle for "sufficient for many applications".  Smart readers
of XML-DEV will of course say "oh, but probably you can express things in
content models that you cannot express in paths and rules" but I have my
doubts: a really complex content model is IMHO often (always) either the
sign of struggling against the grammar or a kind of tag omission: if there
is some complex structure there, why isn't it explicitly labelled for all
the world to see?
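
As a rough illustration of what "paths and rules" might look like in
practice, here is a small Python sketch. It uses the limited XPath subset
supported by xml.etree.ElementTree rather than full XPath, and the function
name check and the rule format (path, test, message) are invented for this
example.

```python
import xml.etree.ElementTree as ET


def check(xml_text, rules):
    """Apply (path, test, message) rules and return the failure messages.

    Each path selects context elements relative to the root element;
    each test is a callable that returns True when the element is acceptable.
    """
    root = ET.fromstring(xml_text)
    failures = []
    for path, test, message in rules:
        for element in root.findall(path):
            if not test(element):
                failures.append(message)
    return failures


# Hypothetical rule: every y that is a child of the root needs a z attribute.
rules = [("./y", lambda e: e.get("z") is not None, "y needs a z attribute")]
```

Each rule names its context by path and says nothing about order or
sibling occurrence unless the rule author chooses to.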

Rick Jelliffe