Are we losing out because of grammars? (Re: Schema ambiguity detection algorithm for RELAX (1/4))



H: K.Kawaguchi <k-kawa@bigfoot.com>
>While XML Schema gains more and more weight, we have two relatively
>light-weight schema language. One is RELAX, and the other is TREX.
>And I think many people is curious about the difference of these two
>languages.
..
>I'd appreciate any comments from you about this.

Without disrespect to either RELAX or TREX (and certainly not to the people
involved, who have my highest respect as innovators and activists), my
serious comment is "why bother?" Of course, I don't mean that this is not
academically interesting or commercially relevant, nor that it is not vital
for getting efficient implementations, nor that it is not important that we
have strong alternative schema languages for rich diversity and
cross-pollination.

What I mean is this: ambiguity is an artifact of the grammar paradigm, not of
schemas per se. In this respect it is like the question of how to compute the
union of schemas. In other words, RELAX may be lightweight for implementing, but
not lightweight for reasoning (since grammar-based schema languages are
impoverished in the direct relationships they can express). What if, even
after figuring out how to handle ambiguity and unions in grammars, we are
still left with a paradigm that is not expressive enough for implementing
the human-oriented, concept-modeling/data-modeling systems that some people
think are important?

I believe proponents of the various schema paradigms (grammars and rule-based
systems) need to justify that their paradigm is useful. When we only had
grammars, it was a moot point. Now that we have Schematron and other
rule-based systems, I think we can be bold enough to start to critique
grammars-as-schemas. (Of course, this is a two-way street.)

Let's contrast the grammar approach with a rule/path-based approach (as in
Schematron). In Schematron we don't have any ambiguity problem, because we
don't have alternate paths for any location expression in the same way.
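
To make the contrast concrete, here is a minimal sketch of rule/path-based
checking. It is not Schematron itself (which uses full XPath, typically via
XSLT); it assumes Python's ElementTree with its limited path syntax, and the
element names (doc, section, title, list, item) are hypothetical. Each rule
names a context path and an assertion, and each is evaluated on its own, so
there is no content model to match and nothing to be ambiguous about.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, assertion on a matched element, failure message).
# Paths and element names are illustrative assumptions, not from any real schema.
RULES = [
    (".//section", lambda e: e.find("title") is not None,
     "section needs a title"),
    (".//list", lambda e: len(e.findall("item")) >= 1,
     "list needs at least one item"),
]

def validate(root, rules):
    """Evaluate every rule independently and collect failure messages."""
    failures = []
    for path, test, message in rules:
        for element in root.findall(path):
            if not test(element):
                failures.append(message)
    return failures

doc = ET.fromstring(
    "<doc><section><title>A</title><list/></section><section/></doc>"
)
failures = validate(doc, RULES)
print(failures)
```

Here the second section fails the title rule and the empty list fails the
item rule; neither result depends on the other rule being present.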

The union issue is a bit more tricky. Naively, one merely sticks the
patterns from both schemas into one schema by cut and paste, and they are
evaluated as separate patterns. There is no need to merge the patterns (just
like TREX avoids the problem, I think, by allowing parallel content models).
An inferencing system could be made which checked whether each assertion
totally or partially conflicted with another and which assertions could be
joined: some notion of which constraints could be relaxed would be important
too: if schema 1 says count(*) > 5 and schema 2 says count(*) > 6, what is
the result of "merging"? In that case there is a universe of possible
merges, with some being more interesting than others.
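
As a small illustration of why "merging" is underdetermined, the sketch below
takes the two assertions count(*) > 5 and count(*) > 6 and computes just two
of the possible merges: the strict one, where both assertions must hold, and
the relaxed one, where either may hold. The function names are hypothetical
and this is only one way to frame the choice, not a proposal for how an
inferencing system should work.

```python
def merge_lower_bounds(a, b):
    """Given assertions count(*) > a and count(*) > b, return two candidate
    merged lower bounds (hypothetical helper for illustration)."""
    return {
        "both_must_hold": max(a, b),   # conjunction: count(*) > max(a, b)
        "either_may_hold": min(a, b),  # disjunction: count(*) > min(a, b)
    }

def satisfies(count, bound):
    """True if a child count meets the assertion count(*) > bound."""
    return count > bound

merged = merge_lower_bounds(5, 6)
print(merged)

# A node with exactly 6 children passes the relaxed merge but not the strict one.
print(satisfies(6, merged["either_may_hold"]))
print(satisfies(6, merged["both_must_hold"]))
```

A document with exactly six children is where the two merges disagree, which
is the kind of case a merge policy would have to decide.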

So my comment is this: doesn't the presence of these tricky ambiguity issues
mean that to actually understand RELAX (and presumably certain other schema
languages) requires a computer scientist, not a data modeler? Might we get
to higher-level schema languages faster by completely ditching the grammar
paradigm and treating schemas as systems of logical assertions which can be
queried?

XML does not have SGML's short refs and delimiter maps, so why does it now
need grammar-based content-modeling?

Cheers
Rick Jelliffe