OASIS Mailing List Archives

Re: Are we losing out because of grammars?




From: Joe English <jenglish@flightlab.com>

>Rick Jelliffe wrote:
>
>> I believe proponents of various schema paradigms (grammars and
>> rule-based systems) need to justify that their paradigm is useful....

>I think the utility of grammar-based schemas has been well established.

As far as I know, the decision to use grammars in XML Schemas was made
completely uncritically. And how could it be otherwise if there has been no
discussion?  So the utility of grammar-based schemas is as well-established
as the utility of the horse-drawn stump-jump plough: yes we can do excellent
things with them that we would not do without them, but if we are not Amish
why use a horse when there are shining tractors with air-conditioned cabs
and diverting CDs of Dwight Yoakam yodelling and the Georgia Peach singing?

If we know XML documents need to be graphs, why are we working as if they
are trees? Why do we have schema languages that enforce the treeness of the
syntax rather than provide the layer to free us from it?

>One of the biggest benefits of grammar-based schemata is that
>they are generative: there is an enumeration procedure as well
>as a decision procedure.

A rule-based schema can also be generative, using type inferencing.
The simpler the rules-language, the simpler the inferencing required.

For example, given
   <rule context="book"><assert test="chapter"/></rule>
an inferencing interface can say when we are in a book "do you want to add a
chapter?" So guiding/performing generation is not a unique quality of
grammars.  The difference is that a grammar pretty much forces one to work
in sibling order, while a rule-based system is more free.  If a test is too
tricky or too general to infer a specific action, then the plain text of the
assertion becomes more important to guide the generator--this is not a
weakness but a strength.
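
A minimal Python sketch of such an inferencing interface, under stated
assumptions: the assertion's XPath is read from Schematron's test attribute,
only a test that is a bare child-element name is turned into a prompt, and
anything more complex falls back to the plain text of the assertion. The
function name suggestions and its arguments are hypothetical, not part of
Schematron.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative rule; the XPath is assumed to live in the "test" attribute.
RULE = '<rule context="book"><assert test="chapter"/></rule>'

# A test consisting of nothing but an element name is simple enough to
# infer a concrete action from.
BARE_NAME = re.compile(r"[A-Za-z_][\w.-]*$")


def suggestions(rule_xml, current_element, existing_children):
    """Return generative hints for the element being edited (hypothetical API)."""
    rule = ET.fromstring(rule_xml)
    # Only an exact context-name match is handled in this sketch.
    if rule.get("context") != current_element:
        return []
    hints = []
    for assertion in rule.findall("assert"):
        test = assertion.get("test", "")
        if BARE_NAME.match(test):
            # Simple case: the assertion just requires a child element.
            if test not in existing_children:
                hints.append(f"Do you want to add a {test}?")
        else:
            # Too general to infer an action: show the assertion text instead.
            text = (assertion.text or "").strip() or test
            hints.append(f"Check: {text}")
    return hints


if __name__ == "__main__":
    for hint in suggestions(RULE, "book", []):
        print(hint)
```

Nothing here depends on sibling order: a hint is offered whenever the
assertion is not yet satisfied, wherever the author happens to be working.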

(And Schematron 1.5 allows two extra hints on assertions: a role attribute
and a subject path which can be used to key specific handlers for
assertions. And the diagnostics for each assertion allow you to get the
value of expressions, and so have dynamic error messages. Plus the phase
mechanism will let you turn off sets of patterns:  so there could be a set of
patterns (a phase) for when one is creating a skeletal document (so you only
get generative hints about these) and then in a different phase you get
interested in something else--current schema languages do not allow
management of a division of labour or tasking for data entry.)
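
The phase idea can be sketched in a few lines of Python. This models only
the selection step; the pattern ids, phase names and hint strings are all
hypothetical, and a real Schematron processor does considerably more.

```python
# Hypothetical pattern ids mapped to the generative hints their rules yield.
PATTERNS = {
    "skeleton": ["Do you want to add a chapter?"],
    "metadata": ["Do you want to add an author?"],
}

# Each phase names the patterns that are active while it is selected.
PHASES = {
    "drafting": ["skeleton"],
    "review": ["skeleton", "metadata"],
}


def active_hints(phase):
    """Collect hints only from the patterns switched on in this phase."""
    hints = []
    for pattern_id in PHASES[phase]:
        hints.extend(PATTERNS[pattern_id])
    return hints


if __name__ == "__main__":
    print(active_hints("drafting"))
```

Someone building a skeletal document would select the drafting phase and
never see the metadata hints until a later phase.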

>But there *is* an ambiguity issue in Schematron -- the requirement
>that a node can match the context of at most one <rule>  in
>a given <pattern>.  As you mentioned, this doesn't cause
>a problem when trying to merge two schemas (just include all
>the <pattern>s from both schemas), but the question of whether
>all the <rules> within each of the original <pattern>s match
>disjoint node-sets remains.

I don't see that, really. If you want to merge two rule sets (unless they
happen to be the same) you may get a minor explosion but there is no
ambiguity. I think it is just straightforward set operations, and you always
know which one will fire because of lexical priority.

So, to merge any two patterns, for each rule in pattern A take each rule in
pattern B and create a new rule by anding the context attributes from the
current A and B and combining their contents (and, at the end, also have all
the A rules and all the B rules uncombined to handle the fallthrough case).
Maintain the order; prune impossible cases if desired (they are harmless,
so pruning only helps performance); refactor rule contexts to eliminate
common subexpressions as desired.
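
The merge procedure above can be sketched in Python. This is an
illustration under simplifying assumptions, not a Schematron implementation:
a node is just an element name, a rule context is modelled as a predicate
rather than an XPath pattern, and the names Rule, fire and merge are
hypothetical. Only the first matching rule in a pattern fires, which is the
lexical priority mentioned earlier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str                        # human-readable stand-in for the context
    context: Callable[[str], bool]   # does this rule apply to the node?
    asserts: List[str]               # assertion texts carried by the rule


def fire(pattern, node):
    """Return the assertions of the first rule whose context matches."""
    for rule in pattern:
        if rule.context(node):
            return rule.asserts
    return []


def merge(a_rules, b_rules):
    """Merge two patterns: every A-and-B combination, then fallthrough rules."""
    merged = []
    for a in a_rules:
        for b in b_rules:
            merged.append(Rule(
                name=f"({a.name}) and ({b.name})",
                # Bind a and b as defaults so each lambda keeps its own pair.
                context=lambda n, a=a, b=b: a.context(n) and b.context(n),
                asserts=a.asserts + b.asserts,
            ))
    # Uncombined rules handle nodes matched by only one source pattern.
    return merged + list(a_rules) + list(b_rules)


A = [
    Rule("book", lambda n: n == "book", ["has chapter"]),
    Rule("*", lambda n: True, ["has id"]),
]
B = [
    Rule("book|article", lambda n: n in ("book", "article"), ["has title"]),
]

if __name__ == "__main__":
    for node in ("book", "article", "para"):
        print(node, fire(merge(A, B), node))
```

Because the combined rules keep the source order, the first combined rule to
match is built from exactly the rules that would have fired in A and in B
separately, so a node gets the same assertions it would have received from
the two unmerged patterns.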

(Schematron provides a syntactic sugar facility called abstract rules, which
will keep the line count under control: instead of combining the assert
statements, make an abstract rule with the contents of each source rule and
then add references to the abstract rule inside the generated rules.)

>> XML does not have SGML's short refs and delimiter maps, so why does it
>> now need grammar-based content-modeling?
>
>XML doesn't need *any* kind of schema language

But we have them, people want/need them, and the underlying paradigm has
never been explicitly justified compared to alternatives.

Cheers
Rick Jelliffe