xml-dev - Re: [xml-dev] Property dependency checking

To: "xml-dev" <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Property dependency checking
From: "Bob Foster" <bob@xmlbuddy.com>
Date: Sat, 12 Apr 2003 09:23:36 -0500
[Warning: longish reply to a longish reply]

From: "Rick Jelliffe" <ricko@allette.com.au>
> > Since when are attributes order-constrained?
> Oops: s/@value/text()/ but the point is the same. (I don't think attribute
> order was the mistake: it is prop1 that provides the context.)
>
> <rule context="prop1[@value='ABC']" >
>[snip]

I think I see the problem. The question was about attributes, but your rules
are in terms of elements. I am sure you know that the following-sibling axis
is empty for attribute contexts. I would think the rules would be something
like

<rule context="root[@prop1='ABC']">
  <assert test="@prop2 and not(@prop3)">
    When the value of prop1 is ABC then prop2 is required
  </assert>
</rule>

<rule context="root[@prop1='XYZ']">
  <assert test="@prop3 and not(@prop2)">
    When the value of prop1 is XYZ then prop3 is required
  </assert>
</rule>

<!-- and conversely -->
<rule context="root[@prop2]">
  <assert test="@prop1='ABC'">
    When prop2 is specified, the value of prop1 must be ABC
  </assert>
</rule>

<rule context="root[@prop3]">
  <assert test="@prop1='XYZ'">
    When prop3 is specified, the value of prop1 must be XYZ
  </assert>
</rule>
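
For anyone who wants to try these four rules without a Schematron processor, here is a minimal Python sketch of the same checks using only the standard library. The function name check_root and the sample documents are made up for illustration. The sketch assumes every rule is evaluated independently (as if each sat in its own pattern) and takes the second rule's context to be prop1='XYZ', as its message says; it is not how a Schematron implementation works internally.

```python
import xml.etree.ElementTree as ET

# Each entry: (context predicate, test predicate, message).
# Predicates receive the attribute dict of the <root> element.
RULES = [
    (lambda a: a.get("prop1") == "ABC",
     lambda a: "prop2" in a and "prop3" not in a,
     "When the value of prop1 is ABC then prop2 is required"),
    (lambda a: a.get("prop1") == "XYZ",   # assumed: context is prop1, not value
     lambda a: "prop3" in a and "prop2" not in a,
     "When the value of prop1 is XYZ then prop3 is required"),
    (lambda a: "prop2" in a,
     lambda a: a.get("prop1") == "ABC",
     "When prop2 is specified, the value of prop1 must be ABC"),
    (lambda a: "prop3" in a,
     lambda a: a.get("prop1") == "XYZ",
     "When prop3 is specified, the value of prop1 must be XYZ"),
]

def check_root(xml_text):
    """Return the messages of all failed assertions for a <root> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "root":
        return []  # no rule context matches, so nothing is asserted
    attrs = root.attrib
    return [msg for context, test, msg in RULES
            if context(attrs) and not test(attrs)]

if __name__ == "__main__":
    for doc in ('<root prop1="ABC" prop2="x"/>',
                '<root prop1="ABC" prop3="x"/>',
                '<root prop1="XYZ" prop3="x"/>'):
        print(doc, "->", check_root(doc) or "valid")
```

A rule only reports anything when its context matches, which is why a document that matches none of the four contexts produces no messages at all.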

As a rule, the set of assertions required to express a regular model is not
simpler than the model. For example, here is the model in RELAX NG compact
syntax:

element root {
  (attribute prop1 {'ABC'}, attribute prop2 {text})
| (attribute prop1 {'XYZ'}, attribute prop3 {text})
}

If we had a compact syntax for expressing the rules above (wouldn't that be
nice?) it might look something like this:

assert root[@prop1='ABC'] -> @prop2 and not(@prop3)
&& root[@prop1='XYZ'] -> @prop3 and not(@prop2)
&& root[@prop2] -> @prop1='ABC'
&& root[@prop3] -> @prop1='XYZ'

(Hopefully the notation is clear enough: context -> test, with && for
short-circuit boolean AND. I have left off the error messages, in the
interest of fairness, since they could be generated from the expressions,
but I am sure an actual implementation would allow them.)

The truth is, neither of these forms is particularly easy for the average
person to understand, but the first is surely preferable to the second!
Moreover, it is easier to maintain and extend, easier to "visualize", etc.
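
To make the comparison concrete, the Python sketch below (standard library only) encodes both forms and enumerates a small space of attribute combinations. The names model_accepts, assertions_accept and compare are invented for the example, attribute values other than prop1's are ignored, and the enumeration assumes prop1 is present with one of the two values. Under that assumption the two forms accept the same combinations.

```python
from itertools import product

def model_accepts(prop1, has_prop2, has_prop3):
    """The two-branch choice from the RELAX NG model."""
    branch_abc = prop1 == "ABC" and has_prop2 and not has_prop3
    branch_xyz = prop1 == "XYZ" and has_prop3 and not has_prop2
    return branch_abc or branch_xyz

def implies(context, test):
    """context -> test: the test only matters when the context matches."""
    return (not context) or test

def assertions_accept(prop1, has_prop2, has_prop3):
    """The four 'context -> test' assertions, all of which must hold."""
    return (implies(prop1 == "ABC", has_prop2 and not has_prop3)
            and implies(prop1 == "XYZ", has_prop3 and not has_prop2)
            and implies(has_prop2, prop1 == "ABC")
            and implies(has_prop3, prop1 == "XYZ"))

def compare():
    """Return the combinations on which the two forms disagree."""
    space = product(["ABC", "XYZ"], [False, True], [False, True])
    return [combo for combo in space
            if model_accepts(*combo) != assertions_accept(*combo)]

if __name__ == "__main__":
    print("disagreements:", compare())
```

Outside that assumption the forms can differ: with prop1 absent and neither prop2 nor prop3 present, no assertion context matches, so assertions_accept(None, False, False) is true while the model rejects the element.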

> On the contrary, what you have is exactly a list of things that can be
> said: you can format a Schematron schema as a list.  The text of the
> assertions is the basis for the implementing contexts and tests.

Nope. You just have a list of things that can't be said. Some constructive
information can be derived from this list, but not everything. For example,
what set of XPath expressions corresponds to a*? Limitations in XPath and the
deduction problem (analogous to trying to derive a schema from instances)
keep this approach from completeness.

> Other people (quite legitimately) want to create a document
> format so it is nice and idiomatic according to their eye; that is the
> kind of person who does
> <root>
>     <prop1 value="ABC"/>
>     <prop2 value="..."/>
> </root>
>
> rather than
>
> <root>
>     <prop1>
>         <ABC>
>             <prop2 value="..."/>
>         </ABC>
>     </prop1>
> </root>
>
> Which is regular.

They are both regular. That's not the issue.

> You can do it with some grammars. You cannot do it with others,
> such as grammars containing wildcards.

Sure you can. You just select the grammar information from the appropriate
namespace(s) and proceed. Of course you can't give constructive assistance
where no information about a namespace is provided, but when it is provided,
it is available constructively.

> But you don't need some program to infer readable documentation
> for Schematron: the assertions are already there in plain language.

Unless you're talking about the error messages, there is nothing "plain
language" about either assertions or regular models. It would be silly to
argue that more people know one than the other; all the people who know
either are a minuscule percentage of the population. That's why it's so
important to be able to derive hints, documentation, data entry formats,
etc. from a schema.

> The problem with assertions is not that you cannot extract anything from
> them, but that people have not investigated this area.

I think the problems go deeper than that, but I agree it would be useful for
content assistance, documentation, etc. to accurately reflect the effect of
assertions on a content model. Since the assertions used in practice are by
and large subtractive, the effect would be to reduce the choices available
to the user. Good idea.
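
As a rough illustration of that subtractive effect, the toy Python sketch below starts from the attributes a grammar might offer on root and drops those whose addition would violate an assertion. Everything in it is hypothetical: the suggest function, the candidate list, and the choice to reuse the prop1/prop2/prop3 example from earlier in this message. A real editor would need something considerably more careful.

```python
def satisfies_assertions(attrs):
    """True if the four prop1/prop2/prop3 assertions all hold for attrs."""
    p1 = attrs.get("prop1")
    has2, has3 = "prop2" in attrs, "prop3" in attrs
    if p1 == "ABC" and not (has2 and not has3):
        return False
    if p1 == "XYZ" and not (has3 and not has2):
        return False
    if has2 and p1 != "ABC":
        return False
    if has3 and p1 != "XYZ":
        return False
    return True

def suggest(attrs, candidates=("prop2", "prop3")):
    """Offer only those candidate attributes that leave the element valid.

    `candidates` stands in for what a content model would propose;
    the assertions can only remove entries from it.
    """
    offers = []
    for name in candidates:
        if name in attrs:
            continue  # already present, nothing to offer
        trial = dict(attrs, **{name: ""})  # element with the candidate added
        if satisfies_assertions(trial):
            offers.append(name)
    return offers

if __name__ == "__main__":
    print(suggest({"prop1": "ABC"}))
    print(suggest({"prop1": "XYZ"}))
```

The grammar alone would offer both attributes; the assertions cut the list down to the one that fits the current value of prop1.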

> (In fact, I prototyped about 2 years ago a system for generating
> contextual author assistance from patterns recognized in Schematron
> schemas, so it can work, but I am waiting for more Schematron schemas to
> test whether it goes far enough. The main problem is not that Schematron
> is assertion-based or XPath based, but that it is typically open and
> typically is used to complement grammars--they are typically open and
> sparse rather than closed and complete.)

Exactly so. Assertions are swell as a complement to models. But no one
(human) is willing to write out all the assertions required to describe
regular models (if that is possible). If you're waiting for a significant
body of all-Schematron schemas, I would guess it will be a long wait.

Which returns me to my original point. If models *can* be used to describe
something, they *should* be used. Assertions should only be used where
models are insufficient. In XSD, you can't describe what the original writer
wanted, so if XSD is a requirement, the writer has no choice but to use
assertions. If not, RELAX NG can describe this problem in a natural and
concise way. For things RELAX NG can't describe, by all means augment it
with assertions.

> I didn't say "don't use grammars for anything". But there is no reason why
> an arbitrary database should have a dominant tree or natural tree;
> consequently there is every reason to expect that grammars are deficient
> for representing the structural constraints of a database.

Of course there are a host of reasons why regular expressions aren't used to
model databases, but databases aren't documents and, marketing hype aside,
documents aren't databases. The content is not the container. XML documents
are trees, results of queries on XML documents are hedges, both perfectly
amenable to regular grammars.

A better point is that regular grammars aren't sufficient to describe every
idiom a schema designer might want to impose on a document. Is that what you
mean? I agree! Schema design is language design. Language designers are
notoriously, uh, irregular. But one should distinguish between the
limitations of regular grammars and the restrictions of the various designs
that employ regular grammars, DTD, XML Schema, RELAX NG, etc. Since RELAX NG
has fewer 'arbitrary' restrictions, it can capture more designer idioms,
like the extremely common idioms that constrain which attributes can appear
with which other attributes in a given element. However, it retains some
restrictions dictated by implementation concerns.

Then there is the entirely different point that some idioms cannot be
expressed by *any* regular grammar, e.g., counting, naming, uniqueness,
etc. Yet language designers want to do these things.

Finally, the fact is that no schema language commonly in use is entirely
'regular'; they all have some assertion content. An ID is an assertion about
a value space, as is a key, etc. Driven by use cases, schema language
designers will push ever further into the assertion space to overcome the
limitations of regular models (or their fears of implementation overhead).
But assertions can never be completely eliminated without introducing the
equivalent of assertions.
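
For instance, the uniqueness that an ID implies is easy to state as a check over the whole document, even though no content model expresses it. A minimal Python sketch follows. The attribute name id, the function duplicate_ids, and the sample document are assumptions for illustration only; real ID typing is declared by a DTD or schema rather than inferred from the attribute name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text, attr="id"):
    """Return, sorted, the values of `attr` that occur on more than one element."""
    root = ET.fromstring(xml_text)
    # Count every value of the attribute across the whole tree.
    counts = Counter(el.get(attr) for el in root.iter()
                     if el.get(attr) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

if __name__ == "__main__":
    sample = '<doc><a id="x"/><b id="y"/><c id="x"/></doc>'
    print(duplicate_ids(sample))
```

The check has to look at every element before it can answer, which is what distinguishes it from a content model that validates each element against its local context.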

One could go further than assertions, into full programming languages. Maybe
XQuery will go there; it has all the required parts to mutate into a
programmable schema language. But I'm not certain this would be an
improvement, by an extension of my earlier argument. Isn't this where we
started, documents accepted or rejected by arbitrary programs that few can
read and even fewer understand? Assertions occupy a sweet spot just shy of
Turing completeness that is very powerful but much easier to understand and
explain than arbitrary programs.

Which, I hope, allows me to conclude on a point of agreement.  There will
always be a place for assertions. Long live Schematron! ;-}

Bob