Re: [xml-dev] XML spec and XSD
- From: "Pete Cordell" <petexmldev@codalogic.com>
- To: <xml-dev@lists.xml.org>
- Date: Wed, 18 Nov 2009 09:47:28 -0000
I think a lot of this is horses for courses.
I think for a lot of programmers with a Java/C#/C++ background the structure
of XSD mirrors what they do in classes. So something like:
<xs:element name='foo' type='xs:int' minOccurs='0' maxOccurs='unbounded'/>
is more readily mappable to their everyday mindset than:
<zeroOrMore>
  <element name='foo'>
    <data type='int'/>
  </element>
</zeroOrMore>
Also, as classes tend to be a bag of bits without many co-occurrence
constraints, it probably doesn't even occur to them that XSD is weak in this
area.
I think for a lot of people XML is just a means to an end, and the less they
have to deal with it the better. So they pick up the first thing they see,
decide that it's adequate for their needs and go with it. Maybe also a case
of nobody got fired for choosing XSD!
For example, it was stated previously on this list that people tend to do
things in Java that would be better done in XSLT. Sometimes the devil you
know is the better route to take, especially as the pointy-haired boss likes to
see regular progress updates and doesn't consider surfing the web to be
productive.
Undoubtedly XSD could have been better, but if your requirements are met by
what it offers then there's no real motivation to use some other approach.
My 2 cents (for today at least!)
Pete Cordell
Codalogic Ltd
Interface XML to C++ the easy way using XML C++
data binding to convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com
for more info
----- Original Message -----
From: "Rick Jelliffe" <rjelliffe@allette.com.au>
To: <xml-dev@lists.xml.org>
Sent: Wednesday, November 18, 2009 7:13 AM
Subject: Re: [xml-dev] XML spec and XSD
Glidden, Douglass A wrote:
> From the perspective of an application developer who uses XML for storing
> fairly complex hierarchical data (as opposed to documents), I have a very
> basic question: Is there any real, practical reason that Relax NG is
> _universally_ better than XSD?
>
> Here's the background for my question: I've been using XSDs for
> generating object-oriented data models for some time, and have been
> frustrated by various inflexibilities in XSDs. For instance:
> - Derivation by restriction, while conceptually very useful, seems
> slightly cumbersome in operation; in particular, I wish it could be
> accomplished with less repetition (I'm a fanatical believer in DRY). I
> can live with the repetition, though, because I am assured that, if I make
> a typographical or some other error, the XSD will (almost) always fail to
> validate ("Bar is not a valid restriction of Foo"), so the possibility
> of unintended changes being introduced due to the repetition is kept to a
> minimum.
> - Certain fairly simple data structures can only be achieved in XSD (if at
> all) with ridiculously verbose patterns; for instance, how do you
> represent a structure with five possible elements, at least one of which
> must be present and none of which may be repeated? The best method I have
> found takes about 25 lines and far too much code repetition.
> - Most business logic cannot be represented, like "the value of the foo
> element must be less than or equal to the value of the bar element".
> - I occasionally was foiled by XSD's strict "no ambiguity" rules, which
> make it difficult to enforce rules such as "if the value of the foo
> element is xyz, then the bar and baz elements are required, but otherwise
> they are optional".
>
> Nonetheless, I have been more or less pleased with the final results in
> most cases. Having heard much praise of Relax NG, I recently decided to
> try it out, hoping it would be able to resolve these issues. Here's what
> I found:
> - It does relieve the "no ambiguity" problem and makes enforcing that sort
> of rules much easier.
> - It does not resolve most of the business logic issues, but there are
> other technologies that can be used in combination with either Relax NG or
> XSD to resolve these issues, so that can be worked around in either case.
> - It does not appear to relieve the complexity of representing some types
> of data structures (see the one described above; as far as I can tell, it
> would not be significantly easier to define in Relax NG than it is in
> XSD).
> - There is only a rudimentary concept of type definition and virtually none
> of inheritance. Designing good, object-oriented data models is
> practically impossible in Relax NG; for me, this is a HUGE deficiency.
> For instance, say Foo is an abstract data type with several fields,
> including an id field (any mixed data allowed) and a type field (with
> enumerated values); Bar is one of several concrete subtypes of Foo, which
> restricts the content of the id field (must contain two elements, a
> groupId and a subId) and the type field (to a subset of the values
> enumerated by Foo) and appends several new fields; other subtypes have
> completely different restrictions on the id and type fields and append
> various different new fields. In XSD, implementing this is fairly
> straightforward (although not necessarily simple and certainly more
> verbose than I would prefer): Foo is an abstract complex type, and Bar
> and the other subtypes must derive from it first via restriction (creating
> the restrictions on id and type) and then via extension (appending the new
> fields). In Relax NG, by contrast, I could find only two ways of doing
> this, neither of which was at all acceptable: One is to get rid of Foo
> completely and reference its fields separately in each subtype; this is
> unacceptable to me as it completely violates DRY and object-oriented
> design; the other is to define each subtype of Foo in a separate file,
> importing Foo and redefining as necessary; this is marginally better from
> an OOP point-of-view, but still makes it impossible for a schema to use
> "any valid subtype of Foo" as a content definition without violating the
> Open-Closed principle.
> - Relax NG provides no way of specifying cardinalities other than 0, 1,
> and infinity! This is also a big one for me: if I need at least 2 and no
> more than 20 occurrences of the foo element (fairly common in my data
> definitions), there's NO WAY I'm going to define two required foo elements
> and eighteen optional ones!
>
> Now, I'm hoping that some Relax NG expert will come along and prove me
> wrong about some or all of these things, since I only experimented with it
> for a couple of days. If not, for me these deficiencies in Relax NG *far*
> outweigh its benefits over XSD; there's really no comparison. As much as I
> would like to see some aspects of XSD improved, unless someone can propose
> real alternatives for issues like the ones above, Relax NG is simply not
> an option for the type of data I work with.
>
Three responses.
First is that XSD was not designed as an abstract data modeling language
but rather a markup description language: even though the grammars have
been somewhat extended with xsd:any and wildcards (and now assertions
and conditional types), XSD is not a substitute for the kinds of things
you would use, say, UML for. (And then convert the UML to RELAX NG.)
XBRL is an example of a system that attempts to piggyback data modeling
on top of XSD, and the result is an almost fatal level of complexity.
The second is that IMHO it is better to see RELAX NG as part of the ISO
DSDL standard, so when there is some feature missing from RELAX NG, it
may be provided elsewhere. This is a layered approach, where each
language tries to do one thing really well, rather than a monolithic
approach where one language tries to do everything. For example, you
organize your system so that there is a master RELAX NG schema, then you
add your restrictions as a layer in Schematron. You do the detailed
cardinality constraints in Schematron.
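A minimal sketch of that split, assuming hypothetical element names record and foo: the master RELAX NG grammar would just declare foo inside a <oneOrMore>, and a Schematron layer (ISO namespace, default XSLT 1.0 query binding assumed) carries the 2-to-20 range from the question quoted above, so nobody has to write two required and eighteen optional foo elements.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- The RELAX NG grammar only says "one or more foo"; the exact range lives here -->
    <sch:rule context="record">
      <sch:assert test="count(foo) &gt;= 2 and count(foo) &lt;= 20">
        A record must contain between 2 and 20 foo elements.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Changing the range later means editing one XPath expression rather than restructuring the grammar.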
The advantage of this approach is that you are not at the mercy of the
abstractions provided by the schema language: you don't need to think
"what combination of restriction/extension/redeclaration do I need, and
what are the order, ambiguity and UPA issues?" because you can just say
"Inside a Bar, element X is not allowed" directly.
Why model abstractly things that can be more simply said directly? If
the intent of all the data modeling is to reveal the connections and
design of the system, grammar-based systems have a fundamental
abstraction weakness: they don't allow negatives or exclusions, which are
one of the great tools of logic simplification. You don't need to have
types in order to do restrictions, and you don't need to have types in
order to document and explain restrictions.
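To illustrate restrictions stated directly, without any type machinery, here is a small Schematron sketch covering the "element X is not allowed inside a Bar" exclusion and the "if foo is xyz, then bar and baz are required" co-occurrence rule from the question. The names Bar, X, item, foo, bar and baz are placeholders, and the ISO Schematron namespace with the default query binding is assumed.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- A negative: the report fires whenever an X child is present in a Bar -->
    <sch:rule context="Bar">
      <sch:report test="X">Inside a Bar, element X is not allowed.</sch:report>
    </sch:rule>
    <!-- A co-occurrence constraint: this rule only matches items whose foo is 'xyz' -->
    <sch:rule context="item[foo = 'xyz']">
      <sch:assert test="bar and baz">When foo is 'xyz', bar and baz are required.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Neither rule needs a derived type; the condition is simply written into the rule context or the test.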
The third thing is that the design of RELAX NG reflects, to my mind, the
view that abstract assertions about the relationships between schema
items do not belong in the schemas themselves. They are better left to
tools. For example, there is a third-party tool that checks whether a
RELAX NG schema is ambiguous. But certainly the lack of tools for RELAX NG
shows that it has not been adopted by the people who are so keen on
automated conversion from abstract types to the schema language. Because
of RELAX NG's strong theoretical base, it should be easier to do these
kinds of set operations on it.
I would note that in one large schema we maintain, in which a master
vocabulary is used by a dozen committees to make multiple local schemas,
we actually remove type derivation information from the working draft
schemas and only add it at the end (it is a pain), because it is too
much trouble maintaining the base schemas to track the work in progress.
So we use XSD, but in the same kind of approach (and custom tooling)
that would have been necessary if we were using RELAX NG. (In fact, they
are discussing abandoning much use of type derivation and just moving to
a master XSD and much more Schematron, just for reasons of directness
and simplicity.)
> As a postscript, let me say I'm very thankful that the tone of this thread
> has been gradually transitioning to actual constructive argument (at which
> this list is normally rather good) rather than the ridiculously juvenile
> mud-slinging that was predominant early last week; in particular, I've
> appreciated the level-headed input from Jim Tivy, Liam Quin, Ken Holman,
> and Michael Kay.
Argumentative on this topic: that would be me! I plead guilty. For
constructive responses, please see the ISO standards for RELAX NG and
Schematron, etc., which a number of us on this list have worked on, and the
open source implementations we have made. Please also note that there
was no mud-slinging at individuals. If some of us have lost patience at
XSD because we have to use the wretched monstrosity, please don't blame
the victims!
Cheers
Rick Jelliffe
P.S. "how do you represent a structure with five possible elements, at
least one of which must be present and none of which may be repeated?
The best method I have found takes about 25 lines and far too much code
repetition."
Here is one straightforward Schematron schema, in 10 lines, including
documentation, if I understand the requirement.
<rule context="x">
  <assert test="count(a) &lt;= 1">An a may only appear once in an x</assert>
  <assert test="count(b) &lt;= 1">A b may only appear once in an x</assert>
  <assert test="count(c) &lt;= 1">A c may only appear once in an x</assert>
  <assert test="count(d) &lt;= 1">A d may only appear once in an x</assert>
  <assert test="count(e) &lt;= 1">An e may only appear once in an x</assert>
  <assert test="count(*) &gt; 0">There must be at least one element in x</assert>
  <assert test="count(*) = count(a) + count(b) + count(c) + count(d) + count(e)">
    The only elements allowed in x are a, b, c, d, and e</assert>
</rule>
And here is another version, which demonstrates that you can do kinds of
abstract type modeling in Schematron that are unavailable in XSD. In this
case, it is mixin-style modeling (like interfaces in Java). Note that there
is no combinatorial explosion: in fact, to add an extra child you only need
to alter the last XPath, not even add a new element.
<rule abstract="true" id="non-empty-element">
  <assert test="count(*) &gt; 0">There must be at least one element in <name/></assert>
</rule>
<rule abstract="true" id="bag">
  <!-- XPath 2.0 (queryBinding="xslt2"): fails if any child shares its name
       with a following sibling -->
  <assert test="not(*[name() = following-sibling::*/name()])">
    No child of <name/> may appear more than once.</assert>
</rule>
<rule context="x">
  <extends rule="bag"/>
  <extends rule="non-empty-element"/>
  <assert test="count(*) = count(a) + count(b) + count(c) + count(d) + count(e)">
    The only elements allowed in x are a, b, c, d, and e</assert>
</rule>
I think that XSD 1.1 will have an OK story: you could use xsd:any with
maxOccurs=1, and then an assertion using count(*) > 0. I guess you could
use RELAX NG's & (interleave) in the same way.
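A minimal sketch of that XSD 1.1 idea, with one deviation flagged: instead of xsd:any it uses xs:all with five optional elements (which already forbids repeats), leaving only the "at least one" requirement to an xs:assert. The names x and a to e are the placeholders from the example above, the xs:string types are arbitrary, and vc:minVersion only marks the schema as requiring an XSD 1.1 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           vc:minVersion="1.1">
  <xs:element name="x">
    <xs:complexType>
      <!-- xs:all: each child may appear at most once, in any order -->
      <xs:all>
        <xs:element name="a" type="xs:string" minOccurs="0"/>
        <xs:element name="b" type="xs:string" minOccurs="0"/>
        <xs:element name="c" type="xs:string" minOccurs="0"/>
        <xs:element name="d" type="xs:string" minOccurs="0"/>
        <xs:element name="e" type="xs:string" minOccurs="0"/>
      </xs:all>
      <!-- XSD 1.1 assertion: at least one of the five must be present -->
      <xs:assert test="count(*) &gt; 0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The RELAX NG analogue would interleave five optional elements (the & operator in the compact syntax), again leaving the "at least one" check to a count(*) > 0 rule in Schematron.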
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php