Re: [xml-dev] XML Schema complex type restriction

XSD misses at least five important mechanisms needed for practical schemas.

First, it does not allow parameterization, i.e. supplying parameters that can be used to create cohesive subsets. Interestingly, one of the extensions (pushed by mathematician Dave Peterson IIRC) mooted for SGML in the early 1990s was to turn the parameter entity mechanism into a proper logical expression language (with NOT, OR and AND combiners). It is ridiculous that an abstract schema cannot be instantiated just by filling in the blanks, and it shows XSD's relentless focus on database and databinding requirements.

Second, it does not allow mixins or exclusions where you select or reject the particular things you are interested in: e.g. to disallow tables or enable style attributes. Of course mixin versus inheritance is the oldest debate in OO language design, with many languages that start with hierarchy adding various mix-in-lite mechanisms to cope (and that debate is now rattled by the emergence of annotations). XSD is not a schema language for lifecycle support of XML documents; it is largely a failing experiment to squeeze XML document lifecycles into type hierarchies.

Third, it does not allow systematic combination of constraints into cohesive groupings that can be enabled or disabled. This was the flaw that meant that HTML needed three schemas, for example: there wasn't any concept of a thing between types and namespaces that could cope with "version".

Fourth, it provides no lifecycle mechanism. For example, there is no way to mark elements as deprecated, and to get validation reports that can tell whether obsolete elements have been used.

Fifth, it limits its scope to only the information in a single document, while the most important classes of documents are compound (eBooks, websites, OOXML, etc.).

In contrast, Schematron has supported these since the turn of the century: parameters (schema parameters, abstract pattern parameters, variable values pulled from config documents, abstract rule parameters) for the first, the pattern for the second, phases for the third, role attributes for the fourth. (The fifth has expanded support in the new Schematron, because you can specify which document each pattern applies to in a schema.)

And there is an important sixth shortfall: there is no standard XML format for representing the results of validation. So there is no standard way to take the validation results into an XSLT toolchain and do something with them. (Schematron has the ISO SVRL reporting language for this.) You might say that XML Schemas is a technology for constructing dead-ends for your XML. ...perhaps too harsh...
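
To make the SVRL point concrete, here is a minimal sketch of picking up validation results downstream. It uses Python's standard library rather than XSLT purely for brevity. The sample report is hand-written for illustration (the document paths, messages and the "deprecated" role value are assumptions, not output from any real validator); only the SVRL namespace and the failed-assert/text element names come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Schematron Validation Report Language (SVRL).
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical report, hand-written for this example.
SAMPLE_REPORT = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="tables"/>
  <svrl:fired-rule context="table"/>
  <svrl:failed-assert test="not(@border)" location="/doc/table[1]"
                      role="deprecated">
    <svrl:text>The border attribute is deprecated.</svrl:text>
  </svrl:failed-assert>
  <svrl:fired-rule context="p"/>
  <svrl:failed-assert test="normalize-space(.)" location="/doc/p[3]">
    <svrl:text>Paragraphs must not be empty.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""


def failed_asserts(svrl_xml):
    """Return one dict per svrl:failed-assert in the report."""
    root = ET.fromstring(svrl_xml)
    findings = []
    for failed in root.iterfind("svrl:failed-assert", SVRL_NS):
        text = failed.findtext("svrl:text", default="", namespaces=SVRL_NS)
        findings.append({
            "location": failed.get("location"),
            "role": failed.get("role"),  # None when no role was given
            "message": " ".join(text.split()),  # collapse whitespace
        })
    return findings


def with_role(findings, role):
    """Filter findings by role, e.g. to report only deprecated usage."""
    return [f for f in findings if f["role"] == role]
```

Calling with_role(failed_asserts(SAMPLE_REPORT), "deprecated") would give just the lifecycle findings, which is the kind of follow-on processing that has no standard input format on the XSD side.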

Regards

Rick

On Tue, Sep 26, 2017 at 8:07 AM, u123724 <u123724@gmail.com> wrote:

I'd like to share my experience here with the ISO 20022 schema used to
represent financial transactions in the upcoming EU MiFIR reporting
process. There's no question the schema was competently designed by
domain experts, and also from a markup design PoV, apart from using
generated type names I guess. I believe the format was inspired by or
based on SWIFT messaging, or was even designed by SWIFT itself.

But end users need to inspect the data represented and reported; the
format has rich alternative representations for concepts such as
persons, prices, quantities, parties, transaction legs, etc. which
aren't adequately represented as flat-file records. There's the real
problem that the format is already too intensive and complex for end
users to comprehend ("we don't understand XML"), yet constructing the
XML in software is complicated enough to require at least basic domain
knowledge. The schema doesn't even use complex type restrictions.

It might not be such a good idea to use all nuances of XSD for the
"perfect" representation when it gets unwieldy for your users and
limits your choice of developers as well in terms of required
knowledge mix. I know I generally advise against substitution groups
and nillable. I'm not even sure that local redefinition of element
names, as possible with XSDs but not DTDs, is desirable.

M. Reichardt
sgmljs.net

On Mon, Sep 25, 2017 at 7:47 PM, Webb Roberts <webb@webbroberts.com> wrote:
> On 2017-09-25, at 08:47:59, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:
> how does NIEM treat this question, where restriction of extreme generality
> should be extremely important?
>
>
> The National Information Exchange Model (NIEM) provides a set of XML Schema
> components that can be reused to build concrete exchanges. NIEM has defined
> data components that represent many common objects needed in exchanges by
> participating organizations. NIEM started as GJXDM, based on US state,
> local, and tribal participants, and evolved to also include federal
> participants, primarily US DOJ, DHS, and DOD, with some international
> participation.
>
> NIEM data components are principally defined by a set of XML Schema
> documents, and are broken up into namespaces organized by governance.
> There's a technical level (structures and other utility schemas), then the
> NIEM core, governed by a cross-government group, and a set of domains that
> focus on topic areas, like trade, immigration, justice, and military
> operations. NIEM is based on the RDF model; we're using JSON-LD for NIEM's
> JSON approach.
>
> NIEM defines reference schemas that don't use restriction. Anyone who
> develops an exchange based on NIEM schemas is encouraged to build a subset
> of the NIEM reference schemas. NIEM provides a tool, the NIEM Subset Schema
> Generation Tool, that helps a user pick types and elements they're
> interested in, to generate a subset schema. The SSGT writes and reads a
> file, called a "wantlist" that identifies what pieces of the reference
> schemas need to be included. A resulting subset will have the data
> definitions listed in the wantlist, along with the definitions those
> definitions require, like base types, types of elements, and elements for
> substitution groups. Contents of a wantlist look like:
>
> <w:Element w:name="nc:Person" w:isReference="false" w:nillable="true"/>
> <w:Type w:name="nc:PersonType" w:isRequested="false">
>   <w:ElementInType w:name="nc:PersonBirthDate" w:isReference="false"
>       w:minOccurs="1" w:maxOccurs="1"/>
>   <w:ElementInType w:name="nc:PersonName" w:isReference="false"
>       w:minOccurs="1" w:maxOccurs="1"/>
> </w:Type>
>
> The basic property that NIEM expects of a subset schema is that any instance
> that is valid against a subset schema must be valid against the base
> reference schemas. So if something is required in the base schema, it must
> be required in the subset. Most things are optional in the reference
> schemas, so a subset may constrain optional components to be required. It
> gives a lot of flexibility to the exchange developers as to exactly what
> they want in their exchanges. Subsets have worked well. People use wantlists
> to collaborate and save and upload their requirements, and the resulting
> subset schemas are pretty simple. You could do more in subset schemas than
> the SSGT does, but it seems to do enough.
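
The subset property described above (anything valid against the subset must be valid against the reference schema) can be illustrated for occurrence constraints alone. This is a simplified sketch, not how the SSGT works: it looks only at minOccurs/maxOccurs pairs and ignores types, substitution groups and everything else a real check would need. The function name and the tuple representation are assumptions made for the example.

```python
from typing import Optional, Tuple

# (minOccurs, maxOccurs); a maxOccurs of None stands for "unbounded".
Occurs = Tuple[int, Optional[int]]


def is_valid_narrowing(base: Occurs, subset: Occurs) -> bool:
    """True if every count allowed by `subset` is also allowed by `base`."""
    base_min, base_max = base
    sub_min, sub_max = subset
    if sub_max is not None and sub_min > sub_max:
        return False  # the subset range itself is empty
    if sub_min < base_min:
        return False  # would make a required element optional
    if base_max is None:
        return True  # base is unbounded, any upper limit fits
    return sub_max is not None and sub_max <= base_max


# An optional, repeatable element may be made required and single:
assert is_valid_narrowing((0, None), (1, 1))
# ... or dropped from the subset entirely:
assert is_valid_narrowing((0, None), (0, 0))
# A required element may not become optional:
assert not is_valid_narrowing((1, 1), (0, 1))
```

The wantlist entries with minOccurs="1" and maxOccurs="1" shown earlier are the first case: tightening an optional component so that it becomes required.
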
>
> Extension schemas are where exchange developers build on the reference
> schemas, via type extension, new substitutable elements, completely new
> types, etc. XML Schema restriction is allowed in extension schemas.
>
> NIEM exchange developers are encouraged to make their exchanges precise via:
>
> * Subset schemas that provide just the data definitions of interest to the
>   exchange
> * XML Schema restriction to make data definitions more precise
> * Rules via Schematron to provide whatever else is needed
>
>
> We hear from some people who don't like Schematron, and want to do
> everything via XSD validation, but it's a balancing act to reuse common data
> definitions while making everything super-precise. Adding a few Schematron
> rules can greatly simplify the schemas.
>
> The NIEM Subset Schema Generation Tool is at
> https://tools.niem.gov/niemtools/ssgt/index.iepd
> NIEM reference schemas are on GitHub: https://github.com/NIEM/NIEM-Releases,
> with additional info and tools listed at http://niem.github.io/niem-releases/.
>
> Very respectfully,
> Webb Roberts
> Georgia Tech Research Institute
>
> On 2017-09-25, at 08:47:59, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:
>
> "The trouble with restriction is not knowing exactly what the differences
> are or why."
>
> This is an interesting point. (And I've always avoided restriction of
> complex types, instinctively.)
>
> Of course it would, in principle, be very easy to specify a restriction step
> explicitly, using a tiny vocabulary for specifying the removal of optional
> elements, other cardinality changes and whatever else is needed. Such a
> "restriction descriptor" might be the input (together with the original
> schema) for generating the restricted schema, as well as a new from-scratch
> schema expressing the restrictions. I wonder if there are any proven
> approaches to this which might be considered good practice? (And how does
> NIEM treat this question, where restriction of extreme generality should be
> extremely important?)
>
> With kind regards,
> Hans-Jürgen
>
>
> Rick Jelliffe <rjelliffe@allette.com.au> schrieb am 10:01 Montag,
> 25.September 2017:
>
>
> You are forced to use a standard kitchen sink schema because it is the
> standard and therefore will make life easier.
>
> However, most of the elements and attributes are things you don't need. And
> you know the full schema will blow out implementation and confuse testing,
> and anyway YAGNI.
>
> So you will make a profile (subset) of it using restriction and distribute
> that.
>
> But then your schema documents may be bloated. It may be simpler just to have
> a parallel validation which just checks that only wanted element names are
> used, using any schema language.
>
> I.e. in allowed elements a,b,c,d,e,f,g,... only child elements
> a,b,c,d,e,f,g,... can be used. So the big schema states all the rules. The
> small one excludes unwanted names, making it really explicit. The trouble with
> restriction is not knowing exactly what the differences are or why.
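
The parallel check suggested above can be sketched in a few lines. The quoted text says any schema language would do; plain Python with the standard library stands in for one here. The function name and the sample element names are made up for the example, and namespaced names would need to appear in the allow-list in ElementTree's "{namespace}local" form.

```python
import xml.etree.ElementTree as ET


def unwanted_elements(xml_text, allowed):
    """Return element names used in the document but not in `allowed`.

    Names are reported once each, in order of first appearance.
    The full schema is still expected to check structure and content;
    this only enforces the profile's list of wanted names.
    """
    root = ET.fromstring(xml_text)
    unwanted = []
    for element in root.iter():  # the root and all descendants
        if element.tag not in allowed and element.tag not in unwanted:
            unwanted.append(element.tag)
    return unwanted


# Profile: only a, b and c are wanted out of the big vocabulary.
profile = {"a", "b", "c"}
document = "<a><b/><c><x/></c><x/><y/></a>"
assert unwanted_elements(document, profile) == ["x", "y"]
```

The allow-list then doubles as the record of exactly how the profile differs from the full schema, which is the information a restriction tends to hide.
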
> Rick
>
>
>
>
> Regards
> Rick
>
>
> On 25 Sep 2017 5:27 PM, "Mukul Gandhi" <gandhi.mukul@gmail.com> wrote:
> Hello list,
> Can anyone come up with a useful business use case, to use XML Schema
> complex type restriction?
>
>
> --
> Regards,
> Mukul Gandhi
>
>
>
> Very respectfully,
> Webb
>
> --
> Webb Roberts <webb.roberts@gtri.gatech.edu>
> Senior Research Scientist, Georgia Tech Research Institute
> office/mobile: (404)407-6181
>