Re: [xml-dev] XML Schema complex type restriction

Thank you very much, Webb - I am glad to have these answers, which, taken together with your links and earlier explanations, provide food for weeks of thought and study.

This thread started with concerns about XSD's mechanism of restriction, xs:restriction. I would like to make an attempt to relate what you have described to the notion of restriction in a broader sense. An image came to my mind. Think of an English business letter - crisp, focused, meaningful. (Hopefully.) It is based on a tiny fraction of the English language, augmented with a few extensions, which are terms introduced by a particular community. The meaning and impact of the letter rely totally on the letter's *alignment* with the English language. (For example, today means today.) A model of the letter might be constructed by restricting the English language to a small subset, adding a small extension (as mentioned) and reassembling the resulting set of building blocks into simple structures. But how to define the restriction? Obviously, it would be very cumbersome to repeat the English dictionary, marking 99.7 % of the words as not relevant in the present context. It is incomparably better and more intuitive to express the restriction as a selection applied to the base language: to use a wantlist.

With NIEM, the base language is a large RDF vocabulary. (Equivalently, it is the set of core and domain XSD's.) I think you were convinced right from the start that RDF is well suited to express a language, which can serve as a foundation for a virtually unlimited range of letters. Perhaps one might even say that the RDF model is more fundamental and the XSDs expressing the model are more instrumental, serving as the material bridge to connect the language to the letter models, for which, I believe, XSD is indeed a highly appropriate language.

NIEM discovered RDF as appropriate material for shaping a language, and it designed devices for restricting the language to kits of building blocks from which to construct simple letter models. Returning to the very starting point of this thread - the huge challenge of a "National Information Exchange Model" and NIEM's response to it demonstrate the fundamental character which restriction has in information modelling. And NIEM gives answers which should be interesting to any information modeller. The punchline: let us not only think about xs:restriction, but about restriction.

Webb Roberts <webb@webbroberts.com> wrote on Tuesday, 26 September 2017, at 22:50:

Hans-Jürgen, thank you for your kind comments.

On 2017-09-25, at 18:07:04, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:

is NIEM's being based on RDF a concept which had been present from the very beginning of NIEM, or had it been gradually discovered as a possibility?

The alignment between NIEM and RDF was part of the approach (in GJXDM) from the very beginning. A goal has always been to make NIEM a simple, schema-friendly RDFish representation. We've received push-back when we presented RDF concepts to people who just wanted plain XML, but elements as properties and types as classes was the first thing we had on the whiteboard when we first started working Justice XML data representations.

This has come in handy with our JSON work, since the JSON-LD representation of NIEM data basically just falls out of our NIEM RDF mapping.
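To illustrate the point about JSON-LD, here is a minimal sketch in Python. It is not NIEM's actual mapping: the namespace URI, the sample values, and the helper `expand_key` are assumptions made for illustration. It shows only that a consumer can read the data as plain JSON, while the `@context` lets a prefixed key be expanded to a full IRI in the RDF sense.

```python
import json

# Assumption: illustrative namespace URI; substitute the one defined by the
# NIEM release actually in use.
NC = "http://release.niem.gov/niem/niem-core/4.0/#"

# Made-up sample data, shaped like the nc:Person example in this thread.
person = {
    "@context": {"nc": NC},
    "nc:Person": {
        "nc:PersonBirthDate": {"nc:Date": "1970-01-01"},
        "nc:PersonName": {"nc:PersonFullName": "Jane Doe"},
    },
}


def expand_key(key, context):
    """Expand a prefixed key such as 'nc:Person' to a full IRI (hypothetical helper)."""
    prefix, sep, local = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return key  # keywords such as '@context' and unprefixed keys stay as-is


# Plain-JSON view: round-trip through the json module and read a value directly.
plain = json.loads(json.dumps(person))
full_name = plain["nc:Person"]["nc:PersonName"]["nc:PersonFullName"]

# RDF-ish view: the same key interpreted as a property IRI.
person_iri = expand_key("nc:Person", person["@context"])
```

A full JSON-LD processor does far more than this prefix expansion. The sketch is only meant to show why the easy cases are unaffected.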

do you think that the abstract model of NIEM (RDF, core schemas, subset schemas, extension schemas) would be equally appropriate for the creation of enterprise data models?

We've found that getting data working at enterprise scale requires getting the governance right, and that NIEM's hierarchy (core, domain, enterprise, exchange) has been a good way to break down governance across lots of parties. We've seen all-singing-all-dancing enterprise models fail under their own weight, while an exchange-focused approach has worked pretty well for organizations where everyone doesn't march to a single drummer. There are gaps between the exchange model and implementations and data storage, but the NIEM approach has worked well as a basis for enterprise-scale data.

do you think that the new Shapes Constraint Language (SHACL) [1] might become an important component in the NIEM ecosystem, perhaps stabilizing the bridge between XML and RDF views?

I hope so. We've always found a bit of a shortcoming between the commercial IT developer community (was mostly XML, now mostly JSON) and the Semantic Web/RDF/OWL toolset. As much as we've wanted to leverage ontologies explicitly for NIEM, we've always found an audience that was unprepared to actually field RDF-based technologies. We've produced RDF-based artifacts on occasion, but they always ended up being shelf-ware.

That said, I feel that JSON-LD has really threaded the needle between JSON and RDF, turning something that's fairly underspecified into something incredibly useful. We're getting some push-back from people who are skittish about using JSON-LD, even though it has zero effects on easy cases, and provides some incredibly useful capabilities. I'm hoping that our use of JSON-LD gets NIEM into the RDF world in a big way.

do you think it would be worthwhile to investigate the relationship between NIEM's approach to restriction and Facebook's GraphQL language [2]?

Absolutely yes, but we haven't yet gotten enough exposure to the GraphQL approach to understand how they fit together.

Very respectfully,

Webb Roberts

Georgia Tech Research Institute

On 2017-09-25, at 18:07:04, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:

My cordial thanks for this excellent summary of the NIEM approach, Webb. I had read parts of it before, but your description pulled everything together so as to give it a coherence of concept which I had not been aware of, and which impresses me deeply. Would you like to answer some of the following questions?

First, is NIEM's being based on RDF a concept which had been present from the very beginning of NIEM, or had it been gradually discovered as a possibility?

Second, do you think that the abstract model of NIEM (RDF, core schemas, subset schemas, extension schemas) would be equally appropriate for the creation of enterprise data models?

Third, do you think that the new Shapes Constraint Language (SHACL) [1] might become an important component in the NIEM ecosystem, perhaps stabilizing the bridge between XML and RDF views?

Finally, do you think it would be worthwhile to investigate the relationship between NIEM's approach to restriction and Facebook's GraphQL language [2]?

With kind regards,
Hans-Jürgen

[1] https://www.w3.org/TR/shacl/
[2] http://facebook.github.io/graphql/

Webb Roberts <webb@webbroberts.com> wrote on Monday, 25 September 2017, at 19:50:

On 2017-09-25, at 08:47:59, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:
how does NIEM treat this question, where restriction of extreme generality should be extremely important?

The National Information Exchange Model (NIEM) provides a set of XML Schema components that can be reused to build concrete exchanges. NIEM has defined data components that represent many common objects needed in exchanges by participating organizations. NIEM started as GJXDM, based on US state, local, and tribal participants, and evolved to also include federal participants, primarily US DOJ, DHS, and DOD, with some international participation.

NIEM data components are principally defined by a set of XML Schema documents, and are broken up into namespaces organized by governance. There's a technical level (structures and other utility schemas), then the NIEM core, governed by a cross-government group, and a set of domains that focus on topic areas, like trade, immigration, justice, and military operations. NIEM is based in the RDF model; we're using JSON-LD for NIEM's JSON approach.

NIEM defines reference schemas that don't use restriction. Anyone who develops an exchange based on NIEM schemas is encouraged to build a subset of the NIEM reference schemas. NIEM provides a tool, the NIEM Subset Schema Generation Tool, that helps a user pick types and elements they're interested in, to generate a subset schema. The SSGT writes and reads a file, called a "wantlist" that identifies what pieces of the reference schemas need to be included. A resulting subset will have the data definitions listed in the wantlist, along with the definitions those definitions require, like base types, types of elements, and elements for substitution groups. Contents of a wantlist look like:

<w:Element w:name="nc:Person" w:isReference="false" w:nillable="true"/>
<w:Type w:name="nc:PersonType" w:isRequested="false">
  <w:ElementInType w:name="nc:PersonBirthDate" w:isReference="false" w:minOccurs="1" w:maxOccurs="1"/>
  <w:ElementInType w:name="nc:PersonName" w:isReference="false" w:minOccurs="1" w:maxOccurs="1"/>
</w:Type>
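As a rough illustration of how such a fragment could be read by a script, here is a minimal Python sketch using only the standard library. The namespace URI `urn:example:wantlist` and the `w:WantList` wrapper element are placeholders assumed here so that the fragment parses. The function `read_wantlist` is hypothetical and is not part of the SSGT.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI and wrapper element; a real wantlist
# declares its own namespace and root.
W = "urn:example:wantlist"

WANTLIST = f"""
<w:WantList xmlns:w="{W}">
  <w:Element w:name="nc:Person" w:isReference="false" w:nillable="true"/>
  <w:Type w:name="nc:PersonType" w:isRequested="false">
    <w:ElementInType w:name="nc:PersonBirthDate" w:isReference="false" w:minOccurs="1" w:maxOccurs="1"/>
    <w:ElementInType w:name="nc:PersonName" w:isReference="false" w:minOccurs="1" w:maxOccurs="1"/>
  </w:Type>
</w:WantList>
"""


def q(local):
    """Build the namespace-qualified name ElementTree uses for tags and attributes."""
    return f"{{{W}}}{local}"


def read_wantlist(xml_text):
    """Return (requested element names, {type name: [(element, min, max), ...]})."""
    root = ET.fromstring(xml_text.strip())
    elements = [e.get(q("name")) for e in root.findall(q("Element"))]
    types = {}
    for t in root.findall(q("Type")):
        types[t.get(q("name"))] = [
            (c.get(q("name")), c.get(q("minOccurs")), c.get(q("maxOccurs")))
            for c in t.findall(q("ElementInType"))
        ]
    return elements, types


elements, types = read_wantlist(WANTLIST)
```

The attributes carry the `w:` prefix, so they are looked up with qualified names just like the element tags.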

The basic property that NIEM expects of a subset schema is that any instance that is valid against a subset schema must be valid against the base reference schemas. So if something is required in the base schema, it must be required in the subset. Most things are optional in the reference schemas, so a subset may constrain optional components to be required. It gives a lot of flexibility to the exchange developers as to exactly what they want in their exchanges. Subsets have worked well. People use wantlists to collaborate and save and upload their requirements, and the resulting subset schemas are pretty simple. You could do more in subset schemas than the SSGT does, but it seems to do enough.
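For occurrence constraints alone, the subset property can be sketched in Python as a range check. This is a simplification under stated assumptions: each element is modelled only by a `(minOccurs, maxOccurs)` pair, `None` stands for "unbounded", and an element left out of the subset is treated as `(0, 0)`. Types, substitution groups, and wildcards are ignored, and the function name `within` is hypothetical.

```python
def within(base, subset):
    """True if every occurrence count allowed by `subset` is also allowed by `base`.

    Each argument is a (min_occurs, max_occurs) pair; max_occurs None = unbounded.
    """
    bmin, bmax = base
    smin, smax = subset
    if smin < bmin:
        return False  # subset would allow fewer occurrences than the base requires
    if bmax is not None and (smax is None or smax > bmax):
        return False  # subset would allow more occurrences than the base permits
    return smax is None or smin <= smax  # the subset range itself must be sane


OMITTED = (0, 0)  # element dropped from the subset

optional_to_required = within((0, None), (1, 1))   # tightening optional to required
drop_optional = within((0, None), OMITTED)         # omitting an optional element
drop_required = within((1, 1), OMITTED)            # omitting a required element
widen_upper = within((0, 5), (0, None))            # loosening an upper bound
```

Under this model, the first two cases hold and the last two do not, which matches the rule that anything required in the base schema must stay required in the subset.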

Extension schemas are where exchange developers build on the reference schemas, via type extension, new substitutable elements, completely new types, etc. XML Schema restriction is allowed in extension schemas.

NIEM exchange developers are encouraged to make their exchanges precise via:

• Subset schemas that provide just the data definitions of interest to the exchange
• XML Schema restriction to make data definitions more precise
• Rules via Schematron to provide whatever else is needed

We hear from some people who don't like Schematron, and want to do everything via XSD validation, but it's a balancing act to reuse common data definitions while making everything super-precise. Adding a few Schematron rules can greatly simplify the schemas.

The NIEM Subset Schema Generation Tool is at https://tools.niem.gov/niemtools/ssgt/index.iepd
NIEM reference schemas are on GitHub: https://github.com/NIEM/NIEM-Releases, with additional info and tools listed at http://niem.github.io/niem-releases/.

Very respectfully,
Webb Roberts
Georgia Tech Research Institute

On 2017-09-25, at 08:47:59, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:

"The trouble with restriction is not knowing exactly what the differences are or why."

This is an interesting point. (And I've always avoided restriction of complex types, instinctively.)

Of course it would, in principle, be very easy to specify a restriction step explicitly, using a tiny vocabulary for specifying the removal of optional elements, other cardinality changes and whatever else is needed. Such a "restriction descriptor" might be the input (together with the original schema) for generating the restricted schema, as well as a new from-scratch schema expressing the restrictions. I wonder if there are any proven approaches to this which might be considered good practice? (And how does NIEM treat this question, where restriction of extreme generality should be extremely important?)
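To make the idea of a "restriction descriptor" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the content model is reduced to a mapping from element name to `(minOccurs, maxOccurs)` with `None` meaning unbounded, and the three-operation vocabulary (`remove`, `require`, `limit`) is invented here and is not an existing standard or tool.

```python
def apply_descriptor(model, descriptor):
    """Apply restriction steps to a content model and return the restricted model.

    model:      {element name: (min_occurs, max_occurs)}, max_occurs None = unbounded
    descriptor: list of ("remove", name), ("require", name) or ("limit", name, new_max)
    Raises ValueError for a step that would not be a true restriction.
    """
    restricted = dict(model)
    for op, name, *args in descriptor:
        if name not in restricted:
            raise KeyError(f"unknown element: {name}")
        lo, hi = restricted[name]
        if op == "remove":
            if lo > 0:
                raise ValueError(f"cannot remove required element: {name}")
            del restricted[name]
        elif op == "require":
            if hi is not None and hi < 1:
                raise ValueError(f"cannot require prohibited element: {name}")
            restricted[name] = (max(lo, 1), hi)
        elif op == "limit":
            new_hi = args[0]
            if new_hi < lo or (hi is not None and new_hi > hi):
                raise ValueError(f"limit {new_hi} is outside the base range of {name}")
            restricted[name] = (lo, new_hi)
        else:
            raise ValueError(f"unknown operation: {op}")
    return restricted


base_model = {"a": (0, None), "b": (1, 1), "c": (0, 1)}
steps = [("remove", "c"), ("require", "a"), ("limit", "a", 3)]
restricted_model = apply_descriptor(base_model, steps)
```

Because every step is explicit, the descriptor itself documents exactly what differs from the base, and the original model is left unchanged.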

With kind regards,
Hans-Jürgen

Rick Jelliffe <rjelliffe@allette.com.au> wrote on Monday, 25 September 2017, at 10:01:

You are forced to use a standard kitchen-sink schema because it is standard and therefore will make life easier.

However, most of the elements and attributes are things you don't need. And you know the full schema will blow out implementation and confuse testing, and anyway YAGNI.

So you will make a profile (subset) of it using restriction and distribute that.

But then your schema documents may be bloated. It may be simpler just to have a parallel validation which just checks that only wanted element names are used, using any schema language.

I.e. in all elements a,b,c,d,e,f,g,... only child elements a,b,c,d,e,f,g,... can be used. So the big schema states all the rules. The small one excludes unwanted cases, making it really explicit. The trouble with restriction is not knowing exactly what the differences are or why.
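A parallel check of this kind can be sketched in a few lines of Python with the standard library. This is a minimal illustration, not a schema language: the function name `unwanted_elements` and the sample document are made up, and the check is a flat allow-list over element names, to be run alongside validation against the full schema.

```python
import xml.etree.ElementTree as ET


def unwanted_elements(xml_text, allowed):
    """Return the element names used in the document that are not in `allowed`.

    Names are reported once each, in document order. Namespaced elements appear
    in ElementTree's '{namespace-uri}local' form, so `allowed` must use that form too.
    """
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():  # the root element and all of its descendants
        if el.tag not in allowed and el.tag not in found:
            found.append(el.tag)
    return found


sample = "<a><b/><x><c/></x><x/></a>"
extras = unwanted_elements(sample, {"a", "b", "c"})
```

An empty result means the document stays inside the profile. The full schema still has to state all the other rules.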

Regards
Rick

On 25 Sep 2017 5:27 PM, "Mukul Gandhi" <gandhi.mukul@gmail.com> wrote:
Hello list,
Can anyone come up with a useful business use case, to use XML Schema complex type restriction?

--
Regards,
Mukul Gandhi

Very respectfully,
Webb

--
Webb Roberts <webb.roberts@gtri.gatech.edu>
Senior Research Scientist, Georgia Tech Research Institute
office/mobile: (404)407-6181