RE: Advantages/disadvantages of extremely simple XML designs

Outstanding! Thank you John, Liam, and Dimitre. Awesome information.

I’d like to keep the information together, so I collected all the responses below.

---------------------------------------

I think that for such a document design, you should skip the XML Schema (or any other grammar schema) altogether, and just stick with Schematron.

Liam Quin

> <properties>
>     <entry key="A">A content</entry>
>     <entry key="B">B content</entry>
>     <entry key="C">C content</entry>
>     <entry key="D">D content</entry>
>     ...
> </properties>

This was for years a common XML (and SGML) half-anti-pattern. If you know most of the key values and they are safe to use as names, use element names instead, e.g., <A>A content</A> rather than <entry key="A">A content</entry>.

That way you can validate them, authoring software can help suggest them, and it's no harder to deal with in an application. There can still be "entry" elements for new/unknown values.

But if the values aren't known in advance, then yes, you want XSD 1.1 and/or Schematron.

You could use only Schematron, but if you want type assignment - e.g. it's going into a database to be queried with XPath 2 or later or with XQuery - then XSD is an obvious win.

Schematron is quite widely supported, by the way, including as assertions inside an XML Schema. Support for XSD 1.1 is more limited but still not that uncommon in some environments.

What's most important is that the design fits with how the people who work with the information think about it. You could use XSLT to transform it into something you can validate, for example, probably efficiently.
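As a rough illustration of the transformation idea above, here is a minimal sketch that promotes known keys to element names and leaves everything else as generic "entry" elements. It uses Python's standard ElementTree rather than XSLT, and the KNOWN_KEYS set, the NAME_RE check, and the promote_known_keys function are hypothetical names chosen for this example, not anything NiFi provides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical set of keys that are known in advance (an assumption).
KNOWN_KEYS = {"A", "B", "C"}

# Conservative, ASCII-only approximation of a safe XML element name.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def promote_known_keys(properties: ET.Element) -> ET.Element:
    """Return a new <properties> tree with known keys used as element names."""
    result = ET.Element(properties.tag)
    for entry in properties.findall("entry"):
        key = entry.get("key", "")
        if key in KNOWN_KEYS and NAME_RE.fullmatch(key):
            # Known, name-safe key: becomes its own element.
            child = ET.SubElement(result, key)
        else:
            # New, unknown, or unsafe key: stays a generic <entry>.
            child = ET.SubElement(result, "entry", {"key": key})
        child.text = entry.text
    return result


source = ET.fromstring(
    "<properties>"
    '<entry key="A">A content</entry>'
    '<entry key="B">B content</entry>'
    '<entry key="new key">something else</entry>'
    "</properties>"
)
converted = promote_known_keys(source)
print(ET.tostring(converted, encoding="unicode"))
```

The converted form can then be validated with a grammar-based schema for the named elements, while the remaining "entry" elements are still available for values nobody anticipated.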

Dimitre Novatchev

> <properties>
>     <entry key="A">A content</entry>
>     <entry key="B">B content</entry>
>     <entry key="C">C content</entry>
>     <entry key="D">D content</entry>
>     ...
> </properties>

The above is equivalent to using a (typeless) hash table (key => object), not a typed Dictionary<key, value>, so one can convert the discussion to one of comparing strongly-typed versus weakly-typed programming.

In a hash-table one can put any type of object, even such objects whose type and existence were not anticipated at design time. This gives the feeling of freedom, flexibility, and extensibility.

This can be useful for a simple and universal store that doesn't have to know anything about the stored data -- a useful functionality to have in some cases. Also, if we are in the middle of a project and we encounter new types of objects that we don't understand completely, we can "temporarily" store them in a hash table and delay dealing with them until we need to and know more about these objects.

Also, an advantage is the fact that having no constraints makes everything easily/immediately storable and gives us the ability to interpret the same object in more than one way. So, we can change our code that gives type and meaning to such an object, without having to change its storage mechanism, thus achieving more flexibility.

One disadvantage is the effort and additional (run) time to load/convert/deserialize such anonymously-typed objects into specifically typed ones -- or in other words, to implement dynamic typing.

Another disadvantage is that the code that deals with such anonymous objects will have low readability/understandability/maintainability. Such representation should be used only at a very low level of a layered system.

Still another disadvantage is that much less optimization and compile-time error-raising is possible with dynamic typing. So, one must expect to have more run-time errors... and find oneself repeatedly dealing with these errors. Also, expect the use of anonymously-typed objects to consume significantly more processing time than if they were strongly-typed.
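To make the conversion cost and the run-time errors concrete, here is a small sketch in Python. The FileMetadata class, its field names, and the to_typed function are hypothetical, invented only for this illustration; the point is that the untyped store accepts anything, and problems only show up when the values are converted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileMetadata:
    """Hypothetical typed view of the metadata; field names are assumptions."""
    filename: str
    size_bytes: int


def to_typed(raw: Dict[str, str]) -> FileMetadata:
    """Convert the untyped key/value store into a typed object.

    Missing keys or malformed values only surface here, at run time.
    """
    return FileMetadata(
        filename=raw["filename"],           # KeyError if the entry is absent
        size_bytes=int(raw["size_bytes"]),  # ValueError if not an integer
    )


# Untyped store: anything can go in, including entries nobody anticipated.
raw = {"filename": "report.xml", "size_bytes": "2048", "unexpected": "kept as-is"}
typed = to_typed(raw)
print(typed)

try:
    to_typed({"filename": "broken.xml", "size_bytes": "large"})
except ValueError as err:
    print("conversion failed at run time:", err)
```

Every consumer of the untyped store has to repeat (or share) this kind of conversion code, which is the extra effort and run-time cost described above.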

Personally, weak typing is too universal for my taste -- I prefer to call the object types with their most appropriate names and in their natural problem domain / context.

From: Costello, Roger L.
Sent: Saturday, August 11, 2018 10:35 AM
To: xml-dev@lists.xml.org
Subject: Advantages/disadvantages of extremely simple XML designs

Hi Folks,

There is an Apache technology called NiFi. It is used to distribute data. [1]

A feature of NiFi is that a NiFi application will automatically generate metadata about the data it ingests. The metadata is formatted as XML, which has this form:

<properties>
    <entry key="A">A content</entry>
    <entry key="B">B content</entry>
    <entry key="C">C content</entry>
    <entry key="D">D content</entry>
    ...
</properties>

Each item of metadata is represented with an <entry> element. The “key” attribute contains the name of the metadata item, and the value of the <entry> element contains the value of the metadata item.

What are the advantages and disadvantages of this type of XML design?

Advantages:

The markup is simple and regular: just a root element and one element that is repeated over and over. So, it’s easy to understand and use.
The design is highly extensible: to represent new metadata items, simply add more <entry> elements. No changes to the markup are needed (i.e., no change to the XML Schema).

Disadvantages:

There is likely to be a constraint (relation) between an <entry>’s @key value and the <entry>’s content:

Examples of co-constraints:

“If @key has the value A, then the content must be a string restricted to 20 characters and must have this pattern: regex”
“If @key has the value B, then the content must be a member of this enumeration list: foo, bar, blah”
And so on.
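The two example co-constraints can be sketched as plain checks on (key, content) pairs. This is only an illustration in Python, not Schematron or XSD 1.1; the A_PATTERN regular expression is a made-up placeholder for the unspecified "regex", and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Placeholder for the unspecified "regex" (an assumption, not from the thread).
A_PATTERN = re.compile(r"[A-Z][a-z ]*")
# Enumeration for key B, taken from the example above.
B_VALUES = {"foo", "bar", "blah"}


def check_entry(key: str, content: str) -> List[str]:
    """Return the co-constraint violations for one <entry>."""
    errors = []
    if key == "A":
        if len(content) > 20:
            errors.append("A: content longer than 20 characters")
        if not A_PATTERN.fullmatch(content):
            errors.append("A: content does not match the pattern")
    elif key == "B":
        if content not in B_VALUES:
            errors.append("B: content is not one of foo, bar, blah")
    # Keys without a rule are accepted as-is.
    return errors


def check_properties(xml_text: str) -> List[str]:
    """Apply check_entry to every <entry> child of the root element."""
    root = ET.fromstring(xml_text)
    errors = []
    for entry in root.findall("entry"):
        errors.extend(check_entry(entry.get("key", ""), entry.text or ""))
    return errors


doc = """<properties>
    <entry key="A">Short title</entry>
    <entry key="B">baz</entry>
</properties>"""
print(check_properties(doc))
```

Each rule depends on the value of @key, which is exactly the kind of relationship a grammar-based schema without assertions cannot state.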

XML Schema 1.0 doesn’t support co-constraints. [2] So, you need to supplement XSD with Schematron. In some environments, it is simply not feasible to perform both XML Schema validation and Schematron validation.

Consequently, co-constraints go unexpressed. The design hampers the ability to use XML Schema’s powerful constraint mechanisms. Thus, the design hampers the ability to perform powerful error-checking.

Do you agree with these advantages and disadvantages? Are there other advantages and disadvantages?

/Roger

[1] https://nifi.apache.org/

[2] Yes, I know that XSD 1.1 supports co-constraints, but (I assert, without evidence) there is less support for XSD 1.1 than for XSD 1.0 or for Schematron.