Re: [xml-dev] It's too late to improve XML ... lessons learned?

<snip>

XML has been very resistant to this kind of evolution. We've seen XML parsers that don't support some features in the standard (notably DTDs), but we've seen little tendency to support extensions (such as relaxing the rules on element and attribute names, or allowing the outermost element to be omitted). Perhaps this is because there are so many parsers in use and because they aren't easy to change. It certainly means that XML is a much stronger interoperability standard than some others; but it does lead to a certain amount of frustration because everyone can see that some of the rules (like disallowing nested comments) are just plain silly.

</snip>

Part of this lack of evolution came about because parsers are very much tied into the language frameworks in which they're written, and once a language framework establishes a default implementation, most developers in that language cease looking elsewhere for potential upgrades, preferring instead to work with obsolete packages. Xalan was the default Java implementation, and it never moved beyond XSLT 1.0. Switching to a modern XSLT implementation is a one-line code change, but the number of people who know to change that one line is disturbingly small.

I'd also contend that the outright hostility of the HTML5 group towards XML in any form in the browser, the most obvious point where evolution could occur, did not help matters much, to the extent that they bent over backward trying to come up with a non-namespace namespace solution for web components rather than admit that XML did get something right. It should be a fairly trivial thing to implement a streamable XML parser in node.js (a quick Google search turns up ten of them: https://openbase.com/categories/js/best-nodejs-xml-parser-libraries).

Personally, I think it may be time to take another stab at an XML 2.0 stack, with amendments for XPath, XQuery, and XSLT that are already written to acknowledge streaming, starting with acknowledging a "canon" JavaScript parser, then moving out to Python and then Java. JSON is beginning to creak under its own limitations, and a new generation of developers and data scientists may well be more amenable to something that captures the best of both. JSON is piss-poor at capturing narrative structures, XML's namespaces are unwieldy, streaming is a must, and identifiers ... (don't get me started on identifiers).

Kurt Cagle

Community/Managing Editor

Data Science Central, A TechTarget Property

kcagle@techtarget.com or kurt.cagle@gmail.com

443-837-8725

On Wed, Jan 5, 2022 at 3:36 AM Michael Kay <mike@saxonica.com> wrote:

>It seems that many Big Data systems read in a dialect of JSON, with one JSON "file" per line.

Indeed, that's a popular format, and it's an example of how standards can evolve through community practice. Perhaps it will become popular enough that someone writes an RFC for it and allocates it a media type.
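
For anyone who hasn't met the one-JSON-value-per-line convention, a minimal Python sketch of a reader might look like the following. The function name `read_json_lines` and the sample records are made up for illustration, and skipping blank lines is an assumption on my part rather than anything a specification requires.

```python
import io
import json
from typing import Any, Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line of the stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            # Assumption: tolerate blank lines rather than treat them as errors.
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was bad; the other lines remain usable.
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


# Hypothetical sample data: two records, one per line.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
records = list(read_json_lines(sample))
```

Because each line stands alone, a consumer can process records one at a time without holding the whole file in memory, which is presumably much of the appeal for Big Data systems.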

This kind of evolution has benefits and drawbacks. You often end up with a nice new capability that isn't universally supported (for example, use of non-Latin characters in email addresses), and it's a fine line whether the new capability is useful if not everyone supports it. At worst, you end up with a standard that's so woolly it's a nightmare (take CSV as an example).

XML has been very resistant to this kind of evolution. We've seen XML parsers that don't support some features in the standard (notably DTDs), but we've seen little tendency to support extensions (such as relaxing the rules on element and attribute names, or allowing the outermost element to be omitted). Perhaps this is because there are so many parsers in use and because they aren't easy to change. It certainly means that XML is a much stronger interoperability standard than some others; but it does lead to a certain amount of frustration because everyone can see that some of the rules (like disallowing nested comments) are just plain silly.
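
The strictness is easy to observe with an off-the-shelf parser. Here is a small sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are illustrative only, and other conforming parsers should behave the same way on these inputs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An ordinary document with a single comment is accepted.
ordinary = is_well_formed("<doc><!-- a comment --><p>text</p></doc>")

# A comment nested inside a comment puts "--" inside the outer comment,
# which the XML rules forbid, so the parser rejects it.
nested_comment = is_well_formed("<doc><!-- outer <!-- inner --> --></doc>")

# Two top-level elements with no outermost wrapper are also rejected.
no_outer_element = is_well_formed("<p>one</p><p>two</p>")
```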

Michael Kay
Saxonica