Re: [xml-dev] KML is very extensible ... but why?

On Mon, 23 Apr 2018, 10:57 Simon St.Laurent <simonstl@simonstl.com> wrote:

On 4/22/2018 8:02 PM, Patrick Durusau wrote:
> Simon,
>
> I shudder at "...it's just an extraction problem...."
You're supposed to. That shuddering can be the first step away from years of thinking that we work with solid stuff rather than liquids or gases or plasma.
> Switching from one ontology to another must just be a mapping problem. ;-)
It has similar problems, both technical and cultural.
> If those are both "...just..." type problems, why do you think data
> scientists keep talking about transformation of data being 80% of what
> they do?
In my experience, data scientists talk about data clean-up as 80% of what they do. I suppose you could count that as transformation, but it includes everything from badly structured to badly entered to outright corrupted data. I haven't ever heard "the XML Schema structured things exactly as we wanted them" from... anyone who hadn't just created that schema.

I don't think that "oh my god someone let this schema be extended" is likely a problem for data scientists unless they wish that more of their data had used those extensions. Granted, it's possible to create extensions that duplicate data elsewhere in the schema, and have semi-duplicate data that doesn't match (I have seen it!), but again, that's not unusual cleanup.
> Transformation requires an understanding of both the target and source
> formats. Or should I say understanding the semantics of both formats?
> Sure, if it's well-formed XML, all manner of things are fairly trivial,
> if you just knew which ones to do.
In the case of schema extensions, you can:
(a) ignore the content because it isn't relevant to you
(b) ask for help
(c) study use context, including other people's transformations
(d) guess

Most of the time, (a) or (b) takes care of it. (c) has not been difficult in my (admittedly distant) experience.
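
For what it's worth, option (a) can be sketched in a few lines. This is only an illustration, assuming the extensions live in their own namespace: the namespace URIs, element names, and the `split_known` helper are all made up for the example, not taken from KML or any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the "core" vocabulary we care about.
CORE_NS = "http://example.com/core"

# Hypothetical document: one place carries a vendor extension element.
DOC = """\
<places xmlns="http://example.com/core" xmlns:ext="http://example.com/vendor-ext">
  <place>
    <name>Alpha</name>
    <ext:mood>gray</ext:mood>
  </place>
  <place>
    <name>Beta</name>
  </place>
</places>
"""

def split_known(root, known_ns=CORE_NS):
    """Return (tags in the known namespace, tags outside it)."""
    prefix = "{" + known_ns + "}"
    known, skipped = [], []
    for elem in root.iter():
        (known if elem.tag.startswith(prefix) else skipped).append(elem.tag)
    return known, skipped

root = ET.fromstring(DOC)

# Read only what we understand; extension elements are simply never asked for.
names = [n.text for n in root.iter("{" + CORE_NS + "}name")]

# Keep a record of what was skipped, in case it matters later.
known, skipped = split_known(root)

print(names)
print(skipped)
```

The core data comes through untouched by the extension, and the list of skipped tags is cheap to keep around for anyone who later wants to ask (b) or try (c).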

There are also times - as Walter Perry has enjoyed reminding us over the years - where we're more interested in where people jumped the structures than we are in cases where they followed the rules. Open schema models make it vastly easier to detect those changes.
> In the interest of disclosure, I have seen any number of academic
> projects that differ from other projects because they have special need
> #1 or #2 or .... To be honest, not really. They typically are encoding
> their texts to be different so it works with their tool set (which they
> developed), etc. That may not be everyone's experience but it certainly
> is mine in the humanities.
My experience is that everyone has something they want to do that goes beyond the available tools. Either they shut up about it and forget what they wanted, or they find a corner to allow it. As much as I hate divs and spans in HTML, I know very, very well why people use them.
> I'm not claiming my experience is universal and others may have
> different stories to report.
Your experience is at least conventional. I just find those conventions to be the wrong set.
> There certainly are other ways to create vendor lock-in, such as writing
> your own database software. (Or HR software, I understand the Pentagon
> has some 6,000 such systems.)
It doesn't even take that. When XML was new, a lot of the excitement around it came from businesses trying to get a leg up on their competition by being ahead on the standards process. There are still players in that game, but the stickiness of relationships, the complexity of creating interfaces, the challenges of backwards compatibility, and the power of brand loyalty seem vastly more powerful.
> You may be right: whether encouraged or not, bad behavior (lack of
> interoperability) will occur. Still, the lack of same creates a lot of
> wasted time and effort.
I no longer consider vocabulary interoperability good behavior. I haven't for a long time. I think syntactic interop has much greater value, making it much easier to share tools. Those tool ecosystems are finally reaching the point where we can flexibly create and exchange information without having to stay in lockstep.

Way back in 2003, I gave a rant at Extreme Markup Languages on these issues, accompanied by Playmobil figures and a bit of Strauss. I took a wrong turn in diving too deep on the value of specific syntactic details, but I'm quite content in the overall point that shared syntax is a blessing and shared semantics a curse.

XML was created in part as a reaction to HTML's fixed vocabulary. I'm puzzled that HTML today seems to be grasping the need for flexibility far better than the XML world - the dreaded div, span, and class, JSON as needed, plus the slow refactoring of those pieces into web components. We seem, though, to finally be reaching the point where we can usefully exchange information and even interfaces for working with information without endless negotiation over what the structure must look like.
> Enjoy the sunshine, endure the jetlag!
Thank you! I hope your gardening is going well!

Simon