XML.org: OASIS Mailing List Archives

Re: [xml-dev] The Goals of XML at 25, and the one thing that XML now needs

XML's primary business these days is representing complex high-value high-structure documents, for example legislation and health records and humanities texts.  It’s really the only sensible choice for that kind of thing. 

When XML surfaced 25 years ago, it was the first OS-neutral, database-neutral, programming-language-neutral data format that anyone could use, because there was decent open-source software and everything more or less Just Worked. So it got used for everything. Maybe its most important accomplishment is proving that neutral data representations are practical and useful. Now we have lots!



On Sun, Jul 18, 2021 at 6:33 PM Rick Jelliffe <rjelliffe@allette.com.au> wrote:
Forget the use case of XML for ephemeral, computer generated exchange of database-style data. That has gone the way of the dodos with JSON. 

And forget the use case where humans never need or want to edit documents conveniently.
That was always a fantasy by people trying to sell products.

And forget the idea that every font has every glyph. The idea that we can always just read the direct character is still not feasible.

And named entities, especially for STEM documents, have the advantage of being consistent: searching in Unicode for the correct version of a specialist character is often difficult, because interfaces often want you to know the Unicode block first.

Think about it: there was tremendous favour for a simplified XML 20 years ago (including very respectable and thoughtful people like James Clark and IIRC Tim Bray): but industrial users of XML, the very ones who had initiated the development of XML, never got onside and so it flopped: I believe that the reason was that without public entities, it was not practical: the straw that broke the camel's back. 

Plus I think the case for concrete benefits of simplification was not made then, in which case "simplicity" looked like just a euphemism for "elegance", which has zero attraction (and warranted suspicion) for industrial adopters: it is a second-order requirement, not a primary requirement. Contrast that with a method that says "let's remove ONLY the minimum needed to allow the particular issue case of parsing speed to be given a head start": it avoids the bizarre methodology of "let's remove everything we can until we just have elements and attributes" (i.e. equivalent to JSON without numbers etc., just strings: useless, impractical, sweet-spot removing). We look at the concrete examples from academia and implementation experience and trials, and consider why they had to stop at some subset or another.

(Now, of course, I do have many other pet things I would have loved to see added or removed. In particular, I would like to remove the requirement that a document only has one top-level element, to allow progressive streams better: an implied top-level element if you prefer. And that does start with a technological use-case, not some speculative intuition about beauty. But it )

Cheers
Rick
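
A minimal sketch of the "implied top-level element" idea mentioned in the message above, offered as an illustration and not as anything proposed in the thread. It assumes Python's standard xml.etree.ElementTree; the helper name parse_fragment_stream and the wrapper name "stream" are hypothetical. A conforming parser rejects input with several top-level elements, so the sketch supplies the implied root itself before parsing:

```python
import xml.etree.ElementTree as ET


def parse_fragment_stream(text, wrapper="stream"):
    """Parse text holding several top-level elements.

    The text is wrapped in a synthetic root element (hypothetical name),
    which plays the role of the "implied top-level element".
    """
    root = ET.fromstring(f"<{wrapper}>{text}</{wrapper}>")
    return list(root)


fragments = "<rec id='1'/><rec id='2'/>"

# Parsed directly, two top-level elements are not well-formed XML.
try:
    ET.fromstring(fragments)
    rejected_without_root = False
except ET.ParseError:
    rejected_without_root = True

# With the synthetic root, each record is available as a child element.
ids = [rec.get("id") for rec in parse_fragment_stream(fragments)]
```

The wrapping trick only works when the input carries no XML declaration or DOCTYPE of its own, and it buffers the whole text, so it shows the data model and not a true progressive stream.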


 

On Mon, 19 Jul 2021, 3:56 am Marcus Reichardt, <u123724@gmail.com> wrote:
Hi Rick,

nice to hear you're doing well. I guess if even the UK (and Israel) see
the kind of infection rates they're seeing right now, we're in for
another lockdown season, no matter what politicians say now.

If I understand correctly, you're advocating for incorporating all
HTML/MathML entities into XML as predefined entities. But I don't
quite understand why your requirements wouldn't be met by using no
entities at all, relying entirely on encoded Unicode characters, with
the absence of entities besides the standard predefined entities (&lt;
etc) suitably represented by the absence of a DTD/entity declarations
for a parser to take a fast code path. There's also the issue of
Unicode gaining new code points all the time for things such as emojis
as they emerge, but also for scripts that only now get integrated into
Unicode.

Best,
M. Reichardt
sgmljs.net
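
A minimal sketch of the distinction the two messages above are arguing about, assuming Python's standard xml.etree.ElementTree (expat-based) and a document with no DTD; the helper name try_parse is hypothetical. The predefined entities and numeric character references parse without any declarations, while an HTML/MathML-style named entity such as &alpha; is rejected as undefined:

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the root element's text, or the string 'ParseError' on failure."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return "ParseError"


# Predefined entity: always available, no DTD needed.
predefined = try_parse("<m>a &lt; b</m>")

# Numeric character reference for GREEK SMALL LETTER ALPHA: also fine.
numeric = try_parse("<m>&#x3B1;</m>")

# Named entity without a declaration: not well-formed for this parser.
named = try_parse("<m>&alpha;</m>")
```

Here predefined and numeric come back as ordinary text, and named reports a parse error; making names like &alpha; usable today requires entity declarations, which is the cost both correspondents are weighing.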

On 7/18/21, Rick Jelliffe <rjelliffe@allette.com.au> wrote:
> While in lock-down, I took the time to write down a little post for
> Schematron.com called "The Goals of XML at 25: and the one change that XML
> really now needs
> <https://schematron.com/2021/07/the-goals-of-xml-at-25-and-the-one-change-that-xml-really-has-needed/>"
> which people interested in the past and future of XML may find familiar but
> not irrelevant.
>
> Key passage, or twist:
>
> "*For several decades I have dabbled with methods to speed up parsing UTF-8
> and XML using SIMD and parallel parsing: my conclusion is that the approach
> I am suggesting here is the only feasible way for XML to not be sidelined
> as slow and complex. I think the lack of papers and experience
> demonstrating otherwise indicates it too.)"*
>
> Regards
> Rick
>
> (Here in Sydney we are in lockdown again, after an exiled year of almost no
> cases, Delta broke through, and we are trying to eliminate it. Taiwan
> successfully eliminated it this month, so maybe we will: elimination is a
> feasible strategy on islands, rather than just suppression. I get my 2nd
> vaccine tomorrow.)
>



Copyright 1993-2007 XML.org. This site is hosted by OASIS