Alaric Snell wrote:
> On Monday 28 October 2002 4:12 pm, Paul Prescod wrote:
>>>Because XML has a fragile data model, designed for publishing stuff to a
>>>browser rather than transfer between applications?
>>XML is based on SGML which was invented long before browsers as we know
> Yep, but that's orthogonal to my point; XML is based on ASCII, which is based
> on binary, which has been around since (potentially) ancient times in China
> (there being some evidence of binary arithmetic in certain ancient Chinese
Argh. The XML _data model_ is not, as far as I know, much different than
the implicit SGML "data model" which long predated browsers. If you
believe that XML has a much "weaker" data model than SGML's (and not
groves, which post-date SGML), then please present evidence. Otherwise
let's leave aside the fact that XML was designed (in part) for browsers.
> XML was designed, sometime in the late 90s; quite incidentally they decided
> to subset SGML because it seemed a good base for solving the problem at hand
> (and scarily enough, I agree with them! XML's great as an SGML-lite :-)
There was nothing incidental about it.
> But for tables, it's clunky because you're having lots of nodes with the same
> structure under a parent node. Repeating that structure gets laborious to
> type if you're a human and is laborious to process if you're a computer. For
> a table of tuples, it's much easier for all parties to deal with:
> email@example.com,Alaric Snell
> firstname.lastname@example.org,Paul Prescod
> email@example.com,"Comma Containing, Mrs"
No, actually, I don't see it as being easier for a computer to keep a
stateful mapping of columns to column titles. And although the
compressed form is more convenient for a TYPIST, it is actually much
easier to read the XML spelling out the attribute names. If you are
adding only one entry (as opposed to typing thousands), I also prefer
the XML because you can't forget whether the "0" is the UID or the GID
(to use an /etc/passwd example).
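
To make the "stateful mapping" point concrete, here is a minimal Python sketch. The record, the element and attribute names, and the helper functions are all made up for illustration; nothing here is a real /etc/passwd format. The CSV reader has to be handed the column order out of band, while the XML record carries its own names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical passwd-style record; the values are invented for illustration.
CSV_TEXT = "alaric,0,100\n"
XML_TEXT = '<user name="alaric" uid="0" gid="100"/>'

# The CSV consumer must carry this column mapping separately from the data.
COLUMNS = ["name", "uid", "gid"]


def read_csv_record(text, columns):
    """Pair each positional CSV field with a column title supplied by the caller."""
    row = next(csv.reader(io.StringIO(text)))
    return dict(zip(columns, row))


def read_xml_record(text):
    """The attribute names travel with the data, so no external mapping is needed."""
    return dict(ET.fromstring(text).attrib)


if __name__ == "__main__":
    print(read_csv_record(CSV_TEXT, COLUMNS))
    print(read_xml_record(XML_TEXT))
    # Getting the column order wrong silently swaps UID and GID.
    print(read_csv_record(CSV_TEXT, ["name", "gid", "uid"]))
```

With the right column list both readers produce the same dictionary; with the wrong one the CSV reader still succeeds and simply mislabels the "0".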
Nevertheless, no matter what you keep saying about the "bulk" of the
world's data, most of the data that is interchanged is NOT in a simple
> ...and of course for graphs you need a system of primary keys and pointers
> like id= - and you have to build your graph over a hierarchy which isn't
> always optimal; at worst you use the hierarchy to build a list of key-value
> pairs, then use the list of key-value pairs with 'reference' nodes in the
> values to build a graph :-/
At worst it isn't much worse than any other serialization of a graph.
Reading serialized graphs is always a pain.
> Or how about a multiple-parent hierarchy, hmm? A family tree?
> Not so bad in CSV:
> Alaric Snell,Karin Owens,Lionel Snell
> Karin Owens,Jean Byrne,James Owens
> Lionel Snell,moment of shame as I forget my grandparents' names...
<person name="Alaric Snell" mother="Karin Owens" father="Lionel Snell"/>
<person name="Karin Owens" mother="Jean Byrne" father="James Owens"/>
It isn't as if CSV handles this issue naturally and the XML is massively
> Now, of course, the counter argument is that you put up with XML being clunky
> at tuples or graphs because it's good at hierarchies and lists and bearable
> at tuples and graphs so you overall have something that can kind of manage
Any serialization of graphs is going to be clunky (XML is no worse than
CSV or s-expressions) and I personally have no problem with XML for tuples.
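
As a rough illustration of "no worse than CSV", the Python sketch below reads the family-tree rows and the two person elements quoted earlier into the same name-to-parents table and then walks it. The wrapping <people> element and all function names are assumptions added for the example; in both syntaxes the graph only appears once the reader resolves the names by hand.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = (
    "Alaric Snell,Karin Owens,Lionel Snell\n"
    "Karin Owens,Jean Byrne,James Owens\n"
)

# The <people> wrapper is an assumption; the thread only shows the <person> elements.
XML_TEXT = """<people>
<person name="Alaric Snell" mother="Karin Owens" father="Lionel Snell"/>
<person name="Karin Owens" mother="Jean Byrne" father="James Owens"/>
</people>"""


def parents_from_csv(text):
    """Map each name to (mother, father), relying on column position."""
    return {name: (mother, father)
            for name, mother, father in csv.reader(io.StringIO(text))}


def parents_from_xml(text):
    """Map each name to (mother, father), relying on attribute names."""
    return {p.get("name"): (p.get("mother"), p.get("father"))
            for p in ET.fromstring(text).iter("person")}


def ancestors(parents, name):
    """Follow the name references to collect every known ancestor."""
    found = set()
    stack = [name]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    assert parents_from_csv(CSV_TEXT) == parents_from_xml(XML_TEXT)
    print(sorted(ancestors(parents_from_xml(XML_TEXT), "Alaric Snell")))
```

Either way the serialization is just a list of records with names used as keys; the pointer-chasing lives in `ancestors`, not in the syntax.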
> [s-expression idea]
> And kapow! I had a slight modification of s-expression syntax being my text
> input format. For completeness, here's a plain list in s-expr:
Yes, we discussed all of this five years ago when we were defining XML
(and years before that, and every year since then)! If you don't mind
having two syntaxes, sure you can do better than XML for certain kinds
of data. You could even have three or four or five syntaxes for
different kinds of data. But XML demonstrably hits the 80/20 point such
that most developers think it is good enough for both data and documents.
> Now s-expressions look pretty hierarchical here, but that's just a shorthand;
> they're really full graphs. I can't remember the graph syntax since I rarely
> used it but basically you can name any node with a bit of attached metadata
> and then say "Insert a reference to node X here" where you need it to create
You aren't talking about s-expressions anymore. You're talking about a
syntax you invented. Yet Another Information Encoding. Why not go with
> Now, if the people who came up with XML as data had wandered across a nest of
> Lisp hackers instead of XML, might we not have seen something like my
> s-expression variant with the symmetric tags being produced, perhaps
> augmented by a syntax to embed nodes in text strings as I hacked together
> with a macro system, then namespaces for symbols defined, then a
> transformation language and a path language and so on defined?
Sure, if S-expressions had the features you propose (they don't) and a
schema language. I've been through this argument too many times:
> Yep. I just think we could have come up with a system that provides a better
> overall gain for humanity - and has fewer areas where it produces small
> benefit. The s-expression corresponding to a CSV file is little more complex
> than the CSV would be, so it's already less of a loss than using XML in that
You're presuming that there is no cost in having multiple syntaxes:
different ones for different domains. But clearly there is _some_ cost.
> XML's gained a following, but the hype is waning already. It's lodged in many
> niches but it's failed to change The Web as I see it, what it was originally
> designed for... although XML with embedded stylesheets being served over HTTP
> and displayed by browsers would have been a better Web the improvement is
> justifiably marginal compared to the costs to vested interests, so I'm not
> too surprised.
The Web as we see it has stagnated since Microsoft wiped out Netscape.
But all of the most innovative emerging technologies on that front are
XML-based: RSS, SVG, XUL, XForms. It is way too early to say that XML
didn't change the Web. Give it five years.
> Currently my home machine is mounting Sunsite over the Global Interweb (tm)
> with NFS, which uses ONC RPC underneath. This is cool because I can do:
> Now I don't think that would be *practical* with SOAP over HTTP! My ls
> request ran over UDP packets with sizes of 100-200 bytes - one packet to
> request, another packet to respond, no three-way handshakes or teardown and
> no TCP overhead wasting time reordering the response packets to make them
> come in the order the server sent
I guess the thousands of people using WebFolders are all hallucinating.
;) I would guess that there are more people using those every day than
NFS (but not as many as SMB -- yet).
> (and I can submit requests without needing
> to wait for the response to the last request, which I don't
> think HTTP 1.1 allows but I may be wrong).
That's what HTTP 1.1 pipelining is.
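
For anyone who has not seen pipelining on the wire, here is a small Python sketch using only the standard library. It starts a throwaway local server and sends three GET requests on one connection before reading any response. The handler class, the helper names, and the paths are all hypothetical; this is a toy under those assumptions, not how any particular client implements it.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Toy handler (hypothetical): replies with the request path as the body."""
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def pipelined_get(host, port, paths):
    """Send every request first, then read the responses back in order."""
    requests = b"".join(
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
        for path in paths
    )
    bodies = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(requests)  # nothing has been read from the server yet
        with sock.makefile("rb") as reader:
            for _ in paths:
                reader.readline()  # status line
                length = 0
                while True:
                    line = reader.readline()
                    if line in (b"\r\n", b""):
                        break  # blank line ends the headers
                    name, _, value = line.partition(b":")
                    if name.strip().lower() == b"content-length":
                        length = int(value.strip())
                bodies.append(reader.read(length).decode("ascii"))
    return bodies


def demo():
    """Run the toy server on a free local port and pipeline three requests."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return pipelined_get("127.0.0.1", server.server_address[1],
                             ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The responses still come back in request order over TCP, so this addresses the "submit without waiting" part of the complaint, not the part about UDP and packet reordering.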
> I guess that's part of what makes my blood boil about XML data and XML for
> interprocess communication and all that; the reinvention of wheels, with less
> care than the first time round :-(
If it had been done right the first time around there would be no
temptation to do it again. It surprises me that there are still people
who don't "get" that schemas are an important part of the difference.
Mixed content is another important part.
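
In case "mixed content" is unfamiliar: it means text and elements interleaved inside one parent, which a table of tuples has no natural way to express. A minimal Python sketch, with an invented sentence and hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# Invented sample: character data and <em> elements interleaved in one parent.
PARA = ET.fromstring(
    "<p>Schemas are <em>one</em> part; mixed content is <em>another</em>.</p>"
)


def flatten(element):
    """Return the text with all markup stripped."""
    return "".join(element.itertext())


def runs(element):
    """Return (tag, text) pairs in document order; tag is None for bare text."""
    out = []
    if element.text:
        out.append((None, element.text))
    for child in element:
        out.append((child.tag, "".join(child.itertext())))
        if child.tail:
            out.append((None, child.tail))
    return out


if __name__ == "__main__":
    print(flatten(PARA))
    for tag, text in runs(PARA):
        print(tag, repr(text))
```

The order of the runs is part of the data here, which is the property that row-and-column formats do not capture.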
Of course, I'm not defending SOAP and the idea of something like NFS
running on top of SOAP running on top of HTTP makes me cringe.