On Tuesday 29 October 2002 16:20, Paul Prescod wrote:
> Alaric Snell wrote:
> > On Monday 28 October 2002 4:12 pm, Paul Prescod wrote:
> >...
> >
> >>>Because XML has a fragile data model, designed for publishing stuff to a
> >>>browser rather than transfer between applications?
> >>
> >>XML is based on SGML which was invented long before browsers as we know
> >>them.
> >
> > Yep, but that's orthogonal to my point; XML is based on ASCII, which is
> > based on binary, which has been around since (potentially) ancient times
> > in China (there being some evidence of binary arithmetic in certain
> > ancient Chinese symbolism).
>
> Argh. The XML _data model_ is not, as far as I know, much different than
> the implicit SGML "data model" which long predated browsers. If you
> believe that XML has a much "weaker" data model than SGML's (and not
> groves, which post-date SGML), then please present evidence. Otherwise
> let's leave aside the fact that XML was designed (in part) for browsers.
This isn't what I'm trying to say, though; the original question was why XML
doesn't have metadata to handle change control, i.e. metadata stating how
applications should deal with elements or attributes they don't recognise. My
point is that while the PNG group had seen this problem arise in things like
TIFF and decided to guard against it, the XML group were thinking much more
along the lines of "we publish a DTD; everyone uses the DTD; we only publish a
new DTD when we feel people won't mind the upheaval and reprogramming". The
PNG format is designed so that different versions of everything interoperate
as best they can, whereas I've seen XML and SGML vocabularies that use a
different DTD identifier for each version, precisely to keep the versions from
ever mixing!
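To make the PNG comparison concrete (my own illustration of their convention,
nothing to do with XML): the case of the first letter of a chunk name tells a
decoder whether an unrecognised chunk is safe to skip, so old readers degrade
gracefully when new chunk types appear. Roughly:

(define (chunk-critical? name)
  ;; PNG convention: an upper-case first letter ("IHDR") marks a critical
  ;; chunk; lower-case ("tEXt") marks an ancillary chunk that a decoder
  ;; which doesn't recognise it may safely ignore.
  (char-upper-case? (string-ref name 0)))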
> > But for tables, it's clunky because you're having lots of nodes with the
> > same structure under a parent node. Repeating that structure gets
> > laborious to type if you're a human and is laborious to process if you're
> > a computer. For a table of tuples, it's much easier for all parties to
> > deal with:
> >
> > email,name
> > alaric@alaric-snell.com,Alaric Snell
> > paul@prescod.net,Paul Prescod
> > foo@bar.com,"Comma Containing, Mrs"
> No, actually, I don't see it as being easier for a computer to keep a
> stateful mapping of columns to column titles.
Storing the list of column titles from the first line is easier than what,
exactly? The state management of an XML parser?
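Just to show how little state that is, a sketch of my own (made-up data, no
particular library): once you've read the header line, pairing it with each
subsequent row is a one-liner.

(define (row->alist header row)
  ;; Pair each column title from the first line with the matching field.
  (map cons header row))

(row->alist '("email" "name")
            '("alaric@alaric-snell.com" "Alaric Snell"))
;; => (("email" . "alaric@alaric-snell.com") ("name" . "Alaric Snell"))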
> And although the
> compressed form is more convenient for a TYPIST, it is actually much
> easier to read the XML spelling out the attribute names. If you are
> adding only one entry (as opposed to typing thousands), I also prefer
> the XML because you can't forget whether the "0" is the UID or the GID
> (to use an /etc/passwd example).
/etc/passwd doesn't *have* column headings though; it's a bad, non-extensible,
inflexible format :-)
> > ...and of course for graphs you need a system of primary keys and
> > pointers like id= - and you have to build your graph over a hierarchy
> > which isn't always optimal; at worst you use the hierarchy to build a
> > list of key-value pairs, then use the list of key-value pairs with
> > 'reference' nodes in the values to build a graph :-/
>
> At worst it isn't much worse than any other serialization of a graph.
> Reading serialized graphs is always a pain.
It's quite nice with the s-expr notation for it, though. See below.
> > Or how about a multiple-parent hierarchy, hmm? A family tree?
> >
> > Not so bad in CSV:
> >
> > name,mother,father
> > Alaric Snell,Karin Owens,Lionel Snell
> > Karin Owens,Jean Byrne,James Owens
> > Lionel Snell,moment of shame as I forget my grandparent's names...
> > ...
>
> <person name="Alaric Snell" mother="Karin Owens" father="Lionel Snell"/>
> <person name="Karin Owens" mother="Jean Byrne" father="James Owens"/>
> ...
>
> It isn't as if CSV handles this issue naturally and the XML is massively
> complex.
You're using a short, fat tree to store another tree; my point was that XML's
hierarchy is only one particular, limited kind of structure, beyond which you
have to fall back on the same old tricks.
> > Now, of course, the counter argument is that you put up with XML being
> > clunky at tuples or graphs because it's good at hierarchies and lists and
> > bearable at tuples and graphs so you overall have something that can
> > kind of manage everything...
>
> Any serialization of graphs is going to be clunky (XML is no worse than
> CSV or s-expressions) and I personally have no problem with XML for tuples.
s-exprs are nicer since they EXPLICITLY express the graph structure, so
general tools can deal with the graph. With XML there is no way to pull out
references other than parent/child relationships, except where people have
used XLink, which does not currently seem to be as ubiquitous as one would
like.
> >...
> > [s-expression idea]
> >
> > And kapow! I had a slight modification of s-expression syntax being my
> > text input format. For completeness, here's a plain list in s-expr:
> Yes, we discussed all of this five years ago when we were defining XML
> (and years before that, and every year since then)! If you don't mind
> having two syntaxes, sure you can do better than XML for certain kinds
> of data. You could even have three or four or five syntaxes for
> different kinds of data. But XML demonstrably hits the 80/20 point such
> that most developers think it is good enough for both data and documents.
How do you mean, multiple syntaxes? I presented a single syntax that handles
many cases neatly. My trick for embedding styling in s-expr strings was a
grotesque hack to make it look like HTML; a more elegant shorthand could be
devised, I'm sure.
> > ...
> > Now s-expressions look pretty hierarchical here, but that's just a
> > shorthand; they're really full graphs. I can't remember the graph syntax
> > since I rarely used it but basically you can name any node with a bit of
> > attached metadata and then say "Insert a reference to node X here" where
> > you need it to create links.
>
> You aren't talking about s-expressions anymore. You're talking about a
> syntax you invented.
No, I'm not. This is standard s-expr stuff from the Lisp days.
Here it is! It uses # symbols, not @s. My bad!
http://www.cs.rice.edu/CS/PLT/packages/doc/mzscheme/node153.htm
...ugh, they mandate numeric identifiers; I prefer my interpretation with
symbols after all :-)
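Using their #n= / #n# labels, the family-tree example comes out roughly like
this (the element names here are mine, purely for illustration):

(#0=(person (name "Karin Owens") (mother "Jean Byrne") (father "James Owens"))
 #1=(person (name "Lionel Snell"))
 (person (name "Alaric Snell") (mother #0#) (father #1#)))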
> >...
> >
> > Now, if the people who came up with XML as data had wandered across a
> > nest of Lisp hackers instead of XML, might we not have seen something
> > like my s-expression variant with the symmetric tags being produced,
> > perhaps augmented by a syntax to embed nodes in text strings as I hacked
> > together with a macro system, then namespaces for symbols defined, then a
> > transformation language and a path language and so on defined?
>
> Sure, if S-expressions had the features you propose (they don't)
They have the graph feature, and the other two are just syntactic sugar to
please people who don't like all the ()s.
> and a schema language.
We can *make* a schema language. I'm comparing XML with s-expressions here,
not "the world of XML" with bare s-expressions; if we did that, we'd get to
include Scheme as our transformation and schema language, no?
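To be concrete about Scheme-as-schema-language, here's the sort of thing I
mean, purely hypothetical: a "schema" is just a predicate over the data.

(define (valid-person? p)
  ;; Hypothetical rule: a person is a list tagged 'person that carries
  ;; at least a (name ...) entry.
  (and (pair? p)
       (eq? (car p) 'person)
       (pair? (assq 'name (cdr p)))))

(valid-person? '(person (name "Alaric Snell")))   ; => #t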
> I've been through this argument too many times:
>
> http://www.prescod.net/xml/sexprs.html
Ok, let's take a look.
1) Starting with Syntax
Very much in the eye of the beholder :-) I prefer s-expressions, you prefer
XML. Ok.
2) Redundancy is Good
I showed a trivial fix for this problem with my [foo[ ... ]foo] concept,
which I've implemented with success before, and would submit to the Scheme
standards bodies if I thought it mattered *that* much.
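For anyone who missed the earlier message: the idea is simply that the closing
bracket repeats the label, giving the same end-tag redundancy XML has.
Something like this, with a made-up element name:

[person[ (name "Alaric Snell") (email "alaric@alaric-snell.com") ]person]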
3) Family matters
Ahah! So you're comparing complex, limited, and inconsistent schemas,
infosets, XPath, XQuery, and XSLT against Scheme and (whisper, lest you waken
him!) his big brother Lisp? Muhahahah!
3a) Principle of Least Power
That misses the point, I suspect: it confuses Power with Complexity. The most
powerful languages can be the simplest. One can write a Scheme interpreter
with surprising ease, and the standard library that runs on top of it is
probably about the same size as an HTML DTD... OK, I haven't actually seen
byte counts for either, but I'm about equally worried about the complexity of
each. A Scheme runtime is going to be similar in complexity to an XSLT engine,
actually, and it'll be able to do more!
Put another way: power is the size of the set of things you can do with a
tool, and is clearly an attribute we want to increase. Complexity is a measure
of the cost of performing tasks with the tool (including the initial
learning), and is to be reduced. If you assume power and complexity are
inextricably linked, then indeed you need to find a happy medium: the least
complex system that has enough power to fulfill your needs. But Turing
completeness can be had from a mathematical system you can describe on the
back of an envelope, and such a system can be used to make a more expressive
schema notation than XSD (I'm thinking of a graph rewriting system here, in
which valid XML documents are the ones that can be rewritten down to a single
token, "VALID", or something like that).
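"Back of an envelope" in this sense, say; this is just my own toy
illustration, with nothing XSD-like about it: the two SK combinator rewrite
rules are already Turing complete, and a reducer for them fits in a dozen
lines of Scheme.

;; Terms are the symbols S and K, or two-element lists (f x) meaning
;; "apply f to x".  The entire rewrite system is:
;;   ((K a) b)     =>  a
;;   (((S a) b) c) =>  ((a c) (b c))
(define (reduce t)
  (if (symbol? t)
      t
      (let ((f (reduce (car t)))
            (x (cadr t)))
        (cond
          ((and (pair? f) (eq? (car f) 'K))
           (reduce (cadr f)))
          ((and (pair? f) (pair? (car f)) (eq? (car (car f)) 'S))
           (reduce (list (list (cadr (car f)) x)
                         (list (cadr f) x))))
          (else (list f x))))))

(reduce '(((S K) K) hello))   ; SKK is the identity, so this => hello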
> > Yep. I just think we could have come up with a system that provides a
> > better overall gain for humanity - and has less areas where it produces
> > small benefit. The s-expression corresponding to a CSV file is little
> > more complex than the CSV would be, so it's already less of a loss than
> > using XML in that situation.
>
> You're presuming that there is no cost in having multiple syntaxes:
> different ones for different domains. But clearly there is _some_ cost.
Not in this argument. I do happen to be a fan of multiple interoperable
syntaxes, but not of multiple models. I'd like my s-expression to be
expressible in memory as pointers to cons cells, in front of me as a nest of
brackets, and in a database as a compact bit string...
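The same datum in two of those guises, just to make the point concrete:

(a b)                       ; the nest-of-brackets view
(cons 'a (cons 'b '()))     ; the same value built as pointers to cons cells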
> > XML's gained a following, but the hype is waning already. It's lodged in
> > many niches but it's failed to change The Web as I see it, what it was
> > originally designed for... although XML with embedded stylesheets being
> > served over HTTP and displayed by browsers would have been a better Web
> > the improvement is justifiably marginal compared to the costs to vested
> > interests, so I'm not too surprised.
>
> The Web as we see it has stagnated since Microsoft wiped out Netscape.
> But all of the most innovative emerging technologies on that front are
> XML-based: RSS, SVG, XUL, XForms. It is way too early to say that XML
> didn't change the Web. Give it five years.
It hasn't changed it yet, though, and the momentum to do so seems to be
waning; for a while I kept hearing people ask "um, should we be using XML for
this stuff?", but now they're back to their HTML templates and (grumble) CGIs
with code and HTML entwined.
> > Now I don't think that would be *practical* with SOAP over HTTP! My ls
> > request ran over UDP packets with sizes of 100-200 bytes - one packet to
> > request, another packet to respond, no three-way handshakes or teardown
> > and no TCP overhead wasting time reordering the response packets to make
> > them come in the order the server sent
>
> I guess the thousands of people using WebFolders are all hallucinating.
> ;) I would guess that there are more people using those every day than
> NFS (but not as many as SMB -- yet).
That thing's just dynamic uploading and downloading of files via HTTP, isn't
it? When I run a program over NFS, it's fast because the file is demand-paged
over the network; it doesn't have to wait until it's downloaded the whole
thing before it can get started! Any other file that is read non-sequentially
or partially will be unwieldy too... and I suppose these Web Folders use HTTP
to fetch directory information as well (in case you hadn't guessed, I've not
had the privilege myself), so it'll hardly be nippy to navigate, will it?
They might be using range requests to speed up reading, and perhaps you can
PUT to a range in a resource and have the server understand you, but even
then you still have no standard way to represent chmod, chown, and chgrp or
ACL changes!
I don't think HTTP is a usable file sharing protocol. Just imagine mounting
/var over HTTP as the mail spool for a failed-over mail cluster :-) And how
practical is an ls -lR over these Web Folders, hmm? I suspect it's no more
than a nice user interface for downloading and uploading files rather than
something you'd actually want to use as a filesystem.
> > (and I can submit requests without needing
> > to wait for the response to the last request, which I don't
> > think HTTP 1.1 allows but I may be wrong).
>
> That's what HTTP 1.1 pipelining is.
I was wrong, then; good job I put that qualifier in :-) But how does it get
around the underlying problem in TCP? Does it open extra TCP connections to
handle parallel requests?
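As far as I can see, pipelining just means the client may fire several
requests down the one connection before reading any responses, something like:

GET /pub/a.txt HTTP/1.1
Host: example.org

GET /pub/b.txt HTTP/1.1
Host: example.org

(the host and paths are made up, obviously). If so, the responses still come
back over that single TCP stream in request order, which is exactly the
in-order-delivery cost I was grumbling about.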
> > I guess that's part of what makes my blood boil about XML data and XML
> > for interprocess communication and all that; the reinvention of wheels,
> > with less care than the first time round :-(
>
> If it had been done right the first time around there would be no
> temptation to do it again. It surprises me that there are still people
> who don't "get" that schemas are an important part of the difference.
> Mixed content is another important part.
Schemas are hardly new to XML. Nothing in XML is new; you can't say that this
feature or that feature is critical. All you can say is that this
*combination* of things is critical, and that the features we left out, which
other systems had, are either not useful because of X or would spoil the
ensemble because of Y.
> Of course, I'm not defending SOAP and the idea of something like NFS
> running on top of SOAP running on top of HTTP makes me cringe.
I know, don't worry, I don't doubt your sanity, just your reasoning :-)
>
> Paul Prescod
>
ABS
--
A city is like a large, complex, rabbit
- ARP