Alaric Snell wrote:
> I guess one thing that bugs me is that a schema might be used to test a bit
> of code that writes out documents but not one that reads them. Somebody might
> have added a new element and then forgotten to update the schema.
In the areas where XML particularly excels, these sentences wouldn't
really even make sense. How do you "add an element to SVG" or HRML or
RSS and "forget to update the schema"? I mean you could update the prose
specification and forget to update the schema but that will be caught by
implementors pretty quickly.
Schemas are most valuable when the specification process is completely
disjoint from the schedule of any particular implementations.
> ... At one
> point with the data import format somebody had even allowed arbitrary
> elements in a certain context - data fields for a record were done with
> <fieldname>value</fieldname>, and when we moved away from a fixed data
> structure to an editable one in the database you could have any field name
> cropping up there and the type of the content would have to match a type
> pulled from our database :-/
You're talking about totally different kinds of applications than the
ones I'm talking about.
> Hey, there's a point in my position that we don't need harsh separation
> between data interchange format descriptions and in-memory ones; why do we
> need a separate notation for each? It's just a data structure; you still end
> up declaring that certain things appear inside certain things and all that;
It's the "all that" that's the rub. XML serialization schemas are based
upon regular language theory which goes back to Chomsky's theories of
natural language. That makes a certain amount of sense because markup
languages are _languages_ and like speech, they are linearized.
Whereas in-memory data structures are very different. XML includes all
of the features that can reasonably be extracted from regular language
theory to give you power in describing your serialization. Some people
want and need that power. Some don't.
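To make the regular-language point concrete, here is a minimal sketch, not how any particular schema processor is actually implemented. It treats a DTD-style content model such as (title, para+, note?) as a regular expression over the ordered names of an element's children. The element names, SECTION_MODEL and the helper matches_content_model are all hypothetical, invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, DTD-style: (title, para+, note?)
# Encoded as a regex over space-terminated child element names.
SECTION_MODEL = re.compile(r"(title )(para )+(note )?")

def matches_content_model(element, model):
    """Return True if the ordered child tags of `element` match `model`."""
    # Linearize the children into a string of names, in document order.
    child_names = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_names) is not None

# Children appear in the order the model requires.
good = ET.fromstring("<section><title/><para/><para/><note/></section>")
# Same vocabulary, wrong order: para comes before title.
bad = ET.fromstring("<section><para/><title/></section>")

print(matches_content_model(good, SECTION_MODEL))
print(matches_content_model(bad, SECTION_MODEL))
```

The check depends on the order in which the children are serialized, which is exactly the kind of constraint an in-memory record or struct declaration has no way to express.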
> even if you decide you want different formats internally and externally in a
> given situation, it would be nice to have the raw input data coming in as
> something compatible with what you process internally, for the simple reason
> that your transformation probably doesn't want to rewrite EVERYTHING.
In the case where you want your in-memory and on-the-wire data
structures to be the same, there are tools that will convert between
them. That seems to me an appropriate way to handle a degenerate case.
You can also use something like XML-RPC or the SOAP encoding which or
> ...I'd rather write:
So use a tool that allows you to do that!
>>For you and me, yes. For the average business programmer? I disagree.
>>We're talking about the kind of people who spend most of their day in
> But THEY don't even want XML; they probably don't find wandering a DOM tree
> any more friendly than calling whatever passes for Perl's "pack" and "unpack"
> in VB. They are the people who want to just have magic serialisation from
> data structures to strings of bytes.
If they are web developers then they are VERY familiar with the DOM.
They eat DOMs for breakfast. Now they have DOMs not just for user
interface but for structured data. To many of them, that's a big
>>If it is running on YOUR SYSTEMS then it isn't deployed in the sense I
>>mean. I'm asking have you ever tried to deploy a protocol that would
>>have dozens of independent implementations and thousands of users?
>>That's _really hard_ and many good protocols never make the leap.
> That's purely a problem of adoption in the protocol marketplace, not
> difficulty of development.
I didn't say they were difficult to develop; I said they were difficult
to deploy. Part of that is because they often use idiosyncratic
syntaxes, operation names and addressing schemes. This makes
implementation more challenging.
> But I'm in a little WG right now developing a protocol to replace IMAP,
They are replacing IMAP? I'm still waiting for IMAP to replace POP. ;)
> Nope, because it's the same model still, just implemented differently. From a
> linked list of C structs to the result of an SQL "SELECT * FROM
> PurchaseOrderLines where poNumber = <foo>" isn't a change of data structure,
> just a change of implementation, and indeed in SQL interfaces I've written
> for suitable dynamic languages where I can throw together a 'struct' type at
> runtime from the result of an SQL query, the linked list and the result set
> both support an interface like Java's Iterators since they are the same data
You're using a definition of data structure that is totally foreign to me.
>>So you're saying that you have information that XML isn't really
>>designed for and XML isn't really that helpful for it. Are you arguing
>>just for the pleasure of it?
> No, I'm arguing against the fact that you're saying that XML is suitable for
> far more than I think it is!
> Just to reverse positions, I see XML as useful for marking up text... but
> it's not well fitted for data.
Text is data. If you mean "tuple-structured data" then say so. I'd say
that XML is good for all sorts of hierarchical, recursive data with links.
> <price currency="UKP" unit="kg">2.50</price>
> ...without enough stopping to think if it's a good idea.
You can't do real publishing without handling this case. This is the
core of what complicated technical publishing is all about. You're
describing a standard catalog! The same goes for links. Complex
technical publishing has to handle it.
> Whose idea *was* it to use XML for data interchange?
There was never a real boundary between documents and data. Therefore
nobody had the "idea". They just kept applying it to a wider and wider
range of problems. Arguably Microsoft _popularized_ the idea that XML
was even more relevant to data than to documents.
> ... The W3C seems to disavow
> responsibility in the first paragraph of that introduction. But somebody
> somewhere made a mental leap from "styling a human-readable document" to
> "data transfer". There are gray areas between the two, since an invoice might
> well be considered to need to be both a readable document and a piece of
> data, but nobody seems to be putting <?xml-stylesheet?> PIs in their XML
> purchase orders, do they?
No, because they are sending them over SOAP, which makes the PI kind of
useless (and in fact illegal). I certainly hope that as REST catches on,
stylesheet PIs in documents like purchase orders will become common.
Today, there are a variety of REST applications that allow you to view
"data" as rendered documents with stylesheets. Xoomle, Meerkat and the
Amazon API come to mind. In fact, the idea that data should be
straightforwardly renderable has almost 100% penetration in the REST
world and 0% penetration in the SOAP world.
I still see it as part of the promise of XML that invoices and other
structured data will be accessed through URIs, rendered through
stylesheets, displayed as documents and that machines will use those
same URIs to manipulate those same XML documents for automated processes.
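
As a rough sketch of that dual use, assuming a made-up invoice vocabulary: the document below carries an xml-stylesheet PI that a browser could use to render it, while a program reads the very same bytes as data. The stylesheet name "invoice.xsl", the element names and the helper invoice_total are hypothetical; only the price element follows the example quoted earlier in this thread.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical invoice; "invoice.xsl" and the element names are made up.
INVOICE_XML = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
<invoice number="1001">
  <line><item>Widget</item><price currency="UKP" unit="kg">2.50</price></line>
  <line><item>Gadget</item><price currency="UKP" unit="kg">4.00</price></line>
</invoice>
"""

def invoice_total(xml_text):
    """Sum every <price> in the invoice, ignoring how it would be rendered."""
    # ElementTree's default parser skips the stylesheet PI; a browser
    # fetching the same document would use it to pick a rendering.
    root = ET.fromstring(xml_text)
    return sum((Decimal(p.text) for p in root.iter("price")), Decimal("0"))

print(invoice_total(INVOICE_XML))
```

The automated process ignores the PI and the human reader never sees the markup, yet both start from one document at one URI.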