Al B. Snell wrote:
> On Fri, 25 Oct 2002, Paul Prescod wrote:
>>>><!ELEMENT purchaseOrder (buyer, seller, ...)>
>>>That agrees nothing!
>>It agrees that the buyer precedes the seller and both go within the
> It *states* that, doesn't mean anyone *agrees* :-)
> But it doesn't say whether the buyer or seller are denoted as URIs,
> numeric ids, or string names, or whatever...
A full Schema can certainly differentiate between URIs, numeric IDs and
strings. And even DTDs can do it with DT4DTD.
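To make the distinction concrete, here is a rough sketch of what "differentiating" the three kinds of value amounts to. It is a hand-rolled illustration in Python, not XML Schema or DT4DTD themselves; the function name `conforms` and the type labels "uri", "integer" and "string" are made up for the example, and the URI test is deliberately loose.

```python
import re
from urllib.parse import urlparse


def conforms(value, declared):
    """Return True if `value` matches the declared (hypothetical) type label."""
    if declared == "integer":
        # Optional sign followed by ASCII digits only.
        return re.fullmatch(r"[+-]?[0-9]+", value) is not None
    if declared == "uri":
        # Loose check: a scheme plus something after it. Not a full URI validator.
        parts = urlparse(value)
        return bool(parts.scheme) and bool(parts.netloc or parts.path)
    if declared == "string":
        # Any text is acceptable as a plain string.
        return True
    raise ValueError("unknown type label: " + declared)
```

With a declaration in place, a buyer written as "Acme Corp" is rejected where an integer id is required, which is exactly the sort of thing a bare content model cannot say.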
> ... the point I'm getting at is
> that you need more than just a DTD.
I've been around for a long time. I know all about it. ;)
> ... It's nice to have a standard way of
> writing parts of the standard, but you still need to write up a lot of
> other stuff about the meanings of things and so on.
So? The fact that there is a formalism for some of it is a clear
advantage of XML over formats that lack it. XML takes one of the tricky
parts of the problem of describing and implementing complex,
hierarchical, recursive data structures and makes it much easier.
>>So massive, in fact, that the XML project is within the means of the
>>average business programmer and the binary project is not.
> No way. Have you ever looked at a spec for a binary file format? Most of
> the ones I've dealt with have taken a few hours to bang out an
> implementation of (except TIFF; implementations of TIFF are never
For you and me, yes. For the average business programmer? I disagree.
We're talking about the kind of people who spend most of their day in
>>Have you ever tried to deploy a new Internet protocol?
> Actually, yes :-)
>>It is near impossible. That's why there are so few widely deployed ones.
> No it's not... I've got quite a few custom protocols I put together
> lurking around my systems.
If it is running on YOUR SYSTEMS then it isn't deployed in the sense I
mean. I'm asking whether you have ever tried to deploy a protocol that would
have dozens of independent implementations and thousands of users?
That's _really hard_ and many good protocols never make the leap.
> ...but I've also produced a few RPC protocols. Let's see if any of them
> are lying around... hmmm... not handy but take a look in
> /usr/include/rpcsvc on a Unix box for a few.
I turn off most of the RPC protocols on any boxes I maintain.
>>Insofar as people buy products in part for their performance, the answer
>>is definitely YES. In particular, I find it absurd that you would argue
>>that SAP and Quickbooks should use the same data structures despite the
>>fact that one runs relational-backed enterprises and the other
>>Windows-hosted small businesses. SAP _could not_ get away with using the
>>same data structures that QuickBooks does.
> Why not? QB could embed a small SQL server - you can get in-memory SQL
> servers - and use an identical table layout...
To me, that's just silly. I don't think it is fruitful to continue this
line of argument. Everyone else I've ever spoken to understands
implicitly that different consumers of information need to build
specialized data structures for representing that information in-memory.
You are welcome to believe that there is one true data structure for
each data type.
> ... if there are enough
> differences between the small business and large business *models* then
> you're comparing apples and oranges anyway.
Of course there are huge differences between the large business and
small business models. But they have to *interchange data*. They both
need to deal with purchase orders.
> But it's still a list of objects, perhaps with a semantic tree such as an
> object grouping / containment hierarchy and maybe with layers. Your in
> memory structure *has* to have that or else it's discarding information
> it'll need when it comes to saving the file again (dedicated readers that
> know they only need a subset of the information are a different matter,
> though). It may overlay that tree with a lookup index, but that tree will
> still be there...
There can be many lossless isomorphic data structures optimized for
different access modes. You yourself said that Quickbooks could move
from whatever format it uses today to a SQL/relational representation of
the same information. Presumably you didn't mean that they should lose
> 1) Vector graphics, although not my speciality
> What is the data model? Usually some kind of space partition tree with the
> leaf nodes being one of a set of primitives. The tree can either be
Okay, so you admit the possibility of _choice_.
> 2) Plain text
> The only model for this other than a list of characters that I've seen is
> a list of lines, each of which is a list of characters - and a slightly
There are also various paging and mapping strategies... Nevertheless,
you admit the possibility of choice.
> 3) Bitmapped images
> These always come down to 'some metadata' and 'a 2D array of pixel
> values'. The most variation is in the metadata; applications will tend to
> have a native model which is identical to the metadata system of their
> favourite format and map other formats in and out of that.
> 4) A table of information, SQL stylee
> There's a fair few indexing schemes that can be applied here, but it's
> still a table; an ordered multiset of tuples. (a pure relation in the
> mathematical sense is an unordered set of tuples).
"Table" is not a data structure. There are many implementations of tables.
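A minimal sketch of the point, assuming a made-up three-column purchase-order table (the rows, field names and helper functions below are all hypothetical): the same table can be held row by row, column by column, or behind a lookup index, and the conversions lose nothing.

```python
from collections import defaultdict

# Hypothetical purchase-order rows: (order_id, buyer, amount).
ROWS = [
    (1, "acme", 250),
    (2, "globex", 75),
    (3, "acme", 120),
]


def as_columns(rows):
    """Column-oriented layout: one list per field, cheap to scan a single field."""
    ids, buyers, amounts = zip(*rows) if rows else ((), (), ())
    return {"order_id": list(ids), "buyer": list(buyers), "amount": list(amounts)}


def from_columns(cols):
    """Rebuild the row-oriented layout from the column-oriented one."""
    return list(zip(cols["order_id"], cols["buyer"], cols["amount"]))


def index_by_buyer(rows):
    """Secondary index: buyer name -> positions of matching rows."""
    idx = defaultdict(list)
    for pos, row in enumerate(rows):
        idx[row[1]].append(pos)
    return dict(idx)
```

Which layout is right depends on the access pattern: summing one field favours columns, fetching whole orders favours rows, and looking up by buyer favours the index.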
>>In my experience it is painful and obfuscatory to use CSV for
>>hierarchical or linked information. But if you and your customers enjoy
>>it, then I'm glad you're using what works for you.
> Not that much information is hierarchical, certainly by bulk...
So you're saying that you have information that XML isn't really
designed for and XML isn't really that helpful for it. Are you arguing
just for the pleasure of it?
> We're going to add a more hierarchical structure in future (to allow some
> fields to contain lists and tables); the jury's still out on the details
> of that for interchange, but XML probably still won't be a great contender
> since we'd ideally not have to change EVERYTHING about the file format
> for a little thing like that. For now we'll probably go for something
And what about recursive hierarchies? You can keep hacking the CSV format
but eventually you'll reach a point of diminishing returns.
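A small sketch of what that hacking tends to look like, using an invented parts hierarchy (the `part` element, the `name` attribute and the id/parent_id column names are assumptions for illustration only). XML nests to any depth directly; a flat CSV has to encode the nesting as id and parent_id columns that every reader must then reassemble.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical nested document: a part containing parts, to any depth.
DOC = """<part name="bike">
  <part name="wheel">
    <part name="spoke"/>
    <part name="rim"/>
  </part>
  <part name="frame"/>
</part>"""


def flatten(elem, parent_id=None, rows=None):
    """Walk the tree depth-first and emit (id, parent_id, name) rows."""
    if rows is None:
        rows = []
    my_id = len(rows) + 1  # ids are assigned in document order
    rows.append((my_id, parent_id, elem.get("name")))
    for child in elem:
        flatten(child, my_id, rows)
    return rows


def to_csv(rows):
    """Serialise the flattened rows; the root's parent_id is left empty."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "parent_id", "name"])
    for row_id, parent_id, name in rows:
        writer.writerow([row_id, "" if parent_id is None else parent_id, name])
    return buf.getvalue()


ROWS = flatten(ET.fromstring(DOC))
```

The flat form works, but the hierarchy now lives in a convention outside the format, which is the diminishing-returns problem in miniature.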