At 9:39 AM -0500 1/15/02, Gavin Thomas Nicol wrote:
>On Tuesday 15 January 2002 08:26 am, Elliotte Rusty Harold wrote:
>> At 11:15 PM -0500 1/14/02, Gavin Thomas Nicol wrote:
>> >What happens if I don't
>> > a) read english
>>
>> 1. Ask a colleague who reads English
>> 2. Hire somebody to translate it into the language of choice
>> 3. Get a dictionary
>
>Let's change it to Maori then. Do you have 1) or 2)? How much would 2)
>cost?
>
If it were important to me I could get 2, though I'd probably try 3 first.
>The point is that XML can be as opaque as anything else, and that
>tags, in and of themselves, say little about overall semantics, and
>hardly anything about structure beyond encoding an attributed tree.
>
No, that misses the point completely. The point is not whether XML *can*
be as opaque as anything else. It's whether XML *is* as opaque as
anything else. In practice, XML *is* far less opaque than CSV and
similar formats. That's why it's important. And in practice tag names
do say something significant about the semantics of the document.
It's not everything, but not everything does not equal nothing.
>An attributed tree is admittedly a useful data structure, but not
>without some means for interpreting it.... and in that regard, XML is
>no better, and perhaps somewhat worse than CSV... because the signal
>to noise ratio is higher *if* the names are not intuitive to the
>interpreting entity.
>
The names can always be ignored if you desire to do so; but if you
choose to consider them, they are there. There is more information in
an XML document with the names intact than in the same document with
all the names stripped.
Your signal-to-noise analogy is fallacious. One of the defining
characteristics of noise is that it cannot be perfectly separated
from the signal. In XML, tags are very straightforwardly separated
from the data.
>There *are* benefits to using XML well, and defining "largely
>interoperable" tag vocabularies (HTML). Those benefits spring not from
>XML, but rather, careful use thereof.
Careful use is good. But even careless use is likely to produce
significant benefits compared to untagged formats like CSV. For
example, here's some CSV data for you:
9964.00, 72.58, 0.73
How meaningful is that? Indeed it has some meaning. With a little
effort, a little foreknowledge in the right domain, and a little luck
you can probably figure out what it is. However, the following has
more information:
<Index>
<price>9964.00</price>
<absolutechange>72.58</absolutechange>
<relativeChange>0.73 </relativeChange>
</Index>
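To make the difference concrete, here is a minimal Python sketch (standard library only; `read_csv_row` and `read_xml_index` are hypothetical helper names, not part of any existing tool) that reads the CSV row and the tagged record above. It assumes the price element is properly closed. The CSV reader can only hand back three unlabeled numbers whose meaning depends entirely on column position, while the XML reader returns each value together with the name it was tagged with.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The CSV row from the message: three numbers, no labels.
CSV_ROW = "9964.00, 72.58, 0.73\n"

# The same values tagged with the ad hoc names from the message.
XML_DOC = """<Index>
<price>9964.00</price>
<absolutechange>72.58</absolutechange>
<relativeChange>0.73 </relativeChange>
</Index>"""


def read_csv_row(text):
    """Return the fields of the first CSV row; their meaning is positional only."""
    row = next(csv.reader(io.StringIO(text), skipinitialspace=True))
    return [float(field) for field in row]


def read_xml_index(text):
    """Return a tag-name -> value mapping; each value carries its own label."""
    root = ET.fromstring(text)
    # float() tolerates the trailing space inside <relativeChange>.
    return {child.tag: float(child.text) for child in root}


print(read_csv_row(CSV_ROW))
print(read_xml_index(XML_DOC))
```

Because the XML values are looked up by name, reordering the child elements would not change the resulting mapping; reordering the CSV columns would silently change what each position means.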
No standard schema. Not a lot of thought. I just made that up
quickly. It doesn't even use consistent naming conventions. But if I
were faced with pages full of numbers I'd much rather have them in
the second format than the first, especially when I know from
experience that eventually there will be missing fields, the column
names will fail to line up with the data, and I will have to deal
with all the other problems that arise in CSV data. Not that these
problems can't occur in XML, but XML, unlike CSV, is fail-fast. I can
very easily set up my systems so they notify me immediately when
faced with bad data, rather than sometime later when I notice my
brokerage just blew several million dollars because some idiot
swapped the absolute change and the relative change in the Dow.
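A minimal sketch of the fail-fast setup described above, again in Python using the standard library's `xml.etree.ElementTree`. The required field names are simply the made-up ones from the example record (no schema is implied), and `load_index` is a hypothetical helper. A well-formedness error, here a closing tag mistyped as an opening tag, or a missing field raises an exception at load time instead of flowing downstream.

```python
import xml.etree.ElementTree as ET

# Field names from the ad hoc example vocabulary; an assumption, not a schema.
REQUIRED = ("price", "absolutechange", "relativeChange")


def load_index(text):
    """Parse an <Index> record, failing immediately on bad data."""
    root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
    values = {}
    for name in REQUIRED:
        child = root.find(name)  # direct child with this tag, or None
        if child is None or child.text is None:
            raise ValueError(f"missing field: {name}")
        values[name] = float(child.text)
    return values


GOOD = """<Index>
<price>9964.00</price>
<absolutechange>72.58</absolutechange>
<relativeChange>0.73 </relativeChange>
</Index>"""

# "<price>" typed where "</price>" was meant: not well-formed.
MALFORMED = """<Index>
<price>9964.00<price>
<absolutechange>72.58</absolutechange>
<relativeChange>0.73 </relativeChange>
</Index>"""

# Well-formed, but the relative change is absent.
MISSING = "<Index><price>9964.00</price><absolutechange>72.58</absolutechange></Index>"

for label, doc in (("good", GOOD), ("malformed", MALFORMED), ("missing", MISSING)):
    try:
        print(label, load_index(doc))
    except (ET.ParseError, ValueError) as err:
        print(label, "rejected:", err)
```

Well-formedness and presence checks do not catch a value that is correctly tagged but wrong; that would still need a schema or a domain-specific range check.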
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.ibiblio.org/xml/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ |
+----------------------------------+---------------------------------+