The problem with Paul's reply is that, although
experientially relevant, it isn't formally
true in any sense. Neither you nor Paul says it
is, but that is what is being asked for.
If data vs document were
true polarities, one might make a case for
them, but they aren't because any document
can have a mixture of features. Really, the
terms say more about how the input shapes
the markup in the output, and then only weakly
and usually in the context of the processor.
I suppose the best one can say is 'oriented'
but not 'determined'. Otherwise, it is a polar
distinction that should go away.
One at a time:
>"Document" XML is used to mark up narrative text
>intended to be read by humans; "data" XML is used to
>exchange database records intended to be processed by
>machines.
If I pull the tables out of an HTML file, I can give
them to the machine without much modification. That
is why applications like Access can use them as data
sources. Again, the shape of the markup is 'oriented'
by the source or destination.
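
As a rough illustration of pulling tables out of a page and handing them to
a machine, here is a minimal sketch using Python's standard
xml.etree.ElementTree. The page content, the column names, and the
assumption that the HTML is well-formed (XHTML-like) are all invented for
the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: narrative text plus one table.
page = """<html><body>
<p>Quarterly totals follow.</p>
<table>
<tr><th>region</th><th>total</th></tr>
<tr><td>east</td><td>120</td></tr>
<tr><td>west</td><td>95</td></tr>
</table>
</body></html>"""

root = ET.fromstring(page)

# Each <tr> becomes a list of cell texts; the narrative <p> is never visited.
rows = [[cell.text for cell in tr] for tr in root.iter("tr")]

# Treat the first row as column names and the rest as records.
header = rows[0]
records = [dict(zip(header, row)) for row in rows[1:]]
```

The same markup that a browser renders for a human reader ends up as a
list of records here; nothing in the tags themselves decided which use
it would get.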
>Instances of "document" XML are readable without the
>markup; instances of "data" XML are meaningless
>without the markup.
Not true in any sense. It depends on the tag names
to some extent and on the background knowledge of
the human to any extent.
>"Document" XML generally allows mixed content;
>"data" XML generally does not
Close as close gets but again, a habit of the
processor, not the markup.
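
To make the mixed content point concrete, here is a small sketch (Python,
standard library only) that checks each element of one instance for mixed
content. The element names and the has_mixed_content helper are made up
for the example; the point is only that a single instance can hold both
kinds of content and the parser does not care.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem):
    """True if elem has child elements and non-whitespace text around them."""
    if len(elem) == 0:
        return False
    # Text before the first child, plus the text following each child.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(c and c.strip() for c in chunks)

# Hypothetical instance: one narrative element, one record-like element.
doc = ET.fromstring(
    "<report>"
    "<summary>Sales rose <b>sharply</b> in the east.</summary>"
    "<row><region>east</region><total>120</total></row>"
    "</report>"
)

mixed = {child.tag: has_mixed_content(child) for child in doc}
```

Here summary is mixed and row is not, yet both sit in the same document
and pass through the same parser.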
>The order of sub-elements almost always matters in
>document-oriented XML; in data-oriented XML it
>generally matters only for elements specifically
>identified as "lists" or something similar.
This is revealing, but of what? A relational database
doesn't use ordering for addressing. A technical
manual can be written to be read by name or topical
addressing, but isn't always a strictly ordered read.
A novel typically is, James Joyce notwithstanding. A
haiku is.
>Document-oriented applications can generally deal
>with unknown markup by removing the markup and keeping
>the content; data-oriented applications generally deal
>with unknown markup by ignoring the unknown markup and
>its content.
Which is a binding or coupling property of the markup
to the application processor.
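
The two behaviors can be sketched as two small functions over the same
parser, which is one way to see that the difference lives in the
application code. This is only an illustration in Python: the function
names, the vocabularies passed as "known", and the sample instances are
all hypothetical.

```python
import xml.etree.ElementTree as ET

def keep_content(elem, known):
    """Document-style: drop unknown tags but keep the text inside them."""
    parts = [elem.text or ""]
    for child in elem:
        inner = keep_content(child, known)
        if child.tag in known:
            parts.append(f"<{child.tag}>{inner}</{child.tag}>")
        else:
            parts.append(inner)  # unknown tag removed, content kept
        parts.append(child.tail or "")
    return "".join(parts)

def drop_content(elem, known):
    """Data-style: skip unknown child elements together with their content."""
    return {child.tag: child.text for child in elem if child.tag in known}

para = ET.fromstring(
    "<para>Press <blink>the red</blink> <emph>button</emph>.</para>"
)
order = ET.fromstring(
    "<order><id>42</id><qty>3</qty><giftwrap>yes</giftwrap></order>"
)

text = keep_content(para, known={"emph"})
fields = drop_content(order, known={"id", "qty"})
```

Either function could be applied to either instance; which one an
application picks is a statement about how tightly it is coupled to the
vocabulary.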
>[quoting directly from Prescod] "Data-oriented
>systems tend to prefer object types to be detectable
>independent of context (thus namespaces) whereas
>document processing is typically done top-down
>recursively so relying on context is natural."
Again, a property of the implementation. A
human-readable document can be strongly typed, which is
ontological.
The categories have so many holes that, again, it is a
distinction without much value apart from the
context of the processor. So the distinction
I look for is how independent of the processor
the markup can enable the content to be and
still have semantic value.
len