W. E. Perry wrote:
> For those
> cases I must care most about, ASN.1 and abstract syntax generally are incapable
> of a precise and unambiguous encoding of inherent fundamental textual properties
> without resorting to a priori agreements between the creator and the consumer of
> a document, and from the very nature of document processing such agreements are
> unreliable and negligible.
Hmm? How do you mean?
Taking the poem example - you say that the rhythm of the poem from line
to line is important, and thus the 'word wrapping' and whitespace are
significant, right?
Well, a poem as a Unicode string in an ASN.1 encoding will preserve
whitespace, since whitespace inside a string in an ASN.1 value is ALWAYS
significant (unlike whitespace in strings in XML, which is sometimes
used just for indentation of elements and suchlike).
From what I know of most poetry, typeface is insignificant, but
whitespace is, yes?
Well, you can go for the simple:
    Poem ::= BMPString
...and use characters like space and newline within the string. Or you
might opt for the more structured:
    Poem ::= SEQUENCE OF Line

    Line ::= CHOICE {
        text SEQUENCE {
            indent INTEGER,
            text BMPString
        },
        blank NULL
    }
...meaning that a poem is a sequence of lines, each line either being a
blank or a piece of text indented by a given number of characters.
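To make the structured form concrete, here is a rough Python sketch of the same shape. It is only an illustration: the names `Line`, `parse_poem` and `render_poem` are made up for this example, the indent is assumed to be a count of leading spaces, and a line containing only whitespace is treated as blank (so its exact spaces are not kept).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    # Mirrors the CHOICE: text is None for the 'blank' alternative.
    indent: int = 0
    text: Optional[str] = None


def parse_poem(source: str) -> List[Line]:
    """Split plain text into Line values, counting leading spaces as indent."""
    lines = []
    for raw in source.split("\n"):
        if raw.strip() == "":
            lines.append(Line())  # the 'blank NULL' alternative
        else:
            body = raw.lstrip(" ")
            lines.append(Line(indent=len(raw) - len(body), text=body))
    return lines


def render_poem(poem: List[Line]) -> str:
    """Rebuild the plain text from the structured form."""
    return "\n".join(
        "" if line.text is None else " " * line.indent + line.text
        for line in poem
    )


source = "the tide goes out\n    and out\n\nand back"
poem = parse_poem(source)
```

Rendering the parsed value gives back the original string, so under these assumptions the layout survives the trip through the structured form.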
As I say, I'm no poet, so these are just a few stabs at it. But I don't
see how XML can be any better at this than ASN.1 just because it is a
textual syntax? If, as you say, the important properties of the poem
cannot be extracted to 'metadata' usefully, then surely the poem in XML
is best represented as the textual content of an element with whitespace
preserved, right? Which is just what ASN.1 strings provide, no?
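For what it's worth, here is a small Python sketch of the XML side of that comparison, using the standard library's `xml.etree.ElementTree`. The `poem` element name and the sample text are invented for the example, and `xml:space="preserve"` is only a hint to applications; the parser hands over the character data either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical poem text with a significant indent and a blank line.
poem_text = "the tide goes out\n    and out\n\nand back"

# Wrap it as the textual content of a single element.
doc = '<poem xml:space="preserve">' + poem_text + "</poem>"

root = ET.fromstring(doc)
recovered = root.text  # character data of the element, whitespace included
```

The recovered string is the same sequence of characters that went in, which is the property a string type in ASN.1 gives you as well.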
I have a creeping feeling I've perhaps missed your point ;-)
> This is the fundamental distinction of document and data to which all
> permathreads return, but which I think the recent championing of ASN.1 on
> xml-dev gives us a useful new perspective on. Can't we now assert that what is
> fundamentally data is that of which the most salient properties are abstract?
Interesting definition...
> That is, different lexical manifestations are understood by both their creator
> and their consumer to be secondary to some abstract underlying platonic reality
> and, conversely, the physical qualities which might be inherent in a particular
> lexical manifestation are understood by both creator and consumer to be spurious
> and negligible. The content of documents, on the other hand, most specifically
> includes, often as the chief concern, those characteristics which come with the
> lexical manifestation and cannot be purged from the physical realization.
...but I don't think many things would meet that definition of document.
At what layer do you call it 'physical realization'? I gather by
inference that you're implying that the abstract character model XML is
based on will suffice as a 'physical realisation', but why is that any
more physical than an abstract value model? I would consider a truly
'physical' realisation to be electrical or optical or magnetic
waveforms, or holes in a punched card or tape... Not a very useful
definition, though. Most information processed by computers is really
defined in terms of streams of bytes, abstracting out the electrical,
magnetic, or whatnot implementations of those bytes, at least.
I think we can probably agree that an XML document in free space as an
AX.25 transmission is equivalent to one on disk if they are the same
bytes, and still equivalent if they are the same characters with one in
UTF-8 and the other in ISO-8859-1 (the same characters, that is, apart
from a different encoding name in the declaration!).
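A quick Python sketch of that last point, with a made-up fragment and the encoding declaration left out so the character sequences really are identical:

```python
# Hypothetical fragment containing two non-ASCII characters (ï and é).
text = "<line>na\u00efve caf\u00e9</line>"

as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("iso-8859-1")

# The byte streams differ...
same_bytes = as_utf8 == as_latin1

# ...but decoding each with its own encoding yields the same characters.
same_chars = as_utf8.decode("utf-8") == as_latin1.decode("iso-8859-1")

print(same_bytes, same_chars)  # prints: False True
```

So equality at the character layer holds even though equality at the byte layer does not.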
>
> Walter Perry
>
ABS