dareo@microsoft.com (Dare Obasanjo) writes:
>Same here. I wonder what Simon considered food for thought.
Notes on markup practice, mostly. To spare everyone else the slogging:
> SGML is a good idea when the markup overhead is less than 2%. Even
> attributes is a good idea when the textual element contents is the
> "real meat" of the document and attributes only aid processing, so
> that the printed version of a fully marked-up document has the same
> characters as the document sans tags. Explicit end-tags is a good
> idea when the distance between start- and end-tag is more than the
> 20-line terminal the document is typed on. Minimization is a good
> idea in an already sparsely tagged document, both because tags are
> hard to keep track of and because clusters of tags are so intrusive.
> Character entities is a good idea when your entire character set is
> EBCDIC or ASCII. Validating the input prior to processing is a good
> idea when processing would take minutes, if not hours, and consume
> costly resources, only to abend. SGML had an important potential in
> its ability to let the information survive changes in processing
> equipment or software where its predecessors clearly failed.
> ...
> We are clearly not at the stage of human development
> where writers are willing to accept the burden of communicating to
> the machine what they are thinking. One has to marvel at the wide
> acceptance of our existing punctuation marks and the sociology of
> their acceptance. "Tagging" text for semantic constructs that the
> human mind is able to discern from context must be millennia off.
And in a totally different direction:
> But the one thing I would change the most from a markup language
> suitable for marking up the incidental instruction to a type-setter
> to the data representation language suitable for the "market" that
> XML wants, is to go for a binary representation. The reasons for
> /not/ going binary when SGML competed with ODA have been reversed:
> When information should survive changes in the software, it was an
> important decision to make the data format verbose enough that it
> was easy to implement a processor for it and that processors could
> liberally accept what other processors conservatively produced, but
> now that the data formats that employ XML are so easily changed
> that the software can no longer keep up with it, we need to slam on
> the brakes and tell the redefiners to curb their enthusiasm, get it
> right before they share their experiments with the world, and show
> some respect for their users. One way to do that is to increase the
> cost of changes to implementations without sacrificing readability
> and without making the data format more "brittle", by going binary.
> Our information infrastructure has become so much better that the
> nature of optimization for survivability has changed qualitatively.
> The question of what we humans need to read and write no longer has
> any bearing on what the computers need to work with. One of the
> most heinous crimes against computing machinery is therefore to
> force them to parse XML when all they want is the binary data.
I'll certainly confess that much of my interest stems from Naggum's
infamous reputation from SGML days, though.
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org