The notion of "smart ASCII" as a way of creating structured documents
whose conventions allow them to be easily transformed to XML hit me
twice yesterday, once in the day job (a very significant media
company uses this to achieve interoperability between diverse
authoring systems) and once when reading
"For the most part, "smart ASCII" is what you have been writing for
years if you use e-mail and the Usenet. ...Asterisks surround bold
or heavily emphasized phrases; dashes surround italicized or lightly
emphasized phrases; underscores introduce Book or Series Titles. ...
They are all very quick to type.
Anything that looks like a URL is turned into a link automatically. A
fairly simple special format with curly braces and the ALT text
before a colon is used to insert images, such as charts and graphs."
There's a script included (a few hundred lines of well-commented
Python) to do the conversion to XML.
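
To make the conventions quoted above concrete, here is a minimal sketch of such a converter. It is not the script from the article. The element names (strong, em, title, a, img), the function name smart_ascii_to_xml, and the assumption that a file name or URL follows the colon inside the curly braces are all my own choices for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# One combined pattern, so a URL containing "_" or "-" is consumed whole
# before the emphasis rules get a chance to see it.
# All conventions below are assumptions based on the quoted description.
TOKEN = re.compile(
    r"\{(?P<alt>[^:{}]+):\s*(?P<src>[^{}\s]+)\}"              # {ALT text: source}
    r"|(?P<url>https?://[^\s<>]+)"                            # bare URL
    r"|\*(?P<strong>[^\s*][^*\n]*)\*"                         # *heavy emphasis*
    r"|(?<![\w-])-(?P<em>[^\s-][^-\n]*)-(?![\w-])"            # -light emphasis-
    r"|(?<!\w)_(?P<title>[^\s_][^_\n]*)_(?!\w)"               # _Book Title_
)


def smart_ascii_to_xml(text):
    """Convert one paragraph of smart ASCII into a <p> element (hypothetical vocabulary)."""
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        # Plain text between tokens only needs XML escaping.
        out.append(escape(text[pos:m.start()]))
        if m.group("alt") is not None:
            out.append("<img alt=%s src=%s/>" % (
                quoteattr(m.group("alt").strip()), quoteattr(m.group("src"))))
        elif m.group("url") is not None:
            url = m.group("url")
            out.append("<a href=%s>%s</a>" % (quoteattr(url), escape(url)))
        elif m.group("strong") is not None:
            out.append("<strong>%s</strong>" % escape(m.group("strong")))
        elif m.group("em") is not None:
            out.append("<em>%s</em>" % escape(m.group("em")))
        else:
            out.append("<title>%s</title>" % escape(m.group("title")))
        pos = m.end()
    out.append(escape(text[pos:]))
    return "<p>%s</p>" % "".join(out)


if __name__ == "__main__":
    print(smart_ascii_to_xml("I *really* liked _Dune_; see http://example.com/a_b for more"))
```

Even this toy version shows where the not-so-smart ASCII creeps in: a URL followed directly by a full stop swallows the full stop, and an unbalanced asterisk is silently left as literal text rather than flagged as an error.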
I'm of two minds on this ... on one hand it sounds like a return to
the Bad Old Days and will require continuing human intervention to
clean up the inevitable not-so-smart ASCII before it can be converted
to XML rather than one-time human intervention to teach markup skills
to the authors. On the other, it leverages what humans do best --
deal with patterns, templates, informal conventions -- and lets
computers do what they do best -- generate and parse formal syntaxes,
putting XML further behind the scenes, perhaps where it belongs.
I'd be interested in hearing others' reactions to this IBM
DeveloperWorks article and about actual experiences in the field. My
guess is that it makes a LOT of sense for simple documents (memos,
weblogs, simple articles) and virtually no sense for serious
technical documents where the whole point of SGML/XML is to catch the
structural errors as early and automatically as possible even if this
requires some pain on the part of the authors. But how big is the
middle ground, and when does it pay to make authors switch over to
XML? In other words, should XML stay in the background, or is it time
for the end-users to add basic markup knowledge to their repertoire?