xml-dev - Re: [xml-dev] "Smart ASCII" -> XML for authoring?

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] "Smart ASCII" -> XML for authoring?
From: Michael Smith <smith@xml-doc.org>
Date: Mon, 4 Feb 2002 07:19:13 -0600
In-reply-to: Mike Champion's message of "Sat, 02 Feb 2002 13:11:15 -0500"
References: <PLYXVRWRE0GECBRO3VQM8273GCPNMGSP.3c5c2bc3@MChamp>
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.1

Mike Champion <mc@xegesis.org> writes:

> The notion of "smart ASCII" as a way of creating structured documents 
> whose conventions allow it to be easily transformed to XML hit me 
> twice yesterday, once in the day job (a very significant media 
> company uses this to achieve interoperability between diverse 
> authoring systems) and once when reading 
> http://www-106.ibm.com/developerworks/xml/library/x-tipt2dw.html?
> open&l=136,t=grx,p=mod :
> "For the most part, "smart ASCII" is what you have been writing for 
> years if you use e-mail and the Usenet. ...Asterisks surround  bold 
> or heavily emphasized phrases; dashes surround italicized or lightly 
> emphasized phrases; underscores introduce Book or Series Titles. ... 

Sounds a lot like the "StructuredText" that's already part of the Zope
distribution:

  http://cvs.zope.org/Zope2/lib/python/StructuredText/

and the "reStructuredText" that's been proposed as an update/
alternative to that --

  http://structuredtext.sourceforge.net/docs/quickref.html

A while back, Peter Ring posted a note to the xml-doc list mentioning
StructuredText, reStructuredText, and another system/tool, APT:

  http://www.xmlmind.com/aptconvert.html

(And before all that, there was Setext...)

It's funny that while the common goal is clear -- a simple syntax for
marking plain text for conversion to XML, HTML, etc., hopefully based
on the same simple structural cues we all already use in mail etc. --
they disagree on what a lot of that simple syntax ought to be.

There are straightforward things like numbered/bulleted lists and
blockquotes, but it seems like even basic HTML or XML documents
normally include markup for things, like "literal" text, that we don't
have plain-text conventions for, as well as things that we don't
usually include in e-mail messages at all, like titled inline links.

So lacking existing conventions, the systems (no surprise) all offer
conflicting "simple" ways to mark up those kinds of things.
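
As a rough illustration of that conflict, here is a minimal Python
sketch of how inline "literal" text is delimited in the three systems
mentioned above. The patterns reflect my understanding of each syntax
(single quotes in StructuredText, double backquotes in
reStructuredText, triple angle brackets in APT) and are assumptions,
not complete grammars; LITERAL_PATTERNS and literal_spans are
hypothetical names made up for this example.

```python
import re

# Inline "literal" delimiters as understood for each syntax. These
# patterns are assumptions for illustration, not complete grammars.
LITERAL_PATTERNS = {
    # 'text' with whitespace before and whitespace/punctuation after
    "structuredtext": re.compile(r"(?<!\S)'([^'\n]+)'(?=[\s.,;:!?]|$)"),
    # ``text``
    "restructuredtext": re.compile(r"``(.+?)``"),
    # <<<text>>>
    "apt": re.compile(r"<<<(.+?)>>>"),
}


def literal_spans(text, dialect):
    """Return the literal spans that the given dialect would recognize."""
    return LITERAL_PATTERNS[dialect].findall(text)


# The same sentence has to be written three different ways.
print(literal_spans("Run 'make install' now.", "structuredtext"))
print(literal_spans("Run ``make install`` now.", "restructuredtext"))
print(literal_spans("Run <<<make install>>> now.", "apt"))
```

Text written for one of these systems yields no literal spans at all
when read with another system's rule, which is the practical cost of
having no shared convention.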

> [...]
> I'm of two minds on this ... on one hand it sounds like a return to 
> the Bad Old Days and will require continuing human intervention to 
> cleanup the inevitable not-so-smart ASCII before it can be converted 
> to XML rather than one-time human intervention to teach markup skills 
> to the authors.

It might also be a return to the Bad Old Days of authoring-mostly-
with-presentation-in-mind, because these "smart ASCII" systems seem
mostly like ways to mark up plain text for presentational purposes,
not ways to add real structure or logical distinctions to the content.

In terms of the set of "logical units" of information (to borrow a
term from David Megginson's "Structuring XML Documents") that "smart
ASCII" can provide -- an asterisk-delimited "bold" unit, a
series-of-numbered-paragraphs list unit, etc. -- it's a very small
set, really just the things for which there are presentational
distinctions.
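
To make the size of that set concrete, here is a minimal Python sketch
that converts the three inline conventions quoted from the article
(asterisks, dashes, underscores) into XML. The DocBook-style element
names, the regular expressions, and the names INLINE_RULES and
smart_ascii_to_xml are all assumptions chosen for illustration; none
of the systems mentioned above necessarily works this way.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from "smart ASCII" delimiters to DocBook-style
# inline elements. Patterns and element names are assumptions.
INLINE_RULES = [
    # *text* -> strong emphasis
    (re.compile(r"\*(\S(?:[^*]*\S)?)\*"),
     r'<emphasis role="strong">\1</emphasis>'),
    # -text- -> light emphasis; the lookarounds skip hyphenated words
    (re.compile(r"(?<!\w)-(\S(?:[^-]*\S)?)-(?!\w)"),
     r"<emphasis>\1</emphasis>"),
    # _text_ -> book or series title
    (re.compile(r"(?<!\w)_(\S(?:[^_]*\S)?)_(?!\w)"),
     r"<citetitle>\1</citetitle>"),
]


def smart_ascii_to_xml(line):
    """Convert one line of smart ASCII into a <para> element."""
    text = escape(line)  # escape &, < and > before adding any markup
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return "<para>" + text + "</para>"


print(smart_ascii_to_xml(
    "Read _Moby Dick_, it is *very* long and -somewhat- dense."))
```

Three rules cover everything the quoted conventions can express
inline, and each one maps to a presentational distinction. Even the
dash rule needs guards so that a hyphenated word is not read as
emphasis, and as written it cannot handle a hyphenated word inside a
dash-delimited phrase.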

I guess that's because there are only so many non-presentational,
logical distinctions that can be represented unambiguously in simple
plain-text markup. Because of that, I think many authors would find
that once they tried to mark up anything but the most basic documents,
the set of logical distinctions they can make is just too limited.

> On the other, it leverages what humans do best -- deal with
> patterns, templates, informal conventions -- and lets computers do
> what they do best -- generate and parse formal syntaxes, putting XML
> further behind the scenes, perhaps where it belongs.

I guess XML should rightly be behind the scenes, but if the goal is to
author structured documents, it seems like the structure and logical
distinctions ought to be up front and unambiguous where the author can
clearly see them, regardless of whether the actual XML markup is.

> I'd be interested in hearing others' reactions to this IBM 
> DeveloperWorks article and about actual experiences in the field.  My 
> guess is that it makes a LOT of sense for simple documents (memos, 
> weblogs, simple articles) and virtually no sense for serious 
> technical documents where the whole point of SGML/XML is to catch the 
> structural errors as early and automatically as possible even if this 
> requires some pain on the part of the authors.

To put it another way, I think it might be that you can _only_ produce
simple documents with "smart ASCII"/plain text markup (if it's truly
kept simple). It can't practically provide real structure or the
variety of logical distinctions you need in technical documents.

It seems like it keeps things so simple that structural errors aren't
really a problem -- it limits authors to such a basic structure that
it'd be hard for them to produce anything that would convert to invalid
HTML or DocBook or whatever the target format is supposed to be.

> But how big is the middle ground, and when does it pay to make
> authors switch over to XML? In other words, should XML stay in the
> background, or is it time for the end-users to add basic markup
> knowledge to their repertoire of skills?

I think it pays to switch over when you need authors to produce
something with real structure, in a logical markup vocabulary, not
just something that will look good in a browser after it's converted.

I think authors writing technical documentation ought to learn and use
a common markup vocabulary appropriate to the tech field they're in.
These days, because XML happens to be the format that instances of
technical markup vocabularies are written in, I guess many document
authors believe they're being asked to learn XML, when maybe the real
goal is to get more people learning and using richer markup
vocabularies.

And I don't see how a vocabulary with any real richness or complexity
could be represented in simple "smart ASCII". So I guess if learning
an appropriate logical markup vocabulary also means moving beyond the
limited set of informal presentational "markup" conventions in simple
texts such as e-mail messages, it's time for them to make the move.

-- 
Michael Smith, Tokyo, Japan    http://sideshowbarker.net
マイク

Of Bliss the Codes are few -

  --Emily Dickinson (1586)     http://www.logopoeia.com/ed/
