   Keeping ISO 8879 Alive (was RE: [xml-dev] Markup perspective not code)

8/2/2002 9:48:40 AM, "Bullard, Claude L (Len)" <clbullar@ingr.com> wrote:


>Keep ISO 8879 alive.  It is ISO that guarantees that markup 
>is the property of the commons. 

I'm not sure I agree with Len's characterization of the W3C or 
the intelligence quotient of those who think that XML should 
move beyond its SGML roots <grin>, but I do agree with the importance
of keeping ISO 8879 alive.  In fact, I'm beginning to think
that both SGML and XML need to be living specs, and some of us need
to simply choose which community we belong in.  (Many of us will
belong to both, and that's fine too).

There certainly is a "documents vs data" cleavage, but
IMHO the more crucial differentiator between the two communities
is whether they, deep down inside, think of XML as "markup" or
"infoset".  Markup people can be minimalists who
want to keep things very simple and monastical, or they can be 
hard core SGML geeks who know how to exploit the bells and whistles
of 8879.  But at heart they think of "XML" as *text* to be 
written and marked up by a human and crunched by a variety of 
tools that can utilize the markup to produce material
that will be consumed by humans.  They seem to prefer processing
tools such as SAX that let them stay close to the syntax.
They certainly use XSLT, but tend to think of it as operating
at the syntax level, I suspect, and get irritated when it throws
away their CDATA sections.  They care deeply about syntactical
details and changes (XML 1.1, the global/local attribute namespace
thread, etc.) because they HAVE to care. They have certainly
benefitted from the spoils of the "XML Revolution" but would
have been just as happy if the DOM, InfoSet, XQuery, etc. abstractions
away from the syntax had never been invented.
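To make the CDATA irritation concrete, here is a minimal sketch in Python (my choice of language, not anything from the thread). It uses the standard library's expat bindings as a stand-in for a SAX-style, close-to-the-syntax tool, and ElementTree as a stand-in for a tree-building one. The sample document and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Hypothetical sample: an author used CDATA to avoid escaping "<".
DOC = "<note><![CDATA[if a < b: pass]]></note>"


def syntax_events(xml_text):
    """Collect low-level parse events, including CDATA section boundaries."""
    events = []
    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver character data in as few chunks as possible
    parser.StartCdataSectionHandler = lambda: events.append("cdata-start")
    parser.EndCdataSectionHandler = lambda: events.append("cdata-end")
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(xml_text, True)
    return events


def tree_round_trip(xml_text):
    """Parse into a tree and serialize again; the tree has no notion of CDATA."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(syntax_events(DOC))    # the event stream still shows where the CDATA section was
    print(tree_round_trip(DOC))  # the text survives, escaped, but the CDATA section is gone
```

The event-level view still knows where the author put the CDATA section; the tree view keeps only the characters, which is exactly the behaviour that annoys markup people and that infoset people consider correct.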


"Infoset" people can also be minimalists, or they
may have stopped worrying and learned to love the PSVI, the XQuery
type system, etc.  They tend to be agnostic about how some
input stream was produced, whether by a person, a program,
a serialization of some object or database, etc.  At heart
they think of "XML" as some text or data that can be readily
mapped to one flavor or another of the XML data model (of
which there are at least 4 by my count: W3C InfoSet,
DOM, XPath, and XQuery... maybe JDOM counts as another, I'm not
sure). They tend to prefer processing tools that abstract 
the structure away from the syntax (e.g.
DOM/JDOM, XPath, XQuery).  They also use XSLT extensively, but
conceive of it as a data-model to data-model transformer and
may combine XSLT processing with DOM-ish programming.  
They are perfectly happy with the reformulation 
of specs such as SOAP from a syntax definition
to an InfoSet definition, seeing the potential for specialized
serializations as greatly outweighing the problems of deviating
from the One True Syntax, because they really think of syntax
as a detail that parsers and serializers worry about. 
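A small sketch of what "syntax is a detail" means in practice, again in Python with ElementTree standing in for "some data model" (it is not the W3C InfoSet itself, only one tree model in the same spirit). The two sample documents and the `same_tree` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Return True if two elements have the same tags, attributes, text and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


# Same information, different syntax: quote style, empty-element form,
# and a character reference versus a CDATA section.
DOC_A = "<order id='7'><item/><note>A&#38;B</note></order>"
DOC_B = '<order id="7"><item></item><note><![CDATA[A&B]]></note></order>'

if __name__ == "__main__":
    print(DOC_A == DOC_B)                                       # compared as text
    print(same_tree(ET.fromstring(DOC_A), ET.fromstring(DOC_B)))  # compared as trees
```

To a syntax person these are two different documents; to an infoset person they are one document written down twice.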

So, one way forward to avoiding endless, fruitless debates on
XML-DEV, IMHO, is to accept the fact that we are not one
community anymore.  We have a lot in common -- the XML 1.0
syntax as a "canonical form", for example ... and XSLT.  As I
said, many of us can happily live in either camp, switching
from one to another as the job requires.  But sometimes we
just must agree to disagree, proudly stating "I am a
syntax person -- I have to care, so quit telling me I shouldn't",
or "I am an infoset person -- I don't give a rat's patootie
about details of syntax, so quit trying to make me 
feel guilty about it".

Another way forward, which I doubt if many people will agree with,
might be to refactor things along SGML "markup for authors" and
XML "infoset for programmers" lines.  Agree on a basic syntax
for XML 2.0 that removes most of the stuff that the infoset throws away
and causes the DOM (which basically tries to live in both the
syntax and InfoSet worlds) fits, such as DTDs, entities, entity references,
CDATA sections.  That's not to say that people should stop using
entities and CDATA sections, just to say that they "properly" belong in
the SGML world where "syntax sugar" is respected and supported.  Or 
if it is too confusing and politically unworkable to consign this stuff
to the ISO 8879 community, separate out an XML "preprocessor" that 
understands all the author-friendly syntax sugar coating but isolates
it from the hard-core element/attribute/text parsing into an InfoSet.
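As a rough illustration of the "preprocessor" idea, and nothing more than that, the sketch below uses Python's ElementTree to flatten an authored document into plain elements, attributes and text. It assumes that the entities are declared in the internal DTD subset and that the input is trusted (entity expansion on untrusted input is a known risk). The `preprocess` name and the sample memo are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical authored input: an internal entity plus a CDATA section.
AUTHORED = """<!DOCTYPE memo [
  <!ENTITY co "Example Corp">
]>
<memo>&co; says <![CDATA[a < b]]></memo>"""


def preprocess(xml_text):
    """Expand the author-friendly sugar and emit only elements and text."""
    root = ET.fromstring(xml_text)  # the parser expands &co; and unwraps the CDATA section
    return ET.tostring(root, encoding="unicode")  # no DOCTYPE, entities, or CDATA in the output


if __name__ == "__main__":
    print(preprocess(AUTHORED))
```

Whatever consumes the output never has to know that a DTD, an entity reference or a CDATA section was involved, which is the isolation I have in mind.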









 
