OASIS Mailing List Archives


Re: SAX LexicalHandler::comment issue

On Sat, 07 Jul 2001, John Cowan wrote:
> Lars Marius Garshol scripsit:
> > I've always been very curious to know what the rationale for that
> > [i.e. comment information items]
> > is. To me, a major part of what the infoset was supposed to be about
> > was to draw the line between logically significant information and
> > purely lexical information. The current version fails to achieve this.
> It's a compromise.   Some people felt that significant information
> was being kept in comments.  

But that's what comments are *for*...in any language.
You mean you'd prefer to debug other people's C++ or
Perl or LaTeX or FORTRAN *without* comments? 

Er, gosh, you mean there are people who write 
uncommented code :-) <gulp/>

> If I had my way, they'd be gone.

Comments are a convenient way of adding information which is
not part of the view you wish the recipient to have of the
document (what used to be called THE document: that is, the
text in element content and attribute values). To remove them
would simply make life harder for users who wish to add their
metadata in this way and utilise the fact that it gets stripped
from the output automatically. 
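The point that comments sit outside "THE document" shows up directly in the SAX2 API this thread is about. Element content is reported through ContentHandler.characters. Comments reach the application only through LexicalHandler.comment, and only if a lexical handler has been registered; otherwise the parser quietly drops them. The following is a minimal sketch using Python's standard xml.sax module rather than the Java API. The class name CommentAwareHandler, the helper parse_with_comments and the sample document are illustrative assumptions, not anything from the original post.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CommentAwareHandler(ContentHandler):
    """Hypothetical handler: collects character data and comments separately."""

    def __init__(self):
        super().__init__()
        self.text = []      # element content, delivered via ContentHandler
        self.comments = []  # comment text, delivered via the lexical handler

    # ContentHandler callback: only element content arrives here, never comments.
    def characters(self, content):
        self.text.append(content)

    # Lexical-handler callbacks (the SAX2 LexicalHandler interface).
    def comment(self, content):
        self.comments.append(content)

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def parse_with_comments(xml_bytes):
    """Parse a document and return (joined character data, list of comments)."""
    handler = CommentAwareHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Without this property the default (expat) parser discards comments.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.text), handler.comments


doc = b"<para>Hello <!-- note: check this term -->world</para>"
text, comments = parse_with_comments(doc)
print(text)      # Hello world
print(comments)  # [' note: check this term ']
```

An application that registers only a ContentHandler sees "Hello world" and nothing else, which is exactly the "stripped from the output automatically" behaviour described above. The metadata is still available to any tool that asks for it.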

If you want a language without comments (and someone wanted
attributes to go away as well), fine... but let's not try to
pretend that it could be XML: let's call it YML or ZML or
something else. There's definitely a market for it,
especially if you remove EMPTY elements as well, because then
we'd be back to the original concept of simplicity: nested
element markup only. While we're at it, remove the concept of
mixed content as well, as the EuroMath Article DTD did, so
that parsing could be done in 2-3 lines of Perl. *Then* you
can have the speed someone was looking for; in effect you are
providing what most newcomers assume XML was actually meant to
be: a simplified form of SGML. This would still keep the
concept of hierarchy, though, which a lot of people find gets
in the way, and having to end every element as well as
begin it is a pain. So why not label the data at the start of
the line, let the end of the line terminate it, and use
something like a colon or a period to delimit the tag from the
data?

But don't do this and pretend that it is still XML, because
it won't be.
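The line-oriented format imagined above (a label at the start of each line, a colon as the delimiter, the end of the line as the terminator) really is trivial to parse, which is part of the joke. Here is a minimal Python sketch, assuming the colon delimiter. The function name parse_labelled_lines and the sample input are hypothetical.

```python
def parse_labelled_lines(text):
    """Parse hypothetical 'label: data' lines into (label, data) pairs.

    The label runs up to the first colon; the end of the line terminates
    the data. There is no nesting, so any notion of hierarchy is gone.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, sep, data = line.partition(":")
        if not sep:
            raise ValueError(f"no label delimiter in line: {line!r}")
        records.append((label.strip(), data.strip()))
    return records


sample = """title: SAX LexicalHandler::comment issue
date: Sat, 07 Jul 2001
"""
print(parse_labelled_lines(sample))
# [('title', 'SAX LexicalHandler::comment issue'), ('date', 'Sat, 07 Jul 2001')]
```

Splitting on the first colon lets the data contain colons but forbids them in labels. The result is a flat list of records with no elements inside elements, no attributes and no comments. In other words, it is a perfectly usable format, and not XML.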