Re: A simple guy with a simple problem

At 09:58 AM 3/15/01 -0800, Joe English wrote:

>Sean McGrath wrote:
> > I am stunned at the number of people who have suggested
> > lexical processing to solve this problem. [...]
>Well, the problem was incompletely specified.
>(This goes without saying of course: the problem
>is *always* incompletely specified).


>If "change 'STUFF' to 'stuff'" refers to the transfer
>syntax, then lexical processing is the correct solution.
>If it refers to the Infoset, then infoset processing
>is the correct solution.  Bob needs to take this
>up with his manager to clarify the requirements.

Transfer syntax? Infoset? All Bob's manager knows
is that valid XML comes in, some local processing
should happen, and then XML should go out. That's
all the contracts say :-)

> > To those who suggested using SAX, I suggest you fire
> > up your text editor and try it yourself. As a service
> > to the XML community, I suggest you then report
> > back with what you discovered.
>OK; what do you expect that we'll discover?

I expect to discover that certain interesting things
have happened to the output document:
         The internal document type declaration
         Presence or absence of defaulted attributes, depending
           on whether or not the parser underlying SAX processes
           the external subset
         The prolog PI may have walked
         The CDATA sections may have walked
         And so on.
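
For anyone who would rather not fire up the text editor, here is a minimal sketch of the experiment, assuming Python's standard xml.sax module with its default expat-based parser and the stock XMLGenerator as the output side. The sample document and the name sax_round_trip are made up for illustration; they are not from Bob's actual problem.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Hypothetical input: an internal subset that declares a defaulted
# attribute, plus a CDATA section containing markup-significant characters.
SOURCE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!ATTLIST doc kind CDATA "memo">
]>
<doc><![CDATA[STUFF & <nonsense>]]></doc>
"""


def sax_round_trip(data: bytes) -> str:
    """Parse with SAX and re-serialize every event, changing nothing."""
    out = io.StringIO()
    # XMLGenerator is a plain ContentHandler: it only sees elements,
    # attributes, character data and PIs, so it has no way to write
    # back a DOCTYPE or CDATA section boundaries.
    handler = XMLGenerator(out, encoding="utf-8")
    xml.sax.parseString(data, handler)
    return out.getvalue()


if __name__ == "__main__":
    # Compare this by eye with SOURCE: the DOCTYPE is gone, the CDATA
    # section has become escaped character data, and the XML declaration
    # is the generator's own rather than the one that came in.
    print(sax_round_trip(SOURCE))
```

No STUFF processing was attempted at all here, and the output already differs from the input in the parts that were supposed to be left alone. Whether the defaulted attribute shows up as an explicit one depends on the parser underneath, which is rather the point.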

As I said in a reply to Martin von Loewis, it boils down
to the issue of "sameness" of the output XML in the
parts that should be unaffected by the (admittedly
under-specified!) STUFF processing.

I worry about this sort of thing, and I guess I'm interested
in discovering the extent to which people have
thought about this and solved it, thought about
it and ignored it, or perhaps not thought about it and
are getting worried.

I come from a long line of worriers. I also come from
a long line of SGML heads who know the sort
of fun and games that can happen in downstream
processing of markup filtered through a "logical"
infoset produced by parsing. (I cut my teeth with
your very own COST system).

I have learned the hard way that insisting on
round-trippable XML/SGML as a base notation
for interchange pays dividends. My enthusiasm
for the SML work was in large part down
to its recognition that round-trippability
is important.