Re: A simple guy with a simple problem

Sean McGrath <sean@digitome.com> writes:

> I am stunned at the number of people who have suggested lexical
> processing to solve this problem. And worse, in one case, a
> suggestion that I was trolling with this simple example!

After thinking about this class of problem - not just your specific
example - the problem seems flawed from the start. The fact that your
problem is difficult to solve is one of the aspects of XML I
appreciate. That is: the input byte stream is irrelevant; the
processed data reported to the application is consistent and
paramount. It's a many-to-one relationship; a nearly-infinite number
of input streams could produce the same canonical processed data.

The benefit is in creative freedom for the document author. This
benefit is arguably most relevant to human authors working in, say,
DocBook or the like. If I choose to use an internal entity for a word,
I can. If you are less "macro inclined," you can skip it and repeat
the word everywhere. If either of us changes his mind about how to
type that XML, it won't matter to the consuming application.
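To make the many-to-one point concrete, here is a minimal sketch using
Python's standard xml.sax module. It is only an illustration, not
anything from Sean's original problem: the element name "para", the
entity name "product", and the word "Frobnicator" are made up for the
example. One document spells the word through an internal entity, the
other types it out, and the handler should see the same events for both.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Records the events a SAX application sees, merging adjacent text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver text in several chunks (for instance
        # around an expanded entity), so coalesce neighbouring chunks.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def parse_events(document):
    """Parse a byte string and return the list of reported events."""
    collector = EventCollector()
    xml.sax.parseString(document, collector)
    return collector.events


# Hypothetical sample documents: same content, different keystrokes.
WITH_ENTITY = b"""<!DOCTYPE para [
  <!ENTITY product "Frobnicator">
]>
<para>The &product; is ready.</para>"""

WITHOUT_ENTITY = b"<para>The Frobnicator is ready.</para>"

if __name__ == "__main__":
    print(parse_events(WITH_ENTITY) == parse_events(WITHOUT_ENTITY))
```

Nothing in the collected events says which author used the entity, so
a filter built on this handler has no way to put the reference back.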

Yes, this makes it impossible to use a SAX-style parser (meaning, with
the level of information reported by SAX) to write the filter Bob
needs, but that's fine with me. If we could write such a filter, it
would mean that the freedom-of-authorship benefit was compromised.

This debate reminds me of using C/C++ compilers with the preprocessor,
where the relevant split in processing levels is more
apparent. Deciding whether to filter source before or after the
preprocessor depends upon whether you're trying to toy with preprocessor
directives. Finally, even if you could get inside the consuming
compiler like you can with an XML processor, you could never re-create
that input source file.

Steven E. Harris        :: seh@speakeasy.org
GnuPG                   :: 0x70248E67