Re: A simple guy with a simple problem
- From: "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
- To: sean@digitome.com
- Date: Thu, 15 Mar 2001 21:59:51 +0100
> >I think I'm missing your point. The document you got afterwards is the
> >same as it was before. Is that not what you wanted?
>
> Therein lies the nub of the issue, the words "the same".
What ultimately matters is: Are they the same when processed by the
applications at Bob's company? Or, if Bob's company sends them to
somebody else: Do they still follow the application-level protocol
that was set between Bob's company and that somebody else?
Bob did not explain what the application was, or how the protocol was
defined (nor did he say exactly why it is desirable to replace STUFF
with stuff). Most likely, what a SAX parser would do would not break
the application.
> Lexical approach: Leaves lots of the document "the same" but it
> is very difficult to get the processing right in the face of all
> the things that are hidden beneath the term "DTD valid XML".
> foo1.xml is an example of these gotchas.
Clearly, if you do processing, the result *won't* be the same,
lexically - or else you would not need the processing. This is the
place where Bob's description of the problem comes into play: "all
occurrences of STUFF" implies "in text and attributes", which means
that rewriting CDATA sections is probably ok. Again, without knowing
what the application is, you cannot say for sure - but if CDATA
sections matter, I'd argue that the application is broken.
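To make "in text and attributes" concrete, here is a minimal sketch of
the SAX route using Python's standard xml.sax module. The names
StuffReplacer and replace_stuff are made up for illustration, and the
STUFF/stuff pair is taken from Bob's description; this is one way to do
it, not necessarily what Bob's application needs.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class StuffReplacer(XMLGenerator):
    """Hypothetical handler: re-serializes the document, replacing STUFF
    in character data and attribute values only."""

    def startElement(self, name, attrs):
        # Rewrite attribute values; element and attribute names are untouched.
        new_attrs = {k: v.replace("STUFF", "stuff") for k, v in attrs.items()}
        super().startElement(name, AttributesImpl(new_attrs))

    def characters(self, content):
        # Caveat: a parser may deliver one text run in several chunks, so an
        # occurrence split across two chunks would be missed by this sketch.
        super().characters(content.replace("STUFF", "stuff"))


def replace_stuff(xml_text):
    """Parse xml_text and return the re-serialized, replaced document."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"),
                        StuffReplacer(out, encoding="utf-8"))
    return out.getvalue()


print(replace_stuff('<a x="STUFF">STUFF<![CDATA[STUFF & more]]></a>'))
```

The CDATA section comes back as ordinary escaped text, and comments and
the DOCTYPE are not written out at all, which is exactly the kind of
lexical change under discussion.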
> Parser based approach: A lot easier to get the processing
> right but fiendishly difficult to leave unprocessed parts of
> the document "the same" in the face of all the things
> hidden beneath the term "DTD valid XML".
> The output of a SAX or XSLT transform of foo1 is an example
> of the problem.
I lost track of what foo1 is, here. Why is the output of the parser an
example? My point is that it *does* leave "unprocessed parts" the same
(even though they change lexically). The task was to globally replace
STUFF, so there are essentially no "unprocessed parts".
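One way to pin down "the same even though they change lexically" is to
compare canonical forms instead of raw bytes. The sketch below assumes
Python 3.8 or later, where xml.etree.ElementTree.canonicalize is
available; the helper name same_content and the sample documents are
invented for illustration, and canonical XML is only one possible
definition of sameness, which may or may not match the application's.

```python
from xml.etree.ElementTree import canonicalize


def same_content(doc_a, doc_b):
    """Hypothetical helper: True if both documents have the same
    canonical XML form, ignoring purely lexical differences."""
    return canonicalize(doc_a) == canonicalize(doc_b)


# Lexically different: quote style differs, CDATA versus escaped text.
before = "<a x='1'><![CDATA[a<b]]></a>"
after = '<a x="1">a&lt;b</a>'

print(before == after)              # plain string comparison
print(same_content(before, after))  # comparison of canonical forms
```

If the application only ever sees what a parser reports, a check along
these lines is closer to what it cares about than a byte-for-byte diff.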
I'd suggest a more pragmatic route: If Bob is reasonably confident
that documents that will appear in the context of his applications are
not "broken" by the processing it does, then the processing is
fine. There may be cases where it can be proven that global
search-and-replace will do no harm (e.g. if the negotiated protocol
implies restrictions on the use of XML).
If you really need an environment where more reliable data
transformations are possible, you might need to look for alternatives
to XML :-)
Regards,
Martin