On Jan 13, 2004, at 9:24 PM, Michael Champion wrote:
>
> On Jan 13, 2004, at 8:57 PM, Dare Obasanjo wrote:
>>
>> I work on RSS in my free time. The most common well-formedness errors
>> are documents with incorrect encodings or documents that use HTML
>> entities without a reference to the HTML DTD. How exactly do you
>> propose
>> XML 1.x fix these problems?
>>
> I don't; I didn't take the time
I've thought about this more and read some of the other responses. I
guess where I come down is that XML per se is fine for what it does and
purports to do here; we might want to consider different design
patterns, if you will, for using XML in applications.
The conventional approach is to assume that the XML document is in a
file, knows its encoding, and is ready to face a draconian parser
and possibly a validator as well. Some other application put that
document in the file, and it had a moral responsibility to do its best
to ensure that the XML is well formed (and possibly valid).
If there is a DTD in the file or referenced from the file, it does a
*lot* -- defining internal entities, external entities, vocabularies, and
structure models. That makes XML 1.0 processing sort of a Big Bang that
either produces a fully formed Infoset or an error message. It leads
to the situation the Atom people are in, where there apparently is a
stark choice between doing the Right Thing according to the spec or
keeping the customers happy by aggregating the information they asked
to be aggregated without annoying them about geeky details that they
care nothing about.
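To make that all-or-nothing behavior concrete, here's a tiny illustration
(Python, standard library only; the feed fragment is made up) of how a
draconian parser treats exactly the kind of input Dare describes -- a bare
ampersand and an HTML entity that XML 1.0 doesn't define without a DTD:

import xml.etree.ElementTree as ET

# A made-up feed fragment: a bare ampersand plus an HTML entity (&copy;)
# that has no definition in XML 1.0 unless a DTD declares it.
feed = "<item><title>Fish & Chips &copy; 2004</title></item>"

try:
    ET.fromstring(feed)             # the Big Bang: a full Infoset or nothing
except ET.ParseError as err:
    print("not well-formed:", err)  # ...and here it's nothing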
An alternative is to think of a processing *pipeline* connecting a data
source (which may or may not be XML) through a number of processing
steps that eventually lead to an Infoset that meets some vocabulary and
content constraints. This provides any number of steps with which to
adapt, clean up, transform, and *then* parse and validate XML text. The
Atom people who want to be liberal can simply add a step to their
processing pipeline that does the sort of fixup that we've talked about
-- make sure special characters are escaped properly, fix up the
encoding, maybe even normalize escaped HTML into well-formed XHTML by
running it through tidy. That would be a service that plugs into the
pipeline (as a library call, a SOAP invocation, a REST resource, or
whatever) and not something that necessarily affects the rest of the
application architecture.
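To be concrete, such a liberal fixup step might look something like this --
just a sketch in Python, with the function names and the Latin-1 fallback
being my own invention, and the tidy normalization indicated only by a
comment rather than a real API:

import re
import xml.etree.ElementTree as ET

def fix_encoding(raw):
    # Decode as UTF-8, falling back to Latin-1 when the declaration lies.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

def escape_bare_entities(text):
    # Escape any '&' that doesn't start one of the five XML entities or a
    # character reference, so HTML entities like &copy; survive as literal text.
    return re.sub(r"&(?!(amp|lt|gt|apos|quot|#[0-9]+|#x[0-9a-fA-F]+);)",
                  "&amp;", text)

# Further steps -- say, normalizing escaped HTML through tidy -- plug in here.
PIPELINE = [fix_encoding, escape_bare_entities]

def parse_liberally(raw):
    doc = raw
    for step in PIPELINE:
        doc = step(doc)
    return ET.fromstring(doc)       # still a draconian parse at the very end

feed = b"<item><title>Fish & Chips &copy; 2004</title></item>"
print(parse_liberally(feed).find("title").text)   # Fish & Chips &copy; 2004

The particular fixes don't matter; the point is that the forgiving part
lives in a replaceable pipeline step, and the parser at the end stays as
strict as the spec wants it to be.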
The best statement I know of this point of view is Uche Ogbuji's
"Serenity Through Markup" piece
(http://www.adtmag.com/article.asp?id=6758): "As documents move between
systems, trust the remote system's ability to interpret the document to
meet its own local needs. This is known as the principle of loose
coupling, and is reminiscent of the idea of late binding in programming
theory. In the technology sphere, the idea is that an XML document used
in a transaction between multiple systems need not always build in all
possible aspects of the combined systems. Source systems design
documents to meet their needs, while the target system interprets them
to its own needs. ... [This can be done with] pipelines of data, where
a document starts out in one form and ends up in one suited for a
different environment." Sean McGrath has written a lot about the
pipeline approach too, but all I can find are PPT presentations. Do
you have a good link (if you're reading, Sean)?