   Re: [xml-dev] Postel's law, exceptions



On Jan 13, 2004, at 9:24 PM, Michael Champion wrote:

>
> On Jan 13, 2004, at 8:57 PM, Dare Obasanjo wrote:
>>
>> I work on RSS in my free time. The most common well-formedness errors
>> are documents with incorrect encodings or documents that use HTML
>> entities without a reference to the HTML DTD. How exactly do you 
>> propose
>> XML 1.x fix these problems?
>>
> I don't; I didn't take the time

I've thought about this more and read some of the other responses.  I 
guess where I come down is that XML per se is fine for what it does and 
purports to do here; we might want to consider different design 
patterns, if you will, for using XML in applications.

The conventional approach is to assume that the XML document is in a 
file, it knows its encoding, and it is ready to face a draconian parser 
and possibly a validator as well.  Some other application put that 
document in the file, and it had a moral responsibility to do its best 
to ensure that the XML is well formed (and possibly valid).
If there is a DTD in the file or referenced from the file, it does a 
*lot* -- defining internal entities, external entities, vocabularies, and 
structure models.  That makes XML 1.0 processing sort of a Big Bang that 
either produces a fully formed Infoset, or an error message.  It leads 
to the situation the Atom people are in, where there apparently is a 
stark choice between doing the Right Thing according to the spec or 
keeping the customers happy by aggregating the information they asked 
to be aggregated without annoying them about geeky details that they 
care nothing about.
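
To make the Big Bang point concrete, here's a rough Python sketch (the 
sample feed is made up, not from any real aggregator): a conforming 
parser hands back the whole tree or stops dead at the first 
well-formedness error, such as an HTML entity that XML doesn't declare 
-- exactly the kind of error Dare mentions.

    import xml.etree.ElementTree as ET

    # A feed that uses the HTML entity &nbsp; without any DTD declaring it.
    feed = ('<rss version="2.0"><channel>'
            '<title>News&nbsp;Feed</title>'
            '</channel></rss>')

    try:
        root = ET.fromstring(feed)      # all or nothing
        print(root.find("channel/title").text)
    except ET.ParseError as err:
        # No partial Infoset, just a fatal error.
        print("fatal error, nothing to aggregate:", err)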

An alternative is to think of a processing *pipeline* connecting a data 
source (which may or may not be XML) through a number of processing 
steps that eventually lead to an Infoset that meets some vocabulary and 
content constraints.  This provides any number of steps in which to 
adapt, clean up, and transform the data, and *then* parse and validate 
the XML text.  The Atom people who want to be liberal can simply add a 
step to their processing pipeline that does the sort of fixup that we've 
talked about -- make sure special characters are escaped properly, fix 
up the encoding, maybe even normalize escaped HTML into well-formed 
XHTML by running it through tidy.  That would be a service that would 
plug into the 
pipeline (as a library call, a SOAP invocation, a REST resource, or 
whatever) and not something that necessarily affected the rest of the 
application architecture.
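
Roughly, in Python (the fixup steps and names here are just 
illustrative, not a prescription for what Atom aggregators should do): 
liberal cleanup steps run first, and the draconian parse happens last, 
only on text the pipeline has already repaired.

    import re
    import xml.etree.ElementTree as ET
    from html.entities import name2codepoint

    def escape_bare_ampersands(text):
        # An '&' that doesn't start any entity reference becomes '&amp;'.
        return re.sub(r"&(?![A-Za-z#][A-Za-z0-9]*;)", "&amp;", text)

    def fix_html_entities(text):
        # Turn named HTML entities (&nbsp;, &copy;, ...) into numeric
        # character references; leave the XML built-ins alone.
        def repl(m):
            name = m.group(1)
            if name in ("amp", "lt", "gt", "quot", "apos"):
                return m.group(0)
            cp = name2codepoint.get(name)
            return "&#%d;" % cp if cp else m.group(0)
        return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

    def pipeline(raw):
        # Liberal fixup steps first, the strict parse only at the end.
        for step in (escape_bare_ampersands, fix_html_entities):
            raw = step(raw)
        return ET.fromstring(raw)

    feed = ('<rss version="2.0"><channel>'
            '<title>Laurel&nbsp;&amp;&nbsp;Hardy & co.</title>'
            '</channel></rss>')
    print(pipeline(feed).find("channel/title").text)

The strict parser at the end never has to change; being liberal is just 
a matter of which fixup steps you bolt onto the front.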

The best statement I know of this point of view is Uche Ogbuji's 
"Serenity Through Markup" piece 
http://www.adtmag.com/article.asp?id=6758  " As documents move between 
systems, trust the remote system's ability to interpret the document to 
meet its own local needs. This is known as the principle of loose 
coupling, and is reminiscent of the idea of late binding in programming 
theory. In the technology sphere, the idea is that an XML document used 
in a transaction between multiple systems need not always build in all 
possible aspects of the combined systems. Source systems design 
documents to meet their needs, while the target system interprets them 
to its own needs. ... [This can be done with]  pipelines of data, where 
a document starts out in one form and ends up in one suited for a 
different environment."   Sean McGrath has written a lot about the 
pipeline approach too, but all I can find are PPT presentations.  Do 
you have a good link (if you're reading, Sean)?





