OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] Unicode normalization in XML 1.1



* Lars Marius Garshol
| 
|  - clearly, documents that are not normalized are still well-formed,
|    so if the application is to have any guarantees here the processor
|    must do normalization before passing on the information,

* John Cowan
| 
| Not so.  A processor in normalization-check mode will report
| non-normalized input, so the application may make up its mind
| whether or not to accept it.

Uh, yes. Obviously what I wrote makes no sense.
 
* Lars Marius Garshol
|
| Wouldn't it be far better if the application could be certain that
| an XML 1.1 processor would provide normalized character data, and
| could ignore the whole issue of how the document was encoded? After all,
| isn't the whole purpose of *having* XML parsers to insulate
| applications from worries about the lexical details of documents?
 
* John Cowan
|
| The point is that normalization is expensive, and it may be too
| expensive to do at all in small systems.  Therefore, the W3C's
| choice (expressed in the Character Model) is to have senders
| normalize, and receivers check for normalization.  In this way
| documents are normalized once at creation (or publication) time,
| rather than every time a document is received; this conserves
| net-wide cycles, since checking is cheaper than normalizing.
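
A minimal sketch of the division of labour described above, using
Python's standard unicodedata module. It assumes plain NFC as the
normalization form, which only approximates what the Character Model
and XML 1.1 actually require, and the function names publish and
check_normalized are hypothetical, not part of any XML processor's API.

```python
import unicodedata

# Assumption: plain NFC stands in for the normalization the specs call for.
FORM = "NFC"

def publish(text: str) -> str:
    """Sender side: normalize once, at creation or publication time."""
    return unicodedata.normalize(FORM, text)

def check_normalized(text: str) -> bool:
    """Receiver side: only check the input, never rewrite it.

    unicodedata.is_normalized requires Python 3.8 or later.
    """
    return unicodedata.is_normalized(FORM, text)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT is not in NFC.
decomposed = "e\u0301"
sent = publish(decomposed)

# The receiver reports the result and leaves the accept/reject
# decision to the application.
for candidate in (decomposed, sent):
    status = "normalized" if check_normalized(candidate) else "NOT normalized"
    print(repr(candidate), status)
```

The receiver never calls normalize itself; it reports the result of the
check, and the application decides whether to accept the input.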

I can't say I like this, but at least I can see that there is
reasoning behind it and that the reasoning makes sense.

Thanks for clearing this up!

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >





 


Copyright 2001 XML.org. This site is hosted by OASIS