xml-dev - Re: [xml-dev] Unicode normalization in XML 1.1


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Unicode normalization in XML 1.1
From: Lars Marius Garshol <larsga@garshol.priv.no>
Date: 06 Apr 2003 16:33:35 +0200
In-reply-to: <20030403132804.GI29046@ccil.org>
References: <m3u1dfonr4.fsf@pc36.avidiaasen.online.no><20030403132804.GI29046@ccil.org>
Sender: larsga@pc36.avidiaasen.online.no
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1


* Lars Marius Garshol
| 
|  - clearly, documents that are not normalized are still well-formed,
|    so if the application is to have any guarantees here the processor
|    must do normalization before passing on the information,

* John Cowan
| 
| Not so.  A processor in normalization-check mode will report
| non-normalized input, so the application may make up its mind
| whether or not to accept it.
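
A rough illustration of that division of labour, as a sketch only: Python's
xml.etree.ElementTree is an XML 1.0 parser with no normalization-check mode,
so the check is done here by a hypothetical helper, find_non_normalized, that
walks the parsed tree. It tests plain NFC with unicodedata.is_normalized
(Python 3.8+), which is an assumption and narrower than what XML 1.1 describes.
The helper only reports; accepting or rejecting is left to the caller.

```python
import unicodedata
import xml.etree.ElementTree as ET


def find_non_normalized(xml_text, form="NFC"):
    """Return (tag, kind, value) for each text item not in the given form.

    Hypothetical helper: reports problems, never repairs them.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Check element text, tail text, and attribute values.
        candidates = [("text", elem.text), ("tail", elem.tail)]
        candidates += [("@" + name, value) for name, value in elem.attrib.items()]
        for kind, value in candidates:
            if value and not unicodedata.is_normalized(form, value):
                problems.append((elem.tag, kind, value))
    return problems


composed = "<p>caf\u00e9</p>"      # precomposed e-acute (U+00E9)
decomposed = "<p>cafe\u0301</p>"   # "e" followed by combining acute (U+0301)

# The application decides what to do with the report.
for doc in (composed, decomposed):
    report = find_non_normalized(doc)
    print("accept" if not report else "reject", report)
```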

Uh, yes. Obviously what I wrote makes no sense.
 
* Lars Marius Garshol
|
| Wouldn't it be far better if the application could be certain that
| an XML 1.1 processor would provide normalized character data, and
| could ignore the whole issue of how the document was encoded? After
| all, isn't the whole purpose of *having* XML parsers to insulate
| applications from worries about the lexical details of documents?
 
* John Cowan
|
| The point is that normalization is expensive, and it may be too
| expensive to do at all in small systems.  Therefore, the W3C's
| choice (expressed in the Character Model) is to have senders
| normalize, and receivers check for normalization.  In this way
| documents are normalized once at creation (or publication) time,
| rather than every time a document is received; this conserves
| net-wide cycles, since checking is cheaper than normalizing.
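
The sender-normalizes, receiver-checks split can be sketched as follows. The
function names publish and receive are hypothetical, and NFC is assumed as the
target form; unicodedata.normalize and unicodedata.is_normalized (Python 3.8+)
are standard library calls. The sketch makes no measurement of relative cost.

```python
import unicodedata


def publish(text):
    """Sender side: normalize once, at creation or publication time."""
    return unicodedata.normalize("NFC", text)


def receive(text):
    """Receiver side: check only, and report instead of silently repairing."""
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("input is not in NFC")
    return text


raw = "cafe\u0301"             # "e" followed by combining acute accent
sent = publish(raw)            # normalized once by the sender
received = receive(sent)       # receiver only verifies
```

Calling receive(raw) directly raises ValueError, which leaves the decision
about non-normalized input to the application.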

I can't say I like this, but at least I can see that there is
reasoning behind it and that the reasoning makes sense.

Thanks for clearing this up!

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

References:
- Unicode normalization in XML 1.1
  - From: Lars Marius Garshol <larsga@garshol.priv.no>
- Re: [xml-dev] Unicode normalization in XML 1.1
  - From: John Cowan <cowan@mercury.ccil.org>
