xml-dev - Unicode normalization in XML 1.1

To: <xml-dev@lists.xml.org>
Subject: Unicode normalization in XML 1.1
From: Lars Marius Garshol <larsga@garshol.priv.no>
Date: 03 Apr 2003 12:53:51 +0200
Sender: larsga@pc36.avidiaasen.online.no
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1


The more I look at the text in section 2.13 of XML 1.1, the more
confused I get. There are two things that bother me: why require that
XML documents be normalized, and does the specification require
processors to pass normalized character data to the application?


Let's start with the second question:

 - clearly, documents that are not normalized are still well-formed,
   so if the application is to have any guarantees here the processor
   must do normalization before passing on the information,

 - the text says that "XML processors must not transform the input to
   be in fully normalized form." This seems to say that processors are
   not allowed to do the transformation.

Apparently, the application gets the character data as it was in the
document, and is then informed that "document was normalized" or
"document was not normalized".
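
To make that reading concrete, here is a minimal sketch of what I take
the current behaviour to be: the text is handed over untouched, along
with a flag. The helper name check_only is hypothetical, and plain NFC
is used as a stand-in for the spec's "fully normalized" definition,
which is stricter than NFC alone.

```python
import unicodedata

def check_only(text):
    """Hypothetical helper: pass text through untouched, report NFC status.

    Assumption: NFC approximates the "fully normalized" check; the real
    definition in XML 1.1 has additional conditions.
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    return text, is_nfc

composed = "caf\u00e9"      # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The application sees two different strings and has to deal with the flag.
print(check_only(composed))
print(check_only(decomposed))
```

The two inputs render identically but compare unequal, so the
application is left to decide what to do with the second one.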


This brings me to the first question: why should the application have
to care? Wouldn't it be far better if the application could be certain
that an XML 1.1 processor would provide normalized character data, and
could therefore ignore the whole issue of how the document was encoded?
After all, isn't the whole purpose of *having* XML parsers to insulate
applications from worries about the lexical details of documents?

In other words, why not rewrite this so that processors are required
to normalize character data? Then the whole issue of whether or not
documents were normalized just disappears, which means that XML 1.0
and 1.1 documents will appear the same to applications with regard to
normalization.
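
As a rough illustration of the behaviour proposed here (not of anything
the specification requires), the sketch below wraps a SAX handler so
that character data is normalized before the application sees it. The
names NormalizingHandler and parse_normalized are hypothetical, NFC is
again assumed as an approximation of "fully normalized", and Python's
bundled parser only implements XML 1.0, so this shows the idea rather
than a conforming XML 1.1 processor.

```python
import unicodedata
import xml.sax

class NormalizingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: buffers character data and normalizes it to NFC."""

    def __init__(self):
        super().__init__()
        self._buffer = []   # raw pieces from characters() callbacks
        self.chunks = []    # normalized text runs, as the application would see them

    def characters(self, content):
        # The parser may split text at arbitrary points, so collect the
        # pieces first and normalize only once the run is complete.
        self._buffer.append(content)

    def _flush(self):
        if self._buffer:
            text = "".join(self._buffer)
            self.chunks.append(unicodedata.normalize("NFC", text))
            self._buffer = []

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()

def parse_normalized(xml_bytes):
    """Parse a document and return its text runs in NFC."""
    handler = NormalizingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.chunks

doc_composed = "<a>caf\u00e9</a>".encode("utf-8")
doc_decomposed = "<a>cafe\u0301</a>".encode("utf-8")
print(parse_normalized(doc_composed) == parse_normalized(doc_decomposed))
```

With a processor like this, the two encodings of the same text become
indistinguishable to the application, which is the effect I am asking
for.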


Or did I just completely misunderstand everything?

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

Follow-Ups:
- Re: [xml-dev] Unicode normalization in XML 1.1
  - From: John Cowan <cowan@mercury.ccil.org>
