   Re: [xml-dev] XML 1.1 and Unicode normalization


John Cowan wrote:
> james anderson scripsit:
> > > XML parsers are considered consumers, not producers.
> >
> > that is one of the less intuitively obvious things in these specs.
> If you say so, but it seems intuitive to me:  parsers consume a public text
> stream and produce various internal data structures depending on their
> design.  It is the former, not the latter, that is the subject of CharMod.

? even where charmod devotes attention to the issues of string identity
matching, and string indexing. my intuition is that those topics have to do
with operations on internal data structures produced.

> > there's this passage in charmod which goes something like "a text processing
> > component [an instance of which i would expect an xml processor to be] that
> > receives suspect text [instances of which i would, in general, expect
> > documents to be] must not perform any normalization-sensitive operations
> > [instances of which i would expect any name construction and comparison
> > operations to be] unless it has first confirmed through inspection that the
> > text is in normalized form, ...."
> >
> > which renders the distinction between consumers and producers academic.
> How so?  Producers of public text formats should normalize; consumers
> should verify normalization.

charmod has a nicely amorphous cloud in it somewhere to depict the internet.
as far as an application which uses a parser as a utility sees the internet
when it is trying to work with data transparently, where exactly is the
parser? where is the parser when it migrates into the os?

> > unless there is some way to interpret the passage so that it does not apply to
> > things like start/end tag matching, attribute defaulting, and validation.
> In principle, these things should not be done by a parser unless it knows
> (either by verification or by certification) that it is dealing with
> properly normalized text.

as noted in my original post, the intuition of an i18n novice is that this is
not very useful.
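
to make the "verify, do not normalize" reading concrete, here is a minimal python sketch of a start/end tag comparison that refuses to run on unverified names. the names check_nfc, tags_match and NotNormalizedError are hypothetical, and the check covers plain NFC only, which is an assumption: charmod's notion of fully normalized text asks for more than that.

```python
import unicodedata


class NotNormalizedError(ValueError):
    """Hypothetical error raised when text fails the NFC check."""


def check_nfc(text):
    # Verify only: compare the text with its NFC form, never rewrite it.
    if unicodedata.normalize("NFC", text) != text:
        raise NotNormalizedError("text is not in NFC: %r" % text)
    return text


def tags_match(start_name, end_name):
    # Name comparison is normalization-sensitive, so both names are
    # verified before the comparison is allowed to happen.
    check_nfc(start_name)
    check_nfc(end_name)
    return start_name == end_name
```

with a precomposed "caf\u00e9" in the start tag and a decomposed "cafe\u0301" in the end tag, the sketch raises instead of reporting a mismatch, which is the behaviour that strikes a novice as not very useful.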

> > what is more, the passage continues with the proscription, that "[a text
> > processing component] must not normalize the suspect text."
> >
> > which left me wondering whether a parser would be conformant if, when it
> > signalled an exception upon determining that it was about to construct a name
> > from a non-nfc string, it at least offered the application a restart which
> > attempted to normalize the namestring and continue.
> As in every case of conformance to a standard, it is acceptable for a
> product to have some mode in which it does not conform, provided it
> does not claim that mode to be a conforming mode, and provided that
> there is at least one mode which is conforming.   The common example is
> C compilers, which often accept languages wider or narrower than ISO C
> with appropriate switches.

which is why, to return to the earlier remark, that

> [as] for waiting, it's now or never as far as XML 1.1 is concerned.

i figured it is better to connect these dots after the print is dry rather
than try to figure out what they are intended to mean in-progress.
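
for what it is worth, the "restart" idea discussed above could be sketched in python as follows. everything here is an assumption for illustration: build_name and normalize_restart are hypothetical names, the default strict path stands in for the conforming mode, and the handler path is the mode that, per the reply above, could not be claimed as conforming.

```python
import unicodedata


def build_name(raw, on_unnormalized=None):
    """Construct a name from raw text (hypothetical helper).

    Strict path: non-NFC input is an error.
    Optional path: the application supplies a handler that acts as the
    restart and decides what to do with the offending string.
    """
    if unicodedata.normalize("NFC", raw) == raw:
        return raw
    if on_unnormalized is None:
        raise ValueError("name is not in NFC: %r" % raw)
    # Non-strict mode: hand the decision back to the application.
    return on_unnormalized(raw)


def normalize_restart(raw):
    # One possible restart: normalize the name string and continue.
    return unicodedata.normalize("NFC", raw)
```

an application that wants the lenient behaviour would call build_name(raw, normalize_restart); one that wants the strict behaviour simply omits the handler.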



