OASIS Mailing List Archives





   Re: [xml-dev] XML 1.1 and Unicode normalization


james anderson scripsit:

> > XML parsers are considered consumers, not producers. 
> that is one of the less intuitively obvious things in these specs.

If you say so, but it seems intuitive to me:  parsers consume a public text
stream and produce various internal data structures depending on their
design.  It is the former, not the latter, that is the subject of CharMod.

> there's this passage in charmod which goes something like "a text processing
> component [an instance of which i would expect an xml processor to be] that
> receives suspect text [instances of which i would, in general, expect
> documents to be] must not perform any normalization-sensitive operations
> [instances of which i would expect any name construction and comparison
> operations to be] unless it has first confirmed through inspection that the
> text is in normalized form, ...."
> which renders the distinction between consumers and producers academic.

How so?  Producers of public text formats should normalize; consumers
should verify normalization.
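
The producer/consumer split can be sketched in a few lines of Python. This is only an illustration, under the assumption that plain Unicode NFC stands in for the normalization CharMod requires (CharMod's full definition is stricter); the function names are hypothetical and come from no parser API.

```python
import unicodedata


def produce(text: str) -> str:
    """Producer side: normalize to NFC before emitting public text."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Consumer side: verify normalization without altering the input."""
    return unicodedata.normalize("NFC", text) == text


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

emitted = produce(decomposed)          # the producer fixes the text up
consumer_accepts = is_nfc(emitted)     # the consumer only checks it
consumer_rejects = not is_nfc(decomposed)
```

The consumer never calls `normalize` to change what it received; it only compares the input against its normalized form.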

> unless there is some way to interpret the passage so that it does not apply to
> things like start/end tag matching, attribute defaulting, and validation.

In principle, these things should not be done by a parser unless it knows
(either by verification or by certification) that it is dealing with
properly normalized text.
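
A small Python sketch of why start/end tag matching is normalization-sensitive. The tag names are invented for the example, and NFC is assumed as a stand-in for the normalization form CharMod has in mind.

```python
import unicodedata

# Two spellings that render identically but differ code point by code point.
start_tag_name = "caf\u00e9"   # precomposed e-acute
end_tag_name = "cafe\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# A plain code-point comparison, as a parser would do when matching tags.
names_match = start_tag_name == end_tag_name

# Verification step: are both names already in NFC?
both_nfc = all(
    unicodedata.normalize("NFC", name) == name
    for name in (start_tag_name, end_tag_name)
)

# Only after normalization do the two names compare equal.
match_after_nfc = (
    unicodedata.normalize("NFC", start_tag_name)
    == unicodedata.normalize("NFC", end_tag_name)
)
```

On unverified input the comparison result depends on which spelling each producer happened to emit, which is why the check has to come first.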

> what is more, the passage continues with the proscription, that "[a text
> processing component] must not normalize the suspect text." 
> which left me wondering whether a parser would be conformant if, when it
> signalled an exception upon determining that it was about to construct a name
> from a non-nfc string, it at least offered the application a restart which
> attempted to normalize the namestring and continue.

As in every case of conformance to a standard, it is acceptable for a
product to have some mode in which it does not conform, provided it
does not claim that mode to be a conforming mode, and provided that
there is at least one mode which is conforming.   The common example is
C compilers, which often accept languages wider or narrower than ISO C
with appropriate switches.
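
As a rough sketch of that two-mode idea in Python: the default mode below refuses non-normalized names, and an explicitly requested lenient mode normalizes and continues. The names `accept_name`, `lenient`, and `SuspectTextError` are hypothetical, and NFC is assumed as the normalization form; only the default mode would be offered as the conforming one.

```python
import unicodedata


class SuspectTextError(ValueError):
    """Raised when a name is not in the expected normalization form."""


def accept_name(name: str, lenient: bool = False) -> str:
    """Return a name that is safe to use in comparisons.

    Default mode: verify only, and signal an error on non-NFC input.
    Lenient mode: an opt-in, non-conforming convenience that normalizes
    the name and carries on.
    """
    normalized = unicodedata.normalize("NFC", name)
    if normalized == name:
        return name
    if lenient:
        return normalized
    raise SuspectTextError(f"name is not NFC-normalized: {name!r}")


strict_ok = accept_name("caf\u00e9")                     # already NFC
repaired = accept_name("cafe\u0301", lenient=True)       # lenient mode

try:
    accept_name("cafe\u0301")                            # default mode
    strict_raised = False
except SuspectTextError:
    strict_raised = True
```

The restart described in the quoted question would correspond to catching `SuspectTextError` and calling again with `lenient=True`.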

In the general case, a standard can have nothing to say about products
that don't claim conformance to it.

Business before pleasure, if not too bloomering long before.
        --Nicholas van Rijn
                John Cowan <jcowan@reutershealth.com>
                        http://www.ccil.org/~cowan  http://www.reutershealth.com



Copyright 2001 XML.org. This site is hosted by OASIS