OASIS Mailing List Archives





   RE: [xml-dev] UTF-8+names


Tim Bray wrote:
> James Clark wrote:
> > But with +names you don't want to work at the encoding level.  For
> > example, if you have a ü in your text file, that will be two bytes in
> > UTF-8+names, but you would want to work with it as a single character.
> > To edit a UTF-8+names text file, you need to make your text editor
> > treat it as if it were encoded in UTF-8. In other words, to make
> > things work you have to edit it in the wrong encoding.  This will be
> > extremely confusing to users.
> I'm not sure I agree.  In UTF-8+names, ü could show up either as itself
> or as &uuml;

The point is how you make it show up as &uuml;.

On screen you normally don't see the bits and bytes that encode a character; you see some display form of the encoded character. The six characters &uuml; are what you get when a UTF-8+names bit pattern is reinterpreted as UTF-8. If this reinterpretation doesn't take place, the human user will not see &uuml; on her screen: she will see some display form of the LATIN SMALL LETTER U WITH DIAERESIS character (U+00FC).
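UTF-8+names was a proposal under discussion, so the following is only a minimal Python sketch under stated assumptions: it treats UTF-8+names as ordinary UTF-8 in which a named reference such as &uuml; also stands for a single character, and it borrows the HTML entity table from Python's standard `html.entities` module as the set of names. The function `decode_utf8_names` is hypothetical. It illustrates the reinterpretation described above: the same six bytes are one character to a UTF-8+names reader and six characters to a plain UTF-8 reader.

```python
import html.entities
import re

# HYPOTHETICAL: pattern for a named reference such as "&uuml;".
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_utf8_names(data):
    """Hypothetical UTF-8+names decoder (a sketch, not a specified algorithm).

    Decodes the bytes as UTF-8, then replaces each named reference whose
    name appears in the stdlib HTML entity table (an assumed name set)
    with the single character it names. Unknown names stay literal text.
    """
    text = data.decode("utf-8")

    def to_char(match):
        codepoint = html.entities.name2codepoint.get(match.group(1))
        return chr(codepoint) if codepoint is not None else match.group(0)

    return _NAMED_REF.sub(to_char, text)


raw_form = "\u00fc".encode("utf-8")  # U+00FC as raw UTF-8: b'\xc3\xbc'
named_form = b"&uuml;"               # the same character as a named reference

# A UTF-8+names reader sees one character either way.
print(decode_utf8_names(raw_form), decode_utf8_names(named_form))  # → ü ü
# A plain UTF-8 reader sees the named form as six separate characters.
print(raw_form.decode("utf-8"), named_form.decode("utf-8"))        # → ü &uuml;
```

Because both byte sequences decode to the same character under this sketch, an editor that loads such a file and saves it again would also have to decide which form to write back.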

Editors would have to be modified, in ways that affect their processing model, to handle this.




Copyright 2001 XML.org. This site is hosted by OASIS