From: "Tim Bray" <email@example.com>
> James Clark wrote:
> > But with +names you don't want to work at the encoding level. For
> > example, if you have a ü in your text file, that will be two bytes in
> > UTF-8+names, but you would want to work with it as a single character.
> > To edit a UTF-8+names text file, you need to make your text editor treat
> > it as if it were encoded in UTF-8. In other words, to make things work
> > you have to edit it in the wrong encoding. This will be extremely
> > confusing to users.
> I'm not sure I agree. In UTF-8+names, ü could show up either as itself
> or as &uuml; - if you had the gear that could handle it as itself just put
> it in that way and you're fine. For things like ∯ that it's very
> unlikely you can edit in place, leave it as &Conint; in the edit window.
I'm sure I don't agree with that! What you suggest is that if a UTF-8+names
(U8n) encoded character can be displayed in the current screen font it
should be translated to the actual character rather than displayed as the
escaped name. I don't have to wait for the implementation; I can file the
bug reports right now. Consider the scenario where half the U8n characters
are displayable and half are not. User edits file; user saves file. What is
the editor supposed to do? If it saves the text as the user sees it, the U8n
encodings are lost for half the characters. This is a destructive
round-trip, usually not seen as a good thing. OTOH, if the editor attempts
to translate some or all non-ASCII characters back to U8n encodings, it will
inevitably translate some that were not originally encoded as such. Another
This sort of thing is a Bad Idea.
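
To make the two failure modes concrete, here is a rough Python sketch of such an editor. Everything in it is an assumption for illustration: the two-entry name table, the set of characters the screen font can show, and the function names are all hypothetical, and nothing here is taken from the UTF-8+names proposal itself.

```python
import re

# Hypothetical name table; a real U8n implementation would have many more.
NAMES = {"uuml": "\u00fc", "Conint": "\u222f"}
CHARS = {char: name for name, char in NAMES.items()}

# Assumption: the screen font can display ü but not ∯.
DISPLAYABLE = {"\u00fc"}


def load(text):
    """Show an escaped name as the real character when the font has it."""
    def show(match):
        char = NAMES.get(match.group(1))
        return char if char in DISPLAYABLE else match.group(0)
    return re.sub(r"&(\w+);", show, text)


def save_as_seen(buffer):
    """Option 1: write the buffer exactly as the user sees it."""
    return buffer


def save_reescaped(buffer):
    """Option 2: turn every character that has a name back into an escape."""
    return "".join(f"&{CHARS[c]};" if c in CHARS else c for c in buffer)


# One ü written as an escape, one written literally, one undisplayable escape.
original = "f&uuml;r ü &Conint;"
buffer = load(original)

print(repr(buffer))
print(repr(save_as_seen(buffer)))
print(repr(save_reescaped(buffer)))
```

Once the file is loaded, the buffer no longer records which ü came from an escape and which was typed literally. Saving as seen drops the escape the author wrote; re-escaping adds one the author never wrote. Neither save reproduces the original file.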