   RE: [xml-dev] UTF-8+names



Bob Foster wrote:
> 
> Asking for requirements is always a good idea.
> 
> I think users want not to lose the following when they use non-DTD
> validation:
> 
> - Internal entities for common well-known entity sets, like those of
>   XHTML, MathML, etc.
> - Internal entities for user-defined shorthand
> - External parsed entities (includes)
> 
> If internal entities with simple definitions (i.e., no use of parameter
> entities) were the sole requirement and users were willing to have all
> the entities used in a document defined in the internal subset of every
> document, then you are right. An editor could easily insert those
> definitions.


This is one solution that is fine if you want to represent the "difficult"
characters as entity references within the XML document and are willing to
include a number of internal entities in the document.

The other solution I was hinting at is for tools to implement aids for
input/display of difficult characters, limited to the user interface, with
the actual Unicode character being included in the document (i.e., not as an
entity reference).

This would allow, for example, display of an existing XML document
containing difficult characters, by showing the names of those characters to
the user.  (The names can either be shown within an auxiliary view such as
tooltips, or shown in-line.)   No prior transcoding to another encoding
(such as UTF-8+names) would be necessary for displaying an existing document
in this way.
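
A minimal sketch of what such a user-interface aid could look like, assuming Python's standard `unicodedata` module as the source of character names and the HTML 4 entity table in `html.entities` as a stand-in for whatever name list an editor would actually ship. The function names `describe` and `expand_entity` are hypothetical and only illustrate the idea; the document itself keeps the real Unicode characters.

```python
import unicodedata
from html.entities import name2codepoint  # HTML 4 entity names, used here as an assumed name table


def describe(text):
    """Return (character, Unicode name) pairs for every non-ASCII character.

    A viewer could show these names in a tooltip or in-line; the document
    bytes are not changed. Characters without a name fall back to U+XXXX.
    """
    return [
        (ch, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for ch in text
        if ord(ch) > 127
    ]


def expand_entity(token):
    """Turn a typed token such as '&nbsp;' into the actual character.

    Unknown names are returned unchanged, so the editor can leave them
    for the XML parser (or the user) to deal with.
    """
    if token.startswith("&") and token.endswith(";"):
        codepoint = name2codepoint.get(token[1:-1])
        if codepoint is not None:
            return chr(codepoint)
    return token
```

With this sketch, `describe("Щ")` yields the pair `("Щ", "CYRILLIC CAPITAL LETTER SHCHA")`, and `expand_entity("&nbsp;")` yields the single character U+00A0, whose formal Unicode name is NO-BREAK SPACE.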

(How does the UTF-8+names proposal address this particular use case?  Prior
conversion to UTF-8+names, perhaps?  And how would one choose among the
multiple encodings available for different characters?  U with diaeresis or
ü?  Greek capital letter Psi or Ψ?  And so on.  Should the bytes of the
resulting UTF-8+names-encoded XML document depend, for example, on the
particular linguistic preferences of the user that has touched the document
last?)
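
To make the ambiguity concrete, here is a small sketch that lists every name a single character can go by. It assumes the HTML5 named-character table bundled with Python (`html.entities.html5`) as a stand-in; the name table that UTF-8+names itself would use is not given in this thread, and `entity_names` is a hypothetical helper.

```python
from html.entities import html5  # HTML5 named character references (assumed stand-in table)


def entity_names(ch):
    """Return all semicolon-terminated HTML5 names that map to `ch`, sorted."""
    return sorted(
        name
        for name, value in html5.items()
        if value == ch and name.endswith(";")
    )
```

Whenever `entity_names` returns more than one name for a character, an encoder that writes names has to pick one of them, in addition to the option of writing the character itself, and two tools can make different choices for the same document.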

Alessandro



> 
> Bob Foster
> 
> From: "Alessandro Triglia" <sandro@mclink.it>
> To: "'Bob Foster'" <bob@objfac.com>; "'Tim Bray'" 
> <tbray@textuality.com>; "'Miles Sabin'" <miles@milessabin.com>
> Cc: <xml-dev@lists.xml.org>
> Sent: Sunday, October 19, 2003 6:17 PM
> Subject: RE: [xml-dev] UTF-8+names
> 
> 
> 
> I have another comment.
> 
> What is it that those users have actually been asking for?  What is
> their actual need?  Do they want to be able to display and/or enter a
> rare character when using a user interface that doesn't support that
> character directly?
> 
> If so, isn't this entirely a software issue?  Can't existing XML
> browsers and editors just be extended so as to support *names* for
> characters, and we leave the encodings alone?
> 
> For example, if I want to enter a Щ (the Cyrillic character) using a
> keyboard that does not support Cyrillic, I can currently use some
> OS-specific means (say, the character map applet in Windows plus a
> copy/paste).  If an XML editor had the inherent ability to accept any
> Unicode character by opening a dialog box showing a list of Unicode
> names, that would be sufficient for many purposes.  Likewise, if an XML
> viewer had the ability to display the Unicode name of a rare Unicode
> character when the cursor is above a character, that could be
> sufficient for many purposes.  If some program needs to cope with
> display hardware that doesn't know how to display a Щ, the software
> itself can be written so as to show the Unicode name of the Щ
> (CYRILLIC CAPITAL LETTER SHCHA) or some shorter local designation,
> instead of a small square.
> 
> Recalling one of the cases mentioned, can't an XML editor (as a
> product-specific feature) allow the user to enter something like
> &nbsp; and change it on the fly to the Unicode character NO-BREAK SPACE
> depending on the context?  Can't this XML editor subsequently display a
> tooltip over the character?  Do we really need to *encode* the NO-BREAK
> SPACE as a byte sequence  & n b s p ;  ?
> 
> What fraction of those use cases would be left out, if the 
> issue were regarded as a software issue?
> 
> Alessandro
> 
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org 
> <http://www.xml.org>, an initiative of OASIS 
<http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>
