
   Re: [xml-dev] Genx


On Jan 21, 2004, at 11:57 AM, jcowan@reutershealth.com wrote:

>> The 'codePoint' typedef may be problematic:
>>
>>     // Unicode code points (4-byte int on most systems)
>>     typedef wchar_t codePoint;
>>
>> ...
> I have argued privately that wchar_t is in fact the Right Thing here
> despite its variability in size (UTF-32 on Unix platforms, UTF-16 on
> Windows), because it makes genx compatible with both standardized and
> non-standardized facilities, most especially "..."L strings.  Some
> conditional logic will be needed to interpret the input as UTF-16 or
> UTF-32, which can be based on sizeof(wchar_t).  Hypothetical platforms
> where sizeof(wchar_t) == 1 can be neglected.

Almost.  How about we leave it as wchar_t, but *not* UTF-16, so a value
that's in a surrogate block is an error.  Then we change the name from
codePoint (which could be interpreted as meaning "UTF-16 Code Point") to
something more explicit like

numericValueCorrespondingToAUnicodeCharacterAsInUPlusFourHexDigitsIsThatClear

John Cowan has suggested that "codeUnit" might be a good name; I'd be
inclined to "uniChar".  Any other ideas?
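
A minimal sketch of the proposal above, assuming the typedef ends up
being called uniChar (only one of the names floated here, not a settled
genx name).  The helper isScalarValue is hypothetical, and the upper
bound check against 0x10FFFF is an added assumption; the proposal itself
only mentions rejecting values in the surrogate block.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical name from this thread; the real genx typedef may differ. */
typedef wchar_t uniChar;

/*
 * Returns 1 if c can be treated as a single Unicode character value,
 * 0 if it falls in the surrogate block (U+D800..U+DFFF) or, as an
 * extra assumption, lies above U+10FFFF.
 */
static int isScalarValue(uniChar c)
{
    /* Widen to an unsigned type so a negative wchar_t is rejected too. */
    uint32_t v = (uint32_t) c;

    if (v >= 0xD800 && v <= 0xDFFF)
        return 0;               /* surrogate: an error under this proposal */
    if (v > 0x10FFFF)
        return 0;               /* outside the Unicode range */
    return 1;
}
```

On a platform where wchar_t is 16 bits wide the second test can never
fire, which is harmless; characters above U+FFFF are simply not
representable in a single uniChar there.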

If someone wants to put a generic UTF-16 processor on top of genx, that
would be fine.  I don't see the demand for supporting it at the input
end of genx because the UTF-16-centric languages like Java and C# have
decent XML-writing software already. -Tim
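
As a rough illustration of what such a layer on top might look like,
here is a sketch of a decoder that turns UTF-16 code units into single
character values before they are handed to a writer.  The function name
utf16Next and its signature are hypothetical and not part of genx.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one character from the UTF-16 code units in[0..len).
 * On success, stores the character value in *out and returns the
 * number of code units consumed (1 or 2).  Returns 0 if the input is
 * empty or starts with an unpaired surrogate.
 */
static size_t utf16Next(const uint16_t *in, size_t len, uint32_t *out)
{
    uint16_t hi, lo;

    if (len == 0)
        return 0;

    hi = in[0];
    if (hi < 0xD800 || hi > 0xDFFF) {
        *out = hi;              /* ordinary BMP character */
        return 1;
    }
    if (hi > 0xDBFF)
        return 0;               /* low surrogate with no preceding high */
    if (len < 2)
        return 0;               /* high surrogate at end of input */

    lo = in[1];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return 0;               /* high surrogate not followed by a low */

    /* Combine the two 10-bit halves and add the supplementary offset. */
    *out = 0x10000 + ((uint32_t) (hi - 0xD800) << 10) + (uint32_t) (lo - 0xDC00);
    return 2;
}
```

A caller would loop over its UTF-16 buffer, advancing by the returned
count and passing each decoded value on, treating a return of 0 as
malformed input.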





 
