OASIS Mailing List Archives

   genx - string termination and bounding


I'm really sympathetic to the calls for counted rather than 
null-terminated strings, if only for the genxText() call.

So I was thinking about this back before we nuked the codePoint * 
versions, and I realized that a "length" argument could be confusing 
because in the codePoint* version it would naturally be the number of 
characters, while in the utf8Byte * version it would naturally be the 
number of bytes.  Blecch.  So I thought it would be more natural to 
have something like

  genxText(genxWriter w, utf8Byte * start, utf8Byte * end)

i.e. a pointer to the end of the string, which would have the same 
semantics in both versions of the call.  Well, we're losing the 
codePoint * stuff (good riddance) but I'd kind of like to stay with the 
stop-here argument rather than the byte (or character) count argument.

Of course, if you want to null-terminate, you can; just do

  genxText(w, buf, NULL)
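
A minimal sketch of how the two calling styles could collapse into one code path. Everything here is an assumption for illustration: resolveEnd is a hypothetical helper, not part of genx, utf8Byte is taken to be unsigned char, and "end" is taken to point one past the last byte of the text.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char utf8Byte; /* assumed definition */

/*
 * Hypothetical helper, not part of genx: turn the (start, end) pair into
 * a definite stop pointer.  A NULL end means "the text is null-terminated",
 * so the stop pointer is the address of the terminating zero byte.
 */
static const utf8Byte * resolveEnd(const utf8Byte * start,
                                   const utf8Byte * end)
{
  if (end != NULL)
    return end;
  return start + strlen((const char *) start);
}
```

With this in place, genxText(w, buf, NULL) and genxText(w, buf, buf + strlen(buf)) would process the same bytes.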

Two questions:
- if you have a zero byte in the string before you get to the end mark, 
should it just stop, or throw an error?  The former is more consistent 
with C culture (cf. strncpy) but the latter is a bit more stringent.  
I'm moderately leaning toward just stopping.
- if the stop marker is stupidly in the middle of a UTF-8 character, 
genx should detect this and declare an error.  The existence of this 
situation is the only good argument for a count rather than a stopper, 
but it's not quite good enough.  -Tim
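
Both points can be sketched in one scanning loop. This is only an illustration under stated assumptions, not genx code: scanText, seqLength and the scanStatus values are hypothetical names, "end" points one past the last byte, a zero byte before the end mark quietly stops the scan, and a stop marker that lands inside a multi-byte character is reported as an error. It checks lead and continuation bytes only; it does not reject overlong forms or surrogates.

```c
#include <stddef.h>

typedef unsigned char utf8Byte; /* assumed definition */

/* Hypothetical status codes for this sketch */
typedef enum { SCAN_OK, SCAN_SPLIT_CHAR, SCAN_BAD_BYTE } scanStatus;

/* Length of the UTF-8 sequence introduced by lead byte b, 0 if invalid */
static int seqLength(utf8Byte b)
{
  if (b < 0x80) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 0;
}

/*
 * Walk the text from start up to end (or up to a zero byte if that comes
 * first, or if end is NULL).  *used receives the number of bytes that
 * make up whole characters before the scan stopped.
 */
static scanStatus scanText(const utf8Byte * start, const utf8Byte * end,
                           size_t * used)
{
  const utf8Byte * p = start;
  int i, n;

  *used = 0;
  while ((end == NULL || p < end) && *p != 0)
  {
    n = seqLength(*p);
    if (n == 0)
      return SCAN_BAD_BYTE;

    /* the stop marker falls inside this character */
    if (end != NULL && (size_t) (end - p) < (size_t) n)
      return SCAN_SPLIT_CHAR;

    /* each trailing byte must look like 10xxxxxx; a zero byte fails
       this test, so we never read past a terminator */
    for (i = 1; i < n; i++)
      if ((p[i] & 0xC0) != 0x80)
        return SCAN_BAD_BYTE;

    p += n;
    *used = (size_t) (p - start);
  }
  return SCAN_OK;
}
```

For the bytes 'a', 0xC3, 0xA9, 'b' ("aéb"), a stop marker at start + 2 splits the two-byte character and yields SCAN_SPLIT_CHAR, while start + 3 is accepted.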





 
