Re: XML Blueberry
- From: John Cowan <jcowan@reutershealth.com>
- To: Vincent-Olivier Arsenault <vincent@neuro6.com>
- Date: Fri, 22 Jun 2001 10:27:46 -0400
Vincent-Olivier Arsenault wrote:
> But couldn't that be deduced from the binary representation (on the
> platform level) so that the parser just has to deal with a "current" (at
> the time of the parser implementation) UNICODE spec string? Why does the
> (XML) parser need to know the charset used?
It does not in one sense. But it needs to know what characters are
and are not legal in names, which is quite independent of encoding:
DOUBLE DAGGER is illegal, whether you encode it 0x2021 (UTF-16)
or 0x87 (CP-1252). Since the list of encoded characters is still
growing, although slowly, new name characters come into existence
from time to time.
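
A minimal sketch of that point in Python, assuming an abbreviated
stand-in table (NAME_CHAR_RANGES and is_name_char are hypothetical
names, and the ranges are only a small excerpt, not the normative
list): both byte sequences decode to the same character, and the name
check looks only at the decoded character.

```python
import unicodedata

# Abbreviated stand-in for the normative name-character list.
# Assumption: only ASCII and Latin-1 letters, digits, and a few
# punctuation marks are listed here; the real list is much longer.
NAME_CHAR_RANGES = [
    (0x2D, 0x2E),  # '-' and '.'
    (0x30, 0x3A),  # digits and ':'
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # '_'
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6),  # Latin-1 letters
    (0xD8, 0xF6),
    (0xF8, 0xFF),
]

def is_name_char(ch):
    """Check a decoded character against the stand-in table."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_CHAR_RANGES)

# The same character arriving in two different encodings.
from_utf16 = b"\x20\x21".decode("utf-16-be")
from_cp1252 = b"\x87".decode("cp1252")

print(unicodedata.name(from_utf16))        # character name of the UTF-16 input
print(from_utf16 == from_cp1252)           # same character after decoding
print(is_name_char(from_cp1252))           # verdict does not depend on encoding
```

The check never sees the bytes, so the answer cannot depend on the
encoding; it depends only on which characters the table lists.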
>> That's dangerous: it leads to interop failures. What if the version of
>> Java at the receiving end has slightly different tables from the one
>> at the sending end?
>
> That's not XML interop but UNICODE interop.
The one depends on the other.
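
A rough illustration of the failure mode, assuming two invented table
versions (OLDER_TABLE, NEWER_TABLE, and accepts are hypothetical, and
the extra code point merely stands in for a character that only the
newer table knows about):

```python
# Two hypothetical versions of a name-character table. Assumption:
# both are invented for illustration; 0x0221 plays the role of a
# character present only in the newer version.
OLDER_TABLE = [(0x41, 0x5A), (0x61, 0x7A)]
NEWER_TABLE = OLDER_TABLE + [(0x0221, 0x0221)]

def accepts(name, table):
    """True if every character of a non-empty name is in the table."""
    return bool(name) and all(
        any(lo <= ord(ch) <= hi for lo, hi in table) for ch in name
    )

name = "el\u0221"

sender_ok = accepts(name, NEWER_TABLE)    # sender runs the newer table
receiver_ok = accepts(name, OLDER_TABLE)  # receiver runs the older one

print(sender_ok, receiver_ok)
```

The sender emits a document it considers well-formed and the receiver
rejects it, although neither side has changed its XML code.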
> Aren't such "recovery"
> mechanisms specified in UNICODE?
No.
> And anyways the problem exists with
> implementations based on the current spec, you said it yourself : some
> parsers have tables and some don't.
In which case one is RIGHT and the other is WRONG, because there is
a normative list of characters in the XML Rec. We can check.
> Think abstraction!
Think hairiness.
--
There is / one art || John Cowan <jcowan@reutershealth.com>
no more / no less || http://www.reutershealth.com
to do / all things || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein