> >And also, do surrogate pairs really introduce any issues that
> >are not already present in combining character sequences?
>
> Yes, I think they do. In particular for this thread, XML 1.0 names
> (and probably XML 1.1 names) can be checked for well-formedness and
> validity without worrying about combining characters.
The same is true of surrogates in XML 1.0 names: since characters that
would need surrogates are disallowed there, any surrogate in a
name/nmtoken context is malformed. Adjacent characters don't matter.
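
To make the "adjacent chars don't matter" point concrete, here is a rough
sketch, assuming the name arrives as UTF-16 code units and that the XML 1.0
name rules described above apply. The helper names (utf16_units,
name_has_surrogate) are made up for illustration and come from no parser.

```python
def utf16_units(text):
    """Split a Python string into its UTF-16 code units (illustrative helper)."""
    data = text.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def name_has_surrogate(units):
    """True if any code unit lies in the surrogate range D800..DFFF.

    Each unit is judged on its own; no neighbouring unit is inspected,
    because under the rule above a surrogate is never legal in a name.
    """
    return any(0xD800 <= u <= 0xDFFF for u in units)
```

A name made only of BMP characters passes, while one containing U+10000
(which needs a surrogate pair in UTF-16) is flagged without looking at how
the pair is formed.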
Similarly, it is _never_ OK to have an unpaired (or incorrectly paired)
surrogate; that's explicitly disallowed in the XML 1.0 grammar. If an
XML parser passes one of those through, it's an error ... though such
input is normally rejected before the XML parser ever sees it, in
the layer that handles the different character encodings.
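
As a sketch of what that encoding layer might do (an assumption on my part,
not a description of any particular parser), the check below scans UTF-16
code units and reports the index of every surrogate that is not part of a
well-formed high/low pair. The function name unpaired_surrogates is
hypothetical.

```python
def unpaired_surrogates(units):
    """Return indexes of surrogates that are unpaired or wrongly paired."""
    bad = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: legal only when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # skip the whole well-formed pair
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            bad.append(i)
        i += 1
    return bad
```

An empty result means every surrogate is properly paired; anything else
would be grounds for rejecting the input before it reaches the parser.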
- Dave