On Sat, 2002-01-12 at 14:50, Elliotte Rusty Harold wrote:
> At 12:24 PM +1100 1/11/02, Rick Jelliffe wrote:
> >And also, do surrogate pairs really introduce any issues that
> >are not already present in combining character sequences?
> Yes, I think they do. In particular for this thread, XML 1.0 names
> (and probably XML 1.1 names) can be checked for well-formedness and
> validity without worrying about combining characters. That is, each
> character can be checked in isolation irrespective of which character
> comes before or after it. However, this is not true of surrogates.
> Whether a surrogate char is legal or not depends on what comes before
> or after it.
Elliotte has this right, so far as I can gather from the various
specifications. Combining characters introduce very complex processing,
but that processing is only needed to display or work with XML, not to
test whether the content is legitimately XML in the first place.
Since the only thing Gorille does is check character values against the
productions in the XML 1.0/1.1/whatever spec, combining characters are
not an issue _for Gorille_. Surrogates are an issue for Gorille because
surrogates must be processed in order to properly test character values.
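To make the contrast concrete, here is a minimal sketch of a context-dependent surrogate check. It is not Gorille's actual code; the function names are hypothetical, and the input is assumed to be a list of UTF-16 code units given as integers. Unlike a per-character test, the loop has to look at the neighbouring unit before it can accept or reject a surrogate.

```python
def is_high_surrogate(unit):
    # High (leading) surrogates occupy U+D800..U+DBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Low (trailing) surrogates occupy U+DC00..U+DFFF.
    return 0xDC00 <= unit <= 0xDFFF


def surrogates_well_formed(units):
    """Return True if every surrogate in the sequence is part of a
    high-then-low pair. `units` is a list of UTF-16 code units (ints)."""
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            # A high surrogate is only legal when a low surrogate follows.
            if i + 1 >= len(units) or not is_low_surrogate(units[i + 1]):
                return False
            i += 2  # consume the whole pair
        elif is_low_surrogate(unit):
            # A low surrogate with no preceding high surrogate is illegal.
            return False
        else:
            i += 1  # ordinary code unit, checkable in isolation
    return True
```

For example, `surrogates_well_formed([0x61, 0xD800, 0xDC00])` accepts the sequence, while `surrogates_well_formed([0xD800, 0x61])` rejects it because the high surrogate is not followed by a low one.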
Both combining characters and surrogates remain an issue for the higher
layers above Gorille because Gorille's surrogate processing is purely
internal - the flow of characters from the XML document isn't changed,
and the application will need to perform the actual surrogate
combination. All that it's gained as far as surrogates are concerned is
an assurance that any surrogates are properly formed.
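As a rough illustration of the combination step left to the application, the sketch below turns an already-validated sequence of UTF-16 code units into code points using the standard UTF-16 arithmetic. The function names are hypothetical, and the code assumes the pairing has already been checked, so it does no error handling of its own.

```python
def combine_surrogates(high, low):
    # Standard UTF-16 decoding: each surrogate carries 10 bits of the
    # value (code point - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_units(units):
    """Convert UTF-16 code units (ints) to code points.
    Assumes every surrogate is already known to be properly paired."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: merge it with the low surrogate that follows.
            code_points.append(combine_surrogates(unit, units[i + 1]))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points
```

Here `decode_units([0x61, 0xD834, 0xDD1E])` yields the two code points U+0061 and U+1D11E.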
A Gorille-like approach to processing combining characters might be an
interesting future project, building from a config file that explains
how particular characters combine. (Handy for the Private Use Areas,
certainly.) That would be a much larger project, however.
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!