> > propose that a feature be added to the XMLReaderFactory, perhaps
> > http://xml.org/sax/features/unicode-normalization-checking. If this
> > feature is enabled, the factory returns readers that perform Unicode
> > normalization checking. The EG felt that this feature was most
> > appropriate at the factory level.
I am not sure why the factory level is the most appropriate. Aren't the
getFeature and setFeature methods part of XMLReader itself? Why add
feature maintenance at the factory level when support can simply be queried
on each XMLReader as it is created (if that is even necessary, since it is
likely known a priori)?
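To illustrate the reader-level alternative, here is a minimal sketch using the standard setFeature/getFeature calls. The feature URI is the one proposed above; the helper name and the "fall back to false" behavior are my own assumptions, not anything the EG has specified.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class NormalizationFeatureDemo {

    // The feature URI proposed in this thread. Most current parsers will
    // not recognize it, so the exception path below is the common case.
    static final String NORMALIZATION_FEATURE =
        "http://xml.org/sax/features/unicode-normalization-checking";

    // Hypothetical helper: returns true only if the reader both recognizes
    // and supports the feature; otherwise falls back to false rather than
    // propagating the SAXException. (This fallback is my assumption.)
    static boolean tryEnable(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, true);
            return reader.getFeature(feature);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        System.out.println("normalization checking enabled: "
            + tryEnable(reader, NORMALIZATION_FEATURE));
    }
}
```

Nothing here requires factory involvement: the application probes each reader directly and gets a definite answer at creation time.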
> Seems reasonable. This feature should be false by default, a true
> value should be optional, and if a problem is encountered the error()
> message in the ErrorHandler should be invoked.
<snip really good points about defining errors well>
Define "problem". If the feature is not supported or not recognized, the
appropriate SAXException should be raised during the call to getFeature.
Unfortunately, this proposal says nothing per se about what to do
when the document is or is not fully normalized. Perhaps Locator2 needs
another function, isFullyNormalized(), which may or may not return true for a
given entity or document (here things got fuzzy for me). In cases where the
feature http://xml.org/sax/features/unicode-normalization-checking is not
supported, not recognized, or is false, it would always return false. I don't
believe, based on my quick re-reading, that a non-fully-normalized document
is an error -- it doesn't really seem to be given any status, be it warning
or error. Additional wording about the interaction between such a feature and
the validation feature might be helpful as well.
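To make the suggestion concrete, the hypothetical addition might look like the sketch below. To be clear: neither isFullyNormalized() nor a Locator3 interface exists in any SAX release; this is purely what the proposal above would imply.

```java
import org.xml.sax.ext.Locator2;

// Hypothetical extension, sketched from the suggestion above.
// NOT part of SAX; the name Locator3 is invented for illustration.
public interface Locator3 extends Locator2 {

    /**
     * Returns true only when the unicode-normalization-checking feature
     * is recognized, supported, and set to true, AND the current entity
     * has been verified as fully normalized. In all other cases --
     * feature unrecognized, unsupported, or false -- returns false.
     */
    boolean isFullyNormalized();
}
```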
Additionally, I would question whether a feature should be provided
for normalization transcoding, or whether that is beyond the scope of SAX. I
suspect it would not be imperative, because the results of such a feature
could be semi-reliably deduced if there were an isFullyNormalized function
and the document contained entities in legacy encodings. The current text in
getEncoding might already speak to this... or it may need to be modified:
"Note that some recent W3C specifications require that text in some
encodings be normalized, using Unicode Normalization Form C, before
processing. Such normalization must be performed by applications, and would
normally be triggered based on the value returned by this method. "
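The quoted getEncoding text suggests the application side of this already works today, without any new feature. A rough sketch, assuming Locator2 is available from the parser (the "anything non-UTF is legacy" test and the handler name are my own simplifications, not anything normative):

```java
import java.text.Normalizer;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Application-level NFC checking triggered by Locator2.getEncoding(),
// along the lines of the quoted getEncoding note.
public class NfcCheckingHandler extends DefaultHandler {

    private String encoding;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Locator2 is optional; parsers are not required to provide it.
        if (locator instanceof Locator2) {
            encoding = ((Locator2) locator).getEncoding();
        }
    }

    // Crude heuristic (my assumption): only check entities that arrived
    // in a non-UTF encoding, where transcoding may have denormalized text.
    boolean needsCheck() {
        return encoding != null && !encoding.toUpperCase().startsWith("UTF");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (needsCheck()) {
            String text = new String(ch, start, length);
            if (!Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
                // The application decides: warn, normalize, or reject.
                System.err.println("non-NFC text in entity encoded as "
                    + encoding);
            }
        }
    }
}
```

The point being: an isFullyNormalized() hook plus getEncoding() may already give applications enough to deduce when transcoding-level normalization is needed, without SAX growing a transcoding feature of its own.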
I realize many of these are half-baked questions; I just wanted to raise
all of the issues I could think of early on...