From: "David Brownell" <email@example.com>
> Seems to me that it's time to put this release out! Unless
> someone reports a significant bug in the next few days,
> I'll soon be finalizing this release, including the javadoc
> updates that are now in CVS.
> Reminder: The goal of this release is just bugfixes, which
> include doc/spec clarifications. It should just "drop in" to
> existing environments.
I think two documentation clarifications are needed in org.xml.sax.Locator:
1) For the method getSystemId() it says:
" * <p>If the system identifier is a URL, the parser must resolve it
* fully before passing it to the application.</p>"
but this is not at all clear. (In Xerces 2 beta 4, the literal system
identifier seems to be returned, which is no good for locating the entity
if the ID is a relative file path. We are still looking at it: if it is their
bug rather than our usage, we will report it to them, of course. But we found
the wording difficult, and if the Xerces beta has that problem, presumably
its developers did too.)
I suggest something like the following note should be added:
"For example, if the system ID is a relative file path, then getSystemId()
should return an absolute file path."
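
To make "resolve it fully" concrete, here is a minimal sketch of one way to resolve a literal system ID against the base URI of the entity that declared it, using java.net.URI. The class name SystemIdResolver and the example.com URLs are made up for illustration; this is not how any particular parser does it.

```java
import java.net.URI;

// Hypothetical helper, not part of SAX: resolves a literal system
// identifier against the base URI of the declaring entity.
final class SystemIdResolver {
    static String resolve(String baseUri, String literalSystemId) {
        // URI.resolve() returns the argument unchanged if it is already
        // absolute; otherwise it is interpreted relative to the base URI.
        return URI.create(baseUri).resolve(literalSystemId).toString();
    }

    public static void main(String[] args) {
        // A relative system ID becomes an absolute one.
        System.out.println(resolve("http://example.com/docs/book.xml",
                                   "chapters/ch1.xml"));
        // An absolute system ID is passed through as-is.
        System.out.println(resolve("http://example.com/docs/book.xml",
                                   "http://other.example.org/book.dtd"));
    }
}
```

With a clarified note, an application could expect getSystemId() to hand back a value in the first, already-resolved form rather than the bare "chapters/ch1.xml".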
Actually, I think getSystemId() needs to be superseded by two functions:
getLiteralSystemId() -- returns the original string unchanged
getResolvedSystemId() -- returns the resolved, absolute location of the actual resource used
so that error messages can be given in terms of the original document's text
(the literal string used for the system ID) while error-handling systems can
still track down the entity.
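
As a rough sketch of what that could look like, assuming the two methods were added in an extension interface rather than to Locator itself: the names SystemIdLocator and SimpleSystemIdLocator are invented here, and nothing like this exists in SAX today.

```java
import org.xml.sax.Locator;

// Hypothetical extension of org.xml.sax.Locator carrying both forms
// of the system identifier.
interface SystemIdLocator extends Locator {
    String getLiteralSystemId();   // the string exactly as written in the document
    String getResolvedSystemId();  // the absolute location actually used
}

// Trivial immutable implementation, for illustration only.
final class SimpleSystemIdLocator implements SystemIdLocator {
    private final String literal;
    private final String resolved;
    private final int line;
    private final int column;

    SimpleSystemIdLocator(String literal, String resolved, int line, int column) {
        this.literal = literal;
        this.resolved = resolved;
        this.line = line;
        this.column = column;
    }

    public String getLiteralSystemId() { return literal; }
    public String getResolvedSystemId() { return resolved; }

    // Existing Locator methods; getSystemId() reports the resolved form.
    public String getPublicId() { return null; }
    public String getSystemId() { return resolved; }
    public int getLineNumber() { return line; }
    public int getColumnNumber() { return column; }
}
```

An error handler could then print getLiteralSystemId() in the message shown to the user and open getResolvedSystemId() to find the entity.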
2) The method getColumnNumber() is misleading for CJK (Chinese/Japanese/Korean)
use, because a Chinese ideograph usually occupies two "columns" in a text editor.
It is also misleading for right-to-left scripts (e.g. an XML document with Arabic
data content) because with bidirectional text anything can happen.
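
To show how far the counts can drift apart even before display width comes into it, here is a small sketch comparing a column counted in UTF-16 characters (Java chars) with one counted in Unicode code points. The class ColumnCounts and its methods are made up for this example, and the 1-based numbering is an assumption; neither count is the visual column.

```java
// Hypothetical helper comparing two ways of counting a 1-based column.
final class ColumnCounts {
    // Column counted in UTF-16 characters, i.e. Java char positions.
    static int utf16Column(String line, int charOffset) {
        return charOffset + 1;
    }

    // Column counted in Unicode code points; a surrogate pair counts once.
    static int codePointColumn(String line, int charOffset) {
        return line.codePointCount(0, charOffset) + 1;
    }

    public static void main(String[] args) {
        // U+20000 (a CJK ideograph outside the BMP, two chars in UTF-16),
        // then 'e' followed by a combining acute accent, then a tag.
        String line = "\uD840\uDC00e\u0301<a/>";
        int offset = line.indexOf('<');
        System.out.println("UTF-16 column:     " + utf16Column(line, offset));
        System.out.println("code point column: " + codePointColumn(line, offset));
    }
}
```

A text editor would place the "<" at yet another position, since it draws the accented letter as one cell and, typically, the ideograph as two.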
I suggest something like the following note should be added:
"getColumnNumber() will typically be the count of UTF-16 characters up to
the position in question. This may not correspond to the visual column in a text editor
if the line contains combining character sequences, wide characters, bi-directional text,