   I18n and SAX Locator ( was Re: SAX2 r2 ... last call!)


From: "David Brownell" <david-b@pacbell.net>
> Seems to me that it's time to put this release out!  Unless
> someone reports a significant bug in the next few days,
> I'll soon be finalizing this release, including the javadoc
> updates that are now in CVS.
> Reminder:  The goal of this release is just bugfixes, which
> include doc/spec clarifications.  It should just "drop in" to
> existing environments.
I think two documentation clarifications are needed in org.xml.sax.Locator:

1) For the method getSystemId(), the javadoc says:

"     * <p>If the system identifier is a URL, the parser must resolve it
     * fully before passing it to the application.</p>"

but what "resolve it fully" means is not at all clear. (In Xerces 2 beta 4, it seems
that the literal system identifier is being returned, which is of course no good for
locating the entity if the ID is a relative file path. We are still looking
at it: if it is their bug rather than our usage, we will report it to them, of course.
But we found the wording difficult, and if the Xerces beta has that problem,
presumably they did too.)

I suggest something like the following note should be added:
  "For example, if the system ID is a relative file path, then getSystemId()
should return an absolute file path."
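To make the intent concrete, here is a minimal sketch of the kind of resolution I mean, using java.net.URI. The class name SystemIdResolver, the method resolve() and the example.org URIs are made up for illustration; this is not what any particular parser does.

```java
import java.net.URI;

// Hypothetical helper, for illustration only: resolves a literal system ID
// against the base URI of the entity that referenced it.
class SystemIdResolver {

    // baseUri: absolute URI of the referencing entity (assumed to be known).
    // literalSystemId: the system ID exactly as written in the document.
    static String resolve(String baseUri, String literalSystemId) {
        // URI.resolve() returns the argument unchanged if it is already
        // absolute; otherwise it is resolved relative to the base.
        return URI.create(baseUri).resolve(literalSystemId).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/book.xml";
        // A relative system ID becomes absolute.
        System.out.println(resolve(base, "chapters/ch1.xml"));
        // An absolute system ID is passed through as-is.
        System.out.println(resolve(base, "http://other.example/x.dtd"));
    }
}
```

An application receiving the first result from getSystemId() could open the entity directly, which it cannot do with the bare string "chapters/ch1.xml".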

Actually, I think getSystemId() needs to be superseded by two methods
  getLiteralSystemId()  -- returns the original string unchanged
  getResolvedSystemId() -- returns the resolved, absolute location of the actual resource used
so that error messages can be given in terms of the original document's text
(the literal string used for the system ID) while error-handling systems can
still track down the entity.
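A rough sketch of how the two proposed methods could sit alongside the existing interface. SystemIdLocator and SimpleSystemIdLocator are hypothetical names and not part of SAX; only Locator and LocatorImpl are real. The base URI is assumed to be known to whoever builds the locator.

```java
import java.net.URI;
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical extension interface carrying the two proposed methods.
interface SystemIdLocator extends Locator {
    String getLiteralSystemId();   // the string as written in the document
    String getResolvedSystemId();  // absolute location of the resource used
}

// Hypothetical implementation built on the standard LocatorImpl helper.
class SimpleSystemIdLocator extends LocatorImpl implements SystemIdLocator {
    private final String literal;

    SimpleSystemIdLocator(String baseUri, String literalSystemId) {
        this.literal = literalSystemId;
        // Keep getSystemId() returning the resolved form, as the current
        // javadoc appears to require.
        setSystemId(URI.create(baseUri).resolve(literalSystemId).toString());
    }

    public String getLiteralSystemId() { return literal; }

    public String getResolvedSystemId() { return getSystemId(); }

    public static void main(String[] args) {
        SystemIdLocator loc = new SimpleSystemIdLocator(
                "http://example.org/docs/book.xml", "chapters/ch1.xml");
        // Report in the document's own terms, but keep the real location.
        System.out.println("Error in " + loc.getLiteralSystemId()
                + " (resolved: " + loc.getResolvedSystemId() + ")");
    }
}
```

The error message quotes what the author typed, while a tool following up on the error has the absolute location to hand.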

2) The method getColumnNumber() is misleading for CJK (Chinese/Japanese/Korean)
use, because a Chinese ideograph typically takes two "columns" on screen. It is also
misleading for right-to-left scripts (e.g. an XML document with Arabic data content)
because with bidirectional text anything can happen.

I suggest something like the following note should be added:
  "getColumnNumber() will typically return the number of UTF-16 code units on the
line up to the position in question. This may not correspond to the visual column in a
text editor if the line contains combining character sequences, wide characters,
bi-directional text, or surrogate pairs."
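To show how far the counts can drift apart, here is a small sketch comparing UTF-16 code units, code points, and a very crude display width. The class ColumnCounts and the width rule (2 for ideographs, 0 for non-spacing marks, 1 otherwise) are my own simplification for illustration; real editors use proper width tables, and bidirectional reordering is not modelled at all.

```java
// Hypothetical helper, for illustration only.
class ColumnCounts {

    // What a Java parser would most naturally count: UTF-16 code units.
    static int utf16Units(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Crude approximation of on-screen columns (an assumption, not a standard).
    static int roughDisplayWidth(String s) {
        int width = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isIdeographic(cp)) {
                width += 2;   // wide character
            } else if (Character.getType(cp) != Character.NON_SPACING_MARK) {
                width += 1;   // ordinary character; combining marks add 0
            }
            i += Character.charCount(cp);
        }
        return width;
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>\u4E2D\u6587",   // two CJK ideographs after a start tag
            "\uD840\uDC00",      // U+20000, one ideograph as a surrogate pair
            "e\u0301"            // 'e' followed by a combining acute accent
        };
        for (String s : samples) {
            System.out.println(utf16Units(s) + " units, " + codePoints(s)
                    + " code points, width about " + roughDisplayWidth(s));
        }
    }
}
```

For none of the three samples do all of the counts agree, which is why the javadoc should say which one the column number is likely to be.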

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS