SAX-ext proposal #3: entity encoding, version
- From: David Brownell <david-b@pacbell.net>
- To: xml-dev@lists.xml.org, sax-devel@lists.sourceforge.net
- Date: Wed, 01 Aug 2001 18:00:32 -0700
Locator infoset extensions
- Two of the infoset properties for documents are not supported by
the current SAX2 API (including extensions): the character encoding
used, and the XML version used.
- These are actually characteristics of all parsed entities, not just
the document entity, just like the [base URI] currently exposed through
the Locator interface.
- There may be up to three kinds of encoding name to be concerned with:
    * What's declared inline, using an xml/text decl, or defaulted
      (UTF-8, UTF-16)
    * Sometimes an external declaration, through MIME type, which
      is authoritative but which may not agree with the inline decl
    * For Java, the name of the encoding actually used by a Reader,
      which will often not match the "winning" declaration name. (For
      example, the Java name "UTF8" really means "UTF-8".)
The encoding actually used affects which Unicode normalizations
need to be done. That's what the infoset needs (yes?), and it'd be
reported under the declared name (the external declaration if there
is one, else the inline one), not under the Java name.
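To make the naming issue concrete, here is a minimal sketch, not part of the proposal. The class EncodingNames and both of its methods are hypothetical helpers invented for illustration. It assumes the external label simply wins whenever one is present, and it uses java.nio.charset.Charset to turn a Java-specific name into the standard one.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of SAX: illustrates the kinds of names above.
class EncodingNames {
    /**
     * Picks the declared label that "wins": the external one (for example a
     * MIME charset parameter) if present, else the inline declaration.
     * Returns null if neither exists, meaning the encoding was defaulted.
     */
    static String winningLabel(String externalLabel, String inlineLabel) {
        return externalLabel != null ? externalLabel : inlineLabel;
    }

    /** Maps a Java-specific name such as "UTF8" to the canonical charset name. */
    static String canonicalName(String javaName) {
        return Charset.forName(javaName).name();
    }

    public static void main(String[] args) {
        System.out.println(winningLabel("ISO-8859-1", "UTF-8")); // ISO-8859-1
        System.out.println(winningLabel(null, "UTF-16"));        // UTF-16
        System.out.println(canonicalName("UTF8"));               // UTF-8
    }
}
```

A parser would presumably report the winning label as written in the declaration; the canonicalName step only matters when the only name at hand is the one a Reader reports.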
PROPOSAL
- Define a new org.xml.sax.ext interface:
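    public interface Locator2 extends Locator
    {
        // XML version of the entity, or null if not known
        public String getXMLVersion ();
        // declared encoding name in effect, or null if not known
        public String getEncoding ();
    }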
Strings returned would be the relevant values, or null if
the values are not known. The encoding string would
reflect the active declaration.
That would be implemented by Locator objects provided in
setDocumentLocator() callbacks, to expose this information.
- Define a new org.xml.sax.ext class implementing that
interface, inheriting from org.xml.sax.helpers.LocatorImpl
- Define a new standard feature ID:
    http://xml.org/sax/features/use-locator2
    Read-only
    If true, the Locator object passed in setDocumentLocator
    events will also implement the Locator2 interface,
    and can be cast to it.
Note that because of the way Java typing works, testing that
feature would be optional: applications could always try to
cast (if they were willing to take the performance hit).
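Here is a rough sketch of how an application might use this, under some assumptions. Locator2 is declared locally exactly as proposed above so that the example compiles on its own. SketchLocator2 and EntityInfoHandler are hypothetical names made up for illustration, and the stand-in locator just returns fixed values rather than anything a real parser would track.

```java
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// The interface as proposed, declared locally so the sketch is self-contained.
interface Locator2 extends Locator {
    String getXMLVersion();
    String getEncoding();
}

// Hypothetical stand-in for the proposed implementation class.
class SketchLocator2 extends LocatorImpl implements Locator2 {
    private final String version;
    private final String encoding;

    SketchLocator2(String version, String encoding) {
        this.version = version;
        this.encoding = encoding;
    }

    public String getXMLVersion() { return version; }
    public String getEncoding() { return encoding; }
}

// Hypothetical handler that records version and encoding when available.
class EntityInfoHandler extends DefaultHandler {
    static final String USE_LOCATOR2 = "http://xml.org/sax/features/use-locator2";

    private Locator2 locator2; // stays null if the parser gave a plain Locator
    String version;
    String encoding;

    /** Optional up-front check; a parser that doesn't know the feature counts as "no". */
    static boolean advertisesLocator2(XMLReader reader) {
        try {
            return reader.getFeature(USE_LOCATOR2);
        } catch (SAXException e) { // not recognized, or not supported
            return false;
        }
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        // The cast-based approach: test the type instead of the feature flag.
        if (locator instanceof Locator2) {
            locator2 = (Locator2) locator;
        }
    }

    @Override
    public void startDocument() {
        // Query the locator during an event callback, not when it is handed over.
        if (locator2 != null) {
            version = locator2.getXMLVersion();
            encoding = locator2.getEncoding();
        }
    }

    public static void main(String[] args) {
        EntityInfoHandler handler = new EntityInfoHandler();
        handler.setDocumentLocator(new SketchLocator2("1.0", "UTF-8"));
        handler.startDocument();
        System.out.println(handler.version + " " + handler.encoding); // 1.0 UTF-8

        // An XMLFilterImpl with no parent recognizes no features at all.
        System.out.println(advertisesLocator2(new XMLFilterImpl())); // false
    }
}
```

The instanceof test is what makes the feature check optional: an application that skips getFeature() still degrades cleanly when handed a plain Locator.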
QUESTIONS:
- Is it necessary to expose both types of declared encodings?
If so, proposal: a new String getEncodingDecl () returns the
internal label; getEncoding () would return the (authoritative)
external label. The internal label might be null if it was
defaulted. (Tracking this info costs, and it's not clear any
apps should actually care, which is why it's omitted.)
- Is there a better convention to use for extending interfaces
than the numeric suffix? (Meta-1)
- Is the new implementation class really needed? Alternative:
update LocatorImpl. (Meta-2)
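For the first question, here is a small sketch of what exposing both labels could look like to an application. Everything in it is an assumption made for illustration: EncodingLabels is a hypothetical cut-down interface carrying only the two accessors under discussion, and LabelCheck is an invented helper. It compares labels through java.nio.charset.Charset so that aliases and case differences don't count as a disagreement.

```java
import java.nio.charset.Charset;

// Hypothetical: just the two accessors under discussion in the first question.
interface EncodingLabels {
    String getEncoding();     // authoritative (external) label
    String getEncodingDecl(); // inline label, null if it was defaulted
}

// Hypothetical helper showing one thing an application might do with both.
class LabelCheck {
    /** Simple fixed-value holder, for illustration only. */
    static EncodingLabels of(final String external, final String inline) {
        return new EncodingLabels() {
            public String getEncoding() { return external; }
            public String getEncodingDecl() { return inline; }
        };
    }

    /**
     * True when both labels are present and name different charsets.
     * Charset.forName throws for names this JVM does not know.
     */
    static boolean labelsDisagree(EncodingLabels labels) {
        String external = labels.getEncoding();
        String inline = labels.getEncodingDecl();
        if (external == null || inline == null) {
            return false;
        }
        return !Charset.forName(external).equals(Charset.forName(inline));
    }

    public static void main(String[] args) {
        System.out.println(labelsDisagree(of("ISO-8859-1", "UTF-8"))); // true
        System.out.println(labelsDisagree(of("UTF-8", "utf-8")));      // false
        System.out.println(labelsDisagree(of("UTF-8", null)));         // false
    }
}
```

Detecting this kind of mismatch is about the only use that comes to mind for the second accessor, which may itself be an argument for leaving it out.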