SAX-ext proposal #3: entity encoding, version

Locator infoset extensions

- Two of the infoset properties for documents are not supported by
  the current SAX2 API (including extensions):  the character encoding
  used, and the XML version used.

- These are actually characteristics of all parsed entities, not just
  the document entity, just like the [base URI] currently exposed through
  the Locator interface.

- There may be up to three kinds of encoding name to be concerned with:

    * What's declared inline, using an xml/text decl, or defaulted
      (UTF-8, UTF-16)
    * Sometimes an external declaration, through MIME type, which
      is authoritative but which may not agree with the inline decl
    * For Java, the name of the encoding actually used by a Reader
      will often not match the "winning" declaration name.  (For one
      example, "UTF8" really means "UTF-8".)

  The actual encoding used affects the kind of Unicode normalizations
  that need to be done.  That's what the infoset needs (yes?), and the
  name reported would be the declared one (the external label if there
  is one, else the internal one), not the Java name.
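
  To illustrate the Java naming mismatch mentioned above, here is a
  minimal sketch.  The class and method names (EncodingNames,
  declaredStyleName) are made up for illustration, and it assumes the
  canonical java.nio charset name is an acceptable stand-in for the
  declared label, which may not hold for every encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SAX.
class EncodingNames {
    // Map a Java-style label (such as "UTF8") to the canonical
    // java.nio charset name (such as "UTF-8").
    static String declaredStyleName(String javaName) {
        return Charset.forName(javaName).name();
    }

    public static void main(String[] args) throws Exception {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8);
        // A Reader reports its own (often historical) name for the encoding.
        String readerName = reader.getEncoding();
        System.out.println("Reader name:    " + readerName);
        System.out.println("Canonical name: " + declaredStyleName(readerName));
    }
}
```

  A parser that only knows the Reader's name could normalize it this
  way, but the label from the declaration itself is still the better
  source when it is available.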


    - Define a new org.xml.sax.ext interface:

   public interface Locator2 extends Locator
   {
     public String getXMLVersion ();
     public String getEncoding ();
   }

      Strings returned would be the relevant values, or null if
      the values are not known.  The encoding string would
      reflect the active declaration.

      That would be implemented by Locator objects provided in
      setDocumentLocator() callbacks, to expose this information.

    - Define a new org.xml.sax.ext class implementing that
      interface, inheriting from org.xml.sax.helpers.LocatorImpl

    - Define a new standard feature ID:


   If true, the Locator object passed in setDocumentLocator
   events will also implement the Locator2 interface,
   and can be cast to it.

      Note that because of the way Java typing works, testing that
      feature would be optional:  applications could always try to
      cast (if they were willing to take the performance hit).
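
      The pieces proposed above could fit together roughly as in the
      following sketch.  The interface is repeated so the example
      compiles on its own; SketchLocator2 and EntityInfoHandler are
      hypothetical names, and the setters are an assumption about how
      a parser would fill in the values.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// The proposed interface, repeated here so the sketch is self-contained.
interface Locator2 extends Locator {
    String getXMLVersion();
    String getEncoding();
}

// Hypothetical helper class: LocatorImpl plus the two new properties.
class SketchLocator2 extends LocatorImpl implements Locator2 {
    private String xmlVersion;  // null when not known
    private String encoding;    // null when not known

    public String getXMLVersion() { return xmlVersion; }
    public String getEncoding() { return encoding; }

    // Assumed setters, called by the parser as it enters each entity.
    void setXMLVersion(String version) { xmlVersion = version; }
    void setEncoding(String label) { encoding = label; }
}

// Hypothetical application handler that relies on the cast alone,
// without consulting any feature flag first.
class EntityInfoHandler extends DefaultHandler {
    private Locator2 locator2;  // null if only a plain Locator was supplied

    @Override
    public void setDocumentLocator(Locator locator) {
        locator2 = (locator instanceof Locator2) ? (Locator2) locator : null;
    }

    // Both accessors return null when the information is unavailable.
    String currentEncoding() {
        return locator2 == null ? null : locator2.getEncoding();
    }

    String currentXMLVersion() {
        return locator2 == null ? null : locator2.getXMLVersion();
    }
}
```

      Since the values describe the entity being parsed, a handler
      would normally read them from within later callbacks rather
      than inside setDocumentLocator() itself.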

    - Is it necessary to expose both types of declared encodings?

      If so, proposal:  a new String getEncodingDecl () returns the
      internal label; getEncoding () would return the (authoritative)
      external label.  The internal label might be null if it was
      defaulted.  (Tracking this info costs, and it's not clear any
      apps should actually care, which is why it's omitted.)

    - Is there a better convention to use for extending interfaces
      than the numeric suffix?  (Meta-1)

    - Is the new implementation class really needed?  Alternative:
      update LocatorImpl.  (Meta-2)
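
      For the question about exposing both declared encodings, one
      possible shape is sketched below.  EncodingLabels is a
      hypothetical class, and falling back to the internal label when
      there is no external one is an assumption taken from the
      "externally, else internally" rule earlier in this note.

```java
// Hypothetical holder for the two declared labels of one parsed entity.
final class EncodingLabels {
    private final String external;  // e.g. from a MIME charset parameter; may be null
    private final String internal;  // from the XML/text declaration; null if defaulted

    EncodingLabels(String external, String internal) {
        this.external = external;
        this.internal = internal;
    }

    // Would back getEncoding(): the authoritative external label,
    // else (by assumption) the internal one, else null.
    String getEncoding() {
        return external != null ? external : internal;
    }

    // Would back the proposed getEncodingDecl(): the internal label only.
    String getEncodingDecl() {
        return internal;
    }
}
```

      An application could compare the two results to detect a
      mismatch between the MIME label and the inline declaration.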