
   DESIGN PROPOSAL: Java XMLIterator

  • To: xml-dev@lists.xml.org
  • Subject: DESIGN PROPOSAL: Java XMLIterator
  • From: John Cowan <jcowan@reutershealth.com>
  • Date: Mon, 17 Dec 2001 17:11:42 -0500
  • User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.6) Gecko/20011120

We currently have two base-level APIs for XML processing in
the Java world: DOM and its variant JDOM, which build in-memory
trees, and SAX, which pushes a stream of events to application
event-handling methods.

This is a first design for XMLIterator, a third base-level API
which allows an application to pull content from XML.  This
avoids the memory demand and navigation issues of DOM, and
is a more straightforward programming model than SAX, which
requires magic data connections between the event handlers in
order to maintain application state.  XMLIterator extends
the familiar Iterator interface, so it models an XML document
as a linear collection of partially specified nodes.
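
To make the SAX state problem concrete, here is a minimal sketch of a handler that collects the text of every title element. The element name "title" and the class name SaxTitleCollector are illustrative assumptions, not part of the proposal; only standard JAXP and SAX2 calls are used. The fields inTitle and text exist only to carry information from one callback to the next.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: collects the text of every <title> element with plain SAX.
class SaxTitleCollector extends DefaultHandler {
    // State that has to survive between separate callbacks.
    private boolean inTitle = false;
    private final StringBuilder text = new StringBuilder();
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so compare against qName.
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text.
        if (inTitle) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            inTitle = false;
            titles.add(text.toString());
        }
    }

    // Parses the given XML string and returns the collected title texts.
    static List<String> collect(String xml) {
        try {
            SaxTitleCollector handler = new SaxTitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<doc><title>A</title><p>x</p><title>B</title></doc>"));
    }
}
```

With a pull API the same job can be written as a single loop, and the flag and buffer become ordinary local variables.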

I am asking this list to help me refine the design of XMLIterator,
and then build two implementations:  SAXAdapter and DOMAdapter,
for layering over SAX and DOM parsers respectively.


// This is version 0.1 of XMLIterator
// It supports SAX2 events only, and does not handle
// prefix-mapping events (because I haven't figured out
// what the Right Thing is).  Stuff provided only by
// DOM should be factored in too.

package org.ccil.cowan.iter;
public interface XMLIterator
	extends java.util.Iterator {

// Processing model:  Iteration starts with the
// ELEMENT node of the document element, and on successive
// calls to next(), proceeds through the document,
// returning ELEMENT, ATTRIBUTE, SKIPPED_ENTITY,
// END_ELEMENT, and PI nodes in document order
// (except that ATTRIBUTE nodes appear just after
// their owner ELEMENT node in arbitrary order),
// followed by the END node.

// hasNext is inherited from Iterator, and returns true
// if the current node is not an END node

// next is inherited from Iterator, and returns an XMLNode
// object, which may be self; XMLIterators are encouraged
// to play the role of both the iterator and the component
// object, to avoid excessive object creation (as such,
// the XMLNode is considered invalid after the following
// invocation of next).

// remove is inherited from Iterator, and throws an
// exception, since XMLIterator is read-only

// Node types: legal return values of XMLNode.getType method
public static int END = 0;
public static int ELEMENT = 1;
public static int ATTRIBUTE = 2;
public static int SKIPPED_ENTITY = 3;
public static int END_ELEMENT = 4;
public static int PI = 5;

// Attribute types: legal return values of getAttributeType method
public static int CDATA = 0;
public static int ID = 1;
public static int IDREF = 2;
public static int IDREFS = 3;
public static int NMTOKEN = 4;	// also used for enumerations
public static int NMTOKENS = 5;
public static int ENTITY = 6;
public static int ENTITIES = 7;
public static int NOTATION = 8;


// Convenience methods

// If the current node is an ELEMENT or ATTRIBUTE node,
// skip all nodes to the next non-ATTRIBUTE node.
// This allows us to ignore attributes if we do not care
// about them any more.
public void skipAttributes();

// Skip all nodes up to and including the END_ELEMENT node
// corresponding to the most recently seen ELEMENT node
// (the current node, if that is an ELEMENT node)
public void skipElement();

}
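
The note on next() above says the returned XMLNode may be the iterator itself and becomes invalid after the following call to next(). A minimal sketch of what that means for callers follows. The class SelfNode is a hypothetical stand-in that exposes only getLocalName; it is not one of the proposed adapters. The point is that values should be copied out of the node before advancing.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class ReuseSketch {
    // Hypothetical iterator that is its own node, as the proposal encourages.
    static final class SelfNode implements Iterator<SelfNode> {
        private final String[] names;
        private int pos = -1;               // index of the current node

        SelfNode(String... names) { this.names = names; }

        public boolean hasNext() { return pos + 1 < names.length; }

        public SelfNode next() {
            if (!hasNext()) throw new NoSuchElementException();
            pos++;
            return this;                    // no new object per node
        }

        public void remove() { throw new UnsupportedOperationException(); }

        public String getLocalName() { return names[pos]; }
    }

    // Returns { value copied before advancing, value read from the stale reference }.
    static String[] demo() {
        SelfNode it = new SelfNode("first", "second");
        SelfNode kept = it.next();             // reference held across next()
        String copied = kept.getLocalName();   // copied while still valid
        it.next();                             // "kept" now describes the second node
        return new String[] { copied, kept.getLocalName() };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " / " + r[1]);
    }
}
```

Here the copied string still names the first node, while the retained reference silently reports the second.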


package org.ccil.cowan.iter;
public interface XMLNode {

// Read-only properties of the current node
// Lazy implementation is encouraged

// Returns the current Locator object
public org.xml.sax.Locator getLocator();

// Returns a node type code
public int getType();

// If the current node is an ATTRIBUTE node, return
// an attribute type code
public int getAttributeType();

// If the current node is an ELEMENT or ATTRIBUTE node,
// and namespace URI information is available,
// return it
public String getNSURI();

// If the current node is an ELEMENT or ATTRIBUTE node,
// and QName information is available, return it
public String getQName();

// If the current node is an ELEMENT or ATTRIBUTE node, and
// local name information is available, return it;
// if the current node is a PI node, return the target;
// if the current node is a SKIPPED_ENTITY node,
// return the entity name
public String getLocalName();

// If the current node is an ELEMENT node,
// return an Attributes object containing the attributes
public org.xml.sax.Attributes getAttributes();

// If the current node is an ELEMENT node,
//   return all text content up to the next tag;
// if the current node is an ATTRIBUTE node,
//   return the normalized attribute value;
// if the current node is an END_ELEMENT node,
//   return all text content up to the next tag;
// if the current node is a PI node,
//   return the content of the PI
public String getValue();

// If the current node is an ELEMENT or END_ELEMENT node,
// return true if the value property consists of ignorable whitespace
public boolean isIgnorableWhitespace();

}
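
As a rough illustration of how an application might drive the two interfaces, the sketch below runs a pull loop over a hand-built node list. Everything here is an assumption made for the example: Node is a reduced stand-in for XMLNode with three properties, ListBackedIterator is a hypothetical in-memory iterator (not the proposed SAXAdapter or DOMAdapter), and skipElement is read as "skip to the end of the innermost element still open". The node list corresponds to <doc><meta id="1"><note>skip me</note></meta><title lang="en">Hello</title></doc>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class PullSketch {
    // Node type codes copied from the proposal.
    static final int END = 0, ELEMENT = 1, ATTRIBUTE = 2, END_ELEMENT = 4;

    // Reduced stand-in for XMLNode: only the three properties read below.
    static final class Node {
        private final int type;
        private final String localName, value;
        Node(int type, String localName, String value) {
            this.type = type; this.localName = localName; this.value = value;
        }
        int getType() { return type; }
        String getLocalName() { return localName; }
        String getValue() { return value; }
    }

    // Hypothetical list-backed iterator, used only to exercise the loop.
    static final class ListBackedIterator implements Iterator<Node> {
        private final List<Node> nodes;
        private int pos = 0;                 // index of the node next() will return
        ListBackedIterator(List<Node> nodes) { this.nodes = nodes; }

        public boolean hasNext() { return pos < nodes.size(); }
        public Node next() {
            if (!hasNext()) throw new NoSuchElementException();
            return nodes.get(pos++);
        }
        public void remove() { throw new UnsupportedOperationException(); }

        // Skip any ATTRIBUTE nodes that come next.
        public void skipAttributes() {
            while (pos < nodes.size() && nodes.get(pos).getType() == ATTRIBUTE) pos++;
        }

        // Assumption: called while still inside the element to be skipped.
        public void skipElement() {
            int depth = 1;
            while (depth > 0 && pos < nodes.size()) {
                int t = nodes.get(pos).getType();
                if (t == END) break;         // never consume the END node
                pos++;
                if (t == ELEMENT) depth++;
                else if (t == END_ELEMENT) depth--;
            }
        }
    }

    // Node stream for the sample document, in the order the proposal describes.
    static ListBackedIterator sample() {
        return new ListBackedIterator(Arrays.asList(
            new Node(ELEMENT, "doc", ""),
            new Node(ELEMENT, "meta", ""),
            new Node(ATTRIBUTE, "id", "1"),
            new Node(ELEMENT, "note", "skip me"),
            new Node(END_ELEMENT, "note", ""),
            new Node(END_ELEMENT, "meta", ""),
            new Node(ELEMENT, "title", "Hello"),
            new Node(ATTRIBUTE, "lang", "en"),
            new Node(END_ELEMENT, "title", ""),
            new Node(END_ELEMENT, "doc", ""),
            new Node(END, null, null)));
    }

    // Collects title text, ignoring the whole meta subtree and all attributes.
    static List<String> titles(ListBackedIterator it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.getType() != ELEMENT) continue;
            if (n.getLocalName().equals("meta")) {
                it.skipElement();
            } else if (n.getLocalName().equals("title")) {
                out.add(n.getValue());
                it.skipAttributes();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(titles(sample()));
    }
}
```

All application state lives in local variables of one method, which is the programming-model advantage claimed over SAX.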


-- 
Not to perambulate             || John Cowan <jcowan@reutershealth.com>
    the corridors               || http://www.reutershealth.com
during the hours of repose     || http://www.ccil.org/~cowan
    in the boots of ascension.  \\ Sign in Austrian ski-resort hotel





 
