OASIS Mailing List Archives

[ANNOUNCE] PullParser 1.1



Pull Parser 1.1 is ideally suited for applications that need a very small
XML parser: the jar file with the compiled classes is around 20 KB.
Its simple pull-parsing API (identical for Java and C++) is well suited
for unmarshalling data structures from XML (as in SOAP). It also has
small memory requirements, and a parser instance can be reused.
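For readers new to the pull model, here is a minimal sketch of unmarshalling a small structure by pulling events one at a time. It uses the JDK's standard StAX API (javax.xml.stream) rather than PullParser's own API, and the <point> element and the PointReader/readPoint names are hypothetical. The factory is created once and reused across documents, which is the closest StAX analogue to reusing a parser instance.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; StAX stands in for PullParser's API here.
final class PointReader {
    // One factory, reused for every document parsed.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Unmarshals <point><x>..</x><y>..</y></point> into {x, y}.
    static int[] readPoint(String xml) throws XMLStreamException {
        XMLStreamReader r = FACTORY.createXMLStreamReader(new StringReader(xml));
        try {
            r.nextTag();                                   // pull: <point>
            r.require(XMLStreamConstants.START_ELEMENT, null, "point");
            r.nextTag();                                   // pull: <x>
            r.require(XMLStreamConstants.START_ELEMENT, null, "x");
            int x = Integer.parseInt(r.getElementText().trim()); // now on </x>
            r.nextTag();                                   // pull: <y>
            r.require(XMLStreamConstants.START_ELEMENT, null, "y");
            int y = Integer.parseInt(r.getElementText().trim()); // now on </y>
            r.nextTag();                                   // pull: </point>
            r.require(XMLStreamConstants.END_ELEMENT, null, "point");
            return new int[] {x, y};
        } finally {
            r.close();
        }
    }
}
```

The application, not the parser, decides what to read next: each nextTag() or getElementText() call advances the parser only as far as the code needs, which is what makes the pull style convenient for deserialization.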

Full source code for both Java and C++ (tested under Linux, Solaris and
W2K) is available under an open source license (see LICENSE.txt) from

     http://www.extreme.indiana.edu/soap

In short, the advantages:
 * supports most of XML 1.0 (except validation and external entities)
 * source code for Java & C++ is available under open source license
 * the Java and C++ versions are almost identical
 * pull interface: ideal for deserializing XML objects (as in SOAP)
 * fast and simple (a thin wrapper around the XmlTokenizer class that
   adds about 10% extra time for big documents and about 50% for small
   documents)
 * lightweight memory model with minimal allocation: element content
   and attributes are read only on explicit method calls, and both
   StartTag and EndTag objects can be reused during parsing
 * small footprint: total compiled size around 20 KB
 * namespace parsing is enabled by default (can be switched off)
 * support for mixed content can be explicitly disabled
 * minimal memory use: nothing is allocated beyond the input and content
   buffers (which can grow) and the list of attributes (reused during
   parsing)
 * fast: all tokenizing is done in one function (a simple automaton)
 * the XML tokenizer supports on-demand parsing of
   Characters, CDSect, Comments, PIs, etc.
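To make "all tokenizing done in one function" and "on demand" concrete, here is a hypothetical sketch of a single-method tokenizer automaton. It is not the XmlTokenizer class from PullParser: the TinyTokenizer name and its constants are invented for illustration, and it recognizes only start tags, end tags and character data (no attributes, empty-element tags, comments, PIs, CDATA or entities).

```java
// Hypothetical sketch, NOT PullParser's XmlTokenizer: handles only
// <name>, </name> and character data; attributes are left in the raw tag text.
final class TinyTokenizer {
    public static final int START_TAG = 1, END_TAG = 2, CONTENT = 3, END_DOCUMENT = 4;

    // States of the automaton.
    private static final int S_START = 0, S_TEXT = 1, S_LT = 2, S_START_NAME = 3, S_END_NAME = 4;

    private final String in;
    private int pos = 0;
    private String text; // tag body or character data of the last token

    public TinyTokenizer(String in) { this.in = in; }

    public String getText() { return text; }

    // All tokenizing happens here; a token is produced only when next() is called.
    public int next() {
        StringBuilder buf = new StringBuilder();
        int state = S_START;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            switch (state) {
                case S_START:                     // decide what kind of token begins here
                    if (c == '<') { state = S_LT; pos++; }
                    else state = S_TEXT;          // re-examine c in S_TEXT
                    break;
                case S_TEXT:                      // character data runs until the next '<'
                    if (c == '<') { text = buf.toString(); return CONTENT; }
                    buf.append(c);
                    pos++;
                    break;
                case S_LT:                        // just after '<': end tag or start tag?
                    if (c == '/') { state = S_END_NAME; pos++; }
                    else state = S_START_NAME;    // re-examine c as part of the name
                    break;
                case S_START_NAME:
                case S_END_NAME:                  // collect the tag body up to '>'
                    pos++;
                    if (c == '>') {
                        text = buf.toString().trim();
                        return state == S_END_NAME ? END_TAG : START_TAG;
                    }
                    buf.append(c);
                    break;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
        if (state == S_TEXT) { text = buf.toString(); return CONTENT; }
        if (state != S_START) throw new IllegalStateException("unexpected end of input inside a tag");
        return END_DOCUMENT;
    }
}
```

Each call runs the automaton just long enough to finish one token, so the caller controls how much of the input is examined.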

It also has some limitations:
 * this is the final beta version, so it may still have bugs :-)
 * this is a non-validating parser and it does not parse the DTD (it
   recognizes only the predefined entities), but most new applications
   will use XML Schemas and ignore DTDs...
 * the C++ version does not support Unicode (wchar_t)
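As an illustration of what "recognizes only predefined entities" means in practice, here is a hypothetical helper (not code from PullParser) that expands the five XML 1.0 predefined entities (lt, gt, amp, apos, quot) and rejects any other named entity, whose declaration would live in the DTD. It also expands numeric character references, which XML 1.0 defines independently of any DTD; whether PullParser handles those is not stated above, so treat that part as an assumption.

```java
// Hypothetical helper, not PullParser code.
final class PredefinedEntities {
    // Expands &lt; &gt; &amp; &apos; &quot; and numeric character references;
    // any other named entity is rejected because its definition is in the DTD.
    static String expand(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '&') { out.append(c); i++; continue; }
            int semi = s.indexOf(';', i);
            if (semi < 0) throw new IllegalArgumentException("unterminated reference at " + i);
            String name = s.substring(i + 1, semi);
            switch (name) {
                case "lt":   out.append('<');  break;
                case "gt":   out.append('>');  break;
                case "amp":  out.append('&');  break;
                case "apos": out.append('\''); break;
                case "quot": out.append('"');  break;
                default:
                    if (name.startsWith("#x")) {        // hexadecimal character reference
                        out.appendCodePoint(Integer.parseInt(name.substring(2), 16));
                    } else if (name.startsWith("#")) {  // decimal character reference
                        out.appendCodePoint(Integer.parseInt(name.substring(1)));
                    } else {
                        throw new IllegalArgumentException("undeclared entity: &" + name + ";");
                    }
            }
            i = semi + 1;
        }
        return out.toString();
    }
}
```

A document using, say, &nbsp; (declared in an XHTML DTD) fails here, which is the practical cost of not reading the DTD.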

It was designed to make SOAP deserialization fast and very easy (for
both recursive and multi-ref values); for example, code to deserialize
a String could look like this:

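  // Deserializes a SOAP string. Expects the parser to be positioned on the
  // start tag described by stag; leaves it on the matching end tag.
  // (The enclosing class needs: import java.io.IOException;)
  public String readString(XmlPullParser pp, StartTag stag)
    throws DeserializeException, XmlPullParserException, IOException
  {
    // xsi:null="1" marks a null reference: the element must then be empty
    String xs = stag.getValue(Soap.XSI_NS, "null");
    if( "1".equals(xs) ) {
      if(pp.next() != XmlPullParser.END_TAG)
        throw new DeserializeException("expected end tag");
      return null;
    }
    int event = pp.next();
    // an element with no content (e.g. <s></s>) is the empty string
    if(event == XmlPullParser.END_TAG)
      return "";
    if(event != XmlPullParser.CONTENT)
      throw new DeserializeException("expected content");
    String s = pp.readContent();
    if(pp.next() != XmlPullParser.END_TAG)
      throw new DeserializeException("expected end tag");
    return s;
  }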
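For comparison, here is a minimal, self-contained sketch of the same deserialization steps using the JDK's StAX API (javax.xml.stream) instead of PullParser. The StaxReadString class and parse() helper are hypothetical, and the value of XSI_NS is an assumption: the 1999 XML Schema instance namespace, which defined xsi:null (the value of Soap.XSI_NS above is not shown).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX version of readString; not PullParser's API.
final class StaxReadString {
    // Assumption: stands in for Soap.XSI_NS (1999 schema-instance namespace).
    static final String XSI_NS = "http://www.w3.org/1999/XMLSchema-instance";

    // Expects r on a START_ELEMENT; leaves it on the matching END_ELEMENT.
    static String readString(XMLStreamReader r) throws XMLStreamException {
        if ("1".equals(r.getAttributeValue(XSI_NS, "null"))) {
            // a null reference must be an empty element
            if (r.next() != XMLStreamConstants.END_ELEMENT)
                throw new XMLStreamException("expected end tag");
            return null;
        }
        // getElementText() concatenates the text content ("" for an empty
        // element) and throws if the element contains child elements
        return r.getElementText();
    }

    // Convenience wrapper: parses a whole document whose root is the string element.
    static String parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            r.nextTag(); // move to the root start tag
            return readString(r);
        } finally {
            r.close();
        }
    }
}
```

StAX is namespace-aware by default, so getAttributeValue(XSI_NS, "null") matches the attribute whatever prefix the document binds to that namespace.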

Your comments and suggestions are welcome!

Thanks,

Aleksander Slominski, 
soaprmi@extreme.indiana.edu
IU Extreme! Computing Lab
--
Aleksander Slominski, IU, http://www.extreme.indiana.edu/~aslom
As I look afar I see neither cherry Nor tinted leaves Just a modest hut
on the coast In the dusk of Autumn nightfall - Fujiwara no Teika (1162-1241)