- From: David Megginson <david@megginson.com>
- To: xml-dev@lists.xml.org
- Date: Fri, 22 Sep 2000 14:02:12 -0400 (EDT)
Alex Milowski writes:
> In the ContentHandler interface, there is a method called characters()
> which allows the processor to pass the character data that is a child
> of an element to a processing application. If you introduce XML Schemas,
> this allows one to create a streaming type factory to construct the
> actual type instance without having to first instantiate a Java
> string--which is very good from an optimization standpoint.
Yes, although Java Strings are much more efficient than they used to
be, at least in the Linux VMs. I remember running some tests a
couple of years ago when Tim Bray suggested that string allocation was
expensive, and the overhead of allocating thousands of strings turned
out to be negligible. I think that JDK 1.1 must have fixed some
problems there.
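
To make the optimization under discussion concrete, here is a minimal
sketch of building a typed value straight from the char[] slice that
characters(char[] ch, int start, int length) delivers, without creating
an intermediate String. The class CharSliceParser and its parseInt
method are hypothetical helpers, not part of SAX. The sketch also
assumes the whole value arrives in one slice; a parser may split
character data across several characters() calls, so a real type
factory would have to carry state between calls.

```java
// Hypothetical helper for illustration only; not part of the SAX API.
final class CharSliceParser {
    private CharSliceParser() {}

    /**
     * Parses a decimal int from ch[start .. start+length) without
     * allocating a String on the success path.
     */
    static int parseInt(char[] ch, int start, int length) {
        int end = start + length;
        // Trim surrounding whitespace inside the slice.
        while (start < end && Character.isWhitespace(ch[start])) start++;
        while (end > start && Character.isWhitespace(ch[end - 1])) end--;
        if (start == end) {
            throw new NumberFormatException("empty value");
        }
        int i = start;
        boolean negative = false;
        if (ch[i] == '-' || ch[i] == '+') {
            negative = (ch[i] == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        long value = 0; // long, so the range check below cannot overflow
        for (; i < end; i++) {
            int digit = ch[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit: " + ch[i]);
            }
            value = value * 10 + digit;
            if (value > 2147483648L) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) value = -value;
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }
}
```

A handler could call CharSliceParser.parseInt(ch, start, length) from
inside characters(); whether that saves anything measurable is exactly
the open question here.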
> Unfortunately, the same concept does not exist for attributes. An
> attribute's value has already been constructed into a Java string before
> the application can receive the lexical representation. This seems rather
> unfortunate for XML Schemas and optimization since the typing of "leaf
> nodes" within an XML document is uniform for attributes and element child
> content.
This was a matter of much discussion during the original SAX 1.0
design, and most people preferred it this way.
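
The asymmetry can be seen in a small handler. This is only a sketch:
the class LeafValueHandler, the attribute name "count", and the
assumption that element content consists of non-negative decimal
digits are all made up for illustration. The SAX calls themselves
(Attributes.getValue, characters, DefaultHandler) are the standard
ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only.
final class LeafValueHandler extends DefaultHandler {
    int attributeValue; // built from a String handed over by the parser
    int elementValue;   // built directly from the char[] buffer

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // The value is already a String by the time we see it.
        String raw = atts.getValue("count");
        if (raw != null) {
            attributeValue = Integer.parseInt(raw.trim());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // No String needed: accumulate digits from the slice. Because
        // state is kept in a field, this also works if the parser
        // splits the text across several calls. Non-digit characters
        // are ignored and overflow is not checked in this sketch.
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c >= '0' && c <= '9') {
                elementValue = elementValue * 10 + (c - '0');
            }
        }
    }

    /** Parses the given XML text and returns the populated handler. */
    static LeafValueHandler parse(String xml) {
        LeafValueHandler handler = new LeafValueHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

For a document such as <item count="3">17</item>, both fields end up
as ints, but only the attribute passed through a String on the way.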
> Is it too late to fix this? This would seriously help in building
> optimized XML Schema aware processors.
Yes, it's too late to fix this, at least for now -- I intend a bug-fix
release soon, but no major API changes for a while (except extensions,
which are outside the SAX2 core). I'd be interested in seeing some
profiling data to see how much the string allocation is actually
costing.
Note that a parser (though not a filter, obviously) could perform lazy
allocation of strings -- that might help a bit.
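
As a rough sketch of what lazy allocation might look like inside a
parser, the value can be kept as a slice of the parser's buffer and
turned into a String only when the application asks for it. The class
LazyValue below is hypothetical and does not describe any existing
parser; it also assumes the parser leaves that region of the buffer
untouched for as long as the value may be read (for example, for the
duration of the startElement callback).

```java
// Hypothetical sketch; not taken from any real parser.
final class LazyValue {
    private final char[] buf;
    private final int start;
    private final int length;
    private String cached; // null until someone asks for the String

    LazyValue(char[] buf, int start, int length) {
        this.buf = buf;
        this.start = start;
        this.length = length;
    }

    /** Allocates the String on first use and reuses it afterwards. */
    String asString() {
        if (cached == null) {
            cached = new String(buf, start, length);
        }
        return cached;
    }

    /** True once asString() has been called at least once. */
    boolean isMaterialized() {
        return cached != null;
    }
}
```

A parser's Attributes implementation could hold one such object per
attribute and call asString() from getValue(), so attributes the
application never reads would never cost a String.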
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/