
   Re: SAX, Java, and Namespaces (was Re: Restricted Namespaces for XML)

  • From: Tyler Baker <tyler@infinet.com>
  • To: David Megginson <david@megginson.com>
  • Date: Thu, 04 Feb 1999 19:22:54 -0500

David Megginson wrote:

> Tyler Baker writes:
>
>  > If SAX were to make a simple requirement that all strings that
>  > represent symbols (like names) were to be interned then things
>  > would be a lot cheaper.  The same can be said of the DOM as well.
>
> The problem is that Java's own intern is so terribly inefficient that
> no serious parser writer will use it (most of them have their own,
> custom interns).

As of JDK 1.1.6 things are not so bad, and Java 2 is a bit better, as interned Strings are
managed under the hood using weak references.  It could be made better in the JDK, though.  I
suspect that if they made a real effort in the Java 2 JVM they could make string interning at
least twice as fast as it currently is.  Nevertheless, string interning is a one-time cost, so
let's put that in perspective here.
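
To illustrate why interned names are cheap to work with, here is a minimal sketch.  The class
SymbolDemo and its methods are hypothetical and are not part of SAX or the DOM; the only real API
used is String.intern().  Once a name has been interned, every later comparison against another
interned name can be a reference check instead of a character-by-character equals().

```java
// Sketch only: SymbolDemo and its method names are hypothetical illustrations.
class SymbolDemo {
    // Identity comparison: only meaningful when both arguments were interned.
    static boolean sameSymbol(String a, String b) {
        return a == b;
    }

    // Simulates a parser building a name from its character buffer.
    // new String(...) always creates a fresh, non-interned object.
    static String nameFromBuffer(char[] buf, int start, int length) {
        return new String(buf, start, length);
    }

    public static void main(String[] args) {
        char[] buf = "Rectangle".toCharArray();
        String raw = nameFromBuffer(buf, 0, buf.length);
        String interned = raw.intern(); // the interning cost is paid here

        // String literals are interned by the language, so "Rectangle" is the canonical copy.
        System.out.println(sameSymbol(raw, "Rectangle"));      // false: distinct objects
        System.out.println(sameSymbol(interned, "Rectangle")); // true: same canonical object
        System.out.println(raw.equals("Rectangle"));           // true: same characters
    }
}
```

The trade-off is that the identity check is only safe if every producer of names interns them,
which is why it would have to be a requirement of the API rather than an option.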

> Even then, you wouldn't get any help with the "xmlns:" prefix
> matching, which is the costliest part.  The most efficient way to do

Very true (ouch, ouch, ouch)...

> namespace processing is directly in the parser (which has to look at
> every attribute name anyway), but my own tests have shown that filter
> layer on top of SAX isn't too bad.

Unfortunately, as is the case with all XML or XSL benchmarks, the test data can vary
enormously.  If you have documents in which few elements carry attributes (except of course
namespace attributes), then things probably will not be so bad.  However, if you have lots of
attributes on your elements, then you need to check every single attribute to see if it starts
with "xmlns:" (ouch, ouch, ouch).
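
A rough sketch of the per-attribute check a namespace filter has to make is below.  The class
XmlnsScan and its methods are hypothetical, not SAX API, and treating a bare "xmlns" as a
declaration as well is my addition based on the Namespaces in XML Recommendation.  The point is
that the prefix test runs once for every attribute on every element, whether or not the document
declares any namespaces there.

```java
// Sketch only: XmlnsScan is a hypothetical helper, not part of SAX.
class XmlnsScan {
    // True for the default declaration "xmlns" or any name starting with "xmlns:".
    static boolean isNamespaceDeclaration(String attrName) {
        return attrName.equals("xmlns") || attrName.startsWith("xmlns:");
    }

    // Counts declarations among the attribute names of one element.
    // Every name is examined, so the cost grows with the attribute count.
    static int countDeclarations(String[] attrNames) {
        int count = 0;
        for (String name : attrNames) {
            if (isNamespaceDeclaration(name)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = { "x", "y", "width", "height", "xmlns:myprefix" };
        // Five prefix tests are needed to find the single declaration.
        System.out.println(countDeclarations(names)); // 1
    }
}
```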

So I suppose we should now encourage document designers to model data only as character content
in elements, and to use attributes only for IDs and namespace declarations.

For types like a rectangle, I think using attributes makes a lot more sense in the general
case, but in the presence of "Namespaces in XML" I would change things from:

<Rectangle x="0" y="1" width="59" height="23"/>

to:

<myprefix:Rectangle xmlns:myprefix="YabbaDabbaDoo">
  <myprefix:x>
    0
  </myprefix:x>
  <myprefix:y>
    1
  </myprefix:y>
  <myprefix:width>
    59
  </myprefix:width>
  <myprefix:height>
    23
  </myprefix:height>
</myprefix:Rectangle>

The really sad thing about this is that there tends to be a feeling among a lot of people that
meaningful prefixes do not matter at all.  If XML is ever going to be editable by an average
internet user for some common tasks, meaningful prefixes do matter.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
