xml-dev - Re: [xml-dev] Problem parsing XML file with Xerces-J

To: Michael Kay <mike@saxonica.com>
Subject: Re: [xml-dev] Problem parsing XML file with Xerces-J
From: Midsummer Sun <midsummer.sun@gmail.com>
Date: Fri, 1 Apr 2005 13:05:20 +0530
Cc: xml-dev@lists.xml.org
Reply-to: Midsummer Sun <midsummer.sun@gmail.com>

> I think pre-editing the response XML (i.e. stripping the DTD
> declaration) is better "for me". For my requirement, the DTD in the XML
> is useless to me. Implementing EntityResolver imposes significant
> performance overhead on my program. The parser is always polling for
> callback events. So I think pre-editing with a simple string method is
> far more efficient.

Let me amend my observation above slightly.

My program does the following:
DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(rsp)));

So I am using a DOM parser! But underneath, a DOM parser is probably
driven by SAX-style events, i.e. the parser dispatches events as it
reads the XML document, and the DOM implementation builds the DOM tree
by "assembling input from the SAX implementation". I read this in a
nice article somewhere.

My class implements the EntityResolver interface and calls
builder.setEntityResolver(obj), i.e. it registers the object itself
(obj) as the entity resolver. This is probably just a lightweight
reference held within the JVM, and nothing expensive worth worrying
about.
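To make the registration concrete, here is a minimal sketch of what such a
class might look like. The class name SkipDtdParser and the parse helper are
made up for illustration, and it uses the standard JAXP
DocumentBuilderFactory.newInstance() rather than the Xerces
DocumentBuilderFactoryImpl class from my program, so treat it as an
assumption rather than my actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class: registers itself as the EntityResolver for the DOM builder.
final class SkipDtdParser implements EntityResolver {

    // The parser calls this for each external entity, including the external DTD.
    // Returning an empty source means nothing is fetched for the given systemId.
    // Note: this also blanks out any other external entity, not only the DTD.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }

    // Parses the response string (rsp) into a DOM Document.
    Document parse(String rsp) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(this); // just stores a reference to this object
            return builder.parse(new InputSource(new StringReader(rsp)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }
}
```

Usage would be something like new SkipDtdParser().parse(rsp), with rsp being
the response string that carries the DOCTYPE line.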

So the DOM parser starts to parse the document. If it encounters a DTD
reference, it will call the resolveEntity method. As far as I
understand, it calls this method when it reaches the reference in the
DOCTYPE declaration, before the document element is parsed, because
the DTD can declare entities and attribute defaults that the rest of
the document depends on. resolveEntity is called once per external
entity, so for a single DTD reference it is called only once. So there
is no expensive processing going on, as I thought before ;)
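One way to check when and how often resolveEntity fires is to record the
callbacks in the order the parser delivers them. The sketch below does that
with a plain SAX parser instead of the DOM builder, since SAX exposes the
events directly. The class name CallbackOrder and the record helper are
hypothetical, and the parser is whatever SAXParserFactory.newInstance()
returns, which may not be the same Xerces-J build.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: logs resolveEntity and startElement calls in arrival order.
final class CallbackOrder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        events.add("resolveEntity");
        // Empty replacement so the DTD is not fetched from systemId.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        events.add("startElement:" + qName);
    }

    // Parses xml and returns the recorded callback names.
    static List<String> record(String xml) {
        CallbackOrder handler = new CallbackOrder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.events;
    }
}
```

Printing CallbackOrder.record(rsp) shows where resolveEntity falls relative
to the element callbacks, and counting its entries shows how many times the
resolver was consulted for that document.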

Please do correct me if I am wrong.

If the resource consumption of implementing EntityResolver is the same
as that of the pre-editing solution (or the difference is only
marginal), I'll prefer implementing the EntityResolver interface! It
could be a USP of my application!
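For comparison, the pre-editing solution could be sketched roughly as below.
The class name DoctypeStripper is made up, and the pattern assumes a DOCTYPE
declaration with no internal subset and no '>' inside its quoted identifiers;
anything else is left untouched rather than stripped.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the "simple string method" approach.
final class DoctypeStripper {
    // Matches e.g. <!DOCTYPE rsp SYSTEM "rsp.dtd"> but not a declaration
    // that contains an internal subset in square brackets.
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^\\[>]*>");

    // Returns xml with the first matching DOCTYPE declaration removed.
    static String strip(String xml) {
        return DOCTYPE.matcher(xml).replaceFirst("");
    }
}
```

The stripped string would then go to builder.parse exactly as before. Unlike
the resolver, this depends on the exact shape of the DOCTYPE line in the
response.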

I am eagerly waiting for your opinion.

Best regards,

Follow-Ups:
- RE: [xml-dev] Problem parsing XML file with Xerces-J
  - From: "Michael Kay" <mike@saxonica.com>
