
   Re: [xml-dev] Problem parsing XML file with Xerces-J

  • To: Michael Kay <mike@saxonica.com>
  • Subject: Re: [xml-dev] Problem parsing XML file with Xerces-J
  • From: Midsummer Sun <midsummer.sun@gmail.com>
  • Date: Fri, 1 Apr 2005 14:55:41 +0530
  • Cc: xml-dev@lists.xml.org
  • In-reply-to: <424d0215.3d4bc219.3207.ffff9c02SMTPIN_ADDED@mx.gmail.com>
  • References: <bb5e8b8605033123356074d3e1@mail.gmail.com> <424d0215.3d4bc219.3207.ffff9c02SMTPIN_ADDED@mx.gmail.com>
  • Reply-to: Midsummer Sun <midsummer.sun@gmail.com>

My two questions are still unanswered:
1) Which method is faster: implementing EntityResolver or pre-editing
the XML file? This matters to me. My program polls a remote process
very frequently (every 15 seconds) and fetches the XML documents. If
the two methods differ in performance, the response time of my program
will suffer with one of them, so I must pick whichever is faster.

I am sure somebody has the answer..

2) Is there a way to avoid creating a redundant resource (like x.dtd below)?

 public InputSource resolveEntity(java.lang.String publicId,
                                  java.lang.String systemId)
 {
     InputSource is = new InputSource();
     is.setSystemId("file:///C:/x.dtd");

     return is;
 }

Somehow this doesn't look good to me. I'd be happy if something like
"" or null could work ;)

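A minimal sketch of one way to avoid the dummy x.dtd file, assuming the DTD content really is not needed: return an InputSource backed by an empty reader. Returning null does not help here, because in the SAX EntityResolver contract null asks the parser to open the system identifier itself. The class name EmptyDtdResolver and the helper parse are hypothetical, and the standard JAXP factory is used instead of DocumentBuilderFactoryImpl.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: every external entity (including the DTD) resolves to empty content.
class EmptyDtdResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // The parser reads zero characters from this source, so no file or URL is opened.
        return new InputSource(new StringReader(""));
    }

    // Hypothetical helper: parse an XML string without fetching its external DTD.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EmptyDtdResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

This only suits documents that do not depend on entities or default attribute values declared in the DTD; any such declarations would silently disappear.
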
One other question:
If I don't implement EntityResolver (and so don't override
resolveEntity), and let my Java program fetch the DTD from the remote
location, what property can I set in my program to increase the
timeout? The error I am getting is "connection timed out". The DTD
does exist on the remote server.
(I can explore this option also).
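
A hedged sketch of the timeout idea, assuming the parser fetches the DTD through java.net.URLConnection on a Sun-derived JVM: the implementation-specific system properties sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout (both in milliseconds) set default timeouts for the built-in protocol handlers. The class name DtdFetchTimeouts is hypothetical, and other JVM vendors may ignore these properties.

```java
// Hypothetical helper: sets JVM-wide default timeouts for the built-in URL protocol handlers.
class DtdFetchTimeouts {
    static void apply(int connectMillis, int readMillis) {
        // Sun-specific networking properties; they should be set before the first
        // network access, because the handlers may read them only once.
        System.setProperty("sun.net.client.defaultConnectTimeout", Integer.toString(connectMillis));
        System.setProperty("sun.net.client.defaultReadTimeout", Integer.toString(readMillis));
    }

    public static void main(String[] args) {
        apply(60000, 60000); // one minute each, an arbitrary illustrative value
        System.out.println(System.getProperty("sun.net.client.defaultConnectTimeout"));
    }
}
```

The same values can be passed on the command line as -D options. If the connection fails because direct access is blocked by a proxy, a longer timeout alone is unlikely to help.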

Another weird thought: my program has to fetch the DTD from the
remote location (i.e. the non-EntityResolver way to solve the
problem). My program performs these steps:
a) DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
b) DocumentBuilder builder = factory.newDocumentBuilder();
c) Document document = builder.parse(new InputSource(new StringReader(rsp)));

So at line c) the parser will parse the XML (and will also fetch the
remote DTD). It will use HTTP transport to fetch the DTD (the
timeout error also refers to the java.net package). My PC is
behind a proxy server, and I have a proxy userid and password to
access the internet. So the "parser HTTP hook" must have this proxy
"userid & password" available to it in order to connect. Is there a
way to provide a userid and password like this?
Another way to ask this question: how will the "parser HTTP hook"
behave if it finds that the HTTP connection goes through a proxy
server (which requires a userid/password for authentication)?
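
One possible approach to the proxy question, sketched under the assumption that the parser's HTTP access goes through the standard java.net classes: the http.proxyHost and http.proxyPort system properties select the proxy, and a default java.net.Authenticator supplies credentials when the proxy asks for them. The class name ProxySetup and all argument values are hypothetical.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper: JVM-wide proxy settings plus credentials for proxy challenges.
class ProxySetup {
    static void configure(String host, int port, final String user, final char[] password) {
        // Standard Java networking properties for plain HTTP connections.
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));

        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer only proxy challenges; origin servers get no credentials.
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password);
                }
                return null;
            }
        });
    }
}
```

A call such as ProxySetup.configure("proxy.example.invalid", 8080, "userid", "password".toCharArray()) would go before builder.parse(...). Both settings are global to the JVM, so they also affect any other HTTP traffic in the program.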

My other thoughts are..
Presently the XML I am fetching does not contain any external entity
references. Everything can be resolved in the XML document itself, and
I don't need to perform any validation.
But at a future date, I may need to resolve references from the DTD.
The best solution for this, I think, is to keep a local copy of the
DTD, override the resolveEntity method as usual, and point the
EntityResolver to the local file. I am fetching XML from a real-world
service provider. They may change the XML structure in future (and may
possibly store entity definitions in the DTD, which the parser must
resolve).
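
The local-copy idea could look roughly like the sketch below. It assumes the remote system identifier of the DTD is known in advance; the class name LocalDtdResolver and its constructor arguments are hypothetical.

```java
import java.io.File;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: redirects one known remote DTD to a local copy.
class LocalDtdResolver implements EntityResolver {
    private final String remoteSystemId;
    private final File localCopy;

    LocalDtdResolver(String remoteSystemId, File localCopy) {
        this.remoteSystemId = remoteSystemId;
        this.localCopy = localCopy;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (remoteSystemId.equals(systemId)) {
            // Point the parser at the local file instead of the network location.
            return new InputSource(localCopy.toURI().toString());
        }
        // null means: fall back to the parser's default resolution.
        return null;
    }
}
```

Registering it with builder.setEntityResolver(new LocalDtdResolver(remoteId, file)) keeps entity definitions from the DTD available without a network round trip, at the cost of refreshing the local copy whenever the provider changes the DTD.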

Please let me know your thoughts ..

Best regards,

On Apr 1, 2005 1:40 PM, Michael Kay <mike@saxonica.com> wrote:
> I'm glad you've got it working. Looks good.
> 
> Michael Kay
> http://www.saxonica.com/
> 
> > -----Original Message-----
> > From: Midsummer Sun [mailto:midsummer.sun@gmail.com]
> > Sent: 01 April 2005 08:35
> > To: Michael Kay
> > Cc: xml-dev@lists.xml.org
> > Subject: Re: [xml-dev] Problem parsing XML file with Xerces-J
> >
> > > I think pre-editing of the response XML (i.e. stripping the DTD
> > > declaration) is better "for me". For my requirement, the DTD in
> > > the XML is useless to me. Implementing EntityResolver imposes
> > > significant performance overhead on my program. The parser is
> > > always polling for callback events.. So I think pre-editing with
> > > a simple string method is far more efficient..
> >
> > I amend my above observation slightly..
> >
> > My program is doing:
> > DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
> > DocumentBuilder builder = factory.newDocumentBuilder();
> > Document document = builder.parse(new InputSource(new
> > StringReader(rsp)));
> >
> > So I am using a DOM parser! But a DOM parser underneath is probably
> > using a SAX handler (to implement a DOM). i.e. a SAX handler is
> > despatching events to the DOM parser, as it is reading the XML
> > document. And DOM implementation is constructing a DOM object by
> > "assembling input from SAX implementation". I read this in a nice
> > article somewhere.
> >
> > My class implements EntityResolver interface, and calls
> > builder.setEntityResolver(obj); i.e. it registers the class object
> > itself(obj) as a handler for EntityResolver. This is probably a very
> > lightweight reference within JVM, and is nothing expensive worth
> > worrying about..
> >
> > So the DOM parser starts to parse the document. If it encounters a DTD
> > reference it will call the resolveEntity method. It will probably call
> > this method after a full DOM tree is constructed (so that all entity
> > references can be resolved). The resolveEntity method will be called
> > only once. So there is no expensive processing going on, as I
> > thought before ;)
> >
> > Please do correct me if I am wrong.
> >
> > If  the resource consumption by implementing EntityResolver is same as
> > the pre-editing solution(or there is a very marginal difference), I'll
> > prefer implementing the EntityResolver interface! It could be a USP in
> > my application!
> >
> > I am eagerly waiting for your opinion.
> >
> > Best regards,
> >
> 
>




 
