Re: [xml-dev] practical question re: Java/XML handling
- From: Andrew Welch <andrew.j.welch@gmail.com>
- To: "David A. Lee" <dlee@calldei.com>
- Date: Thu, 3 Sep 2009 14:40:09 +0100
How do you handle entities in the XML?
2009/9/3 David A. Lee <dlee@calldei.com>:
> I solved this problem in a different way that is less destructive. This also
> works to replace a DTD with a different one or to force validation on a
> schema even if a non-existent DTD is specified.
>
> This particular implementation requires using the SAXParser, but I believe
> the idea would work with other parsers that provide similar functionality,
> namely an override of "resolveEntity". The key trick is to resolve all
> DTDs with a "NullInputStream" (these are trivial to write so I won't supply
> the code here).
> An empty DTD file validates any XML (at least it does in my tests).
>
> Here's the snippet
>
>
> private class ValidatorHandler extends DefaultHandler {
>     ..... // other methods as needed
>     @Override
>     public InputSource resolveEntity(String publicId, String systemId)
>             throws IOException, SAXException {
>
>         if (systemId.toLowerCase().endsWith(".dtd"))
>             return new InputSource(new NullInputStream());
>         else
>             return super.resolveEntity(publicId, systemId);
>     }
> }
>
> SAXParserFactory f = SAXParserFactory.newInstance();
> .... setup the factory
>
> SAXParser parser = f.newSAXParser();
> ... setup the parser
>
> parser.parse(xml, new ValidatorHandler());
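
For anyone who wants to try the approach quoted above, here is a minimal self-contained sketch. It is not David's code: the names `EmptyDtdDemo`, `EmptyDtdHandler` and `parseRoot` are made up for illustration, and an empty `StringReader` stands in for the `NullInputStream` he mentions. It assumes the JDK's default SAX parser, which asks the handler to resolve the external DTD subset.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EmptyDtdDemo {
    /** Hypothetical handler: answers every *.dtd request with empty content. */
    static class EmptyDtdHandler extends DefaultHandler {
        String rootName; // qName of the first element seen

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws IOException, SAXException {
            if (systemId != null && systemId.toLowerCase().endsWith(".dtd")) {
                // Empty character stream standing in for "NullInputStream".
                return new InputSource(new StringReader(""));
            }
            return super.resolveEntity(publicId, systemId);
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (rootName == null) {
                rootName = qName;
            }
        }
    }

    /** Parses the XML text and returns the name of its root element. */
    static String parseRoot(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EmptyDtdHandler handler = new EmptyDtdHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.rootName;
    }
}
```

Entity references that are declared only in the real DTD (for example `&nbsp;`) lose their declaration once the DTD is replaced by an empty one, so how a given parser reports them is worth testing separately.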
>
> David A. Lee
> dlee@calldei.com
> http://www.calldei.com
> http://www.xmlsh.org
> 812-482-5224
>
> Mike Sokolov wrote:
>
> After all the discussion about "What is data?" I don't know if this list is
> the place to discuss actual details of implementation, but please feel free
> to send me elsewhere if you can think of a better venue.
>
> I have a need to handle XML that references a non-existent DTD. The DTD is
> irrelevant to the actual processing of the XML, and isn't available
> anywhere, but it is declared in the DOCTYPE. I'm sure many of you have
> encountered this situation: it's practically the norm, in my experience.
>
> After years of dealing with this inherently unsatisfactory situation in a
> variety of ways, I came up with a new one that I am liking at the moment,
> which is to insert a Stream into a Java XML processing stack that strips out
> the prolog of the XML document before handing it off to a parser. This has
> the nice property that it doesn't require modifications to the stored XML
> files. It loses PIs and comments and the XML decl, but I can live with
> that.
>
> My question is twofold:
>
> 1) does the following code snippet actually do what it is claiming to? Does
> anybody see any obvious mistakes? My knowledge of the format of DOCTYPE
> decls and so on is somewhat limited. I read the spec and this seems to work
> on the examples I have, but I suspect there are some cases I'm not handling.
>
> 2) Is there a better approach? Existing code to do the same thing? Some
> way to tell parsers to ignore the DOCTYPE (even though that seems to run
> counter to the spec)?
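
On question 2, one option that is sometimes used is a parser feature rather than stream surgery. The sketch below assumes the parser is Xerces or the Xerces-derived parser bundled with the JDK, where the feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` can be switched off so that a non-validating parse does not fetch the external DTD. Other parsers may not recognize the feature name and can throw when it is set. `SkipExternalDtdDemo` and `parseRoot` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SkipExternalDtdDemo {
    // Xerces-specific feature name; assumes a Xerces-based parser is in use.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses without loading the external DTD and returns the root element name. */
    static String parseRoot(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        SAXParser parser = factory.newSAXParser();

        final String[] root = new String[1];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });
        return root[0];
    }
}
```

The stored files stay untouched and comments and PIs are still reported, but the setting is tied to one parser family, so it is less portable than overriding `resolveEntity`.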
>
> Thanks for your attention...
>
> -Mike Sokolov
>
> /**
>  * An InputStream for XML that strips off the prolog of an XML
>  * document. The idea is to avoid having to prevent parsers from
>  * attempting to process an external DTD.
>  *
>  * @author sokolov
>  */
> class XmlNoPrologInputStream extends PushbackInputStream {
>     XmlNoPrologInputStream(InputStream base) throws IOException {
>         super(base, 2);
>         int c;
>         while ((c = read()) >= 0) {
>             if (c == '<') {
>                 int c1 = read();
>                 if (c1 < 0) {
>                     // ill-formed
>                     reset();
>                     return;
>                 }
>                 // XML declaration, PI, comment or DOCTYPE
>                 if (c1 == '?' || c1 == '!')
>                     continue;
>                 // must be the start of the document: arrange to begin
>                 // reading here
>                 unread(c1);
>                 unread(c);
>                 return;
>             }
>         }
>     }
> }
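
On question 1, a quick way to probe the scan is to run the same loop over a few strings and look at what would be handed to the parser. The sketch below is an illustration, not Mike's class: `PrologSkipCheck` and `remainderAfterProlog` are hypothetical names, and the loop copies the quoted constructor's logic except that it simply stops at end of input instead of calling `reset()` (`PushbackInputStream.reset()` always throws `IOException`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PrologSkipCheck {
    /** Runs the prolog scan and returns the text left for the parser. */
    static String remainderAfterProlog(String xml) throws IOException {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), 2);
        int c;
        while ((c = in.read()) >= 0) {
            if (c != '<') {
                continue;
            }
            int c1 = in.read();
            if (c1 < 0) {
                break; // input ended right after '<'
            }
            if (c1 == '?' || c1 == '!') {
                continue; // XML declaration, PI, comment or DOCTYPE
            }
            // Anything else is treated as the start of the root element.
            in.unread(c1);
            in.unread(c);
            break;
        }
        // Collect whatever is left in the stream.
        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            rest.write(b);
        }
        return new String(rest.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With `<?xml version="1.0"?><!DOCTYPE doc SYSTEM "doc.dtd"><doc/>` the remainder is `<doc/>`, as intended. With `<!-- see <doc> --><doc/>` the remainder is `<doc> --><doc/>`, because a `<` inside a comment is taken for the start of the document. A `<` inside a PI or inside an entity value in an internal subset would be misread the same way.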
>
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/