Re: [xml-dev] practical question re: Java/XML handling

How do you handle entities in the XML?


2009/9/3 David A. Lee <dlee@calldei.com>:
> I solved this problem in a different way that is less destructive.  This also
> works to replace a DTD with a different one or to force validation on a
> schema even if a non-existent DTD is specified.
>
> This particular implementation requires using the SAXParser, but I believe
> the idea would work with other parsers that provide similar functionality,
> namely an override of "resolveEntity".  The key trick is to resolve all
> DTDs with a "NullInputStream" (these are trivial to write so I won't supply
> the code here).
> An empty DTD file validates any XML (at least it does in my tests).
>
> Here's the snippet
>
>
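>     private class ValidatorHandler extends DefaultHandler {
>         ..... // other methods as needed
>
>         @Override
>         public InputSource resolveEntity(String publicId, String systemId)
>                 throws IOException, SAXException {
>             // Resolve every DTD reference to an empty stream so the parser
>             // never tries to fetch the real (possibly non-existent) file.
>             // The null check is purely defensive.
>             if (systemId != null && systemId.toLowerCase().endsWith(".dtd")) {
>                 return new InputSource(new NullInputStream());
>             }
>             return super.resolveEntity(publicId, systemId);
>         }
>     }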
>
>
>
>     SAXParserFactory f = SAXParserFactory.newInstance();
>     // .... set up the factory
>
>     SAXParser parser = f.newSAXParser();
>     // ... set up the parser
>
>     parser.parse(xml, new ValidatorHandler());
>
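The NullInputStream mentioned above is not included in the message. The following is a minimal, self-contained sketch of what it might look like, together with a handler that uses it. The names `NullDtdDemo`, `CountingHandler`, and `countElements`, and the `example.invalid` URL, are assumptions made for illustration and do not come from the original post. It assumes the JDK's default (Xerces-based) SAX parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NullDtdDemo {

    /** An InputStream that is always at end of input (hypothetical helper). */
    static class NullInputStream extends InputStream {
        @Override
        public int read() {
            return -1;
        }
    }

    /** Counts elements and resolves every ".dtd" reference to an empty stream. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws IOException, SAXException {
            if (systemId != null && systemId.toLowerCase().endsWith(".dtd")) {
                return new InputSource(new NullInputStream());
            }
            return super.resolveEntity(publicId, systemId);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            elements++;
        }
    }

    /** Parses the given XML text and returns the number of elements seen. */
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        // parse() registers the handler as the EntityResolver as well.
        parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.invalid/missing.dtd\">"
                + "<doc><a/><b/></doc>";
        System.out.println(countElements(xml));
    }
}
```

Because the resolver supplies the stream itself, the parser should never try to open the DTD's URL. An `InputSource` wrapping an empty `StringReader` would presumably work the same way.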
>
>
>
>
>
>
>
> David A. Lee
> dlee@calldei.com
> http://www.calldei.com
> http://www.xmlsh.org
> 812-482-5224
>
> Mike Sokolov wrote:
>
> After all the discussion about "What is data?" I don't know if this list is
> the place to discuss actual details of implementation, but please feel free
> to send me elsewhere if you can think of a better venue.
>
> I have a need to handle XML that references a non-existent DTD.  The DTD is
> irrelevant to the actual processing of the XML, and isn't available
> anywhere, but it is declared in the DOCTYPE.  I'm sure many of you have
> encountered this situation: it's practically the norm, in my experience.
>
> After years of dealing with this inherently unsatisfactory situation in a
> variety of ways, I came up with a new one that I am liking at the moment,
> which is to insert a Stream into a Java XML processing stack that strips out
> the prolog of the XML document before handing it off to a parser.  This has
> the nice property that it doesn't require modifications to the stored XML
> files.  It loses PIs and comments and the XML decl, but I can live with
> that.
>
> My question is twofold:
>
> 1) does the following code snippet actually do what it is claiming to?  Does
> anybody see any obvious mistakes?  My knowledge of the format of DOCTYPE
> decls and so on is somewhat limited.  I read the spec and this seems to work
> on the examples I have, but I suspect there are some cases I'm not handling.
>
> 2) Is there a better approach?  Existing code to do the same thing?  Some
> way to tell parsers to ignore the DOCTYPE (even though that seems to run
> counter to the spec)?
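On the second question, one option that may apply is a parser-specific feature. Xerces, and the parser bundled with the JDK that is derived from it, recognizes the feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`. The sketch below assumes such a parser is in use; other SAX implementations may reject the feature. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class IgnoreExternalDtdDemo {

    // Xerces-specific feature; not part of the SAX standard feature set.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML without loading the external DTD; returns the element count. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);                 // the feature only applies when not validating
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // throws if the parser does not know the feature
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"file:///no/such/dir/missing.dtd\">"
                + "<doc><a/><b/></doc>";
        System.out.println(countElements(xml));
    }
}
```

With this setting the DOCTYPE is still parsed, so an internal subset keeps working, but entities declared only in the external DTD are not available.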
>
> Thanks for your attention...
>
> -Mike Sokolov
>
>    import java.io.IOException;
>    import java.io.InputStream;
>    import java.io.PushbackInputStream;
>
>    /**
>     * An InputStream for XML that strips off the prolog of an XML
>     * document.  The idea is to avoid having to prevent parsers from
>     * attempting to process an external DTD.
>     *
>     * @author sokolov
>     */
>    class XmlNoPrologInputStream extends PushbackInputStream {
>
>        XmlNoPrologInputStream (InputStream base) throws IOException {
>            super (base, 2);
>            int c;
>            while ((c = read()) >= 0) {
>                if (c == '<') {
>                    int c1 = read();
>                    if (c1 < 0) {
>                        // ill-formed: input ended right after '<'.
>                        // PushbackInputStream.reset() always throws, so
>                        // just leave the stream at end of input.
>                        return;
>                    }
>                    // XML declaration, PI, comment or DOCTYPE: keep
>                    // scanning for the next '<'
>                    if (c1 == '?' || c1 == '!')
>                        continue;
>                    // must be the start of the document: arrange to begin
>                    // reading here
>                    unread(c1);
>                    unread(c);
>                    return;
>                }
>            }
>        }
>    }
>
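One case the constructor above does not handle is a `<` inside a comment, PI, or internal subset that is not followed by `?` or `!`. For example `<!-- <b> -->` would be taken for the start of the document. The following is a hypothetical variant, not the author's code, that skips each prolog construct as a unit. It assumes an ASCII-compatible encoding such as UTF-8 and a well-formed prolog, and it needs Java 9 or later for `readAllBytes`. The names `PrologSkippingInputStream`, `skipPast`, `skipDoctype`, and `strip` are invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PrologSkippingInputStream extends PushbackInputStream {

    PrologSkippingInputStream(InputStream base) throws IOException {
        super(base, 2);
        int c;
        while ((c = read()) >= 0) {
            if (c != '<') {
                continue;                 // whitespace or BOM between prolog items
            }
            int c1 = read();
            if (c1 == '?') {
                skipPast("?>");           // XML declaration or PI
            } else if (c1 == '!') {
                if (read() == '-') {
                    skipPast("-->");      // comment
                } else {
                    skipDoctype();        // the only other "<!" construct in a prolog
                }
            } else {
                // Start tag of the root element (or EOF): push back what was read.
                if (c1 >= 0) {
                    unread(c1);
                }
                unread(c);
                return;
            }
        }
    }

    /** Consumes input up to and including the first occurrence of end. */
    private void skipPast(String end) throws IOException {
        StringBuilder tail = new StringBuilder();
        int c;
        while ((c = read()) >= 0) {
            tail.append((char) c);
            if (tail.length() > end.length()) {
                tail.deleteCharAt(0);
            }
            if (tail.toString().equals(end)) {
                return;
            }
        }
    }

    /** Consumes the rest of a DOCTYPE declaration, including any internal subset. */
    private void skipDoctype() throws IOException {
        boolean inSubset = false;
        int c;
        while ((c = read()) >= 0) {
            if (c == '"' || c == '\'') {
                skipPast(String.valueOf((char) c));   // quoted literal
            } else if (c == '[') {
                inSubset = true;
            } else if (c == ']') {
                inSubset = false;
            } else if (c == '<' && inSubset) {
                int c1 = read();
                if (c1 == '?') {
                    skipPast("?>");
                } else if (c1 == '!' && read() == '-') {
                    skipPast("-->");
                }
                // Otherwise a markup declaration; its '>' is ignored while inSubset.
            } else if (c == '>' && !inSubset) {
                return;
            }
        }
    }

    /** Convenience wrapper for experiments: returns the text left after the prolog. */
    static String strip(String xml) throws IOException {
        InputStream in = new PrologSkippingInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(strip("<?xml version=\"1.0\"?><!-- <b> --><doc/>"));
    }
}
```

Like the original, this drops the XML declaration, so any encoding named there is lost and the parser falls back on its own encoding detection. Entity references that relied on the removed DOCTYPE will no longer resolve.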
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>



-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/


Copyright 1993-2007 XML.org. This site is hosted by OASIS