Re: [xml-dev] practical question re: Java/XML handling

On Thu, Sep 3, 2009 at 7:26 AM, Mike Sokolov <sokolov@ifactory.com> wrote:
> After all the discussion about "What is data?" I don't know if this list is the place to discuss actual details of implementation, but please feel free to send me elsewhere if you can think of a better venue.

For my part, I find it refreshing to have a place where one can discuss such fundamental matters as well as the lineaments of running code.  I think you'll find in the archives plenty of discussion of code, and plenty of code-free discussion alike.

> I have a need to handle XML that references a non-existent DTD.  The DTD is irrelevant to the actual processing of the XML, and isn't available anywhere, but it is declared in the DOCTYPE.  I'm sure many of you have encountered this situation: it's practically the norm, in my experience.
>
> After years of dealing with this inherently unsatisfactory situation in a variety of ways, I came up with a new one that I am liking at the moment, which is to insert a Stream into a Java XML processing stack that strips out the prolog of the XML document before handing it off to a parser.  This has the nice property that it doesn't require modifications to the stored XML files.  It loses PIs and comments and the XML decl, but I can live with that.
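
For readers who want to see the prolog-stripping idea in concrete form, here is a minimal sketch in Python rather than Java. It is not the poster's implementation: the function name strip_prolog and the regular expression are assumptions made for illustration. It works on already-decoded text, so the encoding named in the XML declaration is not needed, and it will be confused by an internal DTD subset that contains "]" followed by ">" inside a literal or comment.

```python
import re
import xml.etree.ElementTree as ET

# Things that may legally appear before the root element.
_PROLOG_ITEM = re.compile(
    r"\s+"                 # whitespace
    r"|<\?.*?\?>"          # XML declaration or processing instruction
    r"|<!--.*?-->"         # comment
    # DOCTYPE with quoted identifiers and an optional internal subset
    r"|<!DOCTYPE(?:[^\[>'\"]|\"[^\"]*\"|'[^']*')*(?:\[.*?\]\s*)?>",
    re.DOTALL,
)


def strip_prolog(text):
    """Return text starting at the root element, dropping the prolog."""
    pos = 0
    while True:
        match = _PROLOG_ITEM.match(text, pos)
        if match is None:
            return text[pos:]
        pos = match.end()


sample = (
    '<?xml version="1.0"?>\n'
    "<!-- generated file -->\n"
    '<!DOCTYPE doc SYSTEM "missing.dtd">\n'
    "<doc><p>hi</p></doc>"
)
root = ET.fromstring(strip_prolog(sample))
print(root.tag)
```

As the original post says, anything in the prolog is lost. Entities that were declared only in the DTD would also be undefined after stripping, so this only suits documents that do not rely on them.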

Expat allows you to specify a standalone flag, which in effect expunges all external parameter entity declarations (and other such external resources incompatible with standalone="yes").  This certainly skates the edges of XML spec compliance, but I think it's legit, because I see it as an implicit transform.  Anyway, your Java tools might have the equivalent.  FWIW, I know that Jython 2.5 includes Expat wrapped for the core XML libs, so that might be an option.

In Amara 2.x we expose this flag very conveniently.  You can do:

import amara
doc = amara.parse(myxml, standalone=True) #flag uses boolean values, not strings

And it will in effect ignore those pesky parameter entity decls, including the declaration of the external subset.
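
For comparison, here is a rough sketch of a similar effect using the Expat binding in the Python standard library (xml.parsers.expat). This is not the Amara flag shown above. The helper name element_names and the example.invalid URL are made up for illustration, and the assumption is that turning off parameter entity parsing, with no external entity handler installed, is enough to keep Expat from going after the missing DTD.

```python
import xml.parsers.expat as expat


def element_names(xml_text):
    """Parse xml_text with Expat and return element names in document order."""
    names = []
    parser = expat.ParserCreate()
    # Never process parameter entities, which includes the external subset.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


doc = (
    '<!DOCTYPE doc SYSTEM "http://example.invalid/missing.dtd">\n'
    "<doc><p/></doc>"
)
print(element_names(doc))
```

Unlike stripping the prolog, this keeps comments and processing instructions available to the parser's handlers.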

The rest of your post is Java-specific, so I'll snip and run like hell :)

Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
Friendfeed: http://friendfeed.com/uche
Twitter: http://twitter.com/uogbuji
Join me at Balisage:
* http://www.balisage.net/
