Re: [xml-dev] Parsing without resolving entities
- From: David Carlisle <davidc@nag.co.uk>
- To: rmcgarvey@generalcode.com
- Date: Mon, 29 Oct 2007 17:06:21 GMT
It depends a bit on what you are going to do with the document, and in
particular whether you are using an XML API that can support undefined
entity references. If you are (DOM, for example), all you need to do is
arrange that the DTD that defines the entities is not read.
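As a rough illustration of what not reading the DTD buys you, here is a minimal Python sketch that uses the standard library's low-level expat bindings rather than a DOM. It assumes a document whose DOCTYPE points at an external DTD (the file name `hypothetical-entities.dtd` and the helper `parse_events` are made up). Because expat is never asked to load that DTD, it reports `&mdash;` as a skipped entity instead of failing, whereas the same reference in a document with no DOCTYPE at all is a well-formedness error.

```python
import xml.parsers.expat

# Hypothetical document: the DOCTYPE names an external DTD that expat is
# never asked to fetch (parameter entity parsing is off by default).
WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "hypothetical-entities.dtd">
<doc>a&mdash;b</doc>"""

# The same content with no DOCTYPE: &mdash; is then simply undeclared.
NO_DTD = "<doc>a&mdash;b</doc>"


def parse_events(xml_text):
    """Return (kind, value) events, keeping undefined entities as markers."""
    parser = xml.parsers.expat.ParserCreate()
    events = []
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    # Called for references to entities whose declarations were not read.
    parser.SkippedEntityHandler = (
        lambda name, is_parameter_entity: events.append(("entity", name)))
    parser.Parse(xml_text, True)
    return events
```

`parse_events(WITH_DTD)` keeps an `("entity", "mdash")` marker between the text `a` and the text `b`, so the reference can be written back out later; `parse_events(NO_DTD)` raises `ExpatError`.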
If you are not (most XSLT processing, for example, requires the
entities to be expanded, as the XPath data model does not support
undefined entities), then you need to remove them somehow, perhaps in a
form that lets them be replaced afterwards.
As you suggest, preprocessing to hide the ampersand works (especially
for more complicated entities that you could not reconstitute just from
the character data).
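A minimal sketch of the preprocessing route, in Python with regular expressions (the helper names `hide_entities`, `restore_entities` and `roundtrip` are hypothetical): every non-predefined reference such as `&mdash;` has its ampersand escaped before parsing, so the parser sees plain text, and the escaping is undone on the serialized result. It assumes the input has no CDATA sections or comments containing ampersands, and no literal text of the form `&amp;name;` that must stay quoted.

```python
import re
import xml.etree.ElementTree as ET

# The five entities every XML parser knows; these are left alone.
PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# &name; (general entity reference; &#...; character references don't match)
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")
# The same reference after its ampersand has been escaped: &amp;name;
QUOTED_REF = re.compile(r"&amp;([A-Za-z_][\w.\-]*);")


def hide_entities(xml_text):
    """Escape the '&' of each non-predefined entity reference before parsing."""
    return ENTITY_REF.sub(
        lambda m: m.group(0) if m.group(1) in PREDEFINED
        else "&amp;" + m.group(1) + ";",
        xml_text)


def restore_entities(xml_text):
    """Undo hide_entities on serialized output."""
    return QUOTED_REF.sub(
        lambda m: m.group(0) if m.group(1) in PREDEFINED
        else "&" + m.group(1) + ";",
        xml_text)


def roundtrip(xml_text):
    """Hide, parse (text now holds a literal '&mdash;'), serialize, restore."""
    root = ET.fromstring(hide_entities(xml_text))
    # ... any processing of `root` would happen here ...
    return restore_entities(ET.tostring(root, encoding="unicode"))
```

The parsed tree holds the six characters `&mdash;` where the reference was, so any processing sees ordinary text, and the original reference reappears only at the restore step.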
Or you can modify the DTD so that &mdash; expands to &amp;mdash; and
then you don't need to change the document on input (but you do still
need to post-process the result to get rid of the extra quoting).
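The DTD route can be sketched the same way. Here, purely for illustration, the modified declaration `<!ENTITY mdash "&amp;mdash;">` sits in an internal subset rather than in the real external DTD. The parser then expands `&mdash;` to the literal characters `&mdash;`, and a small post-processing step (the hypothetical `unquote_entities`) removes the extra `&amp;` quoting that the serializer adds.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for the modified DTD (internal subset here only for illustration):
# &mdash; now expands to the text "&mdash;" rather than to U+2014.
DOC = """<!DOCTYPE doc [
  <!ENTITY mdash "&amp;mdash;">
]>
<doc>a&mdash;b</doc>"""


def unquote_entities(serialized, names):
    """Post-process: turn &amp;name; back into &name; for the given names."""
    pattern = re.compile("&amp;(" + "|".join(map(re.escape, names)) + ");")
    return pattern.sub(r"&\1;", serialized)


def parse_and_reserialise(doc_text, names):
    """Parse with the quoting DTD, serialize, then strip the extra quoting."""
    root = ET.fromstring(doc_text)
    serialized = ET.tostring(root, encoding="unicode")  # writes &amp;mdash;
    return root.text, unquote_entities(serialized, names)
```

`parse_and_reserialise(DOC, ["mdash"])` gives the parsed text `a&mdash;b` and the post-processed output `<doc>a&mdash;b</doc>`; note that ElementTree does not write the DOCTYPE back out.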
Or (perhaps) you can let all the entities expand but then finally
serialise the data using entities rather than characters where possible.
For example, XSLT will do this when writing HTML, or in XSLT 2.0 you can
specify a character map (e.g.
http://www.w3.org/2003/entities/iso9573-2003/iso9573-2003map.xsl)
that does the same thing. Note that this doesn't preserve the
original entities; it just uses entities wherever possible, whether or
not the input used that form.
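Python's standard library has no XSLT processor, so as a rough stand-in for a character map this sketch post-processes serialized output with `html.entities.codepoint2name` (HTML 4 entity names, not the ISO 9573 set behind the map above; `serialise_with_entities` is a hypothetical helper). It shows the same caveat: any character that has a name becomes an entity reference, whether or not the input used one. It assumes element and attribute names are ASCII, and the result only parses as XML again if it carries a DOCTYPE whose DTD declares those entities.

```python
import xml.etree.ElementTree as ET
from html.entities import codepoint2name


def serialise_with_entities(elem, names=codepoint2name):
    """Serialise elem, writing named entity references where a name exists.

    Stand-in for a character map: ASCII is left alone (so the serializer's
    own &amp;, &lt; and &gt; escapes are untouched); any other character with
    an entry in `names` is replaced by &name;, and the rest are kept as-is.
    """
    text = ET.tostring(elem, encoding="unicode")  # non-ASCII kept as characters
    return "".join(
        f"&{names[ord(ch)]};" if ord(ch) > 127 and ord(ch) in names else ch
        for ch in text
    )
```

A document that contained a literal U+00E9 comes out with `&eacute;`, even though the input never used that entity; characters with no HTML 4 name, such as U+2603, are left as characters.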
David
________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
________________________________________________________________________