OASIS Mailing List Archives

   [SAX] How to keep entities unresolved in the result?


Hi there,

I'm using SAX in Java to parse an XHTML document.
Everything works fine, except that the parser always translates entity
references to their character equivalents before they reach the
characters() method:
&amp; becomes &
&nbsp; becomes a space
&eacute; becomes é
This is fine, but how do I get the reference back? In my case I must keep
the existing references, otherwise I get errors in the generated XHTML.

Moreover, depending on the encoding, some references such as &#8217; are
translated to "?". I've set the encoding to ISO-8859-1, and couldn't find
which one to use to get the &#8217; back...
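
For illustration, the "?" is what Java writes when a character has no
mapping in the output charset, and U+2019 (decimal 8217) has none in
ISO-8859-1. A minimal sketch of one possible workaround, assuming the
output is written by hand: re-escape on output anything the target charset
cannot encode as a numeric character reference. The class and method names
(NumericRefEscaper, escape) are hypothetical, and this does not restore
named references such as &eacute;, only numeric ones where they are needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class NumericRefEscaper {

    /**
     * Escapes markup characters and replaces every character that the
     * target charset cannot encode with a numeric character reference.
     * (Hypothetical helper, not part of SAX.)
     */
    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(text.subSequence(i, i + len))) {
                out.appendCodePoint(cp);          // representable as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. &#8217;
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark (decimal 8217).
        String text = "it\u2019s caf\u00e9 & co";
        System.out.println(escape(text, StandardCharsets.ISO_8859_1));
    }
}
```

Calling escape() from characters() before writing the text would keep
ISO-8859-1 output free of "?" substitutions, at the cost of emitting
numeric references even where the source used a named one.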

I found a way to get the entity (startEntity / endEntity) using my own
handler for everything (DTD, content, error, lexical...), but it seems the
characters() method is called only after all the entities contained between
two elements have been translated, so I don't know how to do what I want...
I've searched the xml-dev archive and found an old thread about this, but
it didn't end the way I wanted it to :)
http://lists.xml.org/archives/xml-dev/200005/msg00211.html
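
The startEntity / endEntity idea can be sketched roughly as follows: write
"&name;" when an entity starts and ignore the characters() calls that
arrive until it ends. This is only a sketch under assumptions: it relies on
the parser reporting lexical events in document order (the parser bundled
with the JDK is assumed here), the class name EntityPreservingHandler and
the textOf() helper are hypothetical, and the test document declares its
own entity in an internal subset. Note that SAX's LexicalHandler does not
report predefined entities (&amp;, &lt;, ...) or character references such
as &#8217;, so those cannot be recovered this way; the sketch simply
re-escapes & and < on output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityPreservingHandler extends DefaultHandler2 {

    private final StringBuilder out = new StringBuilder();
    // Greater than 0 while inside a general entity's replacement text.
    private int entityDepth = 0;

    private static boolean isGeneralEntity(String name) {
        // Parameter entities start with '%'; "[dtd]" is the external DTD subset.
        return !name.startsWith("%") && !name.equals("[dtd]");
    }

    @Override
    public void startEntity(String name) {
        if (!isGeneralEntity(name)) return;
        // Only the outermost entity is written back as a reference.
        if (entityDepth++ == 0) out.append('&').append(name).append(';');
    }

    @Override
    public void endEntity(String name) {
        if (isGeneralEntity(name)) entityDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (entityDepth > 0) return; // already emitted as &name;
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c == '&') out.append("&amp;");
            else if (c == '<') out.append("&lt;");
            else out.append(c);
        }
    }

    /** Parses the document and returns its text with entity references kept. */
    static String textOf(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EntityPreservingHandler handler = new EntityPreservingHandler();
        reader.setContentHandler(handler);
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE p [<!ENTITY eacute \"&#233;\">]>"
                   + "<p>caf&eacute; &amp; bar</p>";
        System.out.println(textOf(xml));
    }
}
```

The depth counter is there because an entity's replacement text may itself
reference other entities; only the outermost reference is written back.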

I simply want to keep the #8217, or whatever the entity was, in the
characters() method.

Even better, if it's possible, I'd like not to resolve entities at all,
because I don't work with them and they cause me more trouble than they
solve (a typical error is "Undefined entity...").

Thanks in advance,
Aurelien





 


Copyright 2001 XML.org. This site is hosted by OASIS