Re: [xml-dev] Newbie question : accent, special chars,...
You're right! The document is not well-formed. I've changed the encoding to "UTF-8" and it seems to be well-formed now, but the Xerces DOMParser still has trouble with it, since I get the following error:
org.xml.sax.SAXParseException: Element type "Item" must be followed by either attribute specifications, ">" or "/>".
at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:628)
at org.apache.xerces.framework.XMLDocumentScanner.scanElement(XMLDocumentScanner.java:1800)
at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1182)
at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)
Here is the corresponding code:
...
// File f exists
FileInputStream fis = new FileInputStream(f);
org.xml.sax.InputSource is = new org.xml.sax.InputSource(fis);
org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
parser.parse(is);
...
Here is the xml in File f :
<?xml version="1.0" encoding="UTF-8" ?>
<Item description="voici quelques caractères accentués : é ï è à utilisés en français"/>
It seems to me that the problem is more with the DOM parser than with the XML file. Do I need to configure the parser in some way to make it work correctly with UTF-8?
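One possible workaround, sketched here under assumptions rather than taken from the thread: if the bytes on disk are actually ISO-8859-1 even though the declaration now says UTF-8, the program can decode the bytes itself and hand the parser a character stream. The SAX documentation for InputSource says a parser reads a character stream directly and disregards the encoding declaration. The class and method names below are hypothetical, the sample uses an in-memory byte array instead of File f, and it uses the JDK's built-in JAXP parser rather than org.apache.xerces.parsers.DOMParser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or of the original post.
class ExplicitCharsetParse {

    // Decode the bytes with the charset we believe the file really uses,
    // then give the parser characters instead of raw bytes.
    static Document parseAs(byte[] xmlBytes, Charset actualCharset) {
        try {
            Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(xmlBytes), actualCharset);
            InputSource source = new InputSource(reader);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Convenience accessor for the attribute used in the sample document.
    static String description(byte[] xmlBytes, Charset actualCharset) {
        return parseAs(xmlBytes, actualCharset)
                .getDocumentElement()
                .getAttribute("description");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<Item description=\"caract\u00e8res accentu\u00e9s\"/>";
        // Simulate a file saved as ISO-8859-1 despite the UTF-8 declaration.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(description(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

This only hides the mismatch from the parser. The cleaner fix is probably to make the declaration and the bytes on disk agree, by re-saving the file as UTF-8 or by declaring the encoding the editor actually used.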
On 15 oct. 04, at 15:44, Liam Quin wrote:
On Fri, Oct 15, 2004 at 02:59:00PM +0200, Benoit Mangez wrote:
Here is the content of a non-valid xml file :
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Item description="voici quelques caractères accentués : é ï è à
utilisés en français"/>
It's not valid because of the special chars inside attribute
"description".
XML uses two terms: well-formed and valid.
As long as you actually use ISO 8859-1 for those characters, the
document should be well-formed. It isn't valid because you don't
have a "DTD". But I'll assume you just want well formed.
You didn't include the exact error message, so I can only guess that in
fact your file is in UTF-8 and not ISO-8859-1, so changing the encoding
may solve your problem.
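The guess above can be tested directly. The following is a minimal sketch, assuming the file content is available as a byte array: a strict CharsetDecoder rejects byte sequences that are not legal in the given charset. The class and method names are hypothetical. The check is only informative for UTF-8, because every byte sequence decodes without error as ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for checking what encoding a file is really in.
class EncodingCheck {

    // Returns true if the bytes form a legal sequence in the given charset.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for the accented letters e-acute, i-diaeresis,
        // e-grave and a-grave from the sample attribute.
        String accents = "\u00e9 \u00ef \u00e8 \u00e0";

        byte[] asLatin1 = accents.getBytes(StandardCharsets.ISO_8859_1);
        byte[] asUtf8 = accents.getBytes(StandardCharsets.UTF_8);

        // Latin-1 bytes such as 0xE9 followed by a space are not legal UTF-8.
        System.out.println(decodesCleanly(asLatin1, StandardCharsets.UTF_8)); // false
        System.out.println(decodesCleanly(asUtf8, StandardCharsets.UTF_8));   // true
    }
}
```

If the bytes of the real file fail this check for UTF-8, the file is not UTF-8, and the declaration should name whatever encoding the editor used when saving it.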
Liam
--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
Benoit Mangez
___________________________________________
DENALI sa - http://www.denali.be
Château de Clerlande - 1340 Ottignies - Belgium
Tel +32 (0) 10 43 99 51 - Fax +32 (0) 10 43 99 52
___________________________________________