OASIS Mailing List Archives

   Re: [xml-dev] Newbie question : accent, special chars,...


You're right! The document is not well-formed. I've changed the encoding to "UTF-8" and it seems to be well-formed now, but the Xerces DOMParser still has trouble with it, since I get the following error:

org.xml.sax.SAXParseException: Element type "Item" must be followed by either attribute specifications, ">" or "/>".
at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:628)
at org.apache.xerces.framework.XMLDocumentScanner.scanElement(XMLDocumentScanner.java:1800)
at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1182)
at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)

Here is the corresponding code:

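// File f exists
FileInputStream fis = new FileInputStream(f);
// Wrap the raw byte stream; no encoding is set on the InputSource here
org.xml.sax.InputSource is = new org.xml.sax.InputSource(fis);
org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
// The SAXParseException shown above is thrown from this call
parser.parse(is);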

Here is the XML in file f:

<?xml version="1.0" encoding="UTF-8" ?>
<Item description="voici quelques caractères accentués : é ï è à utilisés en français"/>

It seems to me that the problem lies with the DOM parser rather than with the XML file. Do I need to configure the parser so that it handles UTF-8 correctly?
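
A byte-level check can help tell a parser problem from a file problem before any parser settings are changed. The sketch below is not from the original post. It assumes the file may have been saved in a single-byte encoding such as ISO-8859-1 while its declaration now says UTF-8, and it uses only the standard java.nio.charset API. The class name Utf8Check is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true if the given bytes form a valid UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for malformed input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Calling Utf8Check.isValidUtf8(java.nio.file.Files.readAllBytes(f.toPath())) on the file from the post would show whether the bytes match the declaration. If it returns false, the likely fix is to re-save the file as UTF-8 or to declare the encoding it was actually saved in, rather than to reconfigure the parser.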

On 15 Oct 04, at 15:44, Liam Quin wrote:

On Fri, Oct 15, 2004 at 02:59:00PM +0200, Benoit Mangez wrote:
Here is the content of a non-valid XML file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<Item description="voici quelques caractères accentués : é ï è à
utilisés en français"/>

It's not valid because of the special chars inside the attribute.

XML uses two terms -- well-formed and valid.
As long as you actually use ISO 8859-1 for those characters, the
document should be well-formed. It isn't valid because you don't
have a "DTD". But I'll assume you just want well formed.
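
The distinction can be observed with the standard JAXP API. The following is a minimal sketch, not code from this thread: it parses a string with and without DTD validation and counts the validity errors reported to an ErrorHandler. The class and method names are hypothetical, and the exact error messages depend on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedVsValid {
    /**
     * Parses the XML text and returns how many validity errors were
     * reported. A well-formedness problem is a fatal error and is
     * rethrown as a SAXException.
     */
    static int countValidityErrors(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // DTD validation on or off
        DocumentBuilder builder = factory.newDocumentBuilder();
        final int[] errors = {0};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors[0]++; } // validity error
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }
}
```

With a document that has no DOCTYPE, a non-validating parse should report no errors, while a validating parse should report at least one, even though the document is well-formed in both cases.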

You didn't include the exact error message, so I can only guess that in
fact your file is in UTF-8 and not ISO-8859-1, so changing the encoding
may solve your problem.
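
To illustrate why the declared encoding has to match the bytes on disk, here is a small sketch using only the standard library. It is an illustration added for this purpose and was not part of the original exchange. The class name EncodingBytes and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class EncodingBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "C3 A9". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Bytes of s when saved as ISO-8859-1. */
    static String latin1Hex(String s) {
        return hex(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Bytes of s when saved as UTF-8. */
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    /** What a reader sees if UTF-8 bytes are decoded as ISO-8859-1. */
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }
}
```

For the character é (U+00E9), latin1Hex gives the single byte E9 and utf8Hex gives the two bytes C3 A9. A file of UTF-8 bytes declared as ISO-8859-1 still parses but yields wrong characters, whereas ISO-8859-1 bytes declared as UTF-8 are generally malformed and make the parser fail.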


Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/

Benoit Mangez


- http://www.denali.be
Château de Clerlande - 1340 Ottignies - Belgium
Tel +32 (0) 10 43 99 51 - Fax +32 (0) 10 43 99 52

