Re: [xml-dev] External subset processing by browsers
- From: "Andrew Welch" <andrew.j.welch@gmail.com>
- To: elharo@metalab.unc.edu
- Date: Mon, 8 Dec 2008 11:10:53 +0000
Hi Elliotte,
2008/12/5 Elliotte Rusty Harold <elharo@metalab.unc.edu>:
> Firefox. There are two separate issues here:
>
> 1. Whether Firefox should read the external DTD subset.
> 2. How it should treat unrecognized entities when it doesn't read the
> external subset.
>
> Let me check the spec, but my recollection is that if the external DTD
> subset is not read, unrecognized entities are not a fatal error.
I have a similar issue: some RSS feeds contain entity references but
no doctype, for example:
<foo>foo &euro; bar</foo>
I was trying to handle them by supplying a LexicalHandler (to trap
and convert them to numeric refs) and setting a few Xerces features,
but the parser always throws an exception before the startEntity event.
Sample code (using Xerces 2.9.0):
import java.io.StringReader;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

public class Test extends XMLFilterImpl implements LexicalHandler {

    public static void main(String... args) throws Exception {
        new Test();
    }

    public Test() throws Exception {
        // Entity reference with no DOCTYPE, as found in the feeds
        String xml = "<foo>foo &euro; bar</foo>";
        XMLReader xmlReader =
            XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        // Register this object to receive startEntity/endEntity events
        xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", this);
        xmlReader.setFeature("http://apache.org/xml/features/scanner/notify-char-refs", true);
        xmlReader.setFeature("http://apache.org/xml/features/validation/unparsed-entity-checking", false);
        xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        xmlReader.setEntityResolver(this);
        xmlReader.parse(new InputSource(new StringReader(xml)));
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
    }

    // LexicalHandler callbacks; only startEntity does anything
    public void startEntity(String name) throws SAXException {
        System.out.println("Start ent: " + name);
    }

    public void endEntity(String name) throws SAXException { }

    public void startCDATA() throws SAXException { }

    public void endCDATA() throws SAXException { }

    public void startDTD(String name, String publicId, String systemId)
            throws SAXException { }

    public void endDTD() throws SAXException { }

    public void comment(char[] ch, int start, int length) throws SAXException { }
}
The output when running this is:
[Fatal Error] :1:16: The entity "euro" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException: The entity
"euro" was referenced, but not declared.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at Test.<init>(Test.java:37)
It would be really nice to handle this non-well-formed input using XML
tools without resorting to a regex replace across every feed... I'm
not sure it's possible but the features make it seem like it should be
- any ideas?
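
For reference, the regex fallback mentioned above might look roughly like the following. This is only a sketch under a few assumptions: the class name EntityFixer and its tiny entity table are made up for illustration, only three HTML entity names are mapped, and any other name (including the predefined amp, lt, gt, quot and apos) is passed through unchanged. It also rewrites matches inside CDATA sections and comments, which a real parser-level fix would not do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityFixer {
    // Hypothetical, deliberately tiny table; a real one would cover the full HTML entity set.
    private static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("euro", 0x20AC);
        ENTITIES.put("nbsp", 0x00A0);
        ENTITIES.put("copy", 0x00A9);
    }

    // Matches named references such as &euro; but not numeric ones such as &#8364;
    private static final Pattern NAMED_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replace known named entity references with decimal character references.
    public static String fix(String xml) {
        Matcher m = NAMED_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = ENTITIES.get(m.group(1));
            // Names not in the table are copied through untouched.
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("<foo>foo &euro; bar &amp; baz</foo>"));
    }
}
```

The result of fix() could then be wrapped in a StringReader and handed to xmlReader.parse() in place of the raw feed text.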
thanks
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/