Hi Folks,
You are familiar with URLs that begin with http://. Those URLs use the http scheme.
There are other schemes in addition to the http scheme. One such scheme is the data URI scheme. Let’s look at it. But first, let’s motivate its use.
Motivation for the Data URI Scheme
Oftentimes a web page wants to display an image. One way to accomplish this is to provide, in the HTML, the URL to the image file. Like this:
<h2>Link to an image</h2>
<img alt="image"
src="…"></img>
When that HTML is dropped into a browser, the browser fetches the image file and displays the image.
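A disadvantage of that approach (linking to an image file) is that the browser must download the web page (the HTML file) and then download the image file: two sequential fetches. The second request cannot start until the browser has seen the link in the first response, so each one adds latency.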
Alternatively, you can use a data URL in your web page. A data URL contains the image itself, inlined as base64-encoded text. I used an online tool to encode the image to base64 and then replaced the link with a data URL containing that base64 text. Here’s what the HTML looks like now, using a data URL:
<h2>Inline the image
using a data URL</h2>
<img alt="image"
src="data:…"></img>
When that HTML is dropped into a browser, the browser displays the same image as before.
Although base64 text is larger than the binary image it encodes (every 3 bytes become 4 characters, an increase of about a third), the browser needs to do only one fetch. That can be more efficient.
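For readers who would rather not paste images into an online tool, here is a minimal Python sketch of the same encoding step, using only the standard library. The function name make_data_url and the sample bytes are hypothetical (the bytes are not a real, displayable image); the sketch also shows where the size increase comes from.

```python
import base64


def make_data_url(data: bytes, media_type: str) -> str:
    """Inline raw bytes as a data URL: data:<media type>;base64,<base64 text>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Hypothetical stand-in for an image file's bytes (not a real image).
image_bytes = bytes(range(10))

url = make_data_url(image_bytes, "image/jpeg")
encoded_part = url.split(",", 1)[1]

# Base64 emits 4 characters for every 3 input bytes (the last group is padded
# with "="), which is where the size increase comes from.
print(url)
print(len(image_bytes), "bytes ->", len(encoded_part), "characters")
```

To inline a real file, pass its bytes instead, for example make_data_url(pathlib.Path("photo.jpg").read_bytes(), "image/jpeg"), where photo.jpg is a hypothetical file name. The resulting string is what goes in the src attribute.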
Disadvantage of Data URLs
A downside of data URLs is that they can circumvent certain detection and filtering methods. Consider this: a company wants to prohibit images from certain web sites. The company has a blacklist of links to those image files. Whenever the company’s firewall sees a web page containing one of those links, the firewall removes the link, thereby preventing the image from being downloaded. With a data URL, however, the web page smuggles the image in unnoticed, because the data URL does not identify the location of the image; it carries the image itself.
Data URLs in XML Documents
A data URL is a URL. XML Schema has a datatype for URL values: the xs:anyURI datatype. Can an XML element that is declared to be of type xs:anyURI hold a data URL? Let’s see.
I created an XML schema and declared an element to be of type xs:anyURI:
<xs:element name="Image"
type="xs:anyURI"
/>
The value of the <Image> element in the following XML instance is an ordinary URL:
<Image>https://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_001.jpg</Image>
That instance is schema-valid. Next, I replaced that ordinary URL with a data URL. For brevity, I elided most of the data URL:
<Image>
… //9k=</Image>
That also validated.
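Validation checks only the form of the value; it says nothing about what the value carries. As a rough illustration of what the recipient of such an instance actually receives, here is a Python sketch, assuming the recipient reads the <Image> element with xml.etree.ElementTree. The helper inline_bytes and the sample payload are hypothetical; the payload is a few made-up bytes, not the image from the post.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import unquote_to_bytes, urlsplit


def inline_bytes(uri: str) -> Optional[bytes]:
    """Return the bytes carried inside a data URL, or None for any other URI."""
    # URI schemes are case-insensitive, so normalize before comparing.
    if urlsplit(uri).scheme.lower() != "data":
        return None
    # A data URL has the form data:[<media type>][;base64],<data>.
    header, _, data = uri[len("data:"):].partition(",")
    if header.lower().endswith(";base64"):
        return base64.b64decode(data)
    # Without ";base64", the data part is percent-encoded text.
    return unquote_to_bytes(data)


# Hypothetical instances: one ordinary link, one data URL with made-up bytes.
payload = base64.b64encode(b"\xff\xd8made-up\xff\xd9").decode("ascii")
instances = [
    "<Image>https://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_001.jpg</Image>",
    f"<Image>data:image/jpeg;base64,{payload}</Image>",
]

for doc in instances:
    value = (ET.fromstring(doc).text or "").strip()
    carried = inline_bytes(value)
    if carried is None:
        print("links to external data:", value)
    else:
        print("carries inlined data:", len(carried), "bytes")
```

Both values are equally acceptable as xs:anyURI, but only the first one points somewhere; the second delivers its content directly to the recipient.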
Conclusion
An xs:anyURI value can be a data URL. That is, an xs:anyURI value can either link to external data or inline the data itself.
A data URL in an XML instance document exposes the recipient of the XML to the risks described above: link filters can be circumvented, and undesirable data can be smuggled in unnoticed.
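The following Python sketch models the blacklist firewall described earlier, in deliberately simplified form: it scans markup for src attributes and strips any whose URL is on a hypothetical blacklist. Real firewalls and content filters differ; the point is only that a filter keyed on locations has nothing to match when the image arrives as a data URL. The blacklist entry, the function filter_page, and the truncated data URL are all made up for illustration.

```python
import re

# Hypothetical blacklist of image locations that the firewall strips from pages.
BLACKLIST = {"https://images.example.com/banned.jpg"}

# Deliberately naive: look only at src="..." attributes in the markup.
SRC_ATTR = re.compile(r'src="([^"]*)"')


def filter_page(html: str) -> str:
    """Remove every src attribute whose URL is on the blacklist."""

    def drop_if_blacklisted(match: re.Match) -> str:
        return "" if match.group(1) in BLACKLIST else match.group(0)

    return SRC_ATTR.sub(drop_if_blacklisted, html)


linked = '<img alt="image" src="https://images.example.com/banned.jpg"></img>'
# Made-up, truncated data URL standing in for an inlined copy of the same image.
inlined = '<img alt="image" src="data:image/jpeg;base64,/9j/4AAQ"></img>'

print(filter_page(linked))   # the blacklisted link is removed
print(filter_page(inlined))  # the data URL passes through unchanged
```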
To prevent an xs:anyURI value from containing a data URL, derive a type from xs:anyURI by restriction and use the xs:pattern facet to constrain the value.
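One way to do that, offered as a sketch rather than the only option, is a whitelist pattern such as xs:pattern value="https?://.*", which admits only http and https URLs. XSD regular expressions have no lookahead, so a pattern that merely excludes data: is awkward to write; a whitelist is simpler. XSD patterns must match the entire value, so the Python stand-in below uses re.fullmatch to mimic the facet. The pattern and the function name pattern_valid are assumptions for illustration.

```python
import re

# Python stand-in for a hypothetical facet: <xs:pattern value="https?://.*"/>.
# XSD patterns must match the whole value, so fullmatch (not search) mimics the facet.
ALLOWED = re.compile(r"https?://.*")


def pattern_valid(uri: str) -> bool:
    """Return True if the value would satisfy the whitelist pattern."""
    return ALLOWED.fullmatch(uri) is not None


ordinary = "https://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_001.jpg"
data_url = "data:image/jpeg;base64,/9j/4AAQ"  # made-up, truncated data URL

print(pattern_valid(ordinary))  # True
print(pattern_valid(data_url))  # False
```

Because URI schemes are case-insensitive, this pattern also rejects a legal value written with an uppercase scheme, such as HTTPS://example.com/a.jpg; widen the pattern if that matters.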
Comments?
/Roger