When an XML document is read, a transcoder converts it from a
sequence of bytes to a sequence of characters (in the particular
encoding conventions used by the receiving system).
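
As a rough illustration of that transcoding step, here is a minimal Python sketch. The sample document and the helper name `transcode` are made up for this example, and UTF-8 is only an assumed encoding; a real parser would determine the encoding from the XML declaration or a byte order mark.

```python
# Hypothetical sample document: six characters of element content, one of
# which (the i with diaeresis) takes two bytes in UTF-8.
document_bytes = '<?xml version="1.0" encoding="UTF-8"?><p>na\u00efve</p>'.encode("utf-8")


def transcode(data: bytes, encoding: str = "utf-8") -> str:
    """Convert a sequence of bytes into a sequence of characters."""
    return data.decode(encoding)


text = transcode(document_bytes)

# The parser works on characters, so byte offsets and character offsets
# no longer line up once a multi-byte character has been seen.
byte_count = len(document_bytes)
char_count = len(text)
```

This mismatch between byte positions and character positions is one reason a length given in octets is awkward to apply after transcoding.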
To embed binary data, the parser would have to transcode the
incoming data character by character and parse it token by token,
just to know which mode to use for reading the contents of the next element.
However, XML is a rejection of exactly this kind of modal parsing:
in SGML you have a lot of parsing modes,
depending on markup or on DTDs. Also, it is very inefficient
to transcode character by character.
Why not, then, transcode the document but keep the original bytes
and then use them? Because a transcoder is supposed to fail
if an incorrect byte sequence is found, and binary data could
easily contain one. So we cannot really use this method,
unless we use some exception system that skips past
badly encoded sections, but this seems pretty complicated.
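
The failure described above can be sketched in a few lines of Python. The payload bytes and the helper `try_transcode` are invented for illustration, and UTF-8 is assumed as the document encoding; the point is only that a strict decoder rejects the document as soon as it reaches a byte sequence that is not valid in that encoding.

```python
# Hypothetical binary payload placed directly between two tags.
binary_payload = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0xFE, 0x00, 0x80])
document_bytes = b"<foo>" + binary_payload + b"</foo>"


def try_transcode(data: bytes):
    """Strictly decode data as UTF-8; return (text, None) or (None, error)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, err


text, error = try_transcode(document_bytes)

if error is not None:
    # error.start is the byte offset of the first invalid sequence.
    print(f"transcoding failed at byte offset {error.start}: {error.reason}")
```

Here the decoder stops at the first byte of the payload, directly after the `<foo>` start tag, so none of the document is delivered to the parser.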
Another reason is that we want to be able to open an XML file
in a suitable text editor, alter it, then save it. Binary data would
probably be corrupted.
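
A small Python sketch of that round trip, under the assumption that the editor decodes the file leniently (substituting U+FFFD for bytes it cannot decode) and writes UTF-8 on save. Real editors differ in how they handle undecodable bytes, so this shows only one plausible behaviour, with made-up payload bytes.

```python
# Hypothetical file contents: raw binary bytes between two tags.
original = b"<foo>" + bytes([0x00, 0xFF, 0x10, 0x80]) + b"</foo>"

# "Open" the file: undecodable bytes are replaced with U+FFFD.
as_text = original.decode("utf-8", errors="replace")

# "Save" the file without making any changes.
saved = as_text.encode("utf-8")

# The bytes 0xFF and 0x80 are gone, and nothing in the saved file
# records what they used to be.
corrupted = saved != original
```

Even though the user changed nothing, the saved file differs from the original, and the lost bytes cannot be recovered from it.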
Another reason is that arbitrary binary data contains 0x00.
If the data is being read into C's char type and manipulated
as a string, for example, this will cause a problem.
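
The null-termination problem can be shown from Python with the standard `ctypes` module, which exposes a buffer both as a C-style string and as raw bytes. The payload is a made-up example.

```python
import ctypes

# Hypothetical binary payload with a zero byte in the middle.
payload = b"AB\x00CD"

# A C char buffer holding the payload (ctypes appends a trailing NUL).
buf = ctypes.create_string_buffer(payload)

# Read as a null-terminated string, the data ends at the first 0x00.
as_c_string = buf.value

# Read as raw bytes with an explicit length, all of the data is still there.
all_bytes = buf.raw[:len(payload)]

print(as_c_string)  # the part before the zero byte
print(all_bytes)    # the full payload
```

Any code path that treats the buffer as a C string sees only the part before the zero byte, which is the problem an implementation built on null-terminated strings would run into.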
If we changed XML so that
- it was not a textual format
- it used only the ASCII encoding
- it did not have to allow implementations using null-terminated strings
then binary sections could be included. But these are quite big changes:
removing the ML from XML!
----- Original Message -----
From: "Linda Grimaldi" <firstname.lastname@example.org>
Sent: Wednesday, March 26, 2003 1:46 PM
Subject: [xml-dev] Naive question about binary encodings
I'm sure this is really naïve of me, but I have to admit that I don't
understand why binary data cannot be sent within an xml document by
providing appropriately namespace-controlled attributes that identify an
element type as binary and specify its length. For example:
<foo xml:mytype="binary" xml:binarylength="1776">
or some such thing, followed by 1776 octets of whatever one likes. It's
kind of like MIME in some respects- as long as its type and length are
identified, you can do whatever you like with it.
It's so obvious that I am sure it has been rejected already, but, being
a mere implementor, I don't quite understand why.