   Proposal for embedding octet-streams in XML (was: XML DTD...binary data)

  • From: John Cowan <cowan@locke.ccil.org>
  • To: ricko@allette.com.au
  • Date: Sat, 20 Jun 1998 01:34:46 -0400 (EDT)

Rick Jelliffe wrote:

> The most common notation to use is Base64. You can find base 64 specified in
> an RFC.
> You can make a more efficient encoding by using all the available
> characters. There are several thousand, so you might want to invent your
> own Base4K encoding, for example, if it was really a big problem.

I propose a compromise: what might be called Base-256 encoding.

To embed a stream of arbitrary octets into an XML document,
they should appear as the #PCDATA content of a suitable element.
Each octet from 0-255 is encoded using the Unicode Private
Zone characters U+F000-U+F0FF respectively.  These characters are conveniently
located in the middle of the Private Zone.
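
A minimal Python sketch of the mapping described above, assuming nothing
beyond the stated rule (octet n becomes U+F000 + n). The function names
encode_octets and decode_octets are hypothetical, chosen only for this
illustration; they are not part of any existing library.

```python
BASE = 0xF000  # first Private Use code point named in the proposal


def encode_octets(data: bytes) -> str:
    """Map each octet 0-255 to the character U+F000 + octet."""
    return "".join(chr(BASE + octet) for octet in data)


def decode_octets(text: str) -> bytes:
    """Reverse the mapping; reject characters outside U+F000-U+F0FF."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"character U+{ord(ch):04X} is outside U+F000-U+F0FF")
        out.append(offset)
    return bytes(out)
```

The decoder in this sketch rejects any character outside the range, including
whitespace; a real application would have to decide for itself how to treat
line breaks or indentation inside the element.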

Using this convention causes the data to be expanded by 2:1 in a UCS-2
representation, by 3:1 in a UTF-8 representation, and by 8:1 in a
numeric-character-reference representation (e.g. &#xF000;).  Therefore, it is suitable
only for relatively small amounts of octet data embedded in a basically
textual matrix.
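
As a rough check of these expansion figures, the following Python sketch
measures the cost of one encoded octet in each representation. The function
name expansion_per_octet is hypothetical, and the sketch assumes hexadecimal
character references of the form &#xF000; encoded in a one-byte-per-character
document encoding.

```python
BASE = 0xF000  # first Private Use code point named in the proposal


def expansion_per_octet(octet: int) -> dict:
    """Return the size of one encoded octet in three representations."""
    ch = chr(BASE + octet)
    return {
        # UCS-2 is approximated by UTF-16 big-endian without a BOM;
        # the two agree for characters in the Basic Multilingual Plane.
        "ucs2_bytes": len(ch.encode("utf-16-be")),
        "utf8_bytes": len(ch.encode("utf-8")),
        # Hexadecimal numeric character reference, e.g. "&#xF000;"
        "ncr_chars": len(f"&#x{ord(ch):X};"),
    }
```

Under those assumptions every octet costs 2 bytes in UCS-2 and 3 bytes in
UTF-8, and a reference such as &#xF000; is 8 characters long.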

John Cowan					cowan@ccil.org
		e'osai ko sarji la lojban.
