   Re: Proposal for embedding octet-streams in XML (was: XML DTD...binary d

  • From: Chris Olds <colds@nwlink.com>
  • To: John Cowan <cowan@locke.ccil.org>, XML-Dev <xml-dev@ic.ac.uk>
  • Date: Fri, 19 Jun 1998 16:43:00 -0700

I couldn't let this go...

John Cowan wrote:
> 
> Rick Jelliffe wrote:
> 
> > The most common notation to use is Base64. You can find base 64 specified
> > in an RFC.

http://www.faqs.org/rfcs/rfc1341.html

> > You can make a more efficient encoding by using all the available
> > characters. There are several thousand, so you might want to invent your
> > own Base4K encoding, for example, if it was really a big problem.

Base64 is designed to use characters that don't change depending on which page
of ISO 646 is used, and are represented consistently in all versions of EBCDIC
as well.  If you use more than 6 bits, you lose some of these properties.
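
A quick way to check the ISO 646 half of that claim is sketched below. It
assumes the usual list of twelve code positions that national variants of
ISO 646 may reassign (written here as their US-ASCII characters). The names
BASE64_ALPHABET, ISO646_VARIANT and uses_only_invariant are made up for the
illustration. The EBCDIC half of the claim is not tested.

```python
import base64
import string

# The 64 data characters of Base64 plus the "=" pad character.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
)

# Assumption: the twelve positions that ISO 646 national variants may
# reassign, shown as the characters they hold in US-ASCII.
ISO646_VARIANT = set("#$@[\\]^`{|}~")


def uses_only_invariant(text):
    """Return True if no character of text sits on a variant position."""
    return not (set(text) & ISO646_VARIANT)


# Encode every possible octet value once and inspect the result.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")

print(uses_only_invariant(BASE64_ALPHABET))
print(set(encoded) <= set(BASE64_ALPHABET))
```

Any scheme that needs more than 64 distinct characters has to draw on
positions outside this safe set sooner or later.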

> I propose a compromise: what might be called Base-256 encoding.

[details of the encoding snipped]

> Using this convention causes the data to be expanded by 2:1 in a UCS-2
> representation, by 3:1 in a UTF-8 representation, and by 7:1 in a
> numeric-character-reference representation.  Therefore, it is suitable
> only for relatively small amounts of octet data embedded in a basically
> textual matrix.

Umm....  This only makes sense in a UCS-2 document.
Since Base64 expands 3 bytes into 4 (ASCII) characters, encoding such a
document in UCS-2 would effectively expand 3 bytes into 8 bytes.  In that case,
the 2:1 penalty for shifting each byte into the UCS-2 private use zone is less
than the 8:3 cost of encoding Base64 in UCS-2.
However, since all of the Base64 characters are 7-bit ASCII characters, the
Base64 overhead of 4:3 is much less than the 3:1 UTF-8 representation of U+F0xx,
and even the UCS-2 representation of Base64 encoding (at 8:3) is smaller than
the 7:1 that a (UTF-8 or ISO Latin) character entity requires.
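
The arithmetic can be checked with a short sketch. It assumes the snipped
encoding maps octet 0xNN to the private-use character U+F0NN, which is only
a guess based on the U+F0xx mentioned above. The function name expansion is
made up for the illustration. The character-reference case is left out
because its exact form depends on the snipped details.

```python
import base64


def expansion(octets):
    """Return encoded bytes per input octet for each representation."""
    n = len(octets)
    b64 = base64.b64encode(octets).decode("ascii")
    # Assumption: octet 0xNN becomes the private-use character U+F0NN.
    pua = "".join(chr(0xF000 + b) for b in octets)
    sizes = {
        "Base64 in UTF-8": len(b64.encode("utf-8")),
        # UTF-16 without a BOM is 2 bytes per character here, same as UCS-2.
        "Base64 in UCS-2": len(b64.encode("utf-16-be")),
        "U+F0xx in UCS-2": len(pua.encode("utf-16-be")),
        "U+F0xx in UTF-8": len(pua.encode("utf-8")),
    }
    return {name: size / n for name, size in sizes.items()}


# 300 octets: a multiple of 3, so Base64 padding does not skew the ratio.
sample = bytes(i % 256 for i in range(300))
for name, ratio in expansion(sample).items():
    print(f"{name}: {ratio:.2f} bytes per octet")
```

For this sample the four ratios come out as 4:3, 8:3, 2:1 and 3:1, in the
order listed in the dictionary.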

	/cco

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
