   RE: [xml-dev] BASE64 (was Re: CDATA)


> -----Original Message-----
> From: Seairth Jacobs [mailto:seairth@seairth.com] 
> Sent: Wednesday, April 02, 2003 13:49
> To: xml-dev
> Subject: [xml-dev] BASE64 (was Re: CDATA)
> I still don't see why a <![BASE64[ ]]> isn't added.
> 1) Nothing needs escaping.
> 2) The encoded form falls neatly into all content encoding 
> forms (I think), so parsers don't have to switch between 
> "character" and "octet" hats.
> 3) When someone asks "how do I handle binary?", the answer 
> would be a flat "<![BASE64[ ]]>" instead of "Well, can do 
> this... or this... or this... and you are responsible to all 
> encoding/decoding".  I suspect much less grumbling will occur.
> 4) For anyone arguing that it causes bloat: why are you using 
> XML in the first place then?
> 5) It's a clean, simple, and well-used technique.
> 6) It's about as 80/20 a solution as I can think of.
> So why not add it?

I think this **might** work, but...

It would require an important conceptual change to the Infoset
specification, because none of the current children of the element
information item can hold a binary octet.  Should a new information
item be introduced as a new kind of child?  It would probably represent
either a single octet or an octet string.

The character information item cannot be used because it holds an ISO
10646 character, not a binary octet.  Even with the XML 1.1 provision
allowing control characters in XML (as numeric character references),
control characters are still ISO 10646 characters, not binary octets -
and the NUL character is not permitted anyway.
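
To make the character-versus-octet point concrete, here is a minimal sketch using Python's standard library parser.  It assumes XML 1.0 rules only (that parser does not implement XML 1.1), and the helper name `parses` and the sample octets are mine, purely for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the stdlib (XML 1.0) parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Three arbitrary octets, including NUL (illustrative data only).
octets = bytes([0x00, 0x01, 0xFF])

# Attempt 1: pretend each octet is a character and write it as a
# numeric character reference.  &#0; is not a legal character.
as_char_refs = "<data>" + "".join(f"&#{b};" for b in octets) + "</data>"

# Attempt 2: encode the octets as Base64 text, which is ordinary characters.
as_base64 = "<data>" + base64.b64encode(octets).decode("ascii") + "</data>"

print(parses(as_char_refs))          # rejected by the parser
print(parses(as_base64))             # accepted
print(ET.fromstring(as_base64).text)  # what the parser hands back is a str
```

Note that even in the accepted case the parser reports characters, not octets; any decoding back to bytes is left to the application.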

Moreover, if an XML document is converted to an infoset and then back to
XML, we will want those BASE64 sections back; we won't want them changed
into a series of meaningless characters or numeric character references.
Nor do we want a sequence of characters or numeric character
references to be changed into a BASE64 section.  These seem good
arguments for introducing a distinct information item if a BASE64
section is introduced in XML.
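
The round-trip concern already has a close analogue today in CDATA sections.  The sketch below uses Python's ElementTree, which is not an Infoset implementation but, like the Infoset as I understand it, keeps only the characters and not the section boundaries.  The element name `doc` and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A document whose text is written as a CDATA section.
source = "<doc><![CDATA[a < b]]></doc>"

# Parse it into a tree (only the characters survive) and serialize it again.
root = ET.fromstring(source)
round_tripped = ET.tostring(root, encoding="unicode")

print(root.text)       # the characters, with no trace of the CDATA markers
print(round_tripped)   # the section comes back as escaped character data
```

A BASE64 section modelled as plain character information items would presumably be flattened in the same way, which is why a distinct information item looks necessary.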

What about schema languages?  They don't know anything about octet
information items.  How would such data be described/constrained?
Intuitively, an xsd:base64Binary or an xsd:hexBinary should validate
an element that contains a BASE64 section and nothing else (or should
they?).  But clearly, in this case the datatype is not constraining the
child characters of the element information item, but its child binary
octets.  So there are conceptual changes here also.
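
As a rough sketch of how things stand today: a binary datatype is checked against the element's child characters (the lexical form), and the octets exist only in the datatype's value space.  The function name `base64_value` is hypothetical, and stripping all whitespace before decoding is my simplifying assumption about the lexical form, not a full implementation of the schema rules.

```python
import base64
import binascii
from typing import Optional


def base64_value(lexical: str) -> Optional[bytes]:
    """Map a character string to octets, or return None if it is not Base64.

    The input is always characters; octets appear only in the result.
    """
    # Assumption: whitespace inside the lexical form is insignificant.
    compact = "".join(lexical.split())
    try:
        return base64.b64decode(compact, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


print(base64_value("AAH/"))         # valid lexical form, yields three octets
print(base64_value("AA H/\n"))      # same value, different characters
print(base64_value("not base64!"))  # invalid lexical form
```

With a BASE64 section, the validator would instead be handed the octets directly, with no lexical form to check, which is the conceptual change in question.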

There may be many more complications down the road, in these or
other areas.

Alessandro Triglia

> ---
> Seairth Jacobs
> seairth@seairth.com

