OASIS Mailing List Archives

   RE: CDATA Section

  • From: "Gordon, Simon" <Simon.Gordon@swi.galileo.com>
  • To: "'John Evdemon'" <JohnE01@xmls.com>, xml-dev@ic.ac.uk
  • Date: Thu, 28 Oct 1999 15:02:02 -0600

Any ideas? Well, sort of...

We have two, or perhaps three, problems when trying to transfer binary data
within XML elements:
1) Transfer should not break XML's character set restrictions.
2) Transfer should be economical with bandwidth.
3) The encoding/decoding should be cognisant of the character encoding in use.

1 is a given; it's why this discussion got started. 2 and 3 are less so and
more of a desire on my part to see an elegant and economical solution that
lies within XML itself. For instance, BASE64 is _OK_ when encoding binary
data into 8-bit characters, but what happens when we have 16-bit Unicode
characters? The output size goes from 133% of the input to 266%, because
every base64 character now occupies two bytes. If the encoder knew the
character set in use, it could send data more efficiently without relying on
a compression algorithm. In fact, XML declares only 29 of the 256 8-bit
character values to be illegal, so we could pass the 227 legal ones straight
through and escape only the remaining 29. Among the 65536 16-bit values we
have 63457 legal and 2079 illegal characters. Burying this inside the XML
parser (which does have full knowledge of the character encoding) would make
it invisible to end users. Attributes in the XML namespace could control the
encode/decode behavior. In fact, this corresponds with the concept outlined
in the latest XML Schemas specification (Part 2, Section 3.2.9), where they
talk about an encoding facet**.
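A minimal sketch of the kind of escaping described above, in Python. This is not a standard or proposed scheme: the escape character `=` and the shift-by-0x40 mapping are hypothetical choices for illustration. Beyond the 29 values XML forbids, it also escapes CR (XML end-of-line handling would otherwise turn it into LF) and the escape character itself, so 31 of the 256 byte values are escaped. Each remaining byte is passed through as the Latin-1 character of the same value, so the 1:1 pass-through only saves space if the document is serialized in an 8-bit encoding such as ISO-8859-1; in UTF-8, values 0x80-0xFF take two bytes each, which is exactly point 3 above. Markup characters (`<`, `&`, `>`) are left to the XML serializer, which writes them as entity references.

```python
import xml.etree.ElementTree as ET

# Hypothetical escape character; any legal character would do.
ESCAPE = "="


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def needs_escape(b: int) -> bool:
    """Bytes XML forbids, plus CR (lost to end-of-line handling) and ESCAPE."""
    return not is_xml_char(b) or b == 0x0D or b == ord(ESCAPE)


def encode(data: bytes) -> str:
    """Pass most bytes through as the Latin-1 char of the same value."""
    out = []
    for b in data:
        if needs_escape(b):
            # 1:2 expansion: ESCAPE plus the byte shifted into a legal range
            # (all escaped bytes are below 0x40, so b + 0x40 is 0x40-0x7D).
            out.append(ESCAPE + chr(b + 0x40))
        else:
            out.append(chr(b))
    return "".join(out)


def decode(text: str) -> bytes:
    """Invert encode(): ESCAPE marks a shifted byte, anything else is literal."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            out.append(ord(text[i + 1]) - 0x40)
            i += 2
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def roundtrip_through_xml(data: bytes) -> bytes:
    """Store encoded data as element text, serialize, re-parse, decode."""
    elem = ET.Element("bin")
    elem.text = encode(data)
    parsed = ET.fromstring(ET.tostring(elem, encoding="unicode"))
    return decode(parsed.text or "")


sample = bytes(range(256))
print(len(encode(sample)))                       # 287 (256 + 31 escapes)
print(roundtrip_through_xml(sample) == sample)   # True
```

With uniformly random bytes the expected output length is 1 + 31/256 of the input, about 112%, close to the 111% estimate that counts only the 29 forbidden values.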

Assuming a totally random input stream, the output size drops to about 111%
of the input for 8-bit characters and only about 103% for 16-bit characters,
ignoring any overhead in the controlling XML attributes and assuming a 1:2
expansion ratio for illegal characters. With this little overhead, it now
makes economic sense to compress large files and then apply this sort of
encoding, without worrying too much about the compression gain being lost in
the encoding. Seen XMLZIP?
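The counts and ratios above can be worked out from the XML 1.0 Char production (TAB, LF, CR, #x20-#xD7FF, #xE000-#xFFFD, plus supplementary characters). Here E is the expected output size relative to the input, N the number of possible input values and k the number that must be escaped, under the message's own assumptions: uniformly random input, escaping only the characters XML forbids, and a 1:2 expansion per escaped value.

```latex
\begin{aligned}
\text{legal 8-bit values}  &= 3 + (\mathrm{FF}_{16} - 20_{16} + 1) = 3 + 224 = 227,
  \qquad 256 - 227 = 29 \text{ illegal} \\
\text{legal 16-bit values} &= 3 + (\mathrm{D7FF}_{16} - 20_{16} + 1) + (\mathrm{FFFD}_{16} - \mathrm{E000}_{16} + 1) \\
  &= 3 + 55264 + 8190 = 63457, \qquad 65536 - 63457 = 2079 \text{ illegal} \\[4pt]
E &= \frac{(N - k)\cdot 1 + k \cdot 2}{N} = 1 + \frac{k}{N} \\
E_{8}  &= 1 + \tfrac{29}{256} \approx 1.113 \\
E_{16} &= 1 + \tfrac{2079}{65536} \approx 1.032 \\[4pt]
E_{\text{base64},\,8} &= \tfrac{4}{3} \approx 1.33, \qquad
E_{\text{base64},\,16} = 2 \cdot \tfrac{4}{3} \approx 2.67
\end{aligned}
```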

I'd put in more, but I don't want to post too much in one go, and besides, I'm
sure this sort of thing must've been proposed before (but then, why are we
still stuck with using base64 or &#xx; to send binary data?)

** Couldn't this be the URL of a translation service? Just send it the
stream to en/decode, or request the applet: a Web-centric application, as
proposed by Tim O'Reilly recently?

Regards,

Simon Gordon
Systems Engineer,
Systems Integration,
Galileo International, Denver, USA.

-----Original Message-----
From: John Evdemon [mailto:JohnE01@xmls.com]
Sent: Wednesday, October 27, 1999 11:15
To: Gordon, Simon; xml-dev@ic.ac.uk
Subject: RE: CDATA Section


I've run into a similar issue with CDATA, although we were transferring
mainframe reports within XML, not binary data.  The suggested workaround was
base64 -- I would love to see something more elegant.  Any ideas?

John Evdemon
Architect
XML Solutions
http://www.xmls.com

-----Original Message-----
From: owner-xml-dev@ic.ac.uk [mailto:owner-xml-dev@ic.ac.uk]On Behalf Of
Gordon, Simon
Sent: Wednesday, October 27, 1999 11:48 AM
To: xml-dev@ic.ac.uk
Subject: RE: CDATA Section


Thanks for the info. I guess I was trying to mix the ATTLIST and ELEMENT
syntax. The <!ATTLIST> declaration allows CDATA as an attribute type, but then
doesn't seem to use it in the same way as when you specify
<!ELEMENT x (#PCDATA)> and then use <![CDATA[...]]> in the XML. Most confusing.

Now for the next question: binary data and CDATA sections. According to Tim
Bray's annotated XML spec, I can use a CDATA section to send binary data,
yet I can't get it to work. [The annotation link is the last in the first
paragraph of section 2.7 CDATA Sections.]

This (IMHO) seems to contradict the XML spec, which defines the content of a
CDATA section as consisting of Char, which is in turn defined as TAB, CR, LF
and 0x20-..etc. That is, not all possible binary values. I've checked this
using the RXP parser (good, very good) and it does reject any values outside
the defined ranges. Big disappointment.
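RXP is not assumed to be at hand here, but Python's standard expat-based ElementTree parser enforces the same rule. The sketch below (the helper `is_well_formed` is illustrative, not a library function) shows that a CDATA section does not relax the Char production, and that a character reference such as `&#x1;` is rejected too, since XML 1.0's Legal Character constraint applies to references as well.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the stdlib (expat-based) parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# TAB and ordinary characters are allowed inside a CDATA section.
print(is_well_formed("<data><![CDATA[tab\there]]></data>"))  # True
# A raw control character (0x01) is rejected, CDATA section or not.
print(is_well_formed("<data><![CDATA[\x01]]></data>"))        # False
# A character reference cannot smuggle it in either.
print(is_well_formed("<data>&#x1;</data>"))                   # False
```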

Now I'll have to look at using base64 to exchange binary data with our
vendors; we'll all have to implement an encoding/decoding scheme, probably
involving attributes and NOTATIONs and a load more work. Unless anyone has a
better idea?
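A minimal base64 sketch of the kind of scheme described, using Python's `base64` module and ElementTree. The `encoding` attribute and the `report` element name are hypothetical conventions that sender and receiver would have to agree on; XML itself gives them no meaning, and the NOTATION-based alternative is not shown.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(tag: str, data: bytes) -> str:
    """Base64-encode data into an element tagged with a hypothetical attribute."""
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Check the agreed attribute, then base64-decode the element text."""
    elem = ET.fromstring(xml_text)
    if elem.get("encoding") != "base64":
        raise ValueError("unsupported encoding: %r" % elem.get("encoding"))
    return base64.b64decode(elem.text or "")


payload = bytes([0x00, 0x01, 0xFF, 0x3C])
doc = wrap_binary("report", payload)
print(doc)                            # <report encoding="base64">AAH/PA==</report>
print(unwrap_binary(doc) == payload)  # True
```

The price is the 4:3 size increase discussed earlier in the thread; the attribute only lets the receiver confirm which decoding to apply.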


Regards,

Simon Gordon

-----Original Message-----
From: John Cowan [mailto:cowan@locke.ccil.org]
Sent: Wednesday, October 20, 1999 13:23
To: Simon.Gordon@swi.galileo.com
Cc: xml-dev@ic.ac.uk
Subject: Re: CDATA Section


Gordon, Simon scripsit:

> > <?xml version='1.0'?>
> > <!DOCTYPE TEST [
> > <!ELEMENT TEST (CDATA)>

The problem is with this line, which says that TEST elements must contain
a single *element* whose name is "CDATA".  This has nothing to do with
CDATA sections.

> > Warning: CDATA section not allowed here
> >  in unnamed entity at line 5 char 16 of file:test.xml

Quite right; your document is invalid because your TEST element
contains character data instead of an element named CDATA.

> > PS. Just tried <!ELEMENT TEST (#PCDATA)> and removing the DOCTYPE section
> > altogether. The former passes the validation test and the latter passes
> > the well-formedness test (rxp -xs test.xml)!

And rightly so.  You cannot *compel* the content of an element to be
a CDATA section or not by using DTD-based validation.  Elements declared
with #PCDATA content may express that content with or without
CDATA sections.
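To see that the parser does not preserve the distinction, compare the character data Python's ElementTree reports for the same content written with and without a CDATA section. ElementTree does not validate against a DTD; this only illustrates the parsed result.

```python
import xml.etree.ElementTree as ET

with_cdata = "<TEST><![CDATA[a < b && c]]></TEST>"
without_cdata = "<TEST>a &lt; b &amp;&amp; c</TEST>"

# After parsing, the CDATA section boundaries are gone: both yield the same text.
print(ET.fromstring(with_cdata).text)  # a < b && c
print(ET.fromstring(with_cdata).text == ET.fromstring(without_cdata).text)  # True
```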

--
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN
981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS