OASIS Mailing List Archives


Re: [xml-dev] XML is text-only ... why?

Here are a few thought experiments.

1. What if you receive an XML document in which the markup and the text 
are both in a language you do not understand, perhaps using cuneiform 
characters. What advantage is there to having the document be 
represented as text?

Answer: You can still see at least the hierarchical structure. If 
someone gives you a query based on the document, and you can match 
characters by hand, you can still tell what the results of the query 
should be. You can write tools for people who speak languages you don't 
understand and still have a pretty good idea whether your tools are working.
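
As a rough illustration of the point above, here is a minimal Python sketch 
that prints only the nesting of a document and counts matches for a name 
without interpreting it. The sample document (Russian element names) and 
the helper names outline and count_matches are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: names and text are Russian, chosen only for illustration.
SAMPLE = (
    "<книга>"
    "<название>Война и мир</название>"
    "<автор><имя>Толстой</имя></автор>"
    "</книга>"
)


def outline(element, depth=0):
    """Return one indented line per element, showing only the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def count_matches(root, tag):
    """Count elements whose name matches `tag` character for character."""
    return sum(1 for _ in root.iter(tag))


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
print(count_matches(root, "имя"))
```

Neither helper needs to know what the names mean. Both only compare 
characters, which is the same check a person could do by hand.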

2. What if you receive a document in an encoding you cannot read? What 
advantage is there to having the document be represented as text?

Answer: None, unless you can find some way to read or convert the 
document encoding.

Note: The ability of XML to cleanly support various text encodings 
really sets it apart from most other text formats. In some applications, 
people stick plain text in an XML wrapper just so applications know how 
to deal with character representation across languages.

Note: It's still XML, even if you can't read it.
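
To make the encoding point concrete, here is a small Python sketch, 
assuming the standard library's ElementTree parser. It serializes the same 
one-element document in two encodings and parses both. The sample text and 
the helper encode_document are invented for this example.

```python
import xml.etree.ElementTree as ET

TEXT = "naïve café"  # sample text, chosen because it is not pure ASCII


def encode_document(text, encoding):
    """Serialize a one-element document in `encoding`, with a matching declaration."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><note>{text}</note>'
    return doc.encode(encoding)


latin1_bytes = encode_document(TEXT, "ISO-8859-1")
utf16_bytes = encode_document(TEXT, "UTF-16")  # Python's codec prepends a byte order mark

# The byte sequences differ, but the parser recovers the same characters from each.
from_latin1 = ET.fromstring(latin1_bytes).text
from_utf16 = ET.fromstring(utf16_bytes).text
print(from_latin1 == from_utf16)
```

The parser works out the encoding from the byte order mark and the 
encoding declaration, so the application only ever sees characters.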

3. What if you receive a document in an encoding you cannot read, and 
that encoding happens to be very compact so that it can be transmitted 
efficiently, and the encoding format also supports indexes for the most 
common operations, and annotations providing type information obtained 
by processing the XML with a schema. Is it still XML?

Answer: To me, this is a useful way of thinking about binary XML. 
Encoding a document in various ways and extending it with annotations 
for processing have a long tradition in SGML and XML.

Note: Suppose you think of binary documents as an alternate encoding 
that you need software to read, with great performance characteristics. 
What is lost? Primarily the archival value of being able to read any 
document in an encoding and language you understand. But if there were a 
standard binary encoding that had all the above performance 
characteristics, and we really adopted one such binary representation, 
perhaps it would be as useful for archival purposes as zip files, 
another commonly used binary format?


Costello, Roger L. wrote:
> Hi Folks,
> Below are a few notes I put together concerning the text-only nature of
> XML. At the bottom of this message are a few questions that I would
> very much appreciate your thoughts on. /Roger
> ------------------------------------
> An XML document consists purely of text. That is, the content of
> an XML document is just a string of characters. There are no integers
> in an XML document. There are no floating point values in an XML
> document. There are only characters.
> ------------------------------------
> Here is a simple XML document. It would appear that the value of the
> <x> element is an integer:
>     <?xml version="1.0"?>
>     <x>23</x>
> However, that is not the case. The 23 represents two characters, 2 and
> 3.
> You can see that they are indeed characters by viewing the hex values
> of the XML document:
> http://www.xfront.com/hex-values-of-a-simple-XML-document.gif  
> In the graphic you see that the hex values of 23 are x32 and x33, which
> correspond to the character 2 and the character 3.
> Compare with the integer value 23: its binary value is 00010111, which
> is hex 17.
> Thus, XML is just text. And you use a "text" editor to create an XML
> document.
> ------------------------------------
> Consider manipulating an XML document using XSLT. Here is an XSLT
> instruction that multiplies the value of the <x> element by the number
> 2:
>     <xsl:value-of select="x * 2"/>
> How can the two characters 23 be multiplied by an integer 2?
> Answer: the XSLT processor first converts the two characters into an
> integer:
>     Convert these two hex values: 32 33 into this hex value: 17
> After doing the conversion then the XSLT processor performs the
> multiplication.
> ------------------------------------
> Consider an XML Schema that declares the element <x> as an integer:
>     <element name="x" type="integer"/>
> This is not stating: 
>     "The value of the element x in an XML instance document is an
> integer." 
> Rather, it is stating: 
>     "The value of the element x in an XML instance document may be
> converted to an integer."
> ------------------------------------
> 1. Is the above accurate?
> 2. Is there such a thing as a document which contains both text and
> integers?
> 3. Do different platforms represent, say, integers differently? Is that
> why XML decided to be text-only?
> ------------------------------------
> Note: I found a very nice editor that enables you to view the hex
> version of a text file: PSPad at http://www.snapfiles.com/get/spad.html
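
The byte-level claims in the quoted message can also be checked without a 
hex editor. Below is a minimal Python sketch. The helper as_integer is 
hypothetical: it only approximates the "may be converted to an integer" 
reading by accepting an optionally signed run of decimal digits, and it 
is not a schema validator or an XSLT processor.

```python
import re

TEXT_VALUE = "23"  # the content of <x> as it appears in the document

# The characters '2' and '3' as stored in an ASCII or UTF-8 document.
text_hex = TEXT_VALUE.encode("ascii").hex()

# The integer 23 stored in a single byte.
int_hex = (23).to_bytes(1, "big").hex()

# Assumed lexical form: optional sign followed by decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def as_integer(text):
    """Convert element text to an int, or raise ValueError if it is not a digit string."""
    stripped = text.strip()
    if not _INTEGER.fullmatch(stripped):
        raise ValueError(f"not an integer literal: {text!r}")
    return int(stripped)


print(text_hex, int_hex)            # hex of the two characters, hex of the integer
print(TEXT_VALUE * 2)               # operating on the characters repeats them
print(as_integer(TEXT_VALUE) * 2)   # converting first gives arithmetic
```

The conversion is an explicit step on the text. Before it happens, the 
content of <x> is only two characters.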

Copyright 1993-2007 XML.org. This site is hosted by OASIS