RE: [xml-dev] XML is text-only ... why?

 

> -----Original Message-----
> From: Jonathan Robie [mailto:jonathan.robie@redhat.com] 
> Sent: Wednesday, September 26, 2007 10:56
> To: Costello, Roger L.
> Cc: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] XML is text-only ... why?
> 
> Here are a few thought experiments.
> 
> 1. What if you receive an XML document in which the markup 
> and the text are both in a language you do not understand, 
> perhaps using cuneiform characters. What advantage is there 
> to having the document be represented as text?
> 
> Answer: You can still see at least the hierarchical 
> structure. If someone gives you a query based on the 
> document, and you can match characters by hand, you can still 
> tell what the results of the query should be. You can write 
> tools for people who speak languages you don't understand and 
> still have a pretty good idea whether your tools are working.
> 
> 2. What if you receive a document in an encoding you cannot 
> read. What advantage is there to having the document be 
> represented as text?
> 
> Answer: None, unless you can find some way to read or convert 
> the document encoding.
> 
> Note: The ability of XML to cleanly support various text 
> encodings really sets it apart from most other text formats. 
> In some applications, people stick plain text in an XML 
> wrapper just so applications know how to deal with character 
> representation across languages.
> 
> Note: It's still XML, even if you can't read it.
> 
> 3. What if you receive a document in an encoding you cannot 
> read, and that encoding happens to be very compact so that it 
> can be transmitted efficiently, and the encoding format also 
> supports indexes for the most common operations, and 
> annotations providing type information obtained by processing 
> the XML with a schema. Is it still XML?
> 
> Answer: To me, this is a useful way of thinking about binary XML. 
> Encoding a document in various ways and extending it with 
> annotations for processing have a long tradition in SGML and XML.
> 
> Note: Suppose you think of binary documents as an alternate 
> encoding that you need software to read, with great 
> performance characteristics. 
> What is lost? Primarily the archival value of being able to 
> read any language in an encoding and language you understand. 
> But if there were a standard binary encoding that had all the 
> above performance characteristics


There is a standard already.

Fast Infoset is compact enough and fast enough for many practical purposes.
It is an ISO/IEC standard and an ITU-T Recommendation.  It is designed and
specified around the XML infoset, and therefore it fits the data model of
standards such as XPath, XQuery, and XML Schema very well.  For example, XML
Schema's validity assessment is specified in terms of an XML infoset being
validated, as opposed to an XML document being validated.

Fast Infoset addresses "typing" in a special way.  It does not try to encode
any "values" of any "types".  Instead, it provides a standard set of
encoding algorithms which a document creator can use to generate an
optimized representation of certain **character strings**.  The "data" being
encoded is still **characters**--it just happens that those characters are
represented as an integer or as a string of bytes in the fast infoset
document.  There is no dependency on a schema, even though the document
creator is free to exploit any a-priori knowledge of the documents
(including schemas) to decide what optimizations to use.  Then any consumer
will be able to read the document without having the same a-priori knowledge
the creator had.
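
As a rough illustration of the idea (this is not the actual Fast Infoset
wire format, and the function names are hypothetical), here is a Python
sketch in which the character string "23" is carried as four bytes and the
consumer recovers the characters without consulting any schema:

```python
import struct

def encode_int_chars(text: str) -> bytes:
    """Carry the characters of a decimal integer as 4 big-endian bytes.

    Assumption: the creator knows (for example, from a schema) that the
    string is a decimal integer that fits in 32 bits.
    """
    return struct.pack(">i", int(text))

def decode_int_chars(payload: bytes) -> str:
    """Recover the character string; no schema knowledge is needed here."""
    (value,) = struct.unpack(">i", payload)
    return str(value)

packed = encode_int_chars("23")
restored = decode_int_chars(packed)
print(packed.hex(), restored)
```

Note that a string such as "023" would come back as "23" in this sketch, so
a creator would only choose such an optimization where that is acceptable.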

There are several implementations available on different platforms and
languages, including the Sun-coordinated open source project.

Alessandro Triglia


> , and we really adopted one 
> such binary representation, perhaps it would be as useful for 
> archival purposes as zip files, another commonly used binary format?
> 
> Jonathan
> 
> Costello, Roger L. wrote:
> > Hi Folks,
> >
> > Below are a few notes I put together concerning the text-only nature
> > of XML. At the bottom of this message are a few questions that I would
> > very much appreciate your thoughts on. /Roger
> >
> > ------------------------------------
> > XML IS TEXT
> >
> > An XML document is comprised purely of text. That is, the contents of
> > an XML document is just a string of characters. There are no integers
> > in an XML document. There are no floating point values in an XML
> > document. There are only characters.
> >
> > ------------------------------------
> > EXAMPLE
> >
> > Here is a simple XML document. It would appear that the value of the
> > <x> element is an integer:
> >
> >     <?xml version="1.0"?>
> >     <x>23</x>
> >
> > However, that is not the case. The 23 represents two characters, 2 and
> > 3.
> >
> > You can see that they are indeed characters by viewing the hex values
> > of the XML document:
> >
> > http://www.xfront.com/hex-values-of-a-simple-XML-document.gif
> >
> > In the graphic you see that the hex values of 23 are x32 and x33,
> > which correspond to the character 2 and the character 3.
> >
> > Compare with the integer value 23; its binary value is 00010111, which
> > has a hex value of 17.
> >
> > Thus, XML is just text. And you use a "text" editor to create an XML
> > document.
> >
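The difference between the two characters and the integer can be checked
with a few lines of Python. This is only a small sketch; it assumes the
document is stored in an ASCII-compatible encoding such as UTF-8:

```python
text = "23"

# The two characters as they are stored in the document.
char_bytes = text.encode("ascii")
char_hex = " ".join(f"{b:02x}" for b in char_bytes)

# The integer those characters denote, once converted.
number = int(text)
number_hex = format(number, "02x")
number_bin = format(number, "08b")

print(char_hex)    # 32 33
print(number_hex)  # 17
print(number_bin)  # 00010111
```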
> > ------------------------------------
> > MANIPULATING XML
> >
> > Consider manipulating an XML document using XSLT. Here is an XSLT
> > instruction that multiplies the value of the <x> element by the
> > number 2:
> >
> >     <xsl:value-of select="x * 2"/>
> >
> > How can the two characters 23 be multiplied by an integer 2?
> >
> > Answer: the XSLT processor first converts the two characters into an
> > integer:
> >
> >     Convert these two hex values: 32 33 into this hex value: 17
> >
> > After doing the conversion then the XSLT processor performs the 
> > multiplication.
> >
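The conversion step can be sketched in Python as follows. The function name
is hypothetical, and this is not how any particular XSLT processor is
implemented; note also that XPath 1.0 treats every number as a
double-precision float, so a real processor would produce a floating-point
23 rather than a machine integer:

```python
def chars_to_int(digits: str) -> int:
    """Convert a string of ASCII decimal digits to an integer by hand."""
    if not digits:
        raise ValueError("empty string")
    value = 0
    for ch in digits:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a decimal digit: {ch!r}")
        # ord("2") is 0x32; subtracting ord("0") (0x30) leaves the digit 2.
        value = value * 10 + (ord(ch) - ord("0"))
    return value

converted = chars_to_int("23")  # characters 0x32 0x33 become the number 0x17
result = converted * 2
print(hex(converted), result)
```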
> > ------------------------------------
> > DECLARING AN ELEMENT'S DATATYPE IN A SCHEMA
> >
> > Consider an XML Schema that declares the element <x> as an integer:
> >
> >     <element name="x" type="integer"/>
> >
> > This is not stating: 
> >
> >     "The value of the element x in an XML instance document is an integer."
> >
> > Rather, it is stating: 
> >
> >     "The value of the element x in an XML instance document may be converted to an integer."
> >
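One way to picture "may be converted to an integer" is a check on the
characters followed by a conversion. The Python sketch below is an
assumption-laden illustration, not a schema validator: the function name is
hypothetical, and the pattern reflects only the basic lexical form of an
integer (an optional sign followed by decimal digits) after surrounding XML
whitespace is trimmed:

```python
import re

# Optional sign followed by one or more decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def as_integer(element_text: str) -> int:
    """Return the integer denoted by the element's characters, or raise."""
    trimmed = element_text.strip(" \t\r\n")
    if _INTEGER_LEXICAL.fullmatch(trimmed) is None:
        raise ValueError(f"not convertible to an integer: {element_text!r}")
    return int(trimmed)

value = as_integer("23")
print(value)
```

A string such as "2.0" or "abc" is still a perfectly good sequence of
characters in the document; it simply fails this conversion.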
> > ------------------------------------
> > QUESTIONS
> >
> > 1. Is the above accurate?
> >
> > 2. Is there such a thing as a document which contains both text and 
> > integers?
> >
> > 3. Do different platforms represent, say, integers differently? Is 
> > that why XML decided to be text-only?
> >
> > ------------------------------------
> > Note: I found a very nice editor that enables you to view the hex 
> > version of a text file: PSPad at 
> > http://www.snapfiles.com/get/spad.html
> >
> >
> > 
> > _______________________________________________________________________
> >
> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS to 
> > support XML implementation and development. To minimize spam in the 
> > archives, you must subscribe before posting.
> >
> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> > Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> > subscribe: xml-dev-subscribe@lists.xml.org
> > List archive: http://lists.xml.org/archives/xml-dev/
> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> >
> >
> >   


