Re: [xml-dev] XML is text-only ... why?


From: richard@inf.ed.ac.uk (Richard Tobin)
To: xml-dev@lists.xml.org
Date: Wed, 26 Sep 2007 13:31:28 +0100 (BST)

In article <B8415163A689094689542C617ECA036601E8B46D@IMCSRV5.MITRE.ORG> you write:

>Here is a simple XML document. It would appear that the value of the
><x> element is an integer:
>
>    <?xml version="1.0"?>
>    <x>23</x>
>
>However, that is not the case. The 23 represents two characters, 2 and
>3.

It *is* two characters.  It doesn't represent them!

>You can see that they are indeed characters by viewing the hex values
>of the XML document:
>
>http://www.xfront.com/hex-values-of-a-simple-XML-document.gif  
>
>In the graphic you see that the hex values of 23 are x32 and x33, which
>corresponds to the character 2, and the character 3.

I'm not sure that you should be emphasizing hex here.  Just as there
are no numbers in an XML document, there are no hex values.  What you
are examining is the Unicode code points of the characters, which your
viewer happens to display in hex.
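
A small sketch may make the distinction concrete.  Python is my choice
of language here, not something from the original thread.  It shows
that hex and decimal are only two ways of printing the same code
points:

```python
# The content of <x>23</x> is the two characters '2' and '3'.
text = "23"

# Each character has a Unicode code point, which is simply a number.
code_points = [ord(ch) for ch in text]

# Decimal and hex are two notations for those same numbers.
as_decimal = [str(cp) for cp in code_points]
as_hex = [format(cp, "X") for cp in code_points]

print(as_decimal)  # ['50', '51']
print(as_hex)      # ['32', '33']
```

Nothing about the document changes when you switch from one notation
to the other.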

>How can the two characters 23 be multiplied by an integer 2?
>
>Answer: the XSLT processor first converts the two characters into an
>integer:
>
>    Convert these two hex values: 32 33 into this hex value: 17

It converts the sequence of Unicode characters into a number.
Computers don't use hex.  In practice they use binary, but that's an
irrelevant low-level detail: what's happening here is the conversion
of characters into numbers.  Writing the value of 23 in hex, as 17, is
particularly confusing!
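
As a rough illustration of that conversion, here is a minimal Python
sketch.  The function name chars_to_number is hypothetical, and the
sketch produces an integer for simplicity, whereas XSLT 1.0 converts
to a double-precision number:

```python
def chars_to_number(text: str) -> int:
    """Convert a string of decimal digit characters to an integer."""
    if not text:
        raise ValueError("empty string is not a number")
    value = 0
    for ch in text:
        # Distance from the character '0' gives the digit's value.
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value


n = chars_to_number("23")
print(n * 2)           # 46
print(format(n, "x"))  # 17, the same number written in hex
```

The result is a number, not a "hex value"; hex appears only when we
choose to print it that way.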

>    <element name="x" type="integer"/>
>
>This is not stating: 
>
>    "The value of the element x in an XML instance document is an
>integer." 
>
>Rather, it is stating: 
>
>    "The value of the element x in an XML instance document may be
>converted to an integer."

It's also stating that the content must have the lexical form allowed
for integers.
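
To sketch what "the form allowed for integers" means, the following
Python fragment checks a string against the lexical form of the XML
Schema integer type: an optional sign followed by one or more decimal
digits, after surrounding whitespace is removed.  The function name
has_integer_form is hypothetical, and this is an approximation of
what a schema validator does, not a replacement for one:

```python
import re

# Optional sign, then one or more decimal digits.
INTEGER_FORM = re.compile(r"[+-]?[0-9]+")


def has_integer_form(content: str) -> bool:
    """Return True if the element content looks like a schema integer."""
    # Strip only the whitespace characters XML recognises.
    trimmed = content.strip(" \t\r\n")
    return INTEGER_FORM.fullmatch(trimmed) is not None


print(has_integer_form("23"))            # True
print(has_integer_form(" -5 "))          # True
print(has_integer_form("2.0"))           # False
print(has_integer_form("twenty-three"))  # False
```

So the declaration constrains the characters that may appear, as well
as promising that they can be converted.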

>3. Do different platforms represent, say, integers differently?

Not in any way that's relevant here, I think.

>Is that why XML decided to be text-only?

It's text only because it's a human-readable markup language for text.

Digression:

This historical perspective on XML is often forgotten.  Many aspects
of XML are inherited from the idea that a markup language takes an
existing text document and wraps markup around it.  There was a
widespread assumption that you could rip out the markup and get "the
original".  This explains why, for example, HTML titles are content
while hypertext references are attributes: the titles are part of the
text, but the URLs aren't.  It also explains the default template rules
in XSLT, which output the text and discard the markup.
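
The "rip out the markup" idea can be sketched in a few lines of
Python using the standard xml.etree.ElementTree module.  The sample
document and the example.org URL are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A made-up fragment: the link text is content, the URL is an attribute.
doc = '<p>See the <a href="http://example.org/">home page</a> for details.</p>'

root = ET.fromstring(doc)

# Concatenate all character data in document order, ignoring the tags
# and their attributes.
plain = "".join(root.itertext())

print(plain)  # See the home page for details.
```

The attribute value disappears along with the tags, which is roughly
the result of running a stylesheet that contains no templates of its
own.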

The use of SGML and XML for arbitrary structured data is a later idea.

You might find the "Origin and Goals" section of the XML spec helpful
in this context.

  http://www.w3.org/TR/REC-xml/#sec-origin-goals

-- Richard
-- 
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

References:
- XML is text-only ... why?
  - From: "Costello, Roger L." <costello@mitre.org>
