OASIS Mailing List Archives


Re: [xml-dev] [ Revised ] 15 elementary truths about XML

On 01/11/2011 13:40, Costello, Roger L. wrote:
> Hi Folks,
> Thank you John, Bjoern, Peter, Michael, Andrew, Michael, and Toby for your excellent feedback.
> Based on your feedback, I revised the statements.  Do you agree with the current formulation of each statement?  /Roger
> 1. An XML document is a sequence of zeros and ones called bits.

This need not be true at all.
> 2. A byte consists of 8 bits.
This is (as usually interpreted) true, but irrelevant to XML.
> 3. Thus, the content of an XML document is a sequence of bytes.
No, the XML spec says:

    A parsed entity contains text, a sequence of characters, ...

    character is an atomic unit of text as specified by ISO/IEC 10646:200

Nowhere does it say that an XML document is a sequence of bytes.

> 4. Here is an example of a byte: 00110001

That is a binary number, but isn't an elementary truth about XML.

> 5. That byte may be interpreted in various ways by software applications. For example, it may be interpreted as:
>      - corresponding to an integer in base two.
>        In base 10 it represents the integer 49.
>      - corresponding to a character.
>        In the ASCII character encoding scheme it
>       represents the character 1.

This is true but isn't an elementary truth about XML, just a 
tangentially related fact.
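
For what it's worth, the two readings of that bit pattern in statement 5 can be sketched in a few lines of Python. This is only an illustration of the tangential fact, and the variable names are made up for the example:

```python
# The bit pattern from statement 4, written as a string of binary digits.
bits = "00110001"

# Reading 1: an unsigned integer written in base two.
as_integer = int(bits, 2)

# Reading 2: a one-byte sequence decoded with the ASCII encoding.
as_character = bytes([as_integer]).decode("ascii")

print(as_integer)    # 49
print(as_character)  # 1
```

Neither reading involves XML at all, which is the point being made above.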
> 6. XML processors always interpret the bytes in XML documents as characters.

Documents consist of characters, not bytes. The XML processor may or may 
not, depending on the encoding, treat the bytes in the encoding of an 
entity as characters.
> 7. Thus, XML processors interpret the content of XML documents as a sequence of characters.
That is how the XML spec defines documents: as a sequence of entities, 
each of which is a sequence of characters, where a character is an atomic 
unit as defined by Unicode and ISO/IEC 10646.
> 8. There are various character encoding schemes, such as ASCII and UTF-8. Some character encoding schemes require more than one byte to encode a character.
> 9. An XML processor may identify the character encoding scheme used by an XML document either by its encoding attribute in the XML declaration or by some out-of-band means.

Yes or no, depending on how you parse that. The document identifies its 
encoding to the XML processor by the means you specify.
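
A small sketch of the encoding declaration doing that job, assuming Python's standard xml.etree.ElementTree (backed by expat) as the XML processor. The sample documents and variable names are invented for the illustration:

```python
import xml.etree.ElementTree as ET

# The byte 0xE9 is the character U+00E9 in ISO-8859-1,
# but on its own it is not a valid UTF-8 sequence.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><A>\xe9</A>'
undeclared = b'<A>\xe9</A>'

# With the declaration, the processor decodes 0xE9 as U+00E9.
text = ET.fromstring(declared).text

# Without a declaration (and with no out-of-band information),
# the parser falls back to UTF-8 and rejects the same byte.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(repr(text), undeclared_ok)
```

The same bytes are either one character or an error, depending only on what the document tells the processor about its encoding.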

> 10. An XML processor is software that reads the bytes in an XML document and makes them available to XML applications.
It may read characters that are not composed of bytes.

An XML document consists of characters, not bytes. Characters may be 
encoded by whatever means. An XML processor must be able to decode at 
least UTF-8 and UTF-16 (which do encode each character as a sequence of bytes).
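
To make that concrete, here is a rough sketch, again assuming Python's xml.etree.ElementTree as the processor, in which the same document is serialized as two different byte sequences and yields the same characters. The document content is an arbitrary example:

```python
import xml.etree.ElementTree as ET

# One document, expressed as characters.
source = "<A>1\u20ac</A>"

# Two different byte-level encodings of that document.
# Python's "utf-16" codec writes a byte order mark first.
utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + source).encode("utf-8")
utf16_bytes = ('<?xml version="1.0" encoding="UTF-16"?>' + source).encode("utf-16")

# The processor decodes each one and reports characters, not bytes.
text_from_utf8 = ET.fromstring(utf8_bytes).text
text_from_utf16 = ET.fromstring(utf16_bytes).text

print(utf8_bytes == utf16_bytes)          # False
print(text_from_utf8 == text_from_utf16)  # True
```

The bytes differ, yet the content reported for element A is the same two-character string in both cases.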

> 11. An XML application is software that processes the output of an XML processor. Metaphorically, an XML application is a layer of software on top of an XML processor.
> 12. An XML Schema validator is an XML application.
> 13. XML applications may interpret the bytes in XML documents differently than how an XML processor interprets the bytes.

XML applications do not interpret the bytes at all as they are not 
reported by the XML processor.

> 14. For example, consider the XML Schema that declares an element A with a Boolean data type:
>      <element name="A" type="boolean" />
>      Suppose the value of <A> is the byte 00110001.
>      The element declaration informs the XML Schema validator
>      and the XML Schema validator interprets the byte as the
>      Boolean value "true."
That's an example of something, not an elementary truth (just commenting 
on your numbering, not the actual example). However, to comment on the 
example: the value (by which I assume you mean content) of an element is 
never a byte, but a sequence of characters.

> 15. Thus, an XML processor interprets the byte 00110001 as representing the character 1 whereas an XML Schema validator interprets the same byte  as representing the Boolean value "true."
No, the processor sees the Unicode character U+0031 and may make of that 
what it wishes.
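
A minimal sketch of that layering, assuming Python's xml.etree.ElementTree stands in for the XML processor. The helper parse_xsd_boolean is hypothetical; it is not a real schema validator, only an approximation of the xs:boolean lexical forms (true, false, 1, 0):

```python
import xml.etree.ElementTree as ET


def parse_xsd_boolean(lexical: str) -> bool:
    """Map an xs:boolean lexical form to a bool (hypothetical helper)."""
    value = lexical.strip()  # approximates xs:boolean whitespace collapsing
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean: {lexical!r}")


# The processor reports the content of <A> as characters.
content = ET.fromstring(b"<A>1</A>").text
code_point = ord(content)

# The layer above works on those characters, never on the bytes.
as_boolean = parse_xsd_boolean(content)

print(repr(content), hex(code_point), as_boolean)
```

The boolean reading is applied to the character U+0031 handed over by the processor; the byte 00110001 never reaches that layer.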




Copyright 1993-2007 XML.org. This site is hosted by OASIS