Re: [xml-dev] [ Revised ] 15 elementary truths about XML
- From: David Carlisle <davidc@nag.co.uk>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Tue, 01 Nov 2011 14:01:57 +0000
On 01/11/2011 13:40, Costello, Roger L. wrote:
> Hi Folks,
>
> Thank you John, Bjoern, Peter, Michael, Andrew, Michael, and Toby for your excellent feedback.
>
> Based on your feedback, I revised the statements. Do you agree with the current formulation of each statement? /Roger
>
> 1. An XML document is a sequence of zeros and ones called bits.
This need not be true at all.
>
> 2. A byte consists of 8 bits.
This is (as usually interpreted) true, but irrelevant to XML.
>
> 3. Thus, the content of an XML document is a sequence of bytes.
No, the XML spec says:
A parsed entity contains text, a sequence of characters, ...
character is an atomic unit of text as specified by ISO/IEC 10646:200
Nowhere does it say that an XML document is a sequence of bytes.
>
> 4. Here is an example of a byte: 00110001
That is a binary number, but isn't an elementary truth about XML.
>
> 5. That byte may be interpreted in various ways by software applications. For example, it may be interpreted as:
>
> - corresponding to an integer in base two.
> In base 10 it represents the integer 49.
>
> - corresponding to a character.
> In the ASCII character encoding scheme it
> represents the character 1.
This is true but isn't an elementary truth about XML, just a
tangentially related fact.
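
To make the two readings in item 5 concrete, here is a minimal Python sketch. It only shows that one byte value can be read as an integer or decoded as an ASCII character; the variable names are illustrative and nothing in it is specific to XML.

```python
# One byte with the bit pattern 00110001.
raw = bytes([0b00110001])

# Reading 1: the byte as an unsigned integer.
as_integer = raw[0]

# Reading 2: the byte decoded as a character using ASCII.
as_character = raw.decode("ascii")

print(as_integer)    # 49
print(as_character)  # 1
```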
>
> 6. XML processors always interpret the bytes in XML documents as characters.
Documents consist of characters, not bytes. Depending on the encoding,
the XML processor may or may not treat the bytes in the encoding of an
entity as characters.
>
> 7. Thus, XML processors interpret the content of XML documents as a sequence of characters.
That is how the XML spec defines documents: as a sequence of entities,
each of which is a sequence of characters, where a character is an
atomic unit of text as defined by Unicode and ISO/IEC 10646.
>
> 8. There are various character encoding schemes, such as ASCII and UTF-8. Some character encoding schemes require more than one byte to encode a character.
>
Yes.
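
As a rough illustration of item 8, the sketch below counts how many bytes a single character needs in a few encodings. The helper name `encoded_lengths` and the sample characters are assumptions chosen for the example, not anything from the XML spec.

```python
def encoded_lengths(ch):
    """Return the byte count of one character per encoding (None if not encodable)."""
    lengths = {}
    for name in ("ascii", "utf-8", "utf-16-be"):
        try:
            lengths[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            # The encoding has no representation for this character.
            lengths[name] = None
    return lengths


samples = {
    "U+0031 DIGIT ONE": "\u0031",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+20AC EURO SIGN": "\u20ac",
    "U+1F600 GRINNING FACE": "\U0001F600",
}

for label, ch in samples.items():
    print(label, encoded_lengths(ch))
```

The same character can take one, two, three, or four bytes depending on the encoding chosen, and ASCII cannot represent most of them at all.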
> 9. An XML processor may identify the character encoding scheme used by an XML document either by its encoding attribute in the XML declaration or by some out-of-band means.
Yes or no, depending on how you parse that. The document identifies its
encoding to the XML processor by the means you specify.
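
A small sketch of the encoding declaration at work, assuming Python's standard `xml.etree.ElementTree` (backed by Expat) as the XML processor. The byte 0xE9 is only readable as a character once the document declares ISO-8859-1; without a declaration the processor assumes UTF-8 and rejects the input. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The entity declares its own encoding, so byte 0xE9 decodes to U+00E9.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><A>\xe9</A>'
declared_text = ET.fromstring(declared).text

# No declaration and no out-of-band information: UTF-8 is assumed,
# and a lone 0xE9 byte is not valid UTF-8.
undeclared = b"<A>\xe9</A>"
try:
    ET.fromstring(undeclared)
    undeclared_parsed = True
except ET.ParseError:
    undeclared_parsed = False

print(repr(declared_text), undeclared_parsed)
```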
> 10. An XML processor is software that reads the bytes in an XML document and makes them available to XML applications.
It may read characters that are not composed of bytes.
An XML document consists of characters, not bytes. Characters may be
encoded by whatever means. An XML processor must be able to decode at
least UTF-8 and UTF-16 (which do encode each character as a sequence of bytes).
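
The point that the document is the characters, whatever their encoding, can be sketched with Python's standard `xml.etree.ElementTree` standing in for an XML processor (an assumption for the example). The same element is supplied as an already-decoded string, as UTF-8 bytes, and as UTF-16 bytes; the byte sequences differ, but the reported content is the same sequence of characters.

```python
import xml.etree.ElementTree as ET

body = "<A>1 \u20ac</A>"

# Input that is already characters: no byte decoding is visible to the caller.
from_str = ET.fromstring(body).text

# The same document serialized in two different encodings.
utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
utf16_bytes = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

from_utf8 = ET.fromstring(utf8_bytes).text
from_utf16 = ET.fromstring(utf16_bytes).text

print(utf8_bytes != utf16_bytes)          # the bytes differ
print(from_str == from_utf8 == from_utf16)  # the characters do not
```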
> 11. An XML application is software that processes the output of an XML processor. Metaphorically, an XML application is a layer of software on top of an XML processor.
>
> 12. An XML Schema validator is an XML application.
>
> 13. XML applications may interpret the bytes in XML documents differently than how an XML processor interprets the bytes.
XML applications do not interpret the bytes at all, as the XML
processor does not report them.
> 14. For example, consider the XML Schema that declares an element A with a Boolean data type:
>
> <element name="A" type="boolean" />
>
>    Suppose the value of <A> is the byte 00110001.
> The element declaration informs the XML Schema validator
> and the XML Schema validator interprets the byte as the
> Boolean value "true."
That's an example of something, not an elementary truth (just commenting
on your numbering, not the actual example). However, to comment on the
example: the value (by which I assume you mean content) of an element is
never a byte, but a sequence of characters.
> 15. Thus, an XML processor interprets the byte 00110001 as representing the character 1 whereas an XML Schema validator interprets the same byte as representing the Boolean value "true."
>
No, the processor sees the Unicode character U+0031 and may make of that
what it wishes.
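
To illustrate the layering, here is a hedged sketch in which Python's `xml.etree.ElementTree` plays the XML processor and a hypothetical helper, `parse_xsd_boolean`, plays a tiny part of a schema-aware application. The helper is not a real validator; it only maps the xs:boolean lexical forms ("true", "false", "1", "0") to a Boolean after trimming XML whitespace. It receives characters from the processor and never sees bytes.

```python
import xml.etree.ElementTree as ET


def parse_xsd_boolean(text):
    """Hypothetical helper: map xs:boolean lexical forms to a Python bool."""
    lexical = text.strip(" \t\r\n")  # xs:boolean collapses surrounding whitespace
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean lexical form: {text!r}")


# The processor reports element content as characters.
content = ET.fromstring("<A>1</A>").text

# The application layer decides what those characters mean.
value = parse_xsd_boolean(content)

print(repr(content), value)
```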
David