Re: [xml-dev] [ Revision #2 ] 15 elementary truths about XML
- From: Jens Østergaard Petersen <oesterg@gmail.com>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Thu, 3 Nov 2011 19:44:17 +0100
On Nov 2, 2011, at 4:19 PM, Costello, Roger L. wrote:
> Hi Folks,
>
> Thanks again for the outstanding feedback.
>
> Based on yesterday's feedback I revised the statements. Please let me know of any errors. /Roger
>
> PREAMBLE
>
> Inside computers there are no strings, Booleans, integers, or URLs. There are only sequences of zeros and ones called bits. A byte consists of 8 bits.
One could argue with this point and hold to the older position that a "byte" is "the smallest unit of data that can be read from or written to at one time in a computer system" (<http://www.tcpipguide.com/free/t_BinaryInformationandRepresentationBitsBytesNibbles-3.htm>), and that these units of data can vary in length, not only historically, but also across different Unicode encodings and, within UTF-8, across different Unicode planes.
I think it is more consistent to say that a byte of (run-of-the-mill) English is one octet long, whereas a byte of e.g. Chinese is several octets long: when you pass a Chinese character to an application, you pass it "the smallest unit of data that can be read from or written to at one time in a computer system," only in Chinese - you don't pass it two "bites" of information in Chinese.
On this interpretation, a byte represents (in the present day) a Unicode code point. However, this is a losing battle - even Unicode.org has given up this distinction <http://unicode.org/glossary/#octet>.
Jens
> So inside computers are sequences of bytes.
>
> Software applications may be written to read the bytes inside a computer. Here is an example of a byte: 00110001. Different software applications may interpret that byte in different ways. For example, an application may interpret it as:
>
> - corresponding to an integer in base two.
> In base 10 it represents the integer 49.
Does this not presuppose a prior encoding of "00110001" as the ASCII character "1" - which is then reinterpreted as the binary character "1"?
> - corresponding to a character.
> In the ASCII character encoding scheme it
> represents the character 1.
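To make the two readings concrete, here is a minimal sketch in Python (my choice of language, not something from Roger's text). The names raw, as_integer and as_character are only illustrative.

```python
# The single byte from the example above, written as a bit pattern.
raw = bytes([0b00110001])

# Reading 1: the byte as an unsigned integer.
as_integer = raw[0]

# Reading 2: the same byte as a character in the ASCII encoding scheme.
as_character = raw.decode("ascii")

print(as_integer)          # 49
print(repr(as_character))  # '1'
```

The stored bits never change; only the interpretation applied by the program does.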
>
> Note that there are various character encoding schemes such as ASCII and UTF-8. Some character encoding schemes require more than one byte to encode a character.
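As a rough illustration of the multi-byte point, the sketch below counts the octets UTF-8 needs for three characters. The sample characters are my own picks and the variable names are hypothetical.

```python
# Three sample characters: ASCII digit, Latin letter with accent, CJK ideograph.
samples = ["1", "é", "中"]

# Number of octets each character occupies when encoded as UTF-8.
octet_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, count in octet_counts.items():
    # ord() gives the Unicode code point; hex() shows the encoded octets.
    print(f"U+{ord(ch):04X} takes {count} octet(s): {ch.encode('utf-8').hex()}")
```

In UTF-8 the digit occupies one octet, the accented letter two, and the ideograph three, so "one character" and "one 8-bit byte" only coincide for the ASCII range.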
>
> When a text editor reads a sequence of bytes it always interprets them as characters. Conversely, when a text editor writes characters it writes them encoded to a character encoding scheme. Some text editors can be configured to a particular character encoding scheme.
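What "configured to a particular character encoding scheme" means in practice can be sketched like this: the same two octets yield different characters depending on which scheme the reader assumes. This is only an illustration; a real editor does considerably more (detection, byte order marks, and so on).

```python
# Two octets, produced by writing the character "é" in UTF-8.
data = "é".encode("utf-8")

# An editor configured for UTF-8 reads them back as one character.
as_utf8 = data.decode("utf-8")

# An editor configured for ISO-8859-1 (Latin-1) reads two characters instead.
as_latin1 = data.decode("latin-1")

print(len(data), repr(as_utf8), repr(as_latin1))
```

Both decodings succeed without error, which is why a wrongly configured editor shows garbage rather than a warning.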
>
>
> ELEMENTARY TRUTHS ABOUT XML
>
> 1. An XML document is a sequence of characters.
>
> 2. As noted above there are no characters in a computer, only bytes. Thus, "An XML document is a sequence of characters" actually means that an XML document is an abstraction of the underlying sequence of bytes.
>
> 3. An XML processor is software that reads and processes the characters in an XML document. Colloquially, XML processors are known as XML parsers.
>
> 4. There exists software that can read bytes, interpret them as characters, and output the character abstraction. The software that performs this bytes-to-characters abstraction is programming-language specific. An XML processor uses this programming-language-specific software to read the sequence of characters. Metaphorically, an XML processor is a layer of software on top of programming-language-specific software.
>
> 5. An XML processor processes the characters in XML documents and makes the results available to XML applications.
>
> 6. An XML application is software that processes the output of an XML processor. Metaphorically, an XML application is a layer of software on top of an XML processor.
>
> 7. An XML Schema validator is an XML application.
>
> 8. XML applications may interpret the characters in XML documents as other than characters.
>
> 9. For example, consider the XML Schema that declares an element A with a Boolean data type:
>
> <element name="A" type="boolean" />
>
> Suppose the content of <A> is 1.
> The element declaration informs the XML Schema validator
> and the XML Schema validator interprets the 1 as the
> Boolean value "true."
>
> 10. Thus, an XML processor interprets the 1 as representing the character 1 whereas an XML Schema validator interprets the same character as representing the Boolean value "true."
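Statements 9 and 10 can be sketched as follows, with Python's xml.etree.ElementTree standing in for the XML processor. The function interpret_xsd_boolean is a hypothetical stand-in for the one step a schema validator performs for the boolean type; it is not a validator, and the standard library does not ship one. It assumes the XML Schema lexical forms for boolean: "true", "false", "1" and "0".

```python
import xml.etree.ElementTree as ET


def interpret_xsd_boolean(text: str) -> bool:
    """Map the lexical forms of the XML Schema boolean type to a bool (hypothetical helper)."""
    lexical = text.strip()  # approximates the whitespace collapsing applied to boolean
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a valid boolean lexical form: {text!r}")


# The XML processor hands back the element content as characters.
root = ET.fromstring("<A>1</A>")
as_characters = root.text

# The application layer reinterprets those same characters as a Boolean.
as_boolean = interpret_xsd_boolean(as_characters)

print(repr(as_characters), as_boolean)
```

The parser never sees anything but the character 1; the Boolean only exists once the layer above applies the declared type.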
>
>
> EPILOGUE
>
> An XML document is a character abstraction of the sequence of bytes that actually exists inside the computer. Programming-language-specific software is used to read the sequence of bytes and generate a character abstraction of them (i.e., generate characters). An XML processor reads the characters, processes them, and makes the results available to XML applications. XML applications may interpret the characters as strings, Booleans, integers, or URLs.
>
> This is the layering and processing:
>
> a. In the computer is a sequence of bytes
>
> b. Programming-language-specific software interprets the bytes as characters and outputs a sequence of characters
>
> c. An XML processor reads the sequence of characters, processes them, and outputs the results
>
> d. An XML application reads the XML processor's output and interprets the characters as strings, Booleans, integers, and URLs.
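The four layers a. to d. might be walked through like this. It is a sketch under assumptions: the element names doc, n and flag are invented for the illustration, UTF-8 is assumed as the encoding, and xml.etree.ElementTree plays the XML processor.

```python
import xml.etree.ElementTree as ET

# a. In the computer is a sequence of bytes.
raw = b"<doc><n>49</n><flag>1</flag></doc>"

# b. Language-specific software interprets the bytes as characters.
chars = raw.decode("utf-8")

# c. The XML processor reads the characters and exposes the results.
root = ET.fromstring(chars)
n_text = root.findtext("n")
flag_text = root.findtext("flag")

# d. The application interprets those characters as typed values.
n_value = int(n_text)                     # characters "49" read as an integer
flag_value = flag_text in ("true", "1")   # character "1" read as a Boolean

print(repr(n_text), n_value, repr(flag_text), flag_value)
```

In practice a parser given bytes directly carries out step b itself, using the encoding declaration; the steps are separated here only to mirror the list.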
>
>