OASIS Mailing List Archives (XML.org)
Types of tokens in XML instance documents?

Hi Folks,

What are the types of tokens in XML documents? Leaving aside the DTD stuff, I think these are the types of tokens in XML documents:

1. The '<' character
2. The '>' character
3. Empty element terminator "/>"
4. End tag start "</"
5. Element name
6. Attribute name
7. The '=' character
8. Attribute value delimited by quote characters
9. Attribute value delimited by apostrophe characters
10. Text data
11. Processing instruction <?...?>
12. Entity &...;
13. Character decimal entity &#...;
14. Character hexadecimal entity &#x...;
15. CDATA section <![CDATA[ ... ]]>
16. Comment <!-- ... -->

Am I missing anything?

Here is an XML document that contains all 16 token types:

<aircraft type='F-16'>
    <!-- Snapshot of an F-16 in flight -->
    <altitude units="meters">10000</altitude>
    <headingNorth/>
    <?altimeter reading="30.1"?>
    <description><![CDATA[bumpy ride due to turbulence]]></description>
    <pilots>Johnson &amp; Smith</pilots>
    <footnote>Producer &#169; aerodata </footnote>
    <footnote>Editor &#xA9; workshop</footnote>
</aircraft>
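As a quick sanity check that the document above is well-formed, it can be run through an off-the-shelf parser. The sketch below uses Python's standard xml.etree.ElementTree; the variable name SAMPLE is only for illustration. It also shows that a parser works one level above the token list: the entity and the two character entities come back already resolved, and the decimal and hexadecimal forms yield the same character.

```python
import xml.etree.ElementTree as ET

# The sample document from the post, copied verbatim.
SAMPLE = """<aircraft type='F-16'>
    <!-- Snapshot of an F-16 in flight -->
    <altitude units="meters">10000</altitude>
    <headingNorth/>
    <?altimeter reading="30.1"?>
    <description><![CDATA[bumpy ride due to turbulence]]></description>
    <pilots>Johnson &amp; Smith</pilots>
    <footnote>Producer &#169; aerodata </footnote>
    <footnote>Editor &#xA9; workshop</footnote>
</aircraft>"""

# fromstring raises ParseError if the text is not well-formed.
root = ET.fromstring(SAMPLE)

# Attribute values are reported without their delimiters,
# so the quote/apostrophe distinction is gone at this level.
aircraft_type = root.get("type")
units = root.find("altitude").get("units")

# &amp; is resolved to "&"; &#169; and &#xA9; both become U+00A9.
pilots = root.find("pilots").text
footnotes = [f.text for f in root.findall("footnote")]

# The CDATA wrapper disappears; only its content remains.
description = root.find("description").text

# Note: ElementTree's default parser discards the comment and the
# processing instruction, so they do not appear in the tree.
```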

Here's how we might tokenize the XML document:

token type			token
------------------------------------------------------------
<				<
element name			aircraft
attribute name			type
=				=
apostrophed attribute value 	'F-16'	
>				>
comment			<!-- Snapshot of an F-16 in flight -->
<				<
element name			altitude
attribute name			units
=				=
quoted attribute value		"meters"
>				>
text data			10000
start end tag			</
element name			altitude
>				>
<				<
element name			headingNorth
empty element terminator	/>
processing instruction		<?altimeter reading="30.1"?>
<				<
element name			description
> 				>
CDATA section			<![CDATA[bumpy ride due to turbulence]]>
start end tag			</
element name			description
>				>
<				<
element name			pilots
> 				>
text data			Johnson
entity				&amp;
text data			Smith
start end tag			</
element name			pilots
>				>
<				<
element name			footnote
> 				>
text data			Producer
character decimal entity	&#169;
text data			aerodata
start end tag			</
element name			footnote
>				>
<				<
element name			footnote
> 				>
text data			Editor
character hex entity		&#xA9;
text data			workshop
start end tag			</
element name			footnote
>				>
start end tag			</
element name			aircraft
>				>
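
The tokenization above can be sketched as a small two-state lexer: one set of patterns applies between tags, another inside a tag. This is only a rough illustration, not a conforming XML tokenizer. The names tokenize, CONTENT and TAG are made up for this sketch, the name pattern is a simplification of the real XML name rules, and whitespace is handled the same way as in the table (skipped inside tags, trimmed from text data, and dropped when the text is whitespace only).

```python
import re

SAMPLE = """<aircraft type='F-16'>
    <!-- Snapshot of an F-16 in flight -->
    <altitude units="meters">10000</altitude>
    <headingNorth/>
    <?altimeter reading="30.1"?>
    <description><![CDATA[bumpy ride due to turbulence]]></description>
    <pilots>Johnson &amp; Smith</pilots>
    <footnote>Producer &#169; aerodata </footnote>
    <footnote>Editor &#xA9; workshop</footnote>
</aircraft>"""

def _compile(table):
    return [(kind, re.compile(pattern, re.DOTALL)) for kind, pattern in table]

# Patterns tried in order between tags. Longer markers come before "<".
CONTENT = _compile([
    ("comment", r"<!--.*?-->"),
    ("CDATA section", r"<!\[CDATA\[.*?\]\]>"),
    ("processing instruction", r"<\?.*?\?>"),
    ("start end tag", r"</"),
    ("<", r"<"),
    ("character hex entity", r"&#x[0-9A-Fa-f]+;"),
    ("character decimal entity", r"&#[0-9]+;"),
    ("entity", r"&[A-Za-z_:][\w.:-]*;"),
    ("text data", r"[^<&]+"),
])

# Patterns tried in order inside a tag.
TAG = _compile([
    ("empty element terminator", r"/>"),
    (">", r">"),
    ("=", r"="),
    ("quoted attribute value", r'"[^"<]*"'),
    ("apostrophed attribute value", r"'[^'<]*'"),
    ("name", r"[A-Za-z_:][\w.:-]*"),  # simplified name rule (assumption)
])

WHITESPACE = re.compile(r"\s+")

def tokenize(xml):
    """Return a list of (token type, token) pairs for an XML string."""
    tokens = []
    pos = 0
    in_tag = False
    expect_element_name = False
    while pos < len(xml):
        if in_tag:
            ws = WHITESPACE.match(xml, pos)
            if ws:  # whitespace inside a tag only separates tokens
                pos = ws.end()
                continue
        for kind, regex in (TAG if in_tag else CONTENT):
            m = regex.match(xml, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
        text = m.group()
        pos = m.end()
        if kind == "name":
            # The first name after "<" or "</" is the element name.
            kind = "element name" if expect_element_name else "attribute name"
            expect_element_name = False
        elif kind in ("<", "start end tag"):
            in_tag = True
            expect_element_name = True
        elif kind in (">", "empty element terminator"):
            in_tag = False
        elif kind == "text data":
            text = text.strip()
            if not text:  # drop whitespace-only text, as the table does
                continue
        tokens.append((kind, text))
    return tokens

if __name__ == "__main__":
    for kind, text in tokenize(SAMPLE):
        print(f"{kind:30}{text}")
```

Keeping the two pattern tables separate is what lets the same character sequence mean different things in different places: a run of letters is text data between tags but a name inside one.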

Wow!

XML is a simple, beautiful language.

/Roger

P.S. For a fantastic description of parsing a language by breaking it up into token types and tokens, see page 125 of "The C Programming Language (second edition)" by Kernighan & Ritchie.



Copyright 1993-2007 XML.org. This site is hosted by OASIS