Hi Folks,
What are the types of tokens in XML documents? Leaving aside the DTD stuff, I think these are the token types:
1. The '<' character
2. The '>' character
3. Empty element terminator "/>"
4. End tag start </
5. Element name
6. Attribute name
7. The '=' character
8. Attribute value delimited by quote characters
9. Attribute value delimited by apostrophe characters
10. Text data
11. Processing instruction <?...?>
12. Entity &...;
13. Character decimal entity &#...;
14. Character hexadecimal entity &#x...;
15. CDATA section <![CDATA[ ... ]]>
16. Comment <!-- ... -->
Am I missing anything?
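Items 12 to 14 are all references that a parser replaces with characters. Here is a minimal Python sketch of that expansion. It assumes no DTD, so the only named entities available are the five that XML predefines (amp, lt, gt, apos, quot). `expand_references` and `PREDEFINED_ENTITIES` are hypothetical names chosen for illustration:

```python
import re

# The five entities predefined by XML 1.0. Any other named entity would
# have to be declared in a DTD, which this sketch leaves aside.
PREDEFINED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# Group 1: hex digits of &#x...;  Group 2: decimal digits of &#...;
# Group 3: entity name of &...;
REFERENCE = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z_:][\w.:-]*));")


def expand_references(text):
    """Replace entity and character references in text with the characters they denote."""
    def replace(m):
        hex_digits, dec_digits, name = m.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))   # hexadecimal character reference
        if dec_digits is not None:
            return chr(int(dec_digits))       # decimal character reference
        if name in PREDEFINED_ENTITIES:
            return PREDEFINED_ENTITIES[name]  # predefined entity reference
        raise ValueError(f"undeclared entity: &{name};")
    return REFERENCE.sub(replace, text)


if __name__ == "__main__":
    print(expand_references("Johnson &amp; Smith"))  # Johnson & Smith
    # Decimal and hex forms of the same code point, U+00A9:
    print(expand_references("&#169;") == expand_references("&#xA9;") == "\u00a9")  # True
```

Both `&#169;` and `&#xA9;` name the same character (©); the two forms differ only in how the code point number is written.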
Here is an XML document that contains all 16 token types:
<aircraft type='F-16'>
<!-- Snapshot of an F-16 in flight -->
<altitude units="meters">10000</altitude>
<headingNorth/>
<?altimeter reading="30.1"?>
<description><![CDATA[bumpy ride due to turbulence]]></description>
<pilots>Johnson &amp; Smith</pilots>
<footnote>Producer &#169; aerodata </footnote>
<footnote>Editor &#xA9; workshop</footnote>
</aircraft>
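One way to confirm that the sample is well-formed is to hand it to a real parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`, assuming the references in the pilots and footnote elements are written `&amp;`, `&#169;` and `&#xA9;` (a bare `&` in text is not allowed). `DOC` is just a local name:

```python
import xml.etree.ElementTree as ET

# The sample document, with its entity and character references written out.
DOC = """<aircraft type='F-16'>
<!-- Snapshot of an F-16 in flight -->
<altitude units="meters">10000</altitude>
<headingNorth/>
<?altimeter reading="30.1"?>
<description><![CDATA[bumpy ride due to turbulence]]></description>
<pilots>Johnson &amp; Smith</pilots>
<footnote>Producer &#169; aerodata </footnote>
<footnote>Editor &#xA9; workshop</footnote>
</aircraft>"""

root = ET.fromstring(DOC)
print(root.get("type"))                    # F-16
print(root.find("altitude").get("units"))  # meters
print(root.find("description").text)       # bumpy ride due to turbulence
print(root.find("pilots").text)            # Johnson & Smith

# The same document with a bare '&' is rejected as not well-formed.
try:
    ET.fromstring(DOC.replace("&amp;", "&"))
except ET.ParseError as err:
    print("not well-formed:", err)
```

ElementTree's default tree builder drops the comment and the processing instruction and reports the CDATA section as ordinary text, so a tree API hides several of the token types a tokenizer sees.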
Here's how we might tokenize the XML document:
token type                     token
------------------------------------------------------------
<                              <
element name                   aircraft
attribute name                 type
=                              =
apostrophed attribute value    'F-16'
>                              >
comment                        <!-- Snapshot of an F-16 in flight -->
<                              <
element name                   altitude
attribute name                 units
=                              =
quoted attribute value         "meters"
>                              >
text data                      10000
end tag start                  </
element name                   altitude
>                              >
<                              <
element name                   headingNorth
empty element terminator       />
processing instruction         <?altimeter reading="30.1"?>
<                              <
element name                   description
>                              >
CDATA section                  <![CDATA[bumpy ride due to turbulence]]>
end tag start                  </
element name                   description
>                              >
<                              <
element name                   pilots
>                              >
text data                      Johnson
entity                         &amp;
text data                      Smith
end tag start                  </
element name                   pilots
>                              >
<                              <
element name                   footnote
>                              >
text data                      Producer
character decimal entity       &#169;
text data                      aerodata
end tag start                  </
element name                   footnote
>                              >
<                              <
element name                   footnote
>                              >
text data                      Editor
character hex entity           &#xA9;
text data                      workshop
end tag start                  </
element name                   footnote
>                              >
end tag start                  </
element name                   aircraft
>                              >
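The tokenization above can be mechanized with a small two-mode scanner: one set of patterns for content between tags and another for the inside of a tag. This is a hypothetical Python sketch, not a conforming XML parser. It assumes no DTD, does not check that tags nest, and, like the table, strips surrounding whitespace from text data and skips whitespace-only text. `tokenize`, the pattern tables and `SAMPLE` are illustrative names:

```python
import re

# Patterns tried between tags ("content" mode), in order: the longer
# constructs that start with '<' or '&' must be tried before the short ones.
CONTENT_PATTERNS = [
    ("comment", r"<!--.*?-->"),
    ("CDATA section", r"<!\[CDATA\[.*?\]\]>"),
    ("processing instruction", r"<\?.*?\?>"),
    ("end tag start", r"</"),
    ("<", r"<"),
    ("character hex entity", r"&#x[0-9A-Fa-f]+;"),
    ("character decimal entity", r"&#[0-9]+;"),
    ("entity", r"&[A-Za-z_:][\w.:-]*;"),
    ("text data", r"[^<&]+"),
]

# Patterns tried inside a start tag or end tag ("tag" mode).
TAG_PATTERNS = [
    ("whitespace", r"\s+"),
    ("empty element terminator", r"/>"),
    (">", r">"),
    ("=", r"="),
    ("quoted attribute value", r'"[^"]*"'),
    ("apostrophed attribute value", r"'[^']*'"),
    ("name", r"[A-Za-z_:][\w.:-]*"),
]

CONTENT = [(kind, re.compile(p, re.DOTALL)) for kind, p in CONTENT_PATTERNS]
TAG = [(kind, re.compile(p)) for kind, p in TAG_PATTERNS]


def tokenize(xml):
    """Return a list of (token type, token) pairs for a DTD-free XML string."""
    tokens = []
    pos = 0
    in_tag = False               # True between '<' (or '</') and '>' (or '/>')
    expect_element_name = False  # True right after '<' or '</'
    while pos < len(xml):
        for kind, regex in (TAG if in_tag else CONTENT):
            m = regex.match(xml, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected {xml[pos]!r} at offset {pos}")
        lexeme, pos = m.group(), m.end()
        if kind == "whitespace":
            continue  # whitespace inside a tag separates tokens but is not one
        if kind == "text data":
            lexeme = lexeme.strip()
            if not lexeme:
                continue  # drop whitespace-only text, as the table does
        if kind == "name":
            # The first name in a tag is the element name; later ones are attributes.
            kind = "element name" if expect_element_name else "attribute name"
            expect_element_name = False
        if kind in ("<", "end tag start"):
            in_tag, expect_element_name = True, True
        elif kind in (">", "empty element terminator"):
            in_tag = False
        tokens.append((kind, lexeme))
    return tokens


# A compact document that still contains all 16 token types.
SAMPLE = """<a x='1' y="2"><!--c--><?pi?><![CDATA[d]]>t &amp; &#169; &#xA9;<b/></a>"""

if __name__ == "__main__":
    for kind, lexeme in tokenize(SAMPLE):
        print(f"{kind:<30} {lexeme}")
```

Running it prints one row per token in the same two-column layout as the table. A document containing a bare `&`, such as `<p>Johnson & Smith</p>`, raises `ValueError` because no content pattern matches it.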
Wow!
XML is a simple, beautiful language.
/Roger
P.S. For a fantastic description of parsing a language by breaking it up into token types and tokens, see page 125 of "The C Programming Language (second edition)" by Kernighan & Ritchie.
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.