Types of tokens in XML instance documents?
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 12 Mar 2021 13:55:55 +0000
Hi Folks,
What are the types of tokens in XML documents? Leaving aside the DTD stuff, I think these are the types of tokens in XML documents:
1. The '<' character
2. The '>' character
3. Empty element terminator "/>"
4. End tag start "</"
5. Element name
6. Attribute name
7. The '=' character
8. Attribute value delimited by quote characters
9. Attribute value delimited by apostrophe characters
10. Text data
11. Processing instruction <?...?>
12. Entity &...;
13. Character decimal entity &#...;
14. Character hexadecimal entity &#x...;
15. CDATA section <![CDATA[ ... ]]>
16. Comment <!-- ... -->
Am I missing anything?
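Items 12 to 14 differ only in what follows the '&', so a single pattern can tell them apart. Below is a rough Python sketch of that idea, not a full implementation: the names REFERENCE, PREDEFINED and classify are made up for illustration, and the name pattern is a simplification of the XML Name production. It also returns the character each token stands for, which shows that the decimal and hexadecimal forms are two spellings of the same thing.

```python
import re

# Hypothetical pattern: one alternative per token type 12-14.
# The name part is a simplified approximation of an XML Name.
REFERENCE = re.compile(
    r"&(?:#x(?P<hex>[0-9A-Fa-f]+)|#(?P<dec>[0-9]+)|(?P<name>[A-Za-z_:][\w:.-]*));"
)

# The five entities predefined by XML; any other name needs a DTD declaration.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def classify(token):
    """Return (token type, character) for a single '&...;' token."""
    m = REFERENCE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a reference: {token!r}")
    if m.group("hex") is not None:
        return "character hexadecimal entity", chr(int(m.group("hex"), 16))
    if m.group("dec") is not None:
        return "character decimal entity", chr(int(m.group("dec")))
    # Unknown names yield None because resolving them is DTD territory.
    return "entity", PREDEFINED.get(m.group("name"))
```

For example, classify("&#169;") and classify("&#xA9;") report different token types but the same character, the copyright sign.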
Here is an XML document that contains all 16 token types:
<aircraft type='F-16'>
<!-- Snapshot of an F-16 in flight -->
<altitude units="meters">10000</altitude>
<headingNorth/>
<?altimeter reading="30.1"?>
<description><![CDATA[bumpy ride due to turbulence]]></description>
<pilots>Johnson &amp; Smith</pilots>
<footnote>Producer &#169; aerodata </footnote>
<footnote>Editor &#xA9; workshop</footnote>
</aircraft>
Here's how we might tokenize the XML document:
token type                    token
------------------------------------------------------------
<                             <
element name                  aircraft
attribute name                type
=                             =
apostrophed attribute value   'F-16'
>                             >
comment                       <!-- Snapshot of an F-16 in flight -->
<                             <
element name                  altitude
attribute name                units
=                             =
quoted attribute value        "meters"
>                             >
text data                     10000
end tag start                 </
element name                  altitude
>                             >
<                             <
element name                  headingNorth
empty element terminator      />
processing instruction        <?altimeter reading="30.1"?>
<                             <
element name                  description
>                             >
CDATA section                 <![CDATA[bumpy ride due to turbulence]]>
end tag start                 </
element name                  description
>                             >
<                             <
element name                  pilots
>                             >
text data                     Johnson
entity                        &amp;
text data                     Smith
end tag start                 </
element name                  pilots
>                             >
<                             <
element name                  footnote
>                             >
text data                     Producer
character decimal entity      &#169;
text data                     aerodata
end tag start                 </
element name                  footnote
>                             >
<                             <
element name                  footnote
>                             >
text data                     Editor
character hex entity          &#xA9;
text data                     workshop
end tag start                 </
element name                  footnote
>                             >
end tag start                 </
element name                  aircraft
>                             >
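A tokenization like the one above can be produced mechanically. Here is a minimal Python sketch, under some loudly stated assumptions: the names CONTENT, TAG and tokenize are invented for this example, the name pattern only approximates the XML Name production, whitespace-only text is dropped and other text is trimmed (as in the listing above), and nothing here handles DTDs or checks well-formedness. It keeps two pattern lists, one for content and one for the inside of a tag, and switches between them on '<', '</', '>' and '/>'.

```python
import re

# Patterns tried in order outside a tag. All names here are illustrative.
CONTENT = [
    ("comment", re.compile(r"<!--.*?-->", re.S)),
    ("CDATA section", re.compile(r"<!\[CDATA\[.*?\]\]>", re.S)),
    ("processing instruction", re.compile(r"<\?.*?\?>", re.S)),
    ("end tag start", re.compile(r"</")),
    ("<", re.compile(r"<")),
    ("character hex entity", re.compile(r"&#x[0-9A-Fa-f]+;")),
    ("character decimal entity", re.compile(r"&#[0-9]+;")),
    ("entity", re.compile(r"&[A-Za-z_:][\w:.-]*;")),
    ("text data", re.compile(r"[^<&]+")),
]

# Patterns tried in order inside a tag, after "<" or "</" has been seen.
TAG = [
    ("empty element terminator", re.compile(r"/>")),
    (">", re.compile(r">")),
    ("=", re.compile(r"=")),
    ("quoted attribute value", re.compile(r'"[^"<]*"')),
    ("apostrophed attribute value", re.compile(r"'[^'<]*'")),
    ("name", re.compile(r"[A-Za-z_:][\w:.-]*")),  # simplified XML Name
]


def tokenize(xml):
    """Return a list of (token type, token) pairs for an XML string."""
    tokens, pos, in_tag, want_element_name = [], 0, False, False
    while pos < len(xml):
        if in_tag and xml[pos].isspace():
            pos += 1  # whitespace inside a tag only separates tokens
            continue
        for kind, pattern in (TAG if in_tag else CONTENT):
            m = pattern.match(xml, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}: {xml[pos]!r}")
        text, pos = m.group(), m.end()
        if kind == "name":
            # The first name after "<" or "</" is the element name.
            kind = "element name" if want_element_name else "attribute name"
            want_element_name = False
        elif kind in ("<", "end tag start"):
            in_tag, want_element_name = True, True
        elif kind in (">", "empty element terminator"):
            in_tag = False
        elif kind == "text data":
            text = text.strip()
            if not text:
                continue  # skip whitespace-only text between elements
        tokens.append((kind, text))
    return tokens
```

Calling tokenize("<headingNorth/>") yields the three pairs ("<", "<"), ("element name", "headingNorth") and ("empty element terminator", "/>"). The ordering of the pattern lists matters: the comment, CDATA and processing-instruction patterns must be tried before the bare '<', otherwise every one of them would be split at its first character.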
Wow!
XML is a simple, beautiful language.
/Roger
P.S. For a fantastic description of parsing a language by breaking it up into token types and tokens, see page 125 of "The C Programming Language (second edition)" by Kernighan & Ritchie.