Re: [xml-dev] Types of tokens in XML instance documents?

I find the term 'token' ambiguous in this context, as it is liable to be confused with the xs:token data type. I think it would be more helpful to borrow the terminology of symbols from, e.g., EBNF grammar definitions.

Then you could use the terminal symbols from an existing grammar (e.g. https://www.liquid-technologies.com/XML/EBNF1.1.aspx ) to cross-check your list (you have missed out the XML declaration).
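
To illustrate the missing XML declaration, here is a rough Python sketch of a pattern loosely modelled on the XMLDecl production (version, then optional encoding, then optional standalone). The name XML_DECL and the group names are my own, and the pattern is a simplification for cross-checking, not the normative grammar.

```python
import re

# Hypothetical pattern loosely following XMLDecl:
#   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(
    r"""<\?xml
        \s+version\s*=\s*(?P<vq>['"])(?P<version>1\.[0-9]+)(?P=vq)
        (?:\s+encoding\s*=\s*(?P<eq>['"])(?P<encoding>[A-Za-z][A-Za-z0-9._-]*)(?P=eq))?
        (?:\s+standalone\s*=\s*(?P<sq>['"])(?P<standalone>yes|no)(?P=sq))?
        \s*\?>""",
    re.VERBOSE,
)

m = XML_DECL.match('<?xml version="1.0" encoding="UTF-8"?><aircraft/>')
if m:
    print(m.group("version"), m.group("encoding"), m.group("standalone"))
```

An ordinary processing instruction such as <?altimeter reading="30.1"?> does not match this pattern, which is one reason to treat the declaration as a symbol of its own.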


T

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058
On 12 Mar 2021, 13:56 +0000, Roger L Costello <costello@mitre.org>, wrote:
Hi Folks,

What are the types of tokens in XML documents? Leaving aside the DTD stuff, I think these are the types of tokens in XML documents:

1. The '<' character
2. The '>' character
3. Empty element terminator "/>"
4. End tag start "</"
5. Element name
6. Attribute name
7. The '=' character
8. Attribute value delimited by quote characters
9. Attribute value delimited by apostrophe characters
10. Text data
11. Processing instruction <?...?>
12. Entity &...;
13. Character decimal entity &#...;
14. Character hexadecimal entity &#x...;
15. CDATA section <![CDATA[ ... ]]>
16. Comment <!-- ... -->

Am I missing anything?

Here is an XML document that contains all 16 token types:

<aircraft type='F-16'>
<!-- Snapshot of an F-16 in flight -->
<altitude units="meters">10000</altitude>
<headingNorth/>
<?altimeter reading="30.1"?>
<description><![CDATA[bumpy ride due to turbulence]]></description>
<pilots>Johnson &amp; Smith</pilots>
<footnote>Producer &#169; aerodata </footnote>
<footnote>Editor &#xA9; workshop</footnote>
</aircraft>
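
As a quick sanity check, the document can be fed to a standard parser. The sketch below uses Python's xml.etree.ElementTree; the variable names are mine. Note that this parser works at a higher level than tokens: by default it drops the comment and the processing instruction, and it resolves the entity and the character references into plain text.

```python
import xml.etree.ElementTree as ET

# The sample document from the message, copied verbatim.
DOC = """<aircraft type='F-16'>
<!-- Snapshot of an F-16 in flight -->
<altitude units="meters">10000</altitude>
<headingNorth/>
<?altimeter reading="30.1"?>
<description><![CDATA[bumpy ride due to turbulence]]></description>
<pilots>Johnson &amp; Smith</pilots>
<footnote>Producer &#169; aerodata </footnote>
<footnote>Editor &#xA9; workshop</footnote>
</aircraft>"""

root = ET.fromstring(DOC)  # raises ParseError if the document is not well-formed

# Both footnotes end up containing the same character (U+00A9),
# whether it was written in decimal or in hexadecimal.
footnotes = [f.text for f in root.findall("footnote")]
print(root.get("type"))
print(root.findtext("pilots"))
print(footnotes)
```

The decimal form &#169; and the hexadecimal form &#xA9; both denote the copyright sign, so the two footnotes differ only in their surrounding text.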

Here's how we might tokenize the XML document:

token type                      token
------------------------------------------------------------
<                               <
element name                    aircraft
attribute name                  type
=                               =
apostrophed attribute value     'F-16'
>                               >
comment                         <!-- Snapshot of an F-16 in flight -->
<                               <
element name                    altitude
attribute name                  units
=                               =
quoted attribute value          "meters"
>                               >
text data                       10000
end tag start                   </
element name                    altitude
>                               >
<                               <
element name                    headingNorth
empty element terminator        />
processing instruction          <?altimeter reading="30.1"?>
<                               <
element name                    description
>                               >
CDATA section                   <![CDATA[bumpy ride due to turbulence]]>
end tag start                   </
element name                    description
>                               >
<                               <
element name                    pilots
>                               >
text data                       Johnson
entity                          &amp;
text data                       Smith
end tag start                   </
element name                    pilots
>                               >
<                               <
element name                    footnote
>                               >
text data                       Producer
character decimal entity        &#169;
text data                       aerodata
end tag start                   </
element name                    footnote
>                               >
<                               <
element name                    footnote
>                               >
text data                       Editor
character hexadecimal entity    &#xA9;
text data                       workshop
end tag start                   </
element name                    footnote
>                               >
end tag start                   </
element name                    aircraft
>                               >
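
The tokenization above can be sketched as a small regex-driven scanner. This is only an illustration under simplifying assumptions: the names CONTENT, TAG, LABELS and tokenize are hypothetical, the labels follow the numbered list earlier in this message, name characters are restricted to a simplified set, whitespace between tags is skipped, and DTD constructs are ignored. It would also classify an XML declaration as an ordinary processing instruction.

```python
import re

# Constructs recognized outside of a tag.
CONTENT = re.compile(r"""
    (?P<comment><!--.*?-->)
  | (?P<cdata><!\[CDATA\[.*?\]\]>)
  | (?P<pi><\?.*?\?>)
  | (?P<end_tag_start></)
  | (?P<lt><)
  | (?P<hex_ref>&\#x[0-9A-Fa-f]+;)
  | (?P<dec_ref>&\#[0-9]+;)
  | (?P<entity>&[A-Za-z_:][\w:.-]*;)
  | (?P<text>[^<&]+)
""", re.VERBOSE | re.DOTALL)

# Constructs recognized inside a tag, after "<" or "</".
TAG = re.compile(r"""
    (?P<ws>\s+)
  | (?P<empty_end>/>)
  | (?P<gt>>)
  | (?P<eq>=)
  | (?P<quoted>"[^"<]*")
  | (?P<apos>'[^'<]*')
  | (?P<name>[A-Za-z_:][\w:.-]*)
""", re.VERBOSE)

# Map regex group names to the token type labels used in this message.
LABELS = {
    "lt": "<", "gt": ">", "eq": "=",
    "empty_end": "empty element terminator",
    "end_tag_start": "end tag start",
    "quoted": "quoted attribute value",
    "apos": "apostrophed attribute value",
    "text": "text data",
    "pi": "processing instruction",
    "entity": "entity",
    "dec_ref": "character decimal entity",
    "hex_ref": "character hexadecimal entity",
    "cdata": "CDATA section",
    "comment": "comment",
}

def tokenize(xml):
    """Return a list of (token type, token) pairs for an XML string."""
    tokens = []
    pos = 0
    in_tag = False               # True between "<" or "</" and the closing ">" or "/>"
    expect_element_name = False  # the first name in a tag is the element name
    while pos < len(xml):
        m = (TAG if in_tag else CONTENT).match(xml, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if in_tag:
            if kind == "ws":
                continue  # whitespace inside a tag is not reported
            if kind == "name":
                kind = "element name" if expect_element_name else "attribute name"
                expect_element_name = False
            elif kind in ("gt", "empty_end"):
                in_tag = False
        else:
            if kind == "text" and not text.strip():
                continue  # skip whitespace-only text between tags
            if kind in ("lt", "end_tag_start"):
                in_tag = True
                expect_element_name = True
        tokens.append((LABELS.get(kind, kind), text))
    return tokens

for token_type, token in tokenize("<pilots crew='2'>Johnson &amp; Smith</pilots>"):
    print(f"{token_type:32}{token}")
```

Keeping two patterns, one for content and one for the inside of a tag, is a design choice that lets a name be classified by position: the first name after "<" or "</" is the element name, and any later name in the same tag is an attribute name.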

Wow!

XML is a simple, beautiful language.

/Roger

P.S. For a fantastic description of parsing a language by breaking it up into token types and tokens, see page 125 of "The C Programming Language (second edition)" by Kernighan & Ritchie.


_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


