The IPTC NewsML2 Working Group (www.iptc.org/dev) has
tried to avoid NMTOKENS in drafting its specification.
There are two reasons. First, a list of space-separated
values is not really “the XML way” (it is a priori more sensible to have a sub-element for
each enumerated value). Second, we do not know of an easy way to
process such lists using XSLT, XPath or DOM, for example. The problem seems rooted in
XML parsers themselves, which usually return NMTOKENS as a single string (and not as a collection
of tokens) [1].
But HTML, which is so widely used, defines
many NMTOKENS-like attributes (e.g. class), and HTML processors are everywhere. So it cannot be that
difficult to process them.
We have already adopted in NewsML2 one HTML (&
CSS) attribute, media, which specifies the target media type(s) of a
label and is defined as NMTOKENS in HTML.
Example of use:
<title media="handheld screen">Hello world</title>
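To illustrate the point about parsers, here is a minimal sketch, assuming Python's standard xml.dom.minidom module stands in for a DOM implementation (our choice for illustration only, not something the Working Group has selected). The DOM returns the media value as one string, and the application splits it into tokens itself:

```python
from xml.dom.minidom import parseString

# The example element from this message, parsed without any DTD.
doc = parseString('<title media="handheld screen">Hello world</title>')
title = doc.documentElement

# The DOM API returns the attribute value as a single string.
raw = title.getAttribute("media")

# The application has to tokenize it; split() with no argument
# splits on any run of whitespace and ignores leading/trailing spaces.
tokens = raw.split()

print(repr(raw))
print(tokens)
```

Here raw is the one string "handheld screen", and tokens is a list holding the two values "handheld" and "screen".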
We are more and more ambivalent about NMTOKENS: using
attributes to convey collections of values could be useful in many use
cases and would make the NewsML2 syntax more compact.
For example, we could imagine replacing something
like:
<subject type="type:organisation" code="nasdaq:MFRT">
  <sameAs code="isin:23234"/>
  <sameAs code="sicovam:23234"/>
  <title>The Company</title>
</subject>
by:
<subject type="type:organisation" code="nasdaq:MFRT"
altcodes="isin:23234 sicovam:ERT-345">The Company</subject>
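As a rough sketch of what reading the compact form could involve, the following Python uses the standard xml.etree.ElementTree module. The altcodes attribute is only the idea floated in this message, the read_altcodes helper is hypothetical, and splitting each token at its first colon into a scheme and a value is our assumption about how such codes would be interpreted:

```python
import xml.etree.ElementTree as ET

# The compact form imagined in this message (altcodes is not an adopted attribute).
COMPACT = (
    '<subject type="type:organisation" code="nasdaq:MFRT" '
    'altcodes="isin:23234 sicovam:ERT-345">The Company</subject>'
)

def read_altcodes(subject):
    """Hypothetical helper: return the altcodes tokens as (scheme, value) pairs."""
    pairs = []
    # The parser gives us one string, so we tokenize on whitespace ourselves.
    for token in subject.get("altcodes", "").split():
        # Assumption: the text before the first colon names the code scheme.
        scheme, _, value = token.partition(":")
        pairs.append((scheme, value))
    return pairs

subject = ET.fromstring(COMPACT)
codes = read_altcodes(subject)
print(codes)
```

The extra tokenizing step is exactly the work that the sub-element form leaves to the parser: with one sameAs element per code, each value arrives as its own node.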
We need the help of the community to get clear ideas
about questions like:
- What are the problems for developers of XML processors when reading NMTOKENS and when editing them?
- How should implementers process NMTOKENS?
  o If they use DOM?
  o If they use XSLT?
  o If they use XPath?
Laurent Le Meur
IPTC, NewsML2 Architecture WP chair
[1] http://lists.xml.org/archives/xml-dev/199806/msg00509.html
“One other problem with the above syntax is the use of NMTOKENS for the
enumeration. An earlier objection on this list pointed out that most parsers
today return a string of Nmtoken's, rather than an array, which forces the
application to parse the string themselves.” (rbourret)