- From: Jean Paoli <jeanpa@microsoft.com>
- To: "'w3c-sgml-wg@w3.org'" <w3c-sgml-wg@w3.org>, "'xml-dev@ic.ac.uk'" <xml-dev@ic.ac.uk>, "'w3c-sgml-erb@hpsgml.fc.hp.com'" <w3c-sgml-erb@hpsgml.fc.hp.com>
- Date: Sun, 22 Jun 1997 22:37:56 -0700
I am pleased to present XML-Data, a Position Paper from Microsoft.
XML-Data is an application of XML for exchanging
structured data and metadata on the Internet.
This position paper is sent to multiple working groups
in the W3C dealing with this subject (XML, meta-data)
and we expect this paper to be discussed and improved
by these working groups.
The current proposal needs namespaces and uses the Layman/Bray
proposal.
The URL of this paper (on the Microsoft site) will be posted tomorrow.
-Jean Paoli
----------------
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\MSOffice\Templates\Letters & Faxes\VFPSPEC97.dot">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>XML-Data</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE"
vlink="#551A8B" alink="#FF0000">
<p align="right"><font size="4"><b>XML-Data.html</b></font> </p>
<p><font size="4"><b>Position Paper from Microsoft<br>20 June 1997
</b></font>
</p>
<h1 align="center">XML-Data</h1>
<dl>
<dt>Authors: </dt>
<dd><a href="mailto:andrewl@microsoft.com">Andrew Layman</a>,
Microsoft Corporation<br>
<a href="mailto:jeanpa@microsoft.com">Jean Paoli</a>,
Microsoft Corporation<br>
<a href="mailto:sjd@eps.inso.com"><font size="3">Steve De
Rose</font></a><font size="3">, Inso Corporation</font><br>
<a href="mailto:ht@cogsci.ed.ac.uk">Henry S. Thompson</a>,
University of Edinburgh <br>
</dd>
<dt>Acknowledgements:</dt>
<dd><font size="3">We thank </font><a
href="mailto:paul@arbortext.com"><font size="3">Paul
Grosso</font></a><font size="3"> (Arbortext), </font><a
href="mailto:sca@eps.inso.com"><font size="3">Sharon
Adler</font></a><font size="3"> (Inso Corporation), </font><a
href="mailto:alb@eps.inso.com"><font size="3">Anders
Berglund</font></a><font size="3"> (Inso Corporation), </font><a
href="mailto:fcha@ais.Berger-Levrault.fr">François
Chahuneau</a> (AIS/Berger-Levrault),<font color="#0000FF"
size="2" face="Arial"> </font><font size="3">and </font><a
href="mailto:edwardj@microsoft.com"><font size="3">Edward
Jung</font></a><font size="3"> (Microsoft) for their help
and contributions to this proposal.</font></dd>
</dl>
<p>Copyright (c) 1997 Microsoft Corp. <br>
</p>
<hr>
<h2 align="left">Abstract</h2>
<p align="left">This document provides the specification for
exchanging structured and networked data on the Web. This
specification uses XML, the Extensible Markup Language, for
describing data as well as data about data. We expect this
specification to be useful for a wide range of applications such
as describing database transfers, digital signatures or
remotely-located web resources.</p>
<h2 align="left">1. Introduction</h2>
<p><font color="#000000" size="3">The Internet holds the
potential to integrate all information in a global network (with
many private but integrated domains). The Internet promises
access to information any time and, with wireless technology,
anywhere. Today, however, the Internet is merely an <i>access
medium </i>to text and pictures. To actualize the Internet's
potential, we need to add intelligent search, data exchange,
adaptive presentation, and personalization. The Internet must go
beyond setting an information <em>access</em> standard, and must
set an information <i>understanding </i>standard, which means: a
standard way of representing data so that software can better
search, move, display, and otherwise manipulate information
currently hidden in contextual obscurity.</font></p>
<p><font color="#000000" size="3">XML is an important step in
this direction. It offers a standard syntax for textual structure
of tagged data, based on extensive industry and theoretical
experience. Its lexical format easily depicts a tree structure. A
tree is a natural format that is richer than a simple flat list,
yet (compared to a generalized graph) also respectful of
cognitive and data processing requirements for economy and
simplicity. </font></p>
<p><font color="#000000" size="3">Looking at this point in more
detail, there are several ways of structuring data. One is a flat
tagging system. In this system, sets of keywords are applied to
data elements. This is a simple form of data structure, but it
does not capture any relationships between the keywords.</font></p>
<p><font color="#000000" size="3">A more advanced means of
structuring information is a tree. A tree allows expression of
subsumption, containment, or any other single (contextual)
relationship such as "manages." Trees correspond to
object-oriented class hierarchies, file system hierarchies,
organizational hierarchies and so forth. Trees are relatively
easy to understand and to construct. Trees are efficient to
process, and there is a linear (<em>e.g.</em> textual) structure
that a program can parse incrementally, and determine when it is
finished. This makes trees particularly useful as a transmission
format for asynchronous, distributed systems such as the
Internet, and also for display purposes where the single
relationship (usually visual containment) enables incremental
display.</font></p>
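<p>The incremental-processing property described above can be sketched with Python's standard <em>xml.etree.ElementTree.XMLPullParser</em>. This is only an illustration, not part of the proposal: the chunk boundaries and the helper name <em>completed_elements</em> are assumptions made for the example.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical network chunks: the document arrives split at arbitrary points.
CHUNKS = ["<ORDER><ITEM><PRI", "CE>5.95</PRICE></IT", "EM></ORDER>"]

def completed_elements(chunks):
    """Feed text chunks as they arrive and collect each element's tag
    in the order in which its end tag is recognized."""
    parser = ET.XMLPullParser(events=("end",))
    finished = []
    for chunk in chunks:
        parser.feed(chunk)
        # Elements whose end tags have been parsed so far can be handled now.
        finished.extend(element.tag for _event, element in parser.read_events())
    parser.close()
    # Drain anything the parser reported only once the stream was closed.
    finished.extend(element.tag for _event, element in parser.read_events())
    return finished

print(completed_elements(CHUNKS))
```

<p>Because every element has an explicit end tag, the reader knows when a subtree is complete without waiting for the rest of the stream, which is the property the paragraph above attributes to trees.</p>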
<p><font color="#000000" size="3">A still more elaborate
structure is a directed graph. A graph allows expression of
arbitrary binary relationships, that is, many relationships
between two things. A graph can express subsumption, containment,
and any number of other relationships simultaneously. It is
therefore a superset of a tree. This makes graphs very expressive
for real-world semantics, but it also makes them harder to
understand, more difficult to construct, and less efficient to
process than trees. There is no efficient linear (<em>e.g.</em>
textual) structure of a graph that can be incrementally
processed. Therefore, while they are particularly useful for
representing (and instrumenting) the complete semantics of a
system, they are typically not suitable for transmission,
display, or immediate processing.</font></p>
<p><font color="#000000" size="3">The tree structure has proved
broadly implementable and easy to deploy, not just in theory but
also widely in practice. Industrial implementations, in the SGML
community and elsewhere, demonstrate its intrinsic quality and
industrial strength, e.g. aircraft (ATA), automotive (J2008),
banking (OFX), and semiconductors (Pinnacles PCIS).</font></p>
<p><font color="#000000" size="3">This proposal shows how to add
a single convention to XML so that graph arcs are easily added
into a lexical tree structure, without requiring decomposition of
tree format into a "lowest common denominator"
nodes-and-arcs structure. (For a quick look at the difference,
see the </font><a href="#XML-Data-vs-MCF"><font color="#000000"
size="3">XML-Data versus MCF in XML comparison</font></a><font
color="#000000" size="3">.)</font></p>
<p><font color="#000000" size="3">XML-Data consists of a
collection of related technologies. First, it unifies lexical
trees with graph structures. Second, it builds on this to define
a representation for schemata based on XML instance syntax. It
offers a mechanism to organize element types into a hierarchy,
and proposes a small set of basic types. Finally, it adds
facilities for lexical typing and proposes a small collection of
lexical types.</font></p>
<p><font color="#000000" size="3">XML-Data can encode the
content, semantics and schemata for a gamut of cases, from simple
and prosaic to complex and sophisticated:</font></p>
<ul>
<li><font color="#000000" size="3">An ordinary document</font></li>
<li><font color="#000000" size="3">A structured record, such
as an appointment record or purchase order</font></li>
<li><font color="#000000" size="3">An object, with data and
methods</font></li>
<li><font color="#000000" size="3">A data record, such as the
result set of a query</font></li>
<li><font color="#000000" size="3">Information in a database
or a web site (<em>e.g. </em>CDF)</font></li>
<li><font color="#000000" size="3">Graphical presentation
(<em>e.g.</em>
an application user interface)</font></li>
<li><font color="#000000" size="3">Upper ontology (standard
schema entities and types)</font></li>
<li><font color="#000000" size="3">UberWeb (all the links
between information and people on the web)</font></li>
</ul>
<p><font color="#000000" size="3">The resulting flexibility of a
single homogeneous data representation system allows any reader to
uniformly determine the structural semantics of a data element.
Information can then be reused for new purposes and in novel
contexts. For example, a record from a database of restaurants
and a record from a client contact database might be reused in
the context of an appointment, say in setting a lunch date with a
client. The relationships between the restaurant and contact data
do not reside in the schema data described by either database
individually, but are extensions defined by the instance of the
appointment.</font></p>
<p><font color="#000000" size="3">This proposal, building on the
earlier <em>Web Collections in XML </em>proposal, shows how to
use a single syntax for a broad range of data, using that syntax
for data and schemata, permitting the expressiveness of graph
data when such power is required, but retaining the benefits of
lexical trees.</font></p>
<h2 align="left">2. Examples of XML-Data</h2>
<h3><font size="4" face="Times New Roman"><code>Data</code></font></h3>
<p><font size="4" face="Times New Roman"><code>The following
example shows a simple order from a bookstore for several books,
a record, and a cup of coffee.</code></font></p>
<pre><code><ORDER>
<SOLD-TO>
<PERSON><LASTNAME><strong>Layman</strong></LASTNAME>
<FIRSTNAME><strong>Andrew</strong></FIRSTNAME>
</PERSON>
</SOLD-TO>
<SOLD-ON><strong>19970317</strong></SOLD-ON>
<ITEM>
<PRICE><strong>5.95</strong></PRICE>
<BOOK>
<TITLE><strong>Number, the Language of
Science</strong></TITLE>
<AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>12.95</strong></PRICE>
<BOOK>
<TITLE><strong>Introduction to Objectivist
Epistemology</strong></TITLE>
<AUTHOR><strong>Rand, Ayn</strong></AUTHOR>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>12.95</strong></PRICE>
<RECORD>
<TITLE><COMPOSER><strong>Tchaikovsky's</strong></COMPOSER
><strong> First Piano Concerto</strong></TITLE>
<ARTIST><strong>Janos</strong></ARTIST>
</RECORD>
</ITEM>
<ITEM>
<PRICE><strong>1.50</strong></PRICE>
<COFFEE>
<SIZE><strong>small</strong></SIZE>
<STYLE><strong>cafe macchiato</strong></STYLE>
</COFFEE>
</ITEM>
</ORDER></code></pre>
<p><font size="4" face="Times New Roman"><code>XML-Data is
flexible enough to encode heterogeneous structures, for example
books, records and coffee all within one sales order. These
different kinds of items do not need to all have the same
internal parts. For example, books have titles, coffee generally
doesn't. XML-Data allows values to be expressed as element
content (for example the book titles shown) or with a <em>value</em>
attribute (for example the author and artist elements).
Properties of elements can be expressed as attributes (e.g. size
and style of coffee) or as sub-elements (e.g. author, artist).
XML-Data can appear in separate documents or within other
documents (such as HTML pages).</code></font></p>
<h3><font size="4" face="Times New Roman"><code>Data about Other
Data</code></font></h3>
<p><font size="4" face="Times New Roman"><code>XML-Data is
suitable for complex, self-contained data structures such as the
book order, and also for information such as the </code></font><a
href="http://www.microsoft.com/standards/cdf-f.htm"><code>Channel
Definition Format</code></a><code>, </code><font size="4"
face="Times New Roman"><code>which describes remotely-located web
resources, many of which are themselves data:</code></font></p>
<pre><code><CHANNEL>
<ITEM
HREF="<strong>http://www.zoosports.com/intro.htm</strong>"
level="<strong>2</strong>"
precache="<strong>NO</strong>">
<A
HREF="<strong>http://www.zoosports.com/page1.htm</strong>">
<strong>This is a link to page 1.</strong></A>
<TITLE><strong>Welcome to ZooSports!</strong></TITLE>
<ABSTRACT><strong>ZooSports articles, news, and promotional
offers</strong></ABSTRACT>
</ITEM>
<SCHEDULE ENDDATE="<strong>1994-11-05</strong>">
<INTERVALTIME DAY="<strong>1</strong>"/>
<EARLIESTTIME HOUR="<strong>12</strong>"/>
<LATESTTIME HOUR="<strong>18</strong>"/>
</SCHEDULE>
</CHANNEL></code></pre>
<h3><font size="4" face="Times New Roman"><code>PICS-NG
Labels</code></font></h3>
<p><font size="4" face="Times New Roman"><code>XML-Data can
express PICS-NG Labels</code></font><font size="5"
face="Times New Roman"><code>:</code></font></p>
<p><font size="4" face="Times New Roman"><code>(This uses the
</code></font><a
href="http://www.w3.org/XML/Group/9705/namespace.htm"><font
size="4" face="Times New Roman"><code>Layman-Bray proposal for
namespaces</code></font></a><font size="4" face="Times New
Roman"><code>.)</code></font></p>
<pre><code><xml>
<xml:schema>
<namespaceDcl
href="<strong>http://purl.org/Schemas</strong>"
name="<strong>purl</strong>"/>
<namespaceDcl
href="<strong>http://www.foo.com</strong>"
name="<strong>foo</strong>"/>
</xml:schema>
<xml:data>
<purl:description1
href="<strong>http://purl.color.org/document.html</strong>">
<title><strong>Light and Dark: A study of
color</strong></title>
<subject><LCSH>
<for><strong>Color and Color
Palettes</strong></for></LCSH> </subject>
<author> <foo:author>
<name><strong>John
Smith</strong></name>
<affiliation><strong>thedarkside</strong></affiliation>
<email><strong>john@thedarkside</strong></email></foo:author>
<foo:author>
<name><strong>Smith, Jane
Q.</strong></name>
<affiliation><strong>thelightregion</strong></affiliation>
<email><strong>jane@thelightregion</strong></email></foo:author>
</author></purl:description1>
</xml:data>
</xml></code></pre>
<h3><font size="4" face="Times New Roman"><code>Digital
Signatures, Security & Authentication</code></font></h3>
<p><font size="4" face="Times New Roman"><code>Returning to the
bookstore example, this is the same order with a digital
signature added. The structured nature of XML-Data makes it easy
to sign whole elements or parts of them.</code></font></p>
<pre><code><ORDER>
<dsig:DSIG>
<MANIFEST><strong>80183589575795589189518915</strong></MANIFEST>
<SIG
href="<strong>http://XYX/Joe@company.com</strong>"/>
</dsig:DSIG>
<SOLD-TO>
<PERSON><LASTNAME><strong>Layman</strong></LASTNAME>
<FIRSTNAME><strong>Andrew</strong></FIRSTNAME>
</PERSON>
</SOLD-TO>
<SOLD-ON><strong>19970317</strong></SOLD-ON>
<ITEM>
<PRICE><strong>5.95</strong></PRICE>
<BOOK>
<TITLE><strong>Number, the Language of
Science</strong></TITLE>
<AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>12.95</strong></PRICE>
<BOOK>
<TITLE><strong>Introduction to Objectivist
Epistemology</strong></TITLE>
<AUTHOR><strong>Rand, Ayn</strong></AUTHOR>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>12.95</strong></PRICE>
<RECORD>
<TITLE><COMPOSER><strong>Tchaikovsky's</strong></COMPOSER
><strong> First Piano Concerto</strong></TITLE>
<ARTIST><strong>Janos</strong></ARTIST>
</RECORD>
</ITEM>
<ITEM>
<PRICE><strong>1.50</strong></PRICE>
<COFFEE>
<SIZE><strong>small</strong></SIZE>
<STYLE><strong>cafe macchiato</strong></STYLE>
</COFFEE>
</ITEM>
</ORDER></code></pre>
<h3><font size="4" face="Times New Roman"><code>Database
Information</code></font></h3>
<p><font size="4" face="Times New Roman"><code>While XML-Data can
represent complex structures, it can also represent simple ones,
for example a simple list of database records:</code></font></p>
<pre><code><BOOK-MASTER-LIST>
<BOOK id="book1">
<TITLE><strong>Number, the Language of
Science</strong></TITLE>
<AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR>
</BOOK>
<BOOK id="book2">
<TITLE><strong>Introduction to Objectivist
Epistemology</strong></TITLE>
<AUTHOR><strong>Rand, Ayn</strong></AUTHOR>
</BOOK>
<BOOK id="book3">
<TITLE><strong>I, The Jury</strong></TITLE>
<AUTHOR><strong>Spillane, Mickey</strong></AUTHOR>
</BOOK>
<BOOK id="book4">
<TITLE><strong>Half Magic</strong></TITLE>
<AUTHOR><strong>Eager, Edward</strong></AUTHOR>
</BOOK>
<BOOK id="book5">
<TITLE><strong>QED</strong></TITLE>
<AUTHOR><strong>Feynman, Richard P.</strong></AUTHOR>
</BOOK>
</BOOK-MASTER-LIST></code></pre>
<h3><font size="4" face="Times New Roman"><code>Graph
Structures</code></font></h3>
<p><font size="4" face="Times New Roman"><code>An XML-Data
element may include links to resources outside the immediate
tree. When it meets application needs, this <em>href</em>
facility can be used to break up a single structure into multiple
parts, with relations among them indicated by Universal Resource
Identifier (URI) links. The references can be local or remote. In
this example, they are inventory records from the database table
we just looked at.</code></font></p>
<pre><code><ORDER id="order1">
<dsig:DSIG>
<MANIFEST><strong>80183589575795589189518915</strong></MANIFEST>
<SIG
href="<strong>http://XYX/Joe@company.com</strong>"/>
</dsig:DSIG>
<SOLD-TO>
<PERSON><LASTNAME><strong>Layman</strong></LASTNAME>
<FIRSTNAME><strong>Andrew</strong></FIRSTNAME>
</PERSON>
</SOLD-TO>
<SOLD-ON><strong>19970317</strong></SOLD-ON>
<ITEM
href="<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book1</strong>">
<PRICE>5.95</PRICE>
</ITEM>
<ITEM
href="<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book2</strong>">
<PRICE>12.95</PRICE>
</ITEM>
<ITEM
href="<strong>http://bigbookstore.com/data/musicmaster?XML-XPTR=cd1</strong>">
<PRICE>12.95</PRICE>
</ITEM>
<ITEM>
<PRICE>1.50</PRICE>
<COFFEE>
<SIZE><strong>small</strong></SIZE>
<STYLE><strong>cafe macchiato</strong></STYLE>
</COFFEE>
</ITEM>
</ORDER></code></pre>
<p><font size="4" face="Times New Roman"><code>Notice that each
of the ITEM elements establishes a relationship between the ORDER
and a BOOK, and that the <em>relationship itself</em>
has attributes, in this case the price at which the book was
sold. Relations can have attributes, can contain elements and the
process can be carried to any needed level of detail.</code></font></p>
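<p>As a rough illustration of how a reader might follow these links, the sketch below resolves each ITEM either to the element named by its <em>href</em> or, failing that, to the element it contains. It is not part of the proposal. The two documents are cut-down copies of the examples above, the function names are hypothetical, and treating the text after <em>XML-XPTR=</em> as a plain id lookup is an assumption made to keep the example short.</p>

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical copies of the book master list and the order.
BOOK_MASTER = """
<BOOK-MASTER-LIST>
  <BOOK id="book1"><TITLE>Number, the Language of Science</TITLE></BOOK>
  <BOOK id="book2"><TITLE>Introduction to Objectivist Epistemology</TITLE></BOOK>
</BOOK-MASTER-LIST>
"""

ORDER = """
<ORDER id="order1">
  <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book1"><PRICE>5.95</PRICE></ITEM>
  <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book2"><PRICE>12.95</PRICE></ITEM>
  <ITEM><PRICE>1.50</PRICE><COFFEE><SIZE>small</SIZE></COFFEE></ITEM>
</ORDER>
"""

def index_by_id(root):
    """Map every id attribute value in the tree to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}

def relation_target(item, ids):
    """Return the element an ITEM relates the order to.

    With an href, the target is the referenced element (assumption: the
    text after 'XML-XPTR=' is an id). Without one, the target is the
    first child that is not the PRICE property of the relation itself.
    """
    href = item.get("href")
    if href is not None:
        _, _, key = href.partition("XML-XPTR=")
        return ids.get(key)
    for child in item:
        if child.tag != "PRICE":
            return child
    return None

def order_lines(order_xml, master_xml):
    """List (target element type, price) for every ITEM in the order."""
    ids = index_by_id(ET.fromstring(master_xml))
    lines = []
    for item in ET.fromstring(order_xml).findall("ITEM"):
        target = relation_target(item, ids)
        lines.append((target.tag, item.findtext("PRICE")))
    return lines

print(order_lines(ORDER, BOOK_MASTER))
```

<p>The linked books and the inline coffee come out of the same loop, which is the point of treating <em>href</em> targets and element content as two spellings of one relationship.</p>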
<h3><font size="4" face="Times New Roman"><code>Discontiguous
Information (propertyOf)</code></font></h3>
<p><font size="4" face="Times New Roman"><code>Information about
an element can be contained in the element, but also can sit
outside it. For example, the following applies a digital
signature to a sales order without actually modifying the
order:</code></font></p>
<pre><code><dsig:DSIG>
<xml:propertyOf
href="<strong>http://bigbookstore.com/data/orders?XML-XPTR=order1</strong>"/>
<MANIFEST
><strong>80183589575795589189518915</strong></MANIFEST>
<SIG
href="<strong>http://XYX/Joe@company.com</strong>"/>
</dsig:DSIG></code></pre>
<h3><font size="4" face="Times New
Roman"><code>Schema</code></font></h3>
<p><font size="4" face="Times New Roman"><code>Every data object,
such as a purchase order, contains certain parts, such as
sold-to, sold-on date, items, etc. We can write a formal
description of what these parts are and which are allowed where.
This is called a "schema" and is written using a form
of XML-Data:</code></font></p>
<pre><code><xml:schema ID="BookOrderSchema">
<!-- This schema is digitally signed. Schemas are a form of data,
so they, too, can be signed. -->
<dsig:DSIG>
<MANIFEST><strong>*(&#&$&@*$&%*&@*$&$*@</strong></MANIFEST>
<SIG
href="<strong>http://XYX/Jane@company.com</strong>"/>
</dsig:DSIG>
<!-- Here are all the element types, their contents,
attributes and relations. -->
<elementType id="<strong>ORDER</strong>">
<relation href="<strong>#SOLD-TO</strong>"/>
<relation href="<strong>#SOLD-ON</strong>"/>
<relation href="<strong>#ITEM</strong>"
occurs="<strong>STAR</strong>"/>
</elementType>
<relationType id="<strong>SOLD-TO</strong>">
<elt href="<strong>#PERSON</strong>"/>
</relationType>
<relationType id="<strong>SOLD-ON</strong>">
<pcdata/>
<!-- Date is YYYYMMDD -->
<attribute name="<strong>lextype</strong>"
default="<strong>DATE.ISO8061</strong>"
presence="<strong>fixed</strong>"/>
</relationType>
<elementType id="<strong>PERSON</strong>">
<relation href="<strong>#LASTNAME</strong>"/>
<relation href="<strong>#FIRSTNAME</strong>"/>
</elementType>
<relationType id="<strong>LASTNAME</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>FIRSTNAME</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>PRICE</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>ITEM</strong>">
<any/>
<relation href="<strong>#PRICE</strong>"/>
<range href="<strong>#BOOK</strong>"/>
<range href="<strong>#RECORD</strong>"/>
<range href="<strong>#COFFEE</strong>"/>
</relationType>
<elementType id="<strong>BOOK</strong>">
<relation href="<strong>#TITLE</strong>"/>
<relation href="<strong>#AUTHOR</strong>"/>
</elementType>
<elementType id="<strong>RECORD</strong>">
<relation href="<strong>#TITLE</strong>"/>
<relation href="<strong>#ARTIST</strong>"/>
</elementType>
<relationType id="<strong>SIZE</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>STYLE</strong>">
<pcdata/>
</relationType>
<elementType id="<strong>COFFEE</strong>">
<relation href="<strong>#SIZE</strong>"/>
<relation href="<strong>#STYLE</strong>"/>
</elementType>
<relationType id="<strong>TITLE</strong>">
<mixed><elt
href="<strong>#COMPOSER</strong>"/></mixed>
</relationType>
<relationType id="<strong>AUTHOR</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>ARTIST</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>COMPOSER</strong>">
<pcdata/>
</relationType>
</xml:schema></code></pre>
<h3><font size="4" face="Times New Roman"><code>Type
Extension</code></font></h3>
<p><font size="4" face="Times New Roman"><code>Sometimes some
elements are variants of others, in which case we can organize
the element types into a genus-species hierarchy using the
<em>extends</em>
attribute:</code></font></p>
<pre><code><xml:schema ID="<strong>ArtSchema</strong>">
<elementType id="<strong>artistic-work</strong>">
<relation href="<strong>#TITLE</strong>"/>
</elementType>
<elementType id="<strong>BOOK</strong>"
extends="<strong>#artistic-work</strong>">
<relation href="<strong>#AUTHOR</strong>"/>
</elementType>
<elementType id="<strong>RECORD</strong>"
extends="<strong>#artistic-work</strong>">
<relation href="<strong>#ARTIST</strong>"/>
<relation href="<strong>#COMPOSER</strong>"
occurs="<strong>OPTIONAL</strong>"/>
</elementType>
<relationType id="<strong>AUTHOR</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>COMPOSER</strong>"
extends="<strong>#AUTHOR</strong>"/>
<relationType id="<strong>ARTIST</strong>">
<pcdata/>
</relationType>
</xml:schema></code></pre>
<p><font size="4" face="Times New Roman"><code>Here we see that
books and records are both types of artistic work, and that a
composer is a type of author.</code></font></p>
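<p>A reader could walk this genus-species hierarchy as in the following sketch. It is illustrative only: the schema is wrapped in a plain <em>schema</em> element rather than <em>xml:schema</em> to keep parsing simple, the helper names are hypothetical, and the idea that a subtype also carries the relations of its supertypes is an assumption rather than something this section states.</p>

```python
import xml.etree.ElementTree as ET

# The ArtSchema example, with an unprefixed wrapper element (assumption).
SCHEMA = """
<schema id="ArtSchema">
  <elementType id="artistic-work"><relation href="#TITLE"/></elementType>
  <elementType id="BOOK" extends="#artistic-work"><relation href="#AUTHOR"/></elementType>
  <elementType id="RECORD" extends="#artistic-work">
    <relation href="#ARTIST"/>
    <relation href="#COMPOSER" occurs="OPTIONAL"/>
  </elementType>
  <relationType id="AUTHOR"><pcdata/></relationType>
  <relationType id="COMPOSER" extends="#AUTHOR"/>
  <relationType id="ARTIST"><pcdata/></relationType>
</schema>
"""

def load_types(schema_xml):
    """Index elementType and relationType declarations by id."""
    root = ET.fromstring(schema_xml)
    return {el.get("id"): el for el in root
            if el.tag in ("elementType", "relationType")}

def supertypes(type_id, types):
    """Follow local extends links (of the form #id), nearest supertype first."""
    chain = []
    ref = types[type_id].get("extends")
    while ref is not None and ref.startswith("#"):
        parent_id = ref[1:]
        if parent_id == type_id or parent_id in chain:
            break  # guard against a cyclic declaration
        chain.append(parent_id)
        ref = types[parent_id].get("extends")
    return chain

def inherited_relations(type_id, types):
    """Relations of the most general type first, then of each subtype.
    Inheriting relations through extends is an assumption of this sketch."""
    names = []
    for tid in list(reversed(supertypes(type_id, types))) + [type_id]:
        for rel in types[tid].findall("relation"):
            names.append(rel.get("href").lstrip("#"))
    return names

TYPES = load_types(SCHEMA)
print(supertypes("COMPOSER", TYPES))
print(inherited_relations("RECORD", TYPES))
```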
<h3><font size="4" face="Times New Roman"><code>Schema
Extension</code></font></h3>
<p><font size="4" face="Times New Roman"><code>We can also
use this ability to customize a schema that has useful features,
but which is too general. In this example, we show a general
schema for orders, then another one that is customized for our
bookstore:</code></font></p>
<pre><code><xml:schema
ID="<strong>GenericOrderSchema</strong>">
<elementType id="<strong>ORDER</strong>">
<relation href="<strong>#SOLD-TO</strong>"/>
<relation href="<strong>#SOLD-ON</strong>"/>
</elementType>
<relationType id="<strong>SOLD-TO</strong>">
<elt href="<strong>#PERSON</strong>"/>
</relationType>
<elementType id="<strong>PERSON</strong>">
<relation href="<strong>#LASTNAME</strong>"/>
<relation href="<strong>#FIRSTNAME</strong>"/>
</elementType>
<relationType id="<strong>LASTNAME</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>FIRSTNAME</strong>">
<pcdata/>
</relationType>
</xml:schema>
<xml:schema id="BookOrderSchema">
<elementType id="<strong>ORDER</strong>"
extends="<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>">
<relation href="<strong>#ITEM</strong>"
occurs="<strong>STAR</strong>"/>
</elementType>
<relationType id="<strong>ITEM</strong>">
<any/>
<relation
href="<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>"/>
<range
href="<strong>http://art.com/schemata?XML-XPTR=ID(BOOK)</strong>"/>
<range
href="<strong>http://art.com/schemata?XML-XPTR=ID(RECORD)</strong>"/>
<range href="<strong>#COFFEE</strong>"/>
</relationType>
<relationType id="<strong>SIZE</strong>">
<pcdata/>
</relationType>
<relationType id="<strong>STYLE</strong>">
<pcdata/>
</relationType>
<elementType id="<strong>COFFEE</strong>">
<relation href="<strong>#SIZE</strong>"/>
<relation href="<strong>#STYLE</strong>"/>
</elementType>
</xml:schema></code></pre>
<h2 align="left">3. XML-Data Schema</h2>
<p align="left">The XML-Data schema language defines element
types, attributes, relations, and which of these can be used in
which combinations with others. It also provides features for
organizing element types into a genus-species hierarchy, a basic
set of element types, and a small set of lexical types. The
schema contains other features from XML Document Type Definition
(DTD) language, such as entity and notation declarations. The
XML-Data schema is powerful enough to express the same structural
information and constraints as XML DTDs. It covers all the
features of XML-DTDs. An XML DTD can be mechanically converted to
an XML-Data schema. </p>
<p>Schemata are composed principally of declarations for: </p>
<ul>
<li>element types, represented by <i>elementType</i></li>
<li>attributes of elements, represented by <em>attribute</em></li>
<li>relations<em> </em>among elements, represented by
<em>relationType</em></li>
<li>rules governing the valid combinations of the above,
represented by <em>any, mixed </em>and<em> pcdata; </em>also
by<em> elt</em>, <em>group</em>, <em>relation, </em>and<em>
range</em>.</li>
<li>internal and external entities, represented by
<i>intEntityDcl</i>
and <i>extEntityDcl</i></li>
<li>notations, represented by <i>notationDcl</i></li>
</ul>
<p>Comments can be interspersed as usual in XML, and there is
provision for using references to external schemata or schema
fragments.</p>
<h3><b>3.1. The schema document element type: </b><b><i>schema</i></b>
</h3>
<p>All schema elements are contained within a schema element,
like this:</p>
<pre><code><?XML version='1.0' rmd='all'?>
<!doctype schema SYSTEM
"http://www.w3c.org/pub/sotr/schema.dtd">
<xml:schema id='ExampleSchema'>
<!-- schema goes here. -->
</xml:schema></code></pre>
<h3><b>3.2. The element type declaration element type:
elementType</b> </h3>
<p><em>Key terms used here:</em> <strong>element, elementType,
empty, any, mixed, pcdata</strong>, <strong>content model.</strong></p>
<p>The heart of an XML-Data schema is the <strong>elementType</strong>
declaration which defines a class of elements, gives them
attributes, establishes a grammar of which other element types
and character data are allowed in their contents and defines
their allowable relationships to elements of other classes. (The
allowable content, including relations, is called "content
model.")</p>
<pre><code><elementType id="example"> <!-- element
example (p*) -->
<elt href="#p" occurs="STAR"/>
</elementType>
<elementType id="p"> <!-- element p
((#PCDATA|p)*) -->
<mixed><elt href="#p"/></mixed>
</elementType></code></pre>
<p>The name attribute is optional if id is present, in which case
the id is used as the name.</p>
<p>Within an elementType, <em>elt</em> indicates that instances
are permitted to only have a single element type in their
content. The <em>occurs</em> attribute of <em>elt</em> specifies
whether this content is optional, and gives its cardinality. </p>
<p><em>Empty</em> and <em>any</em> content are expressed using
predefined elements <em>empty</em> and <em>any</em>. (<em>Empty</em>
may be omitted. <em>Any</em> signals that any mixture of elements
and parsed character data is legal.) Parsed character data
content is similarly expressed with a <em>pcdata</em> item.
<em>Mixed</em>
content (a mixture of parsed character data and one or more
element types), is identified by a <em>mixed</em> element, whose
content identifies the element types allowed in addition to
parsed character data (see below). </p>
<pre><code><elementType id="ARTIST">
<pcdata/>
</elementType></code></pre>
<p>More complex content models are created using <em>group</em>:</p>
<pre><elementType id="animalFriends" >
<group groupType="OR" occurs="STAR">
<group groupType="OR" occurs="PLUS">
<elt href="#cat"/>
<elt href="#dog"/>
</group>
<elt href="#bird"/>
<elt href="#rabbit"/>
<elt href="#pig"/>
<elt href="#fish"/>
</group>
</elementType></pre>
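<p>Since the schema language is meant to cover what DTD content models express, one way to read a declaration such as the one above is to render it back into DTD notation. The sketch below does that under stated assumptions: an <em>elt</em> or <em>group</em> with no <em>occurs</em> attribute is taken to occur exactly once, any <em>groupType</em> other than OR is rendered as a sequence, and the function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Occurrence indicators; a missing occurs attribute is assumed to mean "once".
OCCURS = {"STAR": "*", "PLUS": "+", "OPTIONAL": "?"}
CONTENT_TAGS = ("elt", "group", "mixed", "pcdata", "any", "empty")

def particle(node):
    """Render one elt or group as a DTD content particle."""
    suffix = OCCURS.get(node.get("occurs", ""), "")
    if node.tag == "elt":
        return node.get("href").lstrip("#") + suffix
    if node.tag == "group":
        # Only OR appears in the text; other group types are assumed sequential.
        separator = "|" if node.get("groupType") == "OR" else ","
        return "(" + separator.join(particle(child) for child in node) + ")" + suffix
    raise ValueError("not a content particle: " + node.tag)

def content_model(element_type):
    """Render the explicit content of an elementType in DTD notation."""
    content = [child for child in element_type if child.tag in CONTENT_TAGS]
    if not content or content[0].tag == "empty":
        return "EMPTY"
    first = content[0]
    if first.tag == "any":
        return "ANY"
    if first.tag == "pcdata":
        return "(#PCDATA)"
    if first.tag == "mixed":
        names = ["#PCDATA"] + [e.get("href").lstrip("#") for e in first.findall("elt")]
        return "(" + "|".join(names) + ")*"
    rendered = particle(first)
    return rendered if first.tag == "group" else "(" + rendered + ")"

ANIMAL_FRIENDS = ET.fromstring("""
<elementType id="animalFriends">
  <group groupType="OR" occurs="STAR">
    <group groupType="OR" occurs="PLUS">
      <elt href="#cat"/>
      <elt href="#dog"/>
    </group>
    <elt href="#bird"/>
    <elt href="#rabbit"/>
    <elt href="#pig"/>
    <elt href="#fish"/>
  </group>
</elementType>
""")

print(content_model(ANIMAL_FRIENDS))
```

<p>Relations are deliberately left out here; only the explicitly declared content is rendered.</p>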
<h3>3.3. Relations</h3>
<p><em>Key terms used here:</em> <strong>relationType, relation,
XML-Link locator, href.</strong></p>
<p><em>Relation</em> element types express a relationship between
one element (usually the relation's parent) and either another
element or an atomic value (such as a simple number, string or
date). Relations use the XML-Link <em>locator</em> without
implying navigation. The target of a relation is the element
referenced by the <em>href</em> attribute if one is present,
else the element contents. This single convention unifies graphs
and trees.</p>
<p>Including a relation in an elementType makes it an implicit
part of that element's content model, with the default for occurs
being OPTIONAL. Relations must occur (in a valid document
instance) after any other content. RelationTypes are elements,
and the full content model is as if there were a sequential group
containing first the explicitly provided content model, then the
relations in a <em>starred</em> <em>or</em> group with all the
relations as content. </p>
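<p>The rule in the preceding paragraph can be restated as a small sketch: build the full model as a sequence of the explicit model followed by a starred OR group of the relations, and check that in an instance no ordinary content follows a relation. The function names are hypothetical, and the explicit model is assumed to be given already as a parenthesized DTD-style string.</p>

```python
def full_content_model(explicit, relations):
    """Combine an explicit, parenthesized content model with the relations
    of the element type, written in DTD notation."""
    if not relations:
        return explicit
    starred = "(" + "|".join(relations) + ")*"
    return "(" + explicit + "," + starred + ")"

def relations_come_last(child_tags, relations):
    """True if, in document order, no non-relation child follows a relation."""
    seen_relation = False
    for tag in child_tags:
        if tag in relations:
            seen_relation = True
        elif seen_relation:
            return False
    return True

print(full_content_model("(p*)", ["favoriteFood", "chases"]))
print(relations_come_last(["p", "p", "chases"], {"favoriteFood", "chases"}))
```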
<p>Two element types are used in the schema to effect a relation:
The <em>relationType</em> is a specialized kind of <em>elementType</em>,
while <em>relation</em> has the same function as <em>elt</em>
(but validates that it refers to a relationType). </p>
<p>If a <em>default</em> attribute is specified for a relation,
it becomes the default of the <em>value</em> attribute of the
relation elt. The <em>range</em> element, if present, declares a
restriction on the valid target of a relation. Each range element
references one elementType; any of the referenced element types is
a valid target. </p>
<pre><code> <relationType id="favoriteFood"
><mixed/></relationType>
<relationType id="chases"
><any/></relationType>
<elementType id="dog" >
<any/>
<attribute name="name"/>
<relation href="#favoriteFood"/>
<relation href="#chases"/>
</elementType></code></pre>
<h3>3.4. Attributes</h3>
<p><em>Key terms used here:</em> <strong>attribute,
values, default. </strong></p>
<p>After the content model, attribute declarations may occur,
which are divided into attributes with enumerated or notation
values, and all other kinds.</p>
<pre><code><elementType id="p1"> <!-- element
p1 ((#PCDATA|p1)*) -->
<mixed><elt href="#p1"/></mixed>
<attribute name='id' type='ID'/> <!-- attlist p1 id
ID #IMPLIED
exm (a|b|c) 'c'
x CDATA #FIXED
'y' -->
<attribute name='exm' type='ENUMERATION' values='a b c'
default='c'/>
<attribute name='x' presence='FIXED' default='y'/>
</elementType></code></pre>
<p>An attribute may be given a <em>default</em> value. Whether it
is required or optional is signaled by <i>presence</i>. (Presence
ordinarily defaults to IMPLIED, but if omitted and there is an
explicit default, <i>presence</i> is set to SPECIFIED.)</p>
<p>Attributes with enumerated (and notation) values permit a
<em>values</em>
attribute, a space-separated list of legal values. The <em>values</em>
attribute is required when the <em>type</em> is ENUMERATION or
NOTATION,<em> </em>else it is forbidden. In these cases, if a
default is specified it must be one of the specified values.</p>
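<p>The rules for <em>values</em> and <em>default</em> can be sketched as a small check. This is only an illustration in Python, not part of the proposal: the function name <code>check_attribute_decl</code> and the error strings are hypothetical, and the sketch assumes that an omitted <em>type</em> means CDATA.</p>

```python
# Illustrative sketch only: check_attribute_decl and its messages are
# hypothetical names, not defined by XML-Data.
import xml.etree.ElementTree as ET

ENUMERATED_TYPES = {"ENUMERATION", "NOTATION"}

def check_attribute_decl(decl):
    """Return a list of rule violations for one <attribute> declaration."""
    problems = []
    attr_type = decl.get("type", "CDATA").upper()  # assumption: default is CDATA
    values = decl.get("values")
    default = decl.get("default")
    if attr_type in ENUMERATED_TYPES:
        if values is None:
            problems.append("values is required for " + attr_type)
        elif default is not None and default not in values.split():
            problems.append("default must be one of the listed values")
    elif values is not None:
        problems.append("values is forbidden for " + attr_type)
    return problems

good = ET.fromstring(
    "<attribute name='exm' type='ENUMERATION' values='a b c' default='c'/>")
bad = ET.fromstring(
    "<attribute name='exm' type='ENUMERATION' values='a b c' default='d'/>")
print(check_attribute_decl(good))
print(check_attribute_decl(bad))
```

<p>The first declaration passes and the second is reported because its default is not among the listed values.</p>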
<p>Similar to the facility of multiple ATTLISTs, we sometimes
need to have <em>attributesDcls</em> declared separately from the
elementType they refer to. We can do this with the <em>propertyOf</em>
element, discussed later.</p>
<h3><b>3.5 The internal and external entity declaration element
type: </b><b><i>intEntityDcl</i></b> and <b><i>extEntityDcl</i></b></h3>
<p><em>Key terms used here:</em> <strong>entity, internal entity,
external entity, notation.</strong></p>
<p>This and the next two declarations cover <em>entities</em> in
general. Entities are a powerful shorthand mechanism, similar to
macros in a programming language.</p>
<pre><code><intEntityDcl name="LTG">
<entityDef>Language Technology Group</entityDef>
</intEntityDcl></code></pre>
<pre><code><extEntityDcl name="dilbert">
<notation href="#gif"/>
<systemId
href="http://www.ltg.ed.ac.uk/~ht/dilb.gif"/>
</extEntityDcl></code></pre>
<p>Here as elsewhere, following XML, <em>systemId</em> must be a
URL, absolute or relative, and <em>publicId</em>, if present,
must be a Public Identifier as defined in ISO/IEC 9070:1991,
Information technology -- SGML support facilities -- Registration
procedures for public text owner identifiers. If a <em>notation</em>
is given, it must be declared (see below) and the entity will be
treated as binary, i.e., not substituted directly in place of
references.</p>
<pre><code><notationDcl name="gif">
<systemId href='http://who.knows.where/'/>
</notationDcl></code></pre>
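<p>To illustrate the macro analogy for internal entities, the following Python sketch reads <em>intEntityDcl</em> declarations and substitutes them into text. It is an assumption-laden illustration: the plain <code>schema</code> wrapper element, the function names, and the single-pass substitution are all hypothetical choices, and references without a declaration are simply left alone.</p>

```python
# Illustrative sketch only: function names and the <schema> wrapper are
# assumptions made for this example.
import re
import xml.etree.ElementTree as ET

DECLS = """
<schema>
  <intEntityDcl name="LTG">
    <entityDef>Language Technology Group</entityDef>
  </intEntityDcl>
</schema>
"""

def internal_entities(schema_root):
    """Build a name -> replacement text table from intEntityDcl elements."""
    table = {}
    for decl in schema_root.iter("intEntityDcl"):
        definition = decl.find("entityDef")
        if definition is not None:
            table[decl.get("name")] = (definition.text or "").strip()
    return table

def expand(text, table):
    """Replace &name; references that have a declaration; leave others as-is."""
    return re.sub(r"&([A-Za-z_][\w.-]*);",
                  lambda m: table.get(m.group(1), m.group(0)), text)

table = internal_entities(ET.fromstring(DECLS))
print(expand("Visit the &LTG; in Edinburgh &amp; say hello.", table))
```

<p>A binary external entity such as <em>dilbert</em> would not be handled this way, since it is not substituted in place of references.</p>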
<h3><b>3.6. The external declarations element type:
</b><b><i>extDcls</i></b>
</h3>
<p><em>Key terms used here:</em> <strong>external entity with
declarations.</strong></p>
<p>Although we allow an external entity with declarations to be
included, we recommend a different declaration for schema
modularization. The <em>extDcls</em> declaration gives a clean
mechanism for importing (fragments of) other schemata. It
replaces the common SGML idiom of declaring an external parameter
entity and then immediately referring to it, and has the same
import, namely, that the text referred to by the combination of
<b>systemId</b>
and <b>publicId</b> is included in the schema in place of the
<b>extDcls</b>
element, and that replacement text is then subject to the same
validity constraints and interpretation as the rest of the
schema.</p>
<h3>3.7. Type Extension</h3>
<p><em>Key terms used here:</em> <strong>type (class), typeOf,
extension (inheritance, subclassing), implements, extends, typeOf
(genus).</strong></p>
<p>Schemata of all kinds can benefit from a subtyping mechanism:
indicating that one class of object is a specialization of
another more general class. For example, cat and dog both have
the type <em>pet</em> as their more general category. To make
more effective use of such classes, we introduce one new schema
attribute, which can be used to declare explicitly that an
element type is a subclass of another: <em>extends</em>: </p>
<pre><code><xml:schema>
<elementType id="animalFriends" >
<elt href="#pet" occurs="PLUS" />
</elementType>
<elementType id="pet" >
<any/>
</elementType>
<elementType id="cat" extends="#pet"/>
<elementType id="dog" extends="#pet"/>
</xml:schema></code></pre>
<p>This schema says that the <em>animalFriends</em> element class
can contain one or more elements from the <em>pet</em> class,
such as a <em>cat</em> or a <em>dog</em>. It also says that each cat and
dog instance is a pet (<font size="3">that is, any cat is
semantically a pet, and any valid cat is also a valid pet</font>).
So the following data is now valid under this schema: </p>
<pre><code><animalFriends>
<cat/>
<dog/>
<cat/>
</animalFriends></code></pre>
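<p>One way a processor might decide that <em>cat</em> and <em>dog</em> are acceptable where <em>pet</em> is expected is to follow the <em>extends</em> chain. The Python sketch below is only an illustration under stated assumptions: it uses a plain <code>schema</code> root, resolves only local <code>#id</code> references, and does not check <em>occurs</em>. The helper names <code>build_supertypes</code> and <code>is_a</code> are hypothetical.</p>

```python
# Illustrative sketch only: build_supertypes and is_a are hypothetical helpers.
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <elementType id="animalFriends"><elt href="#pet" occurs="PLUS"/></elementType>
  <elementType id="pet"><any/></elementType>
  <elementType id="cat" extends="#pet"/>
  <elementType id="dog" extends="#pet"/>
</schema>
"""

def build_supertypes(schema_root):
    """Map each elementType id to the id it extends (None if it has none)."""
    supertypes = {}
    for decl in schema_root.iter("elementType"):
        ext = decl.get("extends")
        supertypes[decl.get("id")] = ext.lstrip("#") if ext else None
    return supertypes

def is_a(type_id, ancestor_id, supertypes):
    """True if type_id equals ancestor_id or reaches it through extends."""
    seen = set()
    while type_id is not None and type_id not in seen:
        if type_id == ancestor_id:
            return True
        seen.add(type_id)  # guard against accidental cycles
        type_id = supertypes.get(type_id)
    return False

supertypes = build_supertypes(ET.fromstring(SCHEMA))
instance = ET.fromstring("<animalFriends><cat/><dog/><cat/></animalFriends>")
print(all(is_a(child.tag, "pet", supertypes) for child in instance))
```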
<h4>Type Extension</h4>
<p>It is frequently necessary to <em>add</em> new attributes to a
subclass. This requires no extra machinery, because XML already
permits multiple attribute list declarations, which cumulatively
add attributes to element types. So each subclass may easily add
any new attributes desired, as shown here: </p>
<pre><code><elementType id="dog"
extends="#pet">
<attribute name="age"/>
</elementType></code></pre>
<p>If the super type has a content model, attributes, and so on, these
are inherited; that is, they are also declared implicitly for the
derived class. In the following example, we give an <em>owner</em>
attribute to <em>pet</em>. It is inherited, so both <em>cat</em>
and <em>dog</em> now also have an <em>owner</em> attribute.</p>
<pre><code><xml:schema>
<elementType id="animalFriends" >
<elt href="#pet" occurs="PLUS" />
</elementType>
<elementType id="pet">
<any/>
<attribute id='name'/>
<attribute id='owner'/>
</elementType>
<elementType id="cat" extends="#pet">
<elt href='#kittens'/>
<attribute id='lives' type='NMTOKEN'/>
</elementType>
<elementType id="dog" extends="#pet">
<elt href='#puppies'/>
<attribute id='breed'/>
</elementType>
</xml:schema></code></pre>
<p>This schema says that the animalFriends element class can
contain one or more <em>pet</em> elements. Because <em>cat</em>
and <em>dog</em> are subtypes of <em>pet</em>, they can occur as
well. So the following instance fragment is now valid under this
schema: </p>
<pre><code><animalFriends>
<cat name="Fluffy" lives='9'/>
<pet name="Diego"/>
<dog name="Gromit" owner='Wallace' breed='mutt'/>
</animalFriends></code></pre>
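<p>The inheritance of attributes through <em>extends</em> can be sketched as a walk up the chain of super types. This Python fragment is an illustration only: <code>effective_attributes</code> is a hypothetical helper, the schema uses a plain <code>schema</code> root, only local <code>#id</code> references are followed, and the chain is assumed to be free of cycles.</p>

```python
# Illustrative sketch only: effective_attributes is a hypothetical helper.
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <elementType id="pet">
    <any/>
    <attribute id="name"/>
    <attribute id="owner"/>
  </elementType>
  <elementType id="cat" extends="#pet">
    <attribute id="lives" type="NMTOKEN"/>
  </elementType>
  <elementType id="dog" extends="#pet">
    <attribute id="breed"/>
  </elementType>
</schema>
"""

def effective_attributes(type_id, decls):
    """Attribute names declared on a type plus those inherited via extends."""
    names = []
    decl = decls.get(type_id)
    if decl is None:
        return names
    parent = decl.get("extends")
    if parent:
        # Inherited attributes come first, then the type's own declarations.
        names.extend(effective_attributes(parent.lstrip("#"), decls))
    for attr in decl.findall("attribute"):
        name = attr.get("name") or attr.get("id")  # name defaults to id
        if name not in names:
            names.append(name)
    return names

decls = {d.get("id"): d for d in ET.fromstring(SCHEMA).iter("elementType")}
print(effective_attributes("dog", decls))  # ['name', 'owner', 'breed']
```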
<p>Additional relations can also be added, but only if
the content model of the super type consists of a single list of
optional, repeatable element types.</p>
<p>When defining a derived element class, one can also override
existing attributes and relations. The following example adds a
<em>height</em>
relation and overrides the <em>favoriteFood</em> relation, giving
it a default value of "Fish." (We also do something
fancy here: giving the overriding relation type the super
type favoriteFood ensures that it is in all
other respects identical to the original.) </p>
<pre><code><relationType id="height">
<any/>
</relationType>
<relationType id="favoriteCatFood"
extends="#favoriteFood"/>
<elementType id="cat" extends="#pet">
<relation href="#height"/>
<relation href="#favoriteCatFood"
default="Fish"/>
</elementType></code></pre>
<h4>Schema Extension</h4>
<p>We can also use subtyping to extend an existing schema without
editing it. Suppose that we cannot edit the schema defining pet,
cat or dog, but want to use elements with those names and
semantics in our document. The following adds the
"eyeColor" property to <em>cat</em>.</p>
<pre><code><relationType id="eyeColor"
extends="http://whereever.org/#eyeColor">
<pcdata/>
</relationType>
<elementType id="cat"
extends="http://whereever.org/#cat">
<relation href="#eyeColor"/>
</elementType></code></pre>
<p>The rules for allowable subtyping must enforce certain
constraints. In principle, a subtype can have additional relations
and attributes (provided this is consistent with the super type's
content model) but never fewer, and it can add restrictions but
never relax them. In practice, this principle leads to rules such
as: a default value can be added if there is none, an existing
default can be changed, and a DEFAULT can be converted to FIXED.</p>
<h4>Implements</h4>
<p>Subtyping as we have described it here is actually a
combination of two effects: First, we assert that an element of
one type is also of another (as in a cat is a pet).</p>
<p>Second, we achieve economies and maintainability in the
declarations to make sure that the first is true. That is, the
derived element class is automatically provided with all the
properties of the super type. Sometimes it is valuable to have
the first effect without the second. (This is equivalent to the
Java <em>implements</em> facility.) We indicate this by using the
<em>implements</em> element, as in </p>
<pre><code><relationType id="favoriteFood" >
<mixed/>
</relationType>
<relationType id="weight" >
<mixed/>
</relationType>
<elementType id="cat" >
<implements href="http://whereever.org/#pet" />
<attribute name="name"/>
<relation href="#favoriteFood" />
<relation href="#weight" />
</elementType></code></pre>
<p><font size="3">This has no effect on the attributes or
relations of instances of cat, but asserts in the schema that
every cat is also a pet (that is, any cat is semantically a pet,
and any valid cat is also a valid pet).</font></p>
<h4>Relation of Type Extension to Parameter Entities</h4>
<p>Sophisticated DTDs often make complex use of <em>parameter
entities</em> in an attempt to consolidate common structures in
one, reusable place. Such parameter entities often represent
implicit classes.</p>
<p>The need is real, but the approach often leads to obscurity,
and reduced maintainability. Further, expansion of entities loses
all connection with their source: once expanded, the fact that
some set of element types was a co-declared set, re-used in
multiple places, is lost. </p>
<h3>3.8 Lexical Data Types</h3>
<p>Information such as dates and numbers is often expressed in a
format that requires some further parsing. For example, the same
date can be written "October 22, 1954" or
"19541022". (And from what I've seen, about 300 other
ways.) The <em>lextype</em> attribute discriminates formats.
Appearing on instance elements, it describes the format of the
remainder of the element. The value of the lextype attribute is
always a reference to a URI identifying the parsing rules.
XML-Data should define a small number of these. We propose
NUMBER, INTEGER, REAL and DATE.ISO8061.</p>
<pre><code><birthday
lextype="<strong>DATE.ISO8061</strong>"><strong>19541022</strong></birthday></code></pre>
<p>These are declared in the schema as follows:</p>
<pre><code><relationType id="<strong>birthday</strong>">
<attribute name="<strong>lextype</strong>"
default="<strong>DATE.ISO8061</strong>"
presence="<strong>fixed</strong>"/>
</relationType></code></pre>
<p>When giving the lexical type of an <em>attribute</em>
in the schema, <em>lextypeIs</em> is used, as in:</p>
<pre><code><attribute name="<strong>price</strong>"
presence="<strong>REQUIRED</strong>"
lextypeIs="<strong>number</strong>"/></code></pre>
<p>Some patterns will indicate that several properties or
attributes should be used in combination to arrive at a value.
For example, a custom pattern could indicate a date expressed as
the following: </p>
<pre><code><relationType id="<strong>birthday</strong>">
<attribute name="lextype"
default="<strong>DATE.ATTR-YMD</strong>"
presence="<strong>specified</strong>"/>
</relationType>
...
<birthday year="<strong>1954</strong>"
month="<strong>10</strong>"
day="<strong>22</strong>" />
</code></pre>
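<p>A consumer could dispatch on the <em>lextype</em> value to choose parsing rules. The Python sketch below is an illustration only. It treats the two lextype names used in the examples as plain keys rather than resolving them as URIs, and the parser functions, the <code>PARSERS</code> table, and the <code>default_lextype</code> parameter (standing in for a default taken from the schema) are all hypothetical.</p>

```python
# Illustrative sketch only: PARSERS, read_value and default_lextype are
# hypothetical; lextype names are used as plain keys, not resolved as URIs.
from datetime import date, datetime
import xml.etree.ElementTree as ET

def parse_compact_date(elem):
    """Element content such as 19541022: year, month, day with no separators."""
    return datetime.strptime((elem.text or "").strip(), "%Y%m%d").date()

def parse_attr_ymd(elem):
    """Date spread over year, month and day attributes."""
    return date(int(elem.get("year")), int(elem.get("month")), int(elem.get("day")))

PARSERS = {
    "DATE.ISO8061": parse_compact_date,
    "DATE.ATTR-YMD": parse_attr_ymd,
}

def read_value(elem, default_lextype=None):
    """Pick a parser from the instance's lextype, else the schema default."""
    lextype = elem.get("lextype", default_lextype)
    parser = PARSERS.get(lextype)
    if parser is None:
        raise ValueError("no parser registered for lextype %r" % lextype)
    return parser(elem)

a = ET.fromstring('<birthday lextype="DATE.ISO8061">19541022</birthday>')
b = ET.fromstring('<birthday year="1954" month="10" day="22"/>')
print(read_value(a))
print(read_value(b, default_lextype="DATE.ATTR-YMD"))
```

<p>Both calls yield the same date, which is the point of separating the lexical form from the value.</p>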
<h3>3.9. Basic Semantic Data Types</h3>
<p>We need to define here a small number of basic types and their
hierarchy, corresponding to simple data types such as Number and
Date. (Dates are a subtype of numbers.) </p>
<p>We also need to define the expression of each of the basic
Java and SQL data types in terms of these basic ones, plus
additional properties giving units, precision, min, max, default
pattern, and other properties. For example, an INTEGER typically
is a number with certain min and max property values. Note that
units should be an element type with possible structure, so that
things like "miles/hours" or "feet/(sec*sec)"
can be represented and used for automatic conversions.</p>
<h2 align="left">4. Standard Vocabulary</h2>
<p align="left">We expect standard libraries of vocabulary to be
developed to capture common semantics used in vertical
applications and particularly in industry and application
domains. Dublin Core and CDF are two examples of such standard
libraries.</p>
<h2 align="left">5. Relations to other proposed standards</h2>
<p align="left"><font size="3">The W3C site at</font><font
size="4"> </font><a href="http://www.w3.org/PICS/Member/NG/"><font
color="#0000FF"
size="3"><u>http://www.w3.org/PICS/Member/NG</u></font></a><font
color="#0000FF" size="3"><u> </u></font><font color="#000000"
size="3">contains links to several related papers, including Ora
Lassila's </font><a
href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html"><font
color="#000000" size="3">PICS-NG document</font></a><font
color="#000000" size="3">, Renato Iannella's small PICS extension
proposal, CDF, MCF in XML, and the </font><a
href="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html"><font
color="#000000" size="3">Web Collections using XML</font></a><font
color="#000000" size="3"> proposal. Specific notes on some of
these follow:</font></p>
<h3>5.1 XML-LINK</h3>
<p>All relations use <em>href</em> in a manner consistent with the <a
href="http://www.w3.org/pub/WWW/TR/WD-xml-link-970406.html">XML-LINK</a>
working draft dated April 6, 1997 (the most recent as of the time
of this writing). XML-Links are a type of <em>relation</em> (with
extra attributes, elements, and semantics indicating traversal).</p>
<h3>5.2 PICS-NG</h3>
<p><a
href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html#intro">PICS-NG
Metadata Model and Label Syntax</a> describes a set of
requirements for structured data to be used on the Internet.
XML-Data is an application of XML concepts to those requirements.</p>
<h3>5.3 CDF</h3>
<p><font size="3">The </font><a
href="http://www.microsoft.com/standards/cdf-f.htm"><font
size="3">Channel Definition Format</font></a><font size="3">
(CDF) is a natural application of XML-Data and is fully
compatible with the syntax and the ideas presented in this
document</font>. Its format is a validatable grammar given a
proper schema. The existing use of href in CDF is consistent with
XML-LINK and XML-Data usage. CDF defines a number of basic
element types that would be appropriate for a standard library.</p>
<h3>5.4 MCF in XML</h3>
<p><a href="http://www.w3.org/Member/9706/xmlmcf.htm">MCF in XML</a>
has two principal components: The ability to represent a
"directed labeled graph" and also a set of predefined
element types. The first of these is effected by a convention on
use of the <em>href</em> attribute (the same convention used in
XML-Data <em>relations</em>, with the same effect). Of the
second, some element types are genuinely necessary to represent
schemata and a type system (these are also present in XML-Data)
while others would be appropriate for a standard library.</p>
<p>XML-Data has a number of features not in MCF: </p>
<ul>
<li>Principally, XML-Data permits <strong>tree structures</strong>
in cases when MCF only permits a graph. (MCF requires
that the target of all relations must be out-of-line when
it is an element. XML-Data allows in-line targets.) </li>
<li>XML-Data hrefs are explicitly <strong>URI</strong>s.
(Though MCF <em>unit</em>s can be URIs, it is not clear
from the current document when they are and when they are
not.)</li>
<li>Names in XML-Data were chosen for more
compatibility with <strong>existing XML usage</strong>
(or at least that is the intention).</li>
<li>XML-Data schemata can represent all the information in an
XML <strong>DTD</strong>, while it is not clear that MCF
can do this. </li>
<li>XML-Data has additional capabilities for expressing
<strong>relationships
in the schema</strong> (relation, relationType, extends,
implements). </li>
<li>XML-Data proposes <em><strong>lextypes</strong></em> as a
basic element type, a feature not discussed in MCF. </li>
</ul>
<p>This chart tabulates the MCF "bootstrap" element
types and describes their equivalents in XML-Data.</p>
<dl>
<dt>Category</dt>
<dd>"elementType" in XML-Data.</dd>
<dt>typeOf</dt>
<dd>"typeOf" relation in XML-Data.
Also, "extends" and "implements" in
XML-Data assert the relationship in the schema. </dd>
<dt>Unit</dt>
<dd>"href" in XML-Data.</dd>
<dt>domain</dt>
<dd>"propertyOf" in XML-Data.</dd>
<dt>range</dt>
<dd>"range" in XML-Data. This gives the allowed
type of the target of a property.</dd>
<dt>superType</dt>
<dd>This may correspond to "implements" in XML-Data.
However the MCF document is not clear on this point.</dd>
<dt>Property</dt>
<dd>This corresponds to the abstract concept of a link class
expressed in schemata by <em>relation</em> and
<em>relationType</em>.
</dd>
<dt>FunctionalProperty</dt>
<dd>This appears to be a <em>relation</em> with <em>occurs</em>
= OPTIONAL or REQUIRED (that is, occurs at most once).</dd>
<dt>mutuallyDisjoint</dt>
<dd>This is a relationship asserted among the members of an
enumeration. XML-Data does not contain a predefined
propertyType for this. It could be added easily if this
is useful. </dd>
<dt>parent</dt>
<dd>A generic property, whose meaning appears to be
contextual. XML-Data does not contain a predefined
elementType for this. It is unneeded because parentage is
expressed by containment, while when out-of-line,
specific meanings are conveyed by more precise
relationship types such as <em>propertyOf</em>.</dd>
<dt>name</dt>
<dd>"name" in XML-Data. However, note that like
parent, the interpretation of name in MCF seems to be
contextual.</dd>
<dt>description</dt>
<dd>XML-Data does not contain a predefined elementType for
this. We think that this belongs to a standard library
and not in this specification.</dd>
<dt>Sequence</dt>
<dd>This is a special arc type in MCF that expresses the same
fact as lexical order in XML.</dd>
<dt>ord</dt>
<dd>This is an MCF helper element type for Sequence.</dd>
</dl>
<p><a name="XML-Data-vs-MCF">Comparative examples of XML-Data and
MCF in XML</a> representation of an order for several books. (All
persons in this example are assumed to be not in the document,
but elsewhere.) The <em>id</em> attribute is on all elements
representing real-world objects, in both models. In the MCF model
<em>id</em> also appears on elements needed artificially for
reference. </p>
<table border="0">
<tr>
<td><font size="4">MCF in XML</font></td>
<td><font size="4">XML-Data</font></td>
</tr>
<tr>
<td valign="top"><pre><code>
<ORDER id="order1">
<SOLD-TO
unit="<strong>http:/people#person1</strong>"/>
<SOLD-ON value="<strong>19970317</strong>"/>
<ITEMS unit="<strong>sequence1</strong>"/>
</ORDER>
<BOOK id="book1">
<TITLE value="<strong>Number, the Language of
Science</strong>"/>
<AUTHOR unit="<strong>http:/people#person2</strong>"/>
</BOOK>
<SEQUENCE id="sequence1">
<ORD UNIT="book1">
<PRICE value=<strong>"5.95"</strong>/>
</ORD>
<ORD UNIT="cd1">
<PRICE value=<strong>"12.95"</strong>/>
</ORD>
<ORD UNIT="book2">
<PRICE value=<strong>"6.95"</strong>/>
</ORD>
<ORD UNIT="food1">
<PRICE value=<strong>"1.50"</strong>/>
</ORD>
</SEQUENCE>
<COFFEE id="food1">
<size value="<strong>small</strong>"/>
<style value="<strong>cafe macchiato</strong>"/>
</COFFEE>
<RECORD id="cd1">
<TITLE value="<strong>Rachmaninoff's Second Piano
Concerto</strong>"/>
<ARTIST unit="<strong>http:/people#person3</strong>"/>
</RECORD>
<BOOK id="book2">
<TITLE value="<strong>The Evolution of
Complexity</strong>"/>
<AUTHOR unit="<strong>http:/people#person4</strong>"/>
</BOOK></code></pre>
</td>
<td valign="top"><pre>
<code><ORDER id="order1">
<SOLD-TO
href="<strong>http:/people#person1</strong>"/>
<SOLD-ON value="<strong>19970317</strong>"/>
<ITEM>
<PRICE><strong>5.95</strong></PRICE>
<BOOK id="book1">
<TITLE ><strong>Number, the Language of
Science</strong></TITLE>
<AUTHOR
href="<strong>http:/people#person2</strong>"/>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>12.95</strong></PRICE>
<RECORD id="cd1">
<TITLE ><strong>Rachmaninoff's Second Piano
Concerto</strong></TITLE>
<ARTIST
href="<strong>http:/people#person3</strong>"/>
</RECORD>
</ITEM>
<ITEM>
<PRICE><strong>6.95</strong></PRICE>
<BOOK id="book2">
<TITLE ><strong>The Evolution of
Complexity</strong></TITLE>
<AUTHOR
href="<strong>http:/people#person4</strong>"/>
</BOOK>
</ITEM>
<ITEM>
<PRICE><strong>1.50</strong></PRICE>
<COFFEE>
<SIZE><strong>small</strong></SIZE>
<STYLE><strong>cafe macchiato</strong></STYLE>
</COFFEE>
</ITEM>
</ORDER></code></pre>
</td>
</tr>
</table>
<p> </p>
<h2 align="left">6. Conclusion</h2>
<p><font color="#000000" size="3">Future applications of the
Internet will focus on adding user value to information through
semantic annotation. Semantics will permit information to be
discovered, targeted, reused, and integrated. Not only does this
make the content more usable, but it opens up opportunities for
software developers to build components that exploit these
semantics. Such components could include applications as prosaic
as application or user logging, or as futuristic as user agents
that assist in finding or organizing content, World-Wide Web
"surf buddies" that accompany a user's browsing and
add valuable or entertaining comments, or natural language
query systems. Semantic annotation turns the Internet into a
platform for programming powerful and valuable applications.</font></p>
<p><font color="#000000" size="3">This proposal lays the
foundation for how applications can annotate their information
content. The proposal adds powerful new constructs for
representing semantics, sufficiently advanced for use in
artificial intelligence and natural language systems, yet retains
the architecture and investment of existing XML and the
efficiency of its representation.</font></p>
<hr>
<h2 align="left">Appendix A - The XML DTD for a schema</h2>
<pre><code>
<!ENTITY % nodeattrs 'id ID #IMPLIED' >
<!-- href is as per XML-LINK, but is not required unless there is
no content -->
<!ENTITY % exattrs 'extends CDATA #IMPLIED' >
<!ENTITY % linkattrs 'id ID #IMPLIED
href CDATA #IMPLIED' >
<!-- The shared content model of elementType, linkType and
relationType -->
<!-- Omitted element type same as "empty." -->
<!ENTITY % extendedmodel 'implements*,
(elt|group|empty|any|pcdata|mixed)?,
(relation|attribute)*'>
<!-- The top-level container -->
<!element schema ((elementType|propertyOf|linkType|
relationType|extendType|augmentElementType|
intEntityDcl|extEntityDcl|
notationDcl|extDcls|c)*)>
<!attlist schema %nodeattrs;>
<!-- Element Type Declarations -->
<!element elementType (%extendedmodel;)>
<!-- Either name or id must be present - - absent name defaults to id
-->
<!attlist elementType %nodeattrs;
%exattrs;
name CDATA #IMPLIED>
<!-- Element types allowed in content model -->
<!-- Note this is just short for a model group with only one elt in
it -->
<!element elt EMPTY>
<!-- Elements can have exponents as well as groups -->
<!-- The href is required -->
<!attlist elt %linkattrs;
occurs (required|optional|star|plus) 'required'>
<!-- A group in a content model, sequential or disjunctive -->
<!element group ((group|elt)+)>
<!attlist group %nodeattrs;
groupType (seq|or) 'seq'
occurs (required|optional|plus) 'required'>
<!element any EMPTY>
<!element empty EMPTY>
<!element pcdata EMPTY>
<!-- mixed content is just a flat, non-empty list of elts -->
<!-- We don't need to say anything about #pcdata, it's implied -->
<!element mixed (elt+)>
<!attlist mixed %nodeattrs;>
<!-- Attributes -->
<!-- default value must be present iff presence is specified or fixed
-->
<!-- presence defaults to specified if default is present, else
implied -->
<!-- name attribute is locally unique, defaults to id if absent
-->
<!element attribute EMPTY>
<!attlist attribute %linkattrs;
name CDATA #IMPLIED
type
(id|idref|idrefs|entity|entities|nmtoken|nmtokens|
enumeration|notation|cdata) 'cdata'
default CDATA #IMPLIED
values NMTOKENS #IMPLIED
presence (implied|specified|required|fixed) #IMPLIED
lextypeIs CDATA #IMPLIED>
<!-- Relations - - relationTypes are pointed to from relations,
just as elementTypes are pointed to from elts -->
<!element relationType (%extendedmodel;,
range*)>
<!attlist relationType %nodeattrs;
%exattrs;
name CDATA #IMPLIED >
<!element range EMPTY >
<!attlist range %linkattrs; >
<!element relation EMPTY>
<!attlist relation %linkattrs;
default CDATA #IMPLIED
occurs (required|optional|star|plus) 'optional'>
<!-- For adding attributes to existing element types -->
<!element propertyOf EMPTY>
<!attlist propertyOf href CDATA #REQUIRED>
<!element augmentElementType ((relation|attribute)*)>
<!attlist augmentElementType %linkattrs;
%exattrs;>
<!-- Shorthand for simple XML-LINKs -->
<!element linkType (%extendedmodel;)>
<!attlist linkType %nodeattrs;
%exattrs;
name CDATA #IMPLIED
role CDATA #IMPLIED
title CDATA #IMPLIED
show (embed|replace|new) #IMPLIED
actuate (auto|user) #IMPLIED
behaviour CDATA #IMPLIED >
<!element implements EMPTY>
<!attlist implements href CDATA #REQUIRED>
<!-- Entity Declarations -->
<!-- Note as this is written only external entities
can have structure without escaping it -->
<!-- Name defaults to id if absent -->
<!element intEntityDcl (#PCDATA)>
<!attlist intEntityDcl %nodeattrs;
name CDATA #IMPLIED>
<!-- The entity will be treated as binary if a notation is present
-->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element extEntityDcl ( systemId, publicId?)>
<!attlist extEntityDcl %nodeattrs;
name CDATA #IMPLIED
notation CDATA #IMPLIED>
<!-- Pointers for above -->
<!element systemId EMPTY>
<!attlist systemId %linkattrs;>
<!-- Must be empty if href is used -->
<!element publicId (#PCDATA) >
<!attlist publicId %linkattrs;>
<!-- Notation Declarations -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element notationDcl (systemId, publicId?)>
<!attlist notationDcl %linkattrs;
name CDATA #IMPLIED>
<!-- External entity with declarations to be included -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element extDcls EMPTY>
<!attlist extDcls
systemId CDATA #REQUIRED
publicId CDATA #IMPLIED>
<!-- Namespace Declarations -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element namespaceDcl EMPTY>
<!attlist namespaceDcl %linkattrs;
name CDATA #IMPLIED>
</code></pre>
</body>
</html>