Re: [xml-dev] Approaches to Expanding the Semantics of a Community's Self-Interested XML Vocabulary

From: Guillaume Lebleu <gl@brixlogic.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Wed, 21 Nov 2007 14:02:57 -0800

Roger,

I'll put the idealistic option 3 aside right away ;-)

Regarding options 1 and 2, I think it is valuable to look at both of 
them as translations. The only difference is that option 1 does not keep 
the original content ("destructive translation"), while option 2 keeps 
the original content via mixed vocabularies ("non-destructive translation").

What I like about option 2 is that it is practical in the sense that it 
solves the communication issue between vocabulary experts and 
programmers, who are usually different people. I think this 
accessibility to content/vocabulary experts is what will make Option 2 
eventually successful.

Let me explain my humble theory:

Human content editors most likely know about editing XML and have 
knowledge about one or more different vocabularies, but may not know 
XSLT or translation programming. But, using mixed vocabularies, content 
editors can easily publish content in both vocabularies without having 
to edit two documents and provide two URLs, and they don't need to 
explain the 'mapping' to developers they may not even know (assuming we 
are on the wild Web, not in a specific organization with all the skills 
required, clear roles and processes to develop translations).

From then on, by looking at the examples published on the Web by 
content editors, programmers can take over, extracting mapping patterns 
and automating the tedious work of generating markup for the same content 
in different vocabularies. Probably by then, most content consumers may 
have observed that the mixed vocabulary approach is more common than the 
single vocabulary approach and will have accepted the idea of ignoring 
whatever tags from a vocabulary they don't understand, instead of 
rejecting them as schema invalid. For those consuming clients that 
really can't handle mixed vocabularies, server-side filtering requested 
via a URL parameter is easy for both manually edited and 
programmatically generated mixed vocabulary content.

So, the focus on content experts is what I believe will make Option 2 
eventually successful. It is my understanding that the microformats.org 
community has followed this model of focusing on human content editors, 
and enjoyed spontaneous adoption and success this way.

Over time, some tags in different vocabularies never get used and some get 
used a lot, and assuming the content is published on the Web, we can 
track usage statistics (http://code.google.com/webstats/). Eventually, 
based on these statistics every vocabulary editor incorporates the most 
widely used tags from other vocabularies and we reach organically and 
chaotically the goal of a single vocabulary, at least for the most 
common concepts.
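
To make the usage-statistics idea a bit more concrete, here is a rough 
sketch in Python (standard library only) that counts how often each 
namespace-qualified tag appears across a set of published documents. 
The function name tag_usage and the two sample documents are made up 
for illustration; a real survey would crawl actual content.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_usage(documents):
    """Count occurrences of each namespace-qualified tag across documents."""
    counts = Counter()
    for xml_text in documents:
        # iter() walks the root element and all of its descendants.
        for elem in ET.fromstring(xml_text).iter():
            counts[elem.tag] += 1
    return counts


# Hypothetical sample documents, standing in for content found on the Web.
SAMPLES = [
    '<v:vcard xmlns:v="http://www.ietf.org/vcard">'
    '<v:fn>John Smith</v:fn><v:tel>617-123-4567</v:tel></v:vcard>',
    '<v:vcard xmlns:v="http://www.ietf.org/vcard">'
    '<v:fn>Jane Doe</v:fn></v:vcard>',
]

# Most frequently used tags first.
for tag, count in tag_usage(SAMPLES).most_common():
    print(count, tag)
```

A vocabulary editor could look at the top of such a list to decide 
which tags from other vocabularies are worth adopting.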

That said, from an implementation standpoint, for option 2, I think I'd 
prefer a pure XML or pure HTML implementation instead of the one you 
provided:

Pure XML-based mixed vocabulary approach, which allows multiple namespaces:

<?xml version="1.0" encoding="UTF-8"?>
<v:vcard xmlns="http://www.firstcommunity.org" xmlns:v="http://www.ietf.org/vcard"><Point-of-Contact>
    <v:fn><Name>John Smith</Name></v:fn>
    <v:adr><Address>
        <v:street-address><Street>10 Tremont St.</Street></v:street-address>
        <v:locality><City>Boston</City></v:locality>
        <v:region><State>MA</State></v:region>
    </Address></v:adr>
    <v:tel><Telephone>617-123-4567</Telephone></v:tel>
</Point-of-Contact></v:vcard>
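
For clients that can't handle the mixed form, the server-side filtering 
I mentioned could look roughly like the Python sketch below (standard 
library ElementTree). It keeps only the elements in one namespace and 
unwraps everything else, promoting text and children to the nearest 
kept ancestor. The names filter_vocabulary and _project are my own 
invention for illustration, and how the namespace would be chosen from 
a URL parameter is left out.

```python
import xml.etree.ElementTree as ET

FIRST = "http://www.firstcommunity.org"
VCARD = "http://www.ietf.org/vcard"

MIXED = """<v:vcard xmlns="http://www.firstcommunity.org" xmlns:v="http://www.ietf.org/vcard"><Point-of-Contact>
    <v:fn><Name>John Smith</Name></v:fn>
    <v:adr><Address>
        <v:street-address><Street>10 Tremont St.</Street></v:street-address>
        <v:locality><City>Boston</City></v:locality>
        <v:region><State>MA</State></v:region>
    </Address></v:adr>
    <v:tel><Telephone>617-123-4567</Telephone></v:tel>
</Point-of-Contact></v:vcard>"""


def _project(elem, keep_ns):
    """Return elem as a list of strings and Elements restricted to keep_ns."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.extend(_project(child, keep_ns))
        if child.tail:
            parts.append(child.tail)
    if not elem.tag.startswith("{" + keep_ns + "}"):
        # Foreign element: drop the tag, pass its content up to the parent.
        return parts
    # Kept element: rebuild it from the projected content.
    new = ET.Element(elem.tag, dict(elem.attrib))
    last = None
    for part in parts:
        if isinstance(part, str):
            if last is None:
                new.text = (new.text or "") + part
            else:
                last.tail = (last.tail or "") + part
        else:
            new.append(part)
            last = part
    return [new]


def filter_vocabulary(xml_text, keep_ns):
    """Parse mixed-vocabulary XML and return a tree containing only keep_ns."""
    parts = _project(ET.fromstring(xml_text), keep_ns)
    roots = [p for p in parts if not isinstance(p, str)]
    if len(roots) != 1:
        raise ValueError("expected exactly one top-level element in " + keep_ns)
    return roots[0]


# Same content, served in whichever single vocabulary the client asked for.
print(ET.tostring(filter_vocabulary(MIXED, VCARD), encoding="unicode"))
print(ET.tostring(filter_vocabulary(MIXED, FIRST), encoding="unicode"))
```

Because the two vocabularies nest inside each other element by element, 
the same function yields either a vcard-only or a Point-of-Contact-only 
document from the one mixed source.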

Pure plain old semantic HTML-based mixed vocabulary approach, which 
assumes a single namespace:

<?xml version="1.0" encoding="UTF-8"?>
<div class="Point-of-Contact vcard">
   <span class="Name fn">John Smith</span>
   <span class="Address adr">
      <span class="Street street-address">10 Tremont St.</span>
      <span class="City locality">Boston</span>
      <span class="State region">MA</span>
   </span>
   <span class="Telephone tel">617-123-4567</span>
</div>
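
With the class-based form, a consumer that only knows one vocabulary 
can simply ignore the class names it doesn't recognize. A minimal 
Python sketch of that, assuming the fragment is well-formed XML so the 
standard library parser can read it (real-world HTML would need an 
HTML parser); read_properties and the two class sets are hypothetical 
names chosen for this example:

```python
import xml.etree.ElementTree as ET

MARKUP = """<div class="Point-of-Contact vcard">
   <span class="Name fn">John Smith</span>
   <span class="Address adr">
      <span class="Street street-address">10 Tremont St.</span>
      <span class="City locality">Boston</span>
      <span class="State region">MA</span>
   </span>
   <span class="Telephone tel">617-123-4567</span>
</div>"""

# Leaf-level class names each consumer understands (assumed for this sketch).
VCARD_CLASSES = {"fn", "street-address", "locality", "region", "tel"}
FIRST_CLASSES = {"Name", "Street", "City", "State", "Telephone"}


def read_properties(markup, known_classes):
    """Map each recognized class name to the text of the element carrying it."""
    props = {}
    for elem in ET.fromstring(markup).iter():
        # The class attribute is a space-separated list of tokens.
        for token in elem.get("class", "").split():
            if token in known_classes:
                props[token] = "".join(elem.itertext()).strip()
            # Unknown tokens are silently skipped, not treated as errors.
    return props


print(read_properties(MARKUP, VCARD_CLASSES))
print(read_properties(MARKUP, FIRST_CLASSES))
```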


Guillaume

References:
- Approaches to Expanding the Semantics of a Community's Self-Interested XML Vocabulary
  - From: "Costello, Roger L." <costello@mitre.org>
