
RE: XML word processors and the SW (was Re: First Order Logic .. .)



The technique where one creates instances until 
one gets an exhaustive set isn't unknown.  Once 
known as "tag sprinkling" it is a typical bottom 
up approach seen a lot when we used to be given 
a stack of mil docs or a mil content standard 
and told to "make a DTD for this by next week". 
It isn't that much different from having a 
system do the job.  Lots of times when I 
start out to develop a language without a 
real imprimatur for it, I write instances 
until I get a feel for the domains.  Then 
I start in on the DTD/Schema.   This is 
probably fairly common.  Maler and El Andaloussi 
wrote a book on structured techniques for 
DTD design, and most people who come from 
any kind of structured programming background 
understand these approaches.

So the automation may be future-tense, but the 
approach is the oldest one out there.  Some 
suggest it is a bad approach and that structured 
design techniques should be used in every case. 
I think it depends on the situation.  If I have 
a lot of folks (say committee) sitting around a 
table, structured techniques drive a discussion 
nicely.  But I've seldom seen it done.  Most of the 
time, they toss a load of docs on the table and 
say, "tag these".  Only later, when they try to 
get a trading partner to adhere to their tags, 
do they hear that well-known refrain which will 
haunt the Semantic Web:

"Who sez?" 

"The W3C sez!!"

"Screw 'em.  This is our business deal."

And so it will go.  How many really large relational schemas 
out there share standard namespaces?  Some.  But mostly, they 
share a weak data standard (e.g., NIBRS) for which endless 
local customization is done, semantics and all.

As I said: the challenge of vertical vocabularies is to 
get the two front-runners in a business ecology to give 
up their subject domain expertise to slower competitors. 
In most business markets, you are number one or number 
two or a loser.   A little Objectivism; localize schemas.

Len
http://www.mp3.com/LenBullard

Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h


-----Original Message-----
From: Joel Rees [mailto:rees@server.mediafusion.co.jp]

[clipped]

> When it was suggested early in the XML rhubarb
> that DTDs would go away, (well-formed only),
> I laughed.  It removes the biggest advantage
> of SGML:  standard vocabularies for focused
> domains, the easy means to annotate a text with inline
> metainformation for interpretation.  Now people
> are defending DTDs against the next new thing
> and so it goes, but the principle remains:  once
> you get beyond a simple message, well-formedness
> isn't enough.  You need the metadata to get around
> the outrageous and inefficient noise reduction
> techniques of open text searching.

My company is betting that there will be a large range of applications for
which one would rather not have the DTD in the way.

I tend to figure that DTD-less is an intermediate step, something to use
while trying to get a grasp of what a document class should include and what
it should not.

When I think of writing XML documents with a word processor, I imagine
formatting some piece, then selecting a range and assigning a semantic tag
of my choosing to it. The word processor should split the semantic XML from
the formatting XML, then save the format as an XSL document and the semantic
as straight XML.

An automatic DTD generator should eat a batch of similar files (possibly
built from a single original as a template) and spit the DTD out as whatever
is needed to describe everything in the batch.
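
A minimal sketch of what such a generator might look like, assuming the
batch is a list of XML strings without namespaces. The function name
infer_dtd is made up for illustration, and the content models it emits are
deliberately loose (any seen child, any order, any number of times), so this
is a starting point for a human to tighten, not a finished design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(documents):
    """Return crude DTD declarations covering every element seen in the batch.

    Hypothetical helper: assumes each item is a well-formed XML string
    and that no namespaces are in use.
    """
    children = defaultdict(set)    # element name -> child element names seen
    attributes = defaultdict(set)  # element name -> attribute names seen
    has_text = defaultdict(bool)   # element name -> ever held non-blank text

    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            children[elem.tag]  # touch the key so leaf elements are registered
            attributes[elem.tag].update(elem.attrib)
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                children[elem.tag].add(child.tag)
                # Text that follows a child belongs to the parent's content.
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True

    lines = []
    for name in sorted(children):
        kids = sorted(children[name])
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + " | ".join(kids) + ")*"            # element content
        elif has_text[name]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        # Every attribute is declared optional, since some instance may omit it.
        for attr in sorted(attributes[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


batch = [
    '<memo to="a"><subject>Hi</subject><body>Text <em>now</em> ok</body></memo>',
    '<memo><body>Only</body><note/></memo>',
]
print(infer_dtd(batch))
```

The result accepts everything in the batch and a good deal more besides;
order and cardinality still have to come from someone who knows the domain.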

When editing the doc, a palette would appear showing the current set of tags
not made directly manipulable by the current XSL.

So, following this line of reasoning, SW would simply take a DTD and allow
selecting a node/nodeset and attaching some attributes or child elements
that specify some common/standard qualitative semantic?

How far in the future am I imagining? I know Microsoft is doing the
smoke-and-mirrors about Word saving as XML. Any bets (or inside info) about
whether they have even considered semantics issues?