RE: XML word processors and the SW (was Re: First Order Logic .. .)
- From: "Bullard, Claude L (Len)" <clbullar@ingr.com>
- To: Joel Rees <rees@mediafusion.co.jp>, Jeff Lowery <jlowery@scenicsoft.com>
- Date: Thu, 17 May 2001 09:12:18 -0500
The technique where one creates instances until
one gets an exhaustive set isn't unknown. Once
known as "tag sprinkling" it is a typical bottom
up approach seen a lot when we used to be given
a stack of mil docs or a mil content standard
and told to "make a DTD for this by next week".
It isn't that much different from having
a system do the job. Lots of times when I
start out to develop a language without a
real imprimatur for it, I write instances
until I get a feel for the domains. Then
I start in on the DTD/Schema. This is
probably fairly common. Maler and El Andaloussi
wrote a book on structured techniques for
DTD design, and most people who come from
any kind of structured programming background
understand these approaches.
So the automation may be future-tense, but the
approach is the oldest one out there. Some
suggest it is a bad approach and that structured
design techniques should be used in every case.
I think it depends on the situation. If I have
a lot of folks (say, a committee) sitting around a
table, structured techniques drive a discussion
nicely. But I've seldom seen it done. Most of the
time, they toss a load of docs on the table and
say, "tag these". Only later, when they try to
get a trading partner to adhere to their tags
do they hear that well-known refrain which will
haunt the Semantic Web:
"Who sez?"
"The W3C sez!!"
"Screw 'em. This is our business deal."
And so it will go. How many really large relational schemas
out there share standard namespaces? Some. But mostly, they
share a weak data standard (e.g., NIBRS) for which endless
local customization is done, semantics and all.
As I said: the challenge of vertical vocabularies is to
get the two front-runners in a business ecology to give
up their subject domain expertise to slower competitors.
In most business markets, you are number one or number
two or a loser. A little Objectivism; localize schemas.
Len
http://www.mp3.com/LenBullard
Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h
-----Original Message-----
From: Joel Rees [mailto:rees@server.mediafusion.co.jp]
[clipped]
> When it was suggested early in the XML rhubarb
> that DTDs would go away, (well-formed only),
> I laughed. It removes the biggest advantage
> of SGML: standard vocabularies for focused
> domains, the easy means to annotate a text with inline
> metainformation for interpretation. Now people
> are defending DTDs against the next new thing
> and so it goes, but the principle remains: once
> you get beyond a simple message, well-formedness
> isn't enough. You need the metadata to get around
> the outrageous and inefficient noise reduction
> techniques of open text searching.
My company is betting that there will be a large range of applications for
which one would rather not have the DTD in the way.
I tend to figure that DTD-less is an intermediate step, something to use
while trying to get a grasp of what a document class should include and what
it should not.
When I think of writing XML documents with a word processor, I imagine
formatting some piece, then selecting a range and assigning a semantic tag
of my choosing to it. The word processor should split the semantic XML from
the formatting XML, then save the format as an XSL document and the semantic
as straight XML.
An automatic DTD generator should eat a batch of similar files (possibly
built from a single original as a template) and spit the DTD out as whatever
is needed to describe everything in the batch.
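
As a rough illustration of the generator described above, here is a minimal Python sketch. It is an assumption about how such a tool might work, not a description of any existing product: the function name infer_dtd is hypothetical, it reads XML strings rather than files, it ignores namespaces, and it emits deliberately loose content models (any observed child, in any order, any number of times).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(documents):
    """Infer a permissive DTD from an iterable of XML document strings.

    Hypothetical helper: it records which children, attributes, and text
    each element was seen with, and declares exactly that and no more.
    """
    children = defaultdict(set)   # element name -> child element names seen
    attrs = defaultdict(set)      # element name -> attribute names seen
    has_text = defaultdict(bool)  # element name -> saw non-whitespace text
    order = []                    # element names in first-seen order

    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            if elem.tag not in order:
                order.append(elem.tag)
            attrs[elem.tag].update(elem.attrib)
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                children[elem.tag].add(child.tag)
                # Text following a child is part of the parent's content.
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True

    lines = []
    for name in order:
        kids = sorted(children[name])
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"  # order and counts not inferred
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attrs[name]):
            # Every attribute is treated as optional character data.
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)
```

Feeding it a batch of instances yields one declaration per element name seen anywhere in the batch. The loose models are a design choice for the sketch: inferring sequence order and required-versus-optional children needs more samples and more judgment, which is where a human DTD designer would presumably still step in.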
When editing the doc, a palette would appear showing the current set of tags
not made directly manipulable by the current XSL.
So, following this line of reasoning, SW would simply take a DTD and allow
selecting a node/nodeset and attaching some attributes or child elements
that specify some common/standard qualitative semantic?
How far in the future am I imagining? I know Microsoft is doing the
smoke-and-mirrors about Word saving as XML. Any bets (or inside info) about
whether they have even considered semantics issues?