- From: John Robert Gardner <firstname.lastname@example.org>
- To: Paul Prescod <email@example.com>
- Date: Thu, 28 Oct 1999 17:30:51 -0400 (EDT)
Many thanks for resuscitating this topic, and for the reassurance that the
relative paucity of responses was not indicative of its overall
relevance to this list.
I'm glad to have the chance to flesh this out further, so the reply below
will address the matter along four lines:
1. to clarify, per your request, "why" multiple namespaces
2. to address/question the applicability of architectural forms
3. to present the possible solution we're considering, given
our initial investment in Oracle and the corresponding
requirement to work with open standards
4. to offer a response to W. Underwood's reply
On Wed, 27 Oct 1999, Paul Prescod wrote:
> Let me first suggest that the solution to your problem is probably not
> to put various element type names in one tag. I could be wrong on this
> point so I'll trust you to set me straight if that's the case.
> > <DC:Creator GILS:Originator TEI:docAuthor>Tillich</DC:Creator GILS:Originator TEI:docAuthor>
> Now you've said explicitly that your goal is to avoid duplicating the
> data in your documents in multiple documents. But is duplicating the
> semantic "author" better?
Here it is likely that I haven't made the rationale clear. I'll try a
concrete illustration, since my general explanation also ran aground on
TEI-L. At the same time, this is the best case I can make in answer to
Paul's point:
> If you can convince me that you really need
> multiple element type names *in each and every tag* then you will be the
> first to do so.
<GoalScenario>
Heinrich at Humboldt U. in Berlin is diligently searching the world's
various library resources with his Dublin Core-based engine for
<Creator>Tillich</Creator>, and we want his DC system to transparently find
the wealth of Tillich articles we'd have online. Simultaneously, Alicia
is using a TEI-based search over in Ireland and wants to find
<docAuthor>Tillich</docAuthor> with the same transparency, and Lu in
Hong Kong has GILS, so <Originator>Tillich</Originator> is the formula for
the query. With our multiple namespaces/ArchForms/tagbag (see below) or
whatever, everybody would be able to use their native search engine, which,
in turn, would find the gems of wisdom sought without any additional
tweaking on the user's (or library administrator's) end.
</GoalScenario>
"Multiple namespaces" was a suggestion that came up in offline discussions,
which, of course, "can't" be done, so I'm hoping, at least, that the
reason for it makes better sense now.
> I'm guessing that DC:Creator is *always* going
> to be a synonym for TEI:docAuthor which means that saying so explicitly
Yes, it is, so architectural forms do make sense; but see below.
> in the document is redundant. It causes all of the usual problems of
> database redundancy:
> * It increases the size of your database: it will quadruple (at least)
> your indexes.
Granted, but this would be an acceptable cost if the goal noted just
above is met.
> * It increases the possibility for error: authors or data generators
> could "forget" to insert a TEI:docAuthor alongside a DC:Creator.
Yes and no, since most of our input is one-time only, rather than
repeatedly updated or revised. The finished article and its record entry
are keyed once (or produced once by an XSLT transformation).
> * It reduces optimization opportunities because the database won't
> cache "synonyms" properly.
Exactly our worry. Our initial demo implementation builds on our Oracle
wizard's existing work in Oracle 8.1.5 (soon to be 8.1.6, we hope). The
idea of architectural forms was suggested to us in the same discussion in
which multiple namespaces were raised.
Here I am willing to learn that my ignorance of architectural forms has
led me to sell that solution short, especially in light of Paul's closing
comment:
> There is hope, however. "Out of line" architectural forms are about to
> be reinvented as "archetypes." Once they are reinvented in a syntax that
> is OO-friendly and W3C approved, it will become obvious that people will
> need to do XPath-like queries based not only on element types, but also
> on archetypes. Finally, search engine vendors are likely to "get it."
My questions and concerns about architectural forms were as follows:
1. Since architectural forms are part of an ISO standard in the SGML
family, our "standards" requirement is ostensibly met; but, per the
passage quoted above, they seem not entirely "happy" with XML (or am I
misreading you?). At any rate, as defined in ISO/IEC 10744:1997, Annex
A.3, they seem very viable, except for caveats 2 and 3 below:
2. Will architectural forms work with our Oracle investment to date? (We're
considering SIM and related technology for our next phase.)
> Old fashioned SGML smelly-ness aside, architectural forms were designed
> to solve exactly this problem. Proponents claim that one of their great
> virtues is that they allow you to do the mapping in EITHER the document
> (duplicating data) OR the DTD (centralizing it). I'm not really happy
> with the fact that it allows the "inline" mode, but the "centralized"
> mode is just what you need.
3. Will this mechanism achieve the transparency for our users in diaspora
described in the <GoalScenario> section above (I hope I made better sense
this time)? I ask because we discussed using attributes for the different
synonymous (or largely synonymous) tags like docAuthor/Creator/Originator,
etc., but we were under the impression that, as also noted below,
attributes wouldn't achieve our implementation of
The obvious bottom line is transparency. In the end, the software
required, legacy investment or not, is secondary to this requirement,
given that we are implementing an open, international standard. If we
must write some "in-between" script or program to make it transparent
with architectural forms, then we're ready to do so; we are therefore
seeking suggestions along these lines, and want to be sure that we are
working with the right database software (proprietary solutions that
Oracle, or anyone else, may provide are ruled out by the terms of our
grant).
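Such an "in-between" program would, in effect, be a small architectural processor. The Python sketch below is a toy, not an implementation of the ISO/IEC 10744 processing model: it assumes each architecture is named by an attribute (the hypothetical "DC", "TEI" and "GILS" attributes) whose value is the element type in that architecture, and it uses a hypothetical CENTRAL_DEFAULTS table to stand in for declaring those attributes once, in the DTD, rather than in every document.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for an architectural processor (illustrative assumptions only):
# each architecture is named by an attribute ("DC", "TEI", "GILS") whose value
# is the element type in that architecture. CENTRAL_DEFAULTS mimics declaring
# those attributes once, as DTD defaults, instead of in every document.
CENTRAL_DEFAULTS = {
    "author": {"DC": "Creator", "TEI": "docAuthor", "GILS": "Originator"},
}

def architectural_view(xml_text, architecture):
    """Yield (architectural element type, text) pairs for one architecture."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        defaults = CENTRAL_DEFAULTS.get(el.tag, {})
        # An inline attribute on the element overrides the central default.
        form = el.get(architecture, defaults.get(architecture))
        if form is not None:
            yield form, (el.text or "").strip()

doc = "<article><author>Tillich</author><body>...</body></article>"
print(list(architectural_view(doc, "DC")))    # [('Creator', 'Tillich')]
print(list(architectural_view(doc, "GILS")))  # [('Originator', 'Tillich')]
```

Keeping the mapping in one central table corresponds to the "centralized" mode Paul favours; the inline attribute override corresponds to the "inline" mode he is less happy with.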
------One suggestion we're considering---------
Convert our initial stash of MARC records (via my XSLT work and such) into,
say, Dublin Core for the Oracle database to store, and then construct
an XML document that is a "tagbag" of empty elements whose attributes
declare that the Dublin Core Creator is the same as the TEI docAuthor, etc.
This sounds like architectural forms, in a way, but, as noted, I'm not
convinced I fully grok architectural forms anyway.
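As a rough sketch of how such a tagbag might be consumed, assuming a made-up layout of one empty <equiv> element per concept with one attribute per vocabulary (the element and attribute names, and the "title" row, are illustrative assumptions, not an existing schema), Python's standard xml.etree.ElementTree can turn it into a lookup table from vocabulary-qualified names to a shared concept:

```python
import xml.etree.ElementTree as ET

# Hypothetical tagbag: one empty <equiv> element per concept; every other
# attribute names the equivalent element type in one vocabulary.
TAGBAG = """
<tagbag>
  <equiv concept="author" DC="Creator" TEI="docAuthor" GILS="Originator"/>
  <equiv concept="title" DC="Title" TEI="docTitle" GILS="Title"/>
</tagbag>
"""

def load_synonyms(xml_text):
    """Map each vocabulary-qualified name (e.g. "TEI:docAuthor") to its concept."""
    table = {}
    root = ET.fromstring(xml_text.strip())
    for equiv in root.iter("equiv"):
        concept = equiv.get("concept")
        for vocab, name in equiv.attrib.items():
            if vocab != "concept":
                table[f"{vocab}:{name}"] = concept
    return table

synonyms = load_synonyms(TAGBAG)
print(synonyms["TEI:docAuthor"])    # author
print(synonyms["GILS:Originator"])  # author
```

The table by itself says nothing about whether a given database can index through it; that remains question #2.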
How does this fit with your points above and below?
> Architectural forms are expressed as attributes but they are supposed to
> be INTERPRETED by an architectural processor (like nsgmls and jade) as
> if they were element type names (generic identifiers). The syntax is,
Even if this achieves transparency, it would still need to pass the test
of question #2 above: will it work with Oracle? That corresponds to
Paul's summation below:
> I claim then, that what you need is a database that understands either
> architectural forms or some similar technology. It would index in terms
> of synonyms and recognize that asking for one synonym is as easy as
> asking for another. As far as I know, architectural form indexing and
> caching has never been implemented in a large-scale (multi-gigabyte) XML
> database system but I could be wrong.
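A toy illustration of the kind of index Paul describes, assuming a hard-coded synonym table (in practice it might come from a tagbag or from architectural-form declarations): postings are keyed by the shared concept rather than by element type name, so each fact is stored once and a query phrased in any of the three vocabularies resolves to the same posting list. This is an in-memory Python sketch, not a claim about how Oracle or any other database works.

```python
from collections import defaultdict

# Hypothetical synonym table: each vocabulary-qualified name maps to one concept.
SYNONYMS = {
    "DC:Creator": "author",
    "TEI:docAuthor": "author",
    "GILS:Originator": "author",
}

class SynonymIndex:
    """Toy inverted index keyed by (concept, value) instead of element name."""

    def __init__(self, synonyms):
        self.synonyms = synonyms
        self.postings = defaultdict(set)  # (concept, value) -> set of doc ids

    def _concept(self, element_name):
        # Unknown names fall back to themselves, so they remain searchable.
        return self.synonyms.get(element_name, element_name)

    def add(self, doc_id, element_name, value):
        self.postings[(self._concept(element_name), value)].add(doc_id)

    def search(self, element_name, value):
        # .get() does not create empty entries in the defaultdict.
        return sorted(self.postings.get((self._concept(element_name), value), set()))

index = SynonymIndex(SYNONYMS)
index.add("article-1", "DC:Creator", "Tillich")
index.add("article-2", "DC:Creator", "Tillich")
# A TEI or GILS query hits the same single posting list as a DC query.
print(index.search("TEI:docAuthor", "Tillich"))    # ['article-1', 'article-2']
print(index.search("GILS:Originator", "Tillich"))  # ['article-1', 'article-2']
print(len(index.postings))                          # 1
```

Because synonyms are resolved before indexing, the index does not grow with the number of vocabularies, which speaks to the "quadruple your indexes" concern quoted earlier.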
We're more than willing (and largely bound as well) to trudge forth
into uncharted waters, hence my many questions and our consideration of
SIM, etc.
On Thu, 28 Oct 1999, Walter Underwood wrote:
> Our search engine handles multiple DTDs by mapping the elements
> into common search meta data elements.
> DC:Creator -> author
> GILS:Originator -> author
> TEI:docAuthor -> author
> and so on. So the documents can remain legal and "pure" with
> respect to TEI or GILS, but users can search them with a
> common model.
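Walter's mapping can be sketched at extraction time: the documents stay "pure" DC, TEI or GILS, and the indexer translates element names into a common field as it reads them. The namespace URIs below are placeholders assumed for illustration, not the vocabularies' real namespaces; Python's xml.etree.ElementTree reports namespaced tags in {uri}local form, so the mapping is written that way.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for illustration only.
NS = {
    "DC": "urn:example:dc",
    "GILS": "urn:example:gils",
    "TEI": "urn:example:tei",
}

# Walter Underwood's mapping, written in ElementTree's {uri}local form.
FIELD_MAP = {
    f"{{{NS['DC']}}}Creator": "author",
    f"{{{NS['GILS']}}}Originator": "author",
    f"{{{NS['TEI']}}}docAuthor": "author",
}

def extract_fields(xml_text):
    """Return (common_field, text) pairs; the source document is not modified."""
    root = ET.fromstring(xml_text)
    return [
        (FIELD_MAP[el.tag], (el.text or "").strip())
        for el in root.iter()
        if el.tag in FIELD_MAP
    ]

docs = [
    '<record xmlns:DC="urn:example:dc"><DC:Creator>Tillich</DC:Creator></record>',
    '<titlePage xmlns:TEI="urn:example:tei"><TEI:docAuthor>Tillich</TEI:docAuthor></titlePage>',
    '<entry xmlns:GILS="urn:example:gils"><GILS:Originator>Tillich</GILS:Originator></entry>',
]
for d in docs:
    print(extract_fields(d))  # [('author', 'Tillich')]
```

Here the mapping lives entirely on the search engine's side; a user's own native DC, TEI or GILS engine would need the same table to get the same result.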
This is not unlike what we're considering, but from your post it sounds
as though it lets people come to your search engine and run these
multiple search types via the common mapping you've set up, whereas, per
the <GoalScenario> above, we want people to be able to do this with their
own existing, native search systems.
> And I like the idea of "Creator is Tillich". Shouldn't that
> be "Ground of Creator is Tillich"?
... only, of course, if it has the _courage to be_ the Ground of
the Creator ... ;-)
Thanks to both of you!
John Robert Gardner
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:email@example.com)