- From: Paul Prescod <paul@prescod.net>
- To: xml-dev@ic.ac.uk
- Date: Fri, 05 Nov 1999 09:59:57 -0600
John Robert Gardner wrote:
>
> <GoalScenario>
> Heinrich at Humboldt U. in Berlin is diligently searching the world's
> various library resources with his Dublin-Core-Based engine for
> <Creator>Tillich</Creator> and we want his DC system to transparently find
> the wealth of Tillich articles we'd have online.
> </GoalScenario>
Let's say that in five years we come up with a new peer to DC, TEI or
GILS. Let's say we call it "newmeta". We may need to make this new system
compatible with our existing data. We probably do not have the luxury of
re-encoding all of the data. So our problem is to make the existing data
"look as if" it is compatible with newmeta. I think that we should start
with that as a design goal. We need to be able to make the data look as
if it supports multiple schemes *without requiring the markup to support
all of those schemes*.
The next question is how varied the queries will be. Is straight
name-value indexing sufficient? Or do you need to allow XPath queries on
the "DC version", "TEI version", "GILS version" and so forth? Is XPath a
good query language for metadata to start with? Maybe the right place to
put your abstraction is in the metadata data model. It's my duty to
point out that you should evaluate RDF in that capacity though I don't
claim to be an RDF expert myself.
XML's data model is elements within elements. RDF's data model is
structured properties with values. RDF seems like a better fit for
metadata, but I don't know of an RDF query language -- I don't know what
people use for querying RDF.
> > * It increases the possibility for error: authors or data generators
> > could "forget" to insert a TEI:docAuthor alongside a DC:Creator.
>
> Yes and no, since most of our input is one-time only, rather than
> repeatedly updated or nuanced. The finished article and record entry is
> a one-time keying (or XSLT transformed) deal.
Even if it is a one-time keying, if you quadruple the number of tags,
you'll increase the number of typos.
> 1. as it is an SGML subset standard, our "standards" caveat is ostensibly
> met, but-- per the <snip> above-- it seems not necessarily "happy" with
> XML -- or I'm misreading you? At any rate as ISO 10744:1997, Annex A.3
> it seems very viable ... except-- cf. caveats 2 and 3 below:
Yes, architectural forms work fine with XML now. But to some degree you
must choose your solution based on the software available. There are
various toolkits that are archform aware but I don't know of any
repositories or search engines that index based on archforms.
> 2. Will Arch forms work with our Oracle investment to date (we're
> considering SIM and related technology for our next Phase)?
Archforms are a way of representing your aliasing. Oracle has a
variety of ways to support aliasing but it won't read archforms
directly. You would have to write code to implement the aliasing.
> Will this mechanism achieve the transparency for our users in diaspora
> referred to above--and hopefully I made better sense this time--in the
> <GoalScenario> section? Because we discussed using attributes for the
> different synonymous--or largely so-- tags like
> docAuthor/Creator/Originator, etc., but we were under the impression that,
> per below as well, attributes wouldn't achieve our implementation of
> transparent accessibility.
An architectural form engine implicitly transforms the data.
So you type:
<emory:author>Paul</emory:author>
And then a batch architectural form processor could convert it into:
<DC:Creator>Paul</DC:Creator>
A mythical architectural form-smart search engine would "pretend" to do
that conversion, in real-time, when the query was done and cache the
results intelligently. But such a thing doesn't exist as a general tool.
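To make the batch case concrete, here is a rough sketch in Python of what such a converter might do. It is not a real architectural form processor: it only renames elements from a lookup table, and the namespace URIs and the ELEMENT_MAP table are placeholders I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, invented for this sketch.
EMORY_NS = "http://example.org/emory"
DC_NS = "http://example.org/dc"

# Hypothetical mapping from local element names to their DC equivalents.
ELEMENT_MAP = {
    f"{{{EMORY_NS}}}author": f"{{{DC_NS}}}Creator",
}


def convert(xml_text: str) -> str:
    """Return the document with every mapped element renamed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Elements without an entry in the table keep their original name.
        elem.tag = ELEMENT_MAP.get(elem.tag, elem.tag)
    # Ask the serializer to use the "DC" prefix for the target namespace.
    ET.register_namespace("DC", DC_NS)
    return ET.tostring(root, encoding="unicode")


source = f'<emory:author xmlns:emory="{EMORY_NS}">Paul</emory:author>'
converted = convert(source)
print(converted)
```

A real processor would read the mapping from the architectural form declarations instead of a hard-coded table, and would also deal with attributes and content models.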
> The obvious bottom line is the transparency, in the end, the software
> required, legacy investment or none-- is secondary to this requirement,
> given that we are implementing an open/int'l standard.
I think that one thing we still have to hash out is the natural data
model for querying this information.
If it is flat text name-value pairs, Oracle will be great and the
problem will be trivial to solve. Simple name aliasing is all you need.
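As a minimal sketch of what I mean by simple name aliasing, assuming the records really are flat name-value pairs: the alias table, the canonical name "author" and the sample records below are all invented for illustration, and in practice the table would live in the database rather than in Python.

```python
# Hypothetical alias table: every scheme-specific name maps to one canonical name.
ALIASES = {
    "DC:Creator": "author",
    "TEI:docAuthor": "author",
    "emory:author": "author",
}


def canonical(name):
    """Translate a scheme-specific field name to its canonical name."""
    return ALIASES.get(name, name)


def search(records, field, value):
    """Return the records whose field, under any alias, equals value."""
    wanted = canonical(field)
    return [
        record
        for record in records
        if any(canonical(k) == wanted and v == value for k, v in record.items())
    ]


# Made-up sample data, each record keyed in a different scheme.
records = [
    {"emory:author": "Tillich", "title": "Article A"},
    {"TEI:docAuthor": "Tillich", "title": "Article B"},
    {"DC:Creator": "Barth", "title": "Article C"},
]

hits = search(records, "DC:Creator", "Tillich")
print([record["title"] for record in hits])
```

The point is that a query phrased in DC terms finds records keyed under other names, without any record carrying all of the tags.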
If it is structured property name-value pairs, then you need something
object-oriented-ish like an RDF engine or an object database. You need name
aliasing but it needs to be hierarchical.
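Roughly, hierarchical aliasing means mapping whole property paths rather than single names. The sketch below assumes records are nested dictionaries; the paths in PATH_ALIASES, the stored property names and the sample record are hypothetical and not taken from the DC or TEI specifications.

```python
# Hypothetical path aliases: a property path in some scheme maps to
# the path under which the value is actually stored.
PATH_ALIASES = {
    ("DC:Creator", "name"): ("author", "name"),
    ("TEI:docAuthor", "name"): ("author", "name"),
    ("DC:Creator", "affiliation"): ("author", "institution"),
}


def resolve(path):
    """Translate a scheme-specific path to the stored path."""
    return PATH_ALIASES.get(tuple(path), tuple(path))


def lookup(record, path):
    """Follow the resolved path through nested dicts; None if it is absent."""
    node = record
    for key in resolve(path):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Made-up sample record with a structured "author" property.
record = {
    "author": {"name": "Tillich", "institution": "Example University"},
    "title": "Article A",
}

print(lookup(record, ["DC:Creator", "name"]))
print(lookup(record, ["DC:Creator", "affiliation"]))
```

Note that the last alias renames a nested property as well as its parent, which a flat name table could not express.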
If the proper model is elements within elements then you need an
XML-specific solution like architectural forms. That's going to be
harder to find.
> Put our initial stash of MARC records, via my XSLT work and such, into
> --say-- Dublin Core for the Oracle database to store, and then construct
> an XML document which is a tagbag of empty elements with attributes which
> equivocate that the Dublin Core Creator is the same as TEI docAuthor, etc.
> This sounds like Arch. Forms, in a way, but--as noted--I'm not convinced I
> fully grok arch forms anyway.
It sounds like you are proposing an alternate representation for the
same thing that architectural forms allow. I think that the
representation is a lesser problem than finding the software that will
allow your queries to scale and be extensible.
> This is also not unlike what we're considering, but it sounds from your
> post like it enables folks to come to your search engine and do these
> multiple search types via the common mapping you've set up, wherein, per
> the <GoalScenario> above, we want folks to be able to do this with their
> existing native search system.
What exactly do you mean by that? The data exists in one place, right?
Or are you distributing the XML files around the world? If it exists in
one place, then what software looks at the information? Surely
researchers around the world cannot all be banging on the same files at
the same time. That's why we typically use server software.
Paul Prescod