xml-dev - RE: xml search engine?

RE: xml search engine?
From: "Didier PH Martin" <martind@netfolder.com>
To: "Martin Bryan" <mtbryan@sgml.u-net.com>, <xml-dev@xml.org>
Date: Tue, 4 Apr 2000 08:35:39 -0400
Hi Martin,

Martin said:
It strikes me that this debate is missing something.

Didier replies:
Or that you are missing something in this debate.

Martin said:
The advantage of XML queries over other forms of query is that you can use
context to identify the subset of information within a document that you
need to search to find a meaningful result. Instead of having to look at all
indexed occurrences of the term you only need to look at that subset that
are "associated" with a given context. This should, hopefully, reduce the
level of information overload we all suffer from at present.

Didier replies:
Yes indeed, you are right. But is the universe a homogeneous world where
everything is XML? Could we benefit from accessing this universe through a
single "facade"? Or can we at least try to see whether this is possible?

Martin said:
The key to efficiency is going to be the mapping between the semantics of
the context determining elements (remembering that we are talking about a
chain of ancestors for most elements) and the terms used in a natural
language (or near natural language) query. Unless there is a close match
between query semantics and markup semantics the results of the query will
be meaningless.

Didier replies:
Not necessarily, though they may be. It all depends on the level of
sophistication the information set engine possesses.

Martin said:
The first question that needs to be asked is "how do users identify the
contexts in which data is likely to be meaningful?"

Didier replies:
I totally agree. The context we are talking about with the XML information
set API is that we are trying to build a single facade to a heterogeneous
universe. The client manipulates a limited set of objects that wrap the
heterogeneous data sources and the heterogeneous query languages.

Martin said:
Take the example used in another thread: "find a SPEECH whose SPEAKER
contains 'hamlet'". What happens if I coded my text as <Hamlet>To be or not
to be</Hamlet>? How do I know that the tag name identifies the speaker of a
speech? Yet it obviously does - that's the whole intention of the tag. OK, so
it's a non-generalized DTD, bad practice. But what about <Part
role="Hamlet">To be or not to be</Part>? Again, how do I relate the tag to
the query?

Didier replies:
The state of the art in information retrieval is still struggling to resolve
context ambiguities (even humans have trouble with this ;-). There is a
good article explaining the notion of information integration in IEEE
Intelligent Systems (September/October 1998). The document is available
on-line in PDF format and can be retrieved at:
http://www.computer.org/intelligent/ex1998/pdf/x5012.pdf
Even though the article is already two years old, it explains quite well the
problems of information integration, which is the problem the XML API tries
to address by stretching the model or objects XML developers are already
using to manipulate information sets (transient or permanent). The aim is to
expand these interfaces so that more versatile or more sophisticated queries
can be performed on information set engines: engines that do more than
maintain a couple of linked lists ;-)

Martin said:
Structured queries can only be generated accurately from knowledge of the
DTD governing the content they are intended to query. Len Bullard hit the nail
on the head. The first port of call is the namespace. The second is the
DTD/schema for that namespace, and the third is the contents of elements
coded using a specific element within the DTD. Queries need to be based on
the elements defined for a particular namespace.

Didier replies:
This can be a very valid query space.

Martin said:
So let's try to write queries based on this, something like:

Find me occurrences of the phrase "ABC DEF" within elements whose parents
contain "ELEMENT-X" or "Attribute-Y" within "Namespace-Z"

Indexing for such a query will need to be based on a combination of
Namespace, Context and Contents. Omitting any one of these components will
make it impossible to efficiently search servers. To suggest that we might
be able to map between the contexts in different namespaces is, I feel,
going to be beyond what current systems will be able to provide. But we need
to consider how it might be possible to do this.
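
To make the combined index concrete, here is a minimal sketch in JavaScript.
It is only an illustration under stated assumptions: the names ContextIndex,
add and find are hypothetical, the context is represented as a path of
ancestor element names, and phrases are matched as whole case-folded strings
rather than tokenized as a real engine would do.

```javascript
// Hypothetical sketch: an index keyed on namespace + context + content.
class ContextIndex {
  constructor() {
    this.entries = new Map(); // lookup key -> array of document ids
  }

  // Combine the three components into one lookup key.
  static key(namespace, context, phrase) {
    return [namespace, context.join("/"), phrase.toLowerCase()].join("\u0000");
  }

  add(namespace, context, phrase, docId) {
    const k = ContextIndex.key(namespace, context, phrase);
    if (!this.entries.has(k)) this.entries.set(k, []);
    this.entries.get(k).push(docId);
  }

  // Returns the ids indexed under exactly this namespace, context and phrase.
  find(namespace, context, phrase) {
    return this.entries.get(ContextIndex.key(namespace, context, phrase)) || [];
  }
}

const index = new ContextIndex();
index.add("Namespace-Z", ["ELEMENT-X", "Part"], "ABC DEF", "doc1");
index.add("Namespace-Q", ["ELEMENT-X", "Part"], "ABC DEF", "doc2");
console.log(index.find("Namespace-Z", ["ELEMENT-X", "Part"], "abc def"));
```

Dropping any one of the three components from the key would force the lookup
to scan every entry that shares the remaining two, which is the inefficiency
described above.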

Didier replies:
So, in the proposed framework, this would be one of the possible query
spaces an information set engine may offer. I should say, however, that a big
chunk of the desired query (i.e. the query you are mentioning) is already
addressed by XPath.
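
As a rough illustration of how much of that query XPath 1.0 already covers,
the sketch below builds an expression for one reading of it: child elements of
"ELEMENT-X" in "Namespace-Z" whose text contains the phrase. The helper names
xpathLiteral and buildContextQuery are hypothetical, the "Attribute-Y"
alternative is left out, and the reading of "parents contain" is an
assumption.

```javascript
// XPath 1.0 string literals have no escape character, so pick the
// quote style that does not clash with the text.
function xpathLiteral(text) {
  if (!text.includes("'")) return "'" + text + "'";
  if (!text.includes('"')) return '"' + text + '"';
  throw new Error("text containing both quote styles needs concat()");
}

// Hypothetical helper: elements whose parent is parentName in namespaceUri
// and whose string value contains the phrase.
function buildContextQuery(namespaceUri, parentName, phrase) {
  return "//*[local-name()=" + xpathLiteral(parentName) +
    " and namespace-uri()=" + xpathLiteral(namespaceUri) + "]" +
    "/*[contains(., " + xpathLiteral(phrase) + ")]";
}

console.log(buildContextQuery("Namespace-Z", "ELEMENT-X", "ABC DEF"));
```

The resulting string uses only the standard XPath 1.0 functions local-name(),
namespace-uri() and contains(), so any conforming engine should be able to
evaluate it.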

Martin said:
The real key is how we break the query down into something that can be
exchanged between servers. Something along the lines of:

<query>
<namespace>www.mysource.com</namespace>
<element>Part</element>
<attribute name="role">Hamlet</attribute>
<content>ABC DEF</content>
</query>

might let different servers select the data using different tools and
engines.

Didier replies:
Now you are addressing another issue. Sorry, I am still struggling so much to
define properly the scope of a single function that this issue is not yet on
my workbench. The problem is how to provide an XML fragment as a query to the
object.selectNodes(queryType, expression) function. A possible solution is to
keep the function as is and package the XML fragment that expresses the query
in a string. So, following your example, I would have:

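// Package Martin's XML fragment as a string; the attribute value uses
// single quotes so it does not terminate the JavaScript string literal.
var query = "<query>" +
"<namespace>www.mysource.com</namespace>" +
"<element>Part</element>" +
"<attribute name='role'>Hamlet</attribute>" +
"<content>ABC DEF</content>" +
"</query>";

// "MartinBryanQuery" names the query type; the string carries the query.
var node_set = object.selectNodes("MartinBryanQuery", query);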

So it seems to work. The function can support your query world.
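
One way such a function could support several query worlds is to dispatch on
the query type. The following is only a sketch under assumptions: the class
InfoSetFacade, the method registerQueryType, the helper field and the plain
objects standing in for nodes are all hypothetical, and the fragment is read
with regular expressions rather than a real XML parser.

```javascript
// Hypothetical facade: selectNodes looks up a handler by query type.
class InfoSetFacade {
  constructor(nodes) {
    this.nodes = nodes;        // plain objects standing in for infoset nodes
    this.handlers = new Map(); // queryType -> function(nodes, expression)
  }
  registerQueryType(queryType, handler) {
    this.handlers.set(queryType, handler);
  }
  selectNodes(queryType, expression) {
    const handler = this.handlers.get(queryType);
    if (!handler) throw new Error("Unsupported query type: " + queryType);
    return handler(this.nodes, expression);
  }
}

// Toy extraction of <name>text</name> from the flat query fragment.
function field(xml, name) {
  const m = new RegExp("<" + name + "[^>]*>([^<]*)</" + name + ">").exec(xml);
  return m ? m[1] : null;
}

const facade = new InfoSetFacade([
  { namespace: "www.mysource.com", name: "Part",
    attributes: { role: "Hamlet" }, text: "xx ABC DEF yy" },
  { namespace: "www.mysource.com", name: "Part",
    attributes: { role: "Ophelia" }, text: "ABC DEF" },
]);

// Missing fields in the fragment are treated as wildcards.
facade.registerQueryType("MartinBryanQuery", (nodes, xml) => {
  const ns = field(xml, "namespace");
  const element = field(xml, "element");
  const content = field(xml, "content");
  const attr = /<attribute name=['"]([^'"]+)['"]>([^<]*)<\/attribute>/.exec(xml);
  return nodes.filter((n) =>
    (ns === null || n.namespace === ns) &&
    (element === null || n.name === element) &&
    (!attr || n.attributes[attr[1]] === attr[2]) &&
    (content === null || n.text.includes(content)));
});

const martinQuery = "<query>" +
  "<namespace>www.mysource.com</namespace>" +
  "<element>Part</element>" +
  "<attribute name='role'>Hamlet</attribute>" +
  "<content>ABC DEF</content>" +
  "</query>";

const nodeSet = facade.selectNodes("MartinBryanQuery", martinQuery);
console.log(nodeSet.length, nodeSet[0].attributes.role);
```

The client only ever sees selectNodes; adding another query language means
registering another handler, not changing the interface.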

Cheers
Didier PH Martin
----------------------------------------------
Email: martind@netfolder.com
Conferences: Web Chicago(http://www.mfweb.com)
             XML Europe (http://www.gca.org)
Book: XML Professional (http://www.wrox.com)
column: Style Matters (http://www.xml.com)
Products: http://www.netfolder.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************

