xml-dev - RE: [xml-dev] XML indexing/search engine


To: "Dare Obasanjo" <dareo@microsoft.com>,"Tim Bray" <tbray@textuality.com>,"Roth, Scott (ITD)" <Scott.Roth@state.ma.us>
Subject: RE: [xml-dev] XML indexing/search engine
From: "Ranjeet Sonone" <ranjeet@ipedo.com>
Date: Fri, 23 Aug 2002 13:49:46 -0700
Cc: <xml-dev@lists.xml.org>

Speaking just of XML-to-relational mapping: it looks
straightforward only until you have to deal with issues such as
mapping the XML data model, validating the content against a
given DTD or XML Schema and, most importantly, providing a full
set of document-based management and retrieval interfaces. To
convince myself, I tried to map the XML content below to SQL
Server relations. I succeeded in shredding my data into tables,
but stumbled when I had to populate the tables by reading the
XML content and typing it in manually. Searching the
documentation, I came across this wonderful mechanism called XDR
(where you specify the mapping), after which you can load the
data into the tables using the XML loader. But why do all this
when you can make a simple call such as addDocument(document) on
a native XML database, which gives you a document-based
management interface?

Sure, SQL Server could provide such an interface over OLEDB or
some COM object, so why doesn't it? Why ask people to write XDR
files (which I do not much appreciate) to map the data models?
And how can I validate the entire document once I have shredded
it? How can I evolve my document schema if it changes tomorrow?
Write the XDR again? Load the data again? And what if I want to
transform my data straight out of the data store, without
bothering with a layer that reads the content, assembles the
document and only then applies the transforms?

This is true of most relational stores that support XML, not
just SQL Server.

When you suggest "just map the XML data model to RDBMS tables",
please shed some more light on how to do it and, without trying
to justify your faith in the RDBMS, honestly reveal the effort
involved.

As a healthy exercise, please take the data given below, shred
it into the RDBMS of your choice and then try to provide
document-centric access to the data, or at least try to retrieve
the entire document content while preserving the document
identity (name). You will end up writing complex middleware,
stored procedures or whatever you want to call it.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE security SYSTEM "security.dtd">
<security>
    <users>
        <user id="u1000">
            <name>Manager</name>
            <password>GoVlqdxyBIugO0FWvj5WnyJ3HyM=</password>
            <email>1000@ipedo.com</email>
            <year>1991</year>
        </user>
        <user id="u1001">
            <name>ranjeet</name>
            <password>RoVlqdsyBIug11FWvj5WnyJ4HyM=</password>
            <email>1001@ipedo.com</email>
            <year>1991</year>
        </user>
        <user id="u1002">
            <name>nick</name>
            <password>TpVlqdsyBIasds1FWvj5WnyJ4HyM=</password>
            <email>1002@ipedo.com</email>
            <year>1992</year>
        </user>
        <user id="u1003">
            <name>srini</name>
            <password>QpVlqdsyTIasds2FWvj5WhyJ4HyM=</password>
            <email>1004@ipedo.com</email>
            <year>1992</year>
        </user>
        <user id="u1004">
            <name>jim</name>
            <password>QpVlqdsyTIasds2FWvj5WhyJ4HyM=</password>
            <email>1005@ipedo.com</email>
            <year>1993</year>
        </user>
        <user id="u1005">
            <name>alex</name>
            <password>QpVlqdsyTIasds2FWvj5WhyJ4HyM=</password>
            <email>1005@ipedo.com</email>
            <year>1993</year>
        </user>
    </users>
    <groups>
        <group id="g2000">
            <name>Administrators</name>
            <owners>
                <owner>Manager</owner>
            </owners>
            <members>
                <member>Manager</member>
                <member>srini</member>
                <member>alex</member>
            </members>
        </group>
        <group id="g2001">
            <name>Users</name>
            <owners>
                <owner>Manager</owner>
                <owner>ranjeet</owner>
            </owners>
            <members>
                <member>ranjeet</member>
                <member>nick</member>
                <member>jim</member>
            </members>
        </group>
    </groups>
</security>
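
A minimal sketch of what such middleware involves, in Python with
the standard sqlite3 and xml.etree.ElementTree modules. The table
layout (users, groups, group_roles), the doc and pos columns and
the function names shred and reassemble are assumptions made for
illustration only; none of them comes from SQL Server, SQLXML or
XDR. The sample is abbreviated and the password elements are left
out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample document (two users, one group).
SAMPLE = """<security>
  <users>
    <user id="u1000"><name>Manager</name>
      <email>1000@ipedo.com</email><year>1991</year></user>
    <user id="u1001"><name>ranjeet</name>
      <email>1001@ipedo.com</email><year>1991</year></user>
  </users>
  <groups>
    <group id="g2001"><name>Users</name>
      <owners><owner>Manager</owner><owner>ranjeet</owner></owners>
      <members><member>ranjeet</member></members>
    </group>
  </groups>
</security>"""

# Hypothetical schema. Every row carries the document name (doc) and a
# position (pos) so that document identity and order survive shredding.
SCHEMA = """
CREATE TABLE users (doc TEXT, pos INTEGER, id TEXT, name TEXT,
                    email TEXT, year INTEGER);
CREATE TABLE groups (doc TEXT, pos INTEGER, id TEXT, name TEXT);
CREATE TABLE group_roles (doc TEXT, group_id TEXT, role TEXT,
                          pos INTEGER, user_name TEXT);
"""

def shred(conn, doc_name, xml_text):
    """Split one document into rows: the hand-written mapping layer."""
    root = ET.fromstring(xml_text)
    for pos, u in enumerate(root.iterfind("users/user")):
        conn.execute(
            "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
            (doc_name, pos, u.get("id"), u.findtext("name"),
             u.findtext("email"), int(u.findtext("year"))))
    for pos, g in enumerate(root.iterfind("groups/group")):
        conn.execute("INSERT INTO groups VALUES (?, ?, ?, ?)",
                     (doc_name, pos, g.get("id"), g.findtext("name")))
        for role in ("owner", "member"):
            path = "{0}s/{0}".format(role)  # owners/owner, members/member
            for rpos, e in enumerate(g.iterfind(path)):
                conn.execute(
                    "INSERT INTO group_roles VALUES (?, ?, ?, ?, ?)",
                    (doc_name, g.get("id"), role, rpos, e.text))

def reassemble(conn, doc_name):
    """Rebuild the whole document from its rows, in original order."""
    root = ET.Element("security")
    users = ET.SubElement(root, "users")
    rows = conn.execute(
        "SELECT id, name, email, year FROM users "
        "WHERE doc = ? ORDER BY pos", (doc_name,)).fetchall()
    for uid, name, email, year in rows:
        u = ET.SubElement(users, "user", id=uid)
        for tag, value in (("name", name), ("email", email),
                           ("year", str(year))):
            ET.SubElement(u, tag).text = value
    groups = ET.SubElement(root, "groups")
    rows = conn.execute(
        "SELECT id, name FROM groups WHERE doc = ? ORDER BY pos",
        (doc_name,)).fetchall()
    for gid, gname in rows:
        g = ET.SubElement(groups, "group", id=gid)
        ET.SubElement(g, "name").text = gname
        for role in ("owner", "member"):
            wrapper = ET.SubElement(g, role + "s")
            names = conn.execute(
                "SELECT user_name FROM group_roles WHERE doc = ? "
                "AND group_id = ? AND role = ? ORDER BY pos",
                (doc_name, gid, role)).fetchall()
            for (user_name,) in names:
                ET.SubElement(wrapper, role).text = user_name
    return root

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, "security.xml", SAMPLE)
rebuilt = reassemble(conn, "security.xml")
print(ET.tostring(rebuilt, encoding="unicode"))
```

Even this toy version hard-codes the document structure twice, once
for shredding and once for reassembly, and it still ignores the
DOCTYPE declaration, DTD validation and any change to the schema.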

Any takers?

-ranjeet


-----Original Message-----
From: Dare Obasanjo [mailto:dareo@microsoft.com]
Sent: Friday, August 23, 2002 11:16 AM
To: Tim Bray; Roth, Scott (ITD)
Cc: xml-dev@lists.xml.org
Subject: RE: [xml-dev] XML indexing/search engine


If an XML-enabled RDBMS which supports importation and exportation of
XML is an option then the combination of SQL Server and SQLXML does the
job very well. More info at

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexxml/html/xml07162001.asp

	-----Original Message----- 
	From: Tim Bray [mailto:tbray@textuality.com] 
	Sent: Fri 8/23/2002 11:04 AM 
	To: Roth, Scott (ITD) 
	Cc: xml-dev@lists.xml.org 
	Subject: Re: [xml-dev] XML indexing/search engine
	
	

	Roth, Scott (ITD) wrote:
	> Hi -
	>
	> I am starting to design an application that will be a
	> calendaring/event engine for the State of Massachusetts and all
	> of its agencies (Department of Public Health, Registry of Motor
	> Vehicles, etc...).  We plan on putting an appropriate calendar
	> event schema in place, and then starting to generate 1 XML file
	> per event (public hearing, course, forum, workshop,
	> whatever...).  This will build up quite a large amount of small
	> XML files quickly.  My question is this - what is the best way
	> to store these files for easy indexing and searching?  The
	> actual files will be stored in our content management system,
	> so I am not worried about updating the information - merely
	> being able to efficiently query the collection.  Apache's
	> Xindice seems to be the frontrunner so far.  I am envisioning
	> storing the collection in Xindice and returning a nodeset to my
	> XSL that contains file names that match whatever the query was.
	> The XSL is then free to iterate through each matching file
	> using the document function and grab whatever information for
	> display that the current page requires.  Is there other
	> software that I should be considering?  Other approaches?
	
	The idea of making this information available in XML is a good
	one and I salute Massachusetts for this progressive and sensible
	move.  Publishing the schema is smart too.  Of course, just
	because you're going to make it available in XML doesn't mean
	you have to store/maintain the data in XML.  Could you put an
	output filter on your content-management system and hook it up
	to the web with one of the many gateway products?
	
	Of course, many CM systems don't take kindly to a high volume of
	queries & exports (as in choke, fall over, die, lock up)...
	maybe you could batch-dump this stuff into a simple rdbms
	(oracle, mysql, whatever), and gateway to that while XMLifying
	the export; these things tend to search well and hold up under
	query loads.  Does the retrieval really need to be full-text or
	could fielded query search out of an RDBMS handle it?
	
	Summary: XML for export and interchange is totally the way to
	go.  How you get there?  Acronyms that begin with X aren't that
	relevant.
	
	Now all the XDBMS vendors are going to complain about my lack of
	fidelity to the religion of the XML data model, oh well.
	
	> I am anxious to get this right, as this will be the model for
	> other statewide templatizing applications - for example, press
	> releases.
	
	It shouldn't be *that* hard.  Once you do it, let us know how it
	went, or submit a paper to one of the conferences or something.
	-Tim
	
	
	
-----------------------------------------------------------------
	The xml-dev list is sponsored by XML.org <http://www.xml.org>,
	an initiative of OASIS <http://www.oasis-open.org>
	
	The list archives are at http://lists.xml.org/archives/xml-dev/
	
	To subscribe or unsubscribe from this list use the subscription
	manager: <http://lists.xml.org/ob/adm.pl>

Follow-Ups:
- Re: [xml-dev] XML indexing/search engine
  - From: "Steve Muench" <Steve.Muench@oracle.com>
