Re: [xml-dev] XML aggregation question?

Robert Koberg wrote:
>>>>> So I have been sticking with the filesystem,
>>>>
>>>> Yes - I am a great believer in the filesystem as a simple database 
>>>> engine.

I completely agree; transquery was my first foray into this concept a 
few years ago.

>>>>> Apache's Lucene for indexing

Lucene is great stuff

>>>>
>>>> Yes
>>>>
>>>>>  and CVS or Subversion for version control.
>>>>
>>>> Yes
>>>
>>>  From the code point of view, it works so nicely. Easy to develop on 
>>> a local machine just by doing a server commit and local update. Easy 
>>> to change on the server by doing a local commit and a server update 
>>> (and probably an Ant build). Same for content/metadata if necessary 
>>> to change in different locations -- just need to run a lucene update 
>>> to keep the UI experience in sync.
>>
>>
>> Well, herein lies the crux of Mike's comments about if you don't start
>> with a DB that's likely what you'll end up building to some degree.  

And this is very good advice.


> I guess 'to some degree' is the main qualifier. Let's look at it:
> 
> - filesystem: this is perhaps the simplest of them all and probably does 
> not need comment.
> ----
> - Lucene: this library is one of the cleanest and easiest to 
> use/implement. If you are dealing with XML (i.e. not wanting to index 
> tag/attribute names), though, you might need to write some code (SAX 
> preferably or even some DOMish thing) to handle indexing appropriately. 
> This can be used to tune the index as well. For example, you have some 
> description-type fields that allow em(phasis) and strong (emphasis) - 
> you can weight those terms/phrases more heavily than the 
> not-applicable-to-search and/or phatic text. Can you even do this type 
> of thing in an XML DB or RDB? (unless you use something like Lucene?)
> 
> You will also want to determine whether you store the indexed data in 
> the index (for quick/easy retrieval) or store references to pick back up 
> from the filesystem. (this is where an XML DB has its most charm to me)
> ----
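
To make the SAX idea above concrete, here is a minimal sketch in Python (not Lucene itself). It indexes character data only, never tag or attribute names, and gives terms inside em/strong a higher weight. The boost values and the names BOOSTS, WeightedTextHandler and weigh_terms are my own assumptions for illustration; how the weights would map onto Lucene boosts is not shown.

```python
import xml.sax
from collections import Counter

# Hypothetical boosts: terms inside <em> or <strong> count more than plain text.
BOOSTS = {"em": 2.0, "strong": 3.0}


class WeightedTextHandler(xml.sax.ContentHandler):
    """Accumulates a weight per term from character data only."""

    def __init__(self):
        super().__init__()
        self.stack = []           # names of the currently open elements
        self.buf = []             # text chunks seen since the last tag
        self.weights = Counter()  # term -> accumulated weight

    def _flush(self):
        # Weight buffered text by the strongest boost among enclosing elements.
        boost = max((BOOSTS.get(n, 1.0) for n in self.stack), default=1.0)
        for raw in "".join(self.buf).lower().split():
            term = raw.strip(".,;:!?")
            if term:
                self.weights[term] += boost
        self.buf = []

    def startElement(self, name, attrs):
        self._flush()
        self.stack.append(name)

    def endElement(self, name):
        self._flush()
        self.stack.pop()

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer it.
        self.buf.append(content)


def weigh_terms(xml_text):
    handler = WeightedTextHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return dict(handler.weights)


weights = weigh_terms("<desc>a <em>fast</em> and <strong>safe</strong> store</desc>")
```

The resulting term weights would then be handed to whatever indexer you use.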
> - CVS/Subversion (oh, and when I say subversion, I mean using the 
> filesystem backend, not the Berkeley DB backend (licensing again)): this 
> is pretty much a no-brainer as well (I assume?). This allows you to 
> checkout anywhere where you give access and create an instant work, QA, 
> gold-master or runtime environment.
> 
> If you need something like version control in your app it will be much 
> easier with something that is filesystem based rather than held in some 
> binary.
> ------------------------
> Since this is an XML list, I will assume we are talking about XML. I will 
> also assume the main benefit of a DB is transactions (I don't think 
> eXist has that yet). But that can be handled by code at the level of a 

eXist is fully transactional internally, to enable journaling and recovery.

> wellformedness check and a validity check.
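
As a rough illustration of guarding filesystem writes with a wellformedness check, here is a minimal Python sketch. The function names is_well_formed and safe_write are hypothetical, and validity checking is left out because the Python standard library has no validating parser; that step would need a separate validating tool.

```python
import os
import tempfile
import xml.sax


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses as well-formed XML."""
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


def safe_write(path, xml_bytes):
    """Replace the file at path only if the new content is well formed."""
    if not is_well_formed(xml_bytes):
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the
    # target, so readers never see a half-written document.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(xml_bytes)
    os.replace(tmp, path)
    return True
```

This only approximates one aspect of a transaction (a single document is either replaced whole or left untouched); it does not cover updates spanning several files.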
> 
> How does using a (XML)DB from the start compare?

I have used eXist-db in anger over the past 1.5 years; here are some of
my thoughts:

* I concur with RobertK that eXist is not yet ready for a production
instance exposed to the public world...though this is true of a lot of OS
software...I also think RobertK may have hit eXist when it was 
bifurcating a lot of critical code.

Of course (as he said) this could always be the
case, but I don't think so. In any event, Open Source software is only as
good as the number of eyeballs involved...with XMLDB there are few
better OS solutions than eXist (Apache Xindice is an order of magnitude
worse). The only other point I usually make about OS software is that if
one doesn't contribute back to OS software, then it's just 'software
written by other people' whom one can blame when things go wrong. On the
other hand, I can fully sympathise with RobertK; it is irritating to
invest time and effort investigating a software package, only to end up
concluding that it is too buggy to use.

* eXist should only be used in its standalone configuration using
the built-in Jetty container...the WAR version (when deployed in Tomcat)
seemed to have quite a few problems when I used it, rendering it unusable 
(for me at least).

* one of the critical areas of instability in eXist is the change in
node numbering scheme; this bitter pill was dealt with over the past 3
months, resulting in eXist 1.0 (old numbering, stable) and eXist 1.1,
which is a little unstable. The old numbering scheme placed limits on
document size; also, updates (via XUPDATE) were an extremely expensive
processing operation (due to how nodes were numbered and thus accessed),
which a while back led to bad things like database corruption and memory
overflows, etc....though I would say all those issues have been fixed;
now the instability is in the 1.1 version. All the gory details can be
found here: http://exist.sourceforge.net/xmlprague06.html
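
To give a feel for why a numbering scheme can limit document size and make updates expensive, here is a small Python sketch of a generic level-order numbering over a virtual complete k-ary tree. This is a simplification I am assuming for illustration; it is not taken from the eXist source, and the paper linked above is the place for the actual details. The function names are hypothetical.

```python
def child_ids(n, k):
    """Ids of the k child slots of node n (the root is 1)."""
    first = k * (n - 1) + 2
    return list(range(first, first + k))


def parent_id(n, k):
    """Id of the parent of node n; the root has no parent."""
    if n == 1:
        return None
    return (n - 2) // k + 1


def ids_needed(k, depth):
    """Id slots reserved for a tree of the given depth (root = depth 0)."""
    return sum(k ** level for level in range(depth + 1))


# A single element with many children fixes k for the whole document,
# so the reserved id space grows roughly as k ** depth.
small = ids_needed(2, 3)
too_big_for_int64 = ids_needed(1000, 7) > 2 ** 63
```

In a scheme like this, adding one child to an element that already has k children means choosing a larger k, which changes the id of almost every node; that is the kind of cost that makes in-place updates expensive.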

* I would rather see RDBMS and filesystems enhanced than have to resort
to a separate approach to achieve what I do with eXist-db...though it
does a good job at providing an 'XML application server' combining all
those XML technologies, including querying via XPATH, XQUERY, and XSLT.
I am a big fan of Apache Lucene, but I need to do more than just
query/search in my XML applications.

* as an aside, the latest version of eXist now has an experimental ATOM 
Publishing Protocol implementation built in, which provides quite a lot 
of possibilities with respect to data feed integration.

IMHO, data corruption is the worst thing that can happen in software
(well, there are worse things...but this is probably the most painful).

eXist does have journaling for recovery operations (when hardware or
software crashes), so I would expect any corruption of the type
RobertK has experienced to be dealt with. I would be interested to hear
the specifics and push them back to the eXist bug db; any example code to
reproduce the error would be greatly appreciated.

I have quite a number of eXist databases in operation for a number of my
commercial clients; none are public-facing database servers, and they
participate in processes like content management, XML aggregation,
data summations, etc.

To summarise my programming activities with eXist:

* I use Perl's RPC::XML to invoke eXist's XML-RPC API, which is quite stable

* I use REST to model and prototype apps, as well as provide

* as already mentioned, Subversion is a great enhancement to file 
systems...I have integrated a Subversion working copy to live within an 
eXist database.

* I use XUPDATE to perform update operations; although eXist has a concept 
of XQUERY UPDATE EXTENSIONS, I have never had to use them. I sometimes 
use the HTTP PUT/DELETE verbs instead of XML-RPC.
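
For the HTTP PUT/DELETE route, a minimal sketch in Python looks something like the following. The base URL, collection path and document name are assumptions for a local test instance, the helper names are hypothetical, and authentication is left out. The requests are only built here, not sent.

```python
import urllib.request

# Assumption: a local eXist instance exposing its REST interface here.
BASE = "http://localhost:8080/exist/rest"


def put_document(collection_path, name, xml_bytes):
    """Build (but do not send) a PUT request that stores an XML document."""
    url = f"{BASE}{collection_path}/{name}"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def delete_document(collection_path, name):
    """Build (but do not send) a DELETE request for a stored document."""
    url = f"{BASE}{collection_path}/{name}"
    return urllib.request.Request(url, method="DELETE")


put_req = put_document("/db/feeds", "entry1.xml", b"<entry/>")
del_req = delete_document("/db/feeds", "entry1.xml")
# urllib.request.urlopen(put_req) would actually send the request.
```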


hth, jim


