XML aggregation question?


From: "Andrew S. Townley" <ast@xxxxxxxxxxxx>
To: XML Developers List <xml-dev@xxxxxxxxxxxxx>
Date: Sat, 26 Aug 2006 10:36:20 +0100

Hi Folks,

I'm looking for some collective wisdom from the list on how to do what
I want using only XML technologies.  I know at least 3 different ways
to do it using databases of various sorts, but I'm trying to see whether
there's a better way, or whether the RDBMS is the way to go.

What I'm trying to do is dynamically aggregate information from XML
instance documents without having to process all of the instances every
time I want the aggregate.  Maybe this is a job for an XMLDB, but I'm
not terribly familiar with them.  I'd also like to keep my XML instance
documents on the filesystem rather than in a database, so that they stay
easy to access from a variety of tools, from text editors to Web servers
to other utilities written in various languages.

Take something like a widget in an inventory or workflow system, where
each instance document represents a single widget, e.g.

<widget>
  <status>XXX</status>
  ...
</widget>

What I would like to be able to do is get a view of the collective
status of my group of widgets in an on-demand manner.  Other processes
may be changing the status, so I don't want to introduce a dependency on
an application-maintained static index updated when the status changes.
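
For reference, the brute-force version of this (parse every instance on
each request) could be sketched roughly as below.  This is only an
illustration in Python using the standard library; the function name
aggregate_status, the one-widget-per-*.xml-file layout, and <status>
being a direct child of a <widget> root element are all assumptions, not
part of any existing system.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def aggregate_status(directory):
    """Count widgets per status by parsing every *.xml file in directory.

    Hypothetical helper: assumes one <widget> root element per file,
    with <status> as a direct child.
    """
    counts = Counter()
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        status = root.findtext("status")
        # Skip files that are not widgets or carry no status element.
        if root.tag == "widget" and status is not None:
            counts[status.strip()] += 1
    return counts
```

Every call re-reads every file, which is exactly the cost I'd like to
avoid as the number of instances grows.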

As I said, some of the ways I know are possible are:

1 - Move the data from the XML instances into a database and run
queries.  When I need the data, either re-generate the XML or store the
XML as a blob.  Obviously, I'd then need to do everything in the
database or use ETL operations to apply updates.

2 - Keep the XML on the filesystem and periodically (via cron or
similar) generate a static index based on the as-is state of the
information.  Aggregate info is only guaranteed to be as fresh as the
last batch job.  This also has the problem of not scaling well as the
number of instances increases.

3 - Provide a centralized persistence layer to essentially do what #1 is
doing, but as the XML is modified, update the static index.  This seems
really cumbersome and error prone, plus it means you can't have the
flexibility of accessing the "raw" instance documents with shell
scripts, for example.
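
To make option 1 concrete, here is a minimal sketch in Python using the
standard sqlite3 and xml.etree modules.  The table name widget, the
helper names load_widgets and status_counts, and the file layout (one
<widget> root per *.xml file) are assumptions made up for illustration;
a real setup would likely copy more than the status column.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def load_widgets(conn, directory):
    """Copy (path, status) for each widget file into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS widget (path TEXT PRIMARY KEY, status TEXT)"
    )
    for path in sorted(Path(directory).glob("*.xml")):
        status = ET.parse(path).getroot().findtext("status")
        if status is not None:
            # INSERT OR REPLACE keeps one row per file when reloading.
            conn.execute(
                "INSERT OR REPLACE INTO widget (path, status) VALUES (?, ?)",
                (str(path), status.strip()),
            )
    conn.commit()


def status_counts(conn):
    """Return {status: count} using a GROUP BY query."""
    rows = conn.execute("SELECT status, COUNT(*) FROM widget GROUP BY status")
    return dict(rows.fetchall())
```

The query itself is cheap, but the load step is the ETL cost mentioned
in option 1: the table is only as current as the last load_widgets run.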

I'm sure I'm missing something obvious here, so any pointers/suggestions
would be appreciated.  This has to be a common pattern, so I'm sure
there are other solutions people have come up with.  My goal is to keep
whatever solution I end up with as light as possible, but if I have to
build or use infrastructure, then that's what I'll have to do.

Thanks in advance,

ast
-- 
Andrew S. Townley <ast@atownley.org>
http://atownley.org

Follow-Ups:
- RE: [xml-dev] XML aggregation question?
  - From: "Michael Kay" <mike@saxonica.com>
- Re: [xml-dev] XML aggregation question?
  - From: peter murray-rust <pm286@cam.ac.uk>
