Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?
Yes, unless there is a need to forward the XML on to some other endpoint, I can't really see why it would need to stay as XML. Roger says his data has a shallow structure, so Cassandra alone may suffice. Pure relational seems expensive for the volumes he has, though once the data is normalized the real volumes may be significantly smaller. If you need to keep some sense of the original structure and, in particular, if the documents have deep structure, then a graph database is the way to go. I mentioned Titan; you mention Neo4j. Personally, I have always wanted to link Cassandra with Titan directly in a low-level hybrid model, but now that Aurelius has been acquired by DataStax that may be unnecessary; they are hinting they will do that themselves. That, IMO, could prove to be the ultimate in flexible, scalable databases, which is why I lean towards Titan these days.
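To make the "keep the original structure in a graph" idea concrete, here is a rough sketch of how a document could be shredded into vertices and edges before loading. It is plain Python using only the standard library and deliberately does not call any Titan or Neo4j API; the function name xml_to_graph, the vertex fields, and the sample order document are all my own hypothetical choices, not Roger's data.

```python
import xml.etree.ElementTree as ET


def xml_to_graph(xml_text):
    """Flatten an XML document into (vertices, edges).

    Each element becomes one vertex; each parent/child relationship
    becomes one edge carrying the child's position, so that document
    order can be rebuilt later. Tail text (mixed content) is ignored
    in this sketch.
    """
    root = ET.fromstring(xml_text)
    vertices = []  # one dict per element
    edges = []     # (parent_id, child_id, position) tuples

    def visit(elem, parent_id, position):
        vid = len(vertices)
        vertices.append({
            "id": vid,
            "tag": elem.tag,
            "attrs": dict(elem.attrib),
            "text": (elem.text or "").strip(),
        })
        if parent_id is not None:
            edges.append((parent_id, vid, position))
        for i, child in enumerate(elem):
            visit(child, vid, i)

    visit(root, None, 0)
    return vertices, edges


# Hypothetical sample document, for illustration only.
SAMPLE = (
    "<order id='42'>"
    "<customer>Acme</customer>"
    "<items><item sku='A1'>2</item><item sku='B7'>1</item></items>"
    "</order>"
)

vertices, edges = xml_to_graph(SAMPLE)
for parent, child, pos in edges:
    print(vertices[parent]["tag"], "->", vertices[child]["tag"], "at", pos)
```

The two lists map fairly directly onto bulk-load input for most graph stores. For shallow documents the edge list stays tiny, which is part of why a plain column store may be enough in that case.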
In any case, the real question is: what is the real end use for all this data? Roger says it gets pulled into SAS and SPSS, which suggests some form of further analytics. Again, that suggests to me that pulling this into something like Titan and using Faunus on Hadoop to run the end analytics directly might be the best overall solution to the entire problem.