The important part comes after that quote:
These tools are designed for finding answers hidden within very, very large data sets.
If you extract what you think you are looking for, you may not allow the analytics to find the hidden answers that you didn’t think to extract.
Steve
From: Costello, Roger L. [mailto:costello@mitre.org]
Sent: Wednesday, March 11, 2015 5:14 AM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?
Hi Folks,
Peter made a very interesting assertion:
The analytics should run directly on the data,
not on some extract.
My plan was to perform XPath and XQuery on the 50 million XML documents and then use the query results as input into SAS and SPSS analytics. So my approach is quite different from what Peter advocates.
Peter, why do you assert that the analytics should be run directly on the data? Why is that superior to querying the data and using the query results as input to the analytics? Does everyone agree with Peter that the analytics should be run directly on the data? Anyone disagree with Peter?
/Roger