The important part comes after that quote: These tools are designed for finding answers hidden within very, very large data sets. If you extract what you think you are looking for, you may not allow the analytics to find the hidden answers that you didn’t think to extract.

Steve

From: Costello, Roger L. [mailto:costello@mitre.org]

Hi Folks,

Peter made a very interesting assertion: The analytics should run directly on the data, not on some extract.

My plan was to perform XPath and XQuery on the 50 million XML documents and then use the query results as input into SAS and SPSS analytics. So my approach is quite different than what Peter advocates.

Peter, why do you assert that the analytics should be run directly on the data? Why is that superior to querying the data and using the query results as input to the analytics?

Does everyone agree with Peter that the analytics should be run directly on the data? Anyone disagree with Peter?

/Roger