On 10/4/2016 11:53 AM, Costello, Roger L. wrote:
Scenario: You are tasked to build a system that will receive data from a
variety of sources. The data arrives in different formats (e.g., CSV,
binary, tab-delimited, JSON, XML). The data sources provide different
kinds of data (e.g., one source provides book data, another source
provides weather data, another source provides gardening data). The data
will be converted to a common intermediary form and then, from the
intermediary form, placed into a data store.
One of the problems in tackling the question is that we don't know why all this data is supposed to end up in a common data store. The data are all about different kinds of things, so presumably the tables won't overlap (to take just one example, and assuming the final data store even has tables). So why do it? Is it so that a single query language can be used over all the data? Is it so that there is only one kind of data system software to administer and maintain? Is it to satisfy some manager who insists on this scheme? Is it a research project to see whether RDF is suitable for some task?