Ihe, before you load anything anywhere, you need to clean this data, especially if you are integrating it from the Web and it has no unique IDs...
In particular, you need entity resolution. The literature is full of data cleaning and entity resolution algorithms. One that you will find familiar (because it looks very much like XQuery :-) is here:
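To make the entity resolution idea concrete: with no unique IDs, you have to decide which records refer to the same real-world thing by comparing their contents. Below is a minimal Python sketch of one generic approach. It is not the XQuery-like algorithm mentioned above. It normalizes strings, compares every pair with `difflib.SequenceMatcher`, and merges pairs above a similarity threshold using union-find. The function names `normalize` and `resolve_entities` and the 0.85 threshold are hypothetical choices for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(s: str) -> str:
    # Lowercase and turn punctuation into spaces so trivial variants
    # ("I.B.M." vs "IBM") look more alike before comparison.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in s.lower())
    return " ".join(cleaned.split())


def resolve_entities(records, threshold=0.85):
    """Group records that probably refer to the same entity.

    records: list of strings (e.g. names collected from different web sources).
    threshold: hypothetical similarity cut-off; tune it on real data.
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    keys = [normalize(r) for r in records]
    # Naive O(n^2) pairwise comparison; real systems add "blocking"
    # to avoid comparing records that cannot possibly match.
    for i, j in combinations(range(len(records)), 2):
        if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            union(i, j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


print(resolve_entities(["IBM Corp.", "I.B.M. Corp", "Oracle", "oracle"]))
```

Here the two IBM spellings end up in one cluster and the two Oracle spellings in another, even though no record carries an ID. Because matches are merged transitively, a single bad pairwise match can glue two large clusters together, which is one reason the published algorithms are more careful than this sketch.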