I have a large collection of XML documents, and want to find and
group any duplicates. The obvious but slow way of doing this is
to just compare them all to each other. Is there a better
approach?
In particular, are there any APIs or standards for "hashing" a
document so that duplicates could be identified in much the same
way as you would with a hash table?
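To make the idea concrete, here is a rough sketch of the kind of thing
I have in mind, not a definitive implementation. It assumes Python 3.8
or later, that "duplicate" means equal after XML canonicalization
(C14N 2.0) with whitespace around text stripped, and that SHA-256 is an
acceptable hash. The function names xml_fingerprint and
group_duplicates are hypothetical, made up for illustration.

```python
import hashlib
from collections import defaultdict
from xml.etree.ElementTree import canonicalize


def xml_fingerprint(xml_text):
    """Return a hex digest of the canonical form of an XML document."""
    # Canonicalization normalizes attribute order, quoting style and
    # empty-element syntax. strip_text=True (an assumption about what
    # counts as "the same") also drops whitespace around text content.
    canonical = canonicalize(xml_data=xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(docs):
    """Group document names by fingerprint; docs maps name -> XML text."""
    buckets = defaultdict(list)
    for name, text in docs.items():
        buckets[xml_fingerprint(text)].append(name)
    # Only buckets with more than one member contain duplicates.
    return [names for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    sample = {
        "a.xml": '<r b="1" a="2"><x/></r>',
        "b.xml": "<r a='2' b='1'><x></x></r>",
        "c.xml": "<r/>",
    }
    print(group_duplicates(sample))
```

Each document is read and hashed once, so the cost grows linearly with
the size of the collection instead of requiring every pair to be
compared. Documents that land in the same bucket could still be
compared directly if hash collisions are a concern.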
Thanks,
Eric