xml-dev - hashing

To: xml-dev@lists.xml.org
Subject: hashing
From: Eric Hanson <eric@aquameta.com>
Date: Thu, 29 Apr 2004 19:58:17 +0000
User-agent: Mutt/1.2i

I have a large collection of XML documents, and want to find and
group any duplicates.  The obvious but slow way of doing this is
to just compare them all to each other.  Is there a better
approach?

In particular, are there any APIs or standards for "hashing" a
document so that duplicates could be identified in a similar way
to what you'd do with a hash table?
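
To make the idea concrete, here is a rough sketch of the hash-table
approach in Python. It rests on assumptions the question does not
settle: that "duplicate" means "same canonical form", that W3C
Canonical XML (via the standard library's
xml.etree.ElementTree.canonicalize, Python 3.8+) is an acceptable
normalization, and that SHA-256 is a suitable digest. The names
xml_digest and group_duplicates are made up for illustration.

```python
import hashlib
from collections import defaultdict
from xml.etree.ElementTree import canonicalize


def xml_digest(xml_text):
    """Return a hex digest of the canonical (C14N 2.0) form of a document.

    Assumption: two documents count as duplicates when their canonical
    forms match, so attribute order, quote style, and <x/> vs <x></x>
    do not matter, but whitespace inside text content still does.
    """
    canonical = canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(docs):
    """Group document names by digest; docs maps name -> XML text.

    Returns only the groups that contain more than one document.
    """
    buckets = defaultdict(list)
    for name, text in docs.items():
        buckets[xml_digest(text)].append(name)
    return [names for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    sample = {
        "a.xml": '<doc b="1" a="2"><x/></doc>',
        "b.xml": "<doc a='2' b='1'><x></x></doc>",
        "c.xml": '<doc a="3"/>',
    }
    print(group_duplicates(sample))
```

Each document is read and hashed once, so grouping costs roughly one
pass over the collection instead of a comparison of every pair.
Whether canonical equality is the right notion of "duplicate" (for
example, whether insignificant whitespace should be ignored) depends
on the collection.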

Thanks,
Eric

Follow-Ups:
- Re: [xml-dev] hashing
  - From: "Jeff Greif" <jgreif@alumni.princeton.edu>
- Re: [xml-dev] hashing
  - From: David Megginson <dmeggin@attglobal.net>
