   Re: [xml-dev] hashing


md5sum is a utility that computes a cryptographic hash of a file using the
MD5 algorithm.  It's not fast, but it will do what you want.  It's available
on Linux, in Cygwin, and probably on other platforms.

In a reasonable command shell where Unix commands are available,

md5sum *.xml | sort

will put the duplicate files on neighboring lines, since files with identical
contents produce identical hashes.
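
The same idea can be written against a hash table directly, which is closer
to what the original question asks for.  What follows is only a rough sketch
in Python, not something from the original thread: the helper names
md5_of_file and group_duplicates are made up for illustration, and it assumes
the files sit in a single directory and match *.xml.  Like md5sum, it only
finds files that are byte-for-byte identical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's raw bytes (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large documents are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(directory, pattern="*.xml"):
    """Map each digest to the file names sharing it, keeping groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            groups[md5_of_file(path)].append(path.name)
    return {d: names for d, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Assumption: the XML files live in the current directory.
    for digest, names in sorted(group_duplicates(".").items()):
        print(digest, " ".join(names))
```

Each file is read once, so the cost grows with the total size of the
collection rather than with the number of pairs of files.  Two documents that
differ only in whitespace or attribute order will still hash differently.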


----- Original Message ----- 
From: "Eric Hanson" <eric@aquameta.com>
To: <xml-dev@lists.xml.org>
Sent: Thursday, April 29, 2004 12:58 PM
Subject: [xml-dev] hashing

> I have a large collection of XML documents, and want to find and
> group any duplicates.  The obvious but slow way of doing this is
> to just compare them all to each other.  Is there a better
> approach?
> Particularly, are there any APIs or standards for "hashing" a
> document so that duplicates could be identified in a similar way
> to what you'd do with a hash table?

