- To: Rich Salz <rsalz@datapower.com>
- Subject: Re: [xml-dev] hashing
- From: Eric Hanson <eric@aquameta.com>
- Date: Wed, 5 May 2004 20:58:36 +0000
- Cc: David Megginson <dmeggin@attglobal.net>,XML Developers List <xml-dev@lists.xml.org>
- In-reply-to: <Pine.LNX.4.44L0.0404292226350.4710-100000@smtp.datapower.com>; from rsalz@datapower.com on Thu, Apr 29, 2004 at 10:34:20PM -0400
- References: <40916503.9080001@attglobal.net> <Pine.LNX.4.44L0.0404292226350.4710-100000@smtp.datapower.com>
- User-agent: Mutt/1.2i
I'm only concerned with the documents being conceptually identical.
Instances might be serialized differently by different processors,
but as long as they're conceptually the same, that's all that
matters. So running them through a canonicalization engine
works great for this.
Anyway, thanks for the code, I gave it a try and it works great.
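For anyone reading this in the archives without the old PyXML modules installed, here is a minimal sketch of the same idea (hash the canonical form rather than the raw bytes) using only the standard library. It assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize() is available; note that it produces C14N 2.0 output, not the c14n module used in the quoted script, and SHA-256 is swapped in for SHA-1. The function names c14n_digest, c14n_file_digest and find_duplicates are made up for this illustration.

```python
import hashlib
import sys
import xml.etree.ElementTree as ET


def c14n_digest(xml_text):
    """Return the SHA-256 hex digest of the canonical (C14N 2.0) form of xml_text."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def c14n_file_digest(path):
    """Same as c14n_digest, but reads and parses the document from a file path."""
    canonical = ET.canonicalize(from_file=path)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(paths):
    """Yield (path, earlier_path) for files whose canonical forms hash the same."""
    seen = {}
    for path in paths:
        digest = c14n_file_digest(path)
        if digest in seen:
            yield path, seen[digest]
        else:
            seen[digest] = path


if __name__ == "__main__":
    # Skip sys.argv[0], which is the script itself.
    for dup, first in find_duplicates(sys.argv[1:]):
        print("duplicate", dup, first)
```

With this, c14n_digest('<a x="1" y="2"/>') and c14n_digest("<a y='2' x='1'></a>") come out equal, since attribute order, quote style and empty-element syntax are normalized. Whitespace inside text content is still significant by default, so documents that differ only in indentation will hash differently unless extra options are passed to canonicalize().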
Eric
Rich Salz (rsalz@datapower.com) wrote:
> If you're concerned about byte-for-byte identical, hashing each file
> is okay; if you're concerned about semantically identical (e.g., the
> order of attributes doesn't matter) then use standard XML
> canonicalization or something similar (but it won't be as good:)
>
> Here's a portable python script that compares all files named on
> the command-line:
>
> ; cat x.py
> import sys,sha
> from xml.dom.ext.reader import PyExpat
> from xml.dom.ext.c14n import Canonicalize
>
> # maps digest -> first file seen with that digest
> hashes = {}
> # skip sys.argv[0], which is the script itself
> for f in sys.argv[1:]:
>     o = sha.sha()
>     if 1:
>         # simple hash of contents
>         o.update(open(f).read())
>     else:
>         # sha(c14n(doc))
>         r = PyExpat.Reader()
>         dom = r.fromStream(open(f))
>         o.update(Canonicalize(dom))
>     h = o.digest()
>     other = hashes.get(h, None)
>     if other:
>         print 'duplicate', f, other
>     else:
>         hashes[h] = f
> ;
>
> --
> Rich Salz Chief Security Architect
> DataPower Technology http://www.datapower.com
> XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
> XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html