xml-dev - Re: [xml-dev] hashing



To: David Megginson <dmeggin@attglobal.net>
Subject: Re: [xml-dev] hashing
From: Rich Salz <rsalz@datapower.com>
Date: Thu, 29 Apr 2004 22:34:20 -0400 (EDT)
Cc: XML Developers List <xml-dev@lists.xml.org>
In-reply-to: <40916503.9080001@attglobal.net>

If you're concerned about byte-for-byte identity, hashing each file
is okay; if you're concerned about semantic identity (e.g., the order
of attributes doesn't matter), then use standard XML canonicalization
or something similar (but that won't be as good :)
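To make the distinction concrete, here is a small sketch. It assumes Python 3.8+, whose standard library provides `xml.etree.ElementTree.canonicalize` (a C14N 2.0 serializer); the helper names `raw_digest` and `c14n_digest` are made up for illustration. It hashes two documents that differ only in attribute order, quote style, and empty-element syntax:

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # Python 3.8+, C14N 2.0

def raw_digest(xml_text):
    """SHA-1 of the document exactly as written (hypothetical helper)."""
    return hashlib.sha1(xml_text.encode("utf-8")).hexdigest()

def c14n_digest(xml_text):
    """SHA-1 of the document's canonical form (hypothetical helper)."""
    return hashlib.sha1(canonicalize(xml_text).encode("utf-8")).hexdigest()

# Same element and attributes; only attribute order, quote style and
# empty-element syntax differ.
a = '<doc b="2" a="1"/>'
b = "<doc a='1' b='2'></doc>"

print(raw_digest(a) == raw_digest(b))    # False: the bytes differ
print(c14n_digest(a) == c14n_digest(b))  # True: the canonical forms match
```

Canonicalization only erases the differences the spec defines as insignificant; whitespace in text content, for example, is kept unless you pass an option such as `strip_text=True`.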

Here's a portable Python script that compares all the files named on
the command line:

; cat x.py
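import sys
import hashlib
from xml.etree.ElementTree import canonicalize  # stdlib C14N 2.0, Python 3.8+

USE_C14N = False  # False: hash raw bytes; True: hash the canonical form

hashes = {}
for f in sys.argv[1:]:      # argv[0] is the script itself, so skip it
    o = hashlib.sha1()
    if not USE_C14N:
        # simple hash of contents, byte for byte
        with open(f, 'rb') as fp:
            o.update(fp.read())
    else:
        # sha(c14n(doc)): attribute order, quoting, etc. no longer matter
        o.update(canonicalize(from_file=f).encode('utf-8'))
    h = o.digest()
    other = hashes.get(h)   # first file seen with this hash, if any
    if other:
        print('duplicate', f, other)
    else:
        hashes[h] = f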
;
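The script reports each duplicate against the first file seen with the same hash. If you'd rather get every group of identical files at once, and avoid reading very large files into memory in one go, a variant along these lines should work. It is only a sketch covering the byte-for-byte case; `sha1_of_file` and `find_duplicates` are hypothetical names, not part of any library.

```python
import hashlib
import sys
from collections import defaultdict

def sha1_of_file(path, chunk_size=65536):
    """SHA-1 of a file's raw bytes, read in chunks so that large
    files need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def find_duplicates(paths):
    """Group paths by digest; return only groups with 2+ members."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha1_of_file(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

if __name__ == "__main__":
    for group in find_duplicates(sys.argv[1:]):
        print("duplicate", *group)
```

Collecting whole groups in a dictionary of lists costs a little more memory than the single-entry table above, but it lets you report "these five files are identical" in one line instead of four pairwise messages.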

--
Rich Salz                  Chief Security Architect
DataPower Technology       http://www.datapower.com
XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
XML Security Overview      http://www.datapower.com/xmldev/xmlsecurity.html
