   Re: [xml-dev] hashing

  • To: Rich Salz <rsalz@datapower.com>
  • Subject: Re: [xml-dev] hashing
  • From: Eric Hanson <eric@aquameta.com>
  • Date: Wed, 5 May 2004 20:58:36 +0000
  • Cc: David Megginson <dmeggin@attglobal.net>,XML Developers List <xml-dev@lists.xml.org>
  • In-reply-to: <Pine.LNX.4.44L0.0404292226350.4710-100000@smtp.datapower.com>; from rsalz@datapower.com on Thu, Apr 29, 2004 at 10:34:20PM -0400
  • References: <40916503.9080001@attglobal.net> <Pine.LNX.4.44L0.0404292226350.4710-100000@smtp.datapower.com>
  • User-agent: Mutt/1.2i

I'm only concerned with whether the documents are conceptually
identical. Different processors might serialize an instance
differently, but as long as the results are conceptually the same,
that's all that matters. So running them through a canonicalization
engine works great for this.

Anyway, thanks for the code, I gave it a try and it works great.

Eric

Rich Salz (rsalz@datapower.com) wrote:
> If you're concerned about byte-for-byte identity, hashing each file
> is okay; if you're concerned about semantic identity (e.g., the order
> of attributes doesn't matter), then use standard XML canonicalization
> or something similar (but it won't be as good :)
> 
> Here's a portable Python script that compares all files named on
> the command line:
> 
> ; cat x.py
> import sys,sha
> from xml.dom.ext.reader import PyExpat
> from xml.dom.ext.c14n import Canonicalize
> 
> # Maps each digest to the first file seen with that digest.
> hashes = {}
> # Skip sys.argv[0], which is the script itself.
> for f in sys.argv[1:]:
>     o = sha.sha()
>     if 1:
>         # simple hash of contents (change to "if 0:" for the c14n branch)
>         o.update(open(f, 'rb').read())
>     else:
>         # sha(c14n(doc))
>         r = PyExpat.Reader()
>         dom = r.fromStream(open(f))
>         o.update(Canonicalize(dom))
>     h = o.digest()
>     other = hashes.get(h, None)
>     if other:
>         print 'duplicate', f, other
>     else:
>         hashes[h] = f
> ;
> 
> --
> Rich Salz                  Chief Security Architect
> DataPower Technology       http://www.datapower.com
> XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
> XML Security Overview      http://www.datapower.com/xmldev/xmlsecurity.html
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
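
The script quoted above depends on the Python 2 `sha` module and on PyXML (`xml.dom.ext`), neither of which ships with current Python. The same idea, hashing the canonical form rather than the raw bytes, can be sketched with the standard library alone, assuming Python 3.8 or later. Note the assumptions: `xml.etree.ElementTree.canonicalize` emits Canonical XML 2.0, which is not necessarily byte-identical to the output of PyXML's `Canonicalize`; SHA-256 is swapped in for SHA-1; and the function names `digest_of_text`, `digest_of_file`, and `find_duplicates` are hypothetical, chosen only for this illustration.

```python
import hashlib
import sys
from xml.etree.ElementTree import canonicalize


def digest_of_text(xml_text):
    """Return the SHA-256 hex digest of the canonical form of an XML string."""
    canonical = canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def digest_of_file(path):
    """Same as digest_of_text, but let the XML parser read and decode the file."""
    canonical = canonicalize(from_file=path)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(paths):
    """Return (path, earlier_path) pairs whose canonical forms hash the same."""
    seen = {}  # digest -> first path seen with that digest
    duplicates = []
    for path in paths:
        digest = digest_of_file(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates


if __name__ == "__main__":
    # sys.argv[0] is the script itself, so only the remaining arguments are files.
    for path, earlier in find_duplicates(sys.argv[1:]):
        print("duplicate", path, earlier)
```

With this sketch, `<a x="1" y="2"/>` and `<a y='2' x='1'></a>` produce the same digest, because canonicalization sorts attributes, normalizes quoting, and writes empty elements as start/end tag pairs. Whitespace-only text between elements is still significant unless `strip_text=True` is passed to `canonicalize`.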




 
