If you're sending XML from one place to another and it may be 'touched' by
one or more XML processors, then you can't use md5sum, because the XML may
be parsed and rewritten in a way that is semantically equivalent but
syntactically different, so the MD5 digest no longer matches. E.g., attributes
may be written out in a different order, or empty elements can be written
as either <foo></foo> or <foo/>.
In order to deal with this, you should canonicalize the XML and then hash
that bytestream; this will give an identical digest value, even in the
face of those changes. For the hashing, use SHA1. For the
canonicalization, use exclusive c14n (exc-c14n), as it is more robust when
your XML is transported inside other XML (e.g., it becomes the body of a SOAP
message). If you are always generating the XML, you might be able to make
some simplifying assumptions and come up with a simpler c14n mechanism; I
strongly suggest you avoid the temptation to do that. If, in fact, you
use exc-c14n/sha1, you can probably leverage a large pool of bundled
and/or open source code, because those mechanisms are used in WS-Security
for generating a digital signature of a SOAP message; in essence you are
generating a <dsig:Reference> element of a standard XML digital signature,
as defined by W3C/IETF.
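To make the canonicalize-then-hash idea concrete, here is a minimal Python sketch. It rests on some assumptions: it uses the standard library's xml.etree.ElementTree.canonicalize (Python 3.8+), which implements C14N 2.0 rather than the exclusive c14n recommended above, so its output will not interoperate with an exc-c14n digest. If lxml is available, etree.tostring(tree, method="c14n", exclusive=True) is closer to what is described here. The function name xml_digest and the sample documents are made up for illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def xml_digest(xml_text: str) -> str:
    """Return the SHA-1 hex digest of the canonical form of xml_text.

    Assumption: stdlib C14N 2.0 stands in for exclusive c14n here.
    """
    # Canonicalization sorts attributes and writes empty elements
    # as explicit start/end tag pairs.
    canonical = canonicalize(xml_text)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Two hypothetical documents: same content, different attribute
# order and different empty-element syntax.
a = '<order id="7" currency="USD"><note></note></order>'
b = '<order currency="USD" id="7"><note/></order>'

# Hashing the raw bytes gives different values ...
raw_a = hashlib.sha1(a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha1(b.encode("utf-8")).hexdigest()
print(raw_a == raw_b)

# ... while hashing the canonical form gives the same value.
print(xml_digest(a) == xml_digest(b))
```

Note that canonicalization does not normalize everything: whitespace inside element content is significant by default, so a processor that re-indents the document will still change the digest.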
The second question is: how do you "protect" the digest value? Are you
concerned about tampering along the way? How do you currently protect
your md5sum values? It may be enough to generate the XML digest and
send/store it the same way you do your md5sum value. Or you might need to
go whole hog and use an XML signature.
Hope this helps.