Let the publisher validate the XML and then make a message digest
When an XML document is authored, the author can attach an XML schema or
DTD reference to it. The receiver of the XML document validates it
against the XML schema or DTD referenced in the document to verify that
the document is valid.
The XML document might be used over and over again without any changes
being made to it, and it might even be validated every time. This is a
waste of time!
Let the author do the validation of the finished XML document. If the
XML document has been successfully validated against the referenced XML
schema or DTD, why should the receiver of the document need to check
it again to see if it is valid, when the author has already tested it?
My suggestion is that after the document has been validated by the
author, a message digest is created, similar to the ones used in
cryptography, and the digest value is appended to the XML document.
All the receiver has to do is run the XML document through the same
message digest algorithm and compare the two results. If they are equal,
nothing in the document has changed since the author made the digest,
so there is no need to validate it again.
So this gives you not only confirmation that the document is valid
(as checked by the author), but also that its content has not changed.
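The digest-and-compare step could be sketched roughly like this in Python. This is only an illustration under some assumptions: the function names make_digest and is_unchanged are made up for the example, and SHA-256 from the standard hashlib module is used as the digest algorithm, although any algorithm works as long as author and receiver agree on it.

```python
import hashlib


def make_digest(document: bytes) -> str:
    """Author side: hex digest of the raw document bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(document).hexdigest()


def is_unchanged(document: bytes, published_digest: str) -> bool:
    """Receiver side: recompute the digest and compare it with the published one."""
    return make_digest(document) == published_digest


doc = b"<Family><Person><Name>Fred Flintstone</Name></Person></Family>"

# The author computes the digest once, after a successful validation.
published = make_digest(doc)

print(is_unchanged(doc, published))                              # True
print(is_unchanged(doc.replace(b"Fred", b"Barney"), published))  # False
```

A matching digest only tells the receiver that the bytes are the ones the author hashed. The receiver still has to trust that the author really validated the document before publishing the digest.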
This also allows DOM builders (if they are changed) to skip the process
of verifying that the data they receive from the SAX reader consists of
legal XML characters, is well-formed, etc., since those checks also add
a lot of overhead. Just look at JDOM when it builds a JDOM document.
Example:
<?xml version="1.0"?>
<Family>
<Person>
<Name>Fred Flintstone</Name>
</Person>
<Person>
<Name>Vilma Flintstone</Name>
</Person>
</Family>
When I run this through openssl and make a message digest (MD5) with the
command: "openssl dgst -md5 flintstone.xml"
it returns a digest: "b99060bb744edd6aac5193da6957afcb" (the problem
with this digest is that white space is also included!)
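One possible way around the white space problem is to digest a canonical form of the document instead of the raw file. The sketch below assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available. The strip_text=True option drops white space around text content, which is an assumption that only suits documents where that white space carries no meaning. The helper name canonical_digest and the use of SHA-256 are likewise just choices for the example.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def canonical_digest(xml_text: str) -> str:
    """Digest the canonical (C14N 2.0) form, ignoring indentation-only white space."""
    # strip_text=True removes leading/trailing white space in text and tail content,
    # so pretty-printed and compact versions serialize identically.
    canonical = canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = "<Family><Person><Name>Fred Flintstone</Name></Person></Family>"

indented = """<?xml version="1.0"?>
<Family>
  <Person>
    <Name>Fred Flintstone</Name>
  </Person>
</Family>
"""

# Same content, different layout: the canonical digests agree.
print(canonical_digest(compact) == canonical_digest(indented))
```

The price is that the receiver has to parse and canonicalize the document before hashing it, so part of the time saved by skipping validation is spent again here.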
Then we can do something like this:
<?xml version="1.0"?>
<?digest value="b99060bb744edd6aac5193da6957afcb"?> <!-- or whatever syntax fits -->
<Family>
<Person>
<Name>Fred Flintstone</Name>
</Person>
<Person>
<Name>Vilma Flintstone</Name>
</Person>
</Family>
The receiver can then read and remove the digest, and then verify it by
running the same message digest command shown before on the remaining
document.
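The read, remove and verify round trip might look something like the following sketch. Everything specific in it is an assumption made for illustration: the digest is carried in a processing instruction of the form <?digest value="..."?> placed right after the XML declaration, SHA-256 is the algorithm, and the function names attach_digest and strip_and_verify are invented here.

```python
import hashlib
import re

# Hypothetical carrier for the digest: <?digest value="..."?> plus the newline we add.
DIGEST_PI = re.compile(rb'<\?digest value="([0-9a-f]+)"\?>\n?')


def attach_digest(document: bytes) -> bytes:
    """Author side: hash the finished document and insert the digest PI."""
    digest = hashlib.sha256(document).hexdigest().encode("ascii")
    pi = b'<?digest value="' + digest + b'"?>\n'
    insert_at = 0
    if document.startswith(b"<?xml"):
        # Put the PI right after the XML declaration (and its newline, if any).
        insert_at = document.find(b"?>") + 2
        if document[insert_at:insert_at + 1] == b"\n":
            insert_at += 1
    return document[:insert_at] + pi + document[insert_at:]


def strip_and_verify(received: bytes):
    """Receiver side: remove the digest PI, re-hash the rest, compare."""
    match = DIGEST_PI.search(received)
    if match is None:
        return received, False  # no digest present: fall back to normal validation
    original = received[:match.start()] + received[match.end():]
    recomputed = hashlib.sha256(original).hexdigest().encode("ascii")
    return original, recomputed == match.group(1)


doc = (b'<?xml version="1.0"?>\n'
       b"<Family><Person><Name>Fred Flintstone</Name></Person></Family>\n")

published = attach_digest(doc)
body, ok = strip_and_verify(published)
print(ok, body == doc)

tampered = published.replace(b"Fred", b"Barney")
print(strip_and_verify(tampered)[1])
```

Because the digest is computed before the processing instruction is inserted, the receiver must remove exactly the bytes the author added, otherwise the two digests will never match.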
It could be interesting to do some benchmarking on this.
These are just some thoughts!
Regards, Niels Peter Strandberg