Re: [xml-dev] Normalizing and signing XML -- Xoxa

From: Norman Gray <norman@astro.gla.ac.uk>
To: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
Date: Thu, 28 May 2015 01:39:22 +0100
Ken, hello.

Many thanks for your very thoughtful comments -- this is exactly the sort of pushback I was hoping to get from xml-dev folk.

> On 2015 May 27, at 20:32, G. Ken Holman <gkholman@CraneSoftwrights.com> wrote:
> 
> At 2015-05-27 14:34 +0100, Norman Gray wrote:
>> Normalising and signing XML is a well-known pain in the neck.
> 
> I think I disagree with that statement.
> 
> I undertook a project to sign UBL documents and inject into the UBL instance the scaffolding with which to contain its digital signature.  Features include adding multiple signatures and adding a "final" signature (such that at that point no further signatures can be added) [1].
> 
> I also recently created the library to sign BDE documents, that don't need scaffolding and simply put the digital signature at the end of the document (thus the library works with any XML document that accepts digital signatures as the last children of the document element) [2].
> 
> For both of these projects I used the free library:
> 
>  https://www.aleksey.com/xmlsec/
> 
> Given that the library exists and is free, I didn't find it a pain-in-the-neck at all.

What you say is more true (and what I said is less straightforward) than it was when this project first went on my personal back-burner in 2012.  Compared to then, XML DigSig libraries now appear to be significantly more reliable and usable, and I'm happy to assume for the purposes of this discussion that they are functionally flawless, in terms of implementing the REC with adequate performance.

Where this 'Xoxa' approach scores is I think in two places:

  * The 'Xoxa' equivalence class of documents is larger than for XML C14N.  This is not a disadvantage, and I return to it below.

  * Because this canonicalization is simpler, and is layered on top of an XML parser API, it's more straightforwardly implementable.  It's hard to estimate, because the work was spread out over quite a long time, but I think that writing the Java and the C libraries probably took between one and two person-weeks of effort, some fraction of which was wrangling with C IPC stuff I'd half-forgotten.  If you were implementing this protocol as a custom addition to an existing tool-chain, I think it would take roughly the same amount of time.

I can't find a long list of DigSig implementations.  I know of (C) xmlsec, as you mentioned, and javax.xml.crypto.dsig in Java.  If I want to do this in Python, say, then <http://www.decalage.info/en/python/xmldsig> tells me it's at least possible, and <http://xmlsig.sourceforge.net> provides C++, but both are wrappers for xmlsec.  So it's good that xmlsec appears to be well shaken-down, but I get nervous at what appears to be a monoculture emerging.

>> XML DigSig is hard because XML Canonicalization is hard
> 
> What evidence do you cite for that?  You state it in your blog post, but I'm trying to find out what it is that is considered to be hard to do.
> 
> This isn't an idle question:  I've never implemented canonicalization because I'm using it off-the-shelf, so I honestly don't know the answer myself to the question ... what is so hard about canonicalization?  When I looked at the specification long ago it seemed straightforward to me.

I don't think I can adduce actual hard _evidence_ for this.  Partly this remark comes from a general air of notoriety about the joys of DigSig, but mostly it's from examination of the spec.  C14N and DigSig are both simple in principle, but if the C14N spec is to do its job, it has to be at least as complicated and subtle as the XML spec which, as I know from having implemented a good chunk of it, has considerably more corners and edge-cases than one might expect, including many which I have already forgotten.

I looked at the C14N spec in some detail in 2012 as part of a few-month discussion about how to sign XML documents.  I'm a standards junkie, but I found myself only a lukewarm advocate for this one, and the only reason it wasn't briskly rejected as a solution by the group was that the alternative, of signing the wire XML as a byte-string and attaching the PGP signature as a <!-- comment -->, was even more unappetising.  The latter is the solution that was eventually adopted.

> You say ...
> 
>> ; and that's hard because, I think, it's happening at the wrong level (lexical rather than structural).
> 
> ... and ...?  I don't see any explication of that statement.

I say a little more about it in the arXiv paper at <http://arxiv.org/abs/1505.04437>.

It seems to me that the core thing -- the Document -- is the parsed element tree, so that it's verging on the perverse to sign the transient/temporary/incidental artefact which is the XML serialization.  More practically, generating the canonical serialization requires a serializer which is as intimately dependent on the XML Spec's edge-cases as the XML parser is.  More practically still, generating a serialization is going to be more heavyweight/slow than generating a hash of a sequence of parse objects.
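
As a rough illustration of hashing a sequence of parse objects rather than a serialization, here is a minimal sketch in Python using only the standard library (xml.sax and hashlib).  The event tags (S, A, T, E), the length-prefixed framing and the names DigestHandler and event_digest are all assumptions made for this example; this is not the Xoxa format, which is defined in the arXiv paper.

```python
import hashlib
import xml.sax


class DigestHandler(xml.sax.ContentHandler):
    """Feed SAX events straight into a hash; no XML is re-serialised.

    The framing used here is hypothetical, chosen only for illustration.
    """

    def __init__(self):
        super().__init__()
        self.hash = hashlib.sha256()
        self._text = []  # SAX may deliver one text run in several chunks

    def _emit(self, tag, *fields):
        # Length-prefix every field so that field boundaries are unambiguous.
        self.hash.update(tag)
        for field in fields:
            data = field.encode("utf-8")
            self.hash.update(b"%d:" % len(data) + data)

    def _flush_text(self):
        # Join buffered chunks so the digest does not depend on chunking.
        if self._text:
            self._emit(b"T", "".join(self._text))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self._emit(b"S", name)
        # Attribute order is not significant in XML, so sort by name.
        for key in sorted(attrs.getNames()):
            self._emit(b"A", key, attrs.getValue(key))

    def endElement(self, name):
        self._flush_text()
        self._emit(b"E", name)

    def characters(self, content):
        self._text.append(content)


def event_digest(xml_bytes):
    """Return a hex SHA-256 digest computed over the SAX event stream."""
    handler = DigestHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.hash.hexdigest()
```

With this sketch, two inputs that differ only in attribute order, quote style or whitespace inside a start tag give the same digest, because those differences never reach the handler.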

>> I believe it's possible to define an alternative canonicalization...
> 
> Possible, sure ... but alternatives are always difficult to elbow out established and already-implemented specifications that don't fail.

Very true, hence my diffidence in describing this here.

>> which accepts a larger range of input documents as equivalent
> 
> Hopefully provided that the content of the information set is not different ... if two different infosets are considered equivalent, won't there be integrity issues?  If the infosets are the same, why use something other than the infoset?  Where is the difficulty with working with the infoset?

This 'Xoxa' procedure implies a larger equivalence class than the C14N one.  Two documents such as '<p>one two&amp;</p>' and '<p>one  \n\r  two&#x26;</p>' will have different infosets but the same Xoxa canonicalisation.  But:

  * In many cases this won't matter.  One might even guess that most XML applications (for some value of 'most') will try hard to normalise those differences away.  At any rate, this approach would only apply to that (important) subset of applications where this doesn't matter.

  * Consider XSLT output: it can be hard to control just what whitespace appears on the other side of the serializer.  It's not impossible, but in my experience it's an annoying and unproductive puzzle.

  * APIs may normalise at least some of this information away.  By the time information has got to your SAX handlers a lot of whitespace edge-cases have been normalized away as part of ordinary XML processing.  This is another way of saying (i) applications which use SAX/Expat are cases where the different-infoset distinction doesn't matter, because (ii) that detailed infoset information isn't available to the application.

  * A procedure which relies only on the information available from the SAX/Expat API enables a disproportionate simplification in the specification and implementation of this normalization.
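
To make the larger equivalence class concrete, the following sketch uses Python's standard xml.parsers.expat module to reduce a document to a list of events, collapsing runs of XML whitespace in character data.  The collapsing rule and the function name normalised_events are assumptions for illustration, not the normative Xoxa rules.  The two example documents are written here with the ampersand as &amp; in one and as the hexadecimal character reference &#x26; in the other.

```python
import re
import xml.parsers.expat

# The four XML whitespace characters: space, tab, CR, LF.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalised_events(xml_text):
    """Return element and text events with whitespace runs collapsed.

    The collapsing rule is an assumption made for illustration only.
    """
    events = []
    text = []  # expat may split one text run into several callbacks

    def flush():
        collapsed = _XML_WS.sub(" ", "".join(text)).strip(" ")
        text.clear()
        if collapsed:
            events.append(("text", collapsed))

    def start(name, attrs):
        flush()
        events.append(("start", name, tuple(sorted(attrs.items()))))

    def end(name):
        flush()
        events.append(("end", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text.append
    parser.Parse(xml_text, True)
    return events


a = normalised_events("<p>one two&amp;</p>")
b = normalised_events("<p>one  \n\r  two&#x26;</p>")
print(a == b)  # the two documents reduce to the same event list
```

The parser has already resolved the entity and character references and normalised the line endings before the handlers are called, so the only extra work in this sketch is the whitespace collapse.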

> I just read the Gutmann paper you cited ... parts are disingenuous.  Reading the following "Much more worrying though is the fact that at the semantic level XML, like MS Word, consists of highly dynamic content, but about two orders of magnitude more complex than Word." near the start of the document immediately throws me off his perspective.  I disagree right there.  I don't think well-formed XML has semantics (other than inherent hierarchical relationships) and I don't think DTD-valid XML has very many semantics (other than referential integrity).  I believe XML defines syntax and users define semantics.

I also disagree with large chunks of his paper, sometimes using quite distinguished language.  But I agree with what I take to be his central point, that cryptographic signature and message-digest operations are fundamentally defined on byte-streams, and (though this isn't how he puts it) that an XML document is fundamentally not a byte-stream.  There's an impedance mismatch here: C14N attempts to bridge it by defining a byte-stream which is associated with a given InfoSet but, by insisting that that byte-stream must itself be in XML Instance syntax, it creates a complicated and possibly brittle problem for itself.

> And that led me to read the James Clark blog post that cites, among other things, about the awkwardness of signing things other than XML when combined with XML ... but I see that as out of scope of your post today.  Your post today is talking about the signing of an XML document, not a combination of XML and other stuff (or did I miss something?).

This is <http://blog.jclark.com/2007/10/bytes-not-infosets.html>.  I'd struggle to give a compact account of what Clark is saying here, because I think he's in effect responding to Gutmann's more amorphous points.

>> , and which has fewer bells and whistles (and gongs and bugles), but which is much simpler both to define and to implement. I describe that in a preprint at arXiv [2], and it's illustratively implemented in both C and Java in a library called Xoxa[3]
>> 
>> I've expanded on this in a blog-post at <http://text.nxg.me.uk/2015/b65m>.
> 
> Your "normalization" looks a lot like James Clark's NSGMLS output ... was that your inspiration?  I've used that in my "n2x" free resource that converts SGML to XML [3].

Yes, it's completely based on his ESIS output, because that was ready-to-hand in my head, from long familiarity.  The only real differences (if I recall correctly) between the information in Clark's ESIS and the information available to a SAX/Expat user come from the latter's support for namespaces.
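
For readers who haven't met ESIS, here is a rough sketch of what an ESIS-like event listing with namespace support might look like, built on Python's standard xml.sax module.  It borrows the nsgmls line prefixes ('(' for element start, ')' for element end, 'A' for an attribute, '-' for data), but the '{uri}local' spelling of expanded names, the attribute sorting and the names EsisLikeHandler and esis_like are assumptions made for this example.  It is neither byte-for-byte nsgmls output nor the Xoxa normalization.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces


class EsisLikeHandler(xml.sax.ContentHandler):
    """Collect ESIS-like lines from namespace-aware SAX events."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # buffer, since text may arrive in several chunks

    @staticmethod
    def _expand(name):
        # name is a (namespace URI or None, local name) pair.
        uri, local = name
        return "{%s}%s" % (uri, local) if uri else local

    def _flush_text(self):
        if self._text:
            data = "".join(self._text)
            data = data.replace("\\", "\\\\").replace("\n", "\\n")
            self.lines.append("-" + data)
            self._text = []

    def startElementNS(self, name, qname, attrs):
        self._flush_text()
        # As in nsgmls output, attribute lines precede the start line.
        for attr in sorted(attrs.getNames(), key=self._expand):
            self.lines.append(
                "A%s CDATA %s" % (self._expand(attr), attrs.getValue(attr)))
        self.lines.append("(" + self._expand(name))

    def endElementNS(self, name, qname):
        self._flush_text()
        self.lines.append(")" + self._expand(name))

    def characters(self, content):
        self._text.append(content)


def esis_like(xml_bytes):
    """Parse xml_bytes and return the list of ESIS-like lines."""
    handler = EsisLikeHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.lines
```

Because the lines carry expanded names rather than prefixed ones, a document using the prefix 'x:' and one using a default namespace declaration for the same URI produce identical listings in this sketch.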

> How much of an XML processor is needed to create your normalization?  If one needs a conformant XML processor in order to create your normalization, how is it making things that much simpler than working with the infoset?  I thought a SAX processor gives one an infoset, and you are talking about using SAX.

My first thought was that I could implement this without using a full XML parser.  I then spent about a week implementing a large fraction of an XML parser, before realising that that's what I'd just done (doh!), and that the rate of increase of conformance unit-tests wasn't slowing down.  So I chucked that away (sob!) and just used a third-party parser.

A SAX processor doesn't give you a full infoset, but the subset of it which is most useful in practice.  It's probably no coincidence that the ESIS output, the SAX API information, and Expat (etc) all give pretty much the same subset.

Xoxa normalizes that subset, and can naturally do so in-place, rather than by reserializing the document and signing/digesting that.

> I think it would be a challenge to justify moving away from the established standards.
> 
> If two parties wish to do whatever they want with your specification or with any other non-standard way of expressing an XML document as being signed, that is up to those involved.  But there already exists a deployed solution for standalone XML documents.

Indeed.

> I hope this is considered constructive.

Very much so -- it's clarified a couple of points for me.  Thanks for your detailed thoughts.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK