Re: [xml-dev] Difference between "normalize" and "canonicalize"?


From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: <xml-dev@lists.xml.org>
Date: Wed, 25 Feb 2009 08:11:41 -0500

Personally, I see "normalization" as changing the information into 
something that is common, while "canonicalization" is representing 
something in a common way without changing it.

When line-ending sequences are normalized, they are changed into new 
values without their old values being retained.  On DOS and Mac and 
mainframe systems, different line-ending sequences are changed to the 
line-feed character.  Once you have the line-feed character there is 
no going back to what it was.  If there was a bare line-feed in the 
DOS file, there is no distinguishing that authored line-feed from a 
line-feed produced by normalizing a line ending.
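
As a rough sketch of that loss in Python (the function name is my own, 
and it only handles the CR LF and lone CR cases that XML 1.0 folds to 
a line-feed, not mainframe conventions):

```python
def normalize_line_endings(text: str) -> str:
    """Map CR LF and lone CR to LF (hypothetical helper for illustration)."""
    # Replace CR LF first so it does not become two line-feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


dos_text = "one\r\ntwo\nthree\r\n"   # CR LF endings plus one bare LF
unix_text = "one\ntwo\nthree\n"

# Two different inputs collapse to the same output, so the original
# line endings cannot be recovered from the result.
assert normalize_line_endings(dos_text) == normalize_line_endings(unix_text)
```

Because the mapping is many-to-one, no inverse function can exist.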

The normalize-space() function strips leading and trailing white-space 
and changes each remaining sequence of white-space characters into a 
single space.  The information is changing and you can't undo it once 
you have the normalized string.  There's no way to go back to the 
original arbitrary sequence of white-space characters.
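
A minimal Python approximation of normalize-space(), assuming the XML 
definition of white-space (space, tab, carriage return, line-feed); 
the helper name is mine and this is not an XPath implementation:

```python
import re


def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space() (illustrative sketch only)."""
    # Collapse each run of XML white-space into a single space ...
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    # ... then drop the single leading/trailing space that may remain.
    return collapsed.strip(" ")


# Different inputs give the same result, so the original spacing is lost.
assert normalize_space("  a \t\n b ") == normalize_space("a b")
```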

The normalize-unicode() function changes a character without the 
ability to go back.  Using NFKC normalization on U+1E9B creates 
U+1E61 and you can't go back because you've changed the Latin 
character that is the basis of the Unicode character from a long s to 
a simple s.
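
The same mapping can be checked with Python's standard unicodedata 
module, used here as a stand-in for normalize-unicode() (a sketch, 
not the XPath function itself):

```python
import unicodedata

long_s_with_dot = "\u1e9b"   # LATIN SMALL LETTER LONG S WITH DOT ABOVE
plain_s_with_dot = "\u1e61"  # LATIN SMALL LETTER S WITH DOT ABOVE

normalized = unicodedata.normalize("NFKC", long_s_with_dot)

# The long s has become a simple s ...
assert normalized == plain_s_with_dot
# ... and a string that started out as U+1E61 gives the same result,
# so nothing in the output says which character was authored.
assert unicodedata.normalize("NFKC", plain_s_with_dot) == normalized
```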

On the other hand, canonicalization doesn't change the information, 
or the meaning of the information, it merely makes assumptions about 
how that information is presented or organized.  One can then recover 
another arbitrary representation or organization again without 
changing the meaning.  Consider empty elements: they can be written 
either as "<abc/>" or "<abc></abc>" and the meaning of the two is 
identical.  In an XML processor you cannot distinguish between the 
two.  However, processes that do not use an XML processor (a 
byte-for-byte comparison, say) need a common representation, so that 
two users who see an empty element write it in the same canonical 
form and those processes see the same information.  But the 
information hasn't changed at all.
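
To illustrate, here is a small sketch using the canonicalize() 
function from Python's xml.etree.ElementTree (available in Python 3.8 
and later, which is an assumption about your environment):

```python
from xml.etree.ElementTree import canonicalize

short_form = "<abc/>"
long_form = "<abc></abc>"

canonical_short = canonicalize(short_form)
canonical_long = canonicalize(long_form)

# Both spellings of the empty element get the same canonical text,
# so a plain string comparison now agrees with what a parser sees.
assert canonical_short == canonical_long
print(canonical_short)
```

Either spelling can be produced again from the canonical one and an 
XML processor would report the same empty element in each case.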

So I see normalization as destructive and canonicalization as not destructive.

Normalizing information creates a common form without necessarily 
being able to recover the original form because the information is 
being changed.

Canonicalizing information creates a common form merely by convention 
and one could then change that to another alternate form simply by 
following a different convention without changing the information.

So I personally don't consider the two terms the same.

But I also don't think they are always consistently applied with such 
nuance and I wouldn't be surprised to find some users of the terms 
using them interchangeably.  But when I'm given the choice I perceive 
a distinction.

I hope this helps.

. . . . . . . . . . . Ken

At 2009-02-25 06:38 -0500, Costello, Roger L. wrote:
>Hi Folks,
>
>Consider these two sentences:
>
>
>1. When an XML parser reads in an XML document it normalizes all 
>line breaks to \n.
>
>2. A canonicalizer tool will canonicalize empty elements to 
>start-tag, end-tag pairs.
>
>
>Both "normalize" and "canonicalize" seem to mean:
>
>    Put into a standard form.
>
>Do they in fact mean the same thing? If so, why have two terms? Why 
>not have just one term?
>
>/Roger

--
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/

