xml-dev - Re: Attribute value normalization

From: Richard Tobin <richard@cogsci.ed.ac.uk>
To: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, xml-dev@ic.ac.uk
Date: Wed, 27 May 1998 12:35:14 +0100 (BST)

> While translating the XML specification, I find that I do not understand 
> the attribute normalization mechanism of XML.

The result produced by RXP and LT-XML is given at the end (except that
carriage return characters have been replaced by the sequence ^M for
ease of reading).  Here is my explanation for each case.  The relevant
section of the standard is of course 3.3.3.

> <test a="
> 
> test
> 
> test
> 
> "/>

In this case, the linefeeds (or whatever record boundaries are in your
system) are replaced by spaces. Then, because the type is not CDATA, the
leading and trailing spaces are removed and each remaining run of spaces
is compressed to a single space.  So the result is

  <test a="test test"/>

This is of course the intended way for NMTOKENS to work.
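A minimal Python sketch of this first case, assuming the value contains no references and that its line ends have already been reduced to linefeeds on input (section 2.11). The function name `normalize_plain` is hypothetical; it models the rules of section 3.3.3 and is not the RXP or LT-XML code.

```python
# Hypothetical model of section 3.3.3 for a value with no references.
# Assumes line ends were already reduced to LF on input (section 2.11).
WHITESPACE = "\x20\x0d\x0a\x09"

def normalize_plain(value: str, cdata: bool) -> str:
    # Every whitespace character becomes a single space.
    result = "".join(" " if ch in WHITESPACE else ch for ch in value)
    if not cdata:
        # Non-CDATA types (e.g. NMTOKENS): drop leading and trailing spaces
        # and collapse each run of spaces to one space.
        result = " ".join(part for part in result.split(" ") if part)
    return result

# The first test case after line-end handling: LF LF test LF LF test LF LF
value = "\n\ntest\n\ntest\n\n"
print(repr(normalize_plain(value, cdata=False)))  # 'test test'
print(repr(normalize_plain(value, cdata=True)))   # '  test  test  '
```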

> <test a="&D;&A;&D;&A;test&D;&A;&D;&A;test&D;&A;&D;&A;"/>
> <test a="&DA;&DA;test&DA;&DA;test&DA;&DA;"/>

In these cases the character references were expanded (into carriage
returns and linefeeds) when the general entities were defined.  So
when the replacement text of the entities is "recursively processed",
each carriage return and linefeed is turned into a space.  The leading
and trailing spaces are then stripped and the internal runs compressed,
producing the same result as the first case.

[However, if the attribute were of type CDATA, the result would be
different from the first case: each run would contain 4 spaces instead
of 2, because the CR/LF pairs in the first case were reduced to linefeeds
(probably on input, see section 2.11), whereas in these cases the CR/LF
characters come from character references, so they do not form a pair in
the *literal* entity value of the internal entity and are not merged.]
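A sketch of the entity cases, extending the same hypothetical model so that a general entity reference is handled by recursively processing its replacement text. The `ENTITIES` table is an assumed stand-in for the DTD in the output, holding replacement texts in which the character references were already expanded at declaration time. It omits the section 3.3.3 rule that a literal "#xD#xA" pair in an entity value yields a single space, since none of these literal entity values contains such a pair.

```python
import re

# Hypothetical table modelled on the test DTD. The character references in
# each declaration were expanded when it was declared, so the replacement
# texts contain real CR (\r) and LF (\n) characters.
ENTITIES = {"D": "\r", "A": "\n", "DA": "\r\n"}

WHITESPACE = "\x20\x0d\x0a\x09"
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def process(text: str, out: list) -> None:
    """Append the processed form of text to out, expanding entity refs."""
    pos = 0
    for m in ENTITY_REF.finditer(text):
        out.extend(" " if ch in WHITESPACE else ch for ch in text[pos:m.start()])
        # Entity reference: recursively process the replacement text, so the
        # CR and LF characters it contains each turn into a space.
        process(ENTITIES[m.group(1)], out)
        pos = m.end()
    out.extend(" " if ch in WHITESPACE else ch for ch in text[pos:])

def normalize(value: str, cdata: bool) -> str:
    out = []
    process(value, out)
    result = "".join(out)
    if not cdata:
        # Non-CDATA: strip leading/trailing spaces, collapse internal runs.
        result = " ".join(part for part in result.split(" ") if part)
    return result

for value in ("&D;&A;&D;&A;test&D;&A;&D;&A;test&D;&A;&D;&A;",
              "&DA;&DA;test&DA;&DA;test&DA;&DA;"):
    print(repr(normalize(value, cdata=False)))  # 'test test'
    print(repr(normalize(value, cdata=True)))   # '    test    test    '
```

With `cdata=True` each run holds 4 spaces, matching the bracketed remark about the CDATA case.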

> <test a="&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;"/>
> <test a="&#xD;&#xD;test&#xD;&#xD;test&#xD;&#xD;"/>
> <test a="&#xA;&#xA;test&#xA;&#xA;test&#xA;&#xA;"/>

In these cases, the referenced characters are appended directly, but
unlike the case of general entity references the result is not
recursively processed.  So there are no space characters to normalise,
and the result is the same as if the attribute had had type CDATA - that
is, the carriage returns and linefeeds appear in the normalised value.
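A similar sketch of the character-reference cases, again a hypothetical model rather than the RXP code: each referenced character is appended as-is and never re-examined, so only literal whitespace in the value would be turned into spaces and then stripped or compressed.

```python
import re

WHITESPACE = "\x20\x0d\x0a\x09"
CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")

def normalize(value: str, cdata: bool) -> str:
    out = []
    pos = 0
    for m in CHAR_REF.finditer(value):
        # Literal whitespace in the attribute value becomes a space.
        out.extend(" " if ch in WHITESPACE else ch for ch in value[pos:m.start()])
        # Character reference: append the referenced character directly.
        # It is not processed again, so a CR or LF stays a CR or LF.
        hex_digits, dec_digits = m.groups()
        out.append(chr(int(hex_digits, 16) if hex_digits else int(dec_digits)))
        pos = m.end()
    out.extend(" " if ch in WHITESPACE else ch for ch in value[pos:])
    result = "".join(out)
    if not cdata:
        # Only space (#x20) characters are stripped and collapsed, so the
        # CRs and LFs from the references survive.
        result = " ".join(part for part in result.split(" ") if part)
    return result

cases = [
    "&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;",
    "&#xD;&#xD;test&#xD;&#xD;test&#xD;&#xD;",
    "&#xA;&#xA;test&#xA;&#xA;test&#xA;&#xA;",
]
for value in cases:
    # Same result whether or not the type is CDATA.
    assert normalize(value, cdata=False) == normalize(value, cdata=True)
    print(repr(normalize(value, cdata=False)))
```

The three printed values correspond to the last three attributes in the RXP/LT-XML output, with `\r` where that output shows ^M.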

Here is the RXP/LT-XML output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA|test)*>
<!ATTLIST test 
        a NMTOKENS #IMPLIED>
<!ENTITY D "&#xD;"> 
<!ENTITY A "&#xA;">
<!ENTITY DA "&#xD;&#xA;">  ]>
<test>
<test a="test test"/>
<test a="test test"/>
<test a="test test"/>
<test a="^M
^M
test^M
^M
test^M
^M
"/>
<test a="^M^Mtest^M^Mtest^M^M"/>
<test a="

test

test

"/>
</test>

-- Richard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
