   Re: Attribute normalisation and character entities

  • From: David Brownell <david-b@pacbell.net>
  • To: Richard Tobin <richard@cogsci.ed.ac.uk>
  • Date: Thu, 27 Jan 2000 15:56:44 -0800

Richard Tobin wrote:
> In article <3890CE2A.633285@pacbell.net>,
> David Brownell <david-b@pacbell.net> wrote:
> >There are two curious points in 3.3.3 ... first, that character and
> >entity refs may appear, and second that CRLF sequences may appear (line
> >endings already having been normalized).
> What makes you sure line ends have already been normalised?

http://www.w3.org/XML/xml-19980210-errata#E24 ... the first sentence
that replaces 3.3.3 in the REC says so.

> In 2.11
> it refers to converting them to #xA before passing them to the
> application, and suggests that it can be implemented by normalising
> before parsing (but doesn't have to be).
> I take the line-end conversion in 3.3.3 as duplicating the requirement
> in 2.11.  If you implement it by normalising before parsing, you won't
> have to do anything about it in attribute normalisation.

The errata preclude that interpretation.  Line end normalization is done
first, and yet afterwards you can still find a CRLF (or a plain CR) in the
pre-normalization attribute text.
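
One way a CR can survive the line-end pass is via character references,
which 2.11 normalization does not touch. The following is only a rough
sketch of that ordering, not spec text: the function names are made up
for illustration, and general entity references are ignored.

```python
import re

# Hypothetical helpers for illustration; not taken from any real parser.
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

def normalize_line_ends(text: str) -> str:
    """2.11-style pass: literal CRLF and lone CR both become a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def expand_char_refs(text: str) -> str:
    """Replace &#NNN; and &#xHHH; with the characters they reference."""
    def repl(m: re.Match) -> str:
        body = m.group(1)
        return chr(int(body[1:], 16) if body.startswith("x") else int(body))
    return _CHAR_REF.sub(repl, text)

# A literal CRLF followed later by a CRLF spelled as character references.
raw = "a\r\nb&#13;&#10;c"
after_eol = normalize_line_ends(raw)    # literal CRLF is gone, refs untouched
expanded = expand_char_refs(after_eol)  # the referenced CR and LF now appear
```

Here the literal CRLF is folded to LF first, but the referenced pair
only turns into real CR and LF characters once references are expanded,
i.e. after line-end normalization has already happened.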

> Similarly, I think the entity expansion in 3.3.3 is duplication of
> 4.4.

That was one of the options I presented, along with some of the
spec inconsistencies that interpretation introduces.

> And finally, I suspect that the authors just forgot the possibility
> of non-#x20 whitespace (arising from character entity references) in
> the paragraph about trimming and compressing spaces.

Didn't I identify a few more problems with 3.3.3 than that??  ;-)

> The simplest solution seems to me to leave normalisation as it is, and
> change the Names and Nmtokens productions (which are only used for
> tokenised attribute) to require #x20 rather than S.  This would make
> "foo&#9;bar" illegal as a tokenised attribute, and a good thing too.

That'd only affect a couple of validity constraints, and wouldn't address
the fact that the spec is problematic in multiple aspects of attribute
normalization.  (I'll refrain from proposing a fix though!)
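
For concreteness, the "foo&#9;bar" case can be sketched as below. This
is a simplified reading of 3.3.3, under stated assumptions: it handles
character references only (no general entities), Name is cut down to
ASCII, and every identifier is hypothetical.

```python
import re

CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

def normalize_attribute(raw: str, cdata: bool) -> str:
    """Simplified 3.3.3 sketch; assumes line ends were already normalized."""
    out = []
    pos = 0
    for m in CHAR_REF.finditer(raw):
        # Literal whitespace characters are each replaced by #x20 ...
        out.append(re.sub(r"[\t\n\r]", " ", raw[pos:m.start()]))
        # ... but a referenced character is appended as-is.
        body = m.group(1)
        out.append(chr(int(body[1:], 16) if body[0] == "x" else int(body)))
        pos = m.end()
    out.append(re.sub(r"[\t\n\r]", " ", raw[pos:]))
    value = "".join(out)
    if not cdata:
        # Tokenized types: trim and collapse #x20 only, not all of S.
        value = re.sub(r" +", " ", value.strip(" "))
    return value

# ASCII-only stand-in for the Name production (an assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
NAMES_WITH_S = re.compile(rf"{NAME}(?:[ \t\r\n]+{NAME})*")  # Names using S
NAMES_WITH_SP = re.compile(rf"{NAME}(?: {NAME})*")          # Names using #x20

value = normalize_attribute("foo&#9;bar", cdata=False)
ok_with_s = NAMES_WITH_S.fullmatch(value) is not None
ok_with_sp = NAMES_WITH_SP.fullmatch(value) is not None
```

Under this reading the tab from &#9; survives trimming, so the value
matches an S-based Names pattern but not a #x20-only one, which is the
effect Richard's proposed change would have.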

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
