   Re: Questions on XML syntax and conformance issues

  • From: David Brownell <david-b@pacbell.net>
  • To: morus.walter@gmx.de
  • Date: Wed, 08 Mar 2000 13:03:29 -0800

[ crossposting to the OASIS XML conformance WG, which should
  be seeing and dealing with such issues ]

Takuki Kamiya wrote:
> Morus Walter <morus.walter@gmx.de> wrote:
> >
> > The test suite says test 'valid-sa-094' (from James Clark's test cases) is
> > not well-formed.
> > .... Hence I would not regard this
> > as an entity reference. Any comments on that?
> I agree.

I think that's widely believed, and I know that OASIS has this on a list
of things to resolve.  I don't know when it'll be done though.  

Note that in section 2.8, after the [29] markupdecl production and right
before the "PEs-in-internal-subset" WFC that's relevant there, the spec
is quite explicit:

	... individual nonterminals describe the declarations
	<em>after</em> all the parameter entities have been included.

That conflicts with section 4.4 saying that PE handling is really a
boatload of special cases, including the one being used to claim this
test case isn't malformed.   Seems like maybe W3C should change that
first (and most formative -- since it's the only one with a simple
model behind it!) description of PE processing.

> > Attribute normalization: ...
> But what it says must be accepted as what it means when we are dealing with
> conformance tests; I mean the tests are not to be conformant otherwise.

That's the ideal.  However, earlier discussions on this topic turned
up quite a number of internal inconsistencies in the XML 1.0 spec in
this area.  (See my post, appended.)  The issue is related to the one
above:  entity processing for attributes is fuzzy.

> See http://www.w3.org/XML/xml-19980210-errata#E61
> Therefore I believe that now test case sa02 needs to be corrected.

I glanced at that a while back, and think that maybe _one_ of those
inconsistencies I noted has now been addressed.  Maybe.  Until that
family of issues is fully addressed, I think it'd be best to think
of that issue as open ... lest some fixes cause later unfixes.

(Also, for the record, if one takes XP 0.5 as the best effort "by W3C"
to support XML conformance, I seem to recall that it was used to create
that particular output test.)

- Dave

APPENDED:  my xml-dev post, seemingly not archived, identifying
several seeming internal inconsistencies in the XML spec.


Subject: Re: Attribute normalisation and character entities
Date: Thu, 27 Jan 2000 15:00:58 -0800
From: David Brownell <david-b@pacbell.net>
Organization: Yoyodyne Systems Labs
To: Richard Tobin <richard@cogsci.ed.ac.uk>
CC: xml-dev@ic.ac.uk

Richard Tobin wrote:
> How is an attribute containing a character reference to a whitespace
> character (other than space) supposed to be normalised?
> Section 3.3.3 seems to me to say that character references are not
> subject to the translation to #x20 - the four bulleted points are
> an exhaustive disjunction.
> However the OASIS test suite, in tests sa02 and not-sa02, requires
> that they are replaced with spaces.
> Which is correct?

As a data point, those output tests were originally generated using
the then-current version of XP.  I suspect Tom Passin's observation
is close:  except for CDATA, _whitespace_ should be replaced with just
one space.
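
The two readings can be compared against a present-day parser. What
follows is a minimal sketch, assuming the expat-backed parser in Python's
xml.etree.ElementTree; the helper attr_value, the element name doc and
the attribute name a are made up for illustration. It shows what one
implementation does today and does not settle what the spec text was
meant to say at the time.

```python
import xml.etree.ElementTree as ET


def attr_value(attr_source: str) -> str:
    """Parse <doc a="..."/> and return the value of a as the parser reports it."""
    return ET.fromstring(f'<doc a="{attr_source}"/>').get("a")


# A raw TAB character typed directly into the attribute value.
literal_tab = attr_value("foo\tbar")

# The same character written as a character reference.
char_ref_tab = attr_value("foo&#9;bar")

print(repr(literal_tab))
print(repr(char_ref_tab))
```

With this parser the raw TAB comes back as a space, while the TAB
written as &#9; comes back unchanged, which matches the reading Richard
gives of 3.3.3 rather than the sa02 expected output.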

As I've commented elsewhere, I find that much of the entity processing
in the XML spec seems to be specified as a collection of special cases
(updated via errata as inconsistencies turn up) rather than being based
on simple and consistent rules.  This is another place that it seems to
be happening.

There are two curious points in 3.3.3 ... first, that character and
entity refs may appear, and second that CRLF sequences may appear (line
endings already having been normalized).

How would these appear?  If we assume that 4.4 applies first, then
those OASIS cases are correct, and they'd appear "doubly escaped" as:

        char-ref-attr = "foo &#38;#9; bar"
        ent-ref-attr1 = "AT&#38;amp;T"
        ent-ref-attr2 = "AT&amp;amp;T"
        crlf-attr     = "a&#xD;&#xA;b"

If we assume that 3.3.3 has needless duplication of 4.4 then I
can't see how the literal CRLF can ever show up as input to the
normalization, since line-ends have already been normalized.

On the other hand, I don't think anyone actually writes what
ent-ref-attr2 has -- "AT&amp;T" is it.  Perhaps 4.4 applies
first, _and_ there is needless duplication (for entity refs).
Or 3.3.3 has both duplication and several errors.
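
The attribute forms above can be run through the same kind of parser.
This is only a sketch, assuming the forms are written as literal
attribute values in a start tag (the post does not say whether they were
meant that way or as entity replacement text) and assuming Python's
expat-backed xml.etree.ElementTree. The entries plain-amp-attr and
raw-crlf-attr, the element name doc and the helper parsed_values are
additions for comparison, not part of the original message.

```python
import xml.etree.ElementTree as ET

# Attribute source text exactly as it would be typed inside the quotes.
SAMPLES = {
    "char-ref-attr": "foo &#38;#9; bar",
    "ent-ref-attr1": "AT&#38;amp;T",
    "ent-ref-attr2": "AT&amp;amp;T",
    "crlf-attr": "a&#xD;&#xA;b",
    # Hypothetical extras for comparison (not in the post):
    "plain-amp-attr": "AT&amp;T",
    "raw-crlf-attr": "a\r\nb",  # a literal CR LF pair in the source
}


def parsed_values(samples: dict) -> dict:
    """Put every sample on one <doc/> element and return the parsed attribute values."""
    attrs = " ".join(f'{name}="{source}"' for name, source in samples.items())
    return dict(ET.fromstring(f"<doc {attrs}/>").attrib)


for name, value in parsed_values(SAMPLES).items():
    print(f"{name:15} {value!r}")
```

Under these assumptions the doubly escaped forms are not rescanned:
char-ref-attr keeps the five characters &#9; as text and both ent-ref
forms yield AT&amp;T, while plain-amp-attr yields AT&T. The CR and LF
written as character references survive as two characters, whereas the
literal CRLF pair collapses to one space.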

- Dave


