   Questions on XML syntax and conformance issues

  • From: Morus Walter <morus.walter@gmx.de>
  • To: xml-dev@xml.org
  • Date: Tue, 07 Mar 2000 19:07:08 +0100 (CET)

Hi,

I have a few questions on the xml spec and conformance issues.

I'm currently trying to finish a validating xml parser written in C++.
I tried to check the parser against the oasis test suite and found a few
things I don't understand:


The test suite says that test 'valid-sa-094' (from James Clark's test cases)
is not well-formed.
<!DOCTYPE doc [
<!ENTITY % e "foo">
<!ELEMENT doc (#PCDATA)>
<!ATTLIST doc a1 CDATA "%e;">
]>
<doc></doc>
The problem they see seems to be the "%e;" in the attribute value.
If this were a PE reference, it would be forbidden in the internal subset.
However, I don't think it is one. Attribute values are defined as
  [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
                  | "'" ([^<&'] | Reference)* "'"
so '%' does not have a special meaning here. Hence I would not regard this
as an entity reference. Any comments on that?
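
To illustrate this reading of production [10], here is a minimal sketch in
Python (not taken from any existing parser). The name scan_att_value and the
ASCII-only approximation of Name are assumptions made for the example. It
tokenizes the text between the quotes and treats only '&' as the start of a
Reference, so '%' falls under the ordinary-character branch.

```python
import re

# Reference ::= EntityRef | CharRef; both alternatives start with '&'.
# Name is approximated with ASCII characters only (an assumption of this sketch).
REFERENCE = re.compile(r"&(?:#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z_:][A-Za-z0-9._:-]*);")


def scan_att_value(body, quote='"'):
    """Tokenize the text between the quotes of an AttValue.

    Returns a list of ('ref', text) and ('char', text) pairs,
    following production [10] as quoted above.
    """
    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "&":
            m = REFERENCE.match(body, i)
            if m is None:
                raise ValueError("'&' must start a Reference")
            tokens.append(("ref", m.group()))
            i = m.end()
        elif ch == "<" or ch == quote:
            raise ValueError("%r is not allowed in this AttValue" % ch)
        else:
            # Every other character, '%' included, is plain character data.
            tokens.append(("char", ch))
            i += 1
    return tokens
```

Under this reading, scan_att_value("%e;") yields three plain characters and
no reference, whereas scan_att_value("a&amp;b") yields one 'ref' token.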


Attribute normalization:
The standard says that WS should be mapped to blanks and character references
to the referenced character.
For non-CDATA attributes, sequences of *blanks* should then be mapped to
single spaces.
So if I have, e.g., an NMTOKENS attribute 'a&#10;b', step one creates 'a\nb'
(where \n denotes a linefeed).
Now what is step two supposed to do? According to the spec nothing.
However, the test case sa02 (from the Sun test cases) says that the result
value for the attribute should be 'a b'.
Actually, this makes much more sense than the result of literally following
the spec.
So is the spec wrong here? Should it read
'If the declared value is not CDATA, then the XML processor must
further process the normalized attribute value by discarding any
leading and trailing WS characters, and by replacing
sequences of WS characters by a single space (#x20)
character.' instead of 
'If the declared value is not CDATA, then the XML processor must
further process the normalized attribute value by discarding any
leading and trailing space (#x20) characters, and by replacing
sequences of space (#x20) characters by a single space (#x20)
character.'?
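
The difference between the two wordings can be made concrete with a minimal
sketch in Python. The function names step_one, step_two_literal and
step_two_any_ws are hypothetical, and the sketch only handles character
references (entity references and end-of-line handling are left out).

```python
import re

CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")
WS = "\x20\x09\x0a\x0d"  # space, tab, linefeed, carriage return


def step_one(raw):
    """Step one: literal WS becomes #x20; a character reference is
    replaced by the referenced character, which is NOT mapped to #x20."""
    out = []
    i = 0
    while i < len(raw):
        m = CHAR_REF.match(raw, i)
        if m is not None:
            if m.group(1) is not None:
                code = int(m.group(1), 16)
            else:
                code = int(m.group(2))
            out.append(chr(code))
            i = m.end()
        elif raw[i] in WS:
            out.append(" ")
            i += 1
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def step_two_literal(value):
    """Step two as the spec is worded: only #x20 is trimmed and collapsed."""
    return re.sub(" +", " ", value.strip(" "))


def step_two_any_ws(value):
    """Step two under the proposed wording: any WS is trimmed and collapsed."""
    return re.sub("[ \t\n\r]+", " ", value.strip(WS))
```

For the value 'a&#10;b', step_two_literal(step_one(...)) leaves the linefeed
in place, while step_two_any_ws(step_one(...)) gives 'a b', the value that
sa02 expects.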


WS in empty elements:
If an element is declared empty and denoted by a start and an end tag,
should it be allowed to have whitespace between the tags?
I don't think so. The spec says that the end tag must follow the
start tag immediately.
However, I find samples where this happens.
So how should that be implemented?
And: if WS is allowed, how about comments or PIs?

thanks for your help.
greetings
        Morus


***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************




 


Copyright 2001 XML.org. This site is hosted by OASIS