OASIS Mailing List Archives

Re: Is whitespace within general entities ignorable?



John, thanks for your reply.

I was, however, surprised by the answer because I don't think it has been
widely implemented that way.

So, to clarify, given the following XML file:-

<?xml version = "1.0"?>
<!DOCTYPE doc [
<!ELEMENT doc (test)*>
<!ELEMENT test (child)*>
<!ELEMENT child EMPTY>
<!ENTITY ent1 "   ">
<!ENTITY ent2 "    <child/>   ">
]>

<doc>
 <!-- this is valid -->
 <test> &ent2; </test>
 <!-- but this isn't -->
 <test> &ent1; </test>
</doc>

This means that an XML processor that parses entities when they are
referenced cannot decide whether the content is valid until it comes
across an element start tag in the entity replacement text.  If it does
find one, then all the preceding white space is insignificant; otherwise
the content is invalid.
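To make that buffering concrete, here is a minimal Python sketch of the decision, assuming the reading given in the reply. The helper names entity_ref_ok_in_element_content and starts_element are hypothetical (not taken from any real parser), and the name-start-character test is simplified to letters, "_" and ":". The sketch decides at the first element start tag it meets, exactly as described above; anything after that tag is left to the ordinary content-model check.

```python
# Hypothetical sketch: classify one entity reference found in element content.
# Leading whitespace must be buffered until an element start tag appears
# (whitespace is then insignificant) or the replacement text runs out (invalid).

XML_S = " \t\r\n"  # characters matched by production [3] S


def starts_element(text: str, i: int) -> bool:
    """True if text[i:] begins a start tag or empty-element tag.

    Simplification: name-start characters limited to letters, '_' and ':'.
    """
    return (
        text.startswith("<", i)
        and i + 1 < len(text)
        and (text[i + 1].isalpha() or text[i + 1] in "_:")
    )


def entity_ref_ok_in_element_content(replacement: str) -> bool:
    """Return True once an element start tag follows only whitespace,
    comments or PIs; False if other character data appears or the
    replacement text ends first."""
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch in XML_S:
            i += 1  # buffer whitespace; its status is still undecided
        elif replacement.startswith("<!--", i):
            end = replacement.find("-->", i + 4)
            if end == -1:
                return False  # unterminated comment
            i = end + 3
        elif replacement.startswith("<?", i):
            end = replacement.find("?>", i + 2)
            if end == -1:
                return False  # unterminated processing instruction
            i = end + 2
        elif starts_element(replacement, i):
            return True  # preceding whitespace is insignificant
        else:
            return False  # non-whitespace data (or CDATA) in element content
    return False  # only whitespace before the end: invalid


print(entity_ref_ok_in_element_content("    <child/>   "))  # ent2 -> True
print(entity_ref_ok_in_element_content("   "))              # ent1 -> False
```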

I can see how you came to this conclusion by carefully reading the wording
of "validity constraint: Element Valid" in Section 3 of the XML
Recommendation [1], but I have not found it implemented this way in any of
the XML processors that I have tried.  Is this just another case of
imperfect implementations?

> > Section 2.10 mentions the distinction between significant and insignificant
> > white space but doesn't give a definition. The validity constraint in
> > Section 3(2) for element content talks about the white space surrounding
> > child elements having to match the non terminal S[3] - but is this after
> > entity substitution has been performed?
>
> No.  The whitespace in element content has to be real whitespace
> characters: not entity references, not CDATA sections.
>
> > <?xml version = "1.0"?>
> > <!DOCTYPE test [
> > <!ELEMENT test (child)*>
> > <!ENTITY entws "   ">
> > ]>
> > <test>
> >  &entws;
> > </test>
>
> Invalid.
>
>
> > If we say that it is illegal to have white space within the GE, then
> > something like this would also be illegal:-
> > <!ENTITY entws "   <child/>  ">
>
> Valid.
>
> > Secondly, what if the content model of <test> was changed to EMPTY? Would
> > this make any difference to your view?  It appears to me that including the
> > reference to &entws; creates content - which is illegal for EMPTY elements.
>
> Right.  In an EMPTY element, the start-tag and the end-tag must abut,
> with nothing at all between them.
>
> > And finally, what if &entws; was declared as: <!ENTITY entws "&#9;">. Would
> > that make any difference?
>
> No.
>
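The last two answers can be illustrated with another minimal Python sketch; the helper names are hypothetical and not any real parser's API. Following the rules in [1] for building an internal entity's replacement text, character references in the literal value are expanded, so declaring entws as "&#9;" simply yields a tab, which matches S just as three spaces do. The sketch expands only character references (parameter-entity references are not handled), and it models the EMPTY rule as "nothing at all between the tags".

```python
import re

# Characters matched by production [3] S: space, tab, CR, LF.
XML_S = " \t\r\n"

# Decimal (&#9;) or hexadecimal (&#x9;) character references.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def replacement_text(literal_value: str) -> str:
    """Simplified: expand character references in an internal entity's
    literal value; general entity references are left untouched."""
    def expand(m):
        ref = m.group(1)
        return chr(int(ref[1:], 16) if ref.startswith("x") else int(ref))

    return CHAR_REF.sub(expand, literal_value)


def is_xml_whitespace(text: str) -> bool:
    """True if text matches S: one or more space, tab, CR or LF characters."""
    return text != "" and all(ch in XML_S for ch in text)


def empty_element_content_ok(raw_content: str) -> bool:
    """EMPTY: start-tag and end-tag must abut, with nothing in between."""
    return raw_content == ""


# "   " and "&#9;" both give whitespace-only replacement text,
# so the verdict for <test> &entws; </test> is the same either way.
print(repr(replacement_text("   ")), is_xml_whitespace(replacement_text("   ")))
print(repr(replacement_text("&#9;")), is_xml_whitespace(replacement_text("&#9;")))

# With an EMPTY content model, even the bare reference counts as content.
print(empty_element_content_ok("&entws;"))  # False
print(empty_element_content_ok(""))         # True
```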

Kind regards
Rob Lugt
ElCel Technology
http://www.elcel.com

[1] http://www.w3.org/TR/REC-xml