- From: <email@example.com>
- To: XML Dev <firstname.lastname@example.org>
- Date: Fri, 30 Oct 1998 10:36:00 -0500 (EST)
So, Henry's asking whether this is valid:
<!DOCTYPE a [
<!ELEMENT a (b, c)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
I'd like to hear Tim Bray's opinion, unless I've missed it already in
this thread (are you reading this, Tim, or alternatively, do you have
an e-mail filter that looks for your name?).
My hunch is that this example *is* valid -- after all, the "<![CDATA["
just means "change to special delimiter-recognition mode" and "]]>"
means "go back to the regular delimiter-recognition mode": neither
implies anything about the contents.
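A minimal Python sketch of that "delimiter-recognition mode" reading. `split_cdata` is a hypothetical helper, not taken from AElfred or any other parser. It models only the two scanning modes and ignores comments, processing instructions, and well-formedness in general, so it says nothing about validity. It only shows that the delimiters mark where the mode changes.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def split_cdata(text: str) -> list[tuple[str, str]]:
    """Split text into ("normal", ...) and ("cdata", ...) segments.

    Hypothetical helper: it models only the two delimiter-recognition
    modes and ignores comments, PIs, and well-formedness in general.
    """
    segments: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        start = text.find(CDATA_OPEN, pos)
        if start == -1:
            # No more sections: the rest is scanned in the regular mode.
            segments.append(("normal", text[pos:]))
            break
        if start > pos:
            segments.append(("normal", text[pos:start]))
        body_start = start + len(CDATA_OPEN)
        # In CDATA mode only "]]>" is recognized; "<" and "&" are literal.
        end = text.find(CDATA_CLOSE, body_start)
        if end == -1:
            raise ValueError("unterminated CDATA section")
        segments.append(("cdata", text[body_start:end]))
        pos = end + len(CDATA_CLOSE)
    return segments


print(split_cdata("<a><![CDATA[x < y]]></a>"))
# [('normal', '<a>'), ('cdata', 'x < y'), ('normal', '</a>')]
```

The "cdata" segment is just a run of characters. An empty section such as `<![CDATA[]]>` yields an empty segment, because the delimiters carry no information about what lies between them.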
Now, I'd like to reply to what John Cowan wrote:
> In fact, CDATA elements can be returned to the application as
> specialized blocks (the DOM allows it, though SAX does not do so);
> an XML parser *never* needs to look inside a CDATA section after
> its terminator has been found.
This is an interesting interpretation, but it requires some qualification:
1. There is no such thing as a "CDATA element" -- CDATA sections are
lexical. In fact, the XML 1.0 REC does make it clear that the
function of a CDATA section is purely escaping, and that its
contents are equivalent to character data (see clause 2.7). There
is no explicit statement allowing parsers to treat CDATA sections
specially (as there is for comments in clause 2.5), though there is
no statement forbidding it either.
2. The XML 1.0 REC says nothing about what information the parser
should deliver to the application when it encounters a CDATA
section (the XML 1.0 REC is weak in this area generally, but we're
working to fix the problem in the XML Information Set WG).
Presumably, because of #1, the rule in clause 2.10 that "an XML
processor must always pass all characters in a document that are
not markup through to the application" applies here.
3. John's statement that the XML parser *never* needs to look inside a
CDATA section after finding the terminator is not quite right --
all of the contents must match the production Char, so the
parser has to check to make certain that there are no illegal
characters within the section (such as form feeds). The version of
AElfred that I wrote doesn't bother to do this checking, but it's
non-conforming (I haven't checked Matt's latest versions).
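Points 1 and 2 can be checked against a current parser. This is a sketch that assumes Python's standard-library ElementTree and xml.sax, both of which sit on top of expat. The helpers `etree_text`, `sax_text`, and `TextCollector` are hypothetical. It shows that a CDATA section and the equivalent escaped text deliver the same characters. It also shows that a plain SAX ContentHandler, with no lexical handler registered, receives the section's contents through characters() and gets no marker for the section boundaries.

```python
import xml.etree.ElementTree as ET
import xml.sax


def etree_text(doc: str) -> str:
    """Text content of the root element as ElementTree reports it."""
    return ET.fromstring(doc).text or ""


class TextCollector(xml.sax.ContentHandler):
    """Records every chunk passed to characters(); no lexical handler is
    registered, so CDATA section boundaries are never reported."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


def sax_text(doc: str) -> str:
    handler = TextCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    # The parser may split character data across several callbacks.
    return "".join(handler.chunks)


cdata_doc = "<a><![CDATA[x < y & z]]></a>"
escaped_doc = "<a>x &lt; y &amp; z</a>"

# Both spellings deliver the same characters to the application.
print(etree_text(cdata_doc) == etree_text(escaped_doc))  # True
print(sax_text(cdata_doc) == sax_text(escaped_doc))      # True
```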
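For point 3, the check a conforming parser has to make inside a CDATA section is small. This is a minimal sketch with hypothetical helpers `is_xml_char` and `check_cdata_body`. It tests each character of the section's contents against the XML 1.0 Char production. The last helper, also hypothetical, asks expat (through Python's ElementTree) whether it accepts a document, to show that expat rejects a form feed inside a CDATA section.

```python
import xml.etree.ElementTree as ET


def is_xml_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_cdata_body(body: str) -> None:
    """Raise ValueError at the first character in a CDATA section's
    contents that is not a Char (hypothetical helper)."""
    for offset, ch in enumerate(body):
        if not is_xml_char(ch):
            raise ValueError(f"illegal character U+{ord(ch):04X} at offset {offset}")


def expat_accepts(doc: str) -> bool:
    """Ask expat (through ElementTree) whether doc is well-formed."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


check_cdata_body("x < y & z")  # passes: every character is a Char
print(expat_accepts("<a><![CDATA[page\x0cbreak]]></a>"))  # False: form feed
```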
All the best,
David Megginson email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)