Re: [xml-dev] Invalid Markup in External DTD conditionals

Daniel Murphy <daniel@devasta.ie> writes:

> ...
> My question is: Why is the ignore section not expected to be valid
> declarations like an <![INCLUDE[]]> section? I mean, if you have to
> check the IgnoreSection of a DTD anyway to ensure that the <![ and ]]>
> are all correct, it seems a bit of a waste to have to implement
> dedicated parsing rules for something you are going to be discarding
> regardless. Is this a holdover from SGML? Or was there some other
> motivation?
>
> Just curiosity really; would be interested to learn more about XML's history.

I have not consulted the decision records or the discussions of the
working group, so what I am about to say may be wrong as a historical
account of the design motivation.  But:

(a) The grammar for ignoreSectContents is a lot smaller and a lot
simpler than the grammar for extSubset, which is what you'd need if you
wanted to require that an ignored marked section consisted of
syntactically correct declarations.  If the only thing you are doing is
scanning an external subset looking for entity declarations, doing the
work of parsing all the declarations in the ignored section would have a
very high cost-to-benefit ratio, even if doing so did not involve a lot
of entity expansions.
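
To make the size difference concrete, here is a minimal sketch (in
Python) of a scanner that skips an ignored section by matching only the
delimiters '<![' and ']]>', which is essentially all that the
ignoreSectContents production asks for.  The function name
skip_ignore_section and its error handling are hypothetical, chosen for
illustration and not taken from any real parser; the sketch assumes the
caller has already consumed the opening '<![IGNORE['.

```python
def skip_ignore_section(text: str, pos: int) -> int:
    """Return the index just past the ']]>' that closes an ignored section.

    `pos` must point just after the opening '<![IGNORE['.  Only the
    delimiters '<![' and ']]>' are recognised; nothing in between is
    parsed as a declaration.  (Hypothetical helper, for illustration.)
    """
    depth = 1  # we start inside one open section
    while depth > 0:
        open_at = text.find("<![", pos)
        close_at = text.find("]]>", pos)
        if close_at == -1:
            raise ValueError("unterminated ignored section")
        if open_at != -1 and open_at < close_at:
            depth += 1           # a nested section opens first
            pos = open_at + 3
        else:
            depth -= 1           # the nearest delimiter closes a section
            pos = close_at + 3
    return pos


# The ignored text holds a broken declaration and a nested section.
dtd = ("<![IGNORE[ <!ELEMENT broken ( <![INCLUDE[ junk ]]> still ignored ]]>"
       "<!ELEMENT a EMPTY>")
end = skip_ignore_section(dtd, len("<![IGNORE["))
print(dtd[end:])  # prints: <!ELEMENT a EMPTY>
```

Checking the same span against the grammar for extSubset would mean
recognising element, attribute-list, entity, and notation declarations,
plus parameter-entity references, just to throw the result away.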

(b) One possible reason for marking a marked section with IGNORE is that
there is some syntax problem in the section which you have not yet
resolved; if the contents of the marked section were required to be
syntactically correct, you could not make the parser skip over that
problem except by commenting it out (error-prone, since comments don't
nest) or by deleting it entirely (possible, but not really the best
approach).
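
The point about comments can be illustrated with a small Python sketch.
It assumes a simplified model in which a comment ends at the first
'-->'; the helper name comment_end and the sample strings are
hypothetical.  (In XML proper the string '--' may not appear inside a
comment at all, so wrapping text that already contains a comment is an
error either way.)

```python
def comment_end(text: str, pos: int) -> int:
    """Return the index just past the first '-->' at or after `pos`.

    Comments do not nest, so the first closing delimiter ends the
    comment.  (Hypothetical helper, simplified for illustration.)
    """
    close_at = text.find("-->", pos)
    if close_at == -1:
        raise ValueError("unterminated comment")
    return close_at + 3


# Try to comment out a region that already contains a comment.
inner = "<!-- old note --> <!ELEMENT broken ("
commented_out = "<!-- " + inner + " -->"
end = comment_end(commented_out, len("<!--"))
print(repr(commented_out[end:]))  # prints: ' <!ELEMENT broken ( -->'
```

The broken declaration is back in the live text, whereas an IGNORE
section wrapped around the same region would swallow it, since its
delimiters are matched as nested pairs.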

(c) Yes, ISO 8879 does provide that the only things matched within an
ignored marked section are the beginnings and endings of marked
sections, so at least part of the motivation for the design is
compatibility with 8879.  And if memory serves I think there were at
least some in the group who felt that 8879 was right not to require
parsing of the content of ignored sections, beyond the minimum needed to
locate the correct ending delimiter.

If you want historically reliable information on the thinking in the WG,
I recommend reviewing the relevant threads in the mail archive at

  https://lists.w3.org/Archives/Public/w3c-sgml-wg/

but locating those threads will not necessarily be simple.  The
discussions I have located relate to the questions labeled A.6, A.7, and
A.8, all discussed in October 1996, but there may well be other
discussions later.

I hope this helps.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

