XML.org
OASIS Mailing List Archives

Re: [xml-dev] Retain or discard whitespace surrounding an element?

On 27/12/2021 12:03, Roger L Costello wrote:

[snip]
> If the XML document is not associated with a schema (XSD, DTD, or 
> RNG), then the answer is always (a) and the whitespace may be safely 
> discarded.

I think it's the other way round. In the absence of a schema/DTD, whitespace
must be retained and passed to the application. Only a schema/DTD can
identify where whitespace can safely be ignored.
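A small illustration of the no-schema case, using Python's standard-library ElementTree as an example of a parser that has no DTD or schema to consult (the choice of library and the sample document are assumptions for illustration only). The whitespace-only text around <Title> is not discarded; it is handed to the application as `text` and `tail` strings.

```python
import xml.etree.ElementTree as ET

# No DTD or schema is attached, so the parser cannot tell whether
# <Document> has element content or mixed content.
doc = "<Document>\n  <Title>Report</Title>\n</Document>"
root = ET.fromstring(doc)

# The newline plus indentation before <Title> is kept as root.text,
# and the newline after </Title> is kept as the tail of <Title>.
print(repr(root.text))     # '\n  '
print(repr(root[0].tail))  # '\n'
```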

> So, sometimes the content of <Document> is one thing, sometimes it's
> another thing. This complicates lexers (and parsers) because they must
> have external, out-of-band knowledge about the document. 

Yes, exactly.
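One way to picture the "out-of-band knowledge" is as a set of element names that a DTD or schema would declare as having element-only content. In this sketch the set `ELEMENT_CONTENT` and the function `drop_ignorable_whitespace` are hypothetical stand-ins, not part of any real schema API: whitespace-only text is dropped only directly inside those elements, and every other element is treated as possibly mixed, so its whitespace is kept.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical stand-in for what a DTD/schema would supply: elements
# declared with element-only content. Anything not listed is treated
# as possibly mixed content, so its whitespace is left alone.
ELEMENT_CONTENT: Set[str] = {"Document"}

def drop_ignorable_whitespace(elem: ET.Element, element_content: Set[str]) -> None:
    """Remove whitespace-only text directly inside element-only elements."""
    if elem.tag in element_content:
        # Whitespace before the first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Whitespace between/after children is stored as each child's tail.
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        drop_ignorable_whitespace(child, element_content)

doc = ET.fromstring("<Document>\n  <Para><b>bold</b> <i>italic</i></Para>\n</Document>")
drop_ignorable_whitespace(doc, ELEMENT_CONTENT)
print(ET.tostring(doc, encoding="unicode"))
# <Document><Para><b>bold</b> <i>italic</i></Para></Document>
```

The indentation inside <Document> goes away, but the space between </b> and <i> survives because nothing declared <Para> as element-only.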

> Is that good language design?

For the original purposes of SGML and XML (large text documents with
both element content and mixed content), yes. In those cases, a schema
is pretty much always used, so the question never arises (it's [a]).

If you use XML to hold what is essentially rectangular data (rows and
columns), or if your application can dispense with mixed content, the
question also never arises (it's [b] and it's up to the application to
ignore whitespace-only nodes).
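For the rectangular-data case, "up to the application" can be as simple as a pass that discards every whitespace-only `text` and `tail` after parsing. This is a minimal sketch assuming ElementTree and a hypothetical row/column vocabulary with no mixed content; it would be wrong to run it on documents that do have mixed content.

```python
import xml.etree.ElementTree as ET

def strip_whitespace_only_text(elem: ET.Element) -> None:
    """Drop whitespace-only text and tail strings in place.

    Assumes the vocabulary has no mixed content (e.g. rows and columns).
    """
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_whitespace_only_text(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None

rows = ET.fromstring(
    "<Rows>\n  <Row><A>1</A><B>2</B></Row>\n  <Row><A>3</A><B>4</B></Row>\n</Rows>"
)
strip_whitespace_only_text(rows)
print(ET.tostring(rows, encoding="unicode"))
# <Rows><Row><A>1</A><B>2</B></Row><Row><A>3</A><B>4</B></Row></Rows>
```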

Basically it's a feature, not a bug 🐞

The only notable bug is (was?) in software that discards a
whitespace-only node that is the sole node between adjacent elements
when a schema/DTD has identified the context as being mixed content.
That is /always/ wrong.
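A short example of why that bug is always wrong, again using ElementTree purely for illustration (the sample paragraph is an assumption). In mixed content the single space between </b> and <i> is the only node separating the two elements, and it is part of the text; discarding it as a "whitespace-only node" runs the words together.

```python
import xml.etree.ElementTree as ET

def visible_text(elem: ET.Element) -> str:
    """Concatenate all character data inside elem, in document order."""
    return "".join(elem.itertext())

para = ET.fromstring("<p><b>bold</b> <i>italic</i></p>")
before = visible_text(para)  # 'bold italic'

# The buggy behaviour: treat the lone space between adjacent
# elements as ignorable even though <p> has mixed content.
for child in para:
    if child.tail is not None and not child.tail.strip():
        child.tail = None

after = visible_text(para)   # 'bolditalic'
print(repr(before), repr(after))
```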

Peter

Copyright 1993-2007 XML.org. This site is hosted by OASIS