   Re: Whitespace

  • From: Toby Speight <tms@ansa.co.uk>
  • To: "XML developers' list" <xml-dev@ic.ac.uk>
  • Date: 19 Aug 1997 19:32:20 +0100


Bill> Bill Donoghoe <URL:mailto:bdonoghoe@spin.net.au>

>> Sean Mc Grath wrote:

>> If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
>> reserved element name.

>> Give me five minutes to put on the asbestos suit and then you flame
>> away....

Bill> Instead of flaming you I will hop onto the bandwagon (can I
Bill> borrow the asbestos suit for a while?).

Is it big enough for three?  ;-)

I believe that the PCDATA element proposal has merit, but not that
it's the solution to all ills.  If it's defined well, and if XML
processors do the Right Thing, it would eliminate misunderstandings
between author and reader.  But it might be too verbose for some
authors (who might be happy with application-defined rules, for
example).

But if PCDATA is defined as "<!ELEMENT pcdata (#PCDATA)>", then it
might be sensible to install specific whitespace rules.  I'm leery of
recommending PRESERVE semantics, because

    <li><p>The first (and only) item
           in the list.</p>

occurs in many sources, and the naive transformation (enclosing all
#PCDATA with <PCDATA></PCDATA>) includes some "programmer's whitespace"
at the beginning of the line after <li>.
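
To make the concern concrete, here is a minimal Python sketch.  It
assumes the fragment above is closed with a </li> (not in the original
snippet) so that it parses, and it uses the standard-library
xml.etree.ElementTree parser, which keeps character data exactly as
written.  The names "preserved" and "collapsed" are purely illustrative.

```python
import xml.etree.ElementTree as ET

# The list item from above, closed with </li> (an assumption) so it is well-formed.
src = "<li><p>The first (and only) item\n           in the list.</p></li>"

p = ET.fromstring(src).find("p")

# PRESERVE-style reading: the newline and the indentation are part of the data.
preserved = p.text

# One possible application-defined rule: collapse whitespace runs to one space.
collapsed = " ".join(p.text.split())

print(repr(preserved))
print(repr(collapsed))
```

Under PRESERVE semantics the author's indentation ends up in the data;
the collapsing rule shown is only one of several an application might
choose.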

Bill> What is wrong with having some standard elements
Bill> (<PCDATA>, <CDATA>, <NEWLINE>) which are part of every XML DTD?

CDATA wouldn't work, for a start.

Bill> If you didn't want users to have to author these tags then
Bill> "normalisation" applications could be developed which could
Bill> convert "raw" XML into the "normalised" version.

Bill> Example:

Bill> <foo>
Bill>    I am data 1
Bill>    I am <emph>data</emph> 2
Bill> </foo>

Bill> could be normalised to:

Bill> [...]

Bill> or

Bill> <foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata>
Bill> </foo>

Er, don't you mean

> <foo><pcdata>I am data 1 I am </pcdata><emph><pcdata>data</pcdata
> ></emph><pcdata> 2</pcdata></foo>

?  I was under the impression that the PCDATA element was to contain
*only* character data and no elements (i.e. RCDATA in SGML terms), as
in my element declaration above, with #PCDATA in content models
replaced by PCDATA throughout.
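
As a rough illustration of the "normalisation" step under that reading
(each run of character data gets its own pcdata element, and pcdata
never contains child elements), here is a small Python sketch.  The
function name wrap_pcdata is hypothetical, and the input is Bill's
example with its whitespace already collapsed; how whitespace should be
treated is a separate question that this sketch does not settle.

```python
import xml.etree.ElementTree as ET

def wrap_pcdata(elem):
    """Return a copy of elem with every character-data run wrapped in <pcdata>."""
    out = ET.Element(elem.tag, elem.attrib)

    def add_text(text):
        # Only non-empty runs of character data get a wrapper element.
        if text:
            ET.SubElement(out, "pcdata").text = text

    add_text(elem.text)                 # text before the first child
    for child in elem:
        out.append(wrap_pcdata(child))  # children are normalised recursively
        add_text(child.tail)            # text following the child
    return out

src = "<foo>I am data 1 I am <emph>data</emph> 2</foo>"
normalised = ET.tostring(wrap_pcdata(ET.fromstring(src)), encoding="unicode")
print(normalised)
```

For this input the result matches the form I gave above: three pcdata
elements, one of them inside emph, and none containing markup.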

[For non-validating processors: we'd need a way to indicate that this
convention is in use]





  • References:
    • Whitespace
      • From: bdonoghoe@spin.net.au (Bill Donoghoe)

