- From: Toby Speight <tms@ansa.co.uk>
- To: "XML developers' list" <xml-dev@ic.ac.uk>
- Date: 19 Aug 1997 19:32:20 +0100
Bill> Bill Donoghoe <URL:mailto:bdonoghoe@spin.net.au>
>> Sean Mc Grath wrote:
>> If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
>> reserved element name.
>> Give me five minutes to put on the asbestos suit and then you flame
>> away....
Bill> Instead of flaming you I will hop onto the bandwagon (can I
Bill> borrow the asbestos suit for a while?).
Is it big enough for three? ;-)
I believe that the PCDATA element proposal has merit, but not that
it's the solution to all ills. If it's defined well, and if XML
processors do the Right Thing, it would eliminate misunderstandings
between author and reader. But it might be too verbose for some
authors (who might be happy with application-defined rules, for
instance).
But if PCDATA is defined as "<!ELEMENT pcdata (#PCDATA)>", then it
might be sensible to install specific whitespace rules. I'm leery of
recommending PRESERVE semantics, because
<ul>
  <li><p>The first (and only) item
      in the list.</p>
  </li>
</ul>
occurs in many sources, and the naive transformation (enclosing all
#PCDATA with <PCDATA></PCDATA>) includes some "programmer's whitespace"
at the beginning of the line after <li>.
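To make the problem concrete, here is a minimal sketch of that naive
transformation in Python, using the standard xml.dom.minidom module.
The language, the helper name wrap_text_nodes and the exact indentation
of the sample are my assumptions for illustration; nothing here is a
proposed implementation.

```python
from xml.dom import minidom

# Sample list; the two- and six-space indents are assumed for illustration.
SOURCE = """<ul>
  <li><p>The first (and only) item
      in the list.</p>
  </li>
</ul>"""

def wrap_text_nodes(node, doc):
    """Naively enclose every text node below `node` in a <pcdata> element."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            wrapper = doc.createElement("pcdata")
            node.replaceChild(wrapper, child)  # put the wrapper where the text was
            wrapper.appendChild(child)         # then move the text inside it
        elif child.nodeType == child.ELEMENT_NODE:
            wrap_text_nodes(child, doc)

doc = minidom.parseString(SOURCE)
wrap_text_nodes(doc.documentElement, doc)
wrapped = doc.documentElement.toxml()
print(wrapped)
```

The indentation before "in the list." ends up inside the <pcdata>
element, and the whitespace-only runs between tags get wrappers of
their own, so PRESERVE semantics would hand all of it to the
application as data.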
Bill> What is wrong with having some standard elements
Bill> (<PCDATA>, <CDATA>, <NEWLINE>) which are part of every XML DTD?
CDATA wouldn't work, for a start.
Bill> If you didn't want users to have to author these tags then
Bill> "normalisation" applications could be developed which could
Bill> convert "raw" XML into the "normalised" version.
Bill> Example:
Bill> <foo>
Bill> I am data 1
Bill> I am <emph>data</emph> 2
Bill> </foo>
Bill> could be normalised to:
Bill> [...]
Bill> or
Bill> <foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata>
Bill> </foo>
Er, don't you mean
> <foo><pcdata>I am data 1 I am </pcdata><emph><pcdata>data</pcdata
> ></emph><pcdata> 2</pcdata></foo>
? I was under the impression that the PCDATA element was to contain
*only* character data, and no elements (i.e. RCDATA in SGML terms),
as in my element declaration above, with #PCDATA in content models
replaced by PCDATA throughout.
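A rough sketch of such a "normalisation" pass, again in Python with
xml.dom.minidom, might look like the following. The whitespace rule
(collapse each run to one space, then trim at the start and end of the
enclosing element) is only my guess at what would turn Bill's raw
example into the form I gave above; the function name normalise is
likewise hypothetical.

```python
import re
from xml.dom import minidom

# Bill's "raw" example, line breaks included.
RAW = "<foo>\nI am data 1\nI am <emph>data</emph> 2\n</foo>"

def normalise(node, doc):
    """Wrap each run of character data in its own <pcdata> element."""
    children = list(node.childNodes)
    for i, child in enumerate(children):
        if child.nodeType == child.ELEMENT_NODE:
            normalise(child, doc)
        elif child.nodeType == child.TEXT_NODE:
            # Assumed rule: collapse every whitespace run to a single space...
            text = re.sub(r"\s+", " ", child.data)
            # ...and trim only at the very start and end of the parent element.
            if i == 0:
                text = text.lstrip()
            if i == len(children) - 1:
                text = text.rstrip()
            if text:
                wrapper = doc.createElement("pcdata")
                wrapper.appendChild(doc.createTextNode(text))
                node.replaceChild(wrapper, child)
            else:
                node.removeChild(child)  # nothing but whitespace: drop it

doc = minidom.parseString(RAW)
normalise(doc.documentElement, doc)
result = doc.documentElement.toxml()
print(result)
```

Each <pcdata> element here holds character data only; the <emph>
element stays a sibling of the text runs around it and gets a <pcdata>
child of its own.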
[For non-validating processors: we'd need a way to indicate that this
convention is in use]
--
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)
- References:
- Whitespace
- From: bdonoghoe@spin.net.au (Bill Donoghoe)