Re: [xml-dev] Quiz: is this XML well-formed?
- From: Peter Flynn <peter@silmaril.ie>
- To: xml-dev@lists.xml.org
- Date: Fri, 5 Feb 2021 23:07:58 +0000
On 05/02/2021 11:33, Marcus Reichardt wrote:
> Thanks Michael Sperberg-McQueen for sharing that bit of XML/ERB history.
> Digging a bit deeper into old posts to this very mailing list, I
> found that various whitespace issues were discussed at great length
> http://lists.xml.org/archives/xml-dev/199708/threads.html . However,
> these seem to be about whitespace in content
Much good stuff, but also some that misses the point entirely. And TBH
we didn't do as good a job on whitespace as perhaps we could have done,
using that well-known tool, HindSight™.
[...]
> So my guess would be that the requirement for space before an
> attribute name simply slipped into a first draft for XML because it
> was more compact and unambiguous than, say
>
>   Markup ::= '<' Name (S (Name Eq QuotedCData S?))* S? '>'
>
> and was never much of a topic for discussion thereafter.
That's largely the reason, I think: at the time it simply wasn't an
issue, and as Michael said, why on earth would anyone want it to be?
Note that SGML also allows
<test att = "x" otheratt = "y">
It also allows multiple newlines before and after the equals sign, with
arbitrary amounts of other whitespace. Both rxp and onsgmls/osgmlnorm in
-wxml mode accept it unquestioningly.
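For anyone who wants to try this without rxp or onsgmls to hand, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (expat underneath), which is my substitution and not one of the tools named above. The tag is self-closed only so that the snippet is a complete document, and the second input is a hypothetical counter-example showing that the space before an attribute name is still required.

```python
import xml.etree.ElementTree as ET

# Whitespace (including blank lines) around "=" is allowed in a start-tag.
# The element is self-closed here only to make a complete document.
spaced = '<test att = "x" otheratt\n\n=\n\n   "y"/>'
elem = ET.fromstring(spaced)
print(elem.attrib)

# Whitespace before an attribute name, by contrast, is required,
# so this input should be rejected as not well-formed.
try:
    ET.fromstring('<test att="x"otheratt="y"/>')
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

Other conforming parsers should agree on both inputs, but only the expat-based one is exercised here.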
Whitespace as a delimiter between tokens is a convention we all accept
in most languages, human and computer, with some obvious and notable
exceptions. It generally makes life easier, although people who have
grown up without it do not appear to have been damaged by its absence :-)
On 04/02/2021 21:55, C. M. Sperberg-McQueen wrote:
[...]
> But I don’t think it makes start-tags easier to parse. I don’t
> think it makes them easier to process without a full XML parser. And
> I don’t think it makes start-tags easier to read for humans.
An additional small reason these whitespace rules make life easier is
that when XML (or indeed SGML) is being generated by a program that is
not itself XML-aware, it is just another thing not to have to worry
about, like suppressing otherwise unwanted line-ends or default word
space tokens.
If for various exogenous reasons a program outputs
<html><head>
<title>foo</title><style
type
=
"text/css"
/></head><body><p>foo</p></body></html>
it simply isn't important for subsequent xml-aware processing. (For
those not dealing with the publishing industry, LARGE quantities of XML
— and some SGML — are still produced in this manner.)
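A rough way to check that claim, again assuming Python's xml.etree.ElementTree as a stand-in for "subsequent xml-aware processing": parse the output above alongside a compact version (the compact string is my own, for comparison) and compare what the application actually sees for the style element.

```python
import xml.etree.ElementTree as ET

# The generated output from the example, line breaks and all.
generated = (
    '<html><head>\n'
    '<title>foo</title><style\n'
    'type\n'
    '=\n'
    '"text/css"\n'
    '/></head><body><p>foo</p></body></html>'
)

# A hand-written compact equivalent (an assumption, for comparison only).
compact = (
    '<html><head><title>foo</title><style type="text/css"/>'
    '</head><body><p>foo</p></body></html>'
)

style_a = ET.fromstring(generated).find('head/style')
style_b = ET.fromstring(compact).find('head/style')

# The whitespace inside the tag leaves no trace in the parsed attributes.
print(style_a.attrib == style_b.attrib)
```

This compares attributes only; the line-end between the head and title start-tags is character data and is a separate matter from whitespace inside a tag.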
It's good that we revisit these topics occasionally, but in this case I
think it can be marked "no action needed" and we can pass to other matters.
Peter