Re: [xml-dev] Quiz: is this XML well-formed?


From: Peter Flynn <peter@silmaril.ie>
To: xml-dev@lists.xml.org
Date: Fri, 5 Feb 2021 23:07:58 +0000

On 05/02/2021 11:33, Marcus Reichardt wrote:

Thanks Michael Sperberg-McQueen for sharing that bit of XML/ERB history.

Digging a bit deeper into old posts to this very mailing list, I
found that various whitespace issues were discussed at great length
(http://lists.xml.org/archives/xml-dev/199708/threads.html). However,
these seem to be about whitespace in content

Much good stuff, but also some that misses the point entirely. And TBH we didn't do as good a job on whitespace as perhaps we could have done, using that well-known tool, HindSight™.

[...]

So my guess would be that the requirement for space before an attribute name simply slipped into a first draft for XML because it was more compact and unambiguous than, say

Markup ::= '<' Name (S (Name Eq QuotedCData S?))* S? '>'

and was never much of a topic for discussion thereafter.

That's largely the reason, I think: at the time it simply wasn't an issue, and as Michael said, why on earth would anyone want it to be?

Note that SGML also allows

    <test att =  "x" otheratt = "y">

It also allows multiple newlines before and after the equals sign, with arbitrary amounts of other whitespace. Both rxp and onsgmls/osgmlnorm in -wxml mode accept it unquestioningly.
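
For anyone who wants to try the XML side of this without rxp or onsgmls to hand, here is a minimal sketch using Python's standard xml.etree.ElementTree (which sits on top of expat). The helper name is_well_formed is made up for illustration, and the tag is made self-closing so that each string is a complete document. The second string is assumed to be the kind of case the quiz was about: no whitespace between one attribute value and the next attribute name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree's expat-based parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Spaces and multiple newlines around the equals signs.
spaced = '<test att =  "x" otheratt\n\n=\n\n"y"/>'

# No whitespace between the first attribute value and the next name
# (assumed here to be the quiz case; XML requires S before each attribute).
unspaced = '<test att="x"otheratt="y"/>'

print(is_well_formed(spaced))    # True
print(is_well_formed(unspaced))  # False
```

This only shows what one non-validating XML parser does; it says nothing about SGML behaviour.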

Whitespace as a delimiter between tokens is a convention we all accept in most languages, human and computer, with some obvious and notable exceptions. It generally makes life easier, although people who have grown up without it do not appear to have been damaged by its absence :-)

On 04/02/2021 21:55, C. M. Sperberg-McQueen wrote:
[...]

But I don’t think it makes start-tags easier to parse.  I don’t
think it makes them easier to process without a full XML parser.  And
I don’t think it makes start-tags easier to read for humans.

An additional small reason permissive whitespace rules make life easier is that when XML (or indeed SGML) is being generated by a program that is not itself XML-aware, whitespace is just another thing not to have to worry about, like suppressing otherwise unwanted line-ends or default word space tokens. If for various exogenous reasons a program outputs

<html><head>

<title>foo</title><style
type

=
"text/css"

/></head><body><p>foo</p></body></html>

it simply isn't important for subsequent XML-aware processing. (For those not dealing with the publishing industry, LARGE quantities of XML, and some SGML, are still produced in this manner.)
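
As a rough check of that claim, the sketch below feeds the output above, line breaks and all, to Python's standard xml.etree.ElementTree. The variable names are made up for illustration, and one expat-based parser is assumed to be representative of XML-aware processing in general.

```python
import xml.etree.ElementTree as ET

# The generated output from above, with its stray line breaks preserved.
generated = """<html><head>

<title>foo</title><style
type

=
"text/css"

/></head><body><p>foo</p></body></html>"""

root = ET.fromstring(generated)

# The whitespace inside the start-tag leaves no trace in the parsed result.
style = root.find("head/style")
print(style.attrib)                 # {'type': 'text/css'}
print(root.find("head/title").text)  # foo
```

The whitespace inside the tag is gone by the time the application sees the element; only the line breaks between the head and title elements survive, as ordinary character data.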

It's good that we revisit these topics occasionally, but in this case I think it can be marked "no action needed" and we can pass to other matters.

Peter

