




   Re: Whitespace

  • From: "Neil Bradley" <neil@bradley.co.uk>
  • To: xml-dev@ic.ac.uk
  • Date: Sun, 14 Sep 1997 09:26:45 +0000

> Reply-to:      Arnaud Le Taillanter <arnaud21@club-internet.fr>

> Neil Bradley proposed some simple rules (this is "version 1"; a second
> version, a little more complex but still simple enough, was proposed). I
> really like the approach, even if it doesn't work for the moment.

I agree they are inadequate, but I think my second attempt was more 
accurate than my first, so I am surprised that you now dissect the 
first attempt. Still, I am happy to see this issue continue to be 
discussed.

> *Rule 1*: standardization of input from different OSs.
>  CR, LF, CRLF are translated to a line end code.
> OBVIOUS!!!!!

Absolutely, but perhaps not to some programmers unfamiliar with, for 
example, the Mac line-end conventions.
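
To make Rule 1 concrete, here is a minimal Python sketch. It assumes LF is used as the single internal line-end code (the rule does not say which code to use), and `normalize_line_ends` is a hypothetical helper name. The order of the alternatives matters: CRLF must be tried before a lone CR, otherwise a DOS line end would turn into two line ends, which is exactly the kind of slip a programmer thinking of only one platform might make.

```python
import re

def normalize_line_ends(text: str) -> str:
    """Rule 1 sketch: map CRLF (DOS/Windows), lone CR (classic Mac OS) and
    LF (Unix) to one line-end code, assumed here to be LF."""
    # CRLF is listed first so that it is consumed as one line end, not two.
    return re.sub(r"\r\n|\r", "\n", text)

print(normalize_line_ends("DOS\r\nMac\rUnix\n").split("\n"))
# ['DOS', 'Mac', 'Unix', '']
```
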
> *Rule 2*: line end codes after a start tag or before an end tag are
> discarded. A simple rule. For usual elements, it is exactly what you
> expect :

> <P><EM>Two
> </EM>words</P>
> becomes
> <P><EM>Two</EM>words</P>
> The space between "Two" and "words" evaporated.
> Same thing with:
> <P><EM>
> Two
> </EM>words</P>
> I don't think this particular problem is important: the encoding
> is not natural. It should be an error!
>  I think everybody would write:
> <P><EM>Two</EM> words</P>, or
> <P>
> <EM>Two</EM> words
> </P>, etc...

I have long thought that 'some' formatting options should simply be 
made illegal, and that we should then make sure future document 
authors are widely aware of the restrictions. This is the main example 
I had already considered.
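
Rule 2 can be approximated with two regular expressions. This is only a sketch: it assumes LF line ends (Rule 1 already applied) and treats anything beginning with `<` plus a letter as a start tag. A real implementation would work from parser events rather than raw text, since `<` can also appear inside comments or CDATA sections. The names `START_TAG_THEN_LF`, `LF_THEN_END_TAG` and `apply_rule_2` are hypothetical.

```python
import re

# Hypothetical, simplified patterns: "<" followed by a letter starts a start tag
# (so end tags, comments and processing instructions are not matched).
START_TAG_THEN_LF = re.compile(r"(<[A-Za-z][^<>]*>)\n")
LF_THEN_END_TAG = re.compile(r"\n(</)")

def apply_rule_2(text: str) -> str:
    """Discard a line end directly after a start tag or directly before an end tag."""
    text = START_TAG_THEN_LF.sub(r"\1", text)
    return LF_THEN_END_TAG.sub(r"\1", text)

examples = [
    "<P><EM>Two\n</EM>words</P>",
    "<P><EM>\nTwo\n</EM>words</P>",
    "<P>\n<EM>Two</EM> words\n</P>",
]
for example in examples:
    print(apply_rule_2(example))
```

Run on the quoted examples, the first two both yield `<P><EM>Two</EM>words</P>` (the space between the words evaporates), while the third yields `<P><EM>Two</EM> words</P>`, keeping the explicitly typed space.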

> Inside a preserved element, line end codes are wrongly discarded
> after element start tags and before element end tags:
>          blabla <EM>
>          bloblo</EM>
>          blublu
> </PRE>

Again, I think this coding is very unnatural. 
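
The objection about preserved elements can be reproduced with the same kind of regex sketch of Rule 2. The `<PRE>` start tag in `sample` is an assumption (the quoted example shows only the end tag), as are the simplifications that PRE elements are never nested and that the tag name is upper case. `apply_rule_2_except_pre` is a hypothetical variant that passes whole PRE elements through untouched.

```python
import re

# Simplified Rule 2 patterns (hypothetical names, plain-regex approximation).
START_TAG_THEN_LF = re.compile(r"(<[A-Za-z][^<>]*>)\n")
LF_THEN_END_TAG = re.compile(r"\n(</)")
# Assumption: PRE elements are not nested and the tag name is written in upper case.
PRE_ELEMENT = re.compile(r"(<PRE\b[^>]*>.*?</PRE>)", re.S)

def apply_rule_2(text: str) -> str:
    text = START_TAG_THEN_LF.sub(r"\1", text)
    return LF_THEN_END_TAG.sub(r"\1", text)

def apply_rule_2_except_pre(text: str) -> str:
    # re.split with a capturing group returns outside/inside segments alternately;
    # odd indices are whole PRE elements, which are left exactly as they are.
    parts = PRE_ELEMENT.split(text)
    return "".join(p if i % 2 else apply_rule_2(p) for i, p in enumerate(parts))

sample = "<PRE>\nblabla <EM>\nbloblo</EM>\nblublu\n</PRE>"
print(repr(apply_rule_2(sample)))             # line ends after <PRE>, <EM> and before </PRE> are lost
print(repr(apply_rule_2_except_pre(sample)))  # sample comes back unchanged
```

Tracking which element is actually open, including nesting and the space attribute, needs a parser rather than regular expressions; the split above only shows where the exemption has to apply.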

> *Rule 4*: Except in preserved elements (elements
> with a space attribute set to "PRESERVE") line end codes are
> discarded when preceded by a hard or
> soft hyphen (in the process, a soft hyphen is also discarded) and
> remaining line end codes are treated as space. 
> The rule concerning hyphens is not necessary. If it's a hard hyphen,
> don't put it at line end (who would do that?)

It is in fact a very natural action, which I have seen many times.

> Moreover, there is no use in an XML source file to put a soft
> hyphen at line end. Who would do that? In my poor life, I have no occa-
> sion to see some text with hyphens at line end.

I have. Many times.
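
A minimal Python sketch of Rule 4 for text outside preserved elements. It assumes LF line ends and assumes the soft hyphen arrives as the character U+00AD (an entity reference would have to be resolved first). `apply_rule_4` is a hypothetical name.

```python
SOFT_HYPHEN = "\u00ad"

def apply_rule_4(text: str) -> str:
    """Rule 4 sketch, for text already known to be outside preserved elements."""
    # Soft hyphen + line end: discard both, rejoining the split word.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen + line end: keep the hyphen, discard the line end.
    text = text.replace("-\n", "-")
    # Every remaining line end is treated as a space.
    return text.replace("\n", " ")

print(apply_rule_4("a soft occa\u00ad\nsion, a well-\nknown case,\nand a line end"))
# a soft occasion, a well-known case, and a line end
```

One consequence worth weighing: a hard hyphen used as a dash at the end of a line is treated the same way, so "wait -" at the end of one line followed by "then" on the next becomes "wait -then".
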
> *Rule 5*: except in preserved elements, consecutive WS characters
> are reduced to a single space.
> I don't like this rule. If I put two spaces after a point, I mean two
> spaces.
> It's a typographic decision.
> Rule 5 is meant to allow some indentation:
> <P>
> He said:
>      <QUOTE>
>            I need some
>            indentation.SPSPIndentation is needed.
>      </QUOTE>
> </P>

NO IT WAS NOT! I have never said this, and I did not intend to imply 
this. The reason for this rule was purely to remove surplus spaces 
generated by the effect of previous rules.
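
To illustrate the purpose stated above (removing surplus spaces produced by earlier rules, not supporting indentation), here is a minimal Python sketch. It assumes Rules 1 to 4 have already run on text outside preserved elements; `line_ends_to_spaces` is a hypothetical, simplified stand-in for the final step of Rule 4.

```python
import re

def line_ends_to_spaces(text: str) -> str:
    # Simplified stand-in for the last part of Rule 4 (hyphen handling omitted).
    return text.replace("\n", " ")

def apply_rule_5(text: str) -> str:
    # Collapse each run of XML white-space characters into a single space.
    return re.sub(r"[ \t\r\n]+", " ", text)

source = "He said:\n     <QUOTE>"
step4 = line_ends_to_spaces(source)    # "He said:" followed by six spaces, then "<QUOTE>"
print(repr(apply_rule_5(step4)))       # 'He said: <QUOTE>'
print(apply_rule_5("indentation.  Indentation is needed."))
# indentation. Indentation is needed.
```

The second call shows the side effect Arnaud objects to: two spaces typed after a full stop are also reduced to one.
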
> Arnaud

I am more than happy for people to pull-apart my proposed rules. That 
is what I put them here for. But please refer to the second attempt, 
not the first.


Neil Bradley - Author of The Concise SGML Companion.

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS