
Re: [xml-dev] Sharing Techniques: White Spaces in HTML pages by XSLT



> >  Jonathan:
> J> Is your XSLT output supposed to be HTML ? What charset are you
> J> using ? Do you really need special (non-breaking) spaces, or will
> J> regular whitespace do ?
>
> Output was HTML. The charset would have been whatever MSXML3 put on it
> automatically (probably UTF-16). I just needed a series of spaces that
> will not be collapsed. Is that not what nbsp is for?

You could try being explicit about the output charset:

<xsl:output method="html" encoding="iso-8859-1" />

This will only be honored if your stylesheet is applied either via
<?xml-stylesheet?> or via the DOMDocument.transformNodeToObject method.
The DOMDocument.transformNode method returns a UTF-16 BSTR and
ignores the "encoding" attribute.
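Why transformNode can't honor the attribute becomes clearer if you separate text from bytes. The Python sketch below is only an analogy, not MSXML: a str stands in for the UTF-16 BSTR that transformNode returns, and bytes stand in for what transformNodeToObject writes into a stream. The sample markup is made up.

```python
# Analogy only: a Python str plays the role of transformNode's UTF-16 BSTR,
# and bytes play the role of what transformNodeToObject writes to a stream.
html_text = "<p>caf\u00e9</p>"  # hypothetical transform result, 11 characters

# As a string the result is just UTF-16 code units (2 bytes per BMP char);
# no output encoding has been chosen, so encoding="..." has nothing to act on.
bstr_bytes = html_text.encode("utf-16-le")

# Writing to a stream forces a choice of bytes; this is the step where
# <xsl:output encoding="iso-8859-1"/> would take effect.
stream_bytes = html_text.encode("iso-8859-1")

if __name__ == "__main__":
    print(len(bstr_bytes))    # 22: two bytes per character
    print(len(stream_bytes))  # 11: one byte per character
```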

If you're transforming from ASP, getting the right encoding out is a bit
tricky.

The naive way:
    Response.Write doc.transformNode(stylesheet)
will make the XSL processor produce UTF-16, which Response.Write then
converts to iso-8859-1 (actually, to whatever codepage was specified through
Response.CodePage) before sending it to the client.
That gets you into trouble because MSXML, thanks to the method="html"
attribute, will still have inserted a META tag declaring the document as
UTF-16 (the encoding of the string it produced), and ASP won't touch it.
The browser may get quite confused (IE does).
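The underlying symptom is a charset label that doesn't match the bytes. The pair below (UTF-8 bytes read as iso-8859-1) is a hypothetical mismatch picked only because its garbling is easy to see; it is not necessarily the exact pair involved in the ASP case.

```python
# Hypothetical mismatch: the server encodes in one charset, while the
# client trusts a META tag naming a different one.
page = "caf\u00e9"

sent_bytes = page.encode("utf-8")       # what actually goes over the wire
seen = sent_bytes.decode("iso-8859-1")  # what a mislabeled client renders

if __name__ == "__main__":
    print(seen)  # cafÃ© (each UTF-8 byte of "é" shown as its own character)
```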

The correct way:
    doc.transformNodeToObject stylesheet, Response
will plug the XSL processor directly into the HTTP response stream.
Just make sure you either:
 - set your stylesheet's output encoding to iso-8859-1 (ASP's default
   charset), as shown above,
 - use Response.Charset = "utf-8" to tell the client that your stylesheet
   outputs UTF-8 (MSXML's default charset for XSL output to a stream), or
 - set both to any other charset; as long as they're in sync you shouldn't
   have any problems.
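The rule behind that list is that the charset the processor encodes with must be the charset the client is told about. A hypothetical `client_sees` helper models that path in Python; it illustrates the idea and is not ASP or MSXML code.

```python
def client_sees(text: str, output_encoding: str, declared_charset: str) -> str:
    """Hypothetical model of the response path.

    The XSL processor encodes with `output_encoding`; the browser decodes
    with `declared_charset`. errors="replace" mimics a browser showing
    U+FFFD for undecodable bytes instead of failing.
    """
    return text.encode(output_encoding).decode(declared_charset, errors="replace")


sample = "caf\u00e9\u00a0ok"  # contains a non-breaking space

if __name__ == "__main__":
    # In sync: any charset that can represent the text round-trips intact.
    for cs in ("iso-8859-1", "windows-1252", "utf-8"):
        print(cs, client_sees(sample, cs, cs) == sample)
    # Out of sync: iso-8859-1 bytes read as UTF-8 produce replacement chars.
    print(repr(client_sees(sample, "iso-8859-1", "utf-8")))
```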

When Response.Write encounters Unicode characters that cannot be
represented in ASP's output charset, it replaces them with some other
character from that charset. This means information is lost.
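Python's codecs show the same kind of loss. Its "replace" error handler substitutes "?", which is only loosely comparable to what Response.Write does; ASP's actual substitute character may differ.

```python
text = "Price: 5\u20ac"  # the euro sign has no slot in iso-8859-1

# errors="replace" writes "?" for anything the charset can't hold,
# so the original character cannot be recovered afterwards.
lossy = text.encode("iso-8859-1", errors="replace")

if __name__ == "__main__":
    print(lossy)                               # b'Price: 5?'
    print(lossy.decode("iso-8859-1") == text)  # False: the euro sign is gone
```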

When you tell the XSL processor to output iso-8859-1, characters outside
that charset are written as numeric character references (e.g. &#8364; for
the euro sign), so they reach the client intact.
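Python's "xmlcharrefreplace" error handler performs the same kind of substitution, which makes the difference from the lossy case easy to check. This is an analogy for the XSL processor's behavior, not its implementation.

```python
import html

text = "Price: 5\u20ac"

# Unrepresentable characters become &#NNNN; numeric references.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# The client's HTML parser turns the reference back into the character.
restored = html.unescape(escaped.decode("iso-8859-1"))

if __name__ == "__main__":
    print(escaped)           # b'Price: 5&#8364;'
    print(restored == text)  # True: nothing was lost
```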

This may be what is happening to you with the nbsp characters, although
iso-8859-1 does have a non-breaking space character, at codepoint 160
(as does windows-1252, at the same codepoint), and ASP correctly maps
U+00A0 to it.
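A quick way to confirm that both charsets put the non-breaking space at 160 (0xA0), using Python's standard codecs:

```python
nbsp = "\u00a0"

# U+00A0 encodes to the single byte 0xA0 (decimal 160) in both charsets.
latin1_byte = nbsp.encode("iso-8859-1")
cp1252_byte = nbsp.encode("windows-1252")

if __name__ == "__main__":
    print(latin1_byte, cp1252_byte)  # b'\xa0' b'\xa0'
```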

The other possibility is that your client is broken: it treats the U+00A0
character as regular whitespace and only treats "&nbsp;" as a non-breaking
space. This is horrid, but I imagine some implementations might have taken
such a shortcut. It's easy to check: write an HTML file that has "&#160;"
where you would put "&nbsp;"; if those spaces collapse, the browser is
broken.
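The check can be scripted. Below is a sketch that builds such a test page; the function name `nbsp_test_page` and the output file name are hypothetical. Open the file in the browser under test: both rows should show the same wide gap.

```python
from pathlib import Path


def nbsp_test_page(n: int = 6) -> str:
    """Build a test page: one row uses &#160;, the other &nbsp;.

    In a correct browser both rows render the same run of n
    non-breaking spaces between A and B.
    """
    numeric = "A" + "&#160;" * n + "B"
    named = "A" + "&nbsp;" * n + "B"
    return (
        "<html><body>\n"
        f"<p>numeric: {numeric}</p>\n"
        f"<p>named: {named}</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # File name is arbitrary; the page is pure ASCII.
    Path("nbsp-test.html").write_text(nbsp_test_page(), encoding="ascii")
```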

If the problem is on the client side, you'll have to resort to hacks such as
disable-output-escaping to get the browser to render non-breaking spaces.

On the other hand, the abuse of &nbsp; is IMHO one of the ugliest things
about HTML. Nowadays you can do without it most of the time.
For example, when indenting code you could set margins with CSS:

<style>
.code DIV { margin-left:2em; }
</style>

<div class="code" style="font-family:courier">
void f(void) {
<div>
int x;
<br>
<br>
return 12;
</div>
}
</div>

The indentation is much more predictable than with &nbsp; (you never know
the width of an &nbsp;!).
Notice also that I didn't have to keep track of the indentation level: each
nested DIV's margin simply adds to its parent's.
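If the markup is generated rather than written by hand, the generator only has to open a DIV when indentation deepens and close one when it shrinks; the nested margins do the rest. A sketch, assuming a hypothetical helper `indent_to_divs` and input indented with multiples of 4 spaces (tabs are not handled). It produces the same shape of markup as the example above, with a blank line becoming an extra `<br>`.

```python
import html


def indent_to_divs(source: str, unit: int = 4) -> str:
    """Convert space-indented code into nested DIVs for the .code CSS rule.

    Lines at the same level are separated by <br>; a blank line adds one
    more <br>. Code text is HTML-escaped.
    """
    parts = ['<div class="code">']
    level = 0
    first_in_block = True
    for raw in source.splitlines():
        text = raw.lstrip(" ")
        if not text:
            parts.append("<br>")  # blank line: one extra line break
            continue
        depth = (len(raw) - len(text)) // unit
        while level < depth:  # indentation deepened: open nested DIVs
            parts.append("<div>")
            level += 1
            first_in_block = True
        while level > depth:  # indentation shrank: close DIVs
            parts.append("</div>")
            level -= 1
            first_in_block = True
        if not first_in_block:
            parts.append("<br>")  # next line within the same block
        parts.append(html.escape(text, quote=False))
        first_in_block = False
    parts.extend(["</div>"] * level)  # close anything still open
    parts.append("</div>")
    return "\n".join(parts)
```

Usage: `indent_to_divs("void f(void) {\n    int x;\n\n    return 12;\n}")` yields the nested DIV / BR structure of the hand-written example.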

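In other cases, you can use the CSS "width" property on an empty element
(made inline-block, since "width" is ignored on ordinary inline elements
in standards mode), as in:

<div>This whitespace for rent: [<span style="display:inline-block; width:3em;"></span>]</div>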

<pre> also works, and you can always override its font if you don't want
a monospaced one. This may be seen as another kind of HTML abuse, though.
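When emitting such a <pre> from server-side code, whitespace takes care of itself; what still needs care is escaping "<", ">" and "&" in the code. A sketch with a hypothetical `to_pre` helper; the default font is only an example of overriding the monospaced one.

```python
import html


def to_pre(code: str, font_family: str = "Georgia, serif") -> str:
    """Wrap code in <pre>, overriding its default monospaced font.

    <pre> preserves whitespace, so only markup characters need escaping.
    """
    escaped = html.escape(code, quote=False)
    return f'<pre style="font-family:{font_family}">{escaped}</pre>'


if __name__ == "__main__":
    print(to_pre("if (a < b && c > d)\n    go();"))
```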

There's also CSS's "white-space: pre", but IE supports it only in IE6, in
"standards-compliant mode". I don't know about NS6.

All in all, I'm not sure there's still a need for &nbsp; outside of
non-breaking spaces... anyone care to comment?

Hope this helps
Jonathan