Re: Re: [xml-dev] Sharing Techniques: White Spaces in HTML pages by XSLT
From: "Kevin Burges" <email@example.com>
> >> That's what I use, but when my XSLT output goes to other machines,
> >> the &nbsp; characters are not always recognised as spaces. How come?
> > Jonathon:
> J> Is your XSLT output supposed to be HTML ? What charset are you
> J> using ? Do you really need special (non-breaking) spaces, or will
> J> regular whitespace do ?
> Output was HTML. The charset would have been whatever MSXML3 put on it
> automatically (probably UTF-16). I just needed a series of spaces that
> will not be collapsed. Is that not what nbsp is for?
In order to figure out what an HTML browser is doing you need to know:
1) What is the actual encoding of your HTML document? Java will default to the encoding used by your system (determined by the locale) in the absence of other information. I am not sure what MSXML3 does. Get a hex editor and look at the byte where the nbsp should be: is it A0? If it is not, then that is your problem. Start there.
(If it is 00 then A0, then you are using UTF-16. If it is some code > 7F followed by another code > 7F, it is probably UTF-8, where nbsp is C2 A0.)
2) What encoding does your HTML document say it is? This will be in a META tag in the HEAD, if it is anywhere, in a charset parameter.
3) What does the HTTP server say the charset parameter is? System administrators rarely set this correctly, or change this from the default. The default for HTML is ISO 8859-1 (in which character 160 is used for nbsp). Make sure your webserver correctly labels your document's charset.
4) Did any proxies alter the encoding? Probably not a concern outside Japan.
5) What is the recipient's browser set to? Charsets were so typically wrong that most browsers guess: one browser apparently guesses by looking through the last five encodings that were used.
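For step 1, the byte patterns can also be checked without a hex editor. What follows is only a rough Python sketch, not part of the advice above: nbsp_bytes and classify_nbsp are made-up helper names, and the classification is a heuristic that assumes the data really does contain a non-breaking space (U+00A0).

```python
NBSP = "\u00a0"  # the non-breaking space character


def nbsp_bytes(encoding: str) -> str:
    """Return the hex bytes that encode U+00A0 in the given encoding."""
    return NBSP.encode(encoding).hex().upper()


def classify_nbsp(data: bytes) -> str:
    """Heuristically guess the encoding from how the nbsp appears in data.

    Assumption: the document contains at least one U+00A0. A Latin-1
    document that happens to contain the bytes C2 A0 would be misread
    as UTF-8, so treat the result as a hint only.
    """
    # UTF-8 encodes U+00A0 as the two bytes C2 A0.
    if b"\xc2\xa0" in data:
        return "utf-8"
    # UTF-16 uses two-byte units, so only look at even offsets.
    for i in range(0, len(data) - 1, 2):
        pair = data[i:i + 2]
        if pair == b"\x00\xa0":
            return "utf-16-be"
        if pair == b"\xa0\x00":
            return "utf-16-le"
    # A lone A0 is what ISO 8859-1 and similar single-byte charsets use.
    if b"\xa0" in data:
        return "single-byte (e.g. iso-8859-1)"
    return "no nbsp found"


if __name__ == "__main__":
    for enc in ("iso-8859-1", "utf-8", "utf-16-be", "utf-16-le"):
        print(enc, nbsp_bytes(enc))
```

If the guess disagrees with what the META tag or the HTTP header claims (steps 2 and 3), that mismatch is the first thing to fix.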
Yikes. This is so complicated! That is why XML treats encoding violations so strictly: it is nasty, but treating things slackly, as HTML does, just makes data interoperability unreliable.
When I faced the same thing a year ago, I ended up escaping the encoding and forcing Because HTML is so stuffed, it is not an unreasonable solution IMHO.
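One common way to escape the encoding, offered here as an assumption about what is meant rather than as the exact workaround used, is to emit pure ASCII and write every other character as a numeric character reference such as &#160;. The bytes are then the same no matter which charset the server or browser guesses. A minimal Python sketch (to_ascii_html is a made-up name, and Python merely stands in for the XSLT setup under discussion):

```python
def to_ascii_html(text: str) -> bytes:
    """Encode text as ASCII, replacing each non-ASCII character
    with a decimal numeric character reference (e.g. U+00A0 -> &#160;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Three non-breaking spaces between two words.
    print(to_ascii_html("a\u00a0\u00a0\u00a0b"))
```

The trade-off is a larger, less readable file for documents with a lot of non-ASCII text, which matters little when the only such character is the nbsp.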