OASIS Mailing List Archives

Re: Conversion of characters



Raj wrote:
> I have an HTML document and I wish to convert a part of it and store it as an
> XML document. I'm using VB and I find that in a sentence like:
> 
> <P> This is Will Smith's Wild West Zone </P>
> 
> the apostrophe ( ' ) is interpreted and displayed as a question mark (?).
> Is there a way to overcome this problem?

It's probably not an apostrophe but a right single quotation mark, which
I'm not going to bother to attempt to paste into this ASCII email, since
it's well outside the ASCII range.
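If you're not sure which character is actually in the text, it's easy to check. Here's a minimal sketch in Python (the original question is about VB, so this is only an illustration, and the function name `describe_non_ascii` is hypothetical). It lists every character above the ASCII range with its position, code point and Unicode name:

```python
import unicodedata

def describe_non_ascii(text):
    """Return (index, decimal code point, U+XXXX form, Unicode name) for
    every character outside the 7-bit ASCII range."""
    report = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            name = unicodedata.name(ch, "<unnamed>")
            report.append((i, ord(ch), f"U+{ord(ch):04X}", name))
    return report

# Hypothetical sample containing the suspect character.
sample = "This is Will Smith\u2019s Wild West Zone"
for entry in describe_non_ascii(sample):
    print(entry)  # (18, 8217, 'U+2019', 'RIGHT SINGLE QUOTATION MARK')
```

Only ASCII is printed, so the diagnostic itself can't be mangled by the console's encoding.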

You will probably find it helpful to know that this character's Unicode
scalar value is 8217 decimal, or 2019 hex. In HTML and XML you could write
&#8217; or &#x2019; but you may prefer to just replace instances of these
characters with regular apostrophes (decimal 39 or hex 27).
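Both options above are one-liners in most languages. A minimal Python sketch (hypothetical helper names, not anything from VB or a particular XML library):

```python
def to_ascii_apostrophes(text):
    """Replace the right single quotation mark (U+2019) with a plain
    apostrophe (U+0027, decimal 39)."""
    return text.replace("\u2019", "'")

def to_char_refs(text):
    """Keep the character but write anything outside ASCII as a decimal
    numeric character reference, e.g. &#8217;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

sample = "Will Smith\u2019s"
print(to_ascii_apostrophes(sample))  # Will Smith's
print(to_char_refs(sample))          # Will Smith&#8217;s
```

Note that `xmlcharrefreplace` only rewrites characters the target encoding can't hold; it does not escape `&` or `<`, so normal XML escaping still has to happen separately.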

Your application apparently knows that it is character number 8217, but it
is unable to represent it in the character encoding being used for your
output. For example, if you are emitting ISO-8859-1 output, there is no
right single quotation mark in that encoding, so it gets replaced with a
question mark.
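To see the effect, here's a small Python illustration. Python's codec error handlers stand in for whatever substitution VB or the component doing the output actually performs, which isn't stated:

```python
sample = "Will Smith\u2019s"

# ISO-8859-1 has no slot for U+2019; the "replace" error handler substitutes '?'.
latin1 = sample.encode("iso-8859-1", errors="replace")
print(latin1)  # b'Will Smith?s'

# With no error handler given, the encoder raises instead of substituting.
try:
    sample.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# Encodings that do contain the character keep it intact.
print(sample.encode("utf-8"))   # b'Will Smith\xe2\x80\x99s'
print(sample.encode("cp1252"))  # b'Will Smith\x92s'
```

For XML output, writing UTF-8 (the default when a document has no encoding declaration or byte order mark) avoids the problem entirely.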

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/