xml-dev - Re: Comparison of URIs: Character encoding.


From: Ian Graham <igraham@ic-unix.ic.utoronto.ca>
To: Alan Kennedy <alank@xhaus.com>
Date: Tue, 28 Nov 2000 11:53:03 -0500 (EST)


The mapping to an ASCII character sequence is defined in the URI
specifications. However, there is the 'old' way (which allowed
only Latin-1 characters in a URI) and the 'new' way (which allows
any characters, but requires a choice of charset, with UTF-8
being the recommended one), and the two are quite different. The
formal specifications are in a bunch of RFCs, including:

http://www.ietf.org/rfc/rfc2396.txt  (URI syntax - updates
                                      RFCs 1808, 1738 )
http://www.ietf.org/rfc/rfc2718.txt  (Guidelines for new URL schemes,
                                      with a note on charset issues)
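
To make the difference between the two conventions concrete, here is a
minimal sketch in Python. It assumes the standard library's urllib.parse
module; the helper names escape() and unescape() are made up for this
illustration and do not come from any of the RFCs. It only shows how the
charset choice changes the escaped form, and is not a complete URI
comparison routine.

```python
from urllib.parse import quote, unquote


def escape(text: str, charset: str) -> str:
    """Encode text to octets in the given charset, then percent-escape them."""
    # safe="" escapes "/" as well; unreserved ASCII characters are left as-is.
    return quote(text, safe="", encoding=charset)


def unescape(escaped: str, charset: str) -> str:
    """Turn %XX escapes back into octets and decode them in the given charset."""
    # errors="strict" raises instead of silently substituting U+FFFD.
    return unquote(escaped, encoding=charset, errors="strict")


# The same character yields different escapes under the two conventions.
latin1_form = escape("ü", "iso-8859-1")  # one octet
utf8_form = escape("ü", "utf-8")         # two octets
print(latin1_form, utf8_form)

# Escaped US-ASCII such as "%7e" decodes identically either way.
print(unescape("%7e", "iso-8859-1") == unescape("%7e", "utf-8"))

# "%e9" is e-acute under Latin-1 but not a valid UTF-8 sequence on its own.
print(unescape("%e9", "iso-8859-1"))
try:
    unescape("%e9", "utf-8")
except UnicodeDecodeError:
    print("%e9 alone is not valid UTF-8")
```

So for escapes in the US-ASCII range the charset choice does not matter,
but above that range the two readings diverge, and a comparison has to
fix one charset up front.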

Hope this helps --

Ian
--
Ian Graham ..........................  http://www.utoronto.ca/ian/
i a n   d o t   g r a h a m    a t    u t o r o n t o   d o t  c a 




On Sun, 26 Nov 2000, Alan Kennedy wrote:

> Hello again,
> 
> Another question about identifiers, this time URIs.
> 
> I need to compare URIs, both as SYSTEM identifiers and Namespace
> identifiers. The question I need to answer is this:-
> 
> What character encoding should I use for encoding and decoding of
> escaped values in URIs?
>  
> For example: if I see "%7e" ("~" in US-ASCII) in a URI, what character
> en(de)coding should I use to map that to a single character for
> comparison purposes? What about "%e9" ("e-acute" in "iso-8859-1")?
> 
> Another example: If I see a non-US-ASCII character in a URI,
> say "ü" ("u-umlaut"), should I escape that as "%fc", as in 
> "iso-8859-1"? Or should I be using UTF-8?
> 
> Or is there no such universal mapping?
> 
> Again, TIA for any enlightenment.
> 
> Alan.
>
