- From: Ian Graham <igraham@ic-unix.ic.utoronto.ca>
- To: Alan Kennedy <alank@xhaus.com>
- Date: Tue, 28 Nov 2000 11:53:03 -0500 (EST)
The mapping to an ASCII character sequence is defined in the URI
specifications. However, there is the 'old' way (which allowed
only Latin-1 characters in a URI) and the 'new' way (which allows
any characters, but requires a choice of charset, with UTF-8
being the recommended one), and the two are quite different. The
formal specifications are spread over several RFCs, including
http://www.ietf.org/rfc/rfc2396.txt (URI syntax - updates
RFCs 1808, 1738 )
http://www.ietf.org/rfc/rfc2718.txt (Guidelines for new URL schemes,
with a note on charset issues)
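
To make the difference concrete, here is a rough sketch in Python using
the standard urllib.parse module. It is only an illustration of the two
conventions, not something taken from the RFCs; the variable names and
the same_octets helper are made up for this example.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

u_umlaut = "\u00fc"  # the "u-umlaut" character from the question

# 'Old' way: one Latin-1 octet per character.
latin1_escaped = quote(u_umlaut, encoding="iso-8859-1")  # '%FC'

# 'New' way: the character's UTF-8 octets, each one escaped.
utf8_escaped = quote(u_umlaut, encoding="utf-8")  # '%C3%BC'

# Decoding needs the same choice of charset. Plain ASCII escapes such
# as "%7e" come out the same either way.
tilde = unquote("%7e")  # '~'
e_acute_latin1 = unquote("%e9", encoding="iso-8859-1")  # e-acute
# The lone octet 0xE9 is not valid UTF-8, so Python's default
# errors='replace' yields U+FFFD here.
e_acute_utf8 = unquote("%e9", encoding="utf-8")


def same_octets(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URI strings by the raw octets
    their escapes stand for, without assuming any charset."""
    return unquote_to_bytes(a) == unquote_to_bytes(b)


print(latin1_escaped, utf8_escaped)
print(same_octets("%7e", "~"))
print(same_octets("%fc", "%c3%bc"))
```

Comparing at the octet level avoids guessing a charset, but it also
means "%fc" and "%c3%bc" are treated as different identifiers even
though both were meant as u-umlaut. Whether that is the right behaviour
for namespace comparison is a separate question.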
Hope this helps --
Ian
--
Ian Graham .......................... http://www.utoronto.ca/ian/
i a n d o t g r a h a m a t u t o r o n t o d o t c a
On Sun, 26 Nov 2000, Alan Kennedy wrote:
> Hello again,
>
> Another question about identifiers, this time URIs.
>
> I need to compare URIs, both as SYSTEM identifiers and Namespace
> identifiers. The question I need to answer is this:-
>
> What character encoding should I use for encoding and decoding of
> escaped values in URIs?
>
> For example: if I see "%7e"("~" in USASCII) in a URI, what character
> en(de)coding should I use to map that to a single character for
> comparison purposes? What about "%e9" ("e-acute" in "iso-8859-1")?
>
> Another example: If I see a non-USASCII character in a URI,
> say "ü" ("u-umlaut"), should I escape that as "%fc", as in
> "iso-8859-1"? Or should I be using UTF-8?
>
> Or is there no such universal mapping?
>
> Again, TIA for any enlightenment.
>
> Alan.
>