OASIS Mailing List Archives

   Re: [xml-dev] Using entities for me dash problem




> Yes, but I thought using the glyph instead of the NCR was not valid XML?

Glyphs are pictorial representations of a character (typically as
produced by fonts). You don't mean glyph here. In XML you can use any
character that is in the specified encoding. So if you are using ASCII,
for example, you can use "A" as a character directly, or as &#65;.
But you can't enter an e-acute or an em dash directly, as they are not in
the encoding, so you have to use an NCR for them. UTF-8, on the other hand,
encodes every character, so you can always use character data directly (or
you can use an NCR if you want).
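
To make the point above concrete, here is a minimal Python sketch (not
part of the original exchange). The sample string and the helper name
fits() are made up for illustration; the codec names and the
"xmlcharrefreplace" error handler are standard Python.

```python
# Sketch: which characters can be written directly in a given encoding,
# and what happens to the ones that cannot.
# The sample text and the helper name `fits` are illustrative only.

text = "A caf\u00e9 \u2014 ok"  # contains e-acute (U+00E9) and em dash (U+2014)


def fits(s, encoding):
    """Return True if every character of s exists in the encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII cannot hold e-acute or the em dash, so they become NCRs.
ascii_with_ncrs = text.encode("ascii", errors="xmlcharrefreplace")

# UTF-8 can hold every Unicode character, so no NCRs are needed.
utf8_bytes = text.encode("utf-8")
```

Here fits(text, "ascii") is False while fits(text, "utf-8") is True, and
ascii_with_ncrs holds the same text with &#233; and &#8212; standing in
for the two characters that ASCII lacks.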

> Glyphs are what you are reading right now,
But they are not what is in an XML file.
The shapes I see on my screen depend on the shapes specified in the
fonts that I am using.

> if you want to call them
> "characters" then that's fine by me. 

As you are finding, characters, glyphs and encodings can be confusing;
it's best to keep the different layers clearly distinguished.
You might want to look at

http://www.w3.org/TR/2003/WD-charmod-20030822/ 

which distinguishes these in some detail.

> transform a document with UTF encoded XML, it should output
> NCR data, not glyphs, or characters.

No, XSLT will use character data (in most implementations), not NCRs, if
you ask for UTF-8 encoded output (or accept that as the default). To force
NCRs to be used, specify an encoding such as US-ASCII that does not
contain the characters.
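
The same behaviour can be seen outside XSLT. The sketch below is an
analogy only, not an XSLT processor: it uses Python's standard
xml.etree.ElementTree serializer, which, like the XSLT output stage
described above, writes a character directly when the output encoding
contains it and falls back to an NCR when it does not. The element name
and text are made up for illustration.

```python
# Sketch (analogy to XSLT output encoding, using ElementTree instead).
import xml.etree.ElementTree as ET

para = ET.Element("p")            # hypothetical element for the demo
para.text = "wait \u2014 what"    # text containing an em dash (U+2014)

# UTF-8 contains the em dash, so it is written as character data.
as_utf8 = ET.tostring(para, encoding="utf-8")

# US-ASCII does not contain it, so the serializer emits an NCR.
as_ascii = ET.tostring(para, encoding="us-ascii")

# Both serializations parse back to the same character data.
same_text = ET.fromstring(as_utf8).text == ET.fromstring(as_ascii).text
```

The two byte strings differ, but a parser sees the same document either
way, which is why the choice of output encoding is only a serialization
detail.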

> it's false. &#8212 is not ASCII data, is it? 
Yes, it is. That bit of your message contains 6 bytes of information.
You want me to understand it as
ampersand-hash-eight-two-one-two
I can only do that if I know that the bytes represent letters in ASCII
(or an ASCII-compatible encoding); that is what the XML encoding
declaration is for. It does not specify the underlying character set;
there is no need to specify that, as it is _always_ Unicode in an XML
context.
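
As a rough illustration of the two layers (a sketch, with made-up
sample documents), the Python below first decodes the six bytes as
ASCII, then parses two documents that declare different encodings. The
encoding declaration only tells the parser how to turn bytes into
characters; the character that results is the same Unicode character in
both cases.

```python
# Sketch: the encoding declaration governs bytes -> characters;
# the character set behind the document is Unicode either way.
import xml.etree.ElementTree as ET

fragment = b"&#8212"                     # the six bytes from the message
spelled_out = fragment.decode("ascii")   # read them as ASCII letters

# Same em dash, once as an NCR in US-ASCII, once directly in UTF-8.
doc_ascii = b'<?xml version="1.0" encoding="US-ASCII"?><p>&#8212;</p>'
doc_utf8 = '<?xml version="1.0" encoding="UTF-8"?><p>\u2014</p>'.encode("utf-8")

text_ascii = ET.fromstring(doc_ascii).text
text_utf8 = ET.fromstring(doc_utf8).text
```

Both text_ascii and text_utf8 are the single character U+2014, even
though the bytes on disk are quite different.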

> Is this an application/parsing error or is this
> currently how XSL works?

It's how XSLT works, which is quite logical once you get to grips with
the meaning of an encoding declaration in XML.

David

-- 
http://www.dcarlisle.demon.co.uk/matthew





 
