OASIS Mailing List Archives

   Re: [xml-dev] Using entities for me dash problem

On 9/12/03 6:23 AM, "David Carlisle" <davidc@nag.co.uk> wrote:

> 
> 
>> Yes, but I thought using the glyph instead of the NCR was not valid XML?
> 
> Glyphs are pictorial representations of a character (as produced by
> fonts, typically). You don't mean glyph here. In XML you can use any
> character that is in the specified encoding. So if you are using ASCII,
> for example, you can use "A" as a character directly, or as &#65;.
> But you can't enter an e-acute or an em dash directly, as they are not in
> the encoding, so you have to use an NCR for them. UTF-8, on the other hand,
> encodes every character, so you can always use character data directly (or
> you can use an NCR if you want).
> 
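
A small sketch of the point made in the quoted paragraph above, written in
Python rather than anything used in this thread (the standard-library
xml.etree.ElementTree parser and the variable names are illustrative
assumptions). It shows that a literal character and its NCR parse to the same
character, and that an em dash fits in UTF-8 but needs an NCR in ASCII.

```python
import xml.etree.ElementTree as ET

# A literal character and its numeric character reference (NCR)
# denote the same character once the document is parsed.
literal_text = ET.fromstring("<p>A</p>").text
ncr_text = ET.fromstring("<p>&#65;</p>").text
print(literal_text == ncr_text)

em_dash = "\u2014"  # EM DASH, code point 8212

# UTF-8 can represent the em dash directly (as three bytes).
utf8_bytes = em_dash.encode("utf-8")

# ASCII cannot represent it at all ...
try:
    em_dash.encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False

# ... so an ASCII-encoded document has to fall back to an NCR.
ascii_bytes = em_dash.encode("ascii", errors="xmlcharrefreplace")
print(utf8_bytes, ascii_can_encode, ascii_bytes)
```
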
>> Glyphs are what you are reading right now,
> But they are not what is in an XML file.
> The shapes I see on my screen depend on the shapes specified in the
> fonts that I am using.
> 
>> if you want to call them
>> "characters" then that's fine by me.
> 
> As you are finding, characters, glyphs and encodings can be confusing,
> it's best to keep the different layers clearly distinguished.
> You might want to look at

I'm not finding them confusing, thank you. I'm finding that arguing over
semantics is unnecessary. Yes, a glyph is representative of a certain *font
style*, but in typographic circles when I say glyph people know what I'm
talking about. There's no need to get technical about it in this discussion
either. As far as I'm concerned, ‹ <-- that's a glyph, not a character,
rendered in Monaco or whatever default font your computer is using.
Remember, I'm not a computer scientist, so the difference between a glyph
and a character means very little to me at present.

>> transform a document with UTF encoded XML, it should output
>> NCR data, not glyphs, or characters.
> 
> No, XSLT will use character data (in most implementations), not NCRs, if
> you ask for UTF-8 encoded output (or accept that as the default). To force
> NCRs to be used, specify an encoding such as US-ASCII that does not
> contain the characters.
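
The behaviour described in the quoted reply can be sketched by analogy. This
is not XSLT: it uses Python's standard-library ElementTree serializer, which
is assumed here only because it treats unencodable characters the same way
(character data when the output encoding has the character, an NCR when it
does not). The element name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("p")
elem.text = "before\u2014after"  # contains an em dash (U+2014)

# Asking for UTF-8 output: the em dash is written as character data.
utf8_out = ET.tostring(elem, encoding="utf-8")

# Asking for US-ASCII output: the em dash cannot be encoded, so the
# serializer writes it as a numeric character reference instead.
ascii_out = ET.tostring(elem, encoding="us-ascii")

print(utf8_out)
print(ascii_out)

# Both serializations parse back to the same characters.
same = ET.fromstring(utf8_out).text == ET.fromstring(ascii_out).text
print(same)
```

Either way the parsed content is identical; only the bytes on disk differ.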

Thanks again. As I stated before, this works just fine. But as I've said,
if I ask for ASCII output, it's so that the NCR will be preserved.
> 
>> it's false. &#8212 is not ASCII data, is it?
> Yes, it is. That bit of your message contains 6 bytes of information.
> You want me to understand it as
> ampersand-hash-eight-two-one-two.
> I can only do that if I know that the bytes represent letters in ASCII
> (or an ASCII-compatible encoding); that is what the XML encoding
> declaration is for. It does not specify the underlying character set;
> there is no need to specify that, as it is _always_ Unicode in an XML
> context.
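
To make the quoted point concrete, here is a minimal sketch (Python standard
library again, chosen for illustration; the sample document is hypothetical).
The six characters of the reference are plain ASCII bytes, and a document
declared as US-ASCII still yields the Unicode character U+2014 once parsed.

```python
import xml.etree.ElementTree as ET

ref = "&#8212"  # the six characters quoted in the message
ref_bytes = ref.encode("ascii")  # succeeds: every character is ASCII
print(len(ref_bytes))

# The declaration tells the parser how to turn bytes into characters.
# The reference itself always names a Unicode code point.
doc = b"<?xml version='1.0' encoding='US-ASCII'?><p>&#8212;</p>"
text = ET.fromstring(doc).text
print(text == "\u2014", ord(text))
```
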

Once again, you're arguing computer logic with me. Okay, yes, it's ASCII,
fine. But I wanted to *preserve* the NCR and keep the declaration as UTF-8.
That's a perfectly acceptable thing to ask for. Unfortunately, XSL does not
allow me to preserve the NCR. It's "dumb". If anything, your above statement
*proves* that the output method shouldn't be linked to the result
declaration, because then the computer is assuming what the declaration
should be based on how it was transformed. If the transformed result does
not necessarily represent the declaration, I should be able to change
the declaration. In other words, if I've preserved the NCR for the sake of
making the result UTF-8, then it shouldn't say US-ASCII just because I *had*
to transform it due to the way the computer is programmed to encode these
documents.

To make it simpler, if I want to preserve NCR, there should be an option
without using ASCII encoding, or rather, I should be able to declare
whatever encoding I wish the result to be, regardless of how the
transformation was encoded. Kind of like "browser spoofing" I suppose.
Because in the end I'm just going to change it anyway, right? So that when
it's rendered to screen, we see em dashes and not 8212 all over the place,
because it's specified as what it *is*, not how it's been *transformed*.
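
For what it's worth, the result asked for here (NCRs kept, declaration saying
UTF-8) can be sketched as a post-processing step outside the stylesheet. This
is only an illustration in Python, not something proposed in the thread and
not an XSLT feature; the names are hypothetical. It relies on the fact that
every ASCII byte sequence is also valid UTF-8, so an ASCII-only file with NCRs
can truthfully carry a UTF-8 declaration.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("p")
elem.text = "before\u2014after"  # contains an em dash (U+2014)

# Serialize with an ASCII encoder so the em dash becomes &#8212; ...
body = ET.tostring(elem, encoding="us-ascii")

# ... then label the result as UTF-8. ASCII bytes are valid UTF-8,
# so the declaration is still accurate for these bytes.
declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
document = declaration + body

body.decode("utf-8")  # does not raise: the bytes are valid UTF-8
round_trip = ET.fromstring(document).text
print(document)
print(round_trip == elem.text)
```

A parser reading this document sees an em dash, exactly as it would if the
character had been written directly.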


>> Is this an application/parsing error or is this
>> currently how XSL works?
> 
> It's how XSLT works, which is quite logical once you get to grips with
> the meaning of an encoding declaration in XML.

I think I've come to grips with the fact that it's illogical and output
encoding should NOT be linked to the result declaration as they can be two
different things.

/johnny :)



-- 
"You'll see it when you believe it." 

Copyright 2001 XML.org. This site is hosted by OASIS