xml-dev - Re: [xml-dev] converting character entities to us-ascii /equivalents/


To: Michael Kay <michael.h.kay@ntlworld.com>
Subject: Re: [xml-dev] converting character entities to us-ascii /equivalents/
From: Robert Koberg <rob@koberg.com>
Date: Wed, 06 Oct 2004 15:55:26 -0700
Cc: "'XML Developers List'" <xml-dev@lists.xml.org>
In-reply-to: <20041006224309.TQZP20770.mta10-svc.ntlworld.com@Turtle>
References: <20041006224309.TQZP20770.mta10-svc.ntlworld.com@Turtle>
User-agent: Mozilla Thunderbird 0.7 (Macintosh/20040616)

Michael Kay wrote:

> If there's a limited number of non-ASCII characters you need to handle, you
> can use character maps in the XSLT 2.0 serializer.

Sorry, I should have specified that I am using XSLT 1.0. I intend to move
to 2.0 sometime soon, but I have not had the time to learn it and convert
all of my stylesheets.

But that is good to know.

thanks,
-Rob

> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
>>-----Original Message-----
>>From: Robert Koberg [mailto:rob@koberg.com] 
>>Sent: 06 October 2004 22:56
>>To: XML Developers List
>>Subject: [xml-dev] converting character entities to us-ascii /equivalents/
>>
>>Hi,
>>
>>I need to output several versions of a page (through XSL
>>transformations), one of which is us-ascii (for email). But the
>>content might contain some characters that are not supported by
>>us-ascii (like em dash - &#151;).
>>
>>I want the character entities to remain in the content. When
>>transforming to us-ascii, I want to replace the entities with some
>>ascii text equivalent. For example, '&#151;' would get converted to '--'.
>>
>>The XML is pulled into the transformation through the document
>>function using a custom URIResolver.
>>
>>Is there an existing solution to this?
>>
>>Does Apache's FOP and the text renderer handle this type of thing?
>>
>>I have tried to set a ContentHandler (actually a DefaultHandler) on
>>the XMLReader and tried to replace a character entity, but I am doing
>>something wrong and am confused about how to proceed. Using the code
>>below I get a recoverable error using saxon/aelfred and a failure when
>>using saxon/xerces.
>>
>>Here is a snippet from the URIResolver:
>>
>>
>>InputSource in = new InputSource(file.getAbsolutePath());
>>SAXSource source = new SAXSource(in);
>>XMLReader reader = null;
>>try {
>>   reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>>   //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>>} catch (SAXException e) {
>>   System.err.println(e.getMessage());
>>}
>>
>>reader.setContentHandler(new AsciiHandler());
>>
>>source.setXMLReader(reader);
>>
>>return source;
>>
>>
>>
>>And the DefaultHandler has one method:
>>
>>
>>public void characters(char[] text, int start, int length) {
>>
>>   String str = new String(text, start, length);
>>   if (str.indexOf(174) > -1) {
>>    str.replaceAll("\u00AE", "(Registered Trademark)");
>>   }
>>   text = str.toCharArray();
>>}
>>
>>How can I do this? Is there a better way to handle this type of thing?
>>
>>thanks,
>>-Rob
>
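
For the XSLT 1.0 case, one possible way to make the SAX approach in the quoted snippet work is to wrap the parser in an XMLFilter instead of setting a ContentHandler on the XMLReader. A JAXP transformer normally installs its own ContentHandler on the reader carried by a SAXSource, so a handler set beforehand is replaced. In the quoted characters() method, the result of replaceAll is also discarded, and reassigning the text parameter does not change what is passed downstream. What follows is a minimal sketch under those assumptions, not a verified fix for the saxon/aelfred or saxon/xerces setups described in the message. The class name AsciiFilter, the helpers toAscii and filter, and the replacement table are hypothetical. The table assumes the em dash arrives as U+2014 (&#8212;); &#151; is the Windows-1252 position and in XML refers to the control character U+0097, so that code point would need its own entry if the content really uses it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: swaps selected non-ASCII characters for ASCII text. */
public class AsciiFilter extends XMLFilterImpl {

    // Illustrative mapping only (an assumption); extend it for your content.
    private static final Map<Character, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put('\u2014', "--");                      // em dash
        REPLACEMENTS.put('\u00AE', "(Registered Trademark)");  // registered sign
    }

    public AsciiFilter(XMLReader parent) {
        super(parent);
    }

    /** Returns a copy of the input with every mapped character replaced. */
    public static String toAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    @Override
    public void characters(char[] text, int start, int length) throws SAXException {
        char[] replaced = toAscii(new String(text, start, length)).toCharArray();
        // Forward the new buffer downstream; reassigning 'text' alone has no effect.
        super.characters(replaced, 0, replaced.length);
    }

    /** Runs an XML string through the filter and an identity transform (demo only). */
    public static String filter(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // The filter sits between the parser and whatever handler the transformer installs.
        SAXSource source = new SAXSource(new AsciiFilter(parser),
                new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "us-ascii");

        StringWriter result = new StringWriter();
        transformer.transform(source, new StreamResult(result));
        return result.toString();
    }
}
```

In the URIResolver, this would amount to returning new SAXSource(new AsciiFilter(reader), in) instead of calling setContentHandler on the reader. Replacing character by character keeps the filter correct even when the parser delivers one text node in several characters() calls.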
