If there's a limited number of non-ASCII characters you need to handle, you
can use character maps in the XSLT 2.0 serializer.
Michael Kay
http://www.saxonica.com/
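
In XSLT 2.0 a character map is declared with xsl:character-map and xsl:output-character elements and referenced from xsl:output through its use-character-maps attribute. It needs an XSLT 2.0 processor. The underlying idea, a fixed table from a character to a replacement string applied as the text is written out, can be sketched in plain Java as below. This is only an illustration, not the XSLT feature itself: the class name AsciiMap, the table entries, and the choice to turn unmapped non-ASCII characters into '?' are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any XSLT API: a table-driven
// substitution in the spirit of an XSLT 2.0 character map.
final class AsciiMap {
    // Illustrative entries only; extend the table to cover your content.
    private static final Map<Integer, String> MAP = new HashMap<>();
    static {
        MAP.put(0x2014, "--");                      // em dash
        MAP.put(0x00AE, "(Registered Trademark)");  // registered sign
        MAP.put(0x00A9, "(c)");                     // copyright sign
    }

    // Returns a copy of text containing only us-ascii characters.
    static String apply(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            String mapped = MAP.get(cp);
            if (mapped != null) {
                out.append(mapped);        // listed character: use the table
            } else if (cp < 0x80) {
                out.append((char) cp);     // plain ASCII passes through
            } else {
                out.append('?');           // assumption: flag anything unmapped
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("Acme\u00AE \u2014 hello"));
    }
}
```

As with a character map, this only works well when the set of non-ASCII characters is small and known in advance; anything missing from the table shows up as '?' in the output.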
> -----Original Message-----
> From: Robert Koberg [mailto:rob@koberg.com]
> Sent: 06 October 2004 22:56
> To: XML Developers List
> Subject: [xml-dev] converting character entities to us-ascii /equivalents/
>
> Hi,
>
> I need to output several versions of a page (through XSL
> transformations), one of which is us-ascii (for email). But the content
> might contain some characters that are not supported by us-ascii (like
> em dash - —).
>
> I want the character entities to remain in the content. When
> transforming to us-ascii, I want to replace the entities with some ascii
> text equivalent. For example, '—' would get converted to '--'.
>
> The XML is pulled into the transformation through the document function
> using a custom URIResolver.
>
> Is there an existing solution to this?
>
> Does Apache's FOP and the text renderer handle this type of thing?
>
> I have tried to set a ContentHandler (actually a DefaultHandler) on the
> XMLReader and tried to replace a character entity, but I am doing
> something wrong and am confused about how to proceed. Using the code
> below I get a recoverable error using saxon/aelfred and a failure when
> using saxon/xerces.
>
> Here is a snippet from the URIResolver:
>
>
> InputSource in = new InputSource(file.getAbsolutePath());
> SAXSource source = new SAXSource(in);
> XMLReader reader = null;
> try {
>     reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>     //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
> } catch (SAXException e) {
>     System.err.println(e.getMessage());
> }
>
> reader.setContentHandler(new AsciiHandler());
>
> source.setXMLReader(reader);
>
> return source;
>
>
>
> And the DefaultHandler has one method:
>
>
> public void characters(char[] text, int start, int length) {
>
>     String str = new String(text, start, length);
>     if (str.indexOf(174) > -1) {
>         str.replaceAll("\u00AE", "(Registered Trademark)");
>     }
>     text = str.toCharArray();
> }
>
> How can I do this? Is there a better way to handle this type of thing?
>
> thanks,
> -Rob
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
>
>
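
For the SAX route described in the quoted message, a few things in the handler stand out. The result of str.replaceAll(...) is discarded (Java strings are immutable), assigning to the text parameter only changes a local variable, and a ContentHandler set directly on the XMLReader is normally replaced by the processor's own handler when the transformation starts. A filter that sits between the parser and the processor avoids all three. The following is a minimal sketch using the standard org.xml.sax.helpers.XMLFilterImpl; the class name AsciiFilter and the two replacements are assumptions chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites character data and forwards every event
// to whatever ContentHandler is installed on it later.
class AsciiFilter extends XMLFilterImpl {
    AsciiFilter() { super(); }
    AsciiFilter(XMLReader parent) { super(parent); }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Illustrative replacements only; keep the returned string.
        String text = new String(ch, start, length)
                .replace("\u00AE", "(Registered Trademark)")
                .replace("\u2014", "--");
        char[] out = text.toCharArray();
        // Pass the rewritten text downstream instead of the original buffer.
        super.characters(out, 0, out.length);
    }

    public static void main(String[] args) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        AsciiFilter filter = new AsciiFilter(parser);
        StringBuilder seen = new StringBuilder();
        // Stand-in for the XSLT processor's handler: just collects the text.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] c, int s, int n) { seen.append(c, s, n); }
        });
        filter.parse(new InputSource(new StringReader("<p>Acme&#174; &#8212; mail</p>")));
        System.out.println(seen);
    }
}
```

In the URIResolver this would mean calling source.setXMLReader(new AsciiFilter(reader)) and dropping the reader.setContentHandler(...) call, so the processor attaches its handler to the filter rather than to the parser. It may also be worth passing file.toURI().toString() to InputSource, since the constructor expects a system identifier (a URI) rather than a bare file path.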