   converting character entities to us-ascii /equivalents/

  • To: XML Developers List <xml-dev@lists.xml.org>
  • Subject: converting character entities to us-ascii /equivalents/
  • From: Robert Koberg <rob@koberg.com>
  • Date: Wed, 06 Oct 2004 14:55:58 -0700
  • User-agent: Mozilla Thunderbird 0.7 (Macintosh/20040616)

Hi,

I need to output several versions of a page (through XSL 
transformations), one of which is us-ascii (for email). But, the content 
might contain some characters that are not supported by us-ascii (like 
em dash - &#151;).

I want the character entities to remain in the content. When 
transforming to us-ascii, I want to replace the entities with some ascii 
text equivalent: For example, '&#151;' would get converted to '--'.
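
To make the desired conversion concrete, here is a minimal sketch of a string-level fallback. The class name AsciiFallback, the contents of the mapping table, and the choice of '?' for unmapped characters are all assumptions made for illustration, not an existing library. Note that in XML the reference &#151; denotes U+0097 (a control character that only displays as an em dash under Windows-1252); the Unicode em dash is U+2014, so the sketch maps both.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AsciiFallback {
    // Hypothetical mapping table (an assumption): extend it with whatever
    // characters the content actually uses.
    private static final Map<Character, String> FALLBACKS = new LinkedHashMap<>();
    static {
        FALLBACKS.put('\u2014', "--");                      // Unicode em dash
        FALLBACKS.put('\u0097', "--");                      // what &#151; really refers to
        FALLBACKS.put('\u2013', "-");                       // en dash
        FALLBACKS.put('\u00AE', "(Registered Trademark)");  // registered sign
        FALLBACKS.put('\u00A9', "(c)");                     // copyright sign
    }

    // Returns a copy of s in which every char above 127 is replaced by its
    // mapped ASCII text, or by '?' when no mapping is known.
    public static String toAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                String replacement = FALLBACKS.get(c);
                out.append(replacement != null ? replacement : "?");
            }
        }
        return out.toString();
    }
}
```

The method works char by char, so a character outside the Basic Multilingual Plane (a surrogate pair) would come out as two '?' characters.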

The XML is pulled into the transformation through the document function 
using a custom URIResolver.

Is there an existing solution to this?

Do Apache FOP and its text renderer handle this type of thing?

I have tried to set a ContentHandler (actually a DefaultHandler) on the 
XMLReader and tried to replace a character entity, but I am doing 
something wrong and am confused about how to proceed. Using the code 
below I get a recoverable error using saxon/aelfred and a failure when 
using saxon/xerces.

Here is a snippet from the URIResolver:


InputSource in = new InputSource(file.getAbsolutePath());
SAXSource source = new SAXSource(in);
XMLReader reader = null;
try {
   reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
   //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
} catch (SAXException e) {
   System.err.println(e.getMessage());
}

reader.setContentHandler(new AsciiHandler());

source.setXMLReader(reader);

return source;



And the DefaultHandler has one method:


public void characters(char[] text, int start, int length) {

   String str = new String(text, start, length);
   if (str.indexOf(174) > -1) {
    str.replaceAll("\u00AE", "(Registered Trademark)");
   }
   text = str.toCharArray();
}

How can I do this? Is there a better way to handle this type of thing?
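
One possible direction, offered as a sketch rather than a known fix: the handler above probably cannot work as written. String.replaceAll returns a new String (the result is discarded here), assigning to the text parameter only changes a local variable, and a transformer normally installs its own ContentHandler on the reader, replacing the one set in the resolver. A SAX filter sits between parser and transformer and forwards modified events instead. The class name AsciiFilter and the helper filteredText are hypothetical; XMLFilterImpl, SAXParserFactory and DefaultHandler are standard SAX/JAXP classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites character data, then passes it downstream.
public class AsciiFilter extends XMLFilterImpl {

    public AsciiFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Strings are immutable, so the result of replace() must be kept.
        String s = new String(ch, start, length)
                .replace("\u00AE", "(Registered Trademark)")
                .replace("\u2014", "--");
        char[] out = s.toCharArray();
        // Forward the rewritten text to whatever handler is registered
        // on this filter (for example the transformer's own handler).
        super.characters(out, 0, out.length);
    }

    // Demo helper (an assumption, for illustration only): parses an XML
    // string through the filter and returns the text seen downstream.
    public static String filteredText(String xml) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            AsciiFilter filter = new AsciiFilter(parser);
            final StringBuilder seen = new StringBuilder();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    seen.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return seen.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In the URIResolver this would presumably be wired in with new SAXSource(new AsciiFilter(reader), in) rather than reader.setContentHandler(...). It may also be worth passing file.toURI().toString() to InputSource, since that constructor expects a system identifier (a URI) rather than a plain file path.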

thanks,
-Rob
