- From: Robin LaFontaine <robin@monsell.co.uk>
- To: Rick JELLIFFE <ricko@geotempo.com>, "Aurenz, Scot" <SAurenz@Rational.Com>
- Date: Wed, 10 May 2000 09:30:41 +0100
Rick, Aurenz,
Thanks for your replies. A couple of clarifications:
1. Rick asks: 'Do you mean entity references or numeric character references?'
I mean references to entities declared in the DTD.
2. Aurenz: I do not want simply to disable entity expansion, because I
want to manipulate the data with the entities expanded and then write
the file out again. If it is written out without the entity references,
it looks very different to the user, so it would be preferable to put
the references back in!
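To illustrate the problem, here is a small sketch using Python's standard xml.etree.ElementTree (an assumption; no particular parser is named in this thread). The sample document, the entity name "co" and its value are hypothetical. The parser expands both the declared entity and the numeric character reference, and writing the tree back out keeps only the expanded text:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one general entity declared in the internal DTD
# subset, plus one numeric character reference.
SOURCE = """<!DOCTYPE note [
  <!ENTITY co "Monsell EDM Ltd">
]>
<note>&co; &#xE9;</note>"""

root = ET.fromstring(SOURCE)

# The parser (expat) has already replaced both references by the
# characters they stand for; nothing records that "&co;" was used.
print(root.text)                               # Monsell EDM Ltd é

# Serializing writes the expanded text, not the references, and the
# DOCTYPE with its entity declaration is not reproduced either.
print(ET.tostring(root, encoding="unicode"))   # <note>Monsell EDM Ltd é</note>
```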
Neither of you seems to have a solution to this one! Perhaps I will
have to do it myself, but it seems surprising that there is not a
common solution.
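A do-it-yourself approach might look like the following minimal sketch, assuming Python's standard library and a hand-built map from entity name to replacement text taken from the DTD. The helpers restore_entity_refs and serialize are hypothetical, not part of any library. It is only a heuristic: any text that happens to equal an entity's value becomes a reference, even if the original author typed it literally, and only elements, attributes and text are written (no comments, PIs, namespaces or DOCTYPE):

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def restore_entity_refs(text, entities):
    """Escape character data and turn each occurrence of an entity's
    replacement text into a reference to that entity.

    `entities` maps entity name -> replacement text (hypothetical input,
    e.g. collected by hand from the DTD). Longer values are tried first,
    so an entity whose value contains another entity's value wins.
    """
    by_value = {value: name for name, value in entities.items() if value}
    if not by_value:
        return escape(text)
    pattern = re.compile("|".join(
        re.escape(v) for v in sorted(by_value, key=len, reverse=True)))
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))   # ordinary text: escape &, <, >
        out.append(f"&{by_value[m.group(0)]};")   # matched value: emit a reference
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)

def _attr(value):
    # Attribute values are escaped only; references are not restored here.
    return escape(value, {'"': "&quot;"})

def serialize(elem, entities):
    """Minimal serializer for an ElementTree element."""
    attrs = "".join(f' {k}="{_attr(v)}"' for k, v in elem.attrib.items())
    inner = restore_entity_refs(elem.text or "", entities)
    for child in elem:
        inner += serialize(child, entities)
        inner += restore_entity_refs(child.tail or "", entities)
    return f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"

root = ET.fromstring('<!DOCTYPE note [<!ENTITY co "Monsell EDM Ltd">]>'
                     '<note><to>Monsell EDM Ltd &amp; partners</to></note>')
print(serialize(root, {"co": "Monsell EDM Ltd"}))
# <note><to>&co; &amp; partners</to></note>
```

The original DOCTYPE (with its entity declarations) still has to be written in front of the output so that the restored references remain well-formed.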
Robin
At 3:11 am +0800 10/5/00, Rick JELLIFFE wrote:
>"Aurenz, Scot" wrote:
> >
> > > Is there an easy way to process an XML document and put the entity
> > > references back into it?
>
>Do you mean entity references or numeric character references?
>
>If it is the latter, you can try a lossless transcoder. The only one of
>these public is xml-tcs, which you can find at
>
>http://www.ascc.net/xml/en/utf-8/transcode-index.html
>
>This is a set of patches to Plan 9's tcs. Because of copyright restrictions,
>I cannot ship a combined version or a binary, but you can put the pieces
>together. It can convert characters not available in the output encoding
>into various formats, including doubly delimited ones:
>
> STRIP: no delimiter,
> UNKNOWN: put in unknown character indicator "?" or FFFD
> UNICODE: Unicode-style U+HHHH
> JAVA: Java-style \uHHHH
> JAVA_DD: Java-style \\uHHHH
> XML: XML-style &#xHHHH;
> XML_DD: XML-style &amp;#xHHHH;
> SPREAD1: Old SPREAD &U-HHHH;
> SPREAD1_DD: Old SPREAD &amp;U-HHHH;
> SPREAD2: New SPREAD &UHHHH;
> SPREAD2_DD: New SPREAD &amp;UHHHH;
> CSS1: CSS1 \HHHH
> CSS1_DD: CSS1 \\HHHH
> CSS2: CSS2 \\00HHHH (space following is delimiter)
> CSS2_DD: CSS2 \\00HHHH (space following is delimiter)
> SGML: SGML-, HTML (< 4) and Netscape-style decimal &#DDDDDD;
> SGML_DD: SGML-style &amp;#DDDDDD;
>
>
>Rick Jelliffe
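For the numeric-character-reference case, the same idea as the transcoder can be sketched in Python with codec error handlers. This is not xml-tcs; the handler names ("charref-xml" etc.) and the transcode function are hypothetical, and only a few of the styles listed above are mimicked. Python's built-in "xmlcharrefreplace" handler already produces decimal references such as &#233;:

```python
import codecs

# A few of the output styles from the list above, keyed by code point.
# These names are illustrative only, not xml-tcs options.
_FORMATS = {
    "XML": lambda cp: f"&#x{cp:04X};",
    "SGML": lambda cp: f"&#{cp};",
    "UNICODE": lambda cp: f"U+{cp:04X}",
    "UNKNOWN": lambda cp: "?",
}

def _make_handler(fmt):
    def handler(exc):
        # Only encoding errors are handled; anything else is re-raised.
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]
        # Return the replacement text and the position to resume encoding.
        return "".join(fmt(ord(ch)) for ch in bad), exc.end
    return handler

for _style, _fmt in _FORMATS.items():
    codecs.register_error(f"charref-{_style.lower()}", _make_handler(_fmt))

def transcode(text, encoding, style="XML"):
    """Encode text, replacing characters the target encoding cannot
    represent by the chosen reference style."""
    return text.encode(encoding, errors=f"charref-{style.lower()}")

print(transcode("café €", "ascii"))          # b'caf&#x00E9; &#x20AC;'
print(transcode("café", "ascii", "SGML"))    # b'caf&#233;'
```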
-- -----------------------------------------------------------------
Robin La Fontaine, Monsell EDM Ltd
(R&D Project Management, Engineering Data Exchange using XML, EDIF)
Tel: +44 1684 592 144 Fax: +44 1684 594 504 or +44 870 054 2811
Email: robin@monsell.co.uk http://www.monsell.co.uk
***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************