XML.org
OASIS Mailing List Archives

Re: [xml-dev] tool, library to replace pseudo escaped entities with real characters


Do you want to repair the file? Perhaps this could work:

Make an XSLT 2.0 null (identity) transform.
Make a template for the description element.
In that template, do a text substitution on the data content to replace "&" with some unlikely single character, e.g. &#x4000; (䀀), convert the result to a sequence of codepoints with string-to-codepoints(), and put that into a variable.
Iterate over each codepoint in the variable, outputting it as a character; when you find 0x4000, output an ampersand instead, in an xsl:text with disable-output-escaping="yes".
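
If XSLT 2.0 isn't to hand, a similar repair can be sketched in Python. Note that this is a swapped-in technique, not the transform described above: instead of re-emitting the ampersands unescaped, it parses the file and resolves the leftover entity text to real characters with the standard library's html.unescape(). Assumptions: the element is called description as in the sample, the damaged text is the element's direct text content (text after child elements is not touched), and the names are HTML entity names. repair_descriptions is a made-up helper name.

```python
import html
import xml.etree.ElementTree as ET


def repair_descriptions(xml_text: str, tag: str = "description") -> str:
    """Resolve leftover entity text (e.g. '&nbsp;') inside <tag> elements."""
    root = ET.fromstring(xml_text)
    for el in root.iter(tag):
        if el.text:
            # After parsing, '&amp;nbsp;' is the literal text '&nbsp;';
            # html.unescape() turns that into the real character U+00A0.
            el.text = html.unescape(el.text)
    # Serialising re-escapes any genuine '&' left in the text as '&amp;'.
    return ET.tostring(root, encoding="unicode")


sample = (
    "<item><description>PJ&amp;nbsp;72 fra &amp;Ouml;rsj&amp;ouml; "
    "Belysning</description></item>"
)
print(repair_descriptions(sample))
```

Because the serialiser escapes whatever ampersands remain, the output stays well-formed even where the text held a genuine "&". One caveat: html.unescape() follows HTML5 rules, so it also resolves a few legacy names written without the trailing semicolon, which may or may not be what you want.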

On 30/10/2014 11:28 PM, "Uche Ogbuji" <uche@ogbuji.net> wrote:
On Thu, Oct 30, 2014 at 5:16 AM, Gareth Oakes <goakes@gpsl.co> wrote:
>I'm sure someone must have written a nice little python script or
>something similar to do this sometime, anyway I have some XML with
>stuff like
>
><description>PJ&amp;nbsp;72 fra &amp;Ouml;rsj&amp;ouml; Belysning er
>en funktionel lampe&amp;nbsp;som kan justeres efter eget behov.
>Fremstillet af lakeret metal og&amp;nbsp;f&aring;s i mange
>farver.&amp;nbsp;I serien f&amp;aring;s skrivebordslamper, gulvlamper,
>loftslamper.&amp;nbsp;&amp;nbsp;</description>
>
>anyway, rather than sitting down and writing a solution for this
>problem I am supposing someone has written it in the past, and I can
>just use that.

I'm guessing you want the &amp;s to become ampersands? I'm pretty sure the
regular expression /&amp;/&/g would work in most environments.

That could be dangerous, because a plain old &amp; would be reduced to a bare &, which is a well-formedness (WF) error, after that transform, and those are pretty common. Unless, that is, you know that &amp; has been "psychoescaped" to &amp;amp;. Can't tell from the sample given.

In other words, the problem is too underspecified for an off-the-shelf solution; it depends on knowing the original pattern reliably, so it might indeed be that writing a bit of code is best.
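
To make the hazard concrete, here is a small Python sketch. It compares the blanket replacement with a more targeted one that only touches &amp; when it is immediately followed by something that looks like an entity or character reference. The names naive, targeted and is_well_formed are hypothetical helpers, and the choice of HTML 4 entity names (html.entities.name2codepoint) as the vocabulary is an assumption about the data, not something the sample settles.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# '&amp;' followed by a numeric reference or a name, then ';'
DOUBLE_ESCAPED = re.compile(
    r"&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"
)
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def naive(raw: str) -> str:
    """The blanket replacement: every '&amp;' becomes '&'."""
    return raw.replace("&amp;", "&")


def targeted(raw: str) -> str:
    """Undo one level of escaping only where a reference follows."""
    def fix(match: re.Match) -> str:
        ref = match.group(1)
        if ref.startswith("#") or ref in XML_PREDEFINED:
            # Keep these as XML references so markup characters stay escaped.
            return "&" + ref + ";"
        if ref in name2codepoint:
            return chr(name2codepoint[ref])  # known HTML name: real character
        return match.group(0)                # unknown name: leave untouched
    return DOUBLE_ESCAPED.sub(fix, raw)


def is_well_formed(raw: str) -> bool:
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


doc = "<description>Salt &amp; pepper, f&amp;aring;s&amp;nbsp;nu</description>"
print(is_well_formed(naive(doc)))     # False
print(is_well_formed(targeted(doc)))  # True
```

Even the targeted version is guessing: text that really was meant to read "&nbsp;" literally would be converted too, which is the underspecification problem again.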


--
Uche Ogbuji                                       http://uche.ogbuji.net
Founding Partner, Zepheira                  http://zepheira.com
Author, _Ndewo, Colorado_                 http://uche.ogbuji.net/ndewo/
Founding editor, Kin Poetry Journal      http://wearekin.org
http://copia.ogbuji.net    http://www.linkedin.com/in/ucheogbuji    http://twitter.com/uogbuji


Copyright 1993-2007 XML.org. This site is hosted by OASIS