Do you want to repair the file? Perhaps this could work:
Make an xslt2 null transform.
Make a template for the description element.
In that template do a text substitution on data content to replace " &" with some unlikely single character, eg 䀀 convert to a sequence of codepoints with string-to-codepoints(), and put that into a variable.
Iterate over each codepoint in the variable, outputting it as a character, and when you find 0x4000; output it in xsl:text with disable-output-escaping to true.
On Thu, Oct 30, 2014 at 5:16 AM, Gareth Oakes <goakes@gpsl.co> wrote:-->I'm sure someone must have written a nice little python script or
>something similar to do this sometime, anyway I have some XML with
>stuff like
>
><description>PJ&nbsp;72 fra &Ouml;rsj&ouml; Belysning er
>en funktionel lampe&nbsp;som kan justeres efter eget behov.
>Fremstillet af lakeret metal og&nbsp;fås i mange
>farver.&nbsp;I serien f&aring;s skrivebordslamper, gulvlamper,
>loftslamper.&nbsp;&nbsp;</description>
>
>anyway, rather than sitting down and writing a solution for this
>problem I am supposing someone has written it in the past, and I can
>just use that.
I'm guessing you want the &s to become ampersands? I'm pretty sure the
regular expression /&/&/g would work in most environments.Could be dangerous because a plain old & would reduce to a WF error after that transform, and those are pretty common. Unless, that is, you know that & has been "psychoescaped" to &amp; . Can't tell from the sample given.In other words, the problem is underspecified to provide an off-the shelf solution; it depends on knowing the original pattern reliably, so it might indeed be that writing a bit of code is best.Uche Ogbuji http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.comAuthor, _Ndewo, Colorado_ http://uche.ogbuji.net/ndewo/
Founding editor, Kin Poetry Journal http://wearekin.org
http://copia.ogbuji.net http://www.linkedin.com/in/ucheogbuji http://twitter.com/uogbuji