Re: [xml-dev] Tool converts records to XML
- From: Shlomi Fish <shlomif@shlomifish.org>
- To: Roger L Costello <costello@mitre.org>
- Date: Wed, 16 Nov 2022 04:47:28 +0200
Hi Roger and all,
On Tue, 15 Nov 2022 20:14:36 +0000
Roger L Costello <costello@mitre.org> wrote:
> Michael Kay wrote:
>
> > the "Barnes & Noble" problem. The number #1 blunder
> > when writing XML is not to bother escaping `<` and `&`
> > if they happen to occur in your input.
>
> Ouch!
>
> You are right Michael.
>
> Upon reflection, I realized that there is an even nastier problem lurking
> than the problem of converting & and < in the input record data into &amp;
> and &lt; in the output XML.
>
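For reference, the escaping step itself is small. Here is a minimal sketch in Python rather than AWK (my choice of language for illustration; `escape_field` is a hypothetical helper name, not something from Roger's program), using the standard library's `xml.sax.saxutils.escape`:

```python
from xml.sax.saxutils import escape


def escape_field(text):
    """Escape the characters that are markup-significant in XML text content."""
    # escape() replaces & with &amp;, < with &lt;, and > with &gt;
    return escape(text)


print("<name>" + escape_field("Barnes & Noble") + "</name>")
```

Attribute values would additionally need their quote characters escaped; `xml.sax.saxutils.quoteattr` covers that case.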
> My actual record data is encoded in Windows 1252. The records contain many
> occurrences of the degree symbol. In Windows 1252 the degree symbol is a
> single byte (hex B0) but in UTF-8 the degree symbol is two bytes (hex C2 B0).
> My AWK program doesn't convert the one byte Windows 1252 degree symbol to the
> two byte UTF-8 degree symbol. In fact, to be correct every Windows 1252
> character with an encoding above hex 7F must be converted to two or three bytes, and
> my AWK program doesn't do any of that.
>
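The byte-level difference is easy to check. A minimal sketch, assuming Python and its built-in `cp1252` codec (`cp1252_to_utf8` is a hypothetical helper name used only for illustration):

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes to text, then re-encode the text as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


degree = cp1252_to_utf8(b"\xb0")  # degree sign, U+00B0
euro = cp1252_to_utf8(b"\x80")    # euro sign, U+20AC
ascii_text = cp1252_to_utf8(b"25 C")  # bytes up to hex 7F pass through unchanged

print(degree.hex(), euro.hex(), ascii_text)
```

The degree sign comes out as two bytes (c2 b0), while the euro sign at hex 80 comes out as three (e2 82 ac), so the expansion is not always to two bytes. Note that a few byte values (for example hex 81) are unassigned in Python's `cp1252` codec and raise `UnicodeDecodeError` unless an `errors=` policy is passed to `decode()`.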
> Eeeeeeek!
>
> For fun, I also wrote an XSLT program to convert the records to XML. My
> program uses the unparsed-text() function, which does all character
> conversions behind-the-scene (i.e., you have no idea that all the Windows
> 1252 characters above hex 7F are being converted to multi-byte UTF-8
> characters).
>
> To implement the character conversions in AWK would be a monumental task.
>
> Eeeeeeek!
>
> Lesson Learned: Don't use AWK to convert records to XML.
>
Re AWK, Perl, and POSIX, see:
* https://shlomifish.livejournal.com/1991.html
* https://twitter.com/shlomif/status/1542047869989011457
* https://github.com/ekalinin/github-markdown-toc/pull/104/files
> Bummer!
>
> /Roger
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
--
Shlomi Fish https://www.shlomifish.org/
https://github.com/shlomif/validate-your-html - Validate Your HTML
If Chuck Norris is disappointed by you not following his advice, he’ll survive.
On the other hand, you will not.
— https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/
Please reply to list if it's a mailing list post - https://shlom.in/reply .