XML.org
OASIS Mailing List Archives

Re: [xml-dev] Tool converts records to XML

Hi Roger and all,

On Tue, 15 Nov 2022 20:14:36 +0000
Roger L Costello <costello@mitre.org> wrote:

> Michael Kay wrote:
> 
> > the "Barnes & Noble" problem. The number one blunder 
> > when writing XML is not to bother escaping `<` and `&` 
> > if they happen to occur in your input.  
> 
> Ouch!
> 
> You are right Michael.
> 
> Upon reflection, I realized that there is an even nastier problem lurking
> than the problem of converting & and < in the input record data into &amp;
> and &lt; in the output XML. 
> 
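
For illustration, here is a minimal Python sketch of the escaping Michael describes. The function name `escape_xml_text` is hypothetical; Python's standard library already provides `xml.sax.saxutils.escape`, which performs the same three replacements. The one subtlety is ordering: `&` must be replaced first, otherwise the `&` introduced by `&lt;` would itself be escaped a second time. This covers element content only; attribute values would also need `"` escaped.

```python
# Hypothetical helper: escape a raw record field before embedding it
# as XML element content.
def escape_xml_text(value: str) -> str:
    # '&' first, so the '&' produced by '&lt;' / '&gt;' is not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;"))  # '>' is optional but harmless

record = "Barnes & Noble <books>"
print(f"<publisher>{escape_xml_text(record)}</publisher>")
# prints <publisher>Barnes &amp; Noble &lt;books&gt;</publisher>
```
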
> My actual record data is encoded in Windows 1252. The records contain many
> occurrences of the degree symbol. In Windows 1252 the degree symbol is a
> single byte (hex B0) but in UTF-8 the degree symbol is two bytes (hex C2 B0).
> My AWK program doesn't convert the one byte Windows 1252 degree symbol to the
> two-byte UTF-8 degree symbol. In fact, to be correct every Windows 1252
> character with an encoding above hex 7F must be converted to a multi-byte
> UTF-8 sequence (two bytes for most, three for a few, such as the euro sign
> at hex 80), and my AWK program doesn't do any of that.
> 
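
A short Python sketch of why the degree sign grows from one byte to two: UTF-8 stores code points U+0080 through U+07FF as a `110xxxxx 10xxxxxx` byte pair. The helper `utf8_two_byte` is hypothetical and handles only that range; Python's built-in `cp1252` and `utf-8` codecs are used to cross-check it. The last line shows that "two bytes" is not universal for Windows 1252: a few bytes in the hex 80 to 9F range map to characters such as the euro sign (U+20AC), which takes three bytes in UTF-8.

```python
# Hypothetical helper: hand-encode a code point in U+0080..U+07FF as UTF-8.
def utf8_two_byte(code_point: int) -> bytes:
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx: top 5 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low 6 bits
    return bytes([lead, trail])

degree = b"\xb0".decode("cp1252")        # Windows 1252 byte B0 -> U+00B0
print(utf8_two_byte(ord(degree)).hex())  # c2b0
print(degree.encode("utf-8").hex())      # c2b0 (built-in codec agrees)

# Windows 1252 byte 80 is the euro sign U+20AC: three bytes in UTF-8.
print(b"\x80".decode("cp1252").encode("utf-8").hex())  # e282ac
```
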
> Eeeeeeek!
> 
> For fun, I also wrote an XSLT program to convert the records to XML. My
> program uses the unparsed-text() function, which does all character
> conversions behind the scenes (i.e., you have no idea that all the Windows
> 1252 characters above hex 7F are being decoded and then written out as
> multi-byte UTF-8 sequences).
> 
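
For comparison with the XSLT approach, here is a minimal Python sketch that does the same job with the decoding made explicit: read the bytes as Windows 1252, build the tree with `xml.etree.ElementTree`, and let the serializer handle both the escaping and the UTF-8 output. The `|`-delimited layout, the sample records, and the field names are assumptions for illustration only; the thread does not show the actual record format.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: '|'-delimited records encoded in Windows 1252.
# FIELDS and the sample data are illustrative assumptions.
FIELDS = ["name", "temperature"]
raw = "Barnes & Noble|72\xb0\nA < B Corp|18\xb0\n".encode("cp1252")

def records_to_xml(data: bytes) -> bytes:
    root = ET.Element("records")
    # Decoding with cp1252 turns byte B0 into the character U+00B0.
    text = data.decode("cp1252")
    for line in text.splitlines():
        rec = ET.SubElement(root, "record")
        for field, value in zip(FIELDS, line.split("|")):
            # ElementTree escapes '&' and '<' in text when serializing.
            ET.SubElement(rec, field).text = value
    # Serializing as UTF-8 writes U+00B0 as the two bytes C2 B0.
    return ET.tostring(root, encoding="utf-8")

print(records_to_xml(raw).decode("utf-8"))
```

In the XSLT version the source encoding can likewise be stated explicitly with the two-argument form, `unparsed-text($href, 'windows-1252')`, rather than left to the processor's detection.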
> To implement the character conversions in AWK would be a monumental task.
> 
> Eeeeeeek!
> 
> Lesson Learned: Don't use AWK to convert records to XML.
> 

Re AWK, Perl, and POSIX, see:

* https://shlomifish.livejournal.com/1991.html

* https://twitter.com/shlomif/status/1542047869989011457

* https://github.com/ekalinin/github-markdown-toc/pull/104/files

> Bummer!
> 
> /Roger
> 
> _______________________________________________________________________
> 
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
> 
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> 



-- 

Shlomi Fish       https://www.shlomifish.org/
https://github.com/shlomif/validate-your-html - Validate Your HTML

If Chuck Norris is disappointed by you not following his advice, he’ll survive.
On the other hand, you will not.
    — https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - https://shlom.in/reply .





Copyright 1993-2007 XML.org. This site is hosted by OASIS