XML.org: OASIS Mailing List Archives

 


Re: [xml-dev] Tool converts records to XML



On Wed, Nov 16, 2022 at 7:41 PM C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:

Roger L Costello <costello@mitre.org> writes:

> Michael Kay wrote:
>
>> the "Barnes & Noble" problem. The number #1 blunder
>> when writing XML is not to bother escaping `<` and `&`
>> if they happen to occur in your input.
>
> Ouch!
>
> You are right Michael.
>
> Upon reflection, I realized that there is an even nastier problem
> lurking than the problem of converting & and < in the input record
> data into &amp; and &lt; in the output XML.
>
> ...
>
> To implement the character conversions in AWK would be a monumental task.
>
> Eeeeeeek!
>
> Lesson Learned: Don't use AWK to convert records to XML.

Well, you may be right, and I believe many on this list share my
preference for performing such conversions in XSLT and/or XQuery, but I
have to say that the lesson you suggest seems a slightly broader
conclusion than is warranted by the experience you describe.


Agreed.
 
A couple points of detail:

  - Your downstream tools are likely to be somewhat happier if you
    convert the data to UTF-8 or UTF-16, but unless I am mistaken you
    are not in fact required to do so, in order to turn the data into
    XML.  XML does allow encoding declarations.

  - If you do want to convert the encoding it would surprise me a bit if
    awk had no constructs suitable for the work.  It would surprise me
    even more if a system with awk did not have the iconv utility for   
    converting textual data from one encoding to another.

        iconv --from-code=WINDOWS-1252 --to-code=UTF-8 < myinput > output.utf8


awk does not need such a construct. iconv and awk are both part of the Unix-like shell ecosystem, so iconv can pre- or post-process an awk conversion through ordinary shell pipes.
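As a rough sketch of that kind of pipeline, and not anything posted in this thread: the function name records_to_xml, the pipe-delimited "name|city" record layout, and the element names below are all made up for illustration. It assumes the input is Windows-1252, uses the POSIX short options -f and -t in place of the long iconv options shown above, and has awk escape &, < and > on the way out.

```shell
# Hypothetical helper: read Windows-1252 "name|city" records on stdin and
# write a small UTF-8 XML document on stdout. The record layout and the
# element names are assumptions made for this example only.
records_to_xml() {
  # Step 1: iconv re-encodes the bytes, so awk never has to know about encodings.
  iconv -f WINDOWS-1252 -t UTF-8 |
  # Step 2: awk splits the fields and escapes markup-significant characters.
  awk -F'|' '
    # "&" is escaped first so that the "&" in "&lt;" is not escaped again.
    # In a gsub replacement a bare "&" means "the matched text", hence "\\&".
    function esc(s) {
      gsub(/&/, "\\&amp;", s)
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    BEGIN { print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"; print "<records>" }
    { printf "  <record><name>%s</name><city>%s</city></record>\n", esc($1), esc($2) }
    END { print "</records>" }
  '
}
```

It could then be called as records_to_xml < myinput > output.xml. Putting iconv after awk instead would work just as well for input like this, since the characters being escaped are plain ASCII in both encodings.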





Copyright 1993-2007 XML.org. This site is hosted by OASIS