Re: [xml-dev] XSLT versus AWK

The problem here is you're dealing with two data formats, field-separated text and XML, and AWK only handles one of them. You're therefore resorting to creating XML "by hand", and like most people who create XML by hand, you're getting it wrong - you're not bothering to escape special characters like `<` and `&`, and you're not bothering to check that the field names are valid XML names. Mistakes like that account for most of the bad XML that we all have to deal with, and that's why most of us recommend using XML-aware tools not just to parse XML, but to create it as well.
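
As a minimal illustration of that recommendation (not part of the original message), the sketch below builds the same `<Airport>`/`<Row>` structure with an XML-aware library rather than by string concatenation. Python and its standard `xml.etree.ElementTree` module are chosen purely for illustration; the function name `tsv_to_xml` and the pattern `_NAME_RE` are hypothetical, and the name check is a simplified, ASCII-only approximation of what the XML Recommendation allows. ElementTree escapes `<` and `&` in text when serializing, but it does not validate element names, so that check is done explicitly.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a simplified, ASCII-only stand-in for the XML Name production.
# The real rule permits many more characters (and colons for namespaces).
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def tsv_to_xml(text, root_name="Airport", row_name="Row"):
    """Convert tab-separated text (first record = column headers) to XML."""
    # Mirror the AWK settings: records end in CRLF, fields are tab-separated.
    records = [line.split("\t") for line in text.split("\r\n") if line]
    if not records:
        raise ValueError("input contains no header record")

    headers = records[0]
    for name in headers:
        # Reject headers that could not be used as element names.
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"column header is not a valid XML name: {name!r}")

    root = ET.Element(root_name)
    for fields in records[1:]:
        row = ET.SubElement(root, row_name)
        # Assumption: fields beyond the number of headers are ignored.
        for name, value in zip(headers, fields):
            ET.SubElement(row, name).text = value

    # The serializer escapes special characters in text content.
    return ET.tostring(root, encoding="unicode")
```

For example, `tsv_to_xml("Code\tName\r\nBOS\tLogan <Intl> & Co\r\n")` returns `<Airport><Row><Code>BOS</Code><Name>Logan &lt;Intl&gt; &amp; Co</Name></Row></Airport>`, while a header such as `Airport Code` raises `ValueError` instead of producing a malformed tag.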

Lesson learned: use the right programming language for the right data format.

Michael Kay
Saxonica

On 31 Jul 2022, at 12:22, Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

XSLT is a programming language specifically designed for processing textual data that is formatted as XML.

AWK is a programming language specifically designed for processing textual data that is formatted as records containing fields. Interestingly, I have observed that the records/fields format is the one used for input and output by most UNIX tools.

XSLT and AWK are mature programming languages. XSLT was created roughly 24 years ago at the W3C. AWK was created roughly 45 years ago at Bell Labs by Alfred Aho, Peter Weinberger, and Brian Kernighan (the name AWK comes from their last names).

XSLT and AWK substantially reduce -- relative to other programming languages -- the amount of code, time, and effort needed to process their respective data formats. A developer will be far more productive writing an XSLT program to process XML-formatted data than if he were to write the program in some other programming language. A developer will be far more productive writing an AWK program to process records-and-fields-formatted data than if he were to write the program in some other programming language. 

XSLT and AWK are complementary. An XSLT program can convert an XML document into a document containing records and fields. An AWK program can convert a document consisting of records and fields into an XML document. In fact, just yesterday I did that very thing: I wrote an AWK program to convert a huge document to XML. The document contained records with tab-delimited fields, and its first record contained the column headers. See my simple, short AWK program below (note: I am an AWK newbie, so there are likely better ways to write the program).

Lesson Learned: Use the right programming language for the right data format.

/Roger

convert2xml.awk

BEGIN {
    # field separator is tab (\x09)
    # record separator is CRLF (\r\n)
    FS = "\t"
    RS = "\r\n"
    print "<Airport>"
}
NR == 1 {
    # get column header names, store in an array
    for (i = 1; i <= NF; i++)
        header[i] = $i
}
NR != 1 {
    # create a <Row>...</Row> element for the line
    # surround field $i with a start/end tag named header[i]
    print "<Row>"
    for (i = 1; i <= NF; i++)
        print "<" header[i] ">" $i "</" header[i] ">"
    print "</Row>"
}
END { print "</Airport>" }
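
The following is a small check, not part of the original post, showing how output built by plain concatenation like the script above can stop being well-formed. It is a sketch in Python (used only for illustration): the hypothetical `naive_row` imitates the script's `print` statements without any escaping, and the hypothetical `is_well_formed` simply asks the standard-library parser whether it accepts the result.

```python
import xml.etree.ElementTree as ET


def naive_row(headers, fields):
    """Build a <Row> by string concatenation, with no escaping."""
    parts = ["<Row>"]
    for name, value in zip(headers, fields):
        # Same shape as the AWK print statement: "<" name ">" value "</" name ">"
        parts.append("<" + name + ">" + value + "</" + name + ">")
    parts.append("</Row>")
    return "\n".join(parts)


def is_well_formed(xml_text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Under these assumptions, `is_well_formed(naive_row(["Name"], ["Logan"]))` is `True`, but a field value such as `AT&T Field` or a column header such as `Airport Code` makes it `False`.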



