Tool converts records to XML

Hi Folks,

In the spirit of UNIX tool building .....

I created a simple tool that converts records of tab-delimited data into XML. For example, these records:

title	authors	date	isbn	publisher
Unix Shell Programming	Stephen G. Kochan, Patrick Wood	2019	0-872-32400-3	SAMS
Small, Sharp Software Tools	Brian P. Hogan	2019	978-1-68050-296-1	The Pragmatic Programmers
The AWK Programming Language	Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger	1988	0-201-07981-X	Addison-Wesley Publishing Company

are converted to this XML:

<document>
	<row>
		<title>Unix Shell Programming</title>
		<authors>Stephen G. Kochan, Patrick Wood</authors>
		<date>2019</date>
		<isbn>0-872-32400-3</isbn>
		<publisher>SAMS</publisher>
	</row>
	<row>
		<title>Small, Sharp Software Tools</title>
		<authors>Brian P. Hogan</authors>
		<date>2019</date>
		<isbn>978-1-68050-296-1</isbn>
		<publisher>The Pragmatic Programmers</publisher>
	</row>
	<row>
		<title>The AWK Programming Language</title>
		<authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
		<date>1988</date>
		<isbn>0-201-07981-X</isbn>
		<publisher>Addison-Wesley Publishing Company</publisher>
	</row>
</document>

Each record is wrapped in a <row>...</row> element. Each field in a record is wrapped in an element named after the corresponding column header. The root element is <document>...</document>.

The tool may be invoked with a file, like this:

toxml books.txt

or from standard input, like this:

cat books.txt | toxml
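
Both forms work because awk reads the files named as operands and falls back to standard input when none are given. Here is a small sketch of that behaviour. It uses a hypothetical stand-in function, count_fields, rather than toxml itself, and assumes mktemp is available:

```shell
# count_fields is a hypothetical stand-in for toxml: an awk program that
# receives the command-line arguments of its wrapper.
count_fields() {
    awk 'BEGIN { FS = "\t" } { print NF }' "$@"
}

sample=$(mktemp)
printf 'title\tauthors\tdate\n' > "$sample"

from_file=$(count_fields "$sample")      # file named as an operand
from_stdin=$(count_fields < "$sample")   # no operands, so awk reads stdin

echo "file: $from_file, stdin: $from_stdin"
rm -f "$sample"
```

Either way awk sees the same three tab-separated fields.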

The tool is a small AWK program, which I named "toxml":
---------------------------------------------------------
awk '
BEGIN   {   # field separator is tab (\t)
            # record separator is LF (\n)
            FS = "\t"
            RS = "\n"
            print "<document>"
        }
NR == 1 {   # store column header names in an array
            for (i = 1; i <= NF; i++)
                header[i] = $i
        }
NR != 1 {   # create a <row>...</row> element for the line
            # surround field $i with a start/end tag named header[i]
            # leading tabs reproduce the indentation of the sample output
            print "\t<row>"
            for (i = 1; i <= NF; i++)
                print "\t\t<" header[i] ">" $i "</" header[i] ">"
            print "\t</row>"
        }
END     { print "</document>" }' "$@"
---------------------------------------------------------
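
One caveat: the script above copies field values into the output verbatim, so a value containing & or < would produce XML that is not well-formed. Below is a minimal sketch of one way to handle that, written as a shell function so it can be tried without touching the original file. The names toxml_escaped and escape are hypothetical, and the header names are still assumed to be valid XML element names.

```shell
# Sketch only: toxml_escaped is a hypothetical name, not part of the original tool.
toxml_escaped() {
    awk '
    # Replace the characters that cannot appear literally in element content.
    # "&" is handled first so the entities added afterwards are not re-escaped.
    function escape(s) {
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    BEGIN   { FS = "\t"; print "<document>" }
    NR == 1 {   # store column header names, then move on to the next line
                for (i = 1; i <= NF; i++)
                    header[i] = $i
                next
            }
            {   # every later line becomes one <row> element
                print "\t<row>"
                for (i = 1; i <= NF; i++)
                    print "\t\t<" header[i] ">" escape($i) "</" header[i] ">"
                print "\t</row>"
            }
    END     { print "</document>" }' "$@"
}

# Example: a record whose fields contain "&" and "<"
printf 'title\tnote\nR&D\ta < b\n' | toxml_escaped
```

With that input, the title field comes out as R&amp;D and the note field as a &lt; b.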





