Tool converts records to XML
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Mon, 14 Nov 2022 23:09:53 +0000
Hi Folks,
In the spirit of UNIX tool building .....
I created a simple tool that converts records of tab-delimited data into XML. For example, these records:
title	authors	date	isbn	publisher
Unix Shell Programming	Stephen G. Kochan, Patrick Wood	2019	0-872-32400-3	SAMS
Small, Sharp Software Tools	Brian P. Hogan	2019	978-1-68050-296-1	The Pragmatic Programmers
The AWK Programming Language	Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger	1988	0-201-07981-X	Addison-Wesley Publishing Company
are converted to this XML:
<document>
<row>
<title>Unix Shell Programming</title>
<authors>Stephen G. Kochan, Patrick Wood</authors>
<date>2019</date>
<isbn>0-872-32400-3</isbn>
<publisher>SAMS</publisher>
</row>
<row>
<title>Small, Sharp Software Tools</title>
<authors>Brian P. Hogan</authors>
<date>2019</date>
<isbn>978-1-68050-296-1</isbn>
<publisher>The Pragmatic Programmers</publisher>
</row>
<row>
<title>The AWK Programming Language</title>
<authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
<date>1988</date>
<isbn>0-201-07981-X</isbn>
<publisher>Addison-Wesley Publishing Company</publisher>
</row>
</document>
Each record is wrapped in a <row>...</row> element. Each field in a record is wrapped in an element named after its column header. The root element is <document>...</document>.
The tool may be invoked with a file, like this:
toxml books.txt
or from standard input, like this:
cat books.txt | toxml
The tool is a small AWK program, which I named "toxml":
---------------------------------------------------------
awk '
BEGIN {   # field separator is tab (\t)
          # record separator is LF (\n)
    OFS=FS="\t"
    RS="\n"
    print "<document>"
}
NR==1 {   # store column header names in an array
    for (i=1; i<=NF; i++)
        header[i]=$i
}
NR!=1 {   # create a <row>...</row> element for the line
          # surround field $i with a start/end tag named header[i]
    print "<row>"
    for (i=1; i<=NF; i++)
        print "<" header[i] ">" $i "</" header[i] ">"
    print "</row>"
}
END { print "</document>" }' "$@"   # quoted so file names with spaces survive
---------------------------------------------------------
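One thing to watch: the program copies field values into the output verbatim, so a field containing & or < would produce XML that is not well-formed. Below is a minimal sketch of a variant that escapes those characters first. The function name toxml_escaped and the helper esc() are made up for this illustration and are not part of the tool above. It also assumes, as the original does, that every header is already a legal XML element name.

```shell
# Hypothetical variant of toxml that escapes XML special characters in field values.
toxml_escaped() {
  awk '
    # esc() is a helper introduced for this sketch; it is not in the original tool.
    function esc(s) {
      gsub(/&/, "\\&amp;", s)   # must run first so later entities are not re-escaped
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    BEGIN { FS = "\t"; print "<document>" }
    # first line: remember the column header names
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
    # every other line: one <row> with one element per field
    {
      print "<row>"
      for (i = 1; i <= NF; i++)
        print "<" header[i] ">" esc($i) "</" header[i] ">"
      print "</row>"
    }
    END { print "</document>" }
  ' "$@"
}
```

It is invoked the same way as the original, for example: toxml_escaped books.txt, or cat books.txt | toxml_escaped.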