On Tuesday, 15 November 2022 at 00:10:03 CET, Roger L Costello <costello@mitre.org> wrote:
Hi Folks,
In the spirit of UNIX tool building .....
I created a simple tool that converts records of tab-delimited data into XML. For example, these records:
title authors date isbn publisher
Unix Shell Programming Stephen G. Kochan, Patrick Wood 2019 0-872-32400-3 SAMS
Small, Sharp Software Tools Brian P. Hogan 2019 978-1-68050-296-1 The Pragmatic Programmers
The AWK Programming Language Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger 1988 0-201-07981-X Addison-Wesley Publishing Company
are converted to this XML:
<document>
<row>
<title>Unix Shell Programming</title>
<authors>Stephen G. Kochan, Patrick Wood</authors>
<date>2019</date>
<isbn>0-872-32400-3</isbn>
<publisher>SAMS</publisher>
</row>
<row>
<title>Small, Sharp Software Tools</title>
<authors>Brian P. Hogan</authors>
<date>2019</date>
<isbn>978-1-68050-296-1</isbn>
<publisher>The Pragmatic Programmers</publisher>
</row>
<row>
<title>The AWK Programming Language</title>
<authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
<date>1988</date>
<isbn>0-201-07981-X</isbn>
<publisher>Addison-Wesley Publishing Company</publisher>
</row>
</document>
Each record is wrapped in a <row>...</row> element. Each field in a record is wrapped in an element named after the corresponding column header (the first line of the input). The root element is <document>...</document>.
The tool may be invoked with a file, like this:
toxml books.txt
or from standard input, like this:
cat books.txt | toxml
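Tabs are easily turned into spaces when sample data is copied out of an email, so here is one possible way to create the input file with real tab characters. This is only a sketch: it relies on printf reusing its format string for each group of five arguments, and the final awk line is just a sanity check that every line has five tab-separated fields.

```shell
# Write the header line and three records to books.txt, one tab between fields.
printf '%s\t%s\t%s\t%s\t%s\n' \
  'title' 'authors' 'date' 'isbn' 'publisher' \
  'Unix Shell Programming' 'Stephen G. Kochan, Patrick Wood' '2019' '0-872-32400-3' 'SAMS' \
  'Small, Sharp Software Tools' 'Brian P. Hogan' '2019' '978-1-68050-296-1' 'The Pragmatic Programmers' \
  'The AWK Programming Language' 'Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger' '1988' '0-201-07981-X' 'Addison-Wesley Publishing Company' \
  > books.txt

# Sanity check: print the number of tab-separated fields on each line.
awk -F'\t' '{ print NF }' books.txt
```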
The tool is a small AWK program, which I named "toxml":
---------------------------------------------------------
awk '
BEGIN { # field separator is tab (\t)
# record separator is LF (\n)
OFS=FS="\t"
RS="\n"
print "<document>"
}
NR==1 { # store column header names in an array
for (i=1; i<=NF; i++)
header[i]=$i;
}
NR!=1 { # create a <row>...</row> element for the line
# surround field $i with a start/end tag named header[i]
print "<row>"
for (i=1; i<=NF; i++)
print "<" header[i] ">" $i "</" header[i] ">"
print "</row>"
}
END { print "</document>" }' "$@"
---------------------------------------------------------
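One caveat: a field containing &, < or > would be copied into the output as-is, and the result would not be well-formed XML. Below is a possible variant that escapes those characters first. It is a sketch, not part of the original tool: the function names toxml_escaped and esc are made up for illustration, and it still assumes that every column header is a valid XML element name.

```shell
# Hypothetical variant of the tool, wrapped in a shell function.
toxml_escaped() {
  awk '
  # Hypothetical helper: replace XML-special characters with entities.
  # The ampersand must be handled first so that the entities added by
  # the later substitutions are not escaped a second time.
  function esc(s) {
      gsub(/&/, "\\&amp;", s)
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
  }
  BEGIN { FS = "\t"; print "<document>" }
  NR == 1 {   # store column header names in an array
      for (i = 1; i <= NF; i++)
          header[i] = $i
      next
  }
  {           # wrap each escaped field in a tag named header[i]
      print "<row>"
      for (i = 1; i <= NF; i++)
          print "<" header[i] ">" esc($i) "</" header[i] ">"
      print "</row>"
  }
  END { print "</document>" }' "$@"
}

# Example: a record whose fields contain an ampersand and a less-than sign.
printf 'title\tnote\nR&D\ta < b\n' | toxml_escaped
```

For that input, the two fields come out as <title>R&amp;D</title> and <note>a &lt; b</note>. Quoting "$@" passes file names through unchanged, including names that contain spaces.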
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.