Re: [xml-dev] Tool converts records to XML

<!DOCTYPE document [
<!ELEMENT document - - (row+)>
<!ELEMENT row - - (title,authors,date,isbn,publisher)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT authors - - (#PCDATA)>
<!ELEMENT date - - (#PCDATA)>
<!ELEMENT isbn - - (#PCDATA)>
<!ELEMENT publisher - - (#PCDATA)>
<!ENTITY books SYSTEM "books.txt">
<!ENTITY start-fields "<row><title>">
<!ENTITY end-row "</publisher></row>">
<!ENTITY end-title "</title><authors>">
<!ENTITY end-authors "</authors><date>">
<!ENTITY end-date "</date><isbn>">
<!ENTITY end-isbn "</isbn><publisher>">
<!SHORTREF in-document "&#RS;" start-fields>
<!SHORTREF in-title "&#TAB;" end-title>
<!SHORTREF in-authors "&#TAB;" end-authors>
<!SHORTREF in-date "&#TAB;" end-date>
<!SHORTREF in-isbn "&#TAB;" end-isbn>
<!SHORTREF in-publisher "&#RE;" end-row>
<!USEMAP in-document document>
<!USEMAP in-title title>
<!USEMAP in-authors authors>
<!USEMAP in-date date>
<!USEMAP in-isbn isbn>
<!USEMAP in-publisher publisher>
]>
&books
</document>

Here books.txt is assumed to contain the tab-separated data, and the SGML file above is stored in, say, doc.sgm.

Producing XML from this is as simple as invoking "sgmlproc doc.sgm".
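
One difference from the AWK tool quoted below: here the element names are fixed in the DTD, so nothing treats the first line of books.txt specially. With the record-start short reference above, a header line like the one in the original post would come out as a row of its own (with "title" as its title, and so on). A minimal sketch, assuming the header-bearing file is named books-with-header.txt and using a hypothetical shell function strip_header, drops that line before the SGML file reads it:

```shell
#!/bin/sh
# strip_header is a hypothetical helper name: it writes a tab-separated
# file to standard output without its first (header) line.
strip_header() {
    tail -n +2 "$1"
}

# Example use (file names are assumptions):
#   strip_header books-with-header.txt > books.txt
#   sgmlproc doc.sgm
```
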

Have fun,

Marcus Reichardt

sgml.io

On 15 Nov 2022, at 00:09, Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

In the spirit of UNIX tool building .....

I created a simple tool that converts records of tab-delimited data into XML. For example, these records:

title	authors	date	isbn	publisher
Unix Shell Programming	Stephen G. Kochan, Patrick Wood	2019	0-872-32400-3	SAMS
Small, Sharp Software Tools	Brian P. Hogan	2019	978-1-68050-296-1	The Pragmatic Programmers
The AWK Programming Language	Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger	1988	0-201-07981-X	Addison-Wesley Publishing Company

are converted to this XML:

<document>
<row>
<title>Unix Shell Programming</title>
<authors>Stephen G. Kochan, Patrick Wood</authors>
<date>2019</date>
<isbn>0-872-32400-3</isbn>
<publisher>SAMS</publisher>
</row>
<row>
<title>Small, Sharp Software Tools</title>
<authors>Brian P. Hogan</authors>
<date>2019</date>
<isbn>978-1-68050-296-1</isbn>
<publisher>The Pragmatic Programmers</publisher>
</row>
<row>
<title>The AWK Programming Language</title>
<authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
<date>1988</date>
<isbn>0-201-07981-X</isbn>
<publisher>Addison-Wesley Publishing Company</publisher>
</row>
</document>

Each record is wrapped in a <row>...</row> element. Each field in a record is wrapped in an element named after the corresponding header. The root element is <document>...</document>.

The tool may be invoked with a file, like this:

toxml books.txt

or from standard input, like this:

cat books.txt | toxml

The tool is a small AWK program, which I named "toxml":
---------------------------------------------------------
#!/bin/sh
# "$@" passes the file names through quoted, so names containing spaces
# survive; with no arguments, awk reads standard input instead.
awk '
BEGIN   { # field separator is tab (\t)
          # record separator is LF (\n)
          OFS = FS = "\t"
          RS = "\n"
          print "<document>"
        }
NR == 1 { # store column header names in an array
          for (i = 1; i <= NF; i++)
              header[i] = $i
        }
NR != 1 { # create a <row>...</row> element for the line;
          # surround field $i with a start/end tag named header[i]
          print "<row>"
          for (i = 1; i <= NF; i++)
              print "<" header[i] ">" $i "</" header[i] ">"
          print "</row>"
        }
END     { print "</document>" }' "$@"
---------------------------------------------------------
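
As written, toxml copies every field into the output verbatim, so a value containing & or < (for example a publisher named "AT&T Press") would yield malformed XML. A minimal sketch of an escaping variant follows; the function names toxml_escaped and xml_escape, and the removal of a trailing carriage return for files saved with CRLF line endings, are illustrative additions rather than part of the original tool:

```shell
#!/bin/sh
# toxml_escaped (hypothetical name): same conversion as toxml above, but
# &, < and > in field values are escaped so the output stays well-formed.
toxml_escaped() {
    awk '
    function xml_escape(s) {
        gsub(/&/, "\\&amp;", s)   # escape & first so the &lt; and &gt; added below are not re-escaped
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    BEGIN   { FS = "\t"; print "<document>" }
            { sub(/\r$/, "") }    # assumption: tolerate CRLF line endings
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
            {
              print "<row>"
              for (i = 1; i <= NF; i++)
                  print "<" header[i] ">" xml_escape($i) "</" header[i] ">"
              print "</row>"
            }
    END     { print "</document>" }' "$@"
}
```

It is called like the original, e.g. toxml_escaped books.txt or cat books.txt | toxml_escaped, once the file defining it has been sourced; the awk body works just as well as a standalone toxml script. Header names are still used as element names unchanged, so they must already be legal XML names.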

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php