Re: [xml-dev] Tool converts records to XML

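You've been at this game long enough, Roger, to have seen the "Barnes & Noble" problem. The #1 blunder when writing XML is not bothering to escape `<` and `&` when they occur in your input. A publisher field containing "Barnes & Noble", for instance, would be written out as `<publisher>Barnes & Noble</publisher>`, which is not well-formed XML.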

Michael Kay
Saxonica
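The fix Michael describes can be sketched as follows, assuming a POSIX `sh` and `awk`. It repeats the logic of Roger's `toxml` as a shell function (also called `toxml`) but passes every field through a helper, `escape`, which is a hypothetical name and not part of the original tool. The helper replaces `&`, `<` and `>` with entity references. Only `&` and `<` strictly need escaping in element content; escaping `>` as well is optional but harmless.

```shell
#!/bin/sh
# Hypothetical revision of toxml that escapes field values.
# escape() is an illustrative helper name, not part of the original tool.
toxml() {
    awk '
    # escape(s): replace XML-significant characters with entity references.
    # "&" is handled first; otherwise the "&" introduced by "&lt;" would be
    # escaped again. In a gsub replacement a bare & stands for the matched
    # text, so a literal & is written as "\\&".
    function escape(s) {
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    BEGIN   { FS = "\t"; print "<document>" }
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
            {
                print "<row>"
                for (i = 1; i <= NF; i++)
                    print "<" header[i] ">" escape($i) "</" header[i] ">"
                print "</row>"
            }
    END     { print "</document>" }' "$@"
}

# Usage: a field containing "Barnes & Noble" now yields well-formed XML.
printf 'title\tpublisher\nSome Title\tBarnes & Noble\n' | toxml
```

With the escaping in place, that field comes out as `<publisher>Barnes &amp; Noble</publisher>`.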

> On 14 Nov 2022, at 23:09, Roger L Costello <costello@mitre.org> wrote:
> 
> Hi Folks,
> 
> In the spirit of UNIX tool building...
> 
> I created a simple tool that converts records of tab-delimited data into XML. For example, these records:
> 
> title	authors	date	isbn	publisher
> Unix Shell Programming	Stephen G. Kochan, Patrick Wood	2019	0-872-32400-3	SAMS
> Small, Sharp Software Tools	Brian P. Hogan	2019	978-1-68050-296-1	The Pragmatic Programmers
> The AWK Programming Language	Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger	1988	0-201-07981-X	Addison-Wesley Publishing Company
> 
> are converted to this XML (indented here for readability):
> 
> <document>
> 	<row>
> 		<title>Unix Shell Programming</title>
> 		<authors>Stephen G. Kochan, Patrick Wood</authors>
> 		<date>2019</date>
> 		<isbn>0-872-32400-3</isbn>
> 		<publisher>SAMS</publisher>
> 	</row>
> 	<row>
> 		<title>Small, Sharp Software Tools</title>
> 		<authors>Brian P. Hogan</authors>
> 		<date>2019</date>
> 		<isbn>978-1-68050-296-1</isbn>
> 		<publisher>The Pragmatic Programmers</publisher>
> 	</row>
> 	<row>
> 		<title>The AWK Programming Language</title>
> 		<authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
> 		<date>1988</date>
> 		<isbn>0-201-07981-X</isbn>
> 		<publisher>Addison-Wesley Publishing Company</publisher>
> 	</row>
> </document>
> 
> Each record is wrapped in a <row>...</row> element. Each field in a record is wrapped in an element named after its column header. The root element is <document>...</document>.
> 
> The tool may be invoked with a file, like this:
> 
> toxml books.txt
> 
> or from standard input, like this:
> 
> cat books.txt | toxml
> 
> The tool is a small AWK program, which I named "toxml":
> ---------------------------------------------------------
> #!/bin/sh
> awk '
> BEGIN   {   # field separator is tab (\t)
>             # record separator is LF (\n)
>             OFS = FS = "\t"
>             RS = "\n"
>             print "<document>"
>         }
> NR==1   {   # store column header names in an array
>             for (i = 1; i <= NF; i++)
>                 header[i] = $i
>         }
> NR!=1   {   # create a <row>...</row> element for the line
>             # surround field $i with a start/end tag named header[i]
>             print "<row>"
>             for (i = 1; i <= NF; i++)
>                 print "<" header[i] ">" $i "</" header[i] ">"
>             print "</row>"
>         }
> END     { print "</document>" }' "$@"   # pass file arguments through intact; with none, awk reads standard input
> ---------------------------------------------------------
> 
> _______________________________________________________________________
> 
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
> 
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> 



Copyright 1993-2007 XML.org. This site is hosted by OASIS