Thanks, Roger, for sharing; effective simplicity is always interesting. Here are some additional considerations.
and similarly
<document>
  <element title="Unix Shell Programming" authors="Stephen G. Kochan, Patrick Wood" date="2019" isbn="0-872-32400-3" publisher="SAMS"/>
  <element title="Small, Sharp Software Tools" authors="Brian P. Hogan" date="2019" isbn="978-1-68050-296-1" publisher="The Pragmatic Programmers"/>
  <element title="The AWK Programming Language" authors="Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger" date="1988" isbn="0-201-07981-X" publisher="Addison-Wesley Publishing Company"/>
</document>
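As a rough illustration of how the attribute-based form above could be generated from the same tab-delimited input, here is a minimal Python sketch. It is not part of toxml; the function name `rows_to_attribute_xml` is hypothetical, and the sketch assumes the header names are already valid XML attribute names. It uses `quoteattr` from the standard library, which picks a quote delimiter and escapes the value as needed.

```python
from xml.sax.saxutils import quoteattr

def rows_to_attribute_xml(lines, sep="\t"):
    """Convert delimited lines (first line is the header) to attribute-form XML.

    Hypothetical helper for illustration; assumes header names are valid
    XML attribute names.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    names = lines[0].split(sep)
    out = ["<document>"]
    for line in lines[1:]:
        fields = line.split(sep)
        # quoteattr() chooses the delimiter and escapes &, <, > and quotes.
        attrs = " ".join(f"{n}={quoteattr(v)}" for n, v in zip(names, fields))
        out.append(f"  <element {attrs}/>")
    out.append("</document>")
    return "\n".join(out)

sample = [
    "title\tauthors\tdate\tisbn\tpublisher",
    "The AWK Programming Language\tAlfred V. Aho, Brian W. Kernighan, "
    "Peter J. Weinberger\t1988\t0-201-07981-X\tAddison-Wesley Publishing Company",
]
xml_text = rows_to_attribute_xml(sample)
print(xml_text)
```

Letting a library routine do the quoting means field values containing &, < or quote characters still yield well-formed output.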
Although an author might have a single-quote or double-quote convention, such a choice is explicitly not considered part of the captured information in the Post-Schema-Validation Infoset (PSVI). During EXI endeavors we learned that user agents are allowed to switch between them as they see fit. So authors and authoring tools can have any encoding preference they want. If an attribute value includes both single-quote ' and double-quote " characters, then some character escaping is inevitable and multiple forms of attribute expression are possible.
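The point about quote conventions can be checked with any XML parser. The following is a small sketch using Python's standard ElementTree module (an assumption for illustration; it exposes only the parsed attribute value, not the PSVI itself):

```python
import xml.etree.ElementTree as ET

# Same attribute, written once with double quotes and once with single quotes.
double_quoted = ET.fromstring('<element title="Unix Shell Programming"/>')
single_quoted = ET.fromstring("<element title='Unix Shell Programming'/>")

# After parsing, the delimiter choice is gone; only the value remains.
same_value = double_quoted.get("title") == single_quoted.get("title")
print(same_value)  # prints True
```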
and
Once again, I believe that the PSVI does not distinguish between any of these representations; all are exactly equivalent once parsed by an XML processor. So you can choose whatever convention you like.
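To make the equivalence concrete, here is a short Python sketch that parses several escaped spellings of one attribute value and compares the results. The attribute name `note` and its value are hypothetical, chosen only because the value contains both quote characters; they are not taken from the examples in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute value containing both quote characters.
expected = 'Kernighan\'s "AWK" book'

variants = [
    # double-quoted delimiter, inner double quotes as &quot;
    '<element note="Kernighan\'s &quot;AWK&quot; book"/>',
    # single-quoted delimiter, inner apostrophe as &apos;
    "<element note='Kernighan&apos;s \"AWK\" book'/>",
    # decimal character references
    '<element note="Kernighan&#39;s &#34;AWK&#34; book"/>',
    # hexadecimal character references
    '<element note="Kernighan&#x27;s &#x22;AWK&#x22; book"/>',
]

parsed = [ET.fromstring(v).get("note") for v in variants]
all_equal = all(value == expected for value in parsed)
print(all_equal)  # prints True
```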
Establishing a good practice such as your toxml processor seems like an excellent processing-chain addition to XML, JSON, Semantic Web, and other Big Data workflows.

all the best, Don
--
Don Brutzman  Naval Postgraduate School, Code USW/Br  brutzman@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA  +1.831.656.2149
X3D graphics, virtual worlds, Navy robotics  https://faculty.nps.edu/brutzman

From: Hans-Juergen Rennau <hrennau@yahoo.de>

Roger,

I would find it interesting to compare an awk solution with an XQuery one, also considering aspects like clarity and extensibility. This is especially interesting as the potential of XQuery for tool building is by and large ignored.

With kind regards,
Hans-Jürgen

PS. Example of an XQuery-based solution:

declare variable $uri external;
declare variable $sep external := '&#9;';

<document>{
  let $lines := unparsed-text-lines($uri)
  let $names := $lines => head() => tokenize($sep)
  for $line in tail($lines)
  return
    <row>{
      for $field at $pos in tokenize($line, $sep)
      return element {$names[$pos]} {$field}
    }</row>
}</document>

On Tuesday, 15 November 2022 at 00:10:03 CET, Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

In the spirit of UNIX tool building ..... I created a simple tool that converts records of tab-delimited data into XML. For example, these records:

title	authors	date	isbn	publisher
Unix Shell Programming	Stephen G. Kochan, Patrick Wood	2019	0-872-32400-3	SAMS
Small, Sharp Software Tools	Brian P. Hogan	2019	978-1-68050-296-1	The Pragmatic Programmers
The AWK Programming Language	Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger	1988	0-201-07981-X	Addison-Wesley Publishing Company

are converted to this XML:

<document>
  <row>
    <title>Unix Shell Programming</title>
    <authors>Stephen G. Kochan, Patrick Wood</authors>
    <date>2019</date>
    <isbn>0-872-32400-3</isbn>
    <publisher>SAMS</publisher>
  </row>
  <row>
    <title>Small, Sharp Software Tools</title>
    <authors>Brian P. Hogan</authors>
    <date>2019</date>
    <isbn>978-1-68050-296-1</isbn>
    <publisher>The Pragmatic Programmers</publisher>
  </row>
  <row>
    <title>The AWK Programming Language</title>
    <authors>Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger</authors>
    <date>1988</date>
    <isbn>0-201-07981-X</isbn>
    <publisher>Addison-Wesley Publishing Company</publisher>
  </row>
</document>

Each record is wrapped in a <row>...</row> element. The fields in each record are wrapped in an element named by the header. The root element is <document>...</document>.

The tool may be invoked with a file, like this:

toxml books.txt

or from standard input, like this:

cat books.txt | toxml

The tool is a small AWK program, which I named "toxml":

---------------------------------------------------------
awk '
BEGIN {
    # field separator is tab (\t)
    # record separator is LF (\n)
    OFS=FS="\t"
    RS="\n"
    print "<document>"
}
NR==1 {
    # store column header names in an array
    for (i=1; i<=NF; i++)
        header[i]=$i;
}
NR!=1 {
    # create a <row>...</row> element for the line
    # surround field $i with a start/end tag named header[i]
    print "<row>"
    for (i=1; i<=NF; i++)
        print "<" header[i] ">" $i "</" header[i] ">"
    print "</row>"
}
END { print "</document>" }' "$@"
---------------------------------------------------------