Original Requirements: “I'm looking for a tool that I can feed an XML document, and it will tell me which element names were used in the document.” I’d say the 1-liner does htat ---------------------------------------- David A. Lee From: Uche Ogbuji [mailto:uche@ogbuji.net] That doesn't look like it includes the attributes listing, nor the path to each element, as in the original request. Heck, 1-liners are easy if you simplify requirements ;) import sys, amara; doc = amara.parse(sys.argv[1]); print [ e.xml_qname for e in doc.xml_select('//*')]" Note: above includes the details of actually loading the file. I suspect you might have to specify a bit more in the xmlsh example? --Uche On Wed, Feb 2, 2011 at 3:03 PM, David Lee <dlee@calldei.com> wrote: There’s always the XQuery approach ( example in xmlsh) xquery -q ‘ distinct-values( //node-name(.) ) ‘ ---------------------------------------- David A. Lee From: Uche Ogbuji [mailto:uche@ogbuji.net] Oh what the heck. I might as well offer a quick and dirty Amara [1] recipe. import sys import amara from amara.lib.util import element_subtree_iter from amara.xpath.util import abspath doc = amara.parse(sys.argv[1]) top_prefixes = dict(doc.xml_select('*')[0].xml_namespaces) for e in element_subtree_iter(doc): print abspath(e, prefixes=top_prefixes) attrs = dict(e.xml_attributes) if attrs: print '\t', attrs So for example: python element_census.py http://www.retards.org/library/technology/computers/programming/xml/example-xbel.xml | head /xbel {(None, u'version'): u'1.0'} /xbel/title /xbel/folder {(None, u'folded'): u'yes'} /xbel/folder[2] {(None, u'folded'): u'yes'} /xbel/folder[2]/title /xbel/folder[2]/bookmark {(None, u'href'): u'http://www.sciam.com/1999/0599issue/0599bosak.html'} Pretty easy to tweak or format and details of report, etc. In the example above I pass along a URL. You can also pass along a file name, if you like.
|