* Simon St.Laurent wrote:
>I know I could write this myself, but suspect someone else has already
>done a better job of it - and I just don't know the right Google
>keywords to summon it.
>
>I'm looking for a tool that I can feed an XML document, and it will tell
>me which element names were used in the document.
>
>Attributes used on those elements would be a bonus, as would a frequency
>count for usage, but mostly I'm just trying to survey a collection of
>(DocBook) documents quickly.
As this is turning into a Rosetta Code exercise,
Which I think is a useful thing. Scheme anyone? Haskell? Ruby?
let me throw in C#:
var elems = XDocument.Load(path).Root.DescendantsAndSelf();
var attrs = elems.Attributes()
    .GroupBy(x => new { Attr = x.Name, Elem = x.Parent.Name });
foreach (var elem in elems.GroupBy(x => x.Name))
    Console.WriteLine("{0} {1}", elem.Count(), elem.Key);
Console.WriteLine();
foreach (var attr in attrs)
    Console.WriteLine("{0} {1}/@{2}", attr.Count(),
        attr.Key.Elem, attr.Key.Attr);
Just to be clear, that's C# with LINQ, right? (IIRC LINQ is not part of the language def).
Though it would seem the GNU-ish way to solve this would be having a
tool that prints out all the element names and attribute/element paths
and then use, say, `sort | uniq -c` on it.
Yeah, but if you've already gone through the exercise of parsing the XML, do you really save much by punting to pipes for the most generic bits?
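For comparison, here is a rough sketch of the pipe-friendly variant in Python, using only the standard library. It prints one line per element and one per attribute occurrence (in the same elem/@attr form as the C# output) and leaves the counting to sort and uniq. The function name `names` and the script name below are made up for illustration, and namespaced documents (DocBook 5, for instance) will show tags in ElementTree's {uri}local form.

```python
import sys
import xml.etree.ElementTree as ET


def names(source):
    """Yield one line per element and one per attribute occurrence.

    `source` is a filename or a binary file object (hypothetical helper,
    not taken from any post in this thread).
    """
    # iterparse streams the document; a "start" event fires once per
    # element, and the element's attributes are already set by then.
    for _event, elem in ET.iterparse(source, events=("start",)):
        yield elem.tag
        for attr in elem.attrib:
            yield f"{elem.tag}/@{attr}"


if __name__ == "__main__":
    # Each command-line argument is treated as the path of an XML file.
    for path in sys.argv[1:]:
        for line in names(path):
            print(line)
```

Saved as, say, names.py, it could be run as `python names.py *.xml | sort | uniq -c | sort -rn` to get a frequency-ordered survey. Whether that beats counting in-process is the judgment call raised above; swapping the print loop for a collections.Counter would be only a couple of lines.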