> From: Michael Kay <mike@saxonica.com>
> Date: Friday, 13 September 2013 6:49 AM
>
>> On 12 Sep 2013, at 19:47, David Lee wrote:
>>
>> In my experience, ALL Large XML files are really collections of smaller files.
>> I have never seen a single XML document of any large size that isn't simply
>> <root>
>> <row> document 1 .... </row>
>> ..... 10 bizillion times
>> </root>
>
> That's certainly a very common pattern, but I've seen a few examples that
> don't quite fit it. For example, a database dump of 50 tables each of which
> fits the above pattern. Or GIS data consisting of large numbers of objects of
> a wide variety of different kinds. What does seem to be true is that as files
> get larger, it's rare for the hierarchy to get deeper.

I agree with that, and wanted to share a brief note on our experience, which deals primarily with XML that is to be printed in some format.

While XML for things like parts catalogues can get quite large, it tends to follow the pattern of repeating sets of data.

Some of the larger XML documents we deal with (which are not "database dumps") tend to be lengthy pieces of legislation. While legislation can be broken down into provisions and so on, there is still enough cross-referencing and enough relationships within the information to make it tricky to break up into standalone components.

Having said that, I don't think I've seen a single piece of legislation (e.g. a Bill or Act) exceed 100MB in XML document size.

-Gareth