Hi Folks,
I have an upcoming contract that will involve converting between XML formats. My client wants their customer to provide a representative sample to contract against (i.e. when the transform works on the sample, our work on the contract is fulfilled).
The onus may be firmly on the end customer, but I want to get as much right the first time as possible, and I think they can be helped to ensure that the sample is representative.
What would people recommend to a) help ensure that the sample *is* representative and b) target/prioritise the work on the transformation?
My initial thought is to use Trang or similar to generate schemas from both the sample set and the full data set (if possible: it's likely to be very large), but I'm worried that it will be hard to identify anything meaningful in the result. I've also come across tools that offer some form of statistical 'analysis' of XML data sets; I was interested in Rick's Feature Grammar tool, but I don't claim to understand it (yet), and I can't be sure that there won't be recursive features in the data (I would be surprised if there weren't).
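To make the kind of comparison I have in mind concrete, here is a rough sketch (not any existing tool's method) that streams through a document, counts how often each element and attribute path occurs, and lists the paths that appear in the full data but never in the sample. The function names `path_profile` and `missing_from_sample` and the tiny inline documents are made up for illustration; it assumes plain Python with only the standard library.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def path_profile(source):
    """Count every element and attribute path in one XML document, streaming."""
    counts = Counter()
    stack = []  # tags of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            counts[path] += 1
            # Attributes are already available on the "start" event.
            for name in elem.attrib:
                counts[path + "/@" + name] += 1
        else:
            stack.pop()
            elem.clear()  # drop text and children to limit memory use
    return counts


def missing_from_sample(sample_counts, full_counts):
    """Paths seen in the full data but never in the sample, most frequent first."""
    missing = {p: n for p, n in full_counts.items() if p not in sample_counts}
    return sorted(missing.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stand-ins for the customer's sample and the full data set.
sample = io.BytesIO(b"<orders><order id='1'><item/></order></orders>")
full = io.BytesIO(
    b"<orders><order id='1'><item/><note/></order>"
    b"<order><item/><note/></order></orders>"
)

gaps = missing_from_sample(path_profile(sample), path_profile(full))
for path, count in gaps:
    print(count, path)
```

The same counts from the full data could also be sorted by frequency to decide which parts of the transformation to tackle first. Two caveats: recursive structures would produce an ever-growing set of distinct paths, so the paths would need collapsing in some way, and a path count says nothing about the variety of values or content models found at that path.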
I'm interested in how others approach the problem - any ideas?
Thanks,
Tom