OASIS Mailing List Archives



Re: [xml-dev] Ensuring samples are representative

On Mon, 2016-10-17 at 21:34 +0000, yamahito wrote:
> Hi Folks,
> I have an upcoming contract that will involve converting between XML
> formats.  My client wants their customer to provide a representative
> sample

In that case make sure your contract does not promise any level of
accuracy other than a best effort.

> The onus may be firmly on the end customer,

In which case get that in writing.

> What would people recommend to help a) ensure that the sample *is*
> representative and b) help target/prioritise the work on the
> transformation?

You can't ensure it. It may help to ask for the five longest documents,
five involving mathematics, five involving tables, five involving both
mathematics and tables, the five oldest documents, and then every 17th
document (or some other prime number) by document number.
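If the customer can give you a catalogue of the collection (document number, length, date, and whether each document contains mathematics or tables), the selection above can be scripted. The Python below is a minimal sketch under that assumption; the `Doc` record and its fields are hypothetical, and "first five by document number" is just one arbitrary way to pick within a bucket.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    number: int        # customer's document number (assumed unique)
    length: int        # size in bytes, characters or pages
    created: date
    has_math: bool
    has_tables: bool


def pick_sample(docs, k=5, stride=17):
    """Return the sorted document numbers of a sample built from several
    buckets; a document that falls into more than one bucket is kept once."""
    by_number = sorted(docs, key=lambda d: d.number)
    buckets = [
        sorted(docs, key=lambda d: d.length, reverse=True)[:k],     # longest
        [d for d in by_number if d.has_math][:k],                   # mathematics
        [d for d in by_number if d.has_tables][:k],                 # tables
        [d for d in by_number if d.has_math and d.has_tables][:k],  # both
        sorted(docs, key=lambda d: d.created)[:k],                  # oldest
        by_number[::stride],  # every stride-th (17th by default) by number
    ]
    chosen = {d.number for bucket in buckets for d in bucket}
    return sorted(chosen)
```

Because the buckets overlap, the sample is usually smaller than the sum of the bucket sizes; that is deliberate, since a long table-heavy document only needs reviewing once.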

I say 17th because with a round number such as every 10th you might
just get, e.g., the first document of every batch, and discover that's
always a cover letter. But it depends on what people are willing to send
you. If they won't do that, ask for some "complete sets, if documents
come in groups".

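To see why the stride matters, suppose (hypothetically) that documents arrive in batches of 10 with a cover letter first. Taking every 10th document lands on the same position in every batch, while a stride that shares no factor with the batch size eventually visits every position. A prime such as 17 makes a clash unlikely, since it can only coincide with batch sizes that are multiples of 17. The helper below is an illustrative sketch, not a sampling library.

```python
def positions_hit(stride, batch_size, total):
    """Positions within a batch (0 = first document, e.g. the cover letter)
    that 'take every stride-th document' lands on, across `total` documents."""
    return sorted({i % batch_size for i in range(0, total, stride)})


print(positions_hit(10, 10, 1000))  # only position 0: every pick is a cover letter
print(positions_hit(17, 10, 1000))  # all ten positions, 0 through 9
```

A stride of 12 with the same batches would only ever see the even positions, which is the kind of silent bias a prime stride helps avoid.
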
Ask the people working with the documents which ones are easy, which
are hard, and why. But don't assume that what they find hard will be
hard for you: "the long tables are hard because they have 10,000 rows"
doesn't bother a program, whereas "the single-page cover sheets are all
different because they came from a word processor" is another matter.

Converting from word-processing or page-layout XML to a higher-level,
more structured format is difficult. For example, you may need to group
consecutive list items together into a list, coping with numbering that
continues after an intervening table, and so on.
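As an illustration of the list-grouping problem, here is a minimal Python sketch using the standard library's ElementTree. It assumes a hypothetical flat input in which each numbered list item is a `<p style="ListNumber">` directly under `<body>`, and it adopts one possible policy: a `<table>` keeps the numbering going (the resumed `<ol>` gets a `start` attribute), while any other element ends the list. Real word-processor formats such as WordprocessingML encode numbering quite differently, so treat this as the shape of the logic rather than a drop-in converter.

```python
import xml.etree.ElementTree as ET

LIST_STYLE = "ListNumber"  # hypothetical paragraph style marking a numbered item


def group_lists(body):
    """Return a new <body> in which runs of list paragraphs become
    <ol>/<li>. Numbering continues across a <table> but restarts after
    any other element (an assumed policy, not a universal rule)."""
    out = ET.Element(body.tag, body.attrib)
    current = None  # the <ol> currently being filled, if any
    count = 0       # items numbered since the numbering last restarted
    for child in body:
        if child.tag == "p" and child.get("style") == LIST_STYLE:
            if current is None:
                current = ET.SubElement(out, "ol")
                if count:  # resuming a list interrupted by a table
                    current.set("start", str(count + 1))
            li = ET.SubElement(current, "li")
            li.text = child.text
            li.extend(child)  # keep any inline markup inside the item
            count += 1
        else:
            current = None  # any other element closes the open list
            if child.tag != "table":
                count = 0   # only a table lets the numbering carry on
            out.append(child)
    return out


src = ('<body><p style="ListNumber">Mix</p><p style="ListNumber">Bake</p>'
       '<table/><p style="ListNumber">Cool</p><p>Done.</p></body>')
print(ET.tostring(group_lists(ET.fromstring(src)), encoding="unicode"))
```

Whether a table should really continue the numbering is exactly the kind of question to settle with the people who know the documents; the policy lives in one `if`, so it is easy to change.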

It's like a giant puzzle and can be a lot of fun to work out. The two
most important things I learned from doing SGML and XML conversions are
to automate as much as possible and to document your processes. Next
most important is to use tools such as make, Perl, XSLT and a revision
control system, so that you never need to remember which scripts to run
and in what order.
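Never having to remember the order comes down to declaring each step's prerequisites and letting a tool work out the sequence, which is what make does (make also skips steps whose outputs are already up to date). The Python sketch below shows only the ordering part, using the standard `graphlib` module (Python 3.9 and later); the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical conversion steps, each mapped to the steps it depends on,
# much as a Makefile rule lists its prerequisites.
STEPS = {
    "normalize": set(),
    "group-lists": {"normalize"},
    "convert-tables": {"normalize"},
    "validate": {"group-lists", "convert-tables"},
}


def run_order(steps):
    """Return one order in which every step runs after its prerequisites.
    Raises graphlib.CycleError if the dependencies form a loop."""
    return list(TopologicalSorter(steps).static_order())


for step in run_order(STEPS):
    print("running", step)  # a real pipeline would invoke the script here
```
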
Hope this helps,




Copyright 1993-2007 XML.org. This site is hosted by OASIS