This is exactly the situation Walter Perry has been talking about for
several years, and pretty much what he's had to deal with when handling
financial data in the financial industry. As I understand it, his
approach is to take all the incoming feeds, use them to populate his
data structures, and then create an XML representation of his data
which he sends out. In this approach the XML you present to the world
claims to be nothing more than your representation of your data. It
does not purport to be a representation of somebody else's data. I
tend to agree with this.
I say when you receive invalid (not malformed but invalid) data,
clean it up as best you can. Most of the data can be cleaned
automatically because, as you've noticed, people keep making the same
mistakes. Flag data that can't be cleaned automatically and pass it to
a human, who writes the code to clean it. For the first few weeks
you'll be cleaning a lot of data by hand, but gradually the process
becomes more and more automated, and the exceptional cases decrease to
a manageable level.
Once you've cleaned the data sufficiently to generate your own data
structures from it, then output those data structures as XML and pass
them on to the third parties.
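
To make that concrete, here is a rough sketch of the shape such a pipeline
might take. It is only an illustration, not Walter's actual code: the
Trade record, the field names, and the two cleaning rules are all
hypothetical, and it assumes the incoming feed has already been parsed
into plain dictionaries. Each cleaning rule fixes one recurring mistake,
records that still can't be turned into your own data structure are
flagged for a human, and only your own structures are serialized to XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Trade:
    """Hypothetical internal data structure; the XML is generated from this."""
    symbol: str
    quantity: int
    price: float


def strip_currency(record):
    # Recurring mistake (assumed): prices sent as "$1,234.50" instead of 1234.50
    price = str(record.get("price", ""))
    record["price"] = price.replace("$", "").replace(",", "").strip()
    return record


def normalize_symbol(record):
    # Recurring mistake (assumed): symbols in mixed case with stray whitespace
    record["symbol"] = str(record.get("symbol", "")).strip().upper()
    return record


# Grows over time: each new kind of mistake gets a rule written by a human.
CLEANERS = [strip_currency, normalize_symbol]


def clean(record):
    """Apply every cleaning rule, then build a Trade or raise on failure."""
    rec = dict(record)  # work on a copy so the original is kept for review
    for fix in CLEANERS:
        rec = fix(rec)
    if not rec["symbol"]:
        raise ValueError("missing symbol")
    return Trade(rec["symbol"], int(rec["quantity"]), float(rec["price"]))


def process(feed):
    """Split a feed into cleaned Trades and records flagged for a human."""
    trades, flagged = [], []
    for record in feed:
        try:
            trades.append(clean(record))
        except (KeyError, TypeError, ValueError) as exc:
            flagged.append((record, str(exc)))
    return trades, flagged


def to_xml(trades):
    """Serialize our own data structures, not the incoming documents."""
    root = ET.Element("trades")
    for t in trades:
        el = ET.SubElement(root, "trade", symbol=t.symbol)
        ET.SubElement(el, "quantity").text = str(t.quantity)
        ET.SubElement(el, "price").text = f"{t.price:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = [
        {"symbol": " ibm ", "quantity": "100", "price": "$1,234.50"},
        {"symbol": "MSFT", "quantity": "ten", "price": "25.10"},
    ]
    trades, flagged = process(feed)
    print(to_xml(trades))
    for record, reason in flagged:
        print("needs a human:", record, reason)
```

The point of the design is that a flagged record never reaches the
output. When a human works out how to repair it, the fix goes into the
list of cleaning rules, so the same mistake is handled automatically
the next time it shows up.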
But you really should sit down with Walter. This is exactly what
he's been doing for some time now.
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA