On Wed, 16 Jan 2002, Elliotte Rusty Harold wrote:
> >1) Writing CSV code is easier than XML code (no DOM or anything, just
> >something like SAX; I write the CSV parser myself in less code than it
> >takes to interface with an XML parser)
> If DOM is too hard (and mostly I agree with you) use SAX or use JDOM.
> JDOM certainly is much simpler for the sorts of things you're doing.
> DOM != XML
The code I have that does XML data import (to the same engine as my CSV
data import) uses SAX, yes, but it's still bigger, since it then needs to
implement a state machine to pull the table structure back out of a tree.
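
To illustrate the kind of state machine meant here, a minimal sketch in Python using the standard xml.sax module. The element names (`row`, `field`) and the `name` attribute are assumptions made up for the example, not the format actually used in the import code described above.

```python
import xml.sax


class TableHandler(xml.sax.ContentHandler):
    """Rebuild table rows from SAX events.

    Assumes a hypothetical layout:
    <table><row><field name="col">value</field>...</row>...</table>
    """

    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows, one dict per <row>
        self.state = "TOP"   # TOP -> ROW -> FIELD and back again
        self.row = None      # row currently being built
        self.field = None    # name of the field currently open
        self.text = []       # character chunks of the open field

    def startElement(self, name, attrs):
        if self.state == "TOP" and name == "row":
            self.state, self.row = "ROW", {}
        elif self.state == "ROW" and name == "field":
            self.state, self.field, self.text = "FIELD", attrs["name"], []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self.state == "FIELD":
            self.text.append(content)

    def endElement(self, name):
        if self.state == "FIELD" and name == "field":
            self.row[self.field] = "".join(self.text)
            self.state = "ROW"
        elif self.state == "ROW" and name == "row":
            self.rows.append(self.row)
            self.state = "TOP"


def parse_table(xml_text):
    """Return the list of row dicts found in xml_text."""
    handler = TableHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows
```

Even for this toy layout, the handler has to track three states and buffer text by hand, which is the extra bulk compared with splitting a CSV line.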
> >2) Data corruption? XML parsers are *fragile*, CSV parsers can often cope
> >with erroneous data in ways that XML parsers mustn't if they are to be
> >standards compliant!
> That's a feature, not a bug. If the data is bad, I want to know about
> it ASAP and get it fixed at the source. Draconian error handling is a
> very good thing.
Depends if you're working in a world of potentially dodgy data sources...
I'd rather not *know* if data is bad, I'd rather the system transparently
fixed it, and only told me if it's too bad to properly process.
With my CSVs, if one row is missing a field or has an extra field (so the
CSV is not well formed, e.g. not all the rows are the same length) or if
there's a field name that I do not recognise, then I signal that as an
error and stop.
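
The structural checks just described could look roughly like the following Python sketch. The set of known field names and the `CSVStructureError` name are hypothetical, chosen only for illustration; the input is assumed to carry its field names in a header row.

```python
import csv
import io

# Hypothetical set of field names the importer understands.
KNOWN_FIELDS = {"name", "date", "amount"}


class CSVStructureError(ValueError):
    """Raised when the CSV is not well formed for our purposes."""


def read_strict(text, known_fields=KNOWN_FIELDS):
    """Parse CSV text, stopping at the first structural problem."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        raise CSVStructureError("empty input")

    # An unrecognised field name is an error.
    unknown = [h for h in header if h not in known_fields]
    if unknown:
        raise CSVStructureError(f"unrecognised field name(s): {unknown}")

    rows = []
    for row in reader:
        # A row with a missing or extra field is an error.
        if len(row) != len(header):
            raise CSVStructureError(
                f"line {reader.line_num}: expected {len(header)} fields, "
                f"got {len(row)}"
            )
        rows.append(dict(zip(header, row)))
    return rows
```

Note that nothing here inspects the field values themselves; only the shape of the table is enforced.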
But if they've just used a strange date format, as long as it's parsable,
I'd rather be able to study it and then add support for that date format
so it's not an error in future than have it be forced as an error by some
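
One way to get that "study it, then add support" behaviour is to keep the accepted date formats in a list and try each in turn. This is only a sketch in Python; the formats listed are assumptions for the example, not the ones any real importer necessarily accepts.

```python
from datetime import datetime

# Hypothetical list of accepted formats; supporting a newly seen
# format means appending one entry here.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]


def parse_date(text, formats=DATE_FORMATS):
    """Return a date parsed with the first format that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # not this format, try the next one
    # Only a value matching none of the known formats is an error.
    raise ValueError(f"unsupported date format: {text!r}")
```

A value such as "16.01.2002" is rejected by the list above, but is accepted once "%d.%m.%Y" has been added to it.
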
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software