OASIS Mailing List Archives


Re: [xml-dev] Features of XML Languages that Increase Complexity?

On 4/15/13 10:00 AM, David Lee wrote:
> CSV was my most trivial example ... but even sticking with it I
> disagree. There are MANY things that can go wrong in a CSV processor
> ... some can be "dangerous" and some can produce bad data (which may
> be more "dangerous" than crashing ... e.g. say the wrong $ amount to
> take out of my account or the wrong person taken off the do-not-fly
> list).

Bad data is a separate question from the surface area security question.

> Some examples
> * Misconfiguring the field and row separators
> * Incorrect quoting and escaping  (CSV has many variants which are
> incompatible ... you have to agree with the sender to get it right).

Plus, of course, character encoding.

These are issues of the format, true.  Or perhaps rather lack of format.
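
A minimal sketch of the separator and encoding points, assuming Python's standard csv module. The sample record and the `parse` helper are hypothetical, made up for illustration. The same bytes parse cleanly when both sides agree on the separator and come out as different fields when they don't, with no error raised in either case.

```python
import csv
import io

# Hypothetical sample: written with ';' as the field separator, with one
# quoted field that happens to contain a comma.
raw = 'name;note\nAda;"likes tea, not coffee"\n'

def parse(text, delimiter):
    """Parse text with the given field separator and return a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

agreed = parse(raw, ";")      # the separator the sender actually used
mismatched = parse(raw, ",")  # the separator the receiver wrongly assumed

print(agreed)      # two rows of two fields each
print(mismatched)  # same input, different field boundaries, no exception

# Character encoding fails the same quiet way: UTF-8 bytes read as
# Latin-1 decode without complaint, just to the wrong characters.
sent = "café".encode("utf-8")
misread = sent.decode("latin-1")
print(misread)
```

Nothing in the file itself says which reading is right; that agreement lives outside the format.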

> * Passing sensitive data in an insecure channel

Always a risk, not format-specific.

> * Column data larger than the expected maximum size.

That's not a format issue, it's a question of the expectations you set
for your processor.

Your own processor will have its own surface area questions (and CSV may
leave more of that to you), but those are questions specific to your own
work, not generic across a format and tool set.

> * Mismatch of number of columns from expected columns

Again, a question of expectations in your local processing, not an issue 
for the format itself.
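
As a rough illustration of "expectations in your local processing", here is a sketch of the checks a processor might choose to make for itself. It assumes Python's standard csv module; `EXPECTED_COLUMNS`, `MAX_FIELD_CHARS`, and `check_rows` are hypothetical names and limits, not anything CSV defines.

```python
import csv
import io

EXPECTED_COLUMNS = 3   # hypothetical: what this particular processor expects
MAX_FIELD_CHARS = 64   # hypothetical local limit on field length

def check_rows(text):
    """Return a list of (line_number, problem) for rows that break local expectations."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                (reader.line_num,
                 f"expected {EXPECTED_COLUMNS} columns, got {len(row)}"))
        for index, field in enumerate(row):
            if len(field) > MAX_FIELD_CHARS:
                problems.append(
                    (reader.line_num,
                     f"column {index} has {len(field)} characters"))
    return problems

# Line 2 is short one column; line 3 carries an oversized field.
sample = "a,b,c\n1,2\nx," + "y" * 100 + ",z\n"
problems = check_rows(sample)
for line_number, problem in problems:
    print(line_number, problem)
```

Both limits are choices made by the receiving code. The csv reader parses all three lines without complaint.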

> * Missing header rows (thus requiring implicit column definitions)

Expectations, and missing header rows is completely normal in many

> * Putting the wrong data type in a column.  (say a date instead of a
>  number)

Which flavor of CSV are you using that has data types?  Data problem,
not format problem.
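
To make the "no data types" point concrete, a small sketch assuming Python's standard csv module: every field comes back as a string, and deciding that a column should hold a number is up to the receiving code. The sample data and the `parse_amount` helper are hypothetical.

```python
import csv
import io

def parse_amount(field):
    """Return the field as an int, or None when it is not a plain integer."""
    try:
        return int(field)
    except ValueError:
        return None

# Hypothetical input: the second record has a date where a number was expected.
text = "id,amount\n1,250\n2,2013-04-15\n"
rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]

# The reader hands back strings only; it has no notion of a column type.
all_strings = all(isinstance(field, str) for row in records for field in row)

# Typing happens here, in local code, and so does detecting the mismatch.
amounts = [parse_amount(record[1]) for record in records]
print(all_strings, amounts)
```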

> * Formatting the wrong data in a column (dates, units, numeric
> formats etc).

Again, data problem.

> * Storing tree or graph data -- how to match up the parent/child
> relationships

CSV does trees?  This is all about local processing expectations, not
something specific to the format, which is rows all the way down.

> * Inconsistent duplication of data when storing a typical
> master/detail CSV as repeated rows (master columns repeated).

Data problem.

> That's just a few. Any of these things could cause incorrect data,
> loss of data, crashes, insecurities. Some of these are really bad
> errors that simply can't occur with reasonable XML (such as getting
> the field name wrong, or master/detail inconsistencies).   Some are
> errors that pretty much any data format can break with.

The problems here that are specific to CSV are far fewer and less
resource-intensive than the XML-specific concerns Roger listed.  CSV
doesn't have the facilities to create those concerns.

In fact, it barely has facilities. That raises questions, but different
kinds of questions than the surface-area questions Roger raises.

> IMHO simply using a simpler format doesn't make the data "safer".

You're arguing with the wrong guy on that.

From here on, I'll let you argue with Roger.  I doubt I'm channeling
his perspective sympathetically.

Simon St.Laurent

Copyright 1993-2007 XML.org. This site is hosted by OASIS