At 2:25 PM -0400 4/15/04, Stephen D. Williams wrote:
>You know, like JPEG, TIFF/Group4, Word documents (!), PowerPoint,
>zip files, tar/cpio, jar files, gzipped HTML, etc.
I routinely deal with broken JPEGs and Word documents. In fact, I was
thinking about Word when I wrote the bit about the fragility of
binary formats. A bad Word document can crash a system. It's been a
while since I've encountered a bad zip or tar file, but I have seen
them. I'm not sure what changed to make these less common. Maybe the
software got better over time?
>When you get a corrupted XML document, you can always magically
>recover just the right missing tags and information? Wow, where is
>that method in the spec?
It's a hell of a lot easier to find the information that is there
than it is to find it in a broken Word document or zip archive. Of
course, you can't recover what's actually missing, but text files are
simply more accessible.
>We're realistically talking about bugs or deficiencies in code,
>configuration, mismatch between applications, etc., not 'fragile
>things that break' from any perspective but schema co-evolution,
>configuration management, and programmer error, isn't that right?
No, it isn't. As well as outright bugs, you can have data corrupted
or partially transmitted across the network, disks that develop bad
sectors, and deliberate creation of bad data as a component of a
denial of service attack. Do you want your system to crash because
some hacker flipped a couple of bytes in the right place?
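As a rough illustration of the alternative to crashing, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `<order>` document and the helper names `safe_parse` and `flip_bit` are hypothetical. A single flipped bit in the markup, or a document cut off in transit, is rejected by the well-formedness check with an ordinary, catchable error. The data is never processed.

```python
import xml.etree.ElementTree as ET

def safe_parse(data: bytes):
    """Parse untrusted bytes; return the root element, or None if they are not well-formed."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # Damaged or hostile input is rejected with an error instead of being processed.
        return None

def flip_bit(data: bytes, index: int, mask: int = 0x01) -> bytes:
    """Return a copy of data with the bits in mask toggled at position index."""
    damaged = bytearray(data)
    damaged[index] ^= mask
    return bytes(damaged)

doc = b"<order><qty>3</qty></order>"
print(safe_parse(doc) is not None)            # intact document parses
print(safe_parse(flip_bit(doc, 14)) is None)  # the '/' at index 14 becomes '.', so the parse fails
print(safe_parse(doc[:-3]) is None)           # a partially transmitted document is rejected
```

All three lines print `True`. Well-formedness checking only catches damage to the markup. A flip that turns the `3` into another digit would still parse, which is why other layers of checking are needed. Python's own documentation also warns that its standard XML parsers are not hardened against every kind of maliciously constructed input.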
>You can add forward error correction, b64 or quoted text encoding,
>and other methods to prevent corruption, but the only cure for
>user/programmer/operator error is early error detection and clear
>warning. When these have already been taken care of, through
>earlier testing in one sense or another, or other methods, it is
>not an issue.
Corruption is possible at multiple layers. Using checksums to
verify the data helps at one layer, but does not protect against the
same things well-formedness checking does. Well-formedness checking
does not prevent attacks at the semantic layer, though some validity
checks might. Validity cannot prevent most social engineering
attacks. Attacks take place at different points in the stack. Error
correction (which is mostly handled by TCP anyway) is only a
shield against one kind of attack.
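The layering argument can be sketched in Python as a pipeline of three independent checks. This is a minimal sketch under stated assumptions. The `<order>` format, the rule that `qty` must be a positive integer, and the function names are all hypothetical, and SHA-256 stands in for whatever checksum a real system uses. Each layer catches a failure that the others let through.

```python
import hashlib
import xml.etree.ElementTree as ET

def checksum_ok(data: bytes, expected_hex: str) -> bool:
    # Transport layer: detects bytes altered after the sender computed the digest.
    return hashlib.sha256(data).hexdigest() == expected_hex

def well_formed(data: bytes):
    # Syntax layer: returns the parsed root, or None if the bytes are not well-formed XML.
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        return None

def valid_order(root) -> bool:
    # Semantic layer (hypothetical rule): <order> must contain a <qty> holding a positive integer.
    if root.tag != "order":
        return False
    try:
        return int(root.findtext("qty")) > 0
    except (TypeError, ValueError):  # missing <qty> or non-numeric text
        return False

def check(data: bytes, expected_hex: str) -> str:
    if not checksum_ok(data, expected_hex):
        return "corrupted in transit"
    root = well_formed(data)
    if root is None:
        return "not well-formed"
    if not valid_order(root):
        return "invalid"
    return "ok"
```

If a sender emits `<order><qty>3</order>` and computes the checksum over those same bytes, the checksum passes, but the document is not well-formed. `<order><qty>-5</qty></order>` passes both the checksum and the well-formedness check, yet fails the semantic rule. None of these checks would stop a legitimate user who has been tricked into submitting a well-formed, valid, but unwanted order.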
Elliotte Rusty Harold
Effective XML (Addison-Wesley, 2003)