xml-dev - Re: [xml-dev] Fast text output from SAX?


To: "Stephen D. Williams" <sdw@lig.net>
Subject: Re: [xml-dev] Fast text output from SAX?
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Thu, 15 Apr 2004 17:22:28 -0400
Cc: XML DEV <xml-dev@lists.xml.org>
In-reply-to: <407ED381.6090301@lig.net>
References: <1E0CC447E59C974CA5C7160D2A2854EC097DEA@SJMEMXMB04.stjude.sjcrh.local> <p06010203bca46bfb4c07@[192.168.254.88]> <407ED381.6090301@lig.net>

At 2:25 PM -0400 4/15/04, Stephen D. Williams wrote:

>You know, like Jpeg, Tiff/Group4, Word documents (!), PowerPoint, 
>zip files, tar/cpio, jar files, gzipped HTML, etc.

I routinely deal with broken JPEGs and Word documents. In fact, I was 
thinking about Word when I wrote the bit about the fragility of 
binary formats. A bad Word document can crash a system. It's been a 
while since I've encountered a bad zip or tar file, but I have seen 
them. I'm not sure what changed to make these less common. Maybe the 
software got better over time?

>When you get a corrupted XML document, you can always magically 
>recover just the right missing tags and information?  Wow, where is 
>that method in the spec?

It's a hell of a lot easier to find the information that is there 
than it is to find it in a broken Word document or zip archive. Of 
course, you can't recover what's actually missing, but text files are 
simply more accessible.
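
A minimal sketch of that point, assuming Python's standard xml.sax module
and a made-up truncated document. The names TextCollector and salvage_text
are hypothetical, chosen only for illustration. A SAX parser reports
content as it goes, so whatever arrived before the break has already been
handed to the application by the time the error is raised:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that keeps every piece of character data seen."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def salvage_text(data: bytes):
    """Parse data; return (text delivered before any failure, error or None)."""
    handler = TextCollector()
    error = None
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        # The document is not well-formed, but the events delivered
        # before the error are still sitting in handler.chunks.
        error = exc
    return "".join(handler.chunks), error


# A document cut off mid-transfer (invented for this example).
truncated = b"<doc><title>Hello</title><body>some te"
text, error = salvage_text(truncated)
print("recovered:", repr(text))
print("error:", error)
```

This recovers only what was actually delivered; it does not guess at the
missing tags or content, and how much trailing text survives depends on
the parser's buffering.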

>We're realistically talking about bugs or deficiencies in code, 
>configuration, mismatch between applications, etc., not 'fragile 
>things that break' from any perspective but schema co-evolution, 
>configuration management, and programmer error, isn't that right?

No, it isn't. As well as outright bugs, you can have data corrupted 
or partially transmitted across the network, disks that develop bad 
sectors, and deliberate creation of bad data as a component of a 
denial of service attack. Do you want your system to crash because 
some hacker flipped a couple of bytes in the right place?

>You can add forward error correction, b64 or quoted text encoding, 
>and other methods to prevent corruption, but the only cure for 
>user/programmer/operator error is early error detection and clear 
>warning.  When these have already been taken care of, through 
>earlier testing in one sense or another, or other methods, it is 
>not an issue.

There are multiple layers of corruption possible. Using checksums to 
verify the data helps at one layer, but does not protect against the 
same things well-formedness checking does. Well-formedness checking 
does not prevent attacks at the semantic layer, though some validity 
checks might. Validity cannot prevent most social engineering 
attacks. Attacks take place at different points in the stack. Error 
correction (which is mostly handled by TCP anyway) is only one 
shield against one kind of attack.
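
A rough sketch of how the first two layers differ, assuming Python's
standard zlib and xml.sax modules. The helper names and the sample
documents are invented for illustration. A checksum only says the bytes
are the ones the sender produced; a well-formedness check only says the
bytes parse as XML. Each catches a case the other misses:

```python
import xml.sax
import zlib


def checksum_ok(payload: bytes, expected_crc: int) -> bool:
    """Transport-level check: did the bytes arrive unchanged?"""
    return zlib.crc32(payload) == expected_crc


def well_formed(payload: bytes) -> bool:
    """Syntax-level check: do the bytes parse as XML at all?"""
    try:
        xml.sax.parseString(payload, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


# Case 1: the sender emits a malformed document and checksums it
# faithfully. The checksum passes; only the parser notices the problem.
bad_at_source = b"<order><item>widget</order>"
crc_bad = zlib.crc32(bad_at_source)
print(checksum_ok(bad_at_source, crc_bad), well_formed(bad_at_source))

# Case 2: a good document has one byte changed in transit. The result
# still parses; only the checksum notices the problem.
good = b"<order><item>widget</item></order>"
crc_good = zlib.crc32(good)
altered = good.replace(b"widget", b"wodget")
print(checksum_ok(altered, crc_good), well_formed(altered))
```

Neither check says anything about whether an order for a "wodget" makes
sense; that is the semantic layer, where validity or application-level
checks have to take over.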
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
