OASIS Mailing List Archives



   Re: [xml-dev] Fast text output from SAX?


At 2:25 PM -0400 4/15/04, Stephen D. Williams wrote:

>You know, like JPEG, TIFF/Group 4, Word documents (!), PowerPoint,
>zip files, tar/cpio, jar files, gzipped HTML, etc.

I routinely deal with broken JPEGs and Word documents. In fact, I was 
thinking about Word when I wrote the bit about the fragility of 
binary formats. A bad Word document can crash a system. It's been a 
while since I've encountered a bad zip or tar file, but I have seen 
them. I'm not sure what changed to make these less common. Maybe the 
software got better over time?

>When you get a corrupted XML document, you can always magically 
>recover just the right missing tags and information?  Wow, where is 
>that method in the spec?

It's a hell of a lot easier to find the information that is there 
than it is to find it in a broken Word document or zip archive. Of 
course, you can't recover what's actually missing, but text files are 
simply more accessible.

>We're realistically talking about bugs or deficiencies in code, 
>configuration, mismatch between applications, etc., not 'fragile 
>things that break' from any perspective but schema co-evolution, 
>configuration management, and programmer error, isn't that right?

No, it isn't. As well as outright bugs, you can have data corrupted 
or partially transmitted across the network, disks that develop bad 
sectors, and deliberate creation of bad data as a component of a 
denial of service attack. Do you want your system to crash because 
some hacker flipped a couple of bytes in the right place?
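To make the "flipped a couple of bytes" case concrete, here is a minimal sketch using Python's standard-library SAX parser. The sample document and the positions of the flipped bytes are hypothetical; the point is only that a well-formedness-checking parser reports the damage as a catchable error rather than quietly accepting it.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(data: bytes) -> bool:
    """Return True if data parses as well-formed XML, False otherwise.

    Malformed input surfaces as a SAXParseException, which is caught
    here so the caller can reject the document and keep running.
    """
    try:
        xml.sax.parseString(data, ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


# Hypothetical sample document.
good = b"<order><item qty='2'>widget</item></order>"

# Simulate two flipped bytes: the '<' of the closing </item> tag
# becomes a space, and the final '>' becomes 'X'.
corrupted = bytearray(good)
corrupted[good.index(b"</item>")] = ord(" ")
corrupted[-1] = ord("X")

print(is_well_formed(good))              # True
print(is_well_formed(bytes(corrupted)))  # False
```

The parser stops at the first well-formedness error (here, `</order>` no longer matches the still-open `item` element), so the application can refuse the input before any of it reaches business logic.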

>You can add forward error correction, b64 or quoted text encoding, 
>and other methods to prevent corruption, but the only cure for 
>user/programmer/operator error is early error detection and clear 
>warning.  When these have already been taken care of, through 
>earlier testing in one sense or another, or other methods, it is
>not an issue.

There are multiple layers of corruption possible. Using check sums to 
verify the data helps at one layer, but does not protect against the 
same things well-formedness checking does. Well-formedness checking 
does not prevent attacks at the semantic layer, though some validity
checks might. Validity cannot prevent most social engineering
attacks. Attacks take place at different points in the stack. Error
correction (which is mostly handled by TCP anyway) is only a
shield against one kind of attack.
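The layering argument can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anyone's production design: SHA-256 stands in for "check sums", the standard-library SAX parser stands in for well-formedness checking, and the sample documents are hypothetical. Each check catches a different class of problem, and neither catches a semantically bad but syntactically perfect document.

```python
import hashlib
import xml.sax
from xml.sax.handler import ContentHandler


def checksum_ok(data: bytes, expected_hex: str) -> bool:
    """Transport layer: were the bytes delivered exactly as sent?"""
    return hashlib.sha256(data).hexdigest() == expected_hex


def well_formed(data: bytes) -> bool:
    """Syntax layer: do the bytes form a well-formed XML document?"""
    try:
        xml.sax.parseString(data, ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


# Case 1: the sender produced malformed XML and checksummed it.
# The bytes arrive intact, so the checksum passes, but the
# document is still broken.
sent = b"<note><to>Bob</note>"
digest = hashlib.sha256(sent).hexdigest()
received = sent
print(checksum_ok(received, digest), well_formed(received))  # True False

# Case 2: the sender produced good XML, but the transfer was truncated.
sent = b"<note><to>Bob</to></note>"
digest = hashlib.sha256(sent).hexdigest()
received = sent[:-4]
print(checksum_ok(received, digest), well_formed(received))  # False False

# Case 3: intact and well-formed, yet semantically hostile.
# Only a higher layer (validation or application logic) could object.
sent = b"<transfer amount='-1000000'/>"
digest = hashlib.sha256(sent).hexdigest()
received = sent
print(checksum_ok(received, digest), well_formed(received))  # True True
```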

   Elliotte Rusty Harold
   Effective XML (Addison-Wesley, 2003)



Copyright 2001 XML.org. This site is hosted by OASIS