OASIS Mailing List Archives

   Re: [xml-dev] Fast text output from SAX?

[ Lists Home | Date Index | Thread Index ]

At 10:24 PM -0400 4/15/04, Stephen D. Williams wrote:
>What do you use for data transfer???  I almost never get data 
>corruption that isn't corrected in some way, and I constantly use 
>WiFi, CDMA2000-based cellular Internet access, all kinds of computers, 
>hard drives, etc.  Not since I last used my Jaz drive have I had the 
>kind of corruption you seem to be dealing with.  I did have trouble 
>with a particularly ugly multi-drive RAID-5 failure, but files were 
>either good or bad.

I use anything and everything, and sometimes the files get corrupted. 
It doesn't matter why: whether it's a transport error, bad data error 
on disk, or a misbehaving application that writes bad data into the 
file. Corruption happens. Fact of life.

>If the session layer (e.g. TCP/IP) or the filesystem doesn't find 
>errors, the application managing the transfer should (email, etc.). 
>Certainly, the application, or better, the library that is accessing 
>the data should detect and react well to any data presented.

Certainly it should, but it doesn't. Word's the most common offender here.

Part of making an application robust against any input is starting 
from the assumption that you have nothing more than a stream of bytes, 
which must be proved to be in a particular format before you use it. 
This is essentially what a parser does. This is why XML parsing is 
such a robust process. It's very hard to construct a stream of bytes 
that will crash a parser. Possibly you could do it with very long 
element names or attribute values, but so far I haven't seen it 
pulled off.
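As an illustration of the "prove it before you use it" approach, here is a minimal sketch in Python. The language choice and the helper name is_well_formed are assumptions for illustration, not anything from this thread. It hands arbitrary bytes to the standard library's xml.sax parser and treats the data as XML only if parsing completes without a fatal error.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(data: bytes) -> bool:
    """Return True only if `data` parses as well-formed XML.

    Hypothetical helper: the input is treated as an untrusted stream of
    bytes, and nothing is assumed about it until the parser has checked it.
    """
    try:
        # The default error handler raises SAXParseException on any
        # well-formedness error, including truncated or garbage input.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```

A caller would check is_well_formed(data) before handing the bytes to any code that assumes XML. This establishes only well-formedness, not validity against a schema.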

However, most processors of binary formats such as Word do not start 
with the assumption that they are reading an arbitrary stream of 
bytes. They assume they're reading data in a known format and build 
assumptions about the format into their code. When those assumptions 
are violated, the program heads south in unanticipated and 
potentially damaging and dangerous ways. This is why it really 
bothers me when processors attempt to gain speed compared to 
traditional XML parsing by skipping well-formedness checks. This 
applies both to many binary parsers and to some so-called minimal 
parsers that process traditional XML without checking for 
well-formedness.
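A binary reader can apply the same discipline by checking each structural assumption against the bytes actually present. The sketch below assumes a hypothetical format invented for illustration (a 4-byte big-endian length followed by that many payload bytes; it is not Word's or any real format). It rejects input whose length fields do not fit the data instead of trusting them. In a language like C, trusting such a length field is what leads to reading past the end of a buffer.

```python
import struct
from typing import List


class MalformedRecord(ValueError):
    """Raised when the bytes do not match the expected record layout."""


# Hypothetical format: 4-byte big-endian unsigned length, then payload.
HEADER = struct.Struct(">I")


def read_records(data: bytes) -> List[bytes]:
    """Split `data` into length-prefixed records, verifying every assumption."""
    records = []
    offset = 0
    while offset < len(data):
        # Verify that a complete length header is actually present.
        if len(data) - offset < HEADER.size:
            raise MalformedRecord(f"truncated header at offset {offset}")
        (length,) = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        # Check the declared length against the bytes that really remain,
        # rather than trusting the length field.
        remaining = len(data) - offset
        if length > remaining:
            raise MalformedRecord(
                f"record claims {length} bytes but only {remaining} remain"
            )
        records.append(data[offset:offset + length])
        offset += length
    return records
```

For example, read_records(b"\x00\x00\x00\x02hi") returns [b"hi"], while b"\x00\x00\x00\x05hi" raises MalformedRecord instead of returning silently short or out-of-bounds data.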
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

Copyright 2001 XML.org. This site is hosted by OASIS