RE: Images embedded in XML

4/8/01 4:54:50 AM, "Al B. Snell" <alaric@alaric-snell.com> wrote:
>This is a bit of a myth... plain text is less of a common denominator
>than two's complement or unsigned binary integers, since plain text is
>described in *terms* of this, and XML is far from a simple "lowest common
>denominator" data format; it would take me a few minutes to write a set of
>routines in C to serialise and unserialise data in network byte order, and
>perhaps an hour at most to implement this for a self-describing
>format. Compare that to how long it takes to implement an XML parser, and
>the size and run time space/time requirements, and the size of the XML
>documents compared to the "XDF" records...

IMHO that's an extremely unfair comparison.  I very much doubt that in a few 
minutes you could come up with a reliable, tested, general-purpose data 
serializer, as opposed to a quick application-specific hack.  Most of the time 
spent writing an XML parser isn't spent on bare-metal coding; it's spent on 
the less geeky aspects of programming like testing, requirements analysis, API 
documentation and development (the API that's only for your own personal use 
is always the easiest to develop, since you already have it internalized), 
design for maintainability, and maintenance.  It's like the way geeks always 
ridicule the (very well documented) statistic that if you divide the number of 
lines of code a programmer writes over the span of a project by the total time 
he or she spends on the project, you get about ten lines of code per day.  The 
catch is that the typical programmer spends most of his time on activities 
other than writing lines of code (even though those activities, such as 
requirements analysis, may eventually contribute to writing code).  The time 
to develop and produce a program is always a lot greater than the time to code 
it.

Application-specific hacks are always more efficient than general-purpose 
libraries when the entire project is under the complete control of a lone-geek 
cowboy coder who gets to define all the requirements himself and who's going 
to be the only person ever to look at the code.  In the real world, things are 
a little different.

>Since parsing XML is complex, XML's adoption will be limited by the
>development rate of XML parsers... I have heard a few people say "Ah, I
>can write an XML parser in 10 lines of Perl", but those parsers don't
>process entity references or handle namespaces :-)

But parsers that do all that are easily available to any competent Perl 
programmer, who doesn't have to write the parser, only use it.  The only 
problems arise when the Perl code is supposed to run on some el-cheapo rented 
Web host whose admin doesn't even understand how to install Perl, IOW under 
extremely penny-wise and pound-foolish conditions.

>> Either you do
>> without compatibility between systems or you sacrifice a bit of speed.
>Only a tiny smidgen... many CPUs can do network / host byte order
>translation in a single instruction; compare that to the time taken for an
>XML parser to locate a text node by stepping through to find delimiters,
>then removing the whitespace and converting from UTF8-decimal to an
>integer in host byte order...

You're thinking in terms of saving CPU cycles.  In the Real World, most of the 
time isn't spent executing (cheap) instructions; it's spent by (expensive) 
people trying to solve "impedance mismatches" between two organizations' 
notions of what their data should look like.  Once again, XML has few 
advantages in systems that are entirely under the control of a lone coder.  
The advantages come into play when you have to exchange data with multiple 
organizations; sure, writing low-level conversion routines to handle it is fun 
geektoy stuff, but you're ignoring the non-geeky stuff like maintaining and 
documenting all those conversion routines.

And let's not forget the *social* aspects (the ultimate non-geeky stuff) of 
data interchange.  When several unrelated organizations, or even departments 
within an organization, need to exchange data, there's an enormous advantage 
to using a data format that was created by a third party rather than by one of 
the players, namely that there's no rivalry over *which* player gets to create 
the format.  Again, if one party could simply impose a format by fiat, 
everything would be cool, but in real life, if you don't get full "buy in" 
from all the players, you're going to see a lot of friction (usually in the 
form of "creative incompetence" where everybody's implementations differ in 
slight but important details) that will dissipate a lot of energy as heat.  
Yes, this falls into the realm of what hardcore geeks would call "touchy-
feely" stuff, but the fact is that psychological/verbal/non-
quantitative/stereotypically-female/"touchy-feely" considerations play 
important roles in any real-life human endeavor involving more than one 
person, and the fact that one might be more comfortable with bits and chips 
than with human interactions doesn't change that reality.