mike
if it's any help, there was a similar debate about storing data in
database systems years ago - do you store data as binary (integers,
floats, etc) or as text? almost everyone chose binary for performance
reasons. except that binary is a pain in the neck when it comes to
big-endian vs little-endian byte order, 64 bit vs 32 bit word sizes,
and the question of how big an integer is anyway (one early definition
for C said it was large enough to hold the address space of the
processor!).
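for example, a minimal C sketch of the byte-order trap (my
illustration, not anything from rick's engine - htonl/ntohl are the
standard posix byte-order helpers):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl / ntohl */

    int main(void) {
        uint32_t value = 0x01020304;

        /* raw in-memory bytes: the order depends on the host, so a
           file written on a little-endian box (04 03 02 01) reads
           back garbage on a big-endian one */
        unsigned char raw[4];
        memcpy(raw, &value, sizeof value);

        /* the usual fix: normalise to network (big-endian) order
           before writing, convert back after reading */
        uint32_t wire = htonl(value);
        uint32_t back = ntohl(wire);

        printf("host %08x, restored %08x\n",
               (unsigned)value, (unsigned)back);
        return 0;
    }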
because of all that, i built my database engine using ascii text
representations of numbers throughout - and it still outperforms the
binary representations.
you have to do a lot of cpu profiling to understand why - which we did;
i don't think too many people do that today. basically, conversion
to/from binary takes relatively few instructions over the life of a
process. in my case, the benefit of storing numbers in their most
likely display format far outweighed the cost of converting them for
calculations.
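to make that concrete, here's a rough sketch of the round trip (mine,
not rick's actual engine code) - roughly one multiply-and-add per
digit in each direction:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* the number as stored on disk, already in display format */
        const char *stored = "1234567";

        /* text -> binary only when a calculation needs it;
           strtol is essentially a multiply-and-add per digit */
        long n = strtol(stored, NULL, 10);
        n *= 2;

        /* binary -> text again for output */
        char out[32];
        snprintf(out, sizeof out, "%ld", n);
        printf("%s\n", out);   /* prints 2469134 */
        return 0;
    }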
i agree about the slowdown from parsing xml. it's a much bigger problem
than binary vs text formats. the need to find the other end of a tag
before you can really process it - and searching for multi-byte
sequences is not well supported in the current generation of processors
- is, i think, the main problem.
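to illustrate (my sketch, not from the thread): locating a closing tag
means re-running a byte-by-byte comparison at every candidate position,
where a one-byte delimiter would be a single scan:

    #include <stdio.h>
    #include <string.h>

    /* naive multi-byte search: every '<' forces a byte-by-byte
       comparison of the whole pattern before we can move on */
    static const char *find_tag(const char *doc, const char *tag) {
        size_t len = strlen(tag);
        for (const char *p = doc; (p = strchr(p, '<')) != NULL; p++)
            if (strncmp(p, tag, len) == 0)
                return p;
        return NULL;
    }

    int main(void) {
        const char *doc = "<price>42.50</price>";
        const char *end = find_tag(doc, "</price>");
        /* the payload sits between the two tags */
        printf("%.*s\n", (int)(end - (doc + 7)), doc + 7);  /* 42.50 */
        return 0;
    }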
all the old formats worked mainly because we look for simple things
that occur once - a newline character, or '+', ':', or '*' in the
various edi formats. processors have built-in instructions for this,
so compilers can optimise code for these functions.
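compare that with a one-byte delimiter scan (again just my
illustration) - memchr maps straight onto the byte-scan primitives
that libc and the processor already optimise:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* an edi-style record: fields separated by a single '+' */
        const char rec[] = "UNH+1+ORDERS:D:96A:UN";
        const char *start = rec;
        const char *end = rec + sizeof rec - 1;

        /* one memchr call per field boundary */
        for (const char *p; (p = memchr(start, '+', end - start)); ) {
            printf("field: %.*s\n", (int)(p - start), start);
            start = p + 1;
        }
        printf("field: %s\n", start);
        return 0;
    }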
perhaps we can get intel to design multi-byte search instructions into
their next processor, and then we can get the performance back.
rick
ps apologies to intel if such an instruction is already there - perhaps
the compiler optimisers could be taught to take advantage of it.
On Fri, 2003-07-25 at 01:17, Mike Champion wrote:
> Since this seems to be "old permathread week" at xml-dev, let's revisit
> another ... I had been waiting for the W3C to publicize the upcoming Binary
> XML workshop before talking about it on xml-dev, but the meme is loose in
> the wild thanks to Marc Hadley (http://weblogs.java.net/pub/wlg/263) and
> Elliotte Rusty Harold (http://www.ibiblio.org/xml/). Apparently the
> announcement will be made publicly readable "real soon now" (presumably the
> delay was to give dues-paying W3C members a bit of a head start), and my
> understanding is that non-W3C members will be allowed to present position
> papers and attend if their papers are accepted.
>
> Elliotte's commentary actually echoes the disclaimers at the top of the
> (still private) meeting announcement, and a deeply rooted sentiment in
> parts of the W3C that this whole idea is the "spawn of the devil" (quoting
> literally from some very well known people who can identify themselves if
> they wish!).
>
> Elliotte makes a couple of points that illustrate very clearly what is at
> stake here:
>
> "Some developers either don't believe or don't get XML's value proposition
> of a compatible, interoperable, editable, text format." - I suspect that
> everyone believes that, but some see it as only one dimension of a multi-
> dimensional value space, and others see it as a sine qua non.
>
> "They falsely believe that binary formats are significantly faster or
> smaller than XML, which is almost never true in practice" - AFAIK, that's
> exactly what the workshop is trying to determine: What is the evidence that
> 'binary XML' can be faster or smaller in realistic scenarios? My
> understanding (from a lot of inputs, most of which I can't discuss) is that
> the "bloat" issue is a red herring because gzip works wonderfully on XML
> text, but that there is an approximately one order of magnitude processing
> overhead of XML compared with the previous generation of RPC, asynchronous
> messaging, and EDI technology. An order of magnitude here, and a recession
> there, and soon you've got problems that Moore's Law doesn't solve anytime
> soon.
>
> "Worse yet, some vendors are deliberately trying to lock developers into
> their patented, closed, binary, "XML" formats" -- Perhaps, but again the
> whole point of the workshop is to determine whether an open standard would
> be useful in preventing this scenario *if* tangible benefits can be
> demonstrated. I think there are a lot more vendors who realize that
> various aspects of XML's text format cause significant overhead, but won't
> even think about alternatives unless there is some standardization to allow
> interoperability. FWIW, that describes my employer's position [or rather,
> the consensus of us geeks who have looked into the matter, obviously not
> the "official" company line].
>
> "The binary formats actually already exist, and the market has ignored them
> with a resounding silence. They have achieved no traction and no interest
> in the community." -- Interesting point ... I don't see it this way, but
> would love to see some discussion about it.
>
> " These are toxic technologies that serve no one's interests. They
> significantly compromise the XML promise of interoperable, interchangeable
> data that can be processed by a host of free, simple, readily available
> tools." -- The main counter-argument that I'm seeing is that XML-based
> projects/applications are having a hard time making the transition from
> "concept proven" to "actually deployed" in mainstream, non-early-adopter
> businesses. "Bloated" and "sluggish" are frequently used, at least in the
> recent trade press articles slamming XML, and AFAIK this stems at least in
> part from the problems real people are seeing when they plan to scale
> up a proof-of-concept XML project to a "bet the business" proposition. I
> agree that only a small minority of "XML" users will need to exchange
> binary infoset serializations, but we will *all* benefit by having a growth
> path from free/simple/text tools to industrial-strength "bet the business
> on" technologies.
>
> Anyway, flame away .... I need to figure out what to say in my position
> paper :-)