OASIS Mailing List Archives





   Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)


Elliotte Rusty Harold wrote:

>> And you're saying that passing in objects when people want primitive 
>> values is bad - but you condone passing in a CHARACTER STRING when 
>> people want primitive values? So they have to parse it rather than 
>> just unboxing it?!?!

> I'm saying pass in what the parser actually knows: the text. The parser 
> knows nothing else.  The local application may know more, and thus it is
> the right place to do the conversion, but an XML parser knows nothing 
> more than the text. There is no way for the parser to know in advance 
> how an application is going to want to interpret any given piece of 
> text.

Yes there is - that's what I'm suggesting; that the application tells 
the parser its intentions by supplying it with some kind of schema.

>> But a plain-and-simple-XML-1.0 application would still be able to 
>> quite happily just have its characters() callback invoked for every 
>> bit of CDATA in the document, and just ignore all of this stuff. 
>> Nothing there would need to change.
> Not if what's in the stream is binary data instead of text.

You're confusing the issues here, that's not what I'm talking about.

>> Now look, I don't really think you're stupid; my snotty tone in this 
>> and the last email is just because I really don't think you're being a 
>> useful contributor to this discussion, and that annoys me... you're 
>> just saying "No, it'll never work, because some people use XML in a 
>> way that won't need this stuff. I see XML as just text, so everyone 
>> else must too". Lots of people are already using XML in not-just-text 
>> ways.

> Really? Who?

XHTML assigns 'integer' semantics to width and height attributes all 
over the place. And 'url' semantics to various src and href attributes.

Yes, you could indeed write <img src="..." width="three inches"> within 
the XHTML namespace and save it to disk, and humans who read that would 
be able to make sense of it. But an XHTML processing application isn't 
required to make sense of it. The XHTML spec assigns a type to that 
width attribute, and that type states that it has to be an integer, and 
indeed a particular representation of an integer (decimal).
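
To make the type constraint concrete, here is a minimal sketch of the check a processing application might apply to a width attribute, assuming (as described above) that the type is a plain decimal integer. The name parse_width is hypothetical and is not taken from any XHTML library.

```python
import re

# Assumption: the attribute type is a non-negative decimal integer,
# as described in the message above.
_DECIMAL_INTEGER = re.compile(r"[0-9]+")


def parse_width(text):
    """Return the width as an int, or raise ValueError if the text
    is not a decimal integer (hypothetical helper)."""
    stripped = text.strip()
    if not _DECIMAL_INTEGER.fullmatch(stripped):
        raise ValueError(f"width is not a decimal integer: {text!r}")
    return int(stripped)


print(parse_width("300"))
try:
    parse_width("three inches")
except ValueError as error:
    print("rejected:", error)
```

A more forgiving browser is free to catch the error and try other interpretations, but nothing obliges it to.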

This is good for interoperability - if there were no such type 
constraint, then some XHTML browser authors might think "Hmmm, a width 
attribute... my target market for this XHTML parser is really people 
writing XHTML, and they're all design consultants who are more familiar 
with printed brochures than computers, so I'll parse it as English text 
containing the name of a unit of measurement". While others may say 
"Hmmm, I'll take this attribute as a decimal number giving a number of 
pixels, because that's easy". And lo! No XHTML document containing width 
attributes will then work on both at the same time.

The type constraints in *valid* XHTML documents exist to place a bound 
on the work required in an implementation. The implementer knows he can 
call his XHTML browser 'complete' when it handles integers in the width 
attribute - complete in the sense that it will be able to correctly render 
valid XHTML.

But he is not prevented from thinking "Eh, people are stupid; I bet 
somebody will write an XHTML document with an invalid width attribute... 
I've got some spare time, I think I'll make my browser support a few 
extra possible ways of expressing a width, just so that my browser will 
display some documents others won't". If he really really really wants to.

> I think you're very confused. You don't get the difference
> between local interpretation of an XML document and the document itself.

Yes I do; all I'm talking about here is a local processing API.

> I have no objection if you want to define your own binary formats and 
> APIs for processing them. I object greatly when you try to pollute the 
> very useful XML and SAX APIs with things they were never intended to 
> handle.

As I mentioned in the other reply, though, it's no change to SAX APIs to 
add an optional SAX extension.
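
As a rough illustration of why an optional extension leaves the core API alone, here is a sketch using the existing SAX2 feature mechanism as exposed by Python's xml.sax. The feature URI and the function name request_native_values are made up for this example; no such feature is defined by SAX. A reader that does not know the feature raises SAXNotRecognizedException, and the application simply carries on with text.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException

# Hypothetical feature name, invented for this sketch; not defined by SAX.
NATIVE_VALUES_FEATURE = "http://example.org/sax/features/native-values"


def request_native_values(reader):
    """Ask the reader for native values; return True if it agreed."""
    try:
        reader.setFeature(NATIVE_VALUES_FEATURE, True)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        # The reader knows nothing about the extension: fall back to text.
        return False
    return True


reader = xml.sax.make_parser()
supported = request_native_values(reader)
print("native values available:", supported)
```

With the stock expat-based reader the request is refused, so an existing text-only application behaves exactly as before.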

> If you can use XML and SAX as you claim, go ahead. But you're
> saying you can't. You're saying you need to change them; that they 
> aren't good enough as is.


[Alaric thinks]

Ah, I see. The original point was that people *were* writing SAX 
interfaces to ASN.1 encodings like BER and PER, just that one 
consequence of this was that the parsers were reading in an integer and 
converting it to a string to pass into the characters callback, which 
seems wasteful when the application goes straight ahead and parses it 
back to an integer.
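
The round trip described above can be sketched roughly as follows. This is an illustration only: a fixed 4-byte big-endian integer stands in for a real BER or PER encoding, and emit_integer and WidthHandler are hypothetical names.

```python
import struct
from xml.sax.handler import ContentHandler


class WidthHandler(ContentHandler):
    """Application side: receives text and parses it back to an integer."""

    def __init__(self):
        super().__init__()
        self.width = None

    def characters(self, content):
        # Second conversion: text -> integer.
        self.width = int(content)


def emit_integer(encoded, handler):
    """Hypothetical binary event source. Decodes a 4-byte big-endian
    signed integer (a stand-in for a real ASN.1 encoding), then renders
    it as text for the SAX callback."""
    (value,) = struct.unpack(">i", encoded)
    # First conversion: integer -> text.
    handler.characters(str(value))


handler = WidthHandler()
emit_integer(struct.pack(">i", 640), handler)
print(handler.width)
```

The value starts as an integer and ends as an integer; the string in the middle exists only to satisfy the callback signature.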

That doesn't mean that SAX *needs to change* to match ASN.1, now, does 
it? Because you clearly *can* just convert everything to text and feed 
it to the application like that.

It's just that allowing the application to say "Psst, SAX event source, 
I'm interested in native values - pass me some when you can" will enable 
the SAX parser - if it wants to - to save some time.
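
One way the "pass me native values when you can" arrangement might look, as a sketch: the event source checks whether the handler offers an extra callback and uses it if so, otherwise it falls back to characters(). The native_value callback and the deliver function are hypothetical and are not part of SAX.

```python
from xml.sax.handler import ContentHandler


class TextOnlyHandler(ContentHandler):
    """An ordinary SAX handler that only understands text."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def characters(self, content):
        self.seen.append(content)


class NativeAwareHandler(TextOnlyHandler):
    """A handler that opts in to the hypothetical extension."""

    def native_value(self, value):
        # Hypothetical extension callback; not part of SAX.
        self.seen.append(value)


def deliver(value, handler):
    """Hypothetical event source: pass the native value when the handler
    has asked for it, otherwise convert to text as usual."""
    callback = getattr(handler, "native_value", None)
    if callable(callback):
        callback(value)
    else:
        handler.characters(str(value))


plain = TextOnlyHandler()
aware = NativeAwareHandler()
deliver(640, plain)
deliver(640, aware)
print(plain.seen, aware.seen)
```

The text-only handler is untouched by the extension; only a handler that opts in ever sees a native value.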

> I think that just proves my point that XML and
> SAX aren't about binary data. They're about text. And I want them to 
> remain about text, because text works.

Doesn't work so well for raster image formats, file system structures, 
or data compression formats, though, does it? :-)

>> We are not your enemies! We're just engineers and scientists, trying 
>> to improve the technological state of the art for the betterment of 
>> humanity, yes? :-)

> You may believe that, but it isn't true. What's being proposed here is 
> actively harmful to the technological state of the art. It would make 
> SAX worse, not better. It would make XML much worse, not better.

It doesn't change either SAX or XML, though.

> Bottom line: I think the people clamoring to force binary data into our 
> nice clean text environment have an inferiority complex.

I can't say I've met any, so I wouldn't know.

> I don't think
> you actually believe your binary data and formats can survive on their 
> own merits.

Computers processed numbers before they processed text; right now we're 
seeing if *text* can survive on its own merits. Because it hasn't yet 
ousted binary...

> I think you know the ASN.1 dog won't hunt, and if you run it
> up the flagpole nobody will salute. I think that's why you're fighting 
> so hard to force SAX and XML to contort to fit your needs. I think you 
> believe that if you defined your own binary, typed APIs and formats, 
> they would not be adopted because they'd be recognized as inferior to 
> XML. And I claim you think that the only way to get your binary, typed 
> formats and APIs adopted is by attaching the XML brand to them.

Why would we waste the effort? What would be our motivation to push a 
model we know is inferior? Wouldn't the effort be better spent

> Disagree with me? Then prove me wrong. Create binary formats and APIs 
> for handling them without polluting XML.  I won't stand in your way. If
> you're successful, that's a win for everyone. But I don't believe you 
> can, and I'm really afraid you're going to cripple or destroy XML and SAX 
> because you know you can't win otherwise.

Binary formats and APIs for handling them already exist, and did so long 
before XML came around. They're already pretty successful; you use them 
whenever you pick up the phone, use a computer, look at your wristwatch, 
watch TV, drive a recently made car, pay tax, interact with your bank, 
etc. They don't seem to be dying off now XML is around, either.

What more can I do to prove you wrong than that?




Copyright 2001 XML.org. This site is hosted by OASIS