OASIS Mailing List Archives


Re: ASN.1 and XML



On 24 May 2001, Simon St.Laurent wrote:

> > Which process do you refer to?
> 
> All of the parts you described above.  "Gnarly parsing", codecs, and
> compile time...

Ok. The parsing isn't something they have to do (likewise, they don't have
to write an XML parser, or their own scripting language's parser :-). The
codecs plug in, like their XML parser does (it even does the same
job). Compile time doesn't exist for scripting languages, indeed, so you
use an interface like I describe later (in the same email? Can't
remember):

$contact = asn_load_type(".../contact.asn");

$contact->decode("MobilePhone", ...data...);

echo $contact->number;
echo $contact->handset;
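
A minimal sketch of how such a load-at-runtime interface could look, written in Python rather than the pseudocode above. Everything here is hypothetical: `SCHEMA` stands in for whatever a toolkit would read out of the .asn file, `load_types` and `LoadedTypes` are made-up names, and JSON is only a placeholder wire format, not a real ASN.1 encoding.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-in for the parsed contents of a schema file.
SCHEMA = {"MobilePhone": {"number": str, "handset": str}}

# Codecs plug in by name. JSON is only a placeholder wire format here.
CODECS = {"json": json.loads}


class LoadedTypes:
    def __init__(self, schema, codec="json"):
        self._schema = schema
        self._decode_wire = CODECS[codec]

    def decode(self, type_name, data):
        fields = self._schema[type_name]
        raw = self._decode_wire(data)
        # Build a plain object whose attributes carry the declared types.
        return SimpleNamespace(
            **{name: cast(raw[name]) for name, cast in fields.items()}
        )


def load_types(schema, codec="json"):
    """Runs at load time, so no separate compile step is involved."""
    return LoadedTypes(schema, codec)


contact = load_types(SCHEMA)
phone = contact.decode("MobilePhone", '{"number": "555-0100", "handset": "flip"}')
print(phone.number)
print(phone.handset)
```

The script author only sees `load_types` and `decode`; swapping the codec is a one-word change in the `load_types` call.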

> And how do you plan to address the people who work in XML on a regular
> basis but aren't actually programmers?  People who don't mind getting
> their hands dirty in the data, maybe even applying regular expressions
> to it, but aren't planning on writing scalable systems for large-scale
> data interchange?

People who just edit the data? Well, I'd suggest that for quick hacks, the
data be encoded using a textual encoding (such as XER :-), but ideally
you'll want a customised interface that makes use of semantic information
beyond what can be captured in any non-Turing-complete schema language (cue
Gödel thread again) to handle it all anyway. And people can poke at BER or
even PER data with an ASN.1 editor; sure, they need a special package for
it, but that package works at the level of something like XML Spy rather
than a plain text editor.

If all you *have* is a text editor, and can't get specialised software to
edit your chosen data format, then indeed, you are in a dilemma - you
should have used XER as your transfer syntax :-)

> While I'm happy to admit that ASN.1 is better optimized for certain
> situations, a lot of us see that optimization as coming with costs that
> aren't worth bearing for a lot of work.

The optimisation stuff occurs in specific encoders. ASN.1 itself is just
the "schema" language; the data representations build on top of ASN.1 more
than the other way around, representing its data model in different
ways. Now, the ASN.1 language itself is more expressive than XML Schema
(Mmmm, information object classes...), so coupled with XER (XML Encoding
Rules), it can actually work *as* an XML schema language. With the magic
property that, just by using a different encoder/decoder pair, the same
data can be squeezed through an embedded processor or across a dialup link
or a wireless link, with no extra coding (beyond the fact that someone,
somewhere, has to have written the codec, but that's a one-off cost).
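
To make the "same data, different encoder/decoder pair" point concrete, here is a rough Python sketch. It is an illustration under assumptions, not real XER or PER: `Reading`, `TextCodec` and `BinaryCodec` are invented names, the text form is ad hoc XML, and the binary form is a fixed-width layout chosen only to show the size difference.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: int        # assumed to fit in an unsigned 16-bit field
    celsius_tenths: int   # assumed to fit in a signed 16-bit field


class TextCodec:
    """Ad hoc XML text encoding (illustrative, not real XER)."""

    def encode(self, r):
        root = ET.Element("Reading")
        ET.SubElement(root, "sensorId").text = str(r.sensor_id)
        ET.SubElement(root, "celsiusTenths").text = str(r.celsius_tenths)
        return ET.tostring(root)

    def decode(self, data):
        root = ET.fromstring(data)
        return Reading(
            int(root.findtext("sensorId")),
            int(root.findtext("celsiusTenths")),
        )


class BinaryCodec:
    """Fixed-width big-endian encoding (illustrative, not real PER or BER)."""

    _layout = struct.Struct(">Hh")

    def encode(self, r):
        return self._layout.pack(r.sensor_id, r.celsius_tenths)

    def decode(self, data):
        return Reading(*self._layout.unpack(data))


def roundtrip(codec, value):
    # Application code is identical whichever codec is plugged in.
    return codec.decode(codec.encode(value))
```

The application works with `Reading` objects either way; only the codec object passed to `roundtrip` changes, and the binary form of a reading is 4 bytes against several dozen for the text form.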

> Well, I guess markup has been called worse.  Personally, I'm happy to
> work with XML because I do real work using it at practically zero cost.
> I can explore it in a text editor, serve it on a Web server, generate it
> with tools built for HTML, and do all kinds of stuff with tools that
> aren't even designed with XML in mind.

This is true, but it applies for many other things (even nasty CSV files -
they're not so nasty if you put a row of field names as a
"header" row; then they become much like a non-hierarchical XML!)
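
As a small illustration of the CSV remark, this Python sketch turns a CSV file with a header row into flat XML. The function name and the `rows`/`row` tags are made up for the example, and it assumes every header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text, root_tag="rows", row_tag="row"):
    """Use the header row as element names, one child element per field."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, row_tag)
        for field, value in record.items():
            ET.SubElement(row, field).text = value
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("name,number\nAlice,555-0100\n"))
```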

> It sounds like you're purely focused on data.  I'm happy to see that
> document tools have proven useful for data communities, but haranguing
> me (and probably much of this list) about the need to build
> highly-optimized data interchange formats probably isn't going to win
> you a whole lot of converts.

I'm not purely focussed on data, but I'm only discussing it right now
since I think it's bad that XML is being applied to data like this. I like
XML for documents. Me <heart/> DocBook.

> > Dude, read my emails! I just explained that you only need a disassembler
> > if you choose to use a non-textual encoding!
> >
> > Yet still everyone using ASN.1 DOES choose non-textual encoding. Textual
> > encodings tend to be used for examples in books and debugging log output,
> > as far as I can tell, and that's about it.
> 
> I'll take the second paragraph to negate the first.  Textual encodings
> in XML are used equally abundantly in examples and in reality. 

But XML doesn't give you any choice... ASN.1 does give you choice, and
people tend to choose binary for "real work", text for books...

If you stored XHTML or DocBook with ASN.1, it would be more practical to
use a textual encoding, of course. And you can :-)

> > Exciting?!?! This isn't here to amuse you! PER is a tight, compact,
> > encoding. It was defined because it was needed. XML doesn't give you a
> > choice. It limits you for no good reason.
> 
> Your limitation, my freedom.  If I really wanted a "tight, compact
> encoding" I think I might actually find PER exciting.  Since I have
> other priorities - readability among them - I find PER limiting for no
> good reason.

PER would be a crazy choice for a document, certainly, so not for your
application, but that wasn't what I was aiming at; I'm worried about the
uses of XML for data transfer :-)

However, ASN.1/XML interoperability has the nice property that converting
from XER to PER (a standard transformation, as opposed to a
domain-specific hack like WML) can be done in gateways to reduce the costs
of getting your documents to a dinky embedded device...

> > No, but how much of what is happening today with XML is heavily inspired
> > by SGML? See the email before this one for more on that.
> 
> Actually, lots.  Repurposing technologies doesn't necessarily mean that
> all the experience that led to the creation of those technologies is
> discarded.

I worded that badly - sorry, fog of rage... of course there's
*inspiration* from the SGML past, but XML is still starting from scratch
based on things learnt from SGML. My original thrust was that ASN.1 has
been around since 1984 and XML hasn't. The fact that XML learns lots from
SGML, which has been around since 1986, contrasts with the fact that ASN.1
learnt stuff from the Xerox XNS Courier system, which dates back to God
knows when... The exact dates don't matter (I'm not going to claim that
ASN.1's two year head start on SGML resulted in any amazing insights
:-); the fact is that ASN.1 has been doing data interchange for a lot
longer than the current thinking about using XML in SOAP/XML-RPC/etc. and
has resolved a lot of the issues worrying the XML-data community today.

> > I've not studied the history that hard, but XML seems to have gained hype
> > since the promises of the Semantic Web struck at the hearts of marketing
> > types, AFAICT.
> 
> That would explain why so many people on this list keep looking at the
> Semantic Web sideways. 

Yeah. Hype benefits nobody but the companies who sell false hopes, it
seems...

> Is the fault in ASN.1's stars, or in itself?  XML seems to hit people
> and make them jump up and down - usually happily.  I think there's more
> going on than marketing. 

I would be interested to know what, since it doesn't produce that effect
for me :-)

(Perhaps it's the object oriented Pixie dust Claude enigmatically referred
to...)

> (Though I'll be the first to admit that there is a cruel learning
> curve beyond the fundamentals.)

ASN.1 has some of that too - you need to know about bits if you want to
know about how PER works, and writing information object classes can be a
bit confusing for the weak at heart, but using them is fine.
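
To give a feel for the "bits" involved, here is a simplified Python sketch of how, as far as I understand the unaligned variant of PER, an integer constrained to a fixed range is packed: the offset from the lower bound goes into the minimum number of bits that can distinguish every value in the range. The function name is made up, and length determinants, extension markers and the aligned variant are all ignored.

```python
def constrained_int_bits(value, lower, upper):
    """Return the bit string for `value` constrained to lower..upper."""
    if not lower <= value <= upper:
        raise ValueError("value outside constraint")
    range_size = upper - lower + 1
    # Minimum number of bits needed to distinguish range_size values.
    width = (range_size - 1).bit_length()
    if width == 0:
        return ""  # only one possible value, so nothing needs sending
    return format(value - lower, "b").zfill(width)


print(constrained_int_bits(5, 0, 7))    # 3-bit field
print(constrained_int_bits(3, 1, 12))   # 4-bit field holding the offset 2
```

So a month number declared as 1..12 costs four bits on the wire, where a textual encoding would spend a whole element on it.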

> > Were it not for the hype, XML would be no better - the accessibility
> > stuff is a symptom of the hype, that's all; it's the hype that annoys me
> > :-)
> 
> Well, I see the hype as a symptom of the accessibility, so round and
> round...

Yep, it's a feedback thing, obviously.

> I'm not sure declaring that some information is meant for humans and
> some is meant for machines is such a wise decision.  A lot of the
> excitement around XML comes from the ways in which it explicitly
> violates that separation.

I don't think there's a separation in what the information is meant for,
just in what the presentation is meant for... sequences of bits on disk
are meant for machines, patterns of light on paper or screen are meant for
humans. With BER or PER encoded data the conversion from disk to screen is
more involved, while with XER/XML it's easier, but that's a quantitative
difference rather than a qualitative one, to be weighed case by case
against the costs of storing and processing those on-disk representations.
At least with ASN.1 it's a simple matter to change which representation
you use, while XML removes the option...

> > The existence of BER and PER doesn't mean that BER and PER are vital to
> > ASN.1. Think of the ASN.1 encodings like SGML application profiles - it's
> > a pretty good analogy. People can stick to XML without going into weird
> > profiles that use = and + instead of < and >!
> 
> Getting rid of application profiles was a lot of what XML 1.0 was about.

Yes, they didn't do much that was useful, but the encodings work at a more
fundamental level of representing the data rather than representing the
*structure*, which is what the application profiles do.

> > You can do textual encoding with ASN.1. 
> 
> Yes, and you _can_ encode binary information in XML as Base 64.  People
> don't tend to use either practice when they can avoid it.

Well... that's not quite the same, nobody suggests storing their Customer
record as (say...) an ONC XDR file and then sending it Base64'ed in XML
:-)

> > And, if you wish to, you can
> > invest the extra effort to use compact binary encodings when you want to.
> > If you don't want to, you can ignore them.
> 
> Just like people _could_ ignore all of the features in SGML, if they
> wanted to.  The mere existence of ignorable features adds significantly
> to the perception of complexity.

They do ignore the stuff in SGML, and have XML! :-)

> > Perhaps writing everything in
> > ASCII *is* the best way of transferring data over the Internet. That's not
> > a problem.
> 
> Replace ASCII with Unicode - that was another large move XML made.

There's been a UTF8String in ASN.1 since 1997, too, IIRC :-)

> > But the same tools can deal with compact binary data from
> > embedded systems if you just plug in the appropriate decoder/encoder
> > module. A good toolkit will give you the option, but not force you to use
> > it. I explained that!
> 
> But why would I want to get into that?  If you write an ASN.1 parser
> which emits SAX events and an ASN.1 writer which consume SAX events, I
> might be willing to use ASN.1 in those cases where I absolutely have to
> work with ASN.1 streams.  Otherwise, I'd really rather stick with what
> I've got, thanks!

But the same SAX interface could do XML as well. Indeed, if you found
yourself needing to communicate with something too small for an XML
parser, you tell that interface to switch from XER to PER without any
great worries. It's in the pipeline :-)
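
A sketch of what "the same SAX interface" could mean in practice, using Python's standard `xml.sax` module. The XML side is the real parser; the binary side is a toy format invented for this example (a length byte followed by UTF-8 text per field, with field names taken from a hypothetical `FIELDS` schema), not PER. `Collector`, `parse_text` and `parse_binary` are illustrative names.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class Collector(ContentHandler):
    """Application code: records events without knowing the wire format."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


FIELDS = ("number", "handset")  # hypothetical schema: field order on the wire


def parse_text(data, handler):
    """Feed the handler from an XML document."""
    xml.sax.parseString(data, handler)


def parse_binary(data, handler):
    """Feed the same handler from the toy binary form (not real PER)."""
    handler.startDocument()
    handler.startElement("MobilePhone", AttributesImpl({}))
    offset = 0
    for name in FIELDS:
        length = data[offset]
        text = data[offset + 1 : offset + 1 + length].decode("utf-8")
        offset += 1 + length
        handler.startElement(name, AttributesImpl({}))
        handler.characters(text)
        handler.endElement(name)
    handler.endElement("MobilePhone")
    handler.endDocument()
```

The handler receives the same sequence of events from either reader, so code written against SAX does not need to change when the transfer syntax does. The element names come from the schema in the binary case, which is why they never have to travel over the wire.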

> > No! It's not! XML lets you do less than ASN.1 does, and is about equally
> > complex (the fact that there are several ASN.1 encodings does *not* make
> > it more complex, you can ignore the ones that aren't applicable to your
> > application, or even ignore the issue altogether and let the toolkit use
> > its default, which will probably be BER. Tell it to choose XER if you
> > want a textual format, but in practice this will probably only be used to
> > present data to a human, and never used as *input*.).
> 
> That's the longest parenthetical phrase I've seen in a long while
> telling me why something isn't complex. 

Ooops! You'd never know I'd played with Lisp, would you?

> There's a usable subset of XML
> that I could probably explain in the same number of characters if you
> really want to push.

But with XER finalised, that'd also be an explanation of an ASN.1
encoding!

> "The user does not need" does not sound enticing to me.  If I want
> access, I want it yesterday. 

This does not seem to apply to the bit sequences used to represent Unicode
or used in TCP/IP or zip files or the filesystem; ASN.1 started life in
the Presentation Layer of the ISO networking stack (beneath X.400, in
fact). It's just a layer between what we nowadays use a TCP/IP stack for
and the layers above. Why is this a magic place where user-transparency is
required?

> As far as users needing to be able to read
> and write XML, there's nothing which says they need to use Notepad or vi
> - they can wrap their work in toolkits the same way you describe for
> ASN.1.

Books about XML contain the examples in XML syntax - it's a cultural
thing; people dealing with XML are assumed to know the syntax, but people
working with ASN.1 are not assumed to know any encoding rules.

> > To the script author, using an ASN.1 toolkit will be almost identical to
> > using an XML parser, if not easier since it will use the schema
> > information to map things to types in the language you're using rather
> > than just giving it all to you as a DOM tree of strings.
> 
> And if I found types deeply interesting, I might be excited about that.

Why not otherwise? All I mean is that you will get back from the decoder
data in whatever format is native to your scripting language, rather than
a DOM tree.

So you can kind of do without XPath for a lot of things and instead have:

$data->owner->address[5]

...for the fifth member of the address sequence inside the owner
"element"; and if that expression happened to point to an integer rather
than a string, that expression would result in an integer, rather than a
string. Certainly, most scripting languages will invisibly cast a string to
an integer so it's not so much use in a scripting context, but in C or
Java that's a nice feature.
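
The difference can be sketched in Python. The generic tree API hands back strings, while a schema-aware decoder hands back native types. Here `decode_car` is a hand-written stand-in for what a toolkit would generate from the schema, and `Car`, `Owner` and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DOC = "<car><owner><name>Ann</name><age>41</age></owner></car>"


@dataclass
class Owner:
    name: str
    age: int


@dataclass
class Car:
    owner: Owner


def decode_car(text):
    """Hypothetical schema-driven decoder: applies the declared types."""
    root = ET.fromstring(text)
    owner = root.find("owner")
    return Car(Owner(name=owner.findtext("name"), age=int(owner.findtext("age"))))


tree_age = ET.fromstring(DOC).findtext("owner/age")  # generic tree: a string
typed_age = decode_car(DOC).owner.age                # decoded value: an int
print(type(tree_age).__name__, type(typed_age).__name__)
```

With the typed result, `typed_age + 1` is plain arithmetic, while the tree result needs an explicit conversion first.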

> To the extent I do find them interesting, I can get the same information
> out of the PSVI - far more information than I want in any event. 

It's not about curiosity, it's about the encoder and decoder being able
to make use of available information to automate more tasks for
the programmer. The same thing can be done with XML Schema
information. But it was done over a decade ago with ASN.1, and has been
refined since...

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software