xml-dev - Re: [xml-dev] Fast text output from SAX?


To: bob@wyman.us
Subject: Re: [xml-dev] Fast text output from SAX?
From: "Stephen D. Williams" <sdw@lig.net>
Date: Sat, 17 Apr 2004 01:09:57 -0400
Cc: "'Bullard, Claude L (Len)'" <clbullar@ingr.com>,'Elliotte Rusty Harold' <elharo@metalab.unc.edu>,'XML DEV' <xml-dev@lists.xml.org>
In-reply-to: <004001c42409$9d4e3370$650aa8c0@BOBDEV>
References: <004001c42409$9d4e3370$650aa8c0@BOBDEV>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031030

Bob Wyman wrote:

Stephen D. Williams wrote:

ASN.1 is based on certain assumptions that are not
true of every situation that needs better efficiency
than XML 1.1. As an example, shoehorning self-description
into a format that was explicitly designed in the
opposite way doesn't seem like the best path, to me.

This is why my coworkers are driven crazy by my normally pedantic and precise description and naming of everything... I get lazy one time...
Yes, I know that ASN.1 is the schema language and that the encoding is done separately with BER, DER, PER, etc.
The first CCITT/ISO specs I bought and read were for ASN.1, BER, X.400, etc. in 1990, when I configured a real live OSI stack and an X.400 server on an HP-UX 3 or 4 box. (What a nightmare: the equivalent of an IP address was a zillion hex digits and there was only static routing.) I also set up ae.ge.com and glued the X.400 system to Internet email with a completely custom sendmail configuration and my own fuzzy name lookup, sort of an X.500. We also did some CALS work. Ancient history.

I was obviously using "ASN.1" for "ASN.1, xER, and everything else you need". I did download the new specs but ...

	Clearly, you do not understand ASN.1 very well. First, ASN.1
is just a schema language -- it is not an encoding format. But, it is
a schema language that was first used to define self-describing binary
data streams. These were encoded using BER (Basic Encoding Rules)
which relied on "tag-length-value" encoding. (i.e. each value is
tagged with its type and its length). You can think of BER as ASN.1's

I didn't think that BER had an identity equivalent, just scalar typing.
ONC-RPC included the type also, but no identity equivalent.

own "XML". i.e. it is very easy to read (assuming you're comfortable
with binary and have a schema to look up the non-standard type
numbers), very easy to write, and it is bulky (because of all the tags
and lengths). Note: There is even the equivalent of "namespaces" in
ASN.1... These are driven off an ISO maintained OID (object
identifier) tree. Thus, you can ensure that your tags are globally
unique. (Admittedly, many will argue that the OID method is
unnecessarily cumbersome...)

But used for PKI formats, right?
Global registration is OK for new government CA policy identifiers or final versions of new codec standards, but pretty much useless for most of the rest of the world's applications. I believe that there are private ranges, like the 10. or 192.168. IP ranges, but that's not as useful as a URI scheme or namespace base.
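
To make the tag-length-value idea concrete, here is a minimal sketch of a BER-style encoder and schema-free walker. It is a simplification, not a conforming BER implementation: it assumes single-byte tags and definite lengths only, and the helper names encode_tlv and decode_tlv are made up for illustration. The tag values used (0x30 SEQUENCE, 0x02 INTEGER, 0x0C UTF8String) are the standard universal ones.

```python
def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode one element as tag, length, value (single-byte tags only)."""
    n = len(value)
    if n < 0x80:
        length = bytes([n])  # short form: the length fits in one byte
    else:
        size = (n.bit_length() + 7) // 8
        # long form: 0x80 | number of length bytes, then the length itself
        length = bytes([0x80 | size]) + n.to_bytes(size, "big")
    return bytes([tag]) + length + value


def decode_tlv(data: bytes):
    """Yield (tag, value) pairs without any schema knowledge.

    Constructed elements (tag bit 0x20 set) are decoded recursively into
    lists. Indefinite lengths are not supported in this sketch.
    """
    pos = 0
    while pos < len(data):
        tag = data[pos]
        first = data[pos + 1]
        pos += 2
        if first < 0x80:
            n = first
        else:
            size = first & 0x7F
            n = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        value = data[pos:pos + n]
        pos += n
        if tag & 0x20:
            yield tag, list(decode_tlv(value))
        else:
            yield tag, value


# A SEQUENCE holding INTEGER 5 and UTF8String "bob".
record = encode_tlv(0x30, encode_tlv(0x02, b"\x05") + encode_tlv(0x0C, b"bob"))
decoded = list(decode_tlv(record))
```

The walker recovers structure, types, and lengths from the bytes alone, but nothing in the stream says what the integer or the string is called; that still has to come from a schema or from application code.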

	Other encoding rules for ASN.1 have been defined. For example,
there is XER (XML Encoding Rules -- basically BER with text tags...),
CER (Canonical Encoding Rules) and DER which are useful when doing
signed stuff, and PER (Packed Encoding Rules) which are highly
efficient and compact. If you only look at PER, you might get the idea
that ASN.1 codings aren't self-describing. That's because PER is the
"schema-based" version of ASN.1 while BER is sort of like the
"schema-free" version of ASN.1 encodings.
	Note: The ASN.1 community went through much the same
progression that some people hope the "XML" community will go through.
i.e. BER was self-describing and could actually be parsed without
schema knowledge. But, it was inefficient and fat (but smaller than

It could be parsed, but still only understood by application code that knew what it was going to get, i.e. that a particular object type would have certain fields, variants, etc. Right?

XML in most cases). PER was able to get compactness and efficiency by
forcing the parser to understand the schema. You choose which to use
based on your need. 
	The schema-free vs schema-required arguments raged many years
ago in the ASN.1 community. What it basically comes down to is that
which is right depends on the application. If you've got a great deal
of variety in what you're receiving and the format definition
frequently changes, then use something like BER. But, if you've
stabilized your format and it doesn't change more frequently than
you're able to deploy a new schema, then PER is great.
	In summary: There is no need to "shoe-horn" self-description
into ASN.1. It has been there since day one. i.e. almost 20 years
now...

So, I want to write an application that allows a form editor to define a data schema and data structure that mirrors the form with named fields. The application can create XML documents/objects with the new format and input data from a dynamic GUI. It sends the schema (optionally) and the XML form data objects to another process which dynamically interprets and displays them.

This is trivial with XML or esXML, because the structure and naming are integral and the library and application can produce and consume the data interpretively. No IDL compiling, no code stubs, no rebuilding applications, and simplified version management.
How do I do this with ASN.1+{BER,DER,PER,XER,...}? Even with XER, doesn't the ASN.1 have to be compiled into code at some point?
Are there libraries that support doing all of this with metadata and live interpretation?

Besides trying to avoid parsing, serialization, and the "object creation explosion", I dislike IDL-driven development. Inevitably most projects evolve repeatedly, often involve modules that don't need to know the whole object structure, and sometimes involve router-like systems that should not have to be updated often.

Certainly IDL is useful as an interface contract definition, but doesn't an XML schema do that better?

Personally, I would like the lower bound for what a programmer MUST do to interface two applications to be one line of code to write each field and simple writes of the data space. The corresponding receiver would read the data in and need one line of code to read each field, optionally using enumeration to know what fields were there. Certainly the step above this is schema exchange for interface contract, but no import/export, stub, IDL, or other fluff should be needed.
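
A minimal sketch of that lower bound, assuming Python's standard xml.etree.ElementTree and made-up form and field names ("form", "contact", "first", "phone"). It only illustrates the one-line-per-field idea with plain XML; it is not esXML or any particular library's API.

```python
import xml.etree.ElementTree as ET

# Sender: one line per field, then a plain write of the serialized bytes.
form = ET.Element("form", name="contact")      # hypothetical form name
ET.SubElement(form, "first").text = "Stephen"  # hypothetical field
ET.SubElement(form, "phone").text = "555-0100" # hypothetical field
wire = ET.tostring(form)                       # bytes ready to send

# Receiver: one line per field it cares about...
received = ET.fromstring(wire)
first = received.findtext("first")
fax = received.findtext("fax")                 # absent field comes back as None

# ...or enumerate whatever fields actually arrived.
fields = {child.tag: child.text for child in received}
```

Neither side needs generated stubs or a compiled IDL: the receiver can ignore fields it does not know and discover new ones by enumeration.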

		bob wyman

sdw

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw

