> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:clbullar@ingr.com]
> Sent: Sunday, April 18, 2004 14:49
> To: 'Michael Champion'; 'XML DEV'
> Subject: RE: [xml-dev] Validation vs performance - was Re:
> [xml-dev] Fast text output from SAX?
>
>
> Yes. Dead on. When where and under what conditions
> do applications need alternative formats? Those
> who think they need one should be making the cases
> for those conditions now.
>
> Here's the shakedown: binaries vs text formats as
> Bob W. points out is an old debate. There are:
>
> 1. Those who are developing a generalized binary
> and want to offer that.
Let me point out one fact about ASN.1 that I see overlooked sometimes,
especially when people try to compare ASN.1 with XML: **ASN.1 is not
inherently binary**.
ASN.1 focuses on a level of data description that is more abstract than a
wire representation. (This is one reason why a direct comparison with XML
1.x syntax is difficult or even inappropriate.)
For example, the following ASN.1 type definition:
------------------------------------------
EmailMessage ::= SEQUENCE {
    from      EmailAddress,
    to        SEQUENCE OF address EmailAddress,
    cc        SEQUENCE OF address EmailAddress,
    sent      DATE-TIME,
    received  DATE-TIME,
    subject   UTF8String,
    body      UTF8String
}
EmailAddress ::= UTF8String (PATTERN "(some pattern)")
------------------------------------------
is a complete description of data (from ASN.1's point of view), but says
nothing at all about the on-the-wire representation of the data.
In particular, there is no implication that the data will be represented in
some binary form. The on-the-wire representation can be XML 1.0 just as
well.
ASN.1 folks call the data-description level "type definition" or "abstract
syntax", and call the on-the-wire representation "encoding" or "transfer
syntax". Because the main focus is on the "abstract syntax", multiple
distinct "encoding rules" can exist, each specifying a different on-the-wire
representation of data that has been defined at the "abstract syntax"
level.
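To make the split between abstract syntax and encoding concrete, here is a minimal Python sketch that takes one abstract value and produces two different wire forms. It is an illustration under stated assumptions, not a real ASN.1 toolkit: the `Message` type is a hypothetical, cut-down stand-in with two UTF8String fields only; the binary form follows the BER/DER tag-length-value layout for a SEQUENCE of UTF8String; the XML form is only in the spirit of XER and is not a conformant XER encoder.

```python
import xml.etree.ElementTree as ET

FIELDS = ("subject", "body")  # hypothetical cut-down type: two UTF8String fields

def der_length(n):
    """BER/DER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    size = (n.bit_length() + 7) // 8
    return bytes([0x80 | size]) + n.to_bytes(size, "big")

def tlv(tag, content):
    """One tag-length-value triple."""
    return bytes([tag]) + der_length(len(content)) + content

def encode_binary(value):
    # 0x0C is the UNIVERSAL 12 tag (UTF8String); 0x30 is a constructed SEQUENCE.
    fields = b"".join(tlv(0x0C, value[name].encode("utf-8")) for name in FIELDS)
    return tlv(0x30, fields)

def encode_xml(value):
    # "Message" is an assumed element name chosen for this illustration.
    root = ET.Element("Message")
    for name in FIELDS:
        ET.SubElement(root, name).text = value[name]
    return ET.tostring(root, encoding="utf-8")

message = {"subject": "Validation vs. performance",
           "body": "This is the body of the email"}
binary_form = encode_binary(message)  # compact, not human-readable
xml_form = encode_xml(message)        # XML 1.0 text
```

Both functions start from the same `message` value; only the encoding rules differ, which is the point of keeping the type definition separate from the transfer syntax.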
This has given rise, over the years, to a number of standard "encoding
rules", some of which are binary, some of which use XML 1.0. Every time,
there has been a good reason for standardizing a new set of encoding rules,
starting from BER, then DER/CER, then PER, then XER, then EXTENDED-XER.
I am not saying that the ASN.1 solution fits all cases (or even most of the
cases). I know that many people prefer thinking in terms of
bits-on-the-wire (or in terms of Unicode characters to be encoded in some
character-encoding before being placed on the wire), and I am not
questioning their views here.
However, I suspect that many applications are being built around a schema
(now often XML Schema) in such a way that they will not tolerate any
variation in the form of an XML document that does not conform to the schema.
If my suspicion is well-founded, then these applications could be built as
easily around a schema written in ASN.1. ASN.1 fits a common definition of
a schema language, in that it "offers facilities for describing the
structure and constraining the contents of XML 1.0 documents, including
those which exploit the XML Namespace facility".
One special characteristic of ASN.1, (currently) not shared by XML Schema
and other schema languages, is that it allows multiple standardized
on-the-wire representations, some of which are not based on XML 1.0.
Here is an example of a fragment of XML that is valid according to the type
definition above:
-----------------------------------------------------
<EmailMessage>
  <from>abcde@xyz.com</from>
  <to>
    <address>1@abc.com</address>
    <address>2@abc.com</address>
  </to>
  <cc/>
  <sent>2004-03-05T22:03:55</sent>
  <received>2004-03-05T22:04:55</received>
  <subject>Validation vs. performance</subject>
  <body>This is the body of the email</body>
</EmailMessage>
-----------------------------------------------------
Here is a fragment of XML Schema equivalent to the ASN.1 fragment above:
-----------------------------------------------------
<xs:schema xmlns:xs="(schema namespace)">
  <xs:element name="EmailMessage">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="EmailAddress"/>
        <xs:element name="to" type="MultipleEmailAddresses"/>
        <xs:element name="cc" type="MultipleEmailAddresses"/>
        <xs:element name="sent" type="xs:dateTime"/>
        <xs:element name="received" type="xs:dateTime"/>
        <xs:element name="subject" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="MultipleEmailAddresses">
    <xs:sequence>
      <xs:element name="address" type="EmailAddress"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="EmailAddress">
    <xs:restriction base="xs:string">
      <xs:pattern value="(some pattern)"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
-----------------------------------------------------
Both the fragment of XML Schema shown here and the fragment of ASN.1 shown
above "describe the structure and constrain the content" of a class of XML
documents which includes the example shown above.
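As a rough illustration of what "describe the structure and constrain the content" means in practice, the following Python sketch checks the example document by hand against the constraints that both definitions share. It is not a schema validator and uses neither an XML Schema nor an ASN.1 tool. The function name `check_email_message` is hypothetical, the address pattern is skipped because the original leaves it as "(some pattern)", and `datetime.fromisoformat` is only an approximation of the date-time lexical rules.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

EXPECTED_CHILDREN = ["from", "to", "cc", "sent", "received", "subject", "body"]

EXAMPLE = """<EmailMessage>
  <from>abcde@xyz.com</from>
  <to><address>1@abc.com</address><address>2@abc.com</address></to>
  <cc/>
  <sent>2004-03-05T22:03:55</sent>
  <received>2004-03-05T22:04:55</received>
  <subject>Validation vs. performance</subject>
  <body>This is the body of the email</body>
</EmailMessage>"""

def check_email_message(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "EmailMessage":
        return ["unexpected root element %r" % root.tag]
    problems = []
    # The SEQUENCE / xs:sequence fixes both the names and the order.
    names = [child.tag for child in root]
    if names != EXPECTED_CHILDREN:
        problems.append("children are %r, expected %r" % (names, EXPECTED_CHILDREN))
    # "to" and "cc" hold zero or more "address" elements and nothing else.
    for list_name in ("to", "cc"):
        holder = root.find(list_name)
        if holder is not None and any(c.tag != "address" for c in holder):
            problems.append("<%s> may contain only <address> elements" % list_name)
    # "sent" and "received" must parse as date-time values.
    for stamp in ("sent", "received"):
        node = root.find(stamp)
        if node is not None:
            try:
                datetime.fromisoformat(node.text or "")
            except ValueError:
                problems.append("<%s> is not a date-time: %r" % (stamp, node.text))
    return problems

print(check_email_message(EXAMPLE))
```

A generated validator would perform the same kind of checks, whichever of the two schema languages it was derived from.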
Does it make any sense to compare ASN.1 with XML? No.
Does it make any sense to compare ASN.1 with XML Schema? Probably yes.
XML Schema has a dualism between value and lexical representation, which is
not very far from the ASN.1 dualism between value and encoding. The main
differences are:
1) In XML Schema, the concept of value only exists for simple types, whereas
in ASN.1, the concept of value exists both for complex types and for simple
types.
2) XML Schema specifies one standard mapping between the value space and the
lexical representation, whereas ASN.1 specifies multiple standard mappings
(encoding rules).
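The value/lexical dualism on the XML Schema side can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the boolean table lists the four literals that xs:boolean permits, and Python's `Decimal` accepts some strings (exponent notation, for instance) that xs:decimal does not, so it is a loose stand-in for the real lexical rules.

```python
from decimal import Decimal

# The four legal lexical forms of xs:boolean and the values they denote.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}

def boolean_value(lexical):
    """Map an xs:boolean lexical form to its value."""
    try:
        return BOOLEAN_LEXICAL[lexical.strip()]
    except KeyError:
        raise ValueError("not an xs:boolean literal: %r" % lexical) from None

def decimal_value(lexical):
    """Map a decimal lexical form to its value (approximation of xs:decimal)."""
    return Decimal(lexical.strip())

# Different character strings, same value.
same_boolean = boolean_value("true") == boolean_value("1")
same_decimal = decimal_value("1.0") == decimal_value("1.00")
```

Here several lexical forms collapse onto one value within a single text-based mapping. ASN.1 generalizes the same idea by letting one value have entirely different encodings, binary or XML.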
I am sure there are many other points of contact between the two languages.
In fact, the X.694 standard, which specifies a translation from XML Schema
to ASN.1, would not be possible if XML Schema and ASN.1 were not
sufficiently similar. Although there are some features of XML Schema that
have no match in ASN.1, most of the language can be mapped faithfully.
So, if ASN.1 can be considered as another schema language for XML, what is
so special about it? The fact that a "value" (of a complex or simple type)
can have several "lexical representations", some binary, some based on XML
1.0.
This provides **one solution** to the "binary XML" problem. This is not a
universal solution, of course, because it requires that a schema be known,
shared, and invariable (although ASN.1 has important provisions for
extensibility both across space and over time).
Alessandro Triglia
OSS Nokalva