xml-dev - RE: [xml-dev] Validation vs performance - was Re: [xml-dev] Fast text ou


To: "'Stephen D. Williams'" <sdw@lig.net>
Subject: RE: [xml-dev] Validation vs performance - was Re: [xml-dev] Fast text output from SAX?
From: "Alessandro Triglia" <sandro@mclink.it>
Date: Mon, 19 Apr 2004 23:46:16 -0400
Cc: "'Bullard, Claude L \(Len\)'" <clbullar@ingr.com>, "'XML DEV'" <xml-dev@lists.xml.org>
Importance: Normal
In-reply-to: <4084905A.80603@lig.net>

> -----Original Message-----
> From: Stephen D. Williams [mailto:sdw@lig.net] 
> Sent: Monday, April 19, 2004 22:52
> To: Alessandro Triglia
> Cc: 'Bullard, Claude L (Len)'; 'XML DEV'
> Subject: Re: [xml-dev] Validation vs performance - was Re: 
> [xml-dev] Fast text output from SAX?
> 
> 
> Alessandro Triglia wrote:
> 
> -----Original Message-----
> From: Stephen D. Williams [mailto:sdw@lig.net]
> ...
> I mentioned XML consisting of idioms along with syntax and 
> other explicit standards.  One powerful idiom that has become 
> accepted and 
> expected with XML is that, whenever at all possible, you produce 
> precisely but accept loosely.  
>     
> I think you are being too vague here.  There must certainly 
> be **rules** on how and when you can be loose.  
>   
> Yes, I was being vague.  There is a certain accepted, but not 
> necessarily universally uniform or acceptable, range of 
> looseness. My interpretation of these:
> 
> If you expect an attribute "age", will you accept an 
> attribute "Age" instead?
>   
> No.  Case insensitivity is so DOS.  Good for DNS but little else.
> 
> If you expect an attribute "age", will you accept a child 
> element <age> instead?
>   
> Yes, in many applications I would write it this way.  This 
> allows me brevity with an attribute and extensibility with an 
> element tag.  Except for special-purpose attributes (ID, et 
> al), this seems to make sense to me in a general application sense.

XML Schema will not validate a document in which an element that is expected
to carry an attribute "age" contains a child element <age> instead.  So you
cannot benefit from, say, a schema-validating parser unless you massage your
document after receiving it and before submitting it to validation.  How
realistic is this scenario?

Of course, if you want that kind of flexibility and still want to be able to
use schema-based validation, you can write your schema in such a way that it
accepts both the attribute and the child element, making the attribute
optional and giving the element minOccurs zero.  Then you can impose an
application-level constraint that the element and the attribute must
(should?) not both be present at the same time.

In ASN.1, you can do the same, like this:

PersonInformation ::= SEQUENCE {
	name UTF8String,
	age [ATTRIBUTE] INTEGER OPTIONAL,
	age-elem [NAME AS "age"] INTEGER OPTIONAL,
	....etc....
}
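
The application-level constraint mentioned above (attribute or child
element, but not both) could be checked after parsing along these lines.
This is only a minimal sketch: the helper name read_age and the <person>
sample documents are assumptions made for illustration, and it uses the
Python standard library rather than a schema-validating parser.

```python
import xml.etree.ElementTree as ET


def read_age(person):
    """Return the age from either form, or None if it is absent.

    Hypothetical helper: accepts age="..." or a child <age> element,
    and rejects a document that carries both at the same time.
    """
    attr = person.get("age")      # attribute form, None if missing
    child = person.find("age")    # child-element form, None if missing
    if attr is not None and child is not None:
        raise ValueError("age given both as attribute and as child element")
    if attr is not None:
        return int(attr)
    if child is not None:
        # int() raises ValueError on empty or non-numeric content
        return int((child.text or "").strip())
    return None


as_attribute = ET.fromstring('<person age="42"><name>Ann</name></person>')
as_element = ET.fromstring('<person><name>Ann</name><age>42</age></person>')
both = ET.fromstring('<person age="42"><age>42</age></person>')
```

Both read_age(as_attribute) and read_age(as_element) yield the same integer,
while read_age(both) raises ValueError, which is the part a schema with two
optional components does not express by itself.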

> 
> If you expect an attribute "abc:age", will you accept an 
> attribute "age"
> (unqualified) instead?
>   
> No.
> 
> If you expect two child elements <a> and <b>, will you accept 
> a <c> in between?  What if you have a <d> before the <a> but 
> the mandatory element <b> is missing altogether?  Will you 
> accept and know how to handle this situation?
>   
> Yes.  Yes/Depends.
> 
> Will you accept a namespace name "abcde" when a schema 
> specifies the namespace name "abcdef", or "abcdef/", or "ABCDE"?
>   
> No.
> 
> Are you saying that it has become a common idiom in 
> application development to expect and accept one or more of 
> the above?  I cannot believe you really mean that.
>   
> I do agree that you may want to tolerate an addition to an 
> element (say, an extra child element, usually at the end of 
> the "expected" content, or an extra attribute).  So what?  
> ASN.1 supports this **formally** in the type definitions.  It 
> has done so for many years.  There is even a clause that you 
> can place at the beginning of an ASN.1 module and means that 
> one must expect additions anywhere.
>   
> I will have to read to see how that works with typical 
> PER-based code.  I didn't think that typical implementations 
> were that flexible.

The H.323 family of videoconferencing standards (and other standards that
use PER) make heavy use of extensibility.

If you have the following type definition:

--------------------------------------------------
PersonInformation ::= SEQUENCE {
	name UTF8String,
	age [ATTRIBUTE] INTEGER OPTIONAL,
	age-elem [NAME AS "age"] INTEGER OPTIONAL,
	address UTF8String,
	...
}
--------------------------------------------------

the ellipsis ("...") indicates a position in the sequence (usually but not
necessarily at the end) where an "addition" may occur.

A writer may add stuff after the "address" field.  PER wraps the stuff and
adds a length prefix right before it.  A reader will be able to either
decode the stuff (if it thinks it understands it) or skip it.  If a reader
decides to relay the message, it can re-include the wrapped stuff in the
message, whether or not it understands the stuff.

Without the ellipsis, it is forbidden to add anything after the address.
With the ellipsis, the wrapping mechanism ensures that any addition will be
recognized as such, whether or not it is understood.
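
To make the wrapping idea concrete, here is a toy illustration of
length-prefixed additions that a reader can skip or relay without
understanding them.  It is NOT real PER: actual PER uses its own length
determinants and extension markers, whereas this sketch assumes a fixed
2-byte big-endian length prefix, and the function names wrap,
split_additions and relay are hypothetical.

```python
import struct


def wrap(payload):
    """Wrap one addition with a 2-byte big-endian length prefix (assumed format)."""
    return struct.pack(">H", len(payload)) + payload


def split_additions(buf):
    """Split an extension area into its wrapped additions without interpreting them."""
    additions, pos = [], 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if pos + length > len(buf):
            raise ValueError("truncated addition")
        additions.append(buf[pos:pos + length])
        pos += length
    return additions


def relay(buf):
    """Re-emit every addition unchanged, whether or not it was understood."""
    return b"".join(wrap(a) for a in split_additions(buf))


# Two additions placed after the known fields; the second one is opaque
# to this reader, yet it can still be delimited and passed along.
area = wrap(b"phone") + wrap(b"\x01\x02\x03")
```

Because each addition carries its own length, split_additions(area) can
delimit both items, and relay(area) reproduces the bytes exactly even though
the second addition was never decoded.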

***

Although I have referred to PER, all of the above holds for all encoding
rules.  Additions are allowed or forbidden based on the presence of an
ellipsis.  (I have mentioned PER because I think you asked how PER handles
extensibility.)

> 
> This is a direct expansion of the IETF 
> meta-rule that states a similar principle.  Furthermore, when 
> accepting 
> loosely and re-emitting an existing document/object, you attempt to 
> preserve anything originally present, even if you didn't 
> expect it.  For 
> instance, an 'object', i.e. a complex data type, may have grown a new 
> field.  You should not die when encountering this field and 
> if you are 
> modifying and exporting that object, you should preserve the field.
>     
> Exactly.  This is what the ASN.1 notion of extensibility was 
> invented for.
>   
> I don't understand how this works in practice yet.
> 
>  
> ...
> ASN.1 is, of course, a logical data/API definition syntax, 
>     
> 
> 
> ASN.1 defines no API whatsoever, nor do ASN.1 modules define 
> APIs.  ASN.1 modules define **data types**.
> 
>   
> <Sigh>  Make that "ASN.1 + xER + existing implementations".
>   Message format + semantic sugar = API
> or
>   Message format + protocol + semantic sugar = API
>   Standard message format  implies  standard API
>   Standard API  does not imply standard API.

Sorry, I still don't get it.  An API is not the same thing as a data type.
Why do you think they are the same thing?

Alessandro

Follow-Ups:
- RE: [xml-dev] Validation vs performance - was Re: [xml-dev] Fasttext output from SAX?
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] Validation vs performance - was Re: [xml-dev] Fasttext output from SAX?
  - From: "Stephen D. Williams" <sdw@lig.net>

References:
- Re: [xml-dev] Validation vs performance - was Re: [xml-dev] Fasttext output from SAX?
  - From: "Stephen D. Williams" <sdw@lig.net>
