xml-dev - Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML Schema


To: Michael Kay <mike@saxonica.com>
Subject: Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML Schema
From: Greg Hunt <greg@firmansyah.com>
Date: Mon, 08 Aug 2005 11:44:17 +1000
Cc: xml-dev@lists.xml.org
In-reply-to: <E1E1rst-0005Df-I1@mx.mailix.net>
References: <E1E1rst-0005Df-I1@mx.mailix.net>
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

Michael,
Why do you think I am talking about out-of-memory (OOM) errors?  I 
haven't mentioned them.  I am more interested in managing throughput.  
My experience with letting the Java heap fill up has in general not been 
encouraging; particularly with multi-threaded applications, both 
performance and function tend to suffer in unpredictable ways. My 
experience with C and C++ OOM is even less encouraging.

The systems that I deal with are big enough for OOM to very rarely 
occur, but system performance is more complex than OOM.  Properly 
engineered commercial systems will have understood capacity limits that 
include assumptions about input size distributions and the resulting 
service demand: memory, CPU, IO rate, IO volume, residency, concurrency. 
I get OOM errors on my aged email desktop machine, not on 16-way AIX boxes.

However, let's take the OOM example and apply it to a system that is in 
production today:

Imagine that you are running a real-time financial information system 
with hard (meaning financial) penalties for transactions timing out 
(this is not a hypothetical case).  What diagnostic or "helpful 
recovery action" do you take for an OOM error if you have either gone 
into excessive paging or caused the runtime to go very quiet for a few 
tens of seconds?  Who pays and how do the diagnostics help with the 
money?  Do you say to the client "well, someone sent us a document that 
was too big so you don't get to penalise us"?  They may say "it wasn't 
us so we don't care".  They may just laugh and walk away.  They may say 
"we sent you a document this big during testing and it validates against 
your schema", and then laugh and walk away.

The approach that you describe works for desktop applications that do 
not have hard performance limits or, less well, for processes which do 
not have multiple users or multiple threads (so that the OS can insulate 
transactions from each other to some degree).  Highly variable 
performance is entirely acceptable for some classes of user and for some 
architectures and is not at all acceptable for others.

The other example of constraints (from EDI, but still applicable to XML) 
that I have referred to is a real-world several hundred million dollar a 
year interchange between two large companies.  That type of constraint 
is based on both performance (throughput and service demand) and 
business process limits.  Those constraints are not going to go away. 
The specification of additional constraints to published standards will 
keep happening and it would be nice to have some straightforward tool 
support for it.

This is a complex question, and one that only bites when you deal with 
complex systems where you have to provide guaranteed behaviour.  In my 
first email on this subject I said that there are two separate concerns 
that are overloaded on the one set of attributes:  maxOccurs has a data 
model theory sense and maxOccurs has an operational sense.  We should be 
able to define a local operational profile for a schema that constrains 
that schema in additional ways, separating the modelling from the 
operational senses, and be able to distinguish between local profile 
constraint violations and data model violations and respond to them 
differently if we choose to.  That would make shared models more useful 
because it would allow organisations to say "we accept this subset" or 
"we accept this value domain" of the basic schema while still sharing a 
basic data model, without having to rewrite the XML document for each 
consumer to map it into the different sets of constraints.
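
A minimal sketch of that separation, in Python and under stated 
assumptions: the element name "item", both limit tables and the function 
check_occurrences are hypothetical, and the check only counts direct 
children of the root element rather than doing real schema validation.  
It is meant to show the two kinds of violation being reported 
separately, not how any existing validator behaves.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical limits, for illustration only.
# None means "unbounded" (maxOccurs="unbounded" in the shared data model).
MODEL_MAX_OCCURS = {"item": None}
# The local operational profile tightens the shared model.
PROFILE_MAX_OCCURS = {"item": 3}


@dataclass
class Violation:
    kind: str      # "model" or "profile"
    element: str
    count: int
    limit: int


def check_occurrences(xml_text, model=MODEL_MAX_OCCURS, profile=PROFILE_MAX_OCCURS):
    """Count direct children of the root and report which limit each breaks."""
    root = ET.fromstring(xml_text)
    counts = {}
    for child in root:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    violations = []
    for tag, count in counts.items():
        model_limit = model.get(tag)
        profile_limit = profile.get(tag)
        if model_limit is not None and count > model_limit:
            # The document is invalid against the shared data model.
            violations.append(Violation("model", tag, count, model_limit))
        elif profile_limit is not None and count > profile_limit:
            # Valid against the model, but outside what this site accepts.
            violations.append(Violation("profile", tag, count, profile_limit))
    return violations
```

A caller could then reject "model" violations outright as bad data, and 
handle "profile" violations differently, for example by refusing the 
document with a message that says it exceeds the locally agreed limits.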

Greg

Michael Kay wrote:

>>If we don't put these limits in the schema, they just have to go 
>>somewhere else, somewhere less visible, less maintainable, and with 
>>less tool support. How SHOULD we do this if we aren't using the schema 
>>for validation of these constraints?
>
>If processing is limited by memory size, then let the limit be enforced by
>the memory manager. That way, if you add more memory, the limit goes away.
>The concern at higher levels should only be to trap the errors coming from
>lower levels and produce meaningful diagnostics and helpful recovery
>actions.
>
>I would say it is definitely bad design to try to write a schema that
>imposes its own limits solely in order to prevent an "out of memory" error
>happening.
>
>Michael Kay
>http://www.saxonica.com/

