- To: Michael Kay <email@example.com>
- Subject: Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML Schema
- From: Greg Hunt <firstname.lastname@example.org>
- Date: Mon, 08 Aug 2005 11:44:17 +1000
- Cc: email@example.com
- In-reply-to: <E1E1rst-0005Df-I1@mx.mailix.net>
- References: <E1E1rst-0005Df-I1@mx.mailix.net>
- User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
Why do you think I am talking about out of memory (OOM) errors? I
haven't mentioned them. I am more interested in managing throughput.
My experience with letting the Java heap fill up has in general not been
encouraging; particularly with multi-threaded applications, both
performance and function tend to suffer in unpredictable ways. My
experience with C and C++ OOM is even less encouraging.
The systems that I deal with are big enough for OOM to very rarely
occur, but system performance is more complex than OOM. Properly
engineered commercial systems will have understood capacity limits that
include assumptions about input size distributions and the resulting
service demand: memory, CPU, IO rate, IO volume, residency, concurrency.
I get OOM errors on my aged email desktop machine, not on 16-way AIX boxes.
However, let's take the OOM example and apply it to a system that is in
Imagine that you are running a real-time financial information system
with hard (meaning financial) penalties for transactions timing out
(this is not a hypothetical case). What diagnostic or "helpful
recovery action" do you take for an OOM error if you have either gone
into excessive paging or caused the runtime to go very quiet for a few
tens of seconds? Who pays and how do the diagnostics help with the
money? Do you say to the client "well, someone sent us a document that
was too big so you don't get to penalise us"? They may say "it wasn't
us so we don't care". They may just laugh and walk away. They may say
"we sent you a document this big during testing and it validates against
your schema", and then laugh and walk away.
The approach that you describe works for desktop applications that do
not have hard performance limits or, less well, for processes which do
not have multiple users or multiple threads (so that the OS can insulate
transactions from each other to some degree). Highly variable
performance is entirely acceptable for some classes of user and for some
architectures and is not at all acceptable for others.
The other example of constraints (from EDI, but still applicable to XML)
that I have referred to is a real-world several hundred million dollar a
year interchange between two large companies. That type of constraint
is based on both performance (throughput and service demand) and
business process limits. Those constraints are not going to go away.
The specification of additional constraints to published standards will
keep happening and it would be nice to have some straightforward tool
support for it.
This is a complex question, and one that only bites when you deal with
complex systems where you have to provide guaranteed behaviour. In my
first email on this subject I said that there are two separate concerns
that are overloaded on the one set of attributes: maxOccurs has a data
model theory sense and maxOccurs has an operational sense. We should be
able to define a local operational profile for a schema that constrains
that schema in additional ways, separating the modelling from the
operational senses, and be able to distinguish between local profile
constraint violations and data model violations and respond to them
differently if we choose to. That would make shared models more useful
because it would allow organisations to say "we accept this subset" or
"we accept this value domain" of the basic schema while still sharing a
basic data model, without having to rewrite the XML document for each
consumer to map it into the different sets of constraints.
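A minimal sketch of that separation, in Python, might look like the
following. It assumes hypothetical element names (order, shipTo, line)
and made-up limits that do not come from any real schema, and it stands
in for a real schema validator with a simple occurrence count. The point
is only that data model violations and local profile violations are
reported separately, so the receiver can respond to each differently.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    parent: str                 # element whose children are counted
    child: str                  # child element being limited
    max_occurs: Optional[int]   # None means unbounded


# Hypothetical limits of the shared data model (the schema's maxOccurs).
MODEL_LIMITS = [Limit("order", "shipTo", 1), Limit("order", "line", None)]

# Hypothetical local operational profile: tighter, capacity-driven limits.
PROFILE_LIMITS = [Limit("order", "line", 3)]


def violations(root, limits):
    """Return a message for every parent element that exceeds a limit."""
    found = []
    for limit in limits:
        if limit.max_occurs is None:
            continue  # unbounded: nothing to check
        for parent in root.iter(limit.parent):
            count = len(parent.findall(limit.child))  # direct children only
            if count > limit.max_occurs:
                found.append(
                    f"{limit.parent}/{limit.child}: {count} > {limit.max_occurs}"
                )
    return found


def check(xml_text):
    """Report model and profile violations separately."""
    root = ET.fromstring(xml_text)
    return {
        "model": violations(root, MODEL_LIMITS),
        "profile": violations(root, PROFILE_LIMITS),
    }
```

With these assumed limits, an order carrying four line elements is
valid against the shared model and is rejected only by the local
profile, while an order with two shipTo elements fails the model itself.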
Michael Kay wrote:
>>If we don't put these limits in the schema, they just have to go
>>somewhere else, somewhere less visible, less maintainable,
>>and with less
>>tool support. How SHOULD we do this if we aren't using the schema for
>>validation of these constraints?
>If processing is limited by memory size, then let the limit be enforced by
>the memory manager. That way, if you add more memory, the limit goes away.
>The concern at higher levels should only be to trap the errors coming from
>lower levels and produce meaningful diagnostics and helpful recovery
>I would say it is definitely bad design to try to write a schema that
>imposes its own limits solely in order to prevent an "out of memory" error