
RE: [xml-dev] When to Validate XML?




-----Original Message-----
From: Magick, Brian [mailto:Brian.Magick@COMPAQ.com]
Sent: Wednesday, October 31, 2001 9:25 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] When to Validate XML?


> Anyone interested in sharing their thoughts on under what circumstances you
> should validate XML?
 
Oh goodie, another decision tree that isn't really a tree ...

I guess the first question is "do you care"?  If the text will be displayed
to humans and the stylesheet or whatever other GUI can extract the
information it needs, you may not care all that much whether it exactly
matches the schema or not.

If you do care, do you (as Marcus Carr put it) trust the sender to produce
valid XML?  If so, why validate it twice?   The obvious case where you don't
trust the 'sender' is when a human is authoring a document, and validating
the structure as the document is created (or when it is being saved) is
useful.  DTDs evolved to describe the structure of rigorously defined legal
and technical documents, so DTD validation does a pretty reasonable job of
this.

If you care and don't trust, I guess another question is "what does
'validation' mean?"  It may be that you must validate against business logic
that can't be expressed in DTD or Schema syntax; if so, you have to make an
optimization decision on whether the XML validation step costs more than it
benefits.  My impression from various discussions with "real world" XML
users is that most need to 'validate', but that Schema/DTD validation
doesn't buy them much, so they end up doing the 'validation' in procedural
code.  The obvious example here is a message that refers to an account number;
few care whether the account number is syntactically valid, they care
whether it identifies an active account in their database.
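
To make the account-number case concrete, here is a minimal sketch of that kind of 'validation' in procedural code.  It assumes Python's standard xml.etree.ElementTree parser, which checks well-formedness only, not a DTD or Schema.  The <account> element name and the ACTIVE_ACCOUNTS set (standing in for a database lookup) are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a database lookup of active accounts.
ACTIVE_ACCOUNTS = {"10042", "20077"}


def check_message(xml_text):
    """Return a list of problems; an empty list means the message is acceptable."""
    try:
        # Well-formedness check only; no DTD or Schema is consulted.
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    # Hypothetical element name: the account number is a direct child of the root.
    account = root.findtext("account")
    if account is None:
        return ["missing <account> element"]

    # The check people actually care about: is this an active account?
    account = account.strip()
    if account not in ACTIVE_ACCOUNTS:
        return [f"account {account!r} is not an active account"]
    return []
```

Note that a syntactically perfect account number such as 99999 still fails here, which is exactly the check a DTD cannot express.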

Another consideration is the cost of rejecting a "semantically" valid
document that happens to have some syntax out of place.  My favorite
(hypothetical) example is a $1000000 purchase order with a private attribute
in it somewhere; would you really reject it?  If "reject" means turn it over
to a human, sure ... But if you're using human or programmed business logic
to look the documents over anyway, why bother validating the XML in the first
place? 
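
One way to picture "reject means turn it over to a human" is a triage step rather than a pass/fail gate.  The sketch below is only an illustration under assumed names: the EXPECTED_ATTRS table and the order/item vocabulary are hypothetical, and a real contract would come from the DTD or schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract: the attributes each element is expected to carry.
EXPECTED_ATTRS = {"order": {"id"}, "item": {"sku", "qty"}}


def triage(xml_text):
    """Return (decision, notes), where decision is "accept", "review" or "reject"."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not even well-formed: nothing downstream can use it.
        return "reject", [f"not well-formed: {exc}"]

    notes = []
    for elem in root.iter():
        # Attributes outside the contract are noted, not treated as fatal.
        extra = set(elem.attrib) - EXPECTED_ATTRS.get(elem.tag, set())
        for name in sorted(extra):
            notes.append(f"unexpected attribute {name!r} on <{elem.tag}>")

    # Anything unexpected goes to a human instead of being bounced.
    return ("review" if notes else "accept"), notes
```

With this shape, the big purchase order carrying a private attribute lands in a review queue with a note attached, while a document that cannot be parsed at all is still refused.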

Another impression I've received over the years is that validation (via
DTDs, Schemas, or whatever) tends to be more of an "audit time" activity
than a real-time activity. The DTD or schema represents a "contract" between
the producer and consumer; validation is useful at the debugging stage, but
after things are up and running it's used mainly for spot checks or problem
investigation.  The MSN imbroglio last week exemplified this; few pay any
attention to whether documents are valid unless the issue is raised and
fingers are pointed, and THEN they crank out the validation tools.  Another
favorite analogy is between validators and corporate lawyers -- you have
the lawyers look over the contracts and implementing documents when a
relationship is first started, and have them audit documents randomly or when
there's trouble.  You don't generally pay somebody $500/hr to look over
every purchase order or business letter that goes through your organization.
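
The "spot check" idea can be sketched as random sampling: run the expensive validator on a small fraction of traffic and report only the failures.  This is an illustration, not a recommendation of any particular rate.  The spot_check name, the rate parameter, and the validate callable (any function returning a list of problems) are assumptions made up for the example.

```python
import random


def spot_check(documents, validate, rate=0.05, rng=None):
    """Validate roughly `rate` of the documents.

    Returns a list of (index, problems) pairs for sampled documents that failed.
    """
    rng = rng or random.Random()
    failures = []
    for index, doc in enumerate(documents):
        # rng.random() is in [0.0, 1.0), so rate=1.0 checks everything
        # and rate=0.0 checks nothing.
        if rng.random() >= rate:
            continue  # not sampled this time
        problems = validate(doc)
        if problems:
            failures.append((index, problems))
    return failures
```

Raising the rate to 1.0 when trouble is reported gives the "crank out the validation tools" mode, without changing any other code.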