Re: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?
- From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Wed, 8 Aug 2007 11:10:16 +0200
Well, this thread started up just as I was going on vacation, but:
>
> Up till this date, grammar-based and rule-based languages have been
> kept separate:
>
> Grammar-based Languages: XML Schema, Relax NG, DTD
>
> Rule-based Languages: Schematron, RuleML
James Clark once said something about wanting to make sure that
your markup language addresses only one need (I can't find or remember
the exact wording).
This maxim is, however, problematic, because it needs some sort of
constraint on how we decide whether a language addresses only one need
(requirement, problem domain, whatever you want to call it).
Using the concept of validation as an example, I think we can see what
this constraint should be: simply, that the need is something we can
talk about meaningfully from a computational point of view.
I cannot discuss validation meaningfully from a computational point of
view, and I don't think the many people on this list with greater
capabilities than mine can either.
What can be discussed meaningfully are things like graphs, grammars,
contexts, etc.
So from this premise I would say: don't have a general-purpose
validation language; have general-purpose grammar languages, data
typing languages, and so forth.
> What do you think about XML Schema working group incorporating
> rule-based capabilities into the language?
It's not as powerful as Schematron. So if I am starting a project that
needs a grammar, I will use the grammar capabilities of my grammar
language of choice. When I need context-specific checking, I
will switch to Schematron instead of using the context-checking
capabilities of my grammar language. Why? Because:
1. Schematron is simple.
2. I can't be sure that all of the context checking I need to do over
the course of the project will be met by the capabilities of the
grammar language.
3. I don't needlessly complicate the process of writing either
validation language, and I can be more sure of having it work well
across all platforms if things are kept relatively simple.
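To make the "Schematron is simple" point concrete: a rule is little more than a context, a test, and a message. The sketch below imitates that shape in Python with ElementTree paths; it is not Schematron itself (ElementTree supports only a subset of XPath), and the document, element names, and rules are hypothetical. It assumes structural (grammar) validation has already been done separately.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, List


@dataclass
class Rule:
    # Schematron-like triple: where the rule applies, what must hold, what to report.
    context: str                         # ElementTree path, e.g. ".//line"
    test: Callable[[ET.Element], bool]   # assertion evaluated per context node
    message: str


def check_rules(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Return one message per failed assertion. Structure is assumed to have
    been checked beforehand by whatever grammar language is in use."""
    failures = []
    for rule in rules:
        for node in root.findall(rule.context):
            if not rule.test(node):
                failures.append(f"{node.tag}: {rule.message}")
    return failures


# Hypothetical document and rules, for illustration only.
DOC = """<order currency="DKK">
  <line qty="2" price="10.00"/>
  <line qty="0" price="5.00"/>
  <total>20.00</total>
</order>"""

RULES = [
    Rule(".//line", lambda n: int(n.get("qty", "0")) > 0,
         "quantity must be positive"),
    # Co-occurrence constraint: the total must match the line items.
    Rule(".", lambda n: Decimal(n.findtext("total")) == sum(
            int(ln.get("qty")) * Decimal(ln.get("price"))
            for ln in n.findall("line")),
         "total must equal the sum of the lines"),
]

if __name__ == "__main__":
    for msg in check_rules(ET.fromstring(DOC), RULES):
        print(msg)
```

The grammar language never needs to know about these rules, and the rules never need to restate the structure, which is the separation argued for above.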
> Here are some potential advantages and disadvantages:
>
> ADVANTAGES
>
> 1. Need only one language to express all data validation requirements.
That was my point earlier. My data validation requirements might include that:
1. x > 500
2. the value of x is divisible by the value of y, accessible over HTTP
3. the value of x is in the allowedXvalues column of the big X database
4. the encoding of the document is UTF-8 (really, I have seen this
requirement)
Data validation is an open-ended thing that cannot be described
meaningfully other than to say we know it when we see it; therefore one
should not try to make a domain-specific language to handle what is
evidently a general programming language problem.
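As a rough illustration of why that list looks like a general programming problem: in an ordinary language each of those requirements is just a predicate. In this sketch the value of x is assumed to be already extracted from the document, and `fetch_y` and `allowed_x_values` are hypothetical stand-ins for the HTTP lookup and the database query, injected as callables so the example runs offline.

```python
import re
from typing import Callable, Iterable, List


def declared_encoding(data: bytes) -> str:
    """Encoding named in the XML declaration, or UTF-8 if none is declared.
    Simplification: BOM-based detection (e.g. of UTF-16) is ignored."""
    m = re.match(rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def validate(data: bytes, x: int,
             fetch_y: Callable[[], int],
             allowed_x_values: Callable[[], Iterable[int]]) -> List[str]:
    """Check the four example requirements; return a list of error messages."""
    errors = []
    if not x > 500:
        errors.append("x must be greater than 500")
    y = fetch_y()  # hypothetical: would be an HTTP request in practice
    if y == 0 or x % y != 0:
        errors.append("x must be divisible by y")
    if x not in set(allowed_x_values()):  # hypothetical: a database query
        errors.append("x is not in allowedXvalues")
    if declared_encoding(data) != "utf-8":
        errors.append("document encoding must be UTF-8")
    return errors
```

For example, `validate(b'<?xml version="1.0" encoding="UTF-8"?><doc/>', 600, lambda: 6, lambda: [600, 700])` finds no errors. No single schema language is likely to cover network lookups, database membership, and byte-level encoding checks all at once.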
> 2. Possible performance improvement (as compared to separate languages
> with separate validations).
More likely a performance degradation, if you have requirements for multiple
types of validation. Why?
1. The simpler the model of what something does the easier to
optimize, in my experience.
2. In relation to the above, certain types of validation may be more
amenable to streaming.
3. My experience of real-world (data) validation implies that a
pipeline is the best model for performance optimization, because I am
always running into situations where certain conditions must be met in
the incoming document, and if they are not met the document can be
thrown away. For example, in the Danish UBL routing is done via an EAN
location number; if this number is not present or is not a correct
EAN, then the document cannot be routed and should be returned.
Obviously it makes no sense to validate the whole document if you have
a constraint that means you won't need to. This relates to the streaming
requirement above.
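The pipeline idea can be sketched as a cheap, fatal first stage that streams the document looking for the routing number and short-circuits everything else when it is missing or invalid. Assumptions: the element name `ReceiverEAN` is hypothetical (not the actual Danish UBL path), namespaces are ignored, the number is treated as a 13-digit EAN/GLN checked with the standard GS1 mod-10 check digit, and the later stages are arbitrary callables.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional, Tuple

ROUTING_TAG = "ReceiverEAN"  # hypothetical element name, not the real UBL path


def ean13_is_valid(code: str) -> bool:
    """GS1 mod-10 check: weights 1,3,1,3,... from the left over the first
    12 digits; the 13th digit must bring the total to a multiple of 10."""
    if len(code) != 13 or not code.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def find_routing_number(data: bytes) -> Optional[str]:
    """Stream the document and stop at the first routing element.
    iterparse reads its input incrementally, so returning early avoids
    parsing the remainder of a large document in this stage."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == ROUTING_TAG:
            return (elem.text or "").strip()
    return None


Stage = Callable[[bytes], List[str]]


def run_pipeline(data: bytes, later_stages: Iterable[Stage]) -> Tuple[str, List[str]]:
    # Stage 1: cheap and fatal. An unroutable document is returned at once,
    # without spending time on the remaining (possibly expensive) stages.
    ean = find_routing_number(data)
    if ean is None or not ean13_is_valid(ean):
        return ("return-to-sender", ["missing or invalid EAN location number"])
    # Later stages (grammar, rules, lookups...) only run on routable documents.
    errors = [msg for stage in later_stages for msg in stage(data)]
    return ("rejected" if errors else "accepted", errors)
```

With a single general-purpose validation language, this ordering (and the early exit) would have to be something the one validator happens to support; with a pipeline it is simply the order of the stages.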
>
> 2. Is grammar validation of a fundamentally different nature than rule
> validation?
Perhaps. If what one has is a grammar-based language, then
there are certain things that gives you. These are:
1. it is easy to implement in editors.
2. most grammar-based languages make reading the grammar and
understanding what is allowed relatively easy.
I remember Rick Jelliffe suggesting that it would be easy enough to
implement a Schematron-based editor, but I think it would be pretty
difficult to work out the contexts where one should edit, or to do
autocompletion very well. Sure, finding errors and reporting them is
easy, but what about the more helpful aspects of modern editors?
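One way to see why grammars suit editors: a content model can be compiled into a finite automaton, and the outgoing transitions of the current state are exactly the elements an editor can offer next. A rule-based assertion only says whether a given document passes. In this minimal sketch the content model `(header, line+, note?)` and its element names are hypothetical, and the automaton is written out by hand rather than generated from a real schema.

```python
from typing import Dict, List, Optional

# Hypothetical content model for an <invoice> element: (header, line+, note?)
# compiled by hand into a DFA: state -> {child element name -> next state}.
TRANSITIONS: Dict[int, Dict[str, int]] = {
    0: {"header": 1},
    1: {"line": 2},
    2: {"line": 2, "note": 3},
    3: {},
}
ACCEPTING = {2, 3}


def run(children: List[str]) -> Optional[int]:
    """Feed the child names seen so far; None means the prefix is already invalid."""
    state: Optional[int] = 0
    for name in children:
        state = TRANSITIONS[state].get(name)
        if state is None:
            return None
    return state


def completions(children: List[str]) -> List[str]:
    """Element names an editor could offer at the cursor position."""
    state = run(children)
    return sorted(TRANSITIONS[state]) if state is not None else []


def can_close(children: List[str]) -> bool:
    """Whether the end tag may be inserted at this point."""
    return run(children) in ACCEPTING
```

After `header` and one `line`, `completions` offers `line` and `note`, and `can_close` reports that `</invoice>` is already allowed. Getting the same suggestions out of a set of XPath assertions would mean reasoning backwards from arbitrary tests, which is the difficulty described above.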
>
> 3. If so, is it reasonable to merge two fundamentally different things?
> 4. Is it in the best interest of the marketplace to have a single,
> all-encompassing data validation language, or is it better to have
> multiple data validation languages that work together?
For me, multiple data validation languages, especially as the word
"validation" seems to be a very changeable one.
Cheers,
Bryan Rasmussen