From: "Mike Champion" <mc@xegesis.org>
> 1/13/2002 8:10:02 PM, Paul T <pault12@pacbell.net> wrote:
> >Sure. And it is kinda more convenient to use CSV, because
> >CSV-based world has developed a sophisticated, convenient,
> >universal binding mechanism, called "regular expressions".
>
> Hmmm, that's an interesting way to put it. What would an XML
> universal binding mechanism look like .... a clean
> integration of XPath and DOM (and maybe RELAX-NG)? A RELAX-
> NG data binding tool
> http://www.asahi-net.or.jp/~dp8t-asm/java/tools/Relaxer/ ?
0. I believe that regular expressions, as we know them, are
also not the best possible binding, because I believe that
they should not be greedy. I think they are greedy
for historical reasons and maybe because of the
'match/split' processing pattern (awk). I think that the
universe of non-greedy regular expressions is not yet
explored.
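A minimal sketch of the greedy vs. non-greedy difference, in
Python. The sample line and the variable names are made up for
illustration; only the `re` module from the standard library is
used.

```python
import re

# Hypothetical CSV-like line with quoted fields that contain commas.
line = '"a,b","c","d,e"'

# Greedy: .* runs to the LAST closing quote, so all the fields
# collapse into a single match.
greedy = re.findall(r'"(.*)"', line)

# Non-greedy: .*? stops at the FIRST closing quote, so each quoted
# field is matched on its own.
lazy = re.findall(r'"(.*?)"', line)

print(greedy)
print(lazy)
```

With the greedy pattern the whole line comes back as one 'field';
with the non-greedy one the three quoted fields come back separately,
which is usually what a binding wants.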
1. As to Relaxer: Code generators are always a pain to maintain
and to debug. It has taken a long time to get, for example,
yacc / lex to output 'everything' (not only C) - and when that
happened - both yacc and lex became obsolete (JavaCC).
Just look at JavaCC. It is so convenient ... but it generates
Java only and there is no way I can use JavaCC for, say,
Perl.
I hate to say it, but it could be that the simplest
language-neutral binding mechanism would be:
"take a subset of XML and everything would immediately
become simple, otherwise you may keep trying to find
that black cat in that dark room for ages".
> Both Sun and Microsoft seem to be working hard to make it
> easier to use XML from ordinary programming languages; are
> either/both at least moving in the direction you want to see?
They both are moving in the usual direction: to lock
as many developers as they can into the platforms
that they sell. It would be insane for a big software
company to provide something that could be easily
used outside their core platform. However, that's minor.
The most important thing could be that ... no
good language-neutral binding is even *possible*
for XML v 1.0.
Chunks binding is convenient *only* because it does not
support complex mixed content cases + there is some
'intuitive' whitespace stripping rule. Place the complex
mixed content and complex whitespace processing back
into the model - and it could be that the *only*
possible model for XML *is* DOM. Which is hell to
process without XPath, and even with XPath it is still
hard.
It has taken a very long time for SML-dev to figure out
what could possibly be simplified in the XML model, and
to me the answer is "... we still don't know, but maybe it
is possible ;-)"
It is interesting to look at YAML (even though I don't like it ;-)
YAML is a 'markup' language that has been designed
*the other way around*.
They *first* looked at language-neutral data structures and
*then* they designed a 'markup'. Of course YAML allows
an easy binding!
If XML had been designed
'to serialize Fortran arrays', that would make XML
binding obvious and straightforward.
However, I think that there could be some problems
when asking IBM lawyers to map their documents
into Fortran arrays ;-)
<aside>
Also, I think that YAML is too complex,
because I think they thought that if they published only
the really simple part of it, nobody would take it seriously ;-)
</aside>
<rant>
I hate to say it, but I think that all that markup stuff
is actually about placing '\' and ',' symbols on steroids in
one way or another. Why can't people agree that
any 'markup' language is :
0. Everything is (unicode) text.
1. Text can have 'groups', separated by 'separators'
(the fewer, the better, but hard to tell in advance ;-)
2. There should be some way to escape separators
( \ works just fine, from my point of view ;-)
Isn't it all we need to know about the 'markup language' ;-)?
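The three rules above can be sketched in a few lines of Python.
This is only an illustration under my own assumptions: the function
name `split_groups`, the choice of ',' as the single separator and
the sample string are hypothetical, not anything from an existing
tool or spec.

```python
def split_groups(text, sep=",", esc="\\"):
    """Split text into groups on sep; esc makes the next char literal."""
    groups, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == esc:
            # Rule 2: the character after the escape is taken literally.
            # A trailing escape with nothing after it is kept as-is.
            current.append(next(chars, esc))
        elif ch == sep:
            # Rule 1: an unescaped separator closes the current group.
            groups.append("".join(current))
            current = []
        else:
            # Rule 0: everything else is just text.
            current.append(ch)
    groups.append("".join(current))
    return groups


sample = r"a\,b,c\\d,e"
result = split_groups(sample)
print(result)
```

Here the escaped comma stays inside the first group and the doubled
backslash becomes one literal backslash in the second, so the sample
splits into three groups.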
Why restrict ourselves to 'one markup language that fits all'?
It is like the early days, before YACC and similar tools, when people
were trying to invent 'the best possible programming language,
good for everything'. I think it is now obvious that such a
programming language does not exist. Why there should be
a 'markup language that is good for everything' is a question
I have no answer to. ;-)
</rant>
> >Regular expressions are not blessed by W3C, sure.
>
> But they are at the heart of RELAX-NG ... I hope we can
> distinguish "XML" from "the set of all specs that the W3C has
> put out dealing with XML" or "the picture of XML promulgated
> by the most visionary W3C working groups."
I think it's even worse than that. There is also:
"Maybe XML is not a convenient thing to process,
but because the XML text looks nice, let us migrate
all the binary formats into some 'industry-blessed'
vocabularies and then 'industry-blessed' APIs
would deal with the mess. Still that would be
better than a binary / CORBA-driven solution".
That would also be the right thing to do!
Just some better unification of industry-specific
legacy data formats would be better than to
keep building on formats designed 30+
years ago by hardcore COBOL coders, who
cared about saving bytes, because storage
was important.
Rgds.Paul.