xml-dev - RE: [xml-dev] RELAX NG Marketing (was RE: [xml-dev] Do Names Matter?)

To: 'Matthew Gertner' <matthew.gertner@schemantix.com>, 'James Clark' <jjc@jclark.com>
Subject: RE: [xml-dev] RELAX NG Marketing (was RE: [xml-dev] Do Names Matter?)
From: Nicolas LEHUEN <nicolas.lehuen@ubicco.com>
Date: Wed, 27 Mar 2002 12:29:33 +0100
Cc: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>

If you switch off all validation inside Xerces-J and feed the SAX2 events to
Jing or MSV, then yes, you can validate an instance against a RELAX NG
schema. But in this case, Xerces-J is just a parser, and nothing else.
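To make the layering concrete, here is a minimal sketch using only the JDK's
JAXP and SAX2 classes. The RequiredRootChecker class is a hypothetical
stand-in for a real validator such as Jing or MSV (it is not their API, and
it checks far less than a RELAX NG schema would). The point is only that the
parser has validation switched off and the checking happens in an ordinary
SAX2 ContentHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a real validator: only checks the root element name. */
final class RequiredRootChecker extends DefaultHandler {
    private final String expectedRoot;
    private final List<String> errors = new ArrayList<>();
    private int depth = 0;

    RequiredRootChecker(String expectedRoot) {
        this.expectedRoot = expectedRoot;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // depth == 0 means this is the document element
        if (depth == 0 && !expectedRoot.equals(localName)) {
            errors.add("expected root <" + expectedRoot + "> but found <" + localName + ">");
        }
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    List<String> errors() {
        return errors;
    }
}

final class LayeredValidation {
    /** Parses without parser-level validation and lets the handler do the checking. */
    static List<String> check(String xml, String expectedRoot) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // the parser is "just a parser" here
        XMLReader reader = factory.newSAXParser().getXMLReader();

        RequiredRootChecker checker = new RequiredRootChecker(expectedRoot);
        reader.setContentHandler(checker); // the validator only sees SAX2 events
        reader.parse(new InputSource(new StringReader(xml)));
        return checker.errors();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<doc><p/></doc>", "doc"));
        System.out.println(check("<other/>", "doc"));
    }
}
```

With a real validator, the handler passed to setContentHandler would come
from that validator's library instead of the toy class above.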

If, for some reason, you want the parser itself to "do" the validation,
then you'll have to cope with monolithic parser architectures. You'll end up
with multi-megabyte parser libraries (which is nonsense; I've seen pure
SAX2 parsers fit in a JAR smaller than 100 KB), and your application will
depend strongly on that particular parser. That said, you could still
implement RELAX NG support within Xerces-J by using Xerces's XNI API.

Like I wrote before, I don't think that trying to stuff as many features as
possible under the hood of the parser is a wise thing. I think we should
have a lean and mean parser API (SAX2) and lean and mean parsers (less than
100 KB of JAR). Then, separate from the parser, you would have a lean and
mean XML tree API (DOM2 for compatibility, or dom4j for functionality), a
lean and mean XPath API (Jaxen), a lean and mean validation API (Sun MSV), a
lean and mean pipeline-building API, and so on and so forth.

The biggest difficulty with such a modular architecture is defining and
refining the APIs. XML has an advantage here: there is a common background, a
common data model, which is the XML data tree, or the infoset if you prefer.
Trees can go from one module to another either in serial form (SAX2 events)
or tree form (DOM2 nodes). This should ease the design of APIs.
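As a small illustration of the two forms, the sketch below reads the same
document once as SAX2 events straight from the parser, and once as a DOM tree
that is then replayed as SAX2 events through the JDK's identity transformer
(DOMSource to SAXResult). The ElementRecorder and SerialAndTreeForms names
are made up for this example. A module written against ContentHandler does
not need to know which form its input originally had.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical consumer module: records element names in document order. */
final class ElementRecorder extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(localName);
    }
}

final class SerialAndTreeForms {
    /** Serial form: SAX2 events come directly from the parser. */
    static List<String> viaSax(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        ElementRecorder recorder = new ElementRecorder();
        reader.setContentHandler(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.names;
    }

    /** Tree form: build a DOM first, then replay it as SAX2 events. */
    static List<String> viaDom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        ElementRecorder recorder = new ElementRecorder();
        // A Transformer created without a stylesheet performs an identity copy.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(recorder));
        return recorder.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c/></a>";
        System.out.println(viaSax(xml));
        System.out.println(viaDom(xml));
    }
}
```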

Once the API set is stable enough, each implementor of each technology can
tackle its own implementation and optimization problems, without the risk of
introducing bugs in other parts, and without stepping on each other's toes.

Look at all the overlapping projects in the open (or closed) source XML
world. It's not bad to have many projects competing on parsing or
validating XML documents. It's a good thing to see parser implementors
compete with one another, and validator implementors do the same.

What I find stupid is parser implementors trying to compete with
validator implementors. The result is that the implementors' attention
keeps skipping from one subject to another (quite different) one, and
their project suffers for it, because sometimes all you need is the
validating part and not the parsing part, or vice versa.

I'm sorry, but I don't use Xerces-J, because it is bloated as a parser (I
don't care about HTML or WML DOMs, even if I produce both types of content),
and limited as a validator (limited to DTDs and XML Schema). There is
another nasty effect that you pointed out, Matthew: people expect the
parser to perform the validation, and as Xerces only performs XML Schema
validation, a lot of people think XML Schema is the only schema
language for which an implementation exists. Not good for RELAX NG...

Like I wrote before, I'd rather have a lean and mean parser on one side,
built by people fully dedicated to the parsing problems, and a lean and mean
validator on the other side, built by people fully dedicated to the
validation problems. And I don't want any dependency between the two parts,
except for a standard API. That's why today I would use a small SAX2 parser
and Sun MSV.

I suspect there are a lot of people who would be interested in such an
approach, too. If I wanted to build a project for, say, XML data storage,
then my specialty would be XML data storage. I wouldn't like to have to cope
with parsing and validation problems, so I would appreciate being able to
reuse parsers and validators that exist out there.

Plus, for many reasons, I wouldn't like to force my users to use a given
parser or validator, nor would I like to have to bundle it with my library.
Reasons include technical issues, like the ability to switch to a validator
that can handle any given schema language that my users may want me to
support, but also licensing issues.
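Here is a rough sketch of what that independence could look like for a
storage library, again with JDK classes only. All class names are invented
for the example: ElementCountingStore plays the role of the storage code and
AllowedElementsFilter plays the role of whatever validator the user chooses
to plug in. The storage code depends only on the SAX2 XMLReader interface,
so the validating stage can be added, removed or replaced without touching
it.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical validator stage: rejects elements outside an allowed set. */
final class AllowedElementsFilter extends XMLFilterImpl {
    private final Set<String> allowed;

    AllowedElementsFilter(XMLReader parent, Set<String> allowed) {
        super(parent);
        this.allowed = allowed;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!allowed.contains(localName)) {
            throw new SAXException("element not allowed: " + localName);
        }
        super.startElement(uri, localName, qName, atts); // pass the event downstream
    }
}

/** Hypothetical storage module: only knows the XMLReader interface. */
final class ElementCountingStore extends DefaultHandler {
    private int stored;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stored++; // a real store would persist the element here
    }

    int store(XMLReader source, String xml) throws Exception {
        stored = 0;
        source.setContentHandler(this);
        source.parse(new InputSource(new StringReader(xml)));
        return stored;
    }
}

final class PluggablePipeline {
    static XMLReader newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Parser feeds the store directly. */
    static int storeUnchecked(String xml) throws Exception {
        return new ElementCountingStore().store(newParser(), xml);
    }

    /** Same store, with a validating stage inserted between parser and store. */
    static int storeChecked(String xml, Set<String> allowed) throws Exception {
        XMLReader checked = new AllowedElementsFilter(newParser(), allowed);
        return new ElementCountingStore().store(checked, xml);
    }
}
```

Which parser the JAXP factory returns is decided at deployment time, so
neither the parser nor the validator has to be bundled with the library.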

I could develop parsing and validation myself, to be sure that I wouldn't
depend on any other project, but I would soon fail, because my expertise is
in XML data storage, not parsing or validation, so I would not be able to
keep up with the evolution in validation and provide my users with
support for the latest schema language. I'd rather spend time and money on
storage issues than on anything else.

That's the reason why I don't care about XML Schema support in Xerces-J. I
just want it to provide an efficient SAX2 parser and an efficient
implementation of various DOM specifications. The people building Xerces are
very clever people; if they want to implement schema support, that's really
great, but please, not under the parser hood. Please, implement it in
another project. And let the competition with Sun MSV begin.

Regards,
Nicolas Lehuen

>-----Original Message-----
>From: Matthew Gertner [mailto:matthew.gertner@schemantix.com]
>Sent: Wednesday, 27 March 2002 11:09
>To: 'James Clark'
>Cc: xml-dev@lists.xml.org
>Subject: RE: [xml-dev] RELAX NG Marketing (was RE: [xml-dev] Do
>Names Matter?)
>
>
>> RELAX NG does not require anything beyond what is provided by SAX and
>> DOM, and can be cleanly layered on top of the parser.  Xerces (or rather
>> Xerces-J, which I assume is what you mean) supports SAX 2, and so works
>> just fine with both Jing and MSV. There would be no advantage in having
>> Xerces-specific RELAX NG support.
>
>Does this mean that Xerces-J (yes, I meant the Java version) can validate
>an instance against a RELAX NG schema? I can't understand how this would be
>possible with just support for SAX and DOM; wouldn't the parser have to
>understand RELAX NG semantics?
>
>Matt
