   RE: Subsetting/ Canonical Parsers/ XML Compliance/ etc.

  • From: Michael Brennan <Michael_Brennan@Allegis.com>
  • To: xml-dev <xml-dev@lists.xml.org>
  • Date: Fri, 10 Nov 2000 15:49:06 -0800

As a developer, I've often been burned by wasting time investigating a tool that supposedly supports some particular standard or protocol, only to find after investing time and effort that the tool does not live up to expectations. I take this very seriously. I'm too busy to waste time sifting through the claims of dishonest vendors to determine if the tool really meets my needs. There are vendors that I routinely ignore; I don't bother to investigate their tools because I've been burned by them in the past. I wish more developers would do the same. If developers would boycott vendors that misrepresent their products, then we would have much more honest vendors and better interoperability in the world.
Does it make sense to write specialized parsers that only deal with a specific DTD/schema? Certainly it does. If I have a need to deal with SVG in a program, I am going to try to find an SVG parser before I search for an XML parser, because an SVG parser will probably give me much greater value. Likewise, if I need to parse RDF, I probably will try to identify an RDF parser before resorting to a generic XML parser. But if you are going to write an SVG parser, then call it an "SVG parser", not an XML parser! If you write an RDF parser, then call it an RDF parser, not an XML parser! For that matter, if you write a MinML parser, then call it a MinML parser and not an XML parser. If you do that, then you've got no argument from me. When you call it an XML parser, though, then we have an argument, and I will boycott your products.
If I obtain an "XML parser" from someone, only to find that it only supports a subset of XML or a specific document type, I will feel deceived and I will boycott that vendor's or developer's products. If I need something that only supports a specific document type or a specific subset of XML, then I will seek something that only supports that document type or subset. If I obtain an "XML parser", it is because I am looking for an XML parser, and it had better support XML! Otherwise, my time has been wasted because the developer or vendor has misrepresented their product, and I resent that.
I will also add that there is typically great value in implementing specialized parsers as a layer on top of more generalized parsers. The value is in not reinventing the wheel and in making sure that the subset of XML relevant to that specialized parser is implemented properly. I've been working with SOAP quite a bit lately. When I first started working with it and surveyed the toolkits available at the time, I found the existing toolkits to be in a very sorry state. Not only that, but implementors posting on the SOAP discussion list often raised "issues" and proposed "solutions" for those issues, blissfully ignorant of the fact that these issues had already been solved by available XML technologies. One that kept surfacing, for instance, was a series of proposals for how to encode Unicode characters in a SOAP message (which became a particularly important issue since the "XML" implementations in the SOAP libraries did not deal with character encoding properly). I think I got a bit irate in some of my postings in response to this, but I was frustrated with people who had swept aside mature, proven XML technologies and decided to implement the "relevant subset" themselves -- and they kept doing it wrong! I found it more expedient to build my own SOAP implementation atop generic XML technologies than to waste time with more specialized SOAP libraries that had major deficiencies and interoperability issues because the implementors failed to leverage proven, mature, general XML technologies that were readily available. I think there are some lessons for folks to learn there.
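To make the layering point concrete, here is a minimal sketch (not the implementation described above) of a specialized SOAP reader written as a thin layer over a general-purpose XML parser. It assumes Python's standard xml.etree.ElementTree module and the SOAP 1.1 envelope namespace; the function name parse_soap_body, the urn:example:greet namespace, and the sample message are hypothetical and exist only for illustration. The specialized layer contains no character-handling code at all: the encoding declaration and the numeric character reference are resolved by the generic parser underneath.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def parse_soap_body(document: bytes) -> dict:
    """Hypothetical helper: return {local_name: text} for the parameters
    of the first element inside the SOAP Body."""
    # The generic XML parser does the real work here: it honours the
    # encoding declaration and expands character references.
    root = ET.fromstring(document)
    if root.tag != f"{{{SOAP_ENV_NS}}}Envelope":
        raise ValueError("not a SOAP envelope")
    body = root.find(f"{{{SOAP_ENV_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body is missing or empty")
    call = body[0]  # first child of Body, e.g. the method element
    # Strip any "{namespace}" prefix so the keys are local names only.
    return {child.tag.split("}")[-1]: (child.text or "") for child in call}


# Illustrative message: ISO-8859-1 bytes plus a numeric character
# reference for a character that ISO-8859-1 cannot represent directly.
message = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<e:Envelope xmlns:e="http://schemas.xmlsoap.org/soap/envelope/">'
    '<e:Body><m:Greet xmlns:m="urn:example:greet">'
    "<name>Zo\u00eb &#x263A;</name>"
    "</m:Greet></e:Body></e:Envelope>"
).encode("iso-8859-1")

params = parse_soap_body(message)
```

The SOAP-specific code only knows about the Envelope and Body elements. Anything that belongs to XML itself (encodings, character references, namespaces, well-formedness errors) stays in the layer that already gets it right.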
-----Original Message-----
From: Seairth Jacobs [mailto:seairth@bbglobex.com]
Sent: Friday, November 10, 2000 8:21 AM
To: xml-dev
Subject: Subsetting/ Canonical Parsers/ XML Compliance/ etc.

Okay, so here are the two basic camps:
1) XML is a standard that should be conformed to 100% in parsers.  If it isn't going to be, then it should not be called an "XML Parser".  Certainly, this is an ideal goal if you do not know how the parser will be used.  As a result, it needs to stay generic and expect to handle all combinations of needs.
2) XML is a recommendation that should be implemented in parsers as is appropriate to the situation.  It is still an XML parser of sorts, however.
The second camp is the way I choose to see XML.  For any particular implementation of XML, I define a DTD or Schema that fits my needs.  In my case, I deal exclusively with e-commerce markup and I use only a specific subset of the XML specification.  So I have two choices:  use a parser specific to my needs or use a general-purpose parser that will work for anyone.  While the latter will work, it is overkill (just like using an SGML parser would be overkill when processing XML).  My needs are fixed.  I will never receive XML that doesn't conform to the subset I use.  As a result, a parser that handles only that subset makes more sense.  Is it an actual XML parser?  Yes, since it does process certain XML.  No, since it doesn't process all XML.  But remember, I am only ever using certain XML, so from my point-of-view, the answer is "yes".
Everyone can complain until they are blue in the face about which is better.  In the end, it's a moot point.  If 1 million people use a special-purpose XML parser for a specific purpose, then that is absolutely fine.  It doesn't matter what the rest of the universe is doing with XML because it's not within their problem domain.  If there is cross-over of domains, then those people will use a different parser that fits their needs.
In the end, I would expect any development process to go something like this:
1) Define XML DTD or Schema.
2) Choose parser that will handle the application of this XML.
3) Implement XML using chosen parser.
If it means that someone uses more than one parser to get different jobs done, that's fine.  If it means that someone uses the "fully standard" parser to get different jobs done, that's fine as well.  In the end, deciding what subset of parser should be used for development is every bit as important as deciding what subset of XML should be used for DTD or Schema definition.
Seairth Jacobs



Copyright 2001 XML.org. This site is hosted by OASIS