Re: typing (was RE: Personal reply)

Two points:

1) By separating DTD/schema-supplied pieces out of the Infoset you break all the instance documents that depended on those items being *included*.  Why would anyone use those DTD/schema features if they didn't intend this behavior?

2) One doesn't include arbitrary DTDs or schemas into production pipelines.  You have to evaluate each such document individually for compatibility with local processing requirements.  Anything else invites disaster.  Thus, for the time being, the pipe dream of dynamic integration of discoverable services is just that.

What I think is missing from the Infoset, and what could make things work properly, is a wee bit o' metadata that tells any point in the pipeline what has come before: External Entities Resolved=yes|no, CDATA Section Markers Removed=yes|no, etc.
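
To make the idea concrete, here is a rough sketch in Python of what such a record might look like as it travels with a document through a pipeline. The class name, field names, and the single stage shown are all hypothetical; nothing like this is defined by the Infoset, and a real design would need many more flags.

```python
import re
from dataclasses import dataclass, replace
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class ProcessingHistory:
    """Hypothetical record of what earlier pipeline stages have done."""
    external_entities_resolved: bool = False
    cdata_section_markers_removed: bool = False

    def as_header(self) -> str:
        # Render the flags in the yes|no form used in the message above.
        def yn(flag: bool) -> str:
            return "yes" if flag else "no"
        return (
            f"External Entities Resolved={yn(self.external_entities_resolved)}; "
            f"CDATA Section Markers Removed={yn(self.cdata_section_markers_removed)}"
        )


# Matches one CDATA section and captures its literal content.
_CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def strip_cdata_markers(text: str, history: ProcessingHistory):
    """Illustrative stage: replace CDATA sections with escaped text.

    The stage consults the history first, so it does no work when an
    earlier stage has already removed the markers.
    """
    if history.cdata_section_markers_removed:
        return text, history
    cleaned = _CDATA.sub(lambda m: escape(m.group(1)), text)
    return cleaned, replace(history, cdata_section_markers_removed=True)


if __name__ == "__main__":
    doc = "<a><![CDATA[x < y]]></a>"
    out, hist = strip_cdata_markers(doc, ProcessingHistory())
    print(out)
    print(hist.as_header())
```

The point is only that each stage can read the record, decide whether its own work is still needed, and pass an updated record downstream.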

The number of optional features is a bit larger than we all would like, but it is manageable.  In the end, it will indeed boil down to, "if you don't like that feature, don't use it."  This is how it is currently done - with success - for XML and a great many other software systems.  How else can we all get along and interoperate?  I think anything else becomes unreasonable.  You push hard for unity until continued pushing will break the thing you are trying to keep together.

take it easy,
Charles Reitzel

P.S. Use NOTATIONs for declaring JPGs as unparsed entities.
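
For anyone who has not done this, a small sketch using Python's standard expat binding shows the declarations involved and how a parser reports them. The document, the entity name "logo", the file name "logo.jpg", and the choice of "image/jpeg" as the notation's system identifier are all made up for illustration.

```python
import xml.parsers.expat

# Hypothetical instance: a NOTATION names the format, and an unparsed
# entity (NDATA) ties an external JPG file to that notation.
DOC = """<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (photo*)>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo src ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY logo SYSTEM "logo.jpg" NDATA jpeg>
]>
<gallery><photo src="logo"/></gallery>
"""


def collect_declarations(xml_text):
    """Return (notations, unparsed_entities) declared in the document."""
    notations = {}
    unparsed = {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed[name] = (system_id, notation_name)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(xml_text, True)
    return notations, unparsed


if __name__ == "__main__":
    notations, unparsed = collect_declarations(DOC)
    print(notations)
    print(unparsed)
```

The parser never fetches or interprets logo.jpg; it only reports the entity and its notation, leaving the application to decide what to do with the image.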

At 11:11 AM 3/13/01 +0000, you wrote:
>At 03:23 PM 3/12/01 -0500, Simon St.Laurent wrote:
>>Telling me "don't use the feature if you don't like it" isn't a reasonable
>>answer to the kinds of problems we're addressing here.
>This is one of the most important questions we have to address.
>We are all conscientious software developers, we like to build
>things that work reliably. To do that, we have to be very careful with
>optional features in XML and related technologies. It is oh, so easy
>to say that software is "100 per cent XML compliant" but
>fiendishly difficult to live up to that promise in anything but
>marketing bumph.
>Pipeline processing is a good example of a technique where
>optional features of XML bite and bite hard. The work you need to
>do to do the right thing in the presence of validating XML 1.0
>parsers is orders of magnitude larger than if you just work
>with WF XML.  I am not talking about the parsing act
>itself - I am talking about the infoset that is yielded
>which needs to be nurtured through the processing.
>I may want to use DTDs (I often do!) but specifying a DTD
>opens a wasps' nest in the infoset. All of a sudden - just
>to get content model validation in my pipeline, I need to worry about
>general entities, internal document type declaration subsets,
>include marked sections, entity resolution, public/system
>identifiers, defaulted attribute values etc. etc. I need to
>worry about these because I may well need to reflect
>their presence in the XML my processing produces.
>Compounding this is the fact that as my pipeline progresses,
>I am typically morphing document structures from one
>form to the next. Most of the time, as a pipeline is in
>progress, there is no content model in the XML 1.0 sense
>of the word. I can certainly use intermediate content model
>validation to great benefit but XML 1.0 actually gets in the
>way of doing so.
>Here's why. With SGML, I could keep all the content model stuff in
>separate entities from the document instances. I can
>feed them as separate things to an SGML parser.
>With XML today, the situation is at once better and a lot
>worse. I love the stuff that is going on with the
>alternative schema languages/validators and plan
>to use 'em all to varying degrees in pipeline processing.
>Unfortunately, DTD schemata have a privileged place *in*
>the document instances. This creates no end of
>round-tripping-the-important-stuff problems. A solution
>to this is to be found in the abstract infoset paradigm.
>This is the road the grove men took in the SGML days.
>A simpler solution is also to be found, I believe, by looking
>at the problem differently.
>What if schema stuff including DTDs is *always* outside
>the instance?
>Does this simplify the infoset issues? yes
>Does this allow a variety of schema approaches to be used on
>a mix and match basis during pipeline processing? - yes
>Does it allow the same instance to be viewed through
>the eyes of both local and global semantics via different
>schemata? yes
>Does it appeal to simpletons? yes
>The optionality of DTD validation, coupled with its explicit binding
>to document instances, coupled with its explosive effect
>on the complexity of infosets, is the nub of the problem
>in my opinion.
>I know analogies with SGML/HTML have been flogged to
>death but... HTML is an SGML application  - or so
>some would have it. Ever tried using general entities in
>HTML? How about DTD subsets? CDATA sections?
>Ever tried to declare your JPGs as unparsed entities?
>HTML parsers don't do any of that stuff. There is no
>point in putting them in your HTML even though
>SGML says you can. As a consequence of simply
>ignoring all these "optional" features, HTML parsers
>yield a simple infoset. Yes, I know that the absence
>of start and end-tags makes the DAG variable from
>one parser to another but the core infoset is
>We want the DAG to be formally defined - not only
>for HTML but for all pointy bracket tag languages.
>WF XML takes care of that. But the continuing
>optionality of all the embedded DTD stuff makes
>the infoset surrounding the DAG complicated.
>A lot more complicated than I, for one, feel happy
>with. XML in its mission statement set out to
>have as few optional features as possible - preferably
>zero. The mother of all optional features has
>unfortunately crept under the radar.
>What would it take (I am addressing this question to
>those with an intimate knowledge of the XML 1.0 spec.)
>to allow validating XML 1.0 parsers to be handed
>two URIs: one for the DTD and one for the instance?
>This, I believe, would be a great first step towards
>separating the expression of data and model.
>It would also make DTD level validation a peer
>of other validation/mapping/transclusion
>technologies rather than an eminence.
>Sean (deprecate DOCTYPE) McGrath.
>The xml-dev list is sponsored by XML.org, an initiative of OASIS
>The list archives are at http://lists.xml.org/archives/xml-dev/
>To unsubscribe from this elist send a message with the single word
>"unsubscribe" in the body to: xml-dev-request@lists.xml.org 
