   Re: Parser Behaviour (serious)

  • From: Peter Murray-Rust <peter@ursus.demon.co.uk>
  • To: <xml-dev@xml.org>
  • Date: Thu, 06 Apr 2000 15:52:51 +0100

At 11:51 PM 4/5/00 -0400, Frank Boumphrey wrote:
>I think that this approach as advocated by Peter is just what we need.
>It would be nice if one of the 'standards body' could come up with a
>recommendation,  It need not be heavy-weight, just official. IMO, however,
>this body will not be W3C as they are too wedded to Schemas, and in my
>(personal) opinion will not do any thing to extend the life or usefulness of
>I wonder if this is something that OASIS might take on. Alternatively an
>ad-hoc WG could be put together from this list.
>----- Original Message -----
>From: THOMAS PASSIN <tpassin@idsonline.com>
>> I completely agree with Peter M-R on this.  If you need an important
>> such as entity resolution, your document should be able to say so.  This
>> NOT the same issue as being able to turn a feature on and off, something
>> which has not been settled in this list either.
>> Peter's suggested approach here is elegant and simple.  As for Tim B's
>> statement about there being too many opinions about too many combinations,
>> well, David Megginson has written that SAX only handles 80% of the xml-ish
>> things you might like to do.  Let's strive towards agreement on 80% of the
>> key possibilities here, and we will accomplish something really
>> Of course, even if we get a system such as Peter has asked for, older
>> parsers will ignore any such directives.  So Peter's pharmaceutical
>> could still get ignored by any given processor.  But still, at least you
>> could tell users that they could use a range of processors instead of just
>> one.  That would be good.
>> Regards,
>> Tom Passin

To summarise so far...

	It seems implicitly agreed that we have a problem here - it is not a new
one, but it hasn't gone away either! [No-one has posted to say that there
is a standard way of achieving i14y using XML1.0 specs and tools]. For some
people the problem is not seen as high priority (perhaps because they have
a custom set of application software that can be tuned to the types of
document the author produces; or because document fidelity is not
critical). For others it is. For this subgroup we need agreement on how
parsers should treat our documents.

	Some posters have suggested that we are attempting to develop a
"standard". I don't think so - it may simply be that a common, agreed way
of treating documents is all that is required.

	We should attempt to do everything using standard XML1.0 (i.e. no subsets
- SML, etc.). We must assume that authors wish to use every feature in
XML1.0 and not voluntarily restrict the constructs. In particular we have
identified external entities, PUBIDs and SYSIDs and their implications
(default attribute values, entity expansion, severity of parser errors).

	It seems possible that SAX2 may have some useful tools, but we need to
agree what we want to do anyway. [It seems unlikely that XML Schemas will
solve the problem - it may be that if we *don't* solve the problem it will
remain as a serious one for schema implementation].

	It seems that we can approach this by:

		- cataloguing the possible behaviours allowed under XML1.0. 
		- identifying ambiguities in interpretation
		- describing these aspects with clear labels
		- persuading the major parser implementers to identify how their parser
currently behaves
		- asking them whether they can provide clear and simple switches between
these behaviours

*If* a sufficiency of parser writers cooperate in this (or even if third
parties accurately describe parser behaviours) it may be that we reach a
consensus approach without strife. If, for example, I knew that parser A
had 5 behaviours, B had 3 and that behaviour Z was common to them (and to
many other parsers) I could reasonably construct my documents to be
processed under behaviour Z. The community would still be free to use other
parser behaviours, but I could inform my readers what behaviour I required
in the knowledge that they had reasonable means of implementing it.
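
A minimal sketch of this reasoning in Python, assuming a hypothetical
catalogue that maps each parser to the set of behaviour labels it supports.
The parser names and labels here are illustrative only and do not describe
any real parser:

```python
# Hypothetical catalogue: parser name -> behaviour labels it supports.
# Names and labels are invented for illustration.
catalogue = {
    "A": {"P", "Q", "R", "S", "Z"},  # parser A offers 5 behaviours
    "B": {"T", "U", "Z"},            # parser B offers 3 behaviours
}


def common_behaviours(parsers):
    """Return the behaviours supported by every parser in the catalogue."""
    if not parsers:
        return set()
    return set.intersection(*parsers.values())


# An author could target any behaviour in this set and tell readers
# which one the document requires.
print(common_behaviours(catalogue))  # prints {'Z'}
```

With more parsers in the catalogue the intersection may be empty, which
would itself be a useful finding: it would show that no single behaviour
can yet be relied on.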

In the absence of this we shall fragment. It is possible that
MajorGPLParser might become so prevalent (in the way that sgmls did) that
all OpenSource efforts were normalised by the use of this and a certain de
facto i14y would develop. And that MajorCommercialParser might do something
different. The great success of SAX is that everyone has adopted it without
a major management effort - it was self-evidently a useful thing to do.

This should be easier than SAX. It needs a leader. The leader collates
input from the list (as DavidM did), summarises and waits for responses to
the summary. It needs constant drive, patience and tact, especially in the
face of apathy - most list readers are "lurkers" - probably 90%+. But I
think we have the signals that it's worth doing. It should be a mainstream
activity on XML-DEV - not a separate group. 

I do not want to prejudice the outcome but it could be quite simple. We
might conclude there were 5 separate ways a parser could behave (say A1 A2
B1 B2 C1) and document each of these carefully. Parser author M might label
their parser as having A1 A2 and C1. They might indicate that the receiver
could choose between A1 and A2 but everything else was determined by the
document. The group might suggest that the author use a PI to indicate the
category the document was intended to be in. The parser author would
hopefully honour this PI and respond to the author's request in an
appropriate manner.
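
As a rough illustration of the PI idea, here is a sketch using Python's
standard xml.sax module. The PI target "parser-behaviour" and the labels
A1, A2, C1 are assumptions made up for this example; no such PI has been
agreed. The sketch only reads the author's request and checks it against
what a given (hypothetical) parser claims to support:

```python
import xml.sax

# Behaviours that hypothetical parser M claims to support.
SUPPORTED = {"A1", "A2", "C1"}

# Assumed PI target; purely illustrative, not an agreed convention.
PI_TARGET = "parser-behaviour"


class BehaviourPIHandler(xml.sax.ContentHandler):
    """Records the first behaviour label requested via the assumed PI."""

    def __init__(self):
        super().__init__()
        self.requested = None

    def processingInstruction(self, target, data):
        if target == PI_TARGET and self.requested is None:
            self.requested = data.strip()


def requested_behaviour(document):
    """Return the behaviour label the document asks for, or None."""
    handler = BehaviourPIHandler()
    xml.sax.parseString(document, handler)
    return handler.requested


def can_honour(document, supported=SUPPORTED):
    """True if the document makes no request or requests a supported label."""
    requested = requested_behaviour(document)
    return requested is None or requested in supported


doc = b"<?parser-behaviour A2?><doc/>"
print(requested_behaviour(doc), can_honour(doc))
```

A processor that cannot honour the request could then warn the reader
rather than silently processing the document under a different behaviour.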

Or, of course, we might have to build it into SAX2 before it's too late...

But it needs to be done. Because otherwise most users won't know the
problem exists and it will be buried very deep in many applications.


This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/

