OASIS Mailing List Archives

   RE: Parser Behaviour (serious)

  • From: Peter Murray-Rust <peter@ursus.demon.co.uk>
  • To: <xml-dev@xml.org>
  • Date: Wed, 05 Apr 2000 08:31:31 +0100

At 09:16 AM 4/3/00 -0700, Tim Bray wrote:
>At 08:48 AM 4/3/00 +0100, Peter Murray-Rust wrote:
>>So my summary is:
>>	- The experts (on this list) cannot agree precisely what a parser should
>>and shouldn't do with various combinations. These include:
>It is not clear that rules could ever be written.

If this is true it spells the death of interoperability for information
components such as MathML, SVG and CML (chemical markup language).

What Tim is saying is that XML was designed to allow flexibility in the
"application" - the software that processes the XML document. The receiver
may wish to do this or that with it.
I am sure this is perfectly reasonable for a large class of documents. It
is clear that e-business is going to build islands of local operability -
i.e. the same document will behave differently in different
environments (and may not even be processable in some).

However I and many others are primarily concerned with the document itself
and not its processing. The SVG example highlights this. SVG has the
opportunity to revolutionise 2-D graphics ***if it is seamless at either
end***. Alice sends Bob an SVG document and she has no knowledge of what
software Bob is going to use ***and couldn't and shouldn't care***. She is
sending him a graphics file whose *effective content* must be identical
whatever the applications used. Otherwise SVG is already dead and IMO XML
as well.

In chemistry precision matters. Chemistry is used by a wide range of
disciplines, many of whom have never used XML (yet, though Henry and I are
working on it). It is no good telling them:

	" make sure your parser has access to the PUBID in a catalog so the
external DTD is processed and the external entities are expanded". 

They will reach for their FORTRAN compiler.

It matters very much. Henry and I are persuading many people to use CML as
*part* of their content. Examples are publishers, patents, safety, health,
materials, pharmaceutical, etc. etc. It is *absolutely essential* that
everyone interprets the chemistry in the same way. Otherwise people could die.

"This product should be used for external use only.


Always read the label:

This is not made up - I demonstrate this sort of thing when I promote XML.
I (naturally) expect the external parsed entities to be included (or at
least some serious error telling me the parser was lazy).

There must be a relatively small finite number of combinations of parser
behaviour. What I suggest we tackle is:
	- an *exhaustive examination* of all parser behaviours consistent with the
XML1.0 spec
	- a clear tabulation and labelling of these
	- the requirement that a parser announce which of these behaviours it
implements
	- the ability to select this behaviour
	- a means for the author of a document to communicate which of these
behaviours she expects the receiver to use in their parser.
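
As a rough illustration of the tabulate / announce / select idea, here is a
minimal sketch in Python. Every name in it (Behaviour, ParserProfile, unmet,
and the four behaviour labels) is hypothetical and invented for this sketch;
the real table would have to come out of the exhaustive examination proposed
above, and would surely be longer.

```python
from dataclasses import dataclass
from enum import Enum


class Behaviour(Enum):
    """Hypothetical labels for tabulated parser behaviours."""
    EXPAND_EXTERNAL_ENTITIES = "expands external parsed entities"
    SKIP_EXTERNAL_ENTITIES = "skips external parsed entities"
    READ_EXTERNAL_DTD = "reads the external DTD subset"
    IGNORE_EXTERNAL_DTD = "ignores the external DTD subset"


@dataclass(frozen=True)
class ParserProfile:
    """What a parser announces about itself (assumed shape)."""
    name: str
    supported: frozenset  # behaviours the parser can be switched to


def unmet(required, profile):
    """Return the behaviours the author expects but the parser cannot offer."""
    return set(required) - profile.supported


# A made-up parser that never fetches anything external.
lazy = ParserProfile(
    name="lazy-parser",
    supported=frozenset(
        {Behaviour.SKIP_EXTERNAL_ENTITIES, Behaviour.IGNORE_EXTERNAL_DTD}
    ),
)

# The author of a chemistry document insists on entity expansion.
required = {Behaviour.EXPAND_EXTERNAL_ENTITIES}
missing = unmet(required, lazy)
```

If missing is non-empty, the receiver at least learns that this parser cannot
honour what the author asked for, instead of silently getting different
content.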

I cannot believe this is impossible. If we don't tackle it MathML, SVG and
CML are in the category of "this document can only be read with software X
on browser Y on platform Z". Sounds familiar?

BTW I use xp, Xerces and AElfred. I am finding great difficulty in finding
command-line switches or other devices to influence their behaviour here.
Could someone list for me how to change their behaviour on PUBIDs and
<!DOCTYPE foo PUBLIC "-//some possibly resolvable FPI"
Is it the same problem in
<!DOCTYPE foo SYSTEM "http://some.internet.place">
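
For comparison only, here is a sketch of what such a switch can look like in
one parser that is not among the three named above: Python's standard xml.sax
module (built on expat), which exposes the SAX2 feature
http://xml.org/sax/features/external-general-entities as
feature_external_ges. It says nothing about how xp, Xerces or AElfred are
configured. The label text, the entity name "usage", the file name and the
helper names are all made up for the demonstration.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulate all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def label_text(expand_external):
    """Parse a tiny label whose key phrase lives in an external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        # The external parsed entity is a local file, so nothing is fetched
        # over the network.
        entity_path = os.path.join(tmp, "usage.ent")
        with open(entity_path, "w", encoding="utf-8") as f:
            f.write("external use only")
        doc = (
            '<!DOCTYPE label [<!ENTITY usage SYSTEM "%s">]>'
            "<label>This product is for &usage;.</label>" % entity_path
        ).encode("utf-8")

        parser = xml.sax.make_parser()
        # The switch itself: set explicitly so the result does not depend
        # on the library's default.
        parser.setFeature(feature_external_ges, expand_external)
        handler = TextCollector()
        parser.setContentHandler(handler)
        parser.parse(io.BytesIO(doc))
        return "".join(handler.chunks)


expanded = label_text(True)
skipped = label_text(False)
```

With the feature on, the phrase from the entity file ends up in the text; with
it off, the reference is dropped and no error is raised, which is exactly the
silent difference in effective content that worries me.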

Friends, this is too serious to ignore. We decided after a year's debate
that we had to have an API for parsers (SAX). I think we know deep down we
have to do something here. It may be a document, it may be a piece of
software. The response to SAX was so exciting and encouraging - everyone
took it on board. I am absolutely sure that all parser writers want their
parsers to be interoperable. The following expresses my desire:

<!DOCTYPE molecule SYSTEM "http://www.xml-cml.org/DTD/v10">
<?xdev externalEntityExpansion="mandatory" DTDResolution="optional"?>
<molecule name="ranitidine bismuth citrate">

<!-- I am not advertising - I happen to have worked on ranitidine -->
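
A receiving application could pick up such an instruction with very little
code. The following is only a sketch, using Python's standard xml.sax module:
the xdev target and its two pseudo-attributes are the hypothetical ones from
the fragment above, XdevReader and read_expectations are invented names, and
the DOCTYPE line is left out so the sketch never touches the network.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Pseudo-attributes of the form name="value" inside the PI data.
PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


class XdevReader(ContentHandler):
    """Collect the author's expectations from <?xdev ...?> instructions."""

    def __init__(self):
        super().__init__()
        self.expectations = {}

    def processingInstruction(self, target, data):
        if target == "xdev":
            self.expectations.update(PSEUDO_ATTR.findall(data))


def read_expectations(xml_bytes):
    """Return the xdev pseudo-attributes found in the document as a dict."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    handler = XdevReader()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.expectations


doc = (
    b'<?xdev externalEntityExpansion="mandatory" DTDResolution="optional"?>'
    b'<molecule name="ranitidine bismuth citrate"/>'
)
expectations = read_expectations(doc)
```

What the application then does when it cannot meet a "mandatory" expectation
is the part that still needs agreeing; the sketch only shows that reading the
author's wishes is cheap.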



