- From: Peter Murray-Rust <peter@ursus.demon.co.uk>
- To: xml-dev@ic.ac.uk
- Date: Thu, 11 Dec 1997 14:37:39
At 06:41 11/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > As a corollary: Is anyone testing the ESIS output of the current crop of
> > XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> > or the value of xml:space they should all produce identical ESIS (right?)
> > If not, then one or more is wrong. And all applications should (IMO) be
> > prepared to work with ESIS which I think is isomorphous with a WF XML
> > document.
>
>There are quite a few more XML parsers out there, including at least
>one in TCL -- see
>
> http://www.sil.org/sgml/XML.html#xmlSoftware
Apologies to anyone I missed. I am a great fan of tcl and wrote costwish in
it to sit on top of Joe English's CoST...
>
>As for ESIS, there are some problems that we'd have to overcome first:
Are there? How does a WF document differ from the corresponding ESIS
stream? IOW if I do the transformation:
WF -> ESIS -> WF shouldn't I be able to recover the original?
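To make the round-trip question concrete, here is a minimal sketch in Python (my choice, not something the parsers above use). It dumps a WF document as an ESIS-like line stream, loosely modelled on nsgmls output. The names EsisWriter and esis are hypothetical, every attribute is reported as CDATA, and attributes are sorted so that their order does not matter; all of these are assumptions made for illustration.

```python
import xml.sax


class EsisWriter(xml.sax.ContentHandler):
    """Collects an ESIS-like line stream (hypothetical helper, not nsgmls itself)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        # Attribute lines precede the start-element line; sorted so that
        # attribute order in the source does not affect the output.
        for attr in sorted(attrs.getNames()):
            self.lines.append(f"A{attr} CDATA {attrs.getValue(attr)}")
        self.lines.append(f"({name}")

    def endElement(self, name):
        self.lines.append(f"){name}")

    def characters(self, content):
        text = content.replace("\\", "\\\\").replace("\n", "\\n")
        # The parser may deliver text in several chunks; merge adjacent data.
        if self.lines and self.lines[-1].startswith("-"):
            self.lines[-1] += text
        else:
            self.lines.append("-" + text)

    def processingInstruction(self, target, data):
        self.lines.append(f"?{target} {data}")


def esis(doc):
    """Return the ESIS-like lines for a well-formed document string."""
    writer = EsisWriter()
    xml.sax.parseString(doc.encode("utf-8"), writer)
    return writer.lines
```

Under these assumptions, esis('<a x="1" y="2">hi</a>') and esis("<a  y='2'  x='1' >hi</a>") give the same lines, so WF -> ESIS -> WF can only recover the original up to quote style, attribute order and whitespace inside tags.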
>
>1) How should empty elements be represented? Right now, Ælfred generates a
> startElement event immediately followed by an endElement event.
Yes - and JUMBO is happy with that. As far as JUMBO is concerned,
<FOO></FOO> and <FOO/> are processed in the same way, and I will need a very
clear argument to convince me that it should behave differently.
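As a quick check of that behaviour, the sketch below records start and end events for both spellings of an empty element. It uses Python's standard SAX interface rather than Ælfred or JUMBO, and the names EventLog and events are hypothetical.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Records element events in document order (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def events(doc):
    """Parse a document string and return its list of element events."""
    log = EventLog()
    xml.sax.parseString(doc.encode("utf-8"), log)
    return log.events
```

With this parser, events("<FOO></FOO>") and events("<FOO/>") are the same list: a start event immediately followed by an end event.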
>
>2) How should the XML declaration be represented? Should it appear as
> a processing instruction, or should it be ignored?
JUMBO regards it as a PI. I hang all PIs off the preceding ELEMENT (not
PCDATA). In that way the tree can be processed with these intact. JUMBO
understands namespace PIs, <?JUMBO ...?> PIs and will also store the
others. It's useful to store them in case one wants to compare trees. BTW -
although it is nowhere stated, most people seem to write PIs as name-value
pairs, and JUMBO expects this.
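A minimal sketch of reading PI data as name-value pairs follows. It is not JUMBO's code: the function name pi_pairs and the regular expression are assumptions, and it only recognises quoted values of the form name="value" or name='value'.

```python
import re

# One name="value" or name='value' pair; the name pattern is a simplification
# of the XML Name production (assumption for illustration).
_PAIR = re.compile(r"""([A-Za-z_:][\w:.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def pi_pairs(data):
    """Return the name-value pairs found in the data part of a PI."""
    pairs = {}
    for match in _PAIR.finditer(data):
        name, double_quoted, single_quoted = match.groups()
        # Exactly one of the two quoted groups matched; the other is None.
        pairs[name] = double_quoted if double_quoted is not None else single_quoted
    return pairs
```

For example, the data of an XML declaration, version="1.0" encoding='UTF-8', would come back as a dictionary with the keys version and encoding. PI data that does not follow this convention simply yields an empty dictionary, so such PIs can still be stored verbatim.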
>
>3) How should space in element content be handled? According to the
> spec, a DTD-aware parser should handle whitespace in element
> content differently from whitespace in mixed content (Ælfred just
> ignores whitespace in element content right now).
This is a critical area for the parser writers to agree on. I assume that
for the DTD-aware stuff there has to be a validating parser (i.e. one that
matches contentspec against element content). I am not sure what algorithms
are being used - JUMBO wants a Java one for its birthday, please - but I
can imagine that with certain contentspecs they might get different answers.
>
>4) DTD-aware and non-DTD-aware parsers will handle whitespace in
> attribute values differently. Non-DTD-aware parsers will treat all
> attributes as CDATA, but DTD-aware parsers will treat tokenised
> attributes specially, by stripping all leading and trailing
> whitespace, and normalising internal whitespace to single spaces.
In this case presumably only the TYPE in the ATTLIST is needed.
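The difference David describes can be sketched as below. The function name normalize_attribute and the tokenized flag are hypothetical; a parser would set the flag from the declared TYPE in the ATTLIST. The sketch ignores line-end handling and character references, so it is a simplification and not the full rule from the spec.

```python
import re


def normalize_attribute(value, tokenized):
    """Normalise an attribute value (simplified sketch).

    tokenized=False models CDATA (all a non-DTD-aware parser can assume);
    tokenized=True models types such as ID, IDREF or NMTOKENS.
    """
    # Both cases: each tab, newline or carriage return becomes one space.
    value = re.sub(r"[\t\n\r]", " ", value)
    if tokenized:
        # Tokenised types: strip leading and trailing spaces and collapse
        # internal runs of spaces to a single space.
        value = re.sub(r" +", " ", value.strip(" "))
    return value
```

Called on the same raw value, the two settings give different strings whenever the value has leading, trailing or repeated whitespace, which is exactly where a DTD-aware and a non-DTD-aware parser would disagree.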
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)