Re: SAX2 ... missing features?
- From: Rob Lugt <roblugt@elcel.com>
- To: David Brownell <david-b@pacbell.net>, xml-dev@lists.xml.org
- Date: Tue, 17 Jul 2001 10:59:02 +0100
David Brownell wrote:
> > * DTDInputSource
> > An application can set this property to provide a DTD. This will override
> > the document's DTD (if it exists) but, more importantly, will create one if
> > it doesn't.
> >
> > Currently applications can use an EntityResolver to achieve this, but only
> > when the input document references an external DTD. This will allow
> > applications to inject a DTD regardless.
>
> This one is interesting because it clearly can't be layered: swapping DTDs
> means changing entity and attribute declarations, which affect the view of
> content produced by parsing the body.
>
> Though I don't think "InputSource" is the right model, since it doesn't
> support use of internal subsets ... used for many parameterized or
> modularized DTDs. One really wants the three components of a
> DTD: root name decl, "external subset" system ID (and maybe public
> ID), and internal subset.
The interface you suggest is the one you employed for your ValidationConsumer
[1], which you mentioned in a post yesterday.
As always, the most appropriate interface depends on the context in
which it will be used. But on balance I think the InputSource approach is
the most flexible. My reasons for this are:
- Specifying the root name decl can be problematic when validating multiple
documents of different types. Our XML Validator [2] enables the user to
specify a DTD URL on the command line as well as a list of files to
validate. The xml files may contain different root elements, yet they could
all be valid with reference to the supplied DTD. As with our XML
Validator, an application may not know in advance the exact type of document
being read, so it may be impossible to specify the required root
element name. I can't really see any sense at all in the root element
validity constraint, but that's a different matter.
- For packaging reasons, applications may want to keep a private, in-memory
copy of an entire DTD. The InputSource approach allows this to be passed to
the xml parser as a StringReader. Your approach also allows this, but in a
more restricted way. You would have to either
(a) make use of the internal subset string, which is subject to the
constraints that apply to the internal subset, e.g. PEs not allowed within
markup, or
(b) use a combination of systemId/publicId that will be recognised by your
EntityResolver.
So (a) has unnecessary constraints and (b) requires the logic to be
distributed across multiple methods/classes.
- The systemId/publicId ultimately resolve to an InputSource anyway. By
providing the InputSource directly, the application is short-circuiting the
EntityResolver. I believe this is a reasonable thing to do, but I'm open to
arguments as to why this may be inappropriate.
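
To make the comparison concrete, here is a minimal sketch of what SAX2
already allows: an EntityResolver that answers a known systemId with an
in-memory DTD wrapped in a StringReader. The class name, the systemId and the
DTD text are made up for illustration, and it assumes the parser loads the
external subset when not validating (the parser bundled with the JDK does so
by default).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class InMemoryDtdDemo {
    // Hypothetical systemId and DTD text, invented for this example.
    static final String DTD_SYSTEM_ID = "http://example.invalid/private.dtd";
    static final String DTD_TEXT =
        "<!ELEMENT greeting (#PCDATA)>\n"
        + "<!ENTITY who \"world\">\n";

    // Parses the document and returns its character content, serving the
    // DTD from memory whenever the parser asks for DTD_SYSTEM_ID.
    static String parseWithPrivateDtd(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StringBuilder text = new StringBuilder();

        // The resolver is only consulted if the document references an
        // external DTD; returning null falls back to default resolution.
        reader.setEntityResolver((publicId, systemId) ->
            DTD_SYSTEM_ID.equals(systemId)
                ? new InputSource(new StringReader(DTD_TEXT))
                : null);

        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE greeting SYSTEM \"" + DTD_SYSTEM_ID + "\">"
            + "<greeting>Hello, &who;</greeting>";
        System.out.println(parseWithPrivateDtd(xml));
    }
}
```

Note that the same InputSource object is what a DTDInputSource property would
receive directly. The difference is that the resolver route above does nothing
for a document with no DOCTYPE declaration, and the matching systemId has to
be known in two places.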
Regards
~Rob
--
Rob Lugt
ElCel Technology
http://www.elcel.com/
[1] http://www.gnu.org/software/classpathx/jaxp/apidoc/gnu/xml/pipeline/ValidationConsumer.html
[2] http://www.elcel.com/products/xmlvalid.html