K. Ari Krupnikov writes:
> How much of a "violation" would it be to have a caching XMLFilter that
> would report all contiguous character data in a single event,
> including across entity boundaries?
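It would not be a violation. The only thing such a filter could get
wrong is the locator information, and readers are not required to
provide a locator at all. From the documentation for
ContentHandler.setDocumentLocator: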
  SAX parsers are strongly encouraged (though not absolutely required)
  to supply a locator: if it does so, it must supply the locator to
  the application by invoking this method before invoking any of the
  other methods in the ContentHandler interface.
Application-level error reporting would be pretty poor, since errors
could no longer be tied to a line and column, but that might not
matter in many cases.
If you did this, though, I'd suggest still putting in a hard-coded
limit on how much character data the filter will buffer. In fact, as
XML gets used in more security-sensitive environments, we may need to
consider putting (very high) limits on everything to avoid attacks
such as denial of service through enormous text runs or names.
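As a rough illustration of the filter Ari describes, here is a minimal
Java sketch built on SAX's XMLFilterImpl. It buffers characters()
calls and delivers them as one event when the next markup event
arrives, refuses to buffer more than a fixed amount, and never passes
a locator downstream. The class name CoalescingFilter and the
1M-character MAX_CHARS cap are hypothetical choices made for this
example, not part of SAX.

```java
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Sketch: merge adjacent characters() events into a single event, with a
// hard-coded cap on buffered text. The name and the cap are illustrative.
class CoalescingFilter extends XMLFilterImpl {
    // Hypothetical limit: refuse to buffer more than 1M characters of text.
    static final int MAX_CHARS = 1 << 20;

    private final StringBuilder buffer = new StringBuilder();

    public CoalescingFilter(XMLReader parent) {
        super(parent);
    }

    // Do not pass the locator downstream: its position would not match the
    // merged events, and SAX does not require a locator to be supplied.
    @Override
    public void setDocumentLocator(Locator locator) {
    }

    // Buffer text instead of forwarding it immediately, enforcing the cap.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (buffer.length() + length > MAX_CHARS) {
            throw new SAXException("character data exceeds " + MAX_CHARS + " characters");
        }
        buffer.append(ch, start, length);
    }

    // Deliver any buffered text as one characters() event.
    private void flush() throws SAXException {
        if (buffer.length() > 0) {
            char[] text = buffer.toString().toCharArray();
            buffer.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    // Every other ContentHandler event ends a run of text, so flush first
    // to keep events in document order.
    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        flush();
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        flush();
        super.ignorableWhitespace(ch, start, length);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        flush();
        super.skippedEntity(name);
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }
}
```

Comments are reported through LexicalHandler rather than
ContentHandler, so this sketch also merges the text on either side of
a comment; whether that is acceptable depends on the application.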
SGML gave limits a bad name because they were so ridiculously low by
default (eight-character names spring to mind), and SGML declarations
were a nightmare to manage in any real-world processing and
interchange situation. On the other hand, high fixed limits, like
(say) 16K characters for element and attribute names, might help us
avoid some problems in the future.
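A fixed name-length limit of this kind could be sketched the same way.
The hypothetical NameLimitFilter below reads "16K" as 16,384
characters and rejects any element or attribute name longer than
that. Note that a filter only sees a name after the parser has already
read it into memory, so this shows the policy rather than providing a
real defense; a production limit would belong inside the parser
itself.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Sketch: reject element or attribute names longer than a fixed limit.
class NameLimitFilter extends XMLFilterImpl {
    // Illustrative reading of "16K characters": 16 * 1024 = 16384.
    static final int MAX_NAME_CHARS = 16 * 1024;

    public NameLimitFilter(XMLReader parent) {
        super(parent);
    }

    // Throw if a name exceeds the limit.
    private static void check(String kind, String name) throws SAXException {
        if (name != null && name.length() > MAX_NAME_CHARS) {
            throw new SAXException(kind + " name longer than " + MAX_NAME_CHARS + " characters");
        }
    }

    // End tags repeat the start-tag name, so checking start tags is enough.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        check("element", qName);
        check("element", localName);
        for (int i = 0; i < atts.getLength(); i++) {
            check("attribute", atts.getQName(i));
            check("attribute", atts.getLocalName(i));
        }
        super.startElement(uri, localName, qName, atts);
    }
}
```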
All the best,
David Megginson, firstname.lastname@example.org, http://www.megginson.com/