David Megginson <firstname.lastname@example.org> writes:
> K. Ari Krupnikov writes:
> > How much of a "violation" would it be to have a caching XMLFilter that
> > would report all contiguous character data in a single event,
> > including across entity boundaries?
> It would not be a violation, since readers are not required to provide
> a locator at all:
> Application-specific error reporting would be pretty sucky, but that
> might not matter in many cases.
You won't have startEntity events (which are not part of ContentHandler
anyway), but you can still report the correct SystemId and PublicId
through the locator or a SAXParseException. Or did you mean errors that
an application wants to report /after/ parsing is done?
> If you did this, though, I'd suggest still putting in a hard-coded
> limit. In fact, as XML gets used in more security-sensitive
> environments, we may need to consider putting (very high) limits on
> everything to avoid various attacks.
Would you report it as a (perhaps recoverable) error? Breaking
character data into multiple events would defeat the purpose of this
filter (to relieve content handlers of the need to do that
themselves) and do nothing to solve the security issue.
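
For concreteness, here is a minimal sketch of the kind of filter I have
in mind. The class names (CoalescingFilter, CoalescingDemo) are made up
for illustration, and it assumes a stock JAXP parser underneath. It
buffers character data and forwards each contiguous run as one
characters() event, flushing whenever some other content event arrives.
It has no size limit at all, which is exactly the point in dispute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: reports each contiguous run of character data once. */
class CoalescingFilter extends XMLFilterImpl {
    private final StringBuilder buffer = new StringBuilder();

    CoalescingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Hold the text back instead of forwarding it right away.
        buffer.append(ch, start, length);
    }

    /** Forwards any buffered text downstream as a single event. */
    private void flush() throws SAXException {
        if (buffer.length() > 0) {
            char[] text = new char[buffer.length()];
            buffer.getChars(0, text.length, text, 0);
            buffer.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        flush();
        super.ignorableWhitespace(ch, start, length);
    }
}

/** Hypothetical driver: collects the character runs seen downstream. */
class CoalescingDemo {
    static List<String> runs(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CoalescingFilter filter = new CoalescingFilter(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    seen.add(new String(ch, start, length));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE d [<!ENTITY e \"two\">]>"
                   + "<d>one &e; three<b/>four</d>";
        System.out.println(runs(xml));
    }
}
```

Text on either side of the entity reference arrives downstream as one
run, and the run after the empty element arrives separately. Comments
do not split a run here, because they are reported through
LexicalHandler rather than ContentHandler.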
In general, I'm not convinced that it is the job of a generic parser
(or a filter, for that matter) to enforce what is essentially
process-accounting policy. If you want to enforce memory or CPU usage
limits, isn't it better to do it at the OS process level or, if your VM
supports it, in the VM?
> On the other hand, high fixed limits, like (say) 16K characters for
> element and attribute names, might help us avoid some problems in
> the future.
This sounds like a reasonable proposition to me. But would you also
impose a limit on character data? Entities? In the gigabytes perhaps?
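
If such a limit were reported as a recoverable error, a filter could do
it along these lines. This is only a sketch: NameLimitFilter and
NameLimitDemo are hypothetical names, the limit is whatever the caller
passes in, and it relies on the underlying reader supplying a locator,
which (as noted above) it is not required to do.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: flags over-long element and attribute names. */
class NameLimitFilter extends XMLFilterImpl {
    private final int maxNameLength;
    private Locator locator;   // stays null if the reader provides none

    NameLimitFilter(XMLReader parent, int maxNameLength) {
        super(parent);
        this.maxNameLength = maxNameLength;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        check(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            check(atts.getQName(i));
        }
        // Parsing continues unless the ErrorHandler chooses to throw.
        super.startElement(uri, localName, qName, atts);
    }

    private void check(String name) throws SAXException {
        if (name.length() > maxNameLength) {
            // error() forwards to the registered ErrorHandler, if any.
            error(new SAXParseException(
                "name longer than " + maxNameLength + " characters", locator));
        }
    }
}

/** Hypothetical driver: counts the errors reported for a document. */
class NameLimitDemo {
    static int countErrors(String xml, int limit) {
        int[] count = {0};
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            NameLimitFilter filter = new NameLimitFilter(parser, limit);
            filter.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    count[0]++;   // recoverable: note it and carry on
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }
}
```

Whether the error is fatal is then left to the application's
ErrorHandler, which keeps the policy decision out of the parser itself.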