> On Sun, Jan 04, 2004 at 10:32:25AM +0400, David Tolpin wrote:
> > I am proposing to validate it against a very small schema.
> [...]
> > In the worst case, that is, with the current implementation, it takes 1 second
> > of processor time to determine the type of a 5 megabytes XML file that fails
> > to match three RNG pattern productions.
>
> And if I give you a 20 gigabyte stream?
> *not* parsing is always faster than parsing...
>
If you give me a 20 gigabyte stream and ask me to determine its contents based on a
pattern that requires parsing the whole stream, then it will take as much time
as it takes to parse the stream.
However, you will either not want to determine the stream's type from something
that sits halfway through the stream, or you will not want to use the automatic
mechanism at all. As I mentioned, for all cases listed in the original proposal,
it is easy to stop parsing the document at the point past which nothing more is needed.
An 'any' pattern is easily recognized and can be used to stop parsing.
If I am asked for the document element's name or namespace, or a list of names, then
the current program can be enhanced to parse just the document element. I will do it.
It is trivial. I just can't choose between two approaches: parsing one input buffer
(1024 bytes) of the stream and analysing the pattern's contents, or adding a predefined
pattern to the configuration grammar.
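
To illustrate the first option, here is a minimal sketch of reading only as far as
the document element. It is not the RNG-based program discussed above; it uses
Python's standard xml.etree.ElementTree.XMLPullParser, and the function name
document_element and the bufsize parameter are hypothetical names chosen for the
example. It assumes the stream is opened in binary mode and keeps reading buffers
only until the first start tag has been seen.

```python
import io
import xml.etree.ElementTree as ET


def document_element(stream, bufsize=1024):
    """Return (namespace, local_name) of the document element.

    Hypothetical helper: reads the stream one buffer at a time and stops
    as soon as the first start tag has been reported by the parser.
    """
    parser = ET.XMLPullParser(events=("start",))
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            return None  # stream ended before any element started
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            tag = elem.tag
            # ElementTree reports namespaced names as "{uri}local".
            if tag.startswith("{"):
                ns, local = tag[1:].split("}", 1)
                return ns, local
            return None, tag


# Usage: a large document of which only the first buffer is consumed.
big = io.BytesIO(
    b'<?xml version="1.0"?><doc xmlns="urn:example">' + b"<item/>" * 100000
)
result = document_element(big)
bytes_read = big.tell()
```

The rest of the stream is never read, so the cost depends on where the document
element starts and not on the size of the document.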
David Tolpin