[
Lists Home |
Date Index |
Thread Index
]
Daniel Veillard <daniel@veillard.com> writes:
> On Sun, Oct 10, 2004 at 10:57:32PM +0100, Henry S. Thompson wrote:
>> Daniel Veillard <daniel@veillard.com> writes:
>>
>> > libxml2 regexps used counters since day 1 for min/maxoccurs
>> > implementation. The explosion didn't look a supportable
>> > alternative to me as it opens the door to trivial DoS attacks or
>> > forces to break the schemas validation which is also a big
>> > problem if you consider schemas as a contract between two
>> > communicating parties.
>>
>> I would be very interested, as I'm sure would others, in a description
>> of your algorithm.
>
> Well nothing fancy really. You need to add state to the regexp, in
> that case the state is the number of time you went through the
> transition labelled by the element name:namespace pair. Due to the
> single particle rule, it means you have no need to be able to
> rollback the state to enter a different transition labelled by the
> same name:namespace pair. Which means the effective state required
> is purely linear based on the number of counted transition you have
> in your regexp: one integer counter is sufficient per such
> transition in the regexp runtime structure. It's actually more
> automata with state that I'm using than pure regexps. And the
> generated constructs for something like x{a,b} involves 3 states
> (IIRC) a couple of epsilon transitions and the counted regexp, but
> it's not really rocket sience either, very similar to what I was
> taught in the classroom.
Right, that works fine for exponents on individual elements, but I
don't see how it works for groups.
Here's a real example from a published schema document [1]:
<xsd:sequence minOccurs="0" maxOccurs="1000">
<xsd:element ref="ReferenceIdentification" minOccurs="0"/>
<xsd:element ref="Message" minOccurs="0" maxOccurs="1000"/>
</xsd:sequence>
Or consider a (constructed) case that's tricky in a different way:
<xsd:sequence minOccurs="2" maxOccurs="2">
<xsd:element ref="a" minOccurs="1" maxOccurs="2"/>
<xsd:element ref="b" minOccurs="0"/>
</xsd:sequence>
which allows _inter alia_
<a/><a/>
<a/><a/><a/>
<a/><a/><a/><a/>
ht
[1] http://www.idealliance.org/spacexml/files/spacexmlv100.zip
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
|