OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   Re: [xml-dev] New release (2.8) of XSV

[ Lists Home | Date Index | Thread Index ]

Daniel Veillard <daniel@veillard.com> writes:

> On Sun, Oct 10, 2004 at 10:57:32PM +0100, Henry S. Thompson wrote:
>> Daniel Veillard <daniel@veillard.com> writes:
>> 
>> > libxml2 regexps used counters since day 1 for min/maxoccurs
>> > implementation.  The explosion didn't look a supportable
>> > alternative to me as it opens the door to trivial DoS attacks or
>> > forces to break the schemas validation which is also a big
>> > problem if you consider schemas as a contract between two
>> > communicating parties.
>> 
>> I would be very interested, as I'm sure would others, in a description
>> of your algorithm.
>
> Well nothing fancy really. You need to add state to the regexp, in
> that case the state is the number of time you went through the
> transition labelled by the element name:namespace pair. Due to the
> single particle rule, it means you have no need to be able to
> rollback the state to enter a different transition labelled by the
> same name:namespace pair.  Which means the effective state required
> is purely linear based on the number of counted transition you have
> in your regexp: one integer counter is sufficient per such
> transition in the regexp runtime structure. It's actually more
> automata with state that I'm using than pure regexps.  And the
> generated constructs for something like x{a,b} involves 3 states
> (IIRC) a couple of epsilon transitions and the counted regexp, but
> it's not really rocket sience either, very similar to what I was
> taught in the classroom.

Right, that works fine for exponents on individual elements, but I
don't see how it works for groups.

Here's a real example from a published schema document [1]:

<xsd:sequence minOccurs="0" maxOccurs="1000">
    <xsd:element ref="ReferenceIdentification" minOccurs="0"/>
    <xsd:element ref="Message" minOccurs="0" maxOccurs="1000"/>
</xsd:sequence>

Or consider a (constructed) case that's tricky in a different way:

<xsd:sequence minOccurs="2" maxOccurs="2">
    <xsd:element ref="a" minOccurs="1" maxOccurs="2"/>
    <xsd:element ref="b" minOccurs="0"/>
</xsd:sequence>

which allows _inter alia_

 <a/><a/>
 <a/><a/><a/>
 <a/><a/><a/><a/>

ht
 
[1] http://www.idealliance.org/spacexml/files/spacexmlv100.zip
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]




 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS