Re: [xml-dev] Transform XSD complexType into regular expressions
- From: Michael Kay <mike@saxonica.com>
- To: Roger L Costello <costello@mitre.org>
- Date: Thu, 12 May 2022 12:34:01 +0100
Testing whether a sequence of element names matches the content model of a complex type is fairly straightforward in principle. You take the particles in the content model and for each one you generate a term in the regular expression of the form `(Q\{uri\}local~){minOccurs,maxOccurs}`; for wildcards in the content model you generate a term that matches only names in the permitted namespaces. Then you take the actual sequence of element names found in the instance document, form a string of their EQNames with each name followed by `~`, and match this against the regular expression.
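As a rough illustration of the term-per-particle idea, here is a minimal Python sketch. It assumes a content model that is a plain sequence of element particles (no choice groups, no wildcards), and the function names and the `(uri, local, min_occurs, max_occurs)` tuple layout are hypothetical, chosen only for this example. `None` stands in for `maxOccurs="unbounded"`, and a non-capturing group is used in place of the plain parentheses shown above.

```python
import re

def particle_term(uri, local, min_occurs, max_occurs):
    """Build one regex term for an element particle.

    max_occurs=None is this sketch's stand-in for maxOccurs="unbounded".
    """
    # Escape the literal EQName plus its trailing "~" terminator.
    name = re.escape(f"Q{{{uri}}}{local}~")
    upper = "" if max_occurs is None else str(max_occurs)
    return f"(?:{name}){{{min_occurs},{upper}}}"

def sequence_regex(particles):
    """Concatenate the terms for a sequence of element particles."""
    return re.compile("".join(particle_term(*p) for p in particles))

def names_to_string(names):
    """Turn the (uri, local) names found in the instance into one string."""
    return "".join(f"Q{{{uri}}}{local}~" for uri, local in names)

def matches(particles, names):
    """True if the whole sequence of names satisfies the content model."""
    return sequence_regex(particles).fullmatch(names_to_string(names)) is not None

NS = "http://example.com/ns"  # hypothetical namespace for the example
BOOK = [
    (NS, "title", 1, 1),
    (NS, "author", 1, None),
    (NS, "note", 0, 1),
]
```

With these definitions, `matches(BOOK, [(NS, "title"), (NS, "author")])` is true, while a sequence that omits `title` is rejected. Note that the only answer you get back is yes or no.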
You can of course abbreviate the names to something shorter provided you do it consistently; you could even abbreviate to single characters (Unicode has plenty available) and drop the separators.
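The single-character abbreviation might look something like the following Python sketch. The choice of the Unicode private-use area, the `UNKNOWN` marker for names not declared in the content model, and all function names are assumptions made for this example; particles are again limited to a plain sequence of elements, with `None` standing for an unbounded `maxOccurs`.

```python
import re

# Hypothetical marker for instance names that no particle declares;
# it never appears in the generated regex, so it always causes a mismatch.
UNKNOWN = "\uF8FF"

def build_alphabet(names):
    """Assign each distinct element name one private-use character."""
    distinct = sorted(set(names))
    return {name: chr(0xE000 + i) for i, name in enumerate(distinct)}

def abbreviated_regex(particles, alphabet):
    """Build the regex over single characters; no separators are needed."""
    parts = []
    for name, min_occurs, max_occurs in particles:
        upper = "" if max_occurs is None else str(max_occurs)
        parts.append(f"{re.escape(alphabet[name])}{{{min_occurs},{upper}}}")
    return re.compile("".join(parts))

def encode(names, alphabet):
    """Map the element names found in the instance to one character each."""
    return "".join(alphabet.get(n, UNKNOWN) for n in names)

PARTICLES = [("title", 1, 1), ("author", 1, None), ("note", 0, 1)]
ALPHABET = build_alphabet(name for name, _, _ in PARTICLES)
REGEX = abbreviated_regex(PARTICLES, ALPHABET)
```

The same mapping has to be used both when generating the regex and when encoding the instance, which is the "consistently" part.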
But of course, you then find there are complications. If an element in the instance document matches a wildcard particle, then you need to know whether the wildcard particle specified strict or lax validation for its own content; the regex approach doesn't directly give you that information. Handling XSD 1.1 open content models gets tricky, etc. But I think the biggest obstacle is that you get very poor diagnostics for invalidities.
The approach depends on the "Element declarations consistent" constraint, which means that to validate a child element you only need to know its name; you don't need to know which particle it matched -- unless it was a wildcard.
Michael Kay
Saxonica
> On 12 May 2022, at 11:29, Roger L Costello <costello@mitre.org> wrote:
>
> Michael Kay wrote:
> ---------------------------------------
> I've considered the approach of validating complex types by turning them into regular expressions against a string and using a regex engine. The main reason I decided against it is that regex engines produce no useful diagnostics; they just tell you the string doesn't match. Perhaps the answer to that would be to write a regex engine with better diagnostics - I can see that being useful!
> ---------------------------------------
>
> Oh, that sounds wicked cool. Michael, would you describe at a high level how one would go about converting a complexType into regular expressions, please? Would one complexType be transformed into one regular expression or into multiple regular expressions?
>
> /Roger