xml-dev - ANN: Python implementation of Regular Fragmentations and online demo



To: xml-dev@lists.xml.org
Subject: ANN: Python implementation of Regular Fragmentations and online demo
From: Eric van der Vlist <vdv@dyomedea.com>
Date: Mon, 17 Jun 2002 16:57:23 +0200
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.0rc3) Gecko/20020531 Debian/1.0rc3-2

See: http://downloads.dyomedea.com/python/regfrag/

Even if the title says it all, I'd like to propose for discussion some
suggestions and extensions which I have added to my implementation as a
proof of concept and which I may remove depending on the outcome of
this discussion.


1) Handling of matching errors

These errors deserve more specification. What happens when a pattern
doesn't match? When there are more matches than nodes specified to
serialize them? When there are more nodes specified than matches?

My suggestion is to ignore the "overflow" of nodes and matches (i.e. to
process only the minimum of the number of matches and the number of
specified nodes). To be consistent with this rule, when there is no
match, no nodes should be serialized and the fragmented node could be
left empty.
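
The rule above can be sketched in a few lines of Python. This is only an
illustration: the function name pair_matches and the use of plain lists
of strings for matches and nodes are assumptions made for the example,
not code taken from the implementation announced here.

```python
def pair_matches(matches, nodes):
    """Pair each specified node with the match it serializes.

    Overflow on either side is ignored: zip() stops at the shorter
    input, so only min(len(matches), len(nodes)) pairs are produced.
    An empty list of matches yields no pairs at all.
    """
    return list(zip(nodes, matches))


# Three nodes specified, two matches found: "month" is silently dropped.
print(pair_matches(["20", "02"], ["century", "year", "month"]))
# [('century', '20'), ('year', '02')]
```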

2) Attribute prefixes are only hints

Namespace prefixes specified for the attributes generated by regular 
fragmentations cannot be used when they conflict with prefixes used in 
the instance document or required by other attributes in the same 
element. They should therefore be considered as hints rather than 
directives.

The algorithm used in my implementation is the following:

     * The requested prefix is used for the generated attribute
       if it is either not defined or defined for the namespace
       URI of the generated attribute.
     * Otherwise, if the namespace URI of the generated attribute
       is already associated with a prefix, that prefix is used.
     * As a last resort, an index is appended to the requested prefix
       to generate a prefix not yet used in this element.
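
A minimal Python sketch of these three steps follows. The function name
choose_prefix and the in_scope dictionary (mapping the prefixes already
declared or required on the element to their namespace URIs) are
hypothetical names chosen for the example; the numbering scheme for the
last step (type1, type2, ...) is likewise an assumption.

```python
def choose_prefix(wanted, ns_uri, in_scope):
    """Pick a prefix for a generated attribute, treating `wanted` as a hint.

    in_scope maps prefixes already in use on the element to namespace URIs.
    """
    # Step 1: the requested prefix is undefined, or already bound to the
    # namespace URI of the generated attribute.
    if in_scope.get(wanted, ns_uri) == ns_uri:
        return wanted
    # Step 2: reuse a prefix already bound to that namespace URI. The empty
    # prefix is skipped because unprefixed attributes are in no namespace.
    for prefix, uri in in_scope.items():
        if uri == ns_uri and prefix:
            return prefix
    # Step 3: append an index to the requested prefix until it is unused.
    index = 1
    while f"{wanted}{index}" in in_scope:
        index += 1
    return f"{wanted}{index}"


print(choose_prefix("type", "http://simonstl.com/ns/types/",
                    {"type": "http://example.org/other"}))
# type1
```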

3) Generalization of the repeat attribute

An alternative way to write the example:

  <fragmentRule pattern="(\d{1})(\d{1})">
   <applyTo>
    <element nsURI="http://simonstl.com/ns/types/" localName="century" />
    <element nsURI="http://simonstl.com/ns/types/" localName="year" />
    <element nsURI="http://simonstl.com/ns/types/" localName="month" />
   </applyTo>
   <produce>
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
   </produce>
  </fragmentRule>

could be to generalize the use of the repeat attribute to match rules:

  <fragmentRule pattern="(\d{1})(\d{1})" repeat="true">
   <applyTo>
    <element nsURI="http://simonstl.com/ns/types/" localName="century" />
    <element nsURI="http://simonstl.com/ns/types/" localName="year" />
    <element nsURI="http://simonstl.com/ns/types/" localName="month" />
   </applyTo>
   <produce>
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
   </produce>
  </fragmentRule>
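
One possible reading of repeat="true" on a match rule is that the list
of nodes under produce is reused cyclically for every captured group.
The Python sketch below illustrates that reading only; the function name
label_groups and its parameters are hypothetical, and the exact
semantics remain open for discussion.

```python
import re
from itertools import cycle


def label_groups(pattern, text, produce, repeat=False):
    """Pair each captured group with the name of the node that holds it."""
    match = re.match(pattern, text)
    if match is None:
        return []
    # With repeat, the produce list is reused cyclically for all groups;
    # without it, groups beyond the end of the produce list are ignored.
    names = cycle(produce) if repeat else produce
    return list(zip(names, match.groups()))


# Both forms of the rule give the same result for the value "20":
print(label_groups(r"(\d{1})(\d{1})", "20", ["digit", "digit"]))
print(label_groups(r"(\d{1})(\d{1})", "20", ["digit"], repeat=True))
# [('digit', '2'), ('digit', '0')] in both cases
```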

4) skipFirst

Choosing default values is often subjective; however, I think that the
default value for the skipFirst attribute could be "false". Also, it's
not clear whether this attribute applies to all types of rules (match
and split). I think that, for consistency, it should.

5) Duplicate attributes

The current rule is: "Repeating the same attribute name will leave only
the last version in the final output". I find this error prone,
especially when attributes are generated from the fragmentation of
other attributes: this can lead to recursion loops, and even when it
does not, the order in which the attributes are processed and thus
generated is not significant. I would suggest raising a
fragmentation-time error when an attribute is overridden.
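
As an illustration of the suggested behaviour, here is a small Python
sketch. The FragmentationError class and the add_generated_attribute
helper are hypothetical names, and the attributes of an element are
modelled as a plain dictionary for the sake of the example.

```python
class FragmentationError(Exception):
    """Hypothetical error raised at fragmentation time."""


def add_generated_attribute(attributes, name, value):
    """Add a generated attribute, refusing to override an existing one."""
    if name in attributes:
        # Fail loudly instead of silently keeping only the last version.
        raise FragmentationError(f"attribute {name!r} is already defined")
    attributes[name] = value


attrs = {"type:year": "02"}
add_generated_attribute(attrs, "type:month", "06")  # a new name is accepted
try:
    add_generated_attribute(attrs, "type:year", "03")  # duplicate name
except FragmentationError as error:
    print("fragmentation error:", error)
```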

6) Escape recursion

An attribute "break" could be added to the fragmentRule element to 
specify that no further recursion should take place.

7) Attribute fragmentation

I have implemented attribute fragmentation, trying to stay as close as
possible to the original idea of using the same mechanism even though
the semantics are slightly different. The proposal is coherent, though
not always deterministic.

The two major issues with fragmenting attributes are that the result
of the fragmentation cannot be kept in the attribute (at least not in
the general case), since attributes are not structured, and that the
order of the attributes is not meaningful.

Since the result cannot be kept in the attribute, it is placed in the
"hosting" element. If the result is serialized as elements or
characters, the relative order in which the fragmentations of two or
more attributes of the same element are serialized cannot be
guaranteed.

8) Other node types (not implemented)

Currently, elements and attributes can be fragmented into elements,
attributes and text nodes. What about adding other types of nodes
(i.e. PIs and comments) to the list?

Thanks for your feedback,

Eric
-- 
See you in San Diego.
                                http://conferences.oreillynet.com/os2002/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------
