- To: xml-dev@lists.xml.org
- Subject: ANN: Python implementation of Regular Fragmentations and online demo
- From: Eric van der Vlist <vdv@dyomedea.com>
- Date: Mon, 17 Jun 2002 16:57:23 +0200
- User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.0rc3) Gecko/20020531 Debian/1.0rc3-2
See: http://downloads.dyomedea.com/python/regfrag/
Even if the title says it all, I'd like to propose for discussion some
suggestions and extensions that I have added to my implementation as a
proof of concept and may remove depending on the outcome of this
discussion.
1) Handling matching errors
These errors need a more precise specification. What happens when a
pattern doesn't match?
When there are more matches than nodes specified to serialize them?
When there are more nodes specified than matches?
My suggestion is to ignore any "overflow" of nodes or matches (i.e.
process only as many pairs as the smaller of the number of matches and
the number of specified nodes). To be consistent with this rule, when
there is no match, no nodes should be serialized and the fragmented node
could be left empty.
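The following Python sketch illustrates this rule. It is not the code of
the implementation announced above: it assumes the pattern is applied
with re.match (anchored at the start of the text) and that the nodes to
serialize can be represented as a plain list of names; fragment_text and
node_names are hypothetical.

```python
import re


def fragment_text(text, pattern, node_names):
    """Pair regex groups with the nodes declared to serialize them.

    Hypothetical helper: node_names stands in for the list of nodes in
    a <produce> element. Only min(len(groups), len(node_names)) pairs
    are returned; surplus groups or surplus nodes are ignored.
    """
    match = re.match(pattern, text)
    if match is None:
        # No match: nothing is serialized, the fragmented node stays empty.
        return []
    # zip() stops at the shorter sequence, which is the "minimum" rule.
    return list(zip(node_names, match.groups()))
```

Because zip() stops at the shorter sequence, surplus matches and surplus
nodes are both dropped, and a failed match leaves nothing to serialize.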
2) Attribute prefixes are only hints
Namespace prefixes specified for the attributes generated by regular
fragmentations cannot be used when they conflict with prefixes used in
the instance document or required by other attributes in the same
element. They should therefore be considered as hints rather than
directives.
The algorithm used in my implementation is the following:
* The requested prefix is used for the generated attribute
if it is either not defined or already bound to the namespace
URI of the generated attribute.
* Otherwise, if the namespace URI of the generated attribute
is already bound to a prefix, that prefix is used.
* As a last resort, a numeric index is appended to the requested
prefix to generate a prefix not yet used in this element.
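The three steps above can be sketched in Python as follows. This is an
illustration, not the implementation's code; it assumes the prefixes
already used on the element are available as a dictionary mapping each
prefix to its namespace URI (choose_prefix and in_scope are
hypothetical names).

```python
def choose_prefix(hint, ns_uri, in_scope):
    """Pick a prefix for a generated attribute, treating `hint` as a hint.

    in_scope: hypothetical dict mapping prefixes already used on the
    element to their namespace URIs.
    """
    # 1. Use the hinted prefix if it is unused or already bound to ns_uri.
    if in_scope.get(hint, ns_uri) == ns_uri:
        return hint
    # 2. Otherwise reuse a prefix already bound to ns_uri, if any.
    for prefix, uri in in_scope.items():
        if uri == ns_uri:
            return prefix
    # 3. As a last resort, append an increasing index until the prefix is free.
    index = 1
    while f"{hint}{index}" in in_scope:
        index += 1
    return f"{hint}{index}"
```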
3) Generalization of the repeat attribute
Instead of writing the example as:
<fragmentRule pattern="(\d{1})(\d{1})">
  <applyTo>
    <element nsURI="http://simonstl.com/ns/types/" localName="century" />
    <element nsURI="http://simonstl.com/ns/types/" localName="year" />
    <element nsURI="http://simonstl.com/ns/types/" localName="month" />
  </applyTo>
  <produce>
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
  </produce>
</fragmentRule>
we could generalize the use of the repeat attribute to match rules and write:
<fragmentRule pattern="(\d{1})(\d{1})" repeat="true">
  <applyTo>
    <element nsURI="http://simonstl.com/ns/types/" localName="century" />
    <element nsURI="http://simonstl.com/ns/types/" localName="year" />
    <element nsURI="http://simonstl.com/ns/types/" localName="month" />
  </applyTo>
  <produce>
    <element nsURI="http://simonstl.com/ns/types/" localName="digit"
             prefix="type" />
  </produce>
</fragmentRule>
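One possible reading of repeat="true" on a match rule is that the list
of produced nodes is reused cyclically until every group has a node.
The Python sketch below assumes that reading; pair_groups and its
parameters are hypothetical. Without repeat, it falls back to the
"minimum" rule suggested in 1).

```python
import re
from itertools import cycle


def pair_groups(text, pattern, node_names, repeat=False):
    """Pair regex groups with produced nodes (hypothetical helper)."""
    match = re.match(pattern, text)
    if match is None:
        return []
    # With repeat, the declared nodes are reused cyclically, so a single
    # produced element covers every group; otherwise zip() stops early.
    names = cycle(node_names) if repeat else node_names
    return list(zip(names, match.groups()))
```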
4) skipFirst
Choosing default values is always somewhat subjective, but I think that
the default value for the skipFirst attribute could be "false". It is
also unclear whether this attribute applies to all rule types (match
and split); for consistency, I think it should.
5) Duplicate attributes
The current rule is: "Repeating the same attribute name will leave only
the last version in the final output", which I find error-prone,
especially when attributes are generated from the fragmentation of
other attributes: this can lead to recursion loops, and even when it
does not, the order in which the attributes are processed (and thus
generated) is not significant, so which version ends up "last" is
arbitrary. I would suggest raising a fragmentation-time error when an
attribute is "overridden".
6) Escape recursion
An attribute "break" could be added to the fragmentRule element to
specify that no further recursion should take place.
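The Python sketch below shows how such a flag could cut recursion
short. It uses a deliberately simplified, hypothetical model (Rule,
Node and fragment are not part of Regular Fragmentations or of my
implementation), and the flag is named stop because break is a
reserved word in Python.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    pattern: str
    produce: list        # names of the elements to generate
    stop: bool = False   # stands in for the proposed break="true"


@dataclass
class Node:
    name: str
    text: str
    children: list = field(default_factory=list)


def fragment(node, rules):
    """Fragment node with the rule registered for its name, recursively."""
    rule = rules.get(node.name)
    if rule is None:
        return node
    match = re.match(rule.pattern, node.text)
    if match is None:
        return node
    node.children = [Node(n, g) for n, g in zip(rule.produce, match.groups())]
    node.text = ""
    # Recurse into the generated nodes unless the rule asks to stop here.
    if not rule.stop:
        for child in node.children:
            fragment(child, rules)
    return node
```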
7) Attribute fragmentation
I have implemented attribute fragmentation while staying as close as
possible to the original idea of using the same mechanism, even though
the semantics are slightly different; the result is consistent, though
not always deterministic.
The two major issues with fragmenting attributes are that the result
of the fragmentation cannot be kept in the attribute (at least not in
the general case), since attributes are not structured, and that the
order of attributes is not meaningful.
Since the result cannot be kept in the attribute, it is placed in the
"hosting" element, and if the result is serialized as elements or
characters, the relative order in which the fragmentations of two or
more attributes of the same element are serialized cannot be
guaranteed.
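The ordering problem can be illustrated with Python's
xml.etree.ElementTree. In this sketch (fragment_attributes and the
rules dictionary are hypothetical), the fragments of each attribute are
appended as child elements of the hosting element in whatever order
element.attrib is iterated, which the XML data model does not treat as
significant.

```python
import re
import xml.etree.ElementTree as ET


def fragment_attributes(element, rules):
    """Append the fragments of selected attributes to the hosting element.

    rules: hypothetical dict mapping an attribute name to a pair
    (pattern, list of element names to produce).
    """
    for attr_name, value in list(element.attrib.items()):
        rule = rules.get(attr_name)
        if rule is None:
            continue
        pattern, produce = rule
        match = re.match(pattern, value)
        if match is None:
            continue
        # Children end up in attribute iteration order: not significant in XML.
        for name, text in zip(produce, match.groups()):
            ET.SubElement(element, name).text = text
    return element
```

Two elements that differ only in attribute order, and are therefore
equivalent XML, yield differently ordered children:

```python
rules = {"date": (r"(\d{4})-(\d{2})", ["year", "month"]),
         "time": (r"(\d{2}):(\d{2})", ["hour", "minute"])}
a = fragment_attributes(ET.Element("e", {"date": "2002-06", "time": "16:57"}), rules)
b = fragment_attributes(ET.Element("e", {"time": "16:57", "date": "2002-06"}), rules)
print([c.tag for c in a], [c.tag for c in b])
```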
8) Other node types (not implemented)
Currently, elements and attributes can be fragmented into elements,
attributes and text nodes. What about adding other node types (such as
processing instructions and comments) to the list?
Thanks for your feedback,
Eric
--
See you in San Diego.
http://conferences.oreillynet.com/os2002/
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
http://xsltunit.org http://4xt.org http://examplotron.org
------------------------------------------------------------------------