




   Suggestion: Microparsing support in XML Schema (long)

  • From: "Anders W. Tell" <anderst@toolsmiths.se>
  • To: xml-dev@xml.org
  • Date: Wed, 10 May 2000 18:59:34 +0200

A phenomenon that surfaces now and then in the markup world is what
some authors call "micro-parsing". This is the situation where Schema
writers specify that an XML attribute should contain structured
information, which creates a need for customized parsers; hence the term.

Two examples are:
* XPath expressions in XSL:  match="/cars/car[@name='volvo']"
* Path data in SVG:  <path d="M 100 100 L 140 100 L 120 140 z"/>
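To make the term concrete, here is a minimal Python sketch of a micro-parser for the SVG example. It assumes only the absolute M and L commands plus z/Z, with space-separated numbers; real SVG path data allows far more. The element names moveto, lineto and closepath are hypothetical, since SVG defines no element form for path data.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for an "element form" of SVG path data.
ELEMENT_FOR = {"M": "moveto", "L": "lineto", "Z": "closepath", "z": "closepath"}

def parse_path(d: str) -> ET.Element:
    """Micro-parse a restricted SVG path string into an element tree."""
    path = ET.Element("path")
    tokens = d.split()
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in ELEMENT_FOR:
            raise ValueError(f"unsupported path command {cmd!r}")
        segment = ET.SubElement(path, ELEMENT_FOR[cmd])
        i += 1
        if cmd in ("M", "L"):
            x, y = tokens[i:i + 2]  # ValueError if coordinates are missing
            for value in (x, y):
                float(value)  # raises ValueError for non-numeric coordinates
            segment.set("x", x)
            segment.set("y", y)
            i += 2
    return path

tree = parse_path("M 100 100 L 140 100 L 120 140 z")
print(ET.tostring(tree, encoding="unicode"))
```

Every application that consumes the d attribute has to carry a parser like this one, which is exactly the cost the term "micro-parsing" refers to.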

Is this not a paradox? A markup language that can no longer be used
for markup?
Of course, every markup language has its limits, and maybe XML's limit
has been reached.

What are the reasons for encoding complex information in a single
attribute? The reasons I have seen so far are:
* compression: produces smaller XML streams (SVG paths, ...)
* readability of attribute strings (XPath expressions, ...)
* compactness of attribute strings (XPath expressions, ...)
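As a rough check of the compression argument, the Python sketch below compares the SVG example against one possible element form of the same path. The element form and its element names are hypothetical; only the attribute string comes from the SVG example.

```python
import xml.etree.ElementTree as ET

# Attribute form, taken from the SVG example.
attr_form = '<path d="M 100 100 L 140 100 L 120 140 z"/>'

# One possible element form of the same information (hypothetical names).
elem_form = ('<path><moveto x="100" y="100"/><lineto x="140" y="100"/>'
             '<lineto x="120" y="140"/><closepath/></path>')

# Both encodings must be well-formed XML; fromstring raises otherwise.
ET.fromstring(attr_form)
ET.fromstring(elem_form)

attr_bytes = len(attr_form.encode("utf-8"))
elem_bytes = len(elem_form.encode("utf-8"))
# prints: attribute form: 43 bytes, element form: 100 bytes
print(f"attribute form: {attr_bytes} bytes, element form: {elem_bytes} bytes")
```

The ratio depends heavily on the chosen element names, so this only illustrates the direction of the effect, not a general figure.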

The following suggestions are an attempt to "internalize" these
encodings, capturing as much of the encoding information as possible
inside XML instead of relying on externally created and managed
documentation.

Another side effect of the proposal is that it becomes possible to have
DOM access to structured attributes as if they were encoded as XML
elements.

For Grove enthusiasts, it is also possible (with a little effort ;))
to view attributes as hierarchical nodes.

So here goes...

- - - - - - - - - - - - - - - - - - -
First, a few short definitions:

* Encoding "Stereotype" <=> something that should be encoded; it is
  defined by an information model, which may be defined in terms of
  one or more information items (nodes/properties, ...).

* Encoding "Form" <=> principles for how the nodes/properties of an
  information model must be encoded as strings or XML elements.
  (The following suggestion implies two forms: one for attribute
  encoding and one for XML element encoding.)

* "Attribute-Micro-Parser" <=> a software artifact that encodes and
  decodes XML attribute strings to/from XML elements.
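The three definitions can be illustrated with a small Python sketch. It assumes a hypothetical "point" stereotype whose information model has two items, x and y; the "x y" attribute form and the <point><x/><y/></point> element form are invented for illustration. The micro-parser converts between the two forms by going through the information model.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Stereotype: a hypothetical 2-D point; its information model has two items.
@dataclass(frozen=True)
class Point:
    x: int
    y: int

# Attribute form: both items packed into one attribute string, "x y".
def to_attribute(p: Point) -> str:
    return f"{p.x} {p.y}"

def from_attribute(s: str) -> Point:
    x, y = s.split()
    return Point(int(x), int(y))

# Element form: each information item becomes its own child element.
def to_element(p: Point) -> ET.Element:
    e = ET.Element("point")
    ET.SubElement(e, "x").text = str(p.x)
    ET.SubElement(e, "y").text = str(p.y)
    return e

def from_element(e: ET.Element) -> Point:
    return Point(int(e.findtext("x")), int(e.findtext("y")))

# Attribute-Micro-Parser: attribute string <-> element tree via the model.
def decode(attr_value: str) -> ET.Element:
    return to_element(from_attribute(attr_value))

def encode(element: ET.Element) -> str:
    return to_attribute(from_element(element))

tree = decode("100 140")
print(ET.tostring(tree, encoding="unicode"), encode(tree))
```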

- - - - - - - - - - - - - - - - - - -
* Add a new XML Schema data type that represents a "MicroParsed" attribute.
  Make it a subtype of "string" with all its facets.
  Schema writers can then derive their own MicroParsed data types, one
  for each stereotype they want to encode as an attribute.

* In this new data type, add a reference to a complexType. This
  referenced schema defines how to encode the contents (information
  model) of the attribute string ("attribute form") as an XML element
  tree ("element form").
  Note: Maybe this reference should be a new facet for the string data type.
  Note: With this design, the same stereotype can be encoded in documents
  either as an XML attribute string or as an XML element tree.

* In this new data type, add a reference to the attribute's "form",
  i.e. where to find more information on how to construct attribute
  strings from the underlying information model.
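The data type items above can be modelled as a small descriptor. This Python sketch is hypothetical throughout: the class name, field names, the complexType QName and the form URI are invented, and only two inherited string facets (pattern and maxLength) are modelled. XML Schema patterns are implicitly anchored to the whole value, which is why re.fullmatch is used.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MicroParsedType:
    name: str                         # name of the derived data type
    element_form: str                 # QName of the complexType for the element form
    attribute_form: str               # URI describing the attribute form
    pattern: Optional[str] = None     # inherited "string" facet
    max_length: Optional[int] = None  # inherited "string" facet
    base: str = "string"              # MicroParsed types derive from string

    def is_valid(self, value: str) -> bool:
        """Check a lexical value against the inherited string facets."""
        if self.max_length is not None and len(value) > self.max_length:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        return True

# One token of a restricted SVG path: a command letter or a number.
TOKEN = r"(?:[MLZz]|-?\d+(?:\.\d+)?)"

svg_path = MicroParsedType(
    name="PathData",
    element_form="ex:PathElements",                  # hypothetical complexType
    attribute_form="http://example.org/forms/path",  # hypothetical URI
    pattern=TOKEN + "(?: " + TOKEN + ")*",
)
print(svg_path.is_valid("M 100 100 L 140 100 L 120 140 z"))
print(svg_path.is_valid("M 100 foo"))
```

Keeping the facets means a validator that knows nothing about micro-parsing can still check the attribute string lexically, while a micro-parser aware tool can follow element_form to the richer structure.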

* All available information in the stereotype's information model MUST
  be encoded in the "element form", and the information encoded in the
  "attribute form" MUST be a "subset" of the information encoded in the
  element form (similar to applying a grove plan before encoding).
  The "element form" is considered a "complete" encoding form
  (it contains all information in the information model).
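The subset requirement can be illustrated with a Python sketch, assuming a point stereotype whose element form carries an extra, hypothetical <label> item that the "x y" attribute form cannot express. Encoding to the attribute form then acts as a projection: an attribute-to-element-to-attribute round trip is lossless, while an element-to-attribute-to-element round trip keeps only a subset of the items.

```python
import xml.etree.ElementTree as ET

def element_to_attribute(e: ET.Element) -> str:
    """Project the element form onto the attribute form: only x and y survive.
    Any other item, such as the hypothetical <label>, is dropped."""
    return f"{e.findtext('x')} {e.findtext('y')}"

def attribute_to_element(s: str) -> ET.Element:
    x, y = s.split()
    e = ET.Element("point")
    ET.SubElement(e, "x").text = x
    ET.SubElement(e, "y").text = y
    return e

full = ET.fromstring("<point><x>100</x><y>140</y><label>apex</label></point>")
attr = element_to_attribute(full)
back = attribute_to_element(attr)

# attribute -> element -> attribute is lossless ...
print(element_to_attribute(attribute_to_element(attr)) == attr)
# ... while element -> attribute -> element keeps only a subset of the items.
print(sorted(c.tag for c in back), sorted(c.tag for c in full))
```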

* Information set:
  Add an extra optional property to the attribute information item:
  property: "parsed"  sequence<element-info-item[zero or one]>
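A Python sketch of the proposed infoset extension follows. It models an attribute information item as a dataclass with an optional parsed property holding zero or one element. The point parser and the registry keyed by attribute name are hypothetical simplifications; under the proposal the parser would presumably be selected through the attribute's MicroParsed schema type instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class AttributeInfoItem:
    local_name: str
    normalized_value: str
    # Proposed optional property "parsed": zero or one element
    # information item holding the element form of the value.
    parsed: Optional[ET.Element] = None

def parse_point(value: str) -> ET.Element:
    """Hypothetical micro-parser for an "x y" point attribute."""
    x, y = value.split()
    point = ET.Element("point")
    ET.SubElement(point, "x").text = x
    ET.SubElement(point, "y").text = y
    return point

def annotate(item: AttributeInfoItem,
             parsers: Dict[str, Callable[[str], ET.Element]]) -> AttributeInfoItem:
    """Fill in "parsed" when a micro-parser is registered for the attribute."""
    parser = parsers.get(item.local_name)
    if parser is not None:
        item.parsed = parser(item.normalized_value)
    return item

parsers = {"at": parse_point}
at = annotate(AttributeInfoItem("at", "100 140"), parsers)
ident = annotate(AttributeInfoItem("id", "m1"), parsers)
print(at.parsed is not None, ident.parsed is None)
```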

* Recommend that all Schema authors first create an information model
  for the stereotype, then create the encoding "form" for the primary
  XML element encoding, and finally the form for the corresponding XML
  attribute strings.

* DOM framework
  Create a new software artifact called "DOMAttributeMicroParser":

interface DOMAttributeMicroParser {
    readonly attribute string  name;
    readonly attribute string  namespace;

    /* Parse the attribute string and create the corresponding element tree. */
    long  parse(in DOMAttribute from, out DOMElement to);

    /* Traverse the element tree and create the corresponding attribute string. */
    long  construct(in DOMElement from, out DOMAttribute to);
};
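As a rough illustration of the interface above, here is a Python sketch written against the standard xml.dom.minidom module. The class name, the "x y" point stereotype and the namespace URI are hypothetical. Instead of IDL out parameters and a long status code, the sketch returns the result and raises ValueError on malformed input, which is the more idiomatic choice in Python.

```python
from xml.dom import minidom

class PointMicroParser:
    """Sketch of a DOMAttributeMicroParser for a hypothetical "x y" point."""

    name = "point"
    namespace = "http://example.org/ns/point"  # hypothetical namespace URI

    def __init__(self, doc: minidom.Document):
        self.doc = doc

    def parse(self, attr: minidom.Attr) -> minidom.Element:
        """Parse the attribute string and create the corresponding element tree."""
        parts = attr.value.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'x y', got {attr.value!r}")
        point = self.doc.createElementNS(self.namespace, "point")
        for tag, text in zip(("x", "y"), parts):
            child = self.doc.createElementNS(self.namespace, tag)
            child.appendChild(self.doc.createTextNode(text))
            point.appendChild(child)
        return point

    def construct(self, element: minidom.Element, attr_name: str) -> minidom.Attr:
        """Traverse the element tree and create the corresponding attribute."""
        values = []
        for tag in ("x", "y"):
            node = element.getElementsByTagNameNS(self.namespace, tag)[0]
            values.append(node.firstChild.data)
        attr = self.doc.createAttribute(attr_name)
        attr.value = " ".join(values)
        return attr

doc = minidom.Document()
marker = doc.createElement("marker")
doc.appendChild(marker)
marker.setAttribute("at", "100 140")

parser = PointMicroParser(doc)
tree = parser.parse(marker.getAttributeNode("at"))
print(tree.toxml())
rebuilt = parser.construct(tree, "at")
print(rebuilt.value)
```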

* DOM framework [Optional]
  Create a subclass of DOM Attribute called "DOMParsedAttribute":

interface DOMParsedAttribute : DOMAttribute {
    attribute DOMElement  fParsed;  /* parsed attribute */
};
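A Python approximation of DOMParsedAttribute is sketched below. Because minidom creates Attr objects itself, the sketch swaps the proposed subclass for a wrapper around a minidom Attr; its parsed property plays the role of fParsed, is computed lazily, and is re-parsed only when the attribute value changes. The point parser and the wrapper's name are hypothetical.

```python
from xml.dom import minidom

def parse_point(doc, attr):
    """Hypothetical micro-parser: "x y" -> <point><x>..</x><y>..</y></point>."""
    x, y = attr.value.split()
    point = doc.createElement("point")
    for tag, text in (("x", x), ("y", y)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(text))
        point.appendChild(child)
    return point

class ParsedAttribute:
    """Wraps a minidom Attr and exposes its element form as `parsed`
    (the proposal's fParsed), parsed lazily and cached."""

    def __init__(self, attr, parse):
        self.attr = attr
        self._parse = parse    # callable: Attr -> Element
        self._parsed = None
        self._source = None    # attribute value the cache was built from

    @property
    def parsed(self):
        # Re-parse only when the attribute value has changed since last time.
        if self._parsed is None or self._source != self.attr.value:
            self._parsed = self._parse(self.attr)
            self._source = self.attr.value
        return self._parsed

doc = minidom.Document()
marker = doc.createElement("marker")
doc.appendChild(marker)
marker.setAttribute("at", "100 140")

pa = ParsedAttribute(marker.getAttributeNode("at"), lambda a: parse_point(doc, a))
print(pa.parsed.getElementsByTagName("x")[0].firstChild.data)
```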

All comments are welcome.

Best Regards
Anders W. Tell
/  Financial Toolsmiths AB  /
/  Anders W. Tell           /

This is xml-dev, the mailing list for XML developers.



Copyright 2001 XML.org. This site is hosted by OASIS