XML.org
XML is a space

Hello devs,
 
occasionally I say something along these lines: the sum total of accessible XML resources is a continuous space of information, which we can navigate and process in a unified way.
 
Awareness of this space may have a significant impact on our decisions whether or not to use XML, and how to use it; in fact it may have an impact on what the term "XML" means to us. Common awareness of this space might have a marked impact on the role of XML within IT, and it might even change mainstream approaches to various problems.
 
Alas, there is no common awareness of this phenomenon of an info space created by XML. Typical reactions to statements about XML creating a space of information range from scepticism through disparagement to ridicule and hostility. Obviously, those statements do not match the practical experience of most people who speak about XML. Therefore I invite anybody to whom the space is not obvious to contemplate a few small code examples which may help to explain why I speak of a space.
 
[
If anybody is interested in trying out the code, this can be done, for example, by installing BaseX, an excellent XQuery processor with full support for XQuery 3.0 and the XQuery Update Facility. Download and installation take less than a minute of your time. Go here:
   http://files.basex.org/releases/latest/
download the latest exe, double click and press "Exec".
]
 
Here we go. To start with, let me share a little idiom which allows you to process any number of XML resources simultaneously with the same ease as you would process a single document. If you create a catalog of XML document references like this:
 
<docs>
  <doc href=""/>
  <doc href=""/>
  <doc href=""/>
   ...
</docs>
 
and pass the catalog to your XQuery or XSLT as the context node (option -i in BaseX, -s: in Saxon), you can access the sum total of referenced documents as conveniently as you would access a single document. See PS for a five-line XQuery creating such catalogs automatically, taking as input a root directory name and a file name pattern.
 
For our examples I created a catalog of NIEM XSDs:
   basex -b dir=niem-2.1 -b files=*.xsd writeCat.xq > cat.xml
 
Now let us explore and process those 123 XSDs. In general, having started a query with this line:
 
   let $space := //@href/doc(resolve-uri(., base-uri(..)))
 
we can access the forest of referenced XML nodes by letting XPath expressions start with $space, like this:
 
   $space//foo
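 
As a minimal sketch of what the $space idiom gives us, the following query merely reports the size of the space. It assumes nothing beyond what was said above: the catalog is the context node, and every href resolves to a well-formed XML document.

```xquery
(: Sketch only: assumes the catalog document is the context item. :)
let $space := //@href/doc(resolve-uri(., base-uri(..)))
return (
    concat('documents: ', count($space)),
    concat('elements:  ', count($space//*))
)
```

The point is that count($space//*) is written exactly as it would be for a single document; the number of documents behind $space does not show up in the expression.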
 
Ok. Suppose we want to view the vocabulary established by those schema documents. For simplicity's sake, we content ourselves with producing a sorted list of element names, like this:
 
_Curve
_CurveSegment
_GeometricPrimitive
...
AbstractContinuousCoverage
AbstractCoordinateOperation
AbstractCoordinateSystem
...
XDescriptionText
xmlContent
XValue
YDescriptionText
Year
YearMonth
YValue
 
This is achieved by the following five-liner:
 
= = = = = = = = = = =
let $space := //@href/doc(resolve-uri(., base-uri(..))) return
string-join(
    for $name in distinct-values($space//xs:element/@name)
    order by lower-case($name) return $name
, '&#xA;')
= = = = = = = = = = =
 
It can be called like this:
   basex -s method=text -i cat.xml getElemNames.xq > names.txt
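 
A variation on the same theme, offered as a sketch rather than as part of the original set of queries: list each element name that is declared in more than one document of the space, together with the URIs of the declaring documents. Again the only assumption is that the catalog is the context node.

```xquery
(: Sketch: element names declared in more than one schema document. :)
let $space := //@href/doc(resolve-uri(., base-uri(..)))
return string-join(
    for $name in distinct-values($space//xs:element/@name)
    (: all documents of the space containing a declaration with this name :)
    let $docs := $space[.//xs:element/@name = $name]
    where count($docs) gt 1
    order by lower-case($name)
    return concat($name, ': ', string-join($docs/string(base-uri(.)), ', '))
, '&#xA;')
```

The filter on $space is evaluated once per name, which is simple rather than fast; for a few hundred schema documents that should hardly matter.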
 
Now we want to check data quality. We are interested in enumeration values whose whitespace content is not normalized, e.g. because of trailing blanks. We catch those black sheep with a two-liner:
 
= = = = = = = = = = =
let $space := //@href/doc(resolve-uri(., base-uri(..))) return
string-join(distinct-values($space//xs:enumeration/@value[not(. eq normalize-space(.))]), '&#xA;')
= = = = = = = = = = =
 
The result is two enumeration values with a single trailing blank: 'true ' and 'false '. How do we correct our data? BaseX supports the XQuery Update Facility, so the following three-liner does the job:
 
= = = = = = = = = = =
let $space := //@href/doc(resolve-uri(., base-uri(..))) return
for $v in $space//xs:enumeration/@value[not(. eq normalize-space(.))]
return replace value of node $v with normalize-space($v)
= = = = = = = = = = =
 
When calling the query, don't forget to set the -u option of BaseX, which writes the updates back to the original documents. This is the right way to do it:
   basex -u -i cat.xml correctFaultyEnums.xq
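 
A read-only variant of the same selection can be run before the update, to see which documents are affected, or afterwards, to confirm that nothing is left. This is only a sketch under the same assumption as before (the catalog is the context node); the square brackets are there to make trailing blanks visible.

```xquery
(: Sketch: report each non-normalized enumeration value with its document URI. :)
let $space := //@href/doc(resolve-uri(., base-uri(..)))
for $v in $space//xs:enumeration/@value[not(. eq normalize-space(.))]
return concat(base-uri($v), ': [', $v, ']')
```

After a successful update the query should return the empty sequence.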
 
Let's assume we want to get rid of the annotations, as they make it difficult to read the structures. You can do it with the following query:
 
= = = = = = = = = = =
let $space := //@href/doc(resolve-uri(., base-uri(..))) return
delete node $space//xs:annotation
= = = = = = = = = = =
 
Finally, let's do a search. The following query yields all enumeration types which contain an enumeration value matching a supplied regex; the display yields for each enumeration value a concatenation of value and annotation. This is the query:
 
= = = = = = = = = = =
declare variable $pattern external;
let $space := //@href/doc(resolve-uri(., base-uri(..))) return
string-join(
   $space//xs:simpleType[.//xs:enumeration[matches(., $pattern, 'i')]]!
   (@name, .//xs:enumeration/(concat('   ', @value, .//xs:documentation/concat('  #  ', .))), '')
, '&#xA;')
= = = = = = = = = = =
 
and this is how it may be called:
   basex -s method=text -i cat.xml -b pattern=.*shipment.* getEnums.xq > h.txt
 
These examples may suffice to convey an impression of the coding experience on which talk of a "space" is based. The homogeneity of the resources used in the examples (all XSDs) is of course irrelevant to the experience as such; a homogeneous set of documents was chosen only to enable minimalistic, self-explanatory examples.
 
Finally, I invite you to a thought experiment. Imagine an environment using 123 relational databases. Further imagine that the database definitions were provided by XML files with an ad hoc vocabulary allowing for straightforward transformation into SQL scripts. It is obvious that the sum total of database definitions could be monitored and analyzed with amazing simplicity, using sweeping XPath expressions, in a way similar to how we processed 123 XSDs. And I think that such monitoring and evaluation would be much more difficult to achieve if the database definitions were only available as SQL scripts. What this thought experiment underlines is the difference between information integrated into the info space and information not integrated. Contrary to common belief, it is not irrelevant whether XML or any other syntax is used, as (currently!) only XML provides integration into the info space.
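 
To make the thought experiment a little more concrete, here is a sketch of such a "sweeping" query. The vocabulary is entirely hypothetical and invented for illustration: a database element with a name attribute, containing table elements, which in turn contain column elements carrying an optional primaryKey="true" attribute. The catalog is assumed to reference the database definition files instead of the XSDs. The query lists every table, across all databases, that has no primary key column.

```xquery
(: Sketch over a HYPOTHETICAL vocabulary:
   <database name="..."><table name="..."><column name="..." primaryKey="true"/>...</table></database> :)
let $space := //@href/doc(resolve-uri(., base-uri(..)))
return string-join(
    for $table in $space//table[not(column/@primaryKey = 'true')]
    order by $table/../@name, $table/@name
    return concat($table/../@name, '.', $table/@name)
, '&#xA;')
```

Whatever the real vocabulary looked like, the shape of the query would stay the same: one line to establish the space, one path expression to sweep it.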
 
Hans-Juergen
 
 
PS:
Query 'writeCat.xq' creates an XML catalog referencing all XML documents found in or under a given root directory and matching a given file name pattern. This is the query text:
 
= = = = = = = = = = =
declare variable $dir external;
declare variable $files external;
<docs>{
   file:list($dir, true(), $files) ! <doc href="{concat($dir, '/', replace(., '\\', '/'))}"/>
}</docs>
= = = = = = = = = = =
 
Example call:
   basex -b dir=niem-2.1 -b files=*.xsd writeCat.xq > cat.xml
 
Example output:
<docs>
  <doc href=""/>
  <doc href=""/>
   ...
</docs>



Copyright 1993-2007 XML.org. This site is hosted by OASIS