OASIS Mailing List Archives





   RE: [xml-dev] Doc vs. Data


If you want to transform XML, use XSLT. Doing so with a parser, be it pull- or push-based, seems to be the hard way to crack that nut, IMHO.
I've mentioned many times on the "No typing in XPath/XSLT 2.0" permathread that one is not required to know or use XML Schema when working on untyped XML in XPath/XSLT 2.0. However, it is true that XPath is now strongly typed rather than weakly typed, which is probably a rude shock for users of XPath/XSLT 1.0.


From: G. Ken Holman [mailto:gkholman@CraneSoftwrights.com]
Sent: Fri 6/6/2003 12:24 PM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] Doc vs. Data

At 2003-06-06 11:49 -0700, Dare Obasanjo wrote:
>I'm not sure I agree with your comments on pull vs. push based processing.

Consider the transformation of ...

   <para>Does this <emph>really</emph> work?</para>

... to ...

   <p>Does this <b>really</b> work?</p>

... in a pull-oriented environment ... I'm not convinced it can be done
easily, if at all, in a way that could be easily maintained.
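Dare's recommendation to use XSLT rests on the push, template-rule model, where each element is handled by a rule and the surrounding mixed content is carried along automatically. As a rough analogy only (this is not XSLT), here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the `RENAME` table and `apply_templates` function are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source element name -> output element name,
# loosely analogous to XSLT template rules (match="para" -> <p>, etc.).
RENAME = {"para": "p", "emph": "b"}


def apply_templates(src: ET.Element) -> ET.Element:
    """Rebuild src recursively, renaming elements according to RENAME.

    Mixed content survives because ElementTree stores the text before the
    first child in .text and the text after each child in that child's .tail.
    """
    out = ET.Element(RENAME.get(src.tag, src.tag), src.attrib)
    out.text = src.text
    for child in src:
        new_child = apply_templates(child)
        new_child.tail = child.tail  # text following the child stays in place
        out.append(new_child)
    return out


source = ET.fromstring("<para>Does this <emph>really</emph> work?</para>")
result = apply_templates(source)
print(ET.tostring(result, encoding="unicode"))
# prints: <p>Does this <b>really</b> work?</p>
```

The recursion mirrors the shape of the document rather than a fixed reading order, which is the property that a pull-oriented loop has to reconstruct by hand.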

>I do agree that mixed content is probably the most relevant
>differentiator between data-centric & document-centric uses of XML.

Indeed ... I would posit that data-centric structures are rigid
constructions deployed and employed by programs expecting to position and
find information in expected places (an invoice), whereas document-centric
structures are free-form collections and mixes of information at the whim
of imagination and free expression (a book).
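The distinction can be made concrete: an element has mixed content when it holds both child elements and non-whitespace character data. The following is a minimal Python sketch using `xml.etree.ElementTree`; the function names and the sample invoice markup are hypothetical, and treating whitespace-only text as insignificant is an assumption (XML alone does not say so without a DTD or schema).

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem: ET.Element) -> bool:
    """Return True if elem has both child elements and non-whitespace text.

    ElementTree keeps character data in .text (before the first child) and
    in each child's .tail (after that child), so those are the places to look.
    """
    children = list(elem)
    if not children:
        return False  # text-only or empty: not mixed
    chunks = [elem.text] + [child.tail for child in children]
    # Assumption: whitespace-only chunks are ignorable formatting.
    return any(chunk and chunk.strip() for chunk in chunks)


def contains_mixed_content(root: ET.Element) -> bool:
    """True if root or any descendant has mixed content."""
    return any(has_mixed_content(e) for e in root.iter())


# Data-centric: rigid structure, every value in an expected place.
invoice = ET.fromstring(
    "<invoice><number>1234</number>"
    "<total currency='CAD'>99.95</total></invoice>"
)
# Document-centric: text and markup interleaved freely.
para = ET.fromstring("<para>Does this <emph>really</emph> work?</para>")

print(contains_mixed_content(invoice))  # False
print(contains_mixed_content(para))     # True
```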

>However I don't see where this leads to the position that there is
>antagonism between the folks that tend to create XML documents that
>contain mixed content and those that don't.

My hackles were raised when XPath 2 was based on W3C Schema, thus requiring
me not to look at my document as text but to look at it as typed data ...
when I don't want to deal with typed data.  I type text; I want to process
text.

XPath 1 is based on XML and Namespaces ... full stop ... and I'd like to be
able to have a fully conformant flavour of XPath 2 that is based *strictly*
on XML and Namespaces with any new features over XPath 1 that do not go
beyond just XML and Namespaces.  That way I can have an XSLT 2 that is
based *strictly* on XML and Namespaces such that I can continue to do my
text processing *as text processing* knowing exactly what I have in my XML
because I can see it in my XML and the XML Information Set ... and not be
forced to look at my documents as typed data and have to second guess what
my XSLT processor is going to see in the Post Schema Validation Infoset.
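A toy Python illustration of the "second guess" problem, assuming a schema that declares `<qty>` as an integer type. This is not XPath 2.0 and not a real schema processor: the `ASSUMED_TYPES` mapping is a hypothetical stand-in for a schema declaration. The analogous effect in XPath 2.0 is that an element validated as `xs:integer` atomizes to the integer 7, while the text you see in the markup is "007".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema declaring <qty> as an integer;
# a real typed view would come from a schema-aware validator, not a dict.
ASSUMED_TYPES = {"qty": int}

doc = ET.fromstring("<order><qty>007</qty><note>007</note></order>")


def text_view(elem: ET.Element) -> str:
    """What you see in the markup: the concatenated character data."""
    return "".join(elem.itertext())


def typed_view(elem: ET.Element):
    """What a type-aware processor might see: the text cast to a declared type."""
    cast = ASSUMED_TYPES.get(elem.tag, str)  # undeclared elements stay strings
    return cast(text_view(elem))


for elem in doc:
    print(elem.tag, repr(text_view(elem)), repr(typed_view(elem)))
# qty '007' 7
# note '007' '007'
```

Identical character data yields different values depending on a declaration that is not visible in the document itself.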

I confess to being one of those who feel antagonized when the type-based
approach to looking at information marked up in XML is being forced down my
throat without any accommodation for the way I've always worked with XML as
merely marked-up text.

I posit the documentation-oriented people in our community, and I include
myself, have been disenfranchised by W3C Schema and XPath 2.

But then I'm repeating the public arguments of myself and many others over
the months when the realization came to light of what was being done in the
family of W3C recommendations.

In my own work all of my DTDs have been converted to RELAX-NG and I have
*yet* to feel the need in any of my text work to embrace W3C Schema from
the perspective of the best way to solve my text processing requirements.

But ... now the vendors are forcing W3C Schema down my throat by refusing
to acknowledge the features of other document modeling approaches that
might have better features for text processing.  I have yet to weigh the
success of the Microsoft Word use of W3C Schema for text processing, though
I acknowledge and applaud the applicability of W3C Schema for Microsoft
Excel (though I've seen some people do an awful lot of text processing
*inside* Excel so I'm not sure what problems they are going to have,
perhaps my enthusiasm is misplaced).

While I have no doubt Word will support W3C Schema as a text processing
application, I wonder at the complexities Word will have to go through to
do so cleanly and whether that resulting text is easily processable by
other applications on other platforms (an objective of XML), or whether it
will merely be a complex application of a type-based system so as not to
break when round-tripping through a markup expression.

Would you consider that perhaps the data-oriented objectives of using XML
are merely to get information from point a to point b, regardless of its
expression, such that the internal representations on platform a and
platform b do not lose information but do preserve fidelity between them?

By contrast, one might consider that the document-oriented objectives of
XML are to express the content rigorously and completely such that it can
be used arbitrarily, in any way a recipient may wish to interpret the
information.  One of those many ways (and perhaps the least important)
might be fidelity, but far and away the more important objective is for
the recipient to be able to rearrange, add formatting to, present,
emphasize in different ways, and interpret the information.

Had both Microsoft Word and Microsoft Excel been based on RELAX-NG, then
one would have had a consistently applied technology that accommodates the
different uses of XML.

And perhaps that is the main point triggered by your question to the list
today.  You've been echoing that document-oriented and data-oriented XML
are not different, but I would say that document-oriented processing and
data-oriented processing of XML are quite different.  If I'm being forced to
use data-oriented processing tools to work with my document-oriented
requirements, to the extent that I'm *losing* the document-oriented ability
to look at the information in my documents as simple text, that is a *very*
difficult pill to swallow.

Now it sounds like I'm whining/whinging ... sorry ... but you struck a
nerve by belittling the difference between the two.

I hope this is considered constructive.

...................... Ken

Upcoming hands-on courses: (registration still open!)
-      (XSLT/XPath and/or XSL-FO) North America: June 16-20, 2003

G. Ken Holman                mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.         http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0   +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                      Definitive XSLT and XPath
ISBN 0-13-140374-5                              Definitive XSL-FO
ISBN 1-894049-08-X  Practical Transformation Using XSLT and XPath
ISBN 1-894049-11-X              Practical Formatting Using XSL-FO
Member of the XML Guild of Practitioners:    http://XMLGuild.info
Male Breast Cancer Awareness http://www.CraneSoftwrights.com/x/bc

The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/




Copyright 2001 XML.org. This site is hosted by OASIS