   RE: [xml-dev] Doc vs. Data


At 2003-06-06 11:49 -0700, Dare Obasanjo wrote:
>I'm not sure I agree with your comments on pull vs. push based processing.

Consider the transformation of ...

   <para>Does this <emph>really</emph> work?</para>

... to ...

   <p>Does this <b>really</b> work?</p>

... in a pull-oriented environment ... I'm not convinced it can be done 
easily, if at all, in a way that could be easily maintained.
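
For illustration only, here is a minimal sketch of the push-oriented 
approach to that transformation, written in Python rather than XSLT.  The 
RULES table and the apply_templates function are hypothetical names chosen 
for this sketch; they loosely imitate template rules and are not the API of 
any XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source element name -> result element name.
RULES = {"para": "p", "emph": "b"}

def apply_templates(node):
    """Push-style: map the node by its rule, then process whatever
    children it happens to have, in document order."""
    out = ET.Element(RULES.get(node.tag, node.tag))
    out.text = node.text              # text before the first child element
    for child in node:
        mapped = apply_templates(child)
        mapped.tail = child.tail      # text that follows the child element
        out.append(mapped)
    return out

source = ET.fromstring("<para>Does this <emph>really</emph> work?</para>")
result = ET.tostring(apply_templates(source), encoding="unicode")
print(result)
```

The input drives the processing here: the code never asks where an emph 
might occur, it simply reacts to whichever nodes arrive, so mixed content 
needs no special handling.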

>I do agree that mixed content is probably the most relevant 
>differentiator between data-centric & document-centric uses of XML.

Indeed ... I would posit that data-centric structures are rigid 
constructions deployed and employed by programs expecting to position and 
find information in expected places (an invoice), where document-centric 
structures are free-form collections and mixes of information at the whim 
of imagination and free expression (a book).
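
As a rough sketch of that contrast, the Python fragment below reads a value 
from an expected place in a data-centric structure, then shows that the 
same habit recovers only part of a mixed-content paragraph.  The invoice 
markup and its element names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Data-centric: values sit in expected places (hypothetical invoice markup).
invoice = ET.fromstring(
    "<invoice><number>42</number><total>19.95</total></invoice>"
)
total = float(invoice.findtext("total"))

# Document-centric: text and elements interleave freely (mixed content).
para = ET.fromstring("<para>Does this <emph>really</emph> work?</para>")
first_text_only = para.text            # only the text before <emph>
full_text = "".join(para.itertext())   # all text, in document order

print(total)
print(repr(first_text_only))
print(repr(full_text))
```

Looking in a fixed position works for the invoice, while the paragraph has 
to be walked in document order to see all of its text.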

>However I don't see where this leads to the position that there is 
>antagonism between the folks that tend to create XML documents that 
>contain mixed content and those that don't.

My hackles were raised when XPath 2 was based on W3C Schema thus requiring 
me not to look at my document as text but to look at it as typed data ... 
when I don't want to deal with typed data.  I type text, I want to process 

XPath 1 is based on XML and Namespaces ... full stop ... and I'd like to be 
able to have a fully conformant flavour of XPath 2 that is based *strictly* 
on XML and Namespaces with any new features over XPath 1 that do not go 
beyond just XML and Namespaces.  That way I can have an XSLT 2 that is 
based *strictly* on XML and Namespaces such that I can continue to do my 
text processing *as text processing* knowing exactly what I have in my XML 
because I can see it in my XML and the XML Information Set ... and not be 
forced to look at my documents as typed data and have to second guess what 
my XSLT processor is going to see in the Post Schema Validation Infoset.

I confess to being one of those who feels antagonized when the type-based 
approach to looking at information marked up in XML is being forced down my 
throat without any accommodation for the way I've always worked with XML as 
merely marked-up text.

I posit the documentation-oriented people in our community, and I include 
myself, have been disenfranchised by W3C Schema and XPath 2.

But then I'm repeating the public arguments of myself and many others over 
the months when the realization came to light of what was being done in the 
family of W3C recommendations.

In my own work all of my DTDs have been converted to RELAX-NG and I have 
*yet* to feel the need in any of my text work to embrace W3C Schema from 
the perspective of the best way to solve my text processing requirements.

But ... now the vendors are forcing W3C Schema down my throat by refusing 
to acknowledge the features of other document modeling approaches that 
might have better features for text processing.  I have yet to weigh the 
success of the Microsoft Word use of W3C Schema for text processing, though 
I acknowledge and applaud the applicability of W3C Schema for Microsoft 
Excel (though I've seen some people do an awful lot of text processing 
*inside* Excel so I'm not sure what problems they are going to have, 
perhaps my enthusiasm is misplaced).

While I have no doubt Word will support W3C Schema as a text processing 
application, I wonder at the complexities Word will have to go through to 
do so cleanly and whether that resulting text is easily processable by 
other applications on other platforms (an objective of XML) or will it 
merely be a complex application of a type-based system so as to not break 
when round-tripping through a markup expression?

Would you consider that perhaps the data-oriented objectives of using XML 
are merely to get information from point a to point b, regardless of its 
expression, such that the internal representations on platform a and 
platform b do not lose information but do preserve fidelity between 

Whereas one might consider that the document-oriented objectives of XML are 
to express the content rigorously and completely such that it can be used 
arbitrarily in any way a recipient may wish to interpret the 
information.  Fidelity might be one of those many ways (and perhaps the 
least important), but far and away the more important objective is for the 
recipient to be able to rearrange, add formatting, present, emphasize in 
different ways, interpret, etc.

Had both Microsoft Word and Microsoft Excel been based on RELAX-NG, then 
one would have had a consistently applied technology that accommodates the 
different uses of XML.

And perhaps that is the main point triggered by your question to the list 
today.  You've been echoing that document-oriented and data-oriented XML 
are not different, but I would say that document-oriented processing and 
data-oriented processing of XML are quite different.  If I'm being forced to 
use data-oriented processing tools to work with my document-oriented 
requirements, to the extent that I'm *losing* the document-oriented ability 
to look at the information in my documents as simple text, that is a *very* 
difficult pill to swallow.

Now it sounds like I'm whining/whinging ... sorry ... but you struck a 
nerve by belittling the difference between the two.

I hope this is considered constructive.

...................... Ken

Upcoming hands-on courses: (registration still open!)
-      (XSLT/XPath and/or XSL-FO) North America: June 16-20, 2003

G. Ken Holman                mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.         http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0   +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                      Definitive XSLT and XPath
ISBN 0-13-140374-5                              Definitive XSL-FO
ISBN 1-894049-08-X  Practical Transformation Using XSLT and XPath
ISBN 1-894049-11-X              Practical Formatting Using XSL-FO
Member of the XML Guild of Practitioners:    http://XMLGuild.info
Male Breast Cancer Awareness http://www.CraneSoftwrights.com/x/bc

