RE: [xml-dev] Saxon and Sun Serializer problems?
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: <xml-dev@lists.xml.org>
- Date: Sat, 30 May 2009 19:26:16 -0400
Thank you for engaging me on these details, Jim.
At 2009-05-30 15:10 -0700, Jim Tivy wrote:
>Hi Ken
>
>I read what you said below. The gist seems to be:
>
>Why would you want to do this?
I'm sorry I didn't make myself clear. My gist was: what feature(s)
does having the DOCTYPE give you? Which is different. There are so
many other reasons why an XML document cannot be round-tripped
through XSLT, and just providing the DOCTYPE feature won't solve them. I
cited the lack of preservation of CDATA sections, the lack of
preservation of the entity references (which includes numeric
character references (not even resolved by a DOCTYPE), internal
parsed general entities, external parsed general entities), and there
are others including no link to NOTATION declarations for processing
instruction target de-referencing (a very sore point of mine that the
designers of XML processing interfaces have never felt the need to support).
So given so many features for round-tripping that are not there, just
putting in the DOCTYPE won't fix any of the ones I've cited.
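To make the losses listed above concrete, here is a minimal sketch in Python. It assumes the standard-library ElementTree parser and serializer as a stand-in for any tool that works on a node tree rather than on syntax; it is not the JAXP/Saxon setup discussed in this thread, and the sample document and the name `round_trip` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an internal entity, a numeric character
# reference, and a CDATA section, all declared under a DOCTYPE.
SOURCE = (
    '<!DOCTYPE doc [<!ENTITY co "Crane Softwrights">]>\n'
    '<doc><p>&co; &#169; 2009</p><code><![CDATA[a < b]]></code></doc>'
)


def round_trip(xml_text: str) -> str:
    """Parse the text into a node tree, then serialize that tree again."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The information survives; the DOCTYPE, the entity reference, the
    # character reference and the CDATA markup do not.
    print(round_trip(SOURCE))
```

Restoring only the DOCTYPE line in that output would still leave the entity reference, the character reference and the CDATA section unrecovered.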
>I should point out the "this" had to do with using SAX in java with the jaxp
>Identity Transform. However, I now extend it more tentatively to include
>the "no DocType in the XDM" problem.
Yes, I saw that. I was trying to figure out what you would gain from
the DOCTYPE when you still can't get the other things left out
of the infoset or XDM.
>To give you some context of what I am doing - my need is primarily pragmatic
>- I am a java programmer trying to get from A to B.
Fine ... I won't hold that against you. :{)}
>In an Xml content management system users use a variety of Xml processors
>(or programs if you would prefer) like diverse Xml Editors - XMetal, Epic,
>XmlMind and the content management systems that have file Store and Retrieve
>capabilities as well as link extract and other Xml processing needs. All of
>these parts "process" Xml.
Actually, they process XML syntax, they don't process the information
in an XML document. XSLT and XQuery were designed to build new
structures from the information in structured sources. They were not
designed to process the syntax of an XML document. XML editors, in
particular, are designed to process the syntax of an XML document,
and as we old (er, long-time) SGML'ers learned long ago, you can't
base an XML editor on an XML processor, just as you can't base
an SGML editor on an SGML processor.
Now the DOM *does* have a few features that process some (not all!)
of the syntax of an XML document, but the perspective is
different. In the DOM the input tree *is* the output tree, unlike
XSLT and XQuery where the input tree is read-only and the output tree
is write-only: created, from scratch, in a single pass, without
backtrack or repair or inspection.
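A small sketch of the "input tree *is* the output tree" perspective, assuming Python's standard-library minidom as a representative DOM implementation (the Java DOM in this thread may differ in detail). The sample document and the function name `edit_in_place` are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample with a comment and a CDATA section.
SOURCE = (
    '<doc status="draft"><!-- keep me -->'
    '<code><![CDATA[a < b]]></code></doc>'
)


def edit_in_place(xml_text: str) -> str:
    """Mutate the parsed tree itself, then serialize that same tree."""
    dom = minidom.parseString(xml_text)
    # The edit acts on the input tree; no second tree is constructed.
    dom.documentElement.setAttribute("status", "final")
    return dom.documentElement.toxml()


if __name__ == "__main__":
    print(edit_in_place(SOURCE))
```

Because the edited tree is the parsed tree, the syntax details this DOM happens to record (here the comment and the CDATA section) come out again on serialization. Details it does not record are still lost.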
>All of these parts rely on the DocType for
>validation or element insertion help or just need it to "round trip" the Xml
>so other processors can use that DocType. Without the DocType, the
>serialization looses some serious part of its capability.
Well now you've lost me again, because the limited number of
serialization features in XSLT/XQuery renders the information found
in the DOCTYPE quite irrelevant. The XSLT feature of adding a SYSTEM
identifier is there, as I see it, really only for the validation
bit, because what is serialized is the information that was used to
build the result tree ... not the syntax borrowed from the source tree.
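As a rough illustration of that point, the sketch below treats the DOCTYPE as a parameter handed to the serializer, loosely analogous to the doctype-system attribute of xsl:output. It assumes Python's ElementTree; the helper `serialize`, the element names and the file name `report.dtd` are all hypothetical, and the helper does no escaping of the system identifier.

```python
import xml.etree.ElementTree as ET


def serialize(root, doctype_system=None):
    """Serialize a result tree; any DOCTYPE comes from the caller alone."""
    body = ET.tostring(root, encoding="unicode")
    if doctype_system is None:
        return body
    # The declaration is produced from the parameter, not from any source.
    return f'<!DOCTYPE {root.tag} SYSTEM "{doctype_system}">\n{body}'


# A result tree built from scratch: there is no source syntax to borrow.
report = ET.Element("report")
ET.SubElement(report, "title").text = "Q2"

if __name__ == "__main__":
    print(serialize(report, doctype_system="report.dtd"))
```

Nothing in the tree itself carries the declaration; it exists only because the caller asked the serializer to write it.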
>Most of these parts operate on the serialization of the Xml from time to
>time. Editors read serializations, users import serializations -
>serializations are the standard way of exchanging and making xml processors
>interoperable.
Ummmmm .... I can't agree for anything other than XML editors, which
are XML syntax applications, not XML information
applications. XML-based applications are interoperable because the
XML processors all deliver the same content information to the
applications using them. And the decision by the designers of the DOM to
include syntax-related issues (note again, not all syntax-related
issues) can enable many aspects of input syntax preservation because
the DOM is acting *on* the document. XSLT and XQuery are not acting
*on* the document, they are acting on the information found in the document.
>Not being able to use powerful tools like XSLT and Sax to process Xml when
>"round" tripping of the serialization is required, is restrictive to say the
>least, as these technologies have their own strengths - eg: DOM is not XSLT
>is not SAX.
>
>Fortunately SAX is usable on Java - just make sure to use the Sun's Trax
>serializer which keeps the docType as the saxon one drops the docType. (see
>earlier post).
>
>Does this begin to motivate the reason why?
I hear what you are trying to say, and I had already interpreted the
need for syntax preservation to be to round trip the syntax of an XML
document, but I haven't yet heard a justification for adding the
DOCTYPE to XDM. Adding the DOCTYPE to XDM doesn't give you
round-tripping of an arbitrary XML document because so much more
would be needed. And all of it would be out of scope for XSLT/XQuery.
This comes up often in the classroom from students who thought XSLT
and XQuery could/should be used for XML document syntax
preservation. Because XSLT and XQuery are node-tree-transformation
tools and not XML syntax tools, they cannot be used for syntax
preservation. XSLT and XQuery are not angle-bracket processors, they
are node-tree processors. Serialization is not needed when the
processor is embedded in, say, an XSL-FO engine. Serialization is a
nice-to-have that allows one to create artefacts that can be useful
as input to other XML-based tools.
Consider source tree data projection: if I do an XSLT or XQuery
transformation on a source node tree created from a non-XML source,
what is the definition of the DOCTYPE? More to the point, what
information could have been put into a DOCTYPE, in interpreting
the projection, that would be useful in the node-tree
transformation? I claim there is no such information.
And I haven't found such information in the response that you've given.
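A sketch of such a projection, assuming Python's csv module and ElementTree. The CSV data, the element names and the function `project_csv` are invented for illustration and do not come from any particular XSLT or XQuery processor.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical non-XML source.
CSV_TEXT = "sku,qty\nA-1,3\nB-2,5\n"


def project_csv(text: str) -> ET.Element:
    """Project CSV records into a node tree; no XML syntax is involved."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return root


def total_quantity(root: ET.Element) -> int:
    """A stand-in 'transformation' that uses only the tree's information."""
    return sum(int(qty.text) for qty in root.iter("qty"))


if __name__ == "__main__":
    print(total_quantity(project_csv(CSV_TEXT)))
```

The tree is complete and usable for the computation, yet there was never a DOCTYPE to preserve, because there was never an XML document.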
Thank you again for trying to help me better understand what you
need. I really am trying to be supportive here to reveal what
specific features of DOCTYPE you will find helpful.
. . . . . . . . . . . . . . Ken
--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal