RE: [xml-dev] Saxon and Sun Serializer problems?

From: "Jim Tivy" <jimt@bluestream.com>
To: "'G. Ken Holman'" <gkholman@CraneSoftwrights.com>,<xml-dev@lists.xml.org>
Date: Sat, 30 May 2009 15:10:15 -0700
Hi Ken

I read what you said below.  The gist seems to be:

Why would you want to do this?

I should point out that the "this" had to do with using SAX in Java with the
JAXP Identity Transform.  However, I now extend it, more tentatively, to
include the "no DocType in the XDM" problem.

To give you some context on what I am doing: my need is primarily pragmatic.
I am a Java programmer trying to get from A to B.

In an XML content management system, users work with a variety of XML
processors (or programs, if you prefer): diverse XML editors such as XMetal,
Epic and XmlMind, and the content management systems themselves, which have
file Store and Retrieve capabilities as well as link extraction and other XML
processing needs.  All of these parts "process" XML.  All of these parts rely
on the DocType for validation or element insertion help, or just need it to
"round trip" the XML so that other processors can use that DocType.  Without
the DocType, the serialization loses a serious part of its capability.

Most of these parts operate on the serialization of the XML from time to
time.  Editors read serializations and users import serializations:
serializations are the standard way of exchanging XML and of making XML
processors interoperable.

Not being able to use powerful tools like XSLT and SAX to process XML when
round-tripping of the serialization is required is restrictive, to say the
least, as these technologies each have their own strengths, e.g. DOM is not
XSLT is not SAX.

Fortunately SAX is usable on Java: just make sure to use Sun's TrAX
serializer, which keeps the DocType, as the Saxon one drops the DocType (see
earlier post).
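
For readers who want to try this, here is a minimal sketch of a SAX-driven
identity transform through JAXP.  It is not the code from the earlier post;
the class name SaxIdentity, the sample document and the stubbed-out entity
resolver are assumptions made for illustration.  The parser's lexical events
(including startDTD) are routed to the serializer, and whether the DOCTYPE
actually appears in the output depends on which TransformerFactory
implementation is in use, so the sketch only reports what it sees.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxIdentity {
    /** Parses xml with SAX and serializes the events via an identity TransformerHandler. */
    static String roundTrip(String xml, TransformerFactory tf) throws Exception {
        // Assumes the factory supports SAX (true for the JDK's built-in factory).
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        TransformerHandler serializer = stf.newTransformerHandler(); // no stylesheet: identity
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Illustration only: answer every external entity (the DTD) with an
        // empty stream so the sketch runs without the real DTD files.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        reader.setContentHandler(serializer);
        reader.setDTDHandler(serializer);
        // Without this property the serializer never hears about the DOCTYPE.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", serializer);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE topic PUBLIC \"-//OASIS//DTD DITA Topic//EN\" \"topic.dtd\">"
                   + "<topic id=\"t1\"><title>Hello</title></topic>";
        // Swap in another factory here (for example Saxon's, if it is on the
        // classpath) to compare serializers.
        TransformerFactory tf = TransformerFactory.newInstance();
        String out = roundTrip(xml, tf);
        System.out.println(tf.getClass().getName() + " kept DOCTYPE: " + out.contains("<!DOCTYPE"));
        System.out.println(out);
    }
}
```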

Does this begin to motivate the reason why?

Jim


-----Original Message-----
From: G. Ken Holman [mailto:gkholman@CraneSoftwrights.com] 
Sent: Saturday, May 30, 2009 10:54 AM
To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Saxon and Sun Serializer problems?

At 2009-05-29 13:38 -0700, Jim Tivy wrote:
>my DOCTYPE disappears unless I use the saxon serializer.
>...
>This causes problems for generalizing the code.

May I ask why you think you need the DOCTYPE?

Both XSLT and XQuery construct brand new documents ... these 
specifications are designed to create new documents and are not 
designed to preserve old documents.  New arrangements of your 
information are created from a completely resolved set of tree nodes 
(all entity references are resolved and not preserved, no DTD 
validation is being done, what other declarations from the DTD are 
left that you might need since your output won't be using those?  And 
all CDATA sections are resolved and not preserved.)  The input to the 
transformation process is a completely standalone resolved tree of 
elements, attributes and text.  The internal or external declaration 
subsets have been used to create the input tree to prepare for your 
transformation.
[<JT>] I really think it is a travesty if XQuery/XPath/XSLT cannot access all
parts of the source document when they create these "new" result documents.

Using XSLT and XQuery you are creating a result from scratch.  There 
are no entity references that were preserved because there are none 
in the input.  It is a brand new document.
[<JT>] I really don't care about entity references, character references or
what you do with empty elements.

At 2009-05-30 09:25 -0700, Jim Tivy wrote:
>My proposal is add a DocType node to the XDM.

I see no justification.

>DocTypes are here to stay in Xml.

Absolutely they are ... I agree.

>There are XML based communities like DITA
>that happily use DTDs every day and will continue to use it for the next 10
>years.

Absolutely there are, and for sure they will be using DITA for a long time.

If you want, say for validation reasons, for the result of your 
transformation to be validated by an XML validating processor that 
requires the SYSTEM identifier, then XSLT offers you the opportunity 
to specify the SYSTEM identifier of the tree you are creating from 
scratch.  XQuery processors such as Saxon offer the same.  Or you can 
skip adding the reference to a SYSTEM identifier and use a validating 
processor that takes the DTD reference on the command line.
[<JT>] Most DITA XSLTs are used to process different types of documents with
different doc types since they use a class attribute scheme for processing -
not matching on elements!
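
As a concrete illustration of the option Ken describes, the sketch below
runs an identity stylesheet whose xsl:output element names the identifiers
for the result.  The class name XsltDoctype, the sample input and the DITA
topic identifiers are illustrative choices, not taken from the thread.  It
also shows the point of the reply above: the identifiers are fixed in the
stylesheet, whatever document type the input had.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDoctype {
    // Identity stylesheet; the DOCTYPE of the result is hard coded in xsl:output.
    static final String IDENTITY_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' doctype-public='-//OASIS//DTD DITA Topic//EN'"
        + " doctype-system='topic.dtd'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Copies xml through the identity stylesheet and returns the serialized result. */
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The input carries no DOCTYPE; the one in the output comes from xsl:output.
        System.out.println(transform("<topic id=\"t1\"><title>Hello</title></topic>"));
    }
}
```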

If you are using XSLT's character maps in order to synthesize some 
entity references in the result tree, then you've got control over 
which entity references you'll be making, so you'll know which DTD 
you are going to need to resolve them.  Having a character map does 
not guarantee that you'll create the kinds of entity references that 
may have been there in the input, because that information is long gone.

In both cases you know the SYSTEM identifier that you need for your 
new document, because you are in total control of the creation of the 
new document.

>This identity transform is "lossy" with respect to the InfoSet - or lossy
>with respect to an XDM with a DocType node :)

But I cannot think of what is in the DOCTYPE's internal or external 
declarations that is needed by the result you create from scratch.
[<JT>] See comment on DITA above.  Why write XSLT programs with hard-coded
DOCTYPE entries when you have the DOCTYPE in the input?
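
One possible workaround in JAXP, sketched here under assumptions, is to read
the identifiers from the input with a SAX LexicalHandler and hand them to the
Transformer as output properties, so nothing is hard coded.  The names
DoctypeCarrier and Sniffer are hypothetical, the input is parsed twice for
simplicity, and only the PUBLIC and SYSTEM identifiers are carried over (an
internal subset would not be).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DoctypeCarrier {
    /** Remembers the identifiers reported by startDTD. */
    static class Sniffer extends DefaultHandler2 {
        String publicId;
        String systemId;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            this.publicId = publicId;
            this.systemId = systemId;
        }

        // Illustration only: resolve the external DTD to an empty stream so
        // the sketch runs without the real DTD files.
        @Override
        public InputSource resolveEntity(String name, String publicId,
                                         String baseURI, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    static XMLReader newReader(Sniffer resolver) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        return reader;
    }

    static String copyWithDoctype(String xml) throws Exception {
        // Pass 1: find out which DOCTYPE the input declares.
        Sniffer sniffer = new Sniffer();
        XMLReader first = newReader(sniffer);
        first.setContentHandler(sniffer);
        first.setProperty("http://xml.org/sax/properties/lexical-handler", sniffer);
        first.parse(new InputSource(new StringReader(xml)));

        // Pass 2: identity transform, asking the serializer for the same DOCTYPE.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        if (sniffer.systemId != null) {
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, sniffer.systemId);
        }
        if (sniffer.publicId != null) {
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, sniffer.publicId);
        }
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(newReader(sniffer), new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copyWithDoctype(
            "<!DOCTYPE topic PUBLIC \"-//OASIS//DTD DITA Topic//EN\" \"topic.dtd\">"
          + "<topic id=\"t1\"><title>Hello</title></topic>"));
    }
}
```

Setting the properties explicitly is meant to take the serializer's own
handling of startDTD out of the picture, which is where the implementations
appear to differ.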

>Dropping the DocType should not be put in the same class as "<a/> ==
><a></a>" logical equivalencies.

Why not?

>DOM (Level 3 at least) supports DocType as well.

Because that is what is in scope for the DOM!  The DOM has entity 
references, so if your XML document has entity references satisfied 
by the DOCTYPE, then by gum you better preserve those entity definitions.

If DOM gives you what you want, and XSLT doesn't give you what you 
want, then use the DOM.
[<JT>] They are not the same thing.  One is a hammer, the other a
screwdriver.  I hate using hammers as screwdrivers.
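
For comparison, here is a small sketch of what the DOM exposes.  The class
name DomDoctype and the sample document are assumptions for illustration;
the DocumentType accessors themselves are standard DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DomDoctype {
    /** Parses xml into a DOM and returns its DocumentType node (null if there is no DOCTYPE). */
    static DocumentType readDoctype(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Illustration only: resolve the external DTD to an empty stream so
        // the sketch runs without the real DTD files.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDoctype();
    }

    public static void main(String[] args) throws Exception {
        DocumentType dt = readDoctype(
            "<!DOCTYPE topic PUBLIC \"-//OASIS//DTD DITA Topic//EN\" \"topic.dtd\">"
          + "<topic id=\"t1\"><title>Hello</title></topic>");
        System.out.println("name:   " + dt.getName());
        System.out.println("public: " + dt.getPublicId());
        System.out.println("system: " + dt.getSystemId());
    }
}
```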

 From http://www.w3.org/TR/1999/REC-xslt-19991116

   XSLT is not intended as a completely general-purpose XML
   transformation language. Rather it is designed primarily
   for the kinds of transformations that are needed when XSLT
   is used as part of XSL.
[<JT>] Not sure what this means.  I am not talking about big changes.

 From http://www.w3.org/TR/2007/REC-xslt20-20070123/

   This specification defines the syntax and semantics of
   XSLT 2.0, a language for transforming XML documents into
   other XML documents.

Note that it is not in the scope of either specification to preserve 
input documents, just to create new documents.
[<JT>] Sure, just give me a bit more information from the input please.

 From http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407

   This specification defines the Document Object Model Core
   Level 3, a platform- and language-neutral interface that
   allows programs and scripts to dynamically access and update
   the content, structure and style of documents.

Note the use of the phrase "update the content" ... a very different 
objective than XSLT.  It is critically important to preserve the 
DOCTYPE because there is information in the DOCTYPE that may need to 
be preserved in the result, because the result is an update of the source.

So, horses for courses ... use whatever technology is appropriate for 
the scope of what you are trying to do.

I didn't bother responding to your initial post because I felt you 
didn't justify the need for preserving the DOCTYPE given the context 
that this is XSLT.

So what am I missing here?  And I'll admit I must be missing 
something, which is why I'm posting.  What is it you are looking for 
in the DOCTYPE of the result that you need in your DITA-based environment?

Thanks for helping me better understand your requirement.  Knowing 
precisely what is the problem you are trying to solve but cannot do 
so will help understand the role the new DOCTYPE preservation is 
playing in your proposed solution.
[<JT>] 
Fortunately I am not reliant on XSLT for doing this DocType preservation. My
initial post had to do precisely with the Identity Transform in Java JAXP,
for which Sun and Saxon have come up with different implementations.  So
perhaps my submission should be to the JAXP JSR first, then to the XDM people
second. XSLT is clearly on the heap of "cannot get there from here without a
hack"; my issue, however, had to do with SAX and the use of JAXP.

. . . . . . . . . . . . Ken

--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal

