Re: [xml-dev] Saxon and Sun Serializer problems?
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: <xml-dev@lists.xml.org>
- Date: Sat, 30 May 2009 13:53:49 -0400
At 2009-05-29 13:38 -0700, Jim Tivy wrote:
>my DOCTYPE disappears unless I use the saxon serializer.
>...
>This causes problems for generalizing the code.
May I ask why you think you need the DOCTYPE?
Both XSLT and XQuery construct brand new documents ... these
specifications are designed to create new documents and are not
designed to preserve old documents. New arrangements of your
information are created from a completely resolved set of tree nodes:
all entity references are resolved and not preserved, all CDATA
sections are resolved and not preserved, and no DTD validation is
being done. What other declarations from the DTD are left that you
might need, given that your output won't be using them? The input to
the transformation process is a completely standalone resolved tree of
elements, attributes and text. The internal and external declaration
subsets have already been used to create the input tree in
preparation for your transformation.
Using XSLT and XQuery you are creating a result from scratch. There
are no entity references that were preserved because there are none
in the input. It is a brand new document.
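A minimal sketch of what "completely resolved" means, with Python's standard xml.etree.ElementTree standing in for the input tree of an XSLT or XQuery processor. The sample document, the entity name "who" and the variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an internal DTD subset declaring one entity,
# plus one CDATA section in the content.
SOURCE = (
    '<!DOCTYPE note [<!ENTITY who "Ken">]>'
    '<note>&who; wrote <![CDATA[<raw> & more]]></note>'
)

root = ET.fromstring(SOURCE)

# The parser has already expanded the entity reference and unwrapped the
# CDATA section; the tree holds only the resulting characters.
resolved_text = root.text

# Serializing the tree produces a new document: no DOCTYPE, no entity
# reference and no CDATA section, only escaped character data.
result = ET.tostring(root, encoding="unicode")

print(resolved_text)
print(result)
```

Nothing in the tree records that "Ken" came from an entity or that part of the text sat in a CDATA section, so a serializer working from this tree has nothing to preserve.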
At 2009-05-30 09:25 -0700, Jim Tivy wrote:
>My proposal is add a DocType node to the XDM.
I see no justification.
>DocTypes are here to stay in Xml.
Absolutely they are ... I agree.
>There are XML based communities like DITA
>that happily use DTDs every day and will continue to use it for the next 10
>years.
Absolutely there are, and for sure they will be using DITA for a long time.
If, say, you want the result of your transformation to be validated
by an XML validating processor that requires the SYSTEM identifier,
then XSLT offers you the opportunity to specify the SYSTEM identifier
of the tree you are creating from scratch. XQuery processors such as
Saxon offer the same. Or you can skip adding the reference to a
SYSTEM identifier and use a validating processor that takes the DTD
reference on the command line.
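In XSLT the SYSTEM identifier is requested with the doctype-system attribute of xsl:output. The sketch below is only a Python stand-in for that serializer behaviour, not XSLT itself: the serialize() helper, the "topic" element names and "topic.dtd" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def serialize(root, doctype_system=None):
    """Serialize a tree, optionally declaring a SYSTEM identifier.

    Hypothetical helper: the declaration is generated for the new
    document; nothing is copied from any input document.
    """
    body = ET.tostring(root, encoding="unicode")
    if doctype_system is None:
        return body
    return f'<!DOCTYPE {root.tag} SYSTEM "{doctype_system}">\n{body}'


# Result tree built from scratch (element names are illustrative).
topic = ET.Element("topic", id="t1")
ET.SubElement(topic, "title").text = "Serializer notes"

output = serialize(topic, doctype_system="topic.dtd")
print(output)
```

The author of the transformation picks the identifier; the same tree serialized without it is equally valid as a result.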
If you are using XSLT's character maps in order to synthesize some
entity references in the result tree, then you've got control over
which entity references you'll be making, so you'll know which DTD
you are going to need to resolve them. Having a character map does
not guarantee that you'll create the kinds of entity references that
may have been there in the input, because that information is long gone.
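As a rough illustration of the character map idea (plain string replacement on serialized output, standing in for the XSLT 2.0 facility; the map contents, the serialize_with_map() helper and the sample text are assumptions):

```python
import xml.etree.ElementTree as ET

# Hypothetical character map: the stylesheet author decides which characters
# become which entity references in the serialized result.
CHARACTER_MAP = {"\u00a0": "&nbsp;", "\u00a9": "&copy;"}


def serialize_with_map(root, character_map):
    """Serialize the tree, then substitute each mapped character."""
    text = ET.tostring(root, encoding="unicode")
    for char, replacement in character_map.items():
        text = text.replace(char, replacement)
    return text


para = ET.Element("p")
para.text = "\u00a9\u00a02009 Example Ltd."

mapped = serialize_with_map(para, CHARACTER_MAP)
print(mapped)
```

The result now contains references to nbsp and copy, so it needs a DTD that declares them, and the author of the map knows exactly which one that is.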
In both cases you know the SYSTEM identifier that you need for your
new document, because you are in total control of the creation of the
new document.
>This identity transform is "lossy" with respect to the InfoSet - or lossy
>with respect to an XDM with a DocType node :)
But I cannot think of what is in the DOCTYPE's internal or external
declarations that is needed by the result you create from scratch.
>Dropping the DocType should not be put in the same class as "<a/> ==
><a></a>" logical equivalencies.
Why not?
>DOM (Level 3 at least) supports DocType as well.
Because that is what is in scope for the DOM! The DOM has entity
references, so if your XML document has entity references satisfied
by the DOCTYPE, then by gum you better preserve those entity definitions.
If DOM gives you what you want, and XSLT doesn't give you what you
want, then use the DOM.
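A small sketch of the DOM side, using Python's xml.dom.minidom as one DOM implementation. The sample document and "topic.dtd" are made up, and the example only shows that the document type node survives an in-place update:

```python
from xml.dom import minidom

# Hypothetical source document with a DOCTYPE naming an external DTD.
SOURCE = (
    '<!DOCTYPE topic SYSTEM "topic.dtd">'
    '<topic id="t1"><title>Old title</title></topic>'
)

doc = minidom.parseString(SOURCE)

# The document type is a node in the DOM, so it is still there after parsing.
doctype_name = doc.doctype.name
doctype_system = doc.doctype.systemId

# Update the content in place ...
title = doc.getElementsByTagName("title")[0]
title.firstChild.data = "New title"

# ... and the serialized document still carries the original DOCTYPE.
updated = doc.toxml()
print(updated)
```

Here the output is the same document with one change applied, which is a different job from building a new document out of a resolved tree.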
From http://www.w3.org/TR/1999/REC-xslt-19991116
   XSLT is not intended as a completely general-purpose XML
   transformation language. Rather it is designed primarily
   for the kinds of transformations that are needed when XSLT
   is used as part of XSL.
From http://www.w3.org/TR/2007/REC-xslt20-20070123/
   This specification defines the syntax and semantics of
   XSLT 2.0, a language for transforming XML documents into
   other XML documents.
Note that it is not in the scope of either specification to preserve
input documents, just to create new documents.
From http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407
   This specification defines the Document Object Model Core
   Level 3, a platform- and language-neutral interface that
   allows programs and scripts to dynamically access and update
   the content, structure and style of documents.
Note the use of the phrase "update the content" ... a very different
objective than XSLT. With the DOM it is critically important to
preserve the DOCTYPE, because there is information in the DOCTYPE
that may need to be preserved in the result: the result is an update
of the source.
So, horses for courses ... use whatever technology is appropriate for
the scope of what you are trying to do.
I didn't bother responding to your initial post because I felt you
didn't justify the need for preserving the DOCTYPE given the context
that this is XSLT.
So what am I missing here? And I'll admit I must be missing
something, which is why I'm posting. What is it you are looking for
in the DOCTYPE of the result that you need in your DITA-based environment?
Thanks for helping me better understand your requirement. Knowing
precisely what problem you are trying to solve, and cannot, will help
me understand the role that DOCTYPE preservation plays in your
proposed solution.
. . . . . . . . . . . . Ken
--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal