RE: [xml-dev] Saxon and Sun Serializer problems?


From: "Michael Kay" <mike@saxonica.com>
To: "'Jim Tivy'" <jimt@bluestream.com>,"'Amelia A Lewis'" <amyzing@talsever.com>
Date: Sat, 30 May 2009 11:19:02 +0100

> I would be interested in the "exact reason" why this has 
> happened in the specs.  As well, I would like to be 
> instructed as to why my concern is obscure or misdirected.

You'll find an essay on the subject in the section on data model in my XSLT
reference book.

The core reason is that the base XML spec (very deliberately, I believe, but
mistakenly in my view) confined itself almost entirely to saying what
constituted valid XML syntax, and saying almost nothing about the
information payload of an XML document. Apart from a few hints (for example
that the order of attributes is insignificant, or that a minimised tag <a/>
is informationally equivalent to <a></a>) it gives no definitive statement
about what distinctions are meaningful and what aren't. You can make
guesses, and in some cases everyone would agree with you (for example, that
the spaces around the equals sign in an attribute are immaterial), and in
other cases not everyone would agree (for example, CDATA sections).
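To make that concrete, here is a minimal Java sketch, assuming the JDK's built-in JAXP parser and taking DOM's isEqualNode as just one possible definition of "same information". The class and method names (InfoEquivalence, root, same) are invented for illustration. The coalescing flag asks the parser to turn CDATA sections into ordinary text nodes, which is exactly the kind of choice the XML spec leaves open.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class InfoEquivalence {
    // Parse a string and return its root element. "coalescing" asks the
    // parser to convert CDATA sections into ordinary text nodes.
    static Element root(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Two documents are "the same" here if their DOM trees are equal.
    // This is one information model among several, not the only one.
    static boolean same(String x, String y, boolean coalescing) throws Exception {
        return root(x, coalescing).isEqualNode(root(y, coalescing));
    }

    public static void main(String[] args) throws Exception {
        // Minimised tag versus start/end tag pair.
        System.out.println(same("<a/>", "<a></a>", false));
        // Attribute order, quote style, and spaces around the equals sign.
        System.out.println(same("<a x='1' y='2'/>", "<a y = \"2\" x = \"1\"/>", false));
        // CDATA section versus plain text, with and without coalescing.
        System.out.println(same("<a><![CDATA[x]]></a>", "<a>x</a>", false));
        System.out.println(same("<a><![CDATA[x]]></a>", "<a>x</a>", true));
    }
}
```

The first two comparisons concern distinctions that DOM does not record at all, so they come out equal. The CDATA pair comes out equal or not depending purely on how the parser was configured, which is the disagreement in miniature.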

So it was left to other people to make their own definitions, and they came
out different.

You can then ask questions about why individual decisions were made, for
example why the XSLT/XPath data model ignores DOCTYPE. I suspect a good part
of the answer to that particular question is that at the time, DTDs were
perceived as obsolescent - they were a stop-gap measure provided until XML
Schema was available. (That also explains, I think, why DTDs were never made
namespace-aware.)
> 
> In the meantime, on a practical level the Trax API in java is 
> based on SAX and handles both the LexicalHandler and the 
> ContentHandler.  

Actually, the identity transformer recognizes three representations: lexical
XML, SAX events, and DOM trees, and allows any of these three to be
transformed into any other. There's no special recognition of any one of
them, and no guidance as to what should be preserved (e.g. entity
references) in the conversion. Since the Saxon engine is essentially based
on the XDM model, it converts between these three representations via the
XDM model, and there's certainly nothing in the JAXP spec that says that
isn't a reasonable interpretation. Another equally valid interpretation
would be to go via canonical XML, but the spec doesn't mandate one or the
other.
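For reference, a minimal sketch of what such a conversion looks like through the JAXP API, assuming whatever TransformerFactory is on the classpath (the JDK default, or Saxon if installed). The class and method names (IdentityDemo, toDom, toText) are hypothetical. Only the stream and DOM representations are shown; SAX would go through SAXSource and SAXResult in the same way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class IdentityDemo {
    // Lexical XML -> DOM tree. newTransformer() with no stylesheet
    // returns the identity transformer.
    static Document toDom(String xml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    // DOM tree -> lexical XML, again via the identity transformer.
    static String toText(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Spaces around '=', single quotes, and a CDATA section: which of
        // these survive the round trip is implementation-dependent.
        String input = "<doc a = '1'><![CDATA[x < y]]></doc>";
        System.out.println(toText(toDom(input)));
    }
}
```

No expected output is given because that is the point: the text content and attribute value will come back intact, but whether the CDATA section reappears as CDATA or as escaped text depends on the implementation behind the factory.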

> 
> Since XSLT is a "pass through" kind of technology I don't see 
> the sense of being lossy. 

From that you could argue that it should preserve the order of attributes
and the whitespace around the equals signs. The word "lossy" can only be
interpreted relative to some information model.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

