xml-dev - RE: [xml-dev] XML-enabled databases, XQuery APIs


To: "Bullard, Claude L \(Len\)" <len.bullard@intergraph.com>,"Michael Kay" <mike@saxonica.com>,"Ronald Bourret" <rpbourret@rpbourret.com>
Subject: RE: [xml-dev] XML-enabled databases, XQuery APIs
From: "Michael Rys" <mrys@microsoft.com>
Date: Mon, 18 Apr 2005 10:57:42 -0700
Cc: <xml-dev@lists.xml.org>
Thread-index: AcVEPeLNIFvfRYqoTwuVjlVLKj+1hwAALsMA
Thread-topic: [xml-dev] XML-enabled databases, XQuery APIs

See below.

Best regards
Michael

> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:len.bullard@intergraph.com]
> Sent: Monday, April 18, 2005 10:42 AM
> To: Michael Rys; 'Michael Kay'; Ronald Bourret
> Cc: xml-dev@lists.xml.org
> Subject: RE: [xml-dev] XML-enabled databases, XQuery APIs
> 
> Michael and Michael:
> 
> Thanks.  I've been wondering about overall performance
> of XML-enabled systems using non-traditional document
> types, eg, spatial data, and I had an XML expert with
> MS MVP status sleeping on my couch this weekend.
> 
> 1)  The effect of the binary is to increase the parser space.
> The relational system doesn't care because it uses an internal
> representation.
> 
> However, the binary is reputed to create a faster parse.  So
> while there is no query performance effect, isn't the shredder faster,
> that is, assuming XML on input?  Wouldn't speeding up the
> pipeline be useful given 2)?

[Michael Rys] That depends on the binary format. For example, the
fastest way for us to get XML data would be for the closely-coupled
client protocols (such as OLE DB, ADO.NET, ODBC, JDBC, etc.) to send us
our internal binary XML. However, we would have to make sure that the
binary XML is hardened against malicious attacks, and that hardening
would cost us much of the perf improvement, so we currently do not do
this. We may still do so in a future release, but even assuming that we
could gain 20% in speed that way, I think that for our scenario the
added complexity is not really worth it given Moore's "law".

The same often holds for other binary formats. Also, these performance
improvements may not materialize in a standardized binary XML format, if
the format is trying to address other scenarios.

> 2)  The effect of a document type varies by document type
> and the operation.  Intuitively true for any XML document type.
> 
> Since value handling depends on the value shape (the difference
> in working with unstructured text vs delimited text (the case
> for say long strings of vector values)), the relational model
> isn't the issue: microparsing is, and again, it might be better
> to be getting a binary back for some document types.  The
> increase in parser space could be worth it if one handles a
> lot of documents of that type.

[Michael Rys] Note that indices are meant to address the micro-parsing
issue: you micro-parse once to create the index and then do not have to
do so at query time (unless doing so is more cost-effective). Also, the
stored XML is kept in an internal binary format, which should address
some of the micro-parsing issues as well (for example, an xs:double is
stored as a double value and not as a string in the binary format).
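
As a rough illustration of the point above (a minimal sketch, not a
description of SQL Server's actual storage format): the delimited text
is micro-parsed once at load time and kept as packed doubles, so later
reads do not re-parse any text. The function names and the byte layout
(a 4-byte count followed by 8-byte doubles) are assumptions made up for
this example.

```python
import struct

def micro_parse(text):
    """Split a whitespace-delimited string of numbers into floats."""
    return [float(tok) for tok in text.split()]

def to_binary(values):
    """Pack a count followed by little-endian 8-byte doubles (hypothetical layout)."""
    return struct.pack(f"<I{len(values)}d", len(values), *values)

def from_binary(blob):
    """Read the doubles back without any text parsing."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}d", blob, 4))

# A long string of vector values, as it might appear in an XML text node.
coords_text = "10.5 20.25 30.0 40.75"

# Micro-parse once, at load or index-creation time.
stored = to_binary(micro_parse(coords_text))

# At query time the values come back typed; no string splitting or float parsing.
values = from_binary(stored)
```

The trade-off is the one discussed in this thread: the parse cost is
paid once up front, in exchange for a stored form that is no longer
plain text.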

> It seems that if one were to enable SQL Server spatially,
> one would look for means to speed up that pipeline if the
> spatial data is coming in and out as XML.  I have my doubts
> about articles that claim XML datatypes herald the end of
> middleware

[Michael Rys] Me too :-).

> and about experts that tell me Microsoft
> is dumping XQuery for anything except SQL Server. 

[Michael Rys] If you refer to us not shipping XQuery on the mid-tier:
This is currently the case, since we have not seen enough user scenarios
that justify the investment (besides the obvious scheduling issue). I am
interested in seeing your scenarios though (feel free to send me and
Mike Champion private mail on this, if you like).

> Those
> seem to be opinions based on business document types, and
> not very large real-time maps or dynamic concept
> sets.
> 
> len
> 
> 
> From: Michael Rys [mailto:mrys@microsoft.com]
> 
> See below.
> 
> Best regards
> Michael
> 
> 
> > From: Bullard, Claude L (Len) [mailto:len.bullard@intergraph.com]
> 
> > What would be the consequences to XML datatypes in relational
> > databases should an XML binary standard be created?
> 
> [Michael Rys] See my other response.
> 
> > What is the efficiency or effectiveness of the indices and
> > queries given radically different kinds of XML document types
> > (eg, querying over a business document in XML vs a vector
> > graphic in XML)?  Or put another way, how does the document
> > type as instanced in the XML datatype affect querying performance?
> 
> [Michael Rys] Well, that depends heavily on the type of queries that you
> plan on running. The answer is about the same as if you would ask me
> about how efficient or effective the indices and the query optimizer is
> for different kinds of relational schemata (OLTP, OLAP, lots of tables
> with lots of columns etc.).
> 
> It certainly does affect it, but the question at the moment should be
> more along the lines of: If your data fits the relational model and all
> you need is a relational processor, should you store the data as XML or
> should you shred it. And my answer today is: Just shred it.
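
For readers unfamiliar with the term, "shredding" in the quoted advice
means decomposing an XML document into ordinary relational rows. A
minimal sketch of the idea follows, using Python's standard library with
an in-memory SQLite database standing in for the relational engine. The
order document, the table name, and the column names are all
hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical business document whose data fits the relational model.
ORDER_XML = """
<order id="42">
  <item sku="A1" qty="2" price="9.50"/>
  <item sku="B7" qty="1" price="20.00"/>
</order>
"""

def shred(xml_text, conn):
    """Decompose one <order> document into rows of a plain relational table."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    rows = [
        (order_id, item.get("sku"), int(item.get("qty")), float(item.get("price")))
        for item in root.findall("item")
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_item "
        "(order_id INTEGER, sku TEXT, qty INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = shred(ORDER_XML, conn)

# After shredding, queries are ordinary SQL over typed columns.
total = conn.execute(
    "SELECT SUM(qty * price) FROM order_item WHERE order_id = 42"
).fetchone()[0]
```

Once the data is shredded, the XML parse happens only at load time and
the relational processor works with typed columns and its usual
indices.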

Follow-Ups:
- Re: [xml-dev] XML-enabled databases, XQuery APIs
  - From: Michael Champion <michaelc.champion@gmail.com>
