RE: [xml-dev] HTML5 and almost no namespaces

It's "trivial", yes, but it's not "right" IMHO :)

Nor is it necessarily efficient.

I wouldn't bet a case of beer that, for a large value of attribute x,

points = fn:tokenize( $x , "[ ,]")

is more efficient than, for a node x with point children,

points = $x/point

I can imagine that in some processors, for some sizes of $x, one or the other is more efficient.

But is that a reason to make the design decision for a (potentially) widely used standard schema?

This is a serious question, not rhetorical.
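
To make the two access patterns above concrete, here is a minimal Python sketch (not XQuery) that reads the same points from a packed attribute and from child elements. The sample data, the unprefixed element names, and both helper functions are assumptions made for illustration. It shows only that the two encodings carry the same information; it makes no claim about which is faster in any given processor.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same three points in both encodings.
ATTR_FORM = '<polyline points="1 0,2 120.46,3 97.95"/>'
CHILD_FORM = (
    '<polyline>'
    '<p x="1" y="0"/><p x="2" y="120.46"/><p x="3" y="97.95"/>'
    '</polyline>'
)


def points_from_attribute(elem):
    """Tokenize the attribute on spaces/commas, then pair up the numbers."""
    tokens = [t for t in re.split(r"[\s,]+", elem.get("points", "").strip()) if t]
    numbers = [float(t) for t in tokens]
    return list(zip(numbers[0::2], numbers[1::2]))


def points_from_children(elem):
    """Read one (x, y) pair from each <p> child element."""
    return [(float(p.get("x")), float(p.get("y"))) for p in elem.findall("p")]


attr_points = points_from_attribute(ET.fromstring(ATTR_FORM))
child_points = points_from_children(ET.fromstring(CHILD_FORM))
print(attr_points == child_points)
```

Timing both functions on realistic data (for example with the standard `timeit` module) would be the way to settle the efficiency question for one particular processor and input size.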

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org

From: Kurt Cagle [mailto:kurt.cagle@gmail.com]
Sent: Friday, June 03, 2011 11:45 AM
To: David Lee
Cc: Michael Sokolov; Andrew Welch; John Cowan; Pete Cordell; Mukul Gandhi; stephengreenubl@gmail.com; Jesper Tverskov; xml-dev@lists.xml.org
Subject: Re: [xml-dev] HTML5 and almost no namespaces

David,

I brought up the very question of point set optimization with the SVG working group when the SVG 1.0 spec was still in development. Adobe was essentially calling the shots at that point with the only real working implementation, and they found that, for their processing, parsing lists of points was preferable to querying an XML document with sets of nodes. In retrospect, they were probably right - even in XQuery, retrieving point lists is relatively trivial.

Kurt Cagle

Managing Editor, XMLToday.org

kurt.cagle@gmail.com

443-837-8725

On Fri, Jun 3, 2011 at 9:09 AM, David Lee <dlee@calldei.com> wrote:

Agree 50%. Certainly you can optimize a tagset for a particular processor.

But does that mean you *should*?

Once you go down the route of optimizing your XML for a particular processor
all sorts of tricks become useful.
For example MarkLogic works best on lots of small documents instead of very
large ones, so for optimization I split up my 500MB XML file into about a
million small ones. Other processors have other tricks needed to get them
to work optimally.
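
As a rough illustration of that kind of splitting step, the following Python sketch turns each child of a root element into its own small document. The element names and the `split_into_documents` helper are hypothetical, nothing here is specific to MarkLogic, and a real 500MB file would call for a streaming parser rather than loading everything into memory.

```python
import xml.etree.ElementTree as ET


def split_into_documents(xml_text):
    """Return each child of the root element as its own serialized document."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(child, encoding="unicode") for child in root]


# Hypothetical input standing in for one very large file.
big = "<records><rec id='1'>a</rec><rec id='2'>b</rec></records>"
small_docs = split_into_documents(big)
print(len(small_docs))
```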

My personal opinion is that this shouldn't dictate the source schema design,
but should rather be a post-processing phase optimized for a particular
processor. Micro-designing an XML schema for optimization on one processor can
eventually bite you... say when you change processors or they come out with
new performance characteristics in V(n+1).

A good non-processor-specific example is SVG.
I just started using SVG this month as an experiment and am 'horrified' that
it 'abuses' attributes to represent lists of points.
A single graph might have a hundred thousand points stored in a single
attribute value !
While I wasn't there when it was invented, I can guess that this was done
with an eye to compactness/optimization, on the assumption that small is
better.
i.e.

<svg:polyline points="1 0,2 120.46,3 97.95,4 104.97,5 124.5,6 97.81,7
97.94,8 92.37,9 100.15,10 99.2,11 ....
1000000 bytes later
...
"/>

This is certainly more *compact* than

<svg:polyline>
<p x="1" y="0"/>
....
1000000 bytes later
</svg:polyline>

But is it *better*? I actually found an article about EXI discussing this
exact issue:

http://www.svgopen.org/2010/papers/3-Compressing_SVG_with_EXI/index.html

I find this a good example to demonstrate the woes of prematurely optimizing
source data formats based on assumptions about performance.

And consequently I propose that in general one should not do that, but
rather design an XML schema for clarity, not for performance on a particular
version of a particular processor (or an imagined one, in the case above).

You can *usually* post-process data to be optimized for your current
processor at the point of ingest rather than make the world suffer with
predictive optimization.

(by "usually" I mean there are always exceptions. No statement is always
right, even this one)
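
A minimal sketch of such an ingest-time step, assuming the element-per-point form shown earlier is the source of truth and the packed attribute is what the target processor prefers. The `compact_polyline` helper is hypothetical, and the `svg:` namespace prefix is dropped to keep the example short.

```python
import xml.etree.ElementTree as ET


def compact_polyline(elem):
    """Build a new element whose points attribute replaces the <p> children."""
    pairs = ",".join(f'{p.get("x")} {p.get("y")}' for p in elem.findall("p"))
    compact = ET.Element(elem.tag, dict(elem.attrib))  # copy existing attributes
    compact.set("points", pairs)
    return compact


# Hypothetical source document in the "clear" element-per-point form.
source = ET.fromstring('<polyline><p x="1" y="0"/><p x="2" y="120.46"/></polyline>')
compact = compact_polyline(source)
print(ET.tostring(compact, encoding="unicode"))
```

The source document is left untouched, so the same data could later be re-ingested in a different shape for a different processor.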

-David

----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Michael Sokolov [mailto:sokolov@ifactory.com]
Sent: Friday, June 03, 2011 8:36 AM
To: David Lee
Cc: Andrew Welch; John Cowan; Pete Cordell; Mukul Gandhi;
stephengreenubl@gmail.com; Jesper Tverskov; xml-dev@lists.xml.org
Subject: Re: [xml-dev] HTML5 and almost no namespaces

On 6/2/2011 10:22 PM, David Lee wrote:
> I do ( use MarkLogic )
> And it appears to work perfectly fine using context sensitive duplicate
> names
> It's true that if you want to fine tune fragmentation or create special
> range indexes it bites you but overall I've had no problems
>
>
> Sent from my iPad (excuse the terseness)
That's ok David - after all, brevity is the soul of wit, as the bard put
it. Still it is the case that MarkLogic's built-in term indexes (not
the range ones) are based on element (and attribute) names, and although
there are also contextual (parent/child) indexes, you will not get best
performance there if you rely on context sensitivity; e.g. queries for
//name can be resolved straight out of the indexes accurately and don't
require additional filtering, whereas //person/name and //place/name
require (some) extra processing. For example, to get an accurate count
there, ML has to filter every possible result returned by the indexes.
ML is spiffy and does this really fast, so you usually don't notice, but
if you have 1M docs and want to know exactly how many have a person name
"Lee", you really will notice the difference.

I'm not trying to run down MarkLogic - it's a great system for XML work;
merely pointing out that in some cases practical considerations that
have little to do with semantic correctness may inform the design of
your tag set.

-Mike

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
