RE: [xml-dev] HTML5 and almost no namespaces


From: "David Lee" <dlee@calldei.com>
To: "'Michael Sokolov'" <sokolov@ifactory.com>
Date: Fri, 3 Jun 2011 09:09:59 -0400

Agree 50%.  Certainly you can optimize a tagset for a particular processor.

But does that mean you *should*?

Once you go down the route of optimizing your XML for a particular processor,
all sorts of tricks become useful.
For example, MarkLogic works best on lots of small documents instead of very
large ones, so for optimization I split up my 500MB XML file into about a
million small ones.  Other processors need other tricks to get them to work
optimally.
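
A minimal sketch of that kind of split, in Python rather than anything
MarkLogic-specific.  The function name split_records and the record_tag
parameter are made up for illustration, and it assumes the big file is a flat
sequence of record elements, with no namespaces and no nesting of the record
tag inside itself.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag):
    """Yield each <record_tag> element of an XML stream as its own document string.

    Hypothetical helper: assumes record elements are not nested inside each
    other and carry no namespace (otherwise the tag would be "{uri}name").
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # The tail is the text after the closing tag; it belongs to the
            # parent, so drop it before serializing the record on its own.
            elem.tail = None
            yield ET.tostring(elem, encoding="unicode")
            # Release the record's children so memory use stays roughly flat.
            elem.clear()


if __name__ == "__main__":
    big = io.BytesIO(
        b"<people>"
        b"<person><name>Lee</name></person>"
        b"<person><name>Sokolov</name></person>"
        b"</people>"
    )
    for number, doc in enumerate(split_records(big, "person"), start=1):
        print(number, doc)
```

Each yielded string could then be written to its own file or loaded as its
own document by whatever ingest mechanism the target processor provides.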

My personal opinion is that this shouldn't dictate the source schema design,
but should rather be a post-processing phase optimized for a particular
processor.  Micro-designing an XML schema for optimization on one processor
can eventually bite you... say when you change processors, or they come out
with new performance characteristics in V(n+1).

A good non-processor-specific example is SVG.
I just started using SVG this month as an experiment and am 'horrified' that
it 'abuses' attributes to represent lists of points.
A single graph might have a hundred thousand points stored in a single
attribute value!
While I wasn't there when it was invented, I can guess that this was done
with an eye to compactness/optimization, on the assumption that small is
better.
For example:

<svg:polyline points="1 0,2 120.46,3 97.95,4 104.97,5 124.5,6 97.81,7
97.94,8 92.37,9 100.15,10 99.2,11 ....
1000000  bytes later 
...
"/>

This is certainly more *compact* than

<svg:polyline>
     <p x="1" y="0"/> 
.... 
1000000  bytes later
</svg:polyline>

But is it *better*?  I actually found an article about EXI discussing this
exact issue:

http://www.svgopen.org/2010/papers/3-Compressing_SVG_with_EXI/index.html

I find this a good example to demonstrate the woes of prematurely optimizing
source data formats based on assumptions about performance.

Consequently, I propose that in general one should not do that, but rather
design an XML schema for clarity, not for performance on a particular
version of a particular processor (or an imagined one, as in the case above).

You can *usually* post-process data to be optimized for your current
processor at the point of ingest rather than make the world suffer with
predictive optimization.

(By "usually" I mean there are always exceptions.  No statement is always
right, even this one.)
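
As a rough sketch of what that post-processing could look like for the
polyline case: the source keeps one element per point, and a small ingest
step packs them into a points attribute for a consumer that wants the compact
form (and unpacks them again going the other way).  The <p x="" y=""/> child
element is the illustrative form from the example above, not real SVG, and
the function names are hypothetical.  The parser assumes coordinates are
separated by whitespace or commas, which covers the example above but may not
cover every shorthand the SVG grammar permits.

```python
import re
import xml.etree.ElementTree as ET


def expand_points(polyline):
    """Replace the points attribute with one <p x=".." y=".."/> child per point."""
    raw = polyline.attrib.pop("points", "")
    # Assumption: numbers are separated by whitespace and/or commas only.
    tokens = [t for t in re.split(r"[\s,]+", raw.strip()) if t]
    if len(tokens) % 2:
        raise ValueError("points attribute has an odd number of coordinates")
    for x, y in zip(tokens[0::2], tokens[1::2]):
        ET.SubElement(polyline, "p", x=x, y=y)
    return polyline


def pack_points(polyline):
    """Inverse step: fold <p> children back into a compact points attribute."""
    children = polyline.findall("p")
    polyline.set("points", ",".join(f"{p.get('x')} {p.get('y')}" for p in children))
    for p in children:
        polyline.remove(p)
    return polyline


if __name__ == "__main__":
    line = ET.fromstring('<polyline points="1 0,2 120.46,3 97.95"/>')
    print(ET.tostring(expand_points(line), encoding="unicode"))
    print(ET.tostring(pack_points(line), encoding="unicode"))
```

Coordinates are kept as strings throughout, so the round trip does not
reformat or round any numbers.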

-David

----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Michael Sokolov [mailto:sokolov@ifactory.com] 
Sent: Friday, June 03, 2011 8:36 AM
To: David Lee
Cc: Andrew Welch; John Cowan; Pete Cordell; Mukul Gandhi;
stephengreenubl@gmail.com; Jesper Tverskov; xml-dev@lists.xml.org
Subject: Re: [xml-dev] HTML5 and almost no namespaces

On 6/2/2011 10:22 PM, David Lee wrote:
> I do ( use MarkLogic )
> And it appears to work perfectly fine using context sensitive duplicate
> names
> It's true that if you want to fine tune fragmentation or create special
> range indexes it bites you but overall I've had no problems
>
>
> Sent from my iPad (excuse the terseness)
That's ok David - after all, brevity is the soul of wit, as the bard put 
it.  Still it is the case that MarkLogic's built-in term indexes (not 
the range ones) are based on element (and attribute) names, and although 
there are also contextual (parent/child) indexes, you will not get best 
performance there if you rely on context sensitivity; e.g. queries for 
//name can be resolved straight out of the indexes accurately and don't 
require additional filtering, whereas //person/name and //place/name 
require (some) extra processing.  For example, to get an accurate count 
there, ML has to filter every possible result returned by the indexes.  
ML is spiffy and does this really fast, so you usually don't notice, but 
if you have 1M docs and want to know exactly how many have a person name 
"Lee", you really will notice the difference.

I'm not trying to run down MarkLogic - it's a great system for XML work; 
merely pointing out that in some cases practical considerations that 
have little to do with semantic correctness may inform the design of 
your tag set.

-Mike

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

