xml-dev - RE: [xml-dev] Is HTML structured or unstructured information?


To: 'Peter Hunsberger' <peter.hunsberger@gmail.com>
Subject: RE: [xml-dev] Is HTML structured or unstructured information?
From: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
Date: Wed, 10 Aug 2005 09:27:45 -0500
Cc: xml-dev@lists.xml.org

Good call.  Scalability and rate of uptake are different 
and have been confused often in articles defending the 
designs and architectures of the web.  Scalability 
and interoperability touch on the remarks I made in 
an earlier thread about the problem of polite inclusion 
of standards (playing the standards game for fame and 
profit and power).  I don't think we have design 
principles for the web or its architecture that work 
if applied mindlessly/politically.  They inform the design, 
but are not designs or architectures.  When one tosses 
the nitro of 'we are the smart people because we 
built the web', we get some truly thought-numbing 
results, so I am leery of code-with-a-cause.

If the Principle of Least Power were applied strictly:

o DTDs are the preferred means for schema design.  RelaxNG 
  is next.  XML Schema is dead last.

o REST is preferred to Web Services.  Dumb web pages with 
  embedded links are preferred to both.

o Windows is preferred to Mac but a CT220 is better than 
  both.  A pencil and paper are better than all of these 
  if you add a xerox machine and some mail stamps.

o Fortran is better than Java.  Java is better than C++.  
  Assembly language is too powerful for anyone and the 
  reach isn't good but how else will you kick start a 
  compilable program?  Oh I forget, hex switches on the 
  front panel are superior to all of these.

and so on.

Better to look at the job at hand, the budget at hand, and 
the likely consequences of the product, then pick the tools 
and design accordingly.  It is worthless to buy a Maserati 
to drive two-lane roads with strictly enforced speed limits.

If we can steer this back to the topic, how would one rate 
tools for turning unstructured information (whatever that 
means; it varies by notation and content type) into 
structured information?  The UIMA contribution provides an 
architecture and some code.  

A rough metric I use is to ask myself, is it worth recoding that 
into another language? 

len

From: Peter Hunsberger [mailto:peter.hunsberger@gmail.com]

On 8/9/05, Bullard, Claude L (Len) <len.bullard@intergraph.com> wrote:

<snip/> 
> 
> >I'd like to believe that if you can find models (markup, DB, OO or
> >otherwise) that have wide applicability (and result in advantage for
> >the computer) you'll find that you have models that have a good chance
> >of being widely accepted by the humans involved.  See below...
> 
> Wide applicability:  that's a good metric.  At the very least, it
> takes the audience/listener into account.  On the other hand, as
> noted below, when something is widely applicable, is it semantically
> strong, that is, very meaningful?  

I think the axes of precision and general understanding are
orthogonal.  That doesn't mean it's easy to discover the models that
capture high degrees of both.  Rather, it seems fiendishly difficult. 
One big hurdle is the amount of time that it takes for complex
knowledge to be generally accepted.  As a poor example, at one point
Einstein's relativity was considered near impossible for most people
to understand; nowadays we've got string theory covering that
territory...

> (wandering off topic but maybe
> there is a measure of structure (however we define that) that
> can be applied to determine when a markup design is widely applicable).

If there is, it's going to be similar to the metrics used for
analysing code complexity: number of external references, degree of
separation between references, number of different terms, degree of
encapsulation, number of layers of inheritance/dependency, etc. 
Probably a PhD thesis or two hiding in that mess somewhere (though I'm
sure it's already been done)...
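
As a rough illustration of how a few of those metrics might be
computed for a markup instance, here is a minimal Python sketch.  The
function name markup_metrics, the choice of three metrics, and the
rule that an "external reference" is any attribute value starting with
http:// or https:// are all assumptions made for the example, not an
established measure of markup structure.

```python
import xml.etree.ElementTree as ET


def markup_metrics(xml_text):
    """Rough structural metrics for one XML document (illustrative only)."""
    root = ET.fromstring(xml_text)
    names = set()        # distinct element and attribute names ("different terms")
    external_refs = 0    # attribute values that look like URIs (assumed heuristic)
    max_depth = 0        # deepest element nesting ("layers of dependency")

    # Iterative depth-first walk, tracking the depth of each element.
    stack = [(root, 1)]
    while stack:
        elem, depth = stack.pop()
        names.add(elem.tag)
        max_depth = max(max_depth, depth)
        for attr, value in elem.attrib.items():
            names.add("@" + attr)
            if value.startswith(("http://", "https://")):
                external_refs += 1
        for child in elem:
            stack.append((child, depth + 1))

    return {
        "distinct_terms": len(names),
        "external_refs": external_refs,
        "max_depth": max_depth,
    }


# Hypothetical sample document, made up for this example.
sample = """<order id="1">
  <customer href="http://example.com/c/7"><name>Ann</name></customer>
  <item sku="a1"/><item sku="b2"/>
</order>"""

metrics = markup_metrics(sample)
print(metrics)
```

Counts like these only describe a single instance; comparing designs
would need the same measures taken over the schema or a corpus.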

<snip/>

> Tell me who gets to name the names so we can
> get on with this" trope is recommended to markup professionals.
> In other words, can we ever really separate the politics of naming
> from the craft?   

Yes, that is the $64,000,000 (inflation adjusted) question.  Given the
complexity and opaqueness of many of the XML "standards" I think we're
a long way from having anything like trusted experts in the field for
the most part.

> >Almost forgot to answer your question: if a good organic model needs
> >"fixing" then it wasn't that good in the first place; too much assumed
> >knowledge. So, IOW, I'd vote no...
> 
> Interesting POV.  The problem is, good for whom (see last para)?

Good for everybody (he, he)..

> HTML and XML demonstrate something I find fascinating:  scalability
> is inversely proportional to semantic load.  

I don't think it's scalability, I think it's rate of uptake.  That's
common sense: make things easy to understand and many people will be
able to use them.   That doesn't necessarily give us scalability; for
that you need good interoperability. If anything, the thousands of
competing XML standards demonstrate that at a
semantic/ontological/common-understanding level we have not even
scratched the surface of scalability.

> The more it means, the
> less useful it is for the greatest number.  That is somewhat the
> Principle of Least Power, so we have to be very careful how we
> apply some principles.   Things of general utility tend to be
> few because one doesn't need many, so differentiation becomes
> cosmetic.  Thus, branding.

Great, now we'll get the Nike and Reebok business interchange
languages to add to the mix...
