

Re: [xml-dev] "Maximize the ratio of content to markup" What's the underlying principle?

From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Wed, 5 Mar 2008 13:42:32 +0100

I will find the time to write what I think are the relevant principles
tonight after work.

Cheers,
Bryan Rasmussen

On Wed, Mar 5, 2008 at 1:40 PM, bryan rasmussen
<rasmussen.bryan@gmail.com> wrote:
>  >
>  >  For example, this is not good design:
>  >
>  >  <div>
>  >     <div id="Main">
>  >         <p>Hello World</p>
>  >     </div>
>  >  </div>
>
>  Sure, but the technique of having a wrapping outer div is pretty
>  traditional given the need to make content viewable across multiple
>  browsers with various levels of support for various standards.
>
>  Secondly, there are different levels of semantic markup. Search
>  engines don't look for high levels of semantic meaning (mainly
>  because it isn't widespread), by which I mean they don't look
>  for XML.
>
>
>
>  >  The outer div is providing no benefit.  It can be more simply expressed
>  >  as:
>  >
>  >  <div id="Main">
>  >     <p>Hello World</p>
>  >  </div>
>  >
>  >  The latter version provides a higher ratio of content to code (tags).
>  >  And from the quote above, search engines rank documents with a
>  >  higher ratio of content to code more highly.
>
>  SVG. The content is the code.
>
>
>
>
>
>  >  What is the underlying principle?  Why do search engines prefer
>  >  documents with a higher ratio of content to markup?
>  Because they are free-text search engines in an untrusted
>  environment, where they must use complicated techniques to work out
>  what things mean from the text content of HTML pages.
>
>  Furthermore, modern search engines seem to weight linking more
>  heavily than content and markup, so the original statement does
>  not apply.
>
>  >  Can the principle be applied to XML data design?
>  >
>  Not really.
>
>  >  For example,
>  >
>  >  This is not good design:
>  >
>  >  <Author>
>  >     <Name>Paul McCartney</Name>
>  >  </Author>
>  >
>  >  The Name element is providing no benefit.  It can be more simply
>  >  expressed as:
>  >
>  >  <Author>Paul McCartney</Author>
>  >
>  >  The latter version provides a higher ratio of content to code (tags).
>  >
>
>  I have to say no.
>
>  I would argue this is not a good design:
>  <Book>
>    <Author>
>      <AuthorName>Paul McCartney</AuthorName>
>    </Author>
>    <BookName>Sir Paul wrote a book!?</BookName>
>  </Book>
>
>  but it is a design one sees a lot, which I think is actually based on
>  the needs of maintaining large XML Schema libraries (personal opinion,
>  no one agrees).
>
>  The requirements for good semantic markup (for example, markup meant
>  for semantic-markup-aware search) are pretty different from the
>  requirements for good free-text search that tries to use some
>  semantic rules to figure out how important particular documents are.
>
>
>  >  What do you think?  Is there a principle of data design being
>  >  illustrated here?
>  No, there are several principles of data mining in a large untrusted
>  hypermedia environment (where you cannot trust the quality of the
>  data at all) being illustrated.
>
>  >  Can you articulate the principle?
>
>  Probably not without being fired given I have a large non-functioning
>  rendering server to deliver in two days and I should be working :(
>
>  Cheers,
>  Bryan Rasmussen
>
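
To make the "ratio of content to markup" in the quoted examples concrete, here is a rough Python sketch that measures it for the two Author designs. The function name content_to_markup_ratio and the definition used (non-whitespace text characters divided by characters inside tags) are assumptions made for illustration only; the thread does not define the ratio precisely, and nothing here reflects how any search engine actually ranks pages.

```python
import re

# Matches anything that looks like a tag: "<...>". This is a crude
# approximation for illustration, not a real XML/HTML parser.
TAG_PATTERN = re.compile(r"<[^>]*>")


def content_to_markup_ratio(doc: str) -> float:
    """Return non-whitespace text characters divided by tag characters.

    Hypothetical definition chosen for this sketch only.
    """
    markup_chars = sum(len(tag) for tag in TAG_PATTERN.findall(doc))
    text_only = TAG_PATTERN.sub("", doc)
    content_chars = len("".join(text_only.split()))
    if markup_chars == 0:
        # No tags at all: treat the ratio as unbounded.
        return float("inf")
    return content_chars / markup_chars


# The two designs from the quoted message.
nested = "<Author><Name>Paul McCartney</Name></Author>"
flat = "<Author>Paul McCartney</Author>"

nested_ratio = content_to_markup_ratio(nested)
flat_ratio = content_to_markup_ratio(flat)

print(f"nested: {nested_ratio:.3f}")
print(f"flat:   {flat_ratio:.3f}")
```

The flat form scores higher under this measure, which is all the original claim says. Whether that makes it the better data design is the point disputed in the reply above.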

References:
- "Maximize the ratio of content to markup" What's the underlying principle?
  - From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] "Maximize the ratio of content to markup" What's the underlying principle?
  - From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
