Re: [xml-dev] "Maximize the ratio of content to markup" What's the underlying principle?
- From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Wed, 5 Mar 2008 13:42:32 +0100
I will find the time to write what I think are the relevant principles
tonight after work.
Cheers,
Bryan Rasmussen
On Wed, Mar 5, 2008 at 1:40 PM, bryan rasmussen
<rasmussen.bryan@gmail.com> wrote:
> >
> > For example, this is not good design:
> >
> > <div>
> > <div id="Main">
> > <p>Hello World</p>
> > </div>
> > </div>
>
> Sure, but the technique of having a wrapping outer div is pretty
> traditional given the need to make content viewable across multiple
> browsers with various levels of support for various standards.
>
> Second of all there are different levels of semantic markup. Search
> engines sure don't look for high levels of semantic meaning (mainly
> because there isn't such a widespread level of it) by which I mean
> they don't look for XML.
>
>
>
> > The outer div is providing no benefit. It can be more simply expressed
> > as:
> >
> > <div id="Main">
> > <p>Hello World</p>
> > </div>
> >
> > The latter version provides a higher ratio of content to code (tags).
> > And from the quote above, search engines rank higher documents with a
> > higher ratio of content to code.
>
> SVG. The content is the code.
>
>
>
>
>
> > What is the underlying principle? Why do search engines prefer
> > documents with a higher ratio of content to markup?
> Because they are free-text search engines in an untrusted
> environment, where they must use complicated techniques to work out
> what things mean from the text content of HTML pages.
>
> Furthermore, modern search engines seem to weight linking more
> heavily than content or markup. So the original statement does
> not apply.
>
> > Can the principle be applied to XML data design?
> >
> Not really.
>
> > For example,
> >
> > This is not good design:
> >
> > <Author>
> > <Name>Paul McCartney</Name>
> > </Author>
> >
> > The Name element is providing no benefit. It can be more simply
> > expressed as:
> >
> > <Author>Paul McCartney</Author>
> >
> > The latter version provides a higher ratio of content to code (tags).
> >
>
> I have to say no.
>
> I would argue this is not a good design:
> <Book>
> <Author>
> <AuthorName>Paul McCartney</AuthorName>
> </Author>
> <BookName>Sir Paul wrote a book!?</BookName>
> </Book>
>
> but it is a design one sees a lot, which I think is actually based on
> the needs of maintaining large XML Schema libraries (personal opinion;
> no one agrees).
>
> The requirements for good semantic markup (for search that is aware
> of semantic markup) and the requirements for good free-text search
> (which tries to use some semantic rules to figure out how important
> particular documents are) are pretty different from each other.
>
>
> > What do you think? Is there a principle of data design being
> > illustrated here?
> No, there are several principles of data mining in a large untrusted
> hypermedia environment (where you cannot trust the quality of the
> data at all) being illustrated.
>
> > Can you articulate the principle?
>
> Probably not without being fired given I have a large non-functioning
> rendering server to deliver in two days and I should be working :(
>
> Cheers,
> Bryan Rasmussen
>
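To make the "ratio of content to markup" from the quoted question concrete, here is a minimal sketch in Python that measures it for the two Author examples. The function name content_to_markup_ratio and the choice to ignore whitespace are assumptions made for illustration; nothing in the thread defines how a search engine would actually compute such a ratio.

```python
import xml.etree.ElementTree as ET


def content_to_markup_ratio(xml_source: str) -> float:
    """Return text characters divided by markup characters.

    Hypothetical metric for illustration only. Whitespace is ignored so
    that indentation does not skew the result.
    """
    root = ET.fromstring(xml_source)
    # Characters of text content across the whole tree, whitespace removed.
    text_chars = sum(len("".join(t.split())) for t in root.itertext())
    # All non-whitespace characters in the source, tags included.
    total_chars = len("".join(xml_source.split()))
    # Whatever is not text content is counted as markup.
    markup_chars = total_chars - text_chars
    return text_chars / markup_chars


nested = "<Author><Name>Paul McCartney</Name></Author>"
flat = "<Author>Paul McCartney</Author>"

print(f"nested: {content_to_markup_ratio(nested):.3f}")
print(f"flat:   {content_to_markup_ratio(flat):.3f}")
```

The flat form scores higher simply because it has fewer tag characters around the same text. The number says nothing about which design carries more meaning, which is the point made in the reply above.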