Re: [xml-dev] "Maximize the ratio of content to markup" What's the underlying principle?


> The later version provides a higher ratio of content to code (tags).

Some semantically rich markup has a content-to-markup ratio of 0, for example:

http://www.openmath.org/cd/arith1.xhtml#lcm


<OMOBJ xmlns="http://www.openmath.org/OpenMath" version="2.0" cdbase="http://www.openmath.org/cd">
  <OMA>
    <OMS cd="relation1" name="eq"/>
    <OMA>
      <OMS cd="arith1" name="lcm"/>
      <OMV name="a"/>
      <OMV name="b"/>
    </OMA>
    <OMA>
      <OMS cd="arith1" name="divide"/>
      <OMA>
        <OMS cd="arith1" name="times"/>
        <OMV name="a"/>
        <OMV name="b"/>
      </OMA>
      <OMA>
        <OMS cd="arith1" name="gcd"/>
        <OMV name="a"/>
        <OMV name="b"/>
      </OMA>
    </OMA>
  </OMA>
</OMOBJ>


 > Why do search engines prefer documents with a higher ratio of content
 > to markup?

A search engine can get a long way by ignoring the markup and just
indexing the text, if there is any. In something like the OpenMath
example above, the search engine needs to know what the elements mean;
otherwise it can't really index anything useful. On the other hand,
a search engine that does understand OpenMath could get much more
reliable information from the XML than from the plain-text version:
 lcm(a,b) = a*b/gcd(a,b)
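
To make the contrast concrete, here is a minimal Python sketch using only the standard library. The function names text_content and to_prefix are made up for illustration, and it handles only the three element types used in the example above (OMA, OMS, OMV), not OpenMath in general. It shows what a markup-ignoring indexer would see, and what a reader that knows the element meanings can recover.

```python
import xml.etree.ElementTree as ET

# The OpenMath object from the example above.
OPENMATH = """\
<OMOBJ xmlns="http://www.openmath.org/OpenMath" version="2.0" cdbase="http://www.openmath.org/cd">
  <OMA>
    <OMS cd="relation1" name="eq"/>
    <OMA>
      <OMS cd="arith1" name="lcm"/>
      <OMV name="a"/>
      <OMV name="b"/>
    </OMA>
    <OMA>
      <OMS cd="arith1" name="divide"/>
      <OMA>
        <OMS cd="arith1" name="times"/>
        <OMV name="a"/>
        <OMV name="b"/>
      </OMA>
      <OMA>
        <OMS cd="arith1" name="gcd"/>
        <OMV name="a"/>
        <OMV name="b"/>
      </OMA>
    </OMA>
  </OMA>
</OMOBJ>
"""


def text_content(root):
    """Character data only: what an indexer that ignores markup would see."""
    return "".join(root.itertext()).strip()


def to_prefix(node):
    """Render OMOBJ/OMA/OMS/OMV as a prefix expression (hypothetical helper)."""
    tag = node.tag.split("}")[-1]  # drop the "{namespace}" part of the tag
    if tag == "OMOBJ":
        return to_prefix(node[0])
    if tag == "OMS":
        # A symbol is identified by its content dictionary and name.
        return f'{node.get("cd")}.{node.get("name")}'
    if tag == "OMV":
        return node.get("name")
    if tag == "OMA":
        # First child is the operator, the rest are its arguments.
        head, *args = [to_prefix(child) for child in node]
        return f'{head}({", ".join(args)})'
    raise ValueError(f"element not handled in this sketch: {tag}")


root = ET.fromstring(OPENMATH)
print(repr(text_content(root)))  # the text-only view of the document
print(to_prefix(root))           # the structure carried by the markup
```

The text-only view is empty, since every piece of information sits in element names and attributes, while the markup-aware view identifies each operator by its content dictionary, which the plain-text formula cannot do.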


> What do you think?  Is there a principle of data design being
> illustrated here?  Can you articulate the principle?

No general principle other than that markup is needed when it's needed,
and not needed when it's not.

David


