OASIS Mailing List Archives


Re: [xml-dev] the challenges of 'extensible'

On 04/12/2014 06:16 PM, Michael Sokolov wrote:
> Thanks for that, Simon
>
> On 4/11/2014 9:16 AM, Simon St.Laurent wrote:
>> (Also, at this point I think most search engines are still treating
>> HTML as annotated text, though I hear rumors of DOM-building.)
>
> I had heard they were doing at least some rendering in order to deal
> with the problem of invisible text spam (text rendered in white or tiny
> fonts or using some other trick to make it invisible to readers, and
> intended primarily to deceive the search engine). Maybe that doesn't
> require a DOM, but at the very least it requires knowing how to apply
> CSS to the elements,
It's been a very long time since I talked about this with anyone who might know, but long ago the key was CSS selectors. There was some processing of the stylesheet to look for problems. Then the search engine watched for matches of those selectors as it read the documents. No detailed tree building, but tracing that flagged common problems.
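
To make that approach concrete, here is a minimal sketch in Python of stylesheet pre-processing followed by selector tracing over a stream of tags. It is an illustration under stated assumptions, not a description of what any search engine actually ran: the `SUSPICIOUS` patterns, the names `suspicious_selectors`, `HiddenTextTracer`, and `trace_hidden_text`, and the restriction to simple tag, `.class`, and `#id` selectors are all hypothetical choices made for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics for declarations that hide text.
SUSPICIOUS = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*[01]px",
    re.IGNORECASE,
)
# Matches "selector-list { declarations }" for flat (non-nested) rules.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Elements that never get an end tag, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def suspicious_selectors(css):
    """Step 1: scan the stylesheet and collect selectors whose rules hide text."""
    found = set()
    for selectors, body in RULE.findall(css):
        if SUSPICIOUS.search(body):
            found.update(s.strip() for s in selectors.split(","))
    return found


class HiddenTextTracer(HTMLParser):
    """Step 2: watch for selector matches while reading the document."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.stack = []          # (tag, flagged) for open elements only, not a tree
        self.flagged_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attr = dict(attrs)
        candidates = {tag}
        candidates.update("." + c for c in (attr.get("class") or "").split())
        if attr.get("id"):
            candidates.add("#" + attr["id"])
        # A descendant of a flagged element is treated as flagged too.
        inherited = bool(self.stack) and self.stack[-1][1]
        self.stack.append((tag, inherited or bool(candidates & self.selectors)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1][1] and data.strip():
            self.flagged_text.append(data.strip())


def trace_hidden_text(css, html):
    tracer = HiddenTextTracer(suspicious_selectors(css))
    tracer.feed(html)
    tracer.close()
    return tracer.flagged_text
```

The only state kept while reading is the stack of currently open elements, so memory grows with nesting depth rather than document size. That is the trade-off described above: cheap tracing that catches common tricks, with no attempt at the cascade, descendant selectors, or computed colors.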

> and when you let JavaScript in the mix, you pretty
> well have to create a DOM to operate on.
Given what they'll be testing, yes, they'll likely need a classic DOM, like the one PhantomJS provides. Fortunately for search engine managers, the price of memory has declined a lot since that conversation, which was at least a decade ago.

Simon St.Laurent


Copyright 1993-2007 XML.org. This site is hosted by OASIS