Comprehensive list of issues with dereferencing a link? Stress-testing link crawlers

Hi Folks,

There are links in external entities.

There are links in XInclude elements.

The value of schemaLocation is a link.

The xsl:import and xsl:include elements use links.

Links are everywhere.
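
To make the places above concrete, here is a minimal Python sketch that collects the links from one XML document. It is an illustration under stated assumptions, not a complete extractor: the function name collect_links and the sample file names are hypothetical, and links in external entity declarations are left out because xml.etree.ElementTree does not expose DTD declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the vocabularies mentioned above.
XI = "http://www.w3.org/2001/XInclude"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSL = "http://www.w3.org/1999/XSL/Transform"


def collect_links(xml_text):
    """Return (kind, link) pairs found in one XML document (hypothetical helper)."""
    links = []
    for elem in ET.fromstring(xml_text).iter():
        # XInclude: <xi:include href="..."/>
        if elem.tag == f"{{{XI}}}include" and "href" in elem.attrib:
            links.append(("xinclude", elem.attrib["href"]))
        # XSLT: <xsl:import href="..."/> and <xsl:include href="..."/>
        if elem.tag in (f"{{{XSL}}}import", f"{{{XSL}}}include") and "href" in elem.attrib:
            links.append(("xslt", elem.attrib["href"]))
        # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs,
        # so every second token is a link.
        tokens = elem.attrib.get(f"{{{XSI}}}schemaLocation", "").split()
        links.extend(("schema", location) for location in tokens[1::2])
    return links


# Made-up sample document that uses three of the link-bearing constructs.
SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.test/ns http://example.test/ns.xsd">
  <xsl:import href="base.xsl"/>
  <xsl:include href="http://example.test/util.xsl"/>
  <xi:include href="chapter1.xml"/>
</xsl:stylesheet>"""

for kind, link in collect_links(SAMPLE):
    print(kind, link)
```

Note that two of the links in the sample are relative, so a crawler would still have to resolve them against the document's base URI before fetching anything.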

Obtaining the resource pointed to by a link is not necessarily trivial.

Suppose you are implementing a tool that follows links and fetches their resources (i.e., a link crawler). How will you stress-test your tool? What test cases must you consider?

Here is a list of test cases:

1. The link references a resource and it is immediately returned.

2. The link references a resource and its server returns a redirect [1]. The redirect is followed and then the resource is returned.

3. The link references a resource and its server returns a redirect. The redirect is followed and its server returns another redirect, and so forth. Each redirect must be followed in turn until either (a) the resource is finally found or (b) an infinite loop in the redirects is detected.

4. The link references a resource and its server returns a 404 Not Found.

5. Repeat 1-4 but with the link using HTTPS.
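
One way to exercise cases 1-5 without a live server is to separate the redirect-following logic from the network call. The Python sketch below does that under stated assumptions: dereference, fake_fetch, FAKE_SERVER, the two exception classes, and the example.test URLs are all hypothetical names made up for illustration, and the limit of 10 hops is an arbitrary choice. The fetch argument stands in for whatever HTTP or HTTPS client the real tool uses; it is assumed to return a (status, location, body) tuple.

```python
from urllib.parse import urljoin

# Redirect statuses that carry a Location header (308 postdates RFC 2616).
REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectLoop(Exception):
    """The redirect chain revisited a URL or exceeded the hop limit."""


class NotFound(Exception):
    """The server answered 404 Not Found."""


def dereference(url, fetch, max_hops=10):
    """Follow redirects from url; return (final_url, body) on a 200 response.

    fetch(url) is an assumed callable returning (status, location, body).
    """
    seen = set()
    for _ in range(max_hops + 1):  # the first request plus max_hops redirects
        if url in seen:
            raise RedirectLoop(f"redirect loop at {url}")
        seen.add(url)
        status, location, body = fetch(url)
        if status == 200:
            return url, body
        if status == 404:
            raise NotFound(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be a relative reference
            continue
        raise RuntimeError(f"unexpected status {status} for {url}")
    raise RedirectLoop(f"more than {max_hops} redirects")


# In-memory stand-in for a server: URL -> (status, Location, body).
FAKE_SERVER = {
    "http://example.test/a": (200, None, b"<a/>"),                   # case 1
    "http://example.test/b": (302, "/a", b""),                       # case 2
    "http://example.test/c": (301, "http://example.test/b", b""),    # case 3(a)
    "http://example.test/x": (302, "http://example.test/y", b""),    # case 3(b)
    "http://example.test/y": (302, "http://example.test/x", b""),
    "https://example.test/s": (302, "https://example.test/t", b""),  # case 5
    "https://example.test/t": (200, None, b"<t/>"),
}


def fake_fetch(url):
    """Answer from the table; any unknown URL is a 404 (case 4)."""
    return FAKE_SERVER.get(url, (404, None, b""))
```

With this table, dereference("http://example.test/c", fake_fetch) follows two redirects and returns the body of /a, dereference("http://example.test/x", fake_fetch) raises RedirectLoop, and any URL missing from the table raises NotFound. Passing the fetch function in keeps the test cases repeatable; it says nothing about TLS failures or timeouts, which a real HTTPS client would add.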

What else?

/Roger

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3



Copyright 1993-2007 XML.org. This site is hosted by OASIS