Re: [xml-dev] Comprehensive list of issues with dereferencing a link?
Stress-testing link crawlers
Hi Folks,
There are links in external entities.
There are links in XInclude elements.
The value of schemaLocation is a link.
The xsl:import and xsl:include elements use links.
Links are everywhere.
Obtaining the resource pointed to by a link is not necessarily trivial.
Suppose you are implementing a tool that follows links and fetches their resources (i.e., a link crawler). How will you stress-test your tool? What test cases must you consider?
Here is a list of test cases:
1. The link references a resource and it is immediately returned.
2. The link references a resource and its server returns a redirect [1]. The redirect is followed and then the resource is returned.
3. The link references a resource and its server returns a redirect. The redirect is followed and its server returns another redirect. And so forth. The redirects must be recursively followed until either (a) the resource is finally found or (b) an infinite loop in the redirects is detected.
4. The link references a resource and its server returns a 404 Not Found.
5. Repeat 1-4 but with the link using HTTPS.
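One way to make cases 1-5 repeatable is to keep the redirect-following logic separate from the network, so a fake "web" can drive it. Below is a minimal sketch under that assumption. `follow_link`, `Response`, `LinkError`, `fake_fetch` and the `*.example` URLs are all hypothetical names for illustration, not any existing tool's API. The `max_hops` cap is an extra guard I've assumed (it catches very long chains that never actually loop). The HTTPS entry only shows an http-to-https redirect. Real HTTPS failures, such as bad certificates, would need a real transport to test.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}  # HTTP 3xx codes that carry a Location


@dataclass
class Response:
    status: int                      # HTTP status code, e.g. 200, 302, 404
    location: Optional[str] = None   # Location header value for redirects
    body: Optional[str] = None       # resource content for a 200 response


class LinkError(Exception):
    """Raised when a link cannot be dereferenced."""


def follow_link(url: str, fetch: Callable[[str], Response], max_hops: int = 20) -> str:
    """Dereference url, following redirects until the resource is returned.

    Raises LinkError on a non-redirect error status (e.g. 404), on a
    redirect loop, or when more than max_hops redirects are seen."""
    seen = set()
    current = url
    for _ in range(max_hops + 1):          # first request + up to max_hops redirects
        if current in seen:
            raise LinkError(f"redirect loop detected at {current}")
        seen.add(current)
        resp = fetch(current)
        if resp.status == 200:
            return resp.body
        if resp.status in REDIRECT_CODES:
            if not resp.location:
                raise LinkError(f"{current} redirected without a Location")
            # Location may be relative, so resolve it against the current URL
            current = urljoin(current, resp.location)
            continue
        raise LinkError(f"{current} returned HTTP {resp.status}")
    raise LinkError(f"more than {max_hops} redirects starting from {url}")


# Hypothetical fake web covering test cases 1-5
FAKE_WEB = {
    "http://a.example/doc.xsd": Response(200, body="<xs:schema/>"),      # case 1
    "http://b.example/old.xsd": Response(301, location="/new.xsd"),      # case 2
    "http://b.example/new.xsd": Response(200, body="<xs:schema/>"),
    "http://c.example/1": Response(302, location="http://c.example/2"),  # case 3a: chain
    "http://c.example/2": Response(307, location="http://c.example/3"),
    "http://c.example/3": Response(200, body="found"),
    "http://d.example/x": Response(302, location="http://d.example/y"),  # case 3b: loop
    "http://d.example/y": Response(302, location="http://d.example/x"),
    "http://e.example/missing": Response(404),                           # case 4
    "http://f.example/s.xsd": Response(301, location="https://f.example/s.xsd"),  # case 5
    "https://f.example/s.xsd": Response(200, body="secure"),
}


def fake_fetch(url: str) -> Response:
    # Unknown URLs behave like a 404
    return FAKE_WEB.get(url, Response(404))


if __name__ == "__main__":
    for link in ["http://b.example/old.xsd", "http://d.example/x", "http://e.example/missing"]:
        try:
            print(link, "->", follow_link(link, fake_fetch))
        except LinkError as err:
            print(link, "-> error:", err)
```

Injecting `fetch` as a parameter keeps the loop-detection and hop-limit logic testable offline. A real crawler would pass in a function that issues one HTTP request without automatically following redirects.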
What else?
/Roger