Re: [xml-dev] Comprehensive list of issues with dereferencing a link? Stress-testing link crawlers

From: Thomas Passin <list1@tompassin.net>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Fri, 26 Jun 2015 15:49:58 -0600

> What else?

The resource's server returns an error status other than 404, for example 403 Forbidden or 500 Internal Server Error.

TomP

On 6/26/2015 3:07 PM, Costello, Roger L. wrote:

Hi Folks,

There are links in external entities.

There are links in XInclude elements.

The value of schemaLocation is a link.

The xsl:import and xsl:include elements use links.

Links are everywhere.

Obtaining the resource pointed to by a link is not necessarily trivial.

Suppose you are implementing a tool that follows links and fetches their resources (i.e., a link crawler). How will you stress-test your tool? What test cases must you consider?

Here is a list of test cases:

1. The link references a resource and it is immediately returned.

2. The link references a resource and its server returns a redirect [1]. The redirect is followed and then the resource is returned.

3. The link references a resource and its server returns a redirect. The redirect is followed and its server returns another redirect. And so forth. The redirects must be recursively followed until either (a) the resource is finally found or (b) an infinite loop in the redirects is detected.

4. The link references a resource and its server returns a 404 Not Found.

5. Repeat 1-4 but with the link using HTTPS.
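
Cases 1-4 above can be sketched as a small redirect-following routine. This is only an illustration under stated assumptions, not a definitive crawler: `resolve`, `fetch`, `max_hops`, and the outcome labels are hypothetical names, and `fetch` stands in for whatever HTTP client the tool uses, returning a status code and the Location header value (or None). Passing `fetch` in as a parameter lets the test cases run against a fake server table instead of the network.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(url, fetch, max_hops=10):
    """Follow redirects starting at url and return (outcome, final_url).

    fetch is a hypothetical callable: fetch(url) -> (status, location_or_None).
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            # Case 3(b): this URL was already visited, so the redirects loop.
            return ("loop", url)
        seen.add(url)
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            # Cases 2 and 3: the Location value may be relative to the current URL.
            url = urljoin(url, location)
            continue
        if status == 200:
            return ("ok", url)          # Case 1, or the end of a redirect chain
        if status == 404:
            return ("not_found", url)   # Case 4
        return ("error_%d" % status, url)
    # A long chain with no repeated URL is cut off by the hop limit.
    return ("too_many_redirects", url)


# A fake server table (made-up URLs) standing in for real HTTP responses.
FAKE_SERVER = {
    "http://example.org/": (302, "/doc.xml"),
    "http://example.org/doc.xml": (200, None),
    "http://example.org/x": (301, "http://example.org/y"),
    "http://example.org/y": (301, "http://example.org/x"),
    "http://example.org/gone": (404, None),
}


def fake_fetch(url):
    return FAKE_SERVER[url]


print(resolve("http://example.org/", fake_fetch))
print(resolve("http://example.org/x", fake_fetch))
print(resolve("http://example.org/gone", fake_fetch))
```

The hop limit is there in addition to the visited set because a server can generate an endless chain of distinct URLs, which loop detection alone would never catch.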

What else?

/Roger

Follow-Ups:
- Re: [xml-dev] Comprehensive list of issues with dereferencing a link? Stress-testing link crawlers
  - From: Michael Kay <mike@saxonica.com>

References:
- Comprehensive list of issues with dereferencing a link? Stress-testing link crawlers
  - From: "Costello, Roger L." <costello@mitre.org>
