Re: [xml-dev] Comprehensive list of issues with dereferencing a link? Stress-testing link crawlers

An important case is missing from this list, the spider trap:

https://en.wikipedia.org/wiki/Spider_trap
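
To make the spider-trap case concrete, here is a minimal sketch of the kind of guard a crawler might apply before following a link. The function name looks_like_trap and the three thresholds are my own assumptions for illustration, not values taken from any standard or existing tool; a real crawler would tune them and would probably add a per-host page budget as well.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical limits, chosen only for illustration.
MAX_DEPTH = 16            # how many links deep the crawl may go
MAX_URL_LENGTH = 2000     # traps often generate ever-longer URLs
MAX_SEGMENT_REPEATS = 3   # e.g. /foo/foo/foo/foo/... is suspicious


def looks_like_trap(url, depth):
    """Return True if url shows common symptoms of a spider trap.

    depth is the number of links followed from the seed to reach url.
    This is a heuristic: it can reject legitimate URLs and miss traps.
    """
    if depth > MAX_DEPTH or len(url) > MAX_URL_LENGTH:
        return True
    # Count how often each non-empty path segment occurs.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())
```

A stress test for this case could point the crawler at a server that emits an endless chain of such links and check that the crawl terminates.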



On Fri, Jun 26, 2015 at 2:07 PM, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
>
> There are links in external entities.
>
> There are links in XInclude elements.
>
> The value of schemaLocation is a link.
>
> The xsl:import and xsl:include elements use links.
>
> Links are everywhere.
>
> Obtaining the resource pointed to by a link is not necessarily trivial.
>
> Suppose you are implementing a tool that follows links and fetches their resources (i.e., a link crawler). How will you stress-test your tool? What test cases must you consider?
>
> Here is a list of test cases:
>
> 1. The link references a resource and it is immediately returned.
>
> 2. The link references a resource and its server returns a redirect [1]. The redirect is followed and then the resource is returned.
>
> 3. The link references a resource and its server returns a redirect. The redirect is followed and its server returns another redirect. And so forth. The redirects must be recursively followed until either (a) the resource is finally found or (b) an infinite loop in the redirects is detected.
>
> 4. The link references a resource and its server returns a 404 Not Found.
>
> 5. Repeat 1-4 but with the link using HTTPS.
>
> What else?
>
> /Roger
>
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3
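
Test cases 1-4 in the quoted list can be exercised without a network by passing the crawler a fake fetch function. The following is only a sketch under stated assumptions: resolve, fake_server, the fetch callable returning a (status, location) pair, the outcome strings, and the example.org-style URLs are all hypothetical names introduced here. Only urljoin is a real standard-library call.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(url, fetch, max_hops=10):
    """Follow redirects starting at url; return (outcome, final_url).

    fetch is a hypothetical callable: fetch(url) -> (status, location).
    outcome is "ok", "not_found", "loop", "too_many_redirects" or "error".
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:               # same URL visited twice: redirect loop
            return ("loop", url)
        seen.add(url)
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        if status == 200:
            return ("ok", url)
        if status == 404:
            return ("not_found", url)
        return ("error", url)
    return ("too_many_redirects", url)


def fake_server(table):
    """Build a fetch function from a dict; unknown URLs answer 404."""
    def fetch(url):
        return table.get(url, (404, None))
    return fetch


fetch = fake_server({
    "http://a.example/": (200, None),                    # case 1
    "http://b.example/": (302, "http://a.example/"),     # case 2
    "http://l1.example/": (302, "http://l2.example/"),   # case 3, loop
    "http://l2.example/": (302, "http://l1.example/"),
})
print(resolve("http://l1.example/", fetch))
```

The visited set catches loops, and max_hops bounds chains that never repeat a URL, which a visited set alone would not stop. Case 5 (HTTPS) would need a real TLS endpoint and is not covered by this sketch.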



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs. Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.



Copyright 1993-2007 XML.org. This site is hosted by OASIS