Spider

The Spider is a tool that is used to automatically discover new resources (URLs) on a particular site. It begins with a list of URLs to visit, called the seeds, which depends on how the Spider is started. The Spider then visits these URLs, identifies all the hyperlinks in the pages, and adds them to the list of URLs to visit; the process continues recursively as long as new resources are found.
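As a rough illustration of this crawl loop (a minimal sketch, not ZAP's actual implementation), the process can be expressed as a queue of pending URLs and a set of already seen ones; the `fetch` and `extractLinks` helpers below are hypothetical placeholders:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class SpiderSketch {

    // Visit each pending URL, extract hyperlinks, and queue any URL not seen before.
    static void crawl(List<String> seeds) {
        Queue<String> toVisit = new ArrayDeque<>(seeds);
        Set<String> visited = new HashSet<>(seeds);

        while (!toVisit.isEmpty()) {
            String url = toVisit.poll();
            String body = fetch(url);                  // hypothetical HTTP fetch
            for (String link : extractLinks(body)) {   // hypothetical link extraction
                if (visited.add(link)) {               // true only for unseen URLs
                    toVisit.add(link);
                }
            }
        }
    }

    // Placeholders standing in for the real fetching and parsing logic.
    static String fetch(String url) { return ""; }
    static List<String> extractLinks(String body) { return List.of(); }
}
```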

There are 4 methods of starting the Spider, differentiated by the seed list with which it starts. More details can be found below, in the "Accessed via" section.

During the processing of a URL, the Spider makes a request to fetch the resource and then parses the response, identifying hyperlinks. It currently has the following behavior when processing the different types of responses:

HTML

The Spider processes specific HTML tags, identifying links to new resources.
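As an illustrative sketch only (the tag and attribute selection below is an assumption, not ZAP's exact list, and ZAP does not necessarily use jsoup), extracting links from common link-bearing tags could look like this:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HtmlLinkExtractor {

    // Collect absolute URLs from a few common link-bearing tags.
    static List<String> extractLinks(String html, String baseUrl) {
        Document doc = Jsoup.parse(html, baseUrl);
        List<String> links = new ArrayList<>();

        // Tag/attribute pairs chosen for illustration; the real Spider's list may differ.
        for (Element el : doc.select("a[href], link[href], area[href]")) {
            links.add(el.attr("abs:href"));   // resolved against baseUrl
        }
        for (Element el : doc.select("img[src], script[src], iframe[src], frame[src]")) {
            links.add(el.attr("abs:src"));
        }
        return links;
    }
}
```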

Robots.txt file

If enabled in the Spider Options screen, the Spider also analyzes the 'Robots.txt' file and tries to identify new resources from the rules specified there. It has to be mentioned that the Spider does not follow the rules specified in the 'Robots.txt' file; it only uses them to discover resources.
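Purely as an illustration (not ZAP's implementation), turning the path entries of a robots.txt response into candidate URLs could look like the sketch below; `siteRoot` is an assumed parameter such as "https://example.com":

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsTxtScanner {

    // Extract the paths from Allow/Disallow rules and resolve them against the site root.
    // The rules themselves are only mined for paths, not obeyed.
    static List<String> extractUrls(String robotsTxt, String siteRoot) {
        List<String> urls = new ArrayList<>();
        for (String line : robotsTxt.split("\\r?\\n")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase();
            if (lower.startsWith("allow:") || lower.startsWith("disallow:")) {
                String path = trimmed.substring(trimmed.indexOf(':') + 1).trim();
                // Strip wildcard characters; they are rule syntax, not part of the URL.
                path = path.replace("*", "").replace("$", "");
                if (!path.isEmpty()) {
                    urls.add(siteRoot + path);
                }
            }
        }
        return urls;
    }
}
```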

Non-HTML Text Response

Text responses are parsed by scanning them for the URL pattern.
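A rough sketch of such a scan is shown below; the regular expression is an illustrative assumption, and the actual pattern used by the Spider may be broader or stricter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextUrlScanner {

    // Illustrative URL pattern; not the Spider's actual pattern.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[\\w.-]+(?::\\d+)?(?:/[\\w./?%&=#~-]*)?");

    // Return every URL-looking substring found in a plain-text response body.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```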

Non-Text response

Currently, the Spider does not process this type of resource.

Other aspects

The Spider is configured using the Spider Options screen.

Accessed via

     Spider tab
     Sites tab 'Attack -> Spider Site' right click menu item
     'Attack -> Spider URL' right click menu item
     'Attack -> Spider Subtree' right click menu item
     'Attack -> Spider all in Scope' right click menu item
     'Attack -> Spider all in Context...' right click menu item

See also

     UI Overview for an overview of the user interface
     Features provided by ZAP
     Spider Options screen for an overview of the Spider Options