THttpScan recursively analyzes HTML pages and reports every link it finds (HTML, mail, JPG, MPEG, MP3, etc.) to a text file.
Starting from an initial URL, THttpScan extracts the links from each HTML page it reads and works outward from there. The HTML links it finds are added to a download queue. THttpScan then downloads each queued page, extracts its links, and repeats the process.
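The queue-driven process above can be sketched as a breadth-first crawl. This is a minimal Python illustration of the general technique, not THttpScan's own code or API: the names `extract_links`, `looks_like_html` and `crawl` are hypothetical, pages come from an in-memory dict instead of real HTTP downloads, and the rule for deciding which links count as HTML is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href/src attribute values from any tag (a, img, frame, ...)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs found in an HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urldefrag(urljoin(base_url, raw))[0] for raw in parser.links]


def looks_like_html(url):
    """Hypothetical heuristic: an http(s) URL whose last path segment has
    no extension or ends in .htm/.html is treated as an HTML page."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # mailto:, ftp:, ... are reported but never downloaded
    last = parts.path.rsplit("/", 1)[-1].lower()
    return "." not in last or last.endswith((".htm", ".html"))


def crawl(start_url, fetch):
    """Queue-driven crawl: download HTML pages, report every link found.

    fetch(url) stands in for an HTTP download (an assumption of this sketch)
    and returns the page's HTML, or None if the page is unavailable.
    """
    queue = deque([start_url])
    queued = {start_url}
    reported, reported_set = [], set()
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        for link in extract_links(url, html):
            if link not in reported_set:  # every link type is reported once
                reported_set.add(link)
                reported.append(link)
            if looks_like_html(link) and link not in queued:
                queued.add(link)  # only HTML links go to the download queue
                queue.append(link)
    return reported


# Usage with a hypothetical in-memory "site" instead of real HTTP requests.
SITE = {
    "http://example.com/": '<a href="a.html">A</a> <img src="logo.jpg">'
                           ' <a href="mailto:me@example.com">mail</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="song.mp3#t=10">song</a>',
}
links = crawl("http://example.com/", SITE.get)
print(links)
```

Non-HTML links such as images, `mailto:` addresses and media files are reported but never queued for download, which keeps the crawl limited to pages that can contain further links.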
- the LinkScan property restricts scanning to the initial site or to the initial URL path;
- the LinkReport property restricts reporting to links belonging to the current site, or to links under the subfolders of the initial URL;
- the DepthSearchLevel property limits how many levels of pages are scanned, counting from the initial page, which is especially useful when scanning is not restricted to a single web site.
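The interplay of a scan scope, a separate report scope and a depth limit can be sketched as follows. This is a hypothetical Python illustration of the idea, not THttpScan's API: the mode strings `"any"`, `"site"` and `"path"` only loosely mirror LinkScan and LinkReport, the link graph is an in-memory dict standing in for downloading and parsing, and the depth convention (start page at depth 0) is an assumption.

```python
from collections import deque
from urllib.parse import urlparse


def in_scope(url, start_url, mode):
    """Hypothetical scope rule, loosely modelled on LinkScan / LinkReport.

    mode "any":  no restriction
    mode "site": same host as the start URL
    mode "path": same host and under the start URL's folder
    """
    if mode == "any":
        return True
    u, s = urlparse(url), urlparse(start_url)
    if u.scheme not in ("http", "https") or u.netloc.lower() != s.netloc.lower():
        return False
    if mode == "site":
        return True
    folder = (s.path or "/").rsplit("/", 1)[0] + "/"  # "/docs/a.html" -> "/docs/"
    return (u.path or "/").startswith(folder)


def scan(start_url, graph, scan_mode, report_mode, max_depth):
    """Breadth-first scan with separate scan/report scopes and a depth limit.

    graph maps a page URL to the absolute links found on it. The start page
    is depth 0; links are followed only while depth < max_depth (an assumed
    convention, not necessarily the one DepthSearchLevel uses).
    """
    queue = deque([(start_url, 0)])
    queued = {start_url}
    reported, reported_set = [], set()
    while queue:
        url, depth = queue.popleft()
        for link in graph.get(url, []):
            # Reporting and following are decided independently.
            if link not in reported_set and in_scope(link, start_url, report_mode):
                reported_set.add(link)
                reported.append(link)
            if (depth < max_depth and link not in queued
                    and in_scope(link, start_url, scan_mode)):
                queued.add(link)
                queue.append((link, depth + 1))
    return reported


# Hypothetical site: which absolute links appear on which page.
START = "http://example.com/docs/"
GRAPH = {
    START: ["http://example.com/docs/a.html", "http://example.com/blog/",
            "http://other.org/"],
    "http://example.com/docs/a.html": ["http://example.com/docs/sub/b.html",
                                       "http://example.com/"],
    "http://example.com/docs/sub/b.html": ["http://example.com/docs/sub/c.html"],
    "http://example.com/blog/": ["http://example.com/blog/post.html"],
}

# Follow links only under /docs/, but report every link on the same site.
print(scan(START, GRAPH, "path", "site", max_depth=5))
```

Keeping the "follow" and "report" decisions separate is what lets a scan stay inside one subdirectory while still listing links that point elsewhere on the site; with both scopes set to `"any"`, only the depth limit keeps the crawl from spreading across the web.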
By combining the LinkScan and LinkReport properties with a high DepthSearchLevel value, you can easily scan a whole site or just one subdirectory of a web site.
Events are fired for each link found and each page read, returning the URL, meta tags, document type, referrer, host name, etc.
Depending on the connection speed, thousands of links can be extracted from a starting URL within a few minutes.
The most common parameters can be set directly from the Object Inspector.