How ProxyPy Crawls Your Site

ProxyPy’s crawl process starts with a list of webpage URLs. When ProxyPy visits these URLs, it extracts the hyperlinks on each page and adds them to the list for further crawling. This list, also known as the "crawl frontier", is revisited according to a set of ProxyPy policies to effectively map a site for updates: content changes, new pages, and dead links.
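As an illustration only, the crawl frontier can be thought of as a queue of URLs that is drained as pages are fetched and refilled as new links are discovered. The Python sketch below shows that idea in miniature; the seed URL, the same-host restriction, and the page limit are assumptions made for the example, not ProxyPy's actual crawl policies.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    """Breadth-first crawl of a single host, starting from seed_url."""
    frontier = deque([seed_url])          # the "crawl frontier"
    seen = {seed_url}
    host = urlparse(seed_url).netloc

    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        try:
            with urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue                      # dead link or network error

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same host and avoid revisiting pages.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

    return seen

if __name__ == "__main__":
    print(crawl("https://example.com"))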


How To Block ProxyPy From Crawling Your Site

A. robots.txt

Bots crawl your web pages to parse your site's content so that the relevant information on your site is indexed and readily available to users searching for what you provide. Although most bots are harmless and even quite beneficial, you may still want to prevent them from crawling your site (keep in mind, however, that not every bot on the web is crawling your site to help index it). The easiest and quickest way to do this is with the robots.txt file, a text file that contains instructions on how a bot should process your site's data.

Important: The robots.txt file must be placed in the top-level directory of the website host to which it applies; otherwise, it will have no effect on ProxyPy's behavior.

To block ProxyPy from crawling your site for a webgraph of links, add the following rule to your robots.txt file:
User-agent: ProxyPyBot
Disallow: /
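
If you want to confirm that this rule does what you expect before deploying it, you can test it locally. The sketch below uses Python's standard urllib.robotparser with the ProxyPyBot user agent; the example URL is purely illustrative.

import urllib.robotparser

# Parse a robots.txt body directly (no network call needed for a quick check).
rules = """\
User-agent: ProxyPyBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# ProxyPyBot is blocked everywhere; other agents are unaffected by this rule.
print(parser.can_fetch("ProxyPyBot", "https://example.com/any/page"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/any/page"))    # True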

Important details:

- If you have subdomains, you need to place a robots.txt file on each subdomain. Otherwise, ProxyPy will not look for a robots.txt file anywhere else in your domain and will assume it is allowed to crawl everything on that subdomain.
- The robots.txt file must always return an HTTP 200 status code (a quick way to check this is sketched below).
- If a 4xx status code is returned, ProxyPy will assume that no robots.txt exists and there are no crawl restrictions.
- Returning a 5xx status code for your robots.txt file will prevent ProxyPy from crawling your entire site.
- Our crawler can handle robots.txt files served with a 3xx (redirect) status code.
- It may take up to one hour or 100 requests for ProxyPy to discover changes made to your robots.txt.
- Do not try to block ProxyPy by IP address, as we do not use any consecutive IP blocks.
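
To verify how your robots.txt responds, the following sketch checks the HTTP status code returned for robots.txt on a set of hostnames. The hostnames are placeholders, and the status-code interpretations mirror the rules above; this is an illustrative check, not an official ProxyPy tool.

import urllib.request
import urllib.error

# Example hostnames; replace with your own domain and subdomains.
hosts = ["example.com", "blog.example.com", "shop.example.com"]

for host in hosts:
    url = f"https://{host}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    except OSError:
        status = None

    if status == 200:
        print(f"{url}: OK (200)")
    elif status is None:
        print(f"{url}: unreachable")
    elif 400 <= status < 500:
        print(f"{url}: {status} - treated as if no robots.txt exists (no restrictions)")
    elif status >= 500:
        print(f"{url}: {status} - crawling of the whole site would be blocked")
    else:
        print(f"{url}: {status}")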

B. Submit online!

ProxyPy is a free and powerful online web proxy service. It acts as an intermediary between you and the Internet, allowing users to surf anonymously because the online anonymizer hides the user's IP address. Why is this important to you? In most cases, content across the Internet is regionally restricted due to licensing issues and copyright concerns.
