README: Fix GCE name
parent 08a90865a2
commit 6189c7c124
@@ -262,7 +262,7 @@ If you pay no attention to your crawls, a crawl may head down some infinite bot
grab-site does not respect `robots.txt` files, because they frequently [whitelist only approved robots](https://github.com/robots.txt), [hide embarrassing news stories](https://web.archive.org/web/20140401024610/http://www.thecrimson.com/robots.txt), or block image or stylesheet resources needed for proper archival. [See also](http://www.archiveteam.org/index.php?title=Robots.txt). Because of this, very rarely you might run into a robot honeypot and receive an abuse@ complaint.
-Do not run grab-site from Google Cloud Engine; as happened to me, your entire API project will probably get nuked after a few days of crawling the web, with no recourse. Good alternatives include OVH (sold under [OVH](https://www.ovh.com/us/dedicated-servers/), [So You Start](http://www.soyoustart.com/us/essential-servers/), and [Kimsufi](http://www.kimsufi.com/us/en/index.xml)) and online.net (with [dedicated](https://www.online.net/en/dedicated-server) or [puny ARM server](https://www.scaleway.com/) offerings).
+Do not run grab-site on GCE (Google Compute Engine); as happened to me, your entire API project will probably get nuked after a few days of crawling the web, with no recourse. Good alternatives include OVH (sold under [OVH](https://www.ovh.com/us/dedicated-servers/), [So You Start](http://www.soyoustart.com/us/essential-servers/), and [Kimsufi](http://www.kimsufi.com/us/en/index.xml)) and online.net (with [dedicated](https://www.online.net/en/dedicated-server) or [puny ARM server](https://www.scaleway.com/) offerings).
### Tips for specific websites