30 Commits (master)

Author SHA1 Message Date
Daniel Oaks 95012d1e0c gs-server: Use env instead of py3 directly, makes virtualenvs nicer 2016-02-16 17:24:20 +00:00
Ivan Kozik 637929ab76 First take on converting grab-site to a Python program 2015-07-27 06:32:08 +00:00
Ivan Kozik 1fce3af4a0 Add --1 option for turning off recursion; document options 2015-07-20 08:23:35 +00:00
Ivan Kozik 0e38441234 Add OS X support 2015-07-20 06:35:32 +00:00
Ivan Kozik 877b170fde Move some code around 2015-07-18 11:57:34 +00:00
Ivan Kozik 318fb3c03d Set --max-redirect 8 like ArchiveBot 2015-07-18 11:52:31 +00:00
Ivan Kozik 63e8f1813b Add --page-requisites-level= and --concurrency= options; use default concurrency of 2 2015-07-18 11:45:11 +00:00
Ivan Kozik 5331f4c9fe Require 400MB disk free 2015-07-18 11:32:38 +00:00
Ivan Kozik f848b28810 Use --level inf by default; add --level option 2015-07-18 11:32:28 +00:00
Ivan Kozik 43d8a9594f Move everything and make grab-site installable with pip3 2015-07-18 10:39:04 +00:00
Ivan Kozik 8a8ea70a7d On ctrl-c, touch 'stop' file instead of letting wpull handle it, so that server gets notified of stop 2015-07-18 09:21:14 +00:00
Ivan Kozik f4f445b7dd igoff by default 2015-07-18 08:23:56 +00:00
Ivan Kozik f1100e7223 Try to send stdout/stderr to dashboard and fail at it 2015-07-18 05:24:54 +00:00
Ivan Kozik e804f7171e Show job URLs on dashboard 2015-07-18 04:14:50 +00:00
Ivan Kozik 18a192739b Make WebSocket client/server sort of work; rename ignore_sets to igsets 2015-07-18 02:11:18 +00:00
Ivan Kozik db21e530e2 Generate a grab id and put in the dir name; add some temporary print debugging 2015-07-18 01:06:56 +00:00
Ivan Kozik 8353827a02 Update UA 2015-07-17 23:45:18 +00:00
Ivan Kozik 62cba3a0e7 Update UA 2015-05-19 20:52:29 +00:00
Ivan Kozik 2d9b1395f1 Put the date into the DIR name and WARC name 2015-05-19 18:13:03 +00:00
Ivan Kozik 66d22b1556 Update UA 2015-04-08 20:42:10 +00:00
Ivan Kozik c83c89b0cf Remove --no-skip-getaddrinfo to match ArchiveBot 2015-04-07 07:26:54 +00:00
Ivan Kozik 2983615d50 Send Accept-Language to avoid 500 Internal Server Error when sending Firefox UA to reddit.com 2015-03-30 03:07:23 +00:00
Ivan Kozik b785e9b1b7 Pause the crawl when running low on disk or memory 2015-03-09 06:02:24 +00:00
Ivan Kozik cbaccd9e02 Avoid creating directories with ? or & in the filename, which breaks sqlalchemy when it tries to parse arguments from the filename. Fixes https://github.com/ludios/grab-site/issues/1 2015-03-09 05:16:54 +00:00
Ivan Kozik 85f02d2055 Include path/query components in directory name 2015-02-23 03:03:15 +00:00
Ivan Kozik 65e096a035 Support --ignore-sets= instead of the space-separated version 2015-02-05 06:05:54 +00:00
Ivan Kozik 2ccb8b4d6f Add support for --no-offsite-links 2015-02-05 05:15:46 +00:00
Ivan Kozik 52d0acc3b5 Another html5lib comment 2015-02-05 05:08:03 +00:00
Ivan Kozik 64d027da2c Fix comment 2015-02-05 05:07:05 +00:00
Ivan Kozik 6f8ef82efb Rename script 2015-02-05 05:05:53 +00:00