Daniel Oaks | 95012d1e0c | gs-server: Use env instead of py3 directly, makes virtualenvs nicer | 2016-02-16 17:24:20 +00:00
Ivan Kozik | 637929ab76 | First take on converting grab-site to a Python program | 2015-07-27 06:32:08 +00:00
Ivan Kozik | 1fce3af4a0 | Add --1 option for turning off recursion; document options | 2015-07-20 08:23:35 +00:00
Ivan Kozik | 0e38441234 | Add OS X support | 2015-07-20 06:35:32 +00:00
Ivan Kozik | 877b170fde | Move some code around | 2015-07-18 11:57:34 +00:00
Ivan Kozik | 318fb3c03d | Set --max-redirect 8 like ArchiveBot | 2015-07-18 11:52:31 +00:00
Ivan Kozik | 63e8f1813b | Add --page-requisites-level= and --concurrency= options; use default concurrency of 2 | 2015-07-18 11:45:11 +00:00
Ivan Kozik | 5331f4c9fe | Require 400MB disk free | 2015-07-18 11:32:38 +00:00
Ivan Kozik | f848b28810 | Use --level inf by default; add --level option | 2015-07-18 11:32:28 +00:00
Ivan Kozik | 43d8a9594f | Move everything and make grab-site installable with pip3 | 2015-07-18 10:39:04 +00:00
Ivan Kozik | 8a8ea70a7d | On ctrl-c, touch 'stop' file instead of letting wpull handle it, so that server gets notified of stop | 2015-07-18 09:21:14 +00:00
Ivan Kozik | f4f445b7dd | igoff by default | 2015-07-18 08:23:56 +00:00
Ivan Kozik | f1100e7223 | Try to send stdout/stderr to dashboard and fail at it | 2015-07-18 05:24:54 +00:00
Ivan Kozik | e804f7171e | Show job URLs on dashboard | 2015-07-18 04:14:50 +00:00
Ivan Kozik | 18a192739b | Make WebSocket client/server sort of work; rename ignore_sets to igsets | 2015-07-18 02:11:18 +00:00
Ivan Kozik | db21e530e2 | Generate a grab id and put in the dir name; add some temporary print debugging | 2015-07-18 01:06:56 +00:00
Ivan Kozik | 8353827a02 | Update UA | 2015-07-17 23:45:18 +00:00
Ivan Kozik | 62cba3a0e7 | Update UA | 2015-05-19 20:52:29 +00:00
Ivan Kozik | 2d9b1395f1 | Put the date into the DIR name and WARC name | 2015-05-19 18:13:03 +00:00
Ivan Kozik | 66d22b1556 | Update UA | 2015-04-08 20:42:10 +00:00
Ivan Kozik | c83c89b0cf | Remove --no-skip-getaddrinfo to match ArchiveBot | 2015-04-07 07:26:54 +00:00
Ivan Kozik | 2983615d50 | Send Accept-Language to avoid 500 Internal Server Error when sending Firefox UA to reddit.com | 2015-03-30 03:07:23 +00:00
Ivan Kozik | b785e9b1b7 | Pause the crawl when running low on disk or memory | 2015-03-09 06:02:24 +00:00
Ivan Kozik | cbaccd9e02 | Avoid creating directories with ? or & in the filename, which breaks sqlalchemy when it tries to parse arguments from the filename. Fixes https://github.com/ludios/grab-site/issues/1 | 2015-03-09 05:16:54 +00:00
Ivan Kozik | 85f02d2055 | Include path/query components in directory name | 2015-02-23 03:03:15 +00:00
Ivan Kozik | 65e096a035 | Support --ignore-sets= instead of the space-separated version | 2015-02-05 06:05:54 +00:00
Ivan Kozik | 2ccb8b4d6f | Add support for --no-offsite-links | 2015-02-05 05:15:46 +00:00
Ivan Kozik | 52d0acc3b5 | Another html5lib comment | 2015-02-05 05:08:03 +00:00
Ivan Kozik | 64d027da2c | Fix comment | 2015-02-05 05:07:05 +00:00
Ivan Kozik | 6f8ef82efb | Rename script | 2015-02-05 05:05:53 +00:00