259 Commits

Author SHA1 Message Date
Ivan Kozik
2e7d928614 Update README 2015-07-27 06:50:48 +00:00
Ivan Kozik
84b183ec84 Write proper --help text and use aliased inputs too 2015-07-27 06:44:55 +00:00
Ivan Kozik
637929ab76 First take on converting grab-site to a Python program 2015-07-27 06:32:08 +00:00
Ivan Kozik
915ed0eeae Use cchardet for faster encoding detection (imported by wpull/thirdparty/dammit.py) 2015-07-21 03:25:56 +00:00
Ivan Kozik
8d2acd669a README: minor tweaks 2015-07-20 09:53:13 +00:00
Ivan Kozik
a7f2ee7684 Document webarchiveplayer for viewing your WARCs 2015-07-20 09:47:22 +00:00
Ivan Kozik
5e85e00201 README: document --concurrency= 2015-07-20 09:30:51 +00:00
Ivan Kozik
c35b388677 Allow archiving archive.org content despite it being in the global ignore set 2015-07-20 09:02:19 +00:00
Ivan Kozik
08933f60e2 Clarify ?host= dashboard option 2015-07-20 08:50:47 +00:00
Ivan Kozik
35d6d780bd Bump version 2015-07-20 08:37:46 +00:00
Ivan Kozik
3f78e5f4bf README: use an <h3> 2015-07-20 08:30:57 +00:00
Ivan Kozik
58b560257a README: improve docs for options 2015-07-20 08:29:37 +00:00
Ivan Kozik
9af02f122b Unbreak README 2015-07-20 08:25:33 +00:00
Ivan Kozik
1fce3af4a0 Add --1 option for turning off recursion; document options 2015-07-20 08:23:35 +00:00
Ivan Kozik
e83375382d README: there are control files in DIR too 2015-07-20 08:04:14 +00:00
Ivan Kozik
210c3d03b5 README: include suggestions from @ethus3h (thanks!) and wrap long lines 2015-07-20 07:50:49 +00:00
Ivan Kozik
c7a272d7ba Document how to fix your PATH for grab-site 2015-07-20 07:25:06 +00:00
Ivan Kozik
0e38441234 Add OS X support 2015-07-20 06:35:32 +00:00
Ivan Kozik
cd893cb1e3 Keeping your crawling problems in perspective
Spanish scriptorium? (Madrid, Biblioteca de San Lorenzo de El Escorial, 14th century).
Credit: https://medievalfragments.wordpress.com/2013/11/05/where-are-the-scriptoria/
2015-07-20 03:43:33 +00:00
Ivan Kozik
6c6c3197e7 Accept more exit codes from wpull as clean exit 2015-07-19 22:11:01 +00:00
Ivan Kozik
a95ee28c8d Tell user where the output files are 2015-07-19 20:47:39 +00:00
Ivan Kozik
a5cc1d84c6 Bump version 2015-07-19 20:44:11 +00:00
Ivan Kozik
7566de05e3 Mark finished jobs as finished on dashboard 2015-07-19 20:42:29 +00:00
Ivan Kozik
8e2e1c5f58 Camelcase 2015-07-19 20:20:56 +00:00
Ivan Kozik
3ffed7dfbb Tell people to use GitHub issues 2015-07-19 20:15:23 +00:00
Ivan Kozik
a66b970bfb Enable faulthandler 2015-07-18 21:32:27 +00:00
Ivan Kozik
227052371e Allow only grabbers to announce download/stdout/stderr/ignore 2015-07-18 13:35:11 +00:00
Ivan Kozik
fe659e21a6 Don't allow setting mode more than once 2015-07-18 13:32:26 +00:00
Ivan Kozik
6a866ad530 Don't assume WebSocket clients are dashboards by default; announce user agents 2015-07-18 13:29:19 +00:00
Ivan Kozik
55e3507122 Tweak README 2015-07-18 12:09:51 +00:00
Ivan Kozik
9f872f4fae Recommend starting gs-server first 2015-07-18 12:06:00 +00:00
Ivan Kozik
877b170fde Move some code around 2015-07-18 11:57:34 +00:00
Ivan Kozik
318fb3c03d Set --max-redirect 8 like ArchiveBot 2015-07-18 11:52:31 +00:00
Ivan Kozik
63e8f1813b Add --page-requisites-level= and --concurrency= options; use default concurrency of 2 2015-07-18 11:45:11 +00:00
Ivan Kozik
5331f4c9fe Require 400MB disk free 2015-07-18 11:32:38 +00:00
Ivan Kozik
f848b28810 Use --level inf by default; add --level option 2015-07-18 11:32:28 +00:00
Ivan Kozik
210baaa156 Tweak README 2015-07-18 11:25:00 +00:00
Ivan Kozik
b1d5f677b0 Link to raw.githubusercontent.com for screenshot 2015-07-18 11:22:28 +00:00
Ivan Kozik
bec8615d46 Add dashboard screenshot 2015-07-18 11:19:07 +00:00
Ivan Kozik
5da054a837 Report concurrency level 2015-07-18 11:18:50 +00:00
Ivan Kozik
d5d2d49f5f chmod +x 2015-07-18 11:02:24 +00:00
Ivan Kozik
d3715fe888 Dup -> Dupe 2015-07-18 11:00:33 +00:00
Ivan Kozik
2222aafa74 Spaces -> tabs 2015-07-18 11:00:08 +00:00
Ivan Kozik
e42e33d82f My* -> Grabber* 2015-07-18 10:57:57 +00:00
Ivan Kozik
47940fd09e Explain how to stop a crawl 2015-07-18 10:51:17 +00:00
Ivan Kozik
dc7fe9ed06 Update install and usage instructions 2015-07-18 10:41:24 +00:00
Ivan Kozik
43d8a9594f Move everything and make grab-site installable with pip3 2015-07-18 10:39:04 +00:00
Ivan Kozik
1266cf6c97 Fix typo 2015-07-18 10:02:10 +00:00
Ivan Kozik
bcd29c1837 Mention duplicate page detection 2015-07-18 10:01:25 +00:00
Ivan Kozik
4aeb715c0f Mention ignore sets 2015-07-18 09:58:17 +00:00