201 Commits

Author SHA1 Message Date
Ivan Kozik
b3d84e505c Add ?host= URL param for testing dashboard.html locally 2015-07-17 23:13:51 +00:00
David Yip
c1f12411ca Show remaining queue length in dashboard. #96.
Also show total number of queued and downloaded items as a title attribute.
2015-07-17 23:13:51 +00:00
Ivan Kozik
de4ea44e9c Link to the new docs 2015-07-17 23:13:51 +00:00
Ivan Kozik
b261297c1d Remove link to dashboard2 2015-07-17 23:13:51 +00:00
Ivan Kozik
ddafec8c27 Fix ws:// and /logs/recent URLs 2015-07-17 23:13:51 +00:00
Ivan Kozik
fa4f193af9 Import the first revision of dashboard 2.0 in the ArchiveBot repo 2015-07-17 23:13:41 +00:00
Ivan Kozik
5229ddf5dc Start work on websocket server for future dashboard integration 2015-07-17 22:42:25 +00:00
Ivan Kozik
03d1efc2ce Clarify argument order requirement 2015-07-17 03:59:42 +00:00
Ivan Kozik
62cba3a0e7 Update UA 2015-05-19 20:52:29 +00:00
Ivan Kozik
2d9b1395f1 Put the date into the DIR name and WARC name 2015-05-19 18:13:03 +00:00
Ivan Kozik
66d22b1556 Update UA 2015-04-08 20:42:10 +00:00
Ivan Kozik
c83c89b0cf Remove --no-skip-getaddrinfo to match ArchiveBot 2015-04-07 07:26:54 +00:00
Ivan Kozik
2983615d50 Send Accept-Language to avoid 500 Internal Server Error when sending Firefox UA to reddit.com 2015-03-30 03:07:23 +00:00
Ivan Kozik
8bf1b00c46 Copy in latest dupespotter 2015-03-14 20:36:16 +00:00
Ivan Kozik
040afe0d92 Copy in latest dupespotter 2015-03-14 19:35:55 +00:00
Ivan Kozik
9fbe0bdb6f Copy in latest dupespotter 2015-03-11 22:54:35 +00:00
Ivan Kozik
d14d8135e6 Copy in latest dupespotter 2015-03-11 20:22:58 +00:00
Ivan Kozik
c140f1caf0 Copy in latest dupespotter 2015-03-09 07:35:55 +00:00
Ivan Kozik
b785e9b1b7 Pause the crawl when running low on disk or memory 2015-03-09 06:02:24 +00:00
Ivan Kozik
cbaccd9e02 Avoid creating directories with ? or & in the filename, which breaks sqlalchemy when it tries to parse arguments from the filename.

Fixes https://github.com/ludios/grab-site/issues/1
2015-03-09 05:16:54 +00:00
Ivan Kozik
f80df6944f Describe arguments more 2015-03-09 05:06:44 +00:00
Ivan Kozik
611a0be845 Cleanup 2015-03-09 04:53:38 +00:00
Ivan Kozik
820e2aeef4 Mention WARC files; clarify 2015-03-09 04:52:18 +00:00
Ivan Kozik
a1cbcb9ea9 Describe what this is 2015-03-09 04:48:27 +00:00
Ivan Kozik
3fe4774c2c Copy in latest dupespotter 2015-03-04 04:37:39 +00:00
Ivan Kozik
1f9f80dff0 Copy in latest dupespotter 2015-03-04 04:23:06 +00:00
Ivan Kozik
fbbfa3c0b4 Copy in latest dupespotter 2015-03-01 23:43:53 +00:00
Ivan Kozik
5b2b68061d Copy in latest dupespotter 2015-02-24 05:03:45 +00:00
Ivan Kozik
85f02d2055 Include path/query components in directory name 2015-02-23 03:03:15 +00:00
Ivan Kozik
62866f5336 Copy in latest dupespotter 2015-02-17 01:58:56 +00:00
Ivan Kozik
ccaee25497 Link to global ignore set 2015-02-05 19:32:47 +00:00
Ivan Kozik
e2118bbea4 Clarify 2015-02-05 19:31:50 +00:00
Ivan Kozik
4a22b4d593 Tell user to install git as well 2015-02-05 19:27:19 +00:00
Ivan Kozik
65e096a035 Support --ignore-sets= instead of the space-separated version 2015-02-05 06:05:54 +00:00
Ivan Kozik
2d7125951f Link to pythex 2015-02-05 05:39:44 +00:00
Ivan Kozik
f815920a83 Document file formats 2015-02-05 05:37:34 +00:00
Ivan Kozik
d73ee5ba27 Make it real obvious 2015-02-05 05:34:49 +00:00
Ivan Kozik
2f7ae834bb Add ArchiveBot LICENSE 2015-02-05 05:22:18 +00:00
Ivan Kozik
0699689a14 Add igoff feature 2015-02-05 05:19:34 +00:00
Ivan Kozik
2ccb8b4d6f Add support for --no-offsite-links 2015-02-05 05:15:46 +00:00
Ivan Kozik
52d0acc3b5 Another html5lib comment 2015-02-05 05:08:03 +00:00
Ivan Kozik
64d027da2c Fix comment 2015-02-05 05:07:05 +00:00
Ivan Kozik
6f8ef82efb Rename script 2015-02-05 05:05:53 +00:00
Ivan Kozik
979b843458 Load changes from DIR/ignores and DIR/ignore_sets while the crawl is running 2015-02-05 04:59:28 +00:00
Ivan Kozik
5f7593fda2 Refactor 2015-02-05 04:39:52 +00:00
Ivan Kozik
429b2032ff Improve README 2015-02-05 04:27:38 +00:00
Ivan Kozik
1705174fb2 CRLF -> LF 2015-02-05 04:25:49 +00:00
Ivan Kozik
eea440422d Allow specifying --ignore-sets NAME1,NAME2,... 2015-02-05 04:24:05 +00:00
Ivan Kozik
a61ed949ca Use global ignore set and also ignore Icecast sites like ArchiveBot 2015-02-05 04:03:19 +00:00
Ivan Kozik
2986ae8a31 Use cookies.txt 2015-02-05 03:45:34 +00:00