663 Commits

Author SHA1 Message Date
Ivan Kozik
76ba117d34 Document DIR/max_content_length 2015-08-10 13:15:20 +00:00
Ivan Kozik
bf080c7cb4 Implement --max-content-length=N for skipping large responses 2015-08-10 13:12:34 +00:00
Ivan Kozik
8b1791475d Remove unused import 2015-08-10 13:00:37 +00:00
Ivan Kozik
dfd1e8cd47 singletumblr igset: explain 2015-08-10 11:51:33 +00:00
Ivan Kozik
1cb9331939 nosortedindex igset: add comment 2015-08-10 11:49:45 +00:00
Ivan Kozik
33cc3040ed mediawiki igset: add comments 2015-08-10 11:48:53 +00:00
Ivan Kozik
40cae40dc5 blogs igset: comment more 2015-08-10 11:46:32 +00:00
Ivan Kozik
4e517e2994 blogs igset: remove ignores that are already covered by 'global' 2015-08-10 11:45:28 +00:00
Ivan Kozik
4d570d88bd Add some comments to 'blogs' ignore set 2015-08-10 11:44:20 +00:00
Ivan Kozik
6f03c5137d Move pixel.redditmedia.com from reddit to global ignore set 2015-08-10 11:42:03 +00:00
Ivan Kozik
e304c60586 Describe why various ignores are in the 'global' ignore set; add support for comments in ignore sets 2015-08-10 11:41:16 +00:00
Ivan Kozik
aa9b877843 Don't crash with "error: unrecognized arguments" if cwd contains space
Closes #32.
2015-08-02 03:51:37 +00:00
Ivan Kozik
9f071a706d setup.py: specify minimum version for all dependencies
Specifically, this solves a problem where trollius is too old to have
ensure_future.
2015-08-02 01:47:03 +00:00
Ivan Kozik
e55fa13004 Make wpull write .cdx file (its impl does one .cdx covering all WARC files) 2015-07-31 23:55:27 +00:00
Ivan Kozik
e1bb1ec749 README: tweak 2015-07-31 22:47:58 +00:00
Ivan Kozik
ed869864d4 README: link to ArchiveBot 2015-07-31 03:52:42 +00:00
Ivan Kozik
6cd50f9688 README: tweak 2015-07-31 03:50:39 +00:00
Ivan Kozik
412ea7791f README: changes to ignores may take up to 3 seconds to apply 2015-07-30 23:36:12 +00:00
Ivan Kozik
19f6971261 dashboard: don't handle ctrl-f, alt-f, and other ctrl/alt- key combinations 2015-07-29 23:04:20 +00:00
Ivan Kozik
d72e4094d1 Bump version 2015-07-29 18:38:31 +00:00
Ivan Kozik
91ed7689a2 Remove unused local 2015-07-29 18:37:43 +00:00
Ivan Kozik
73d9c03e5e Remove unused import 2015-07-29 18:35:46 +00:00
Ivan Kozik
a418beaff8 README: tweak for the non-ArchiveBot audience 2015-07-29 08:55:06 +00:00
Ivan Kozik
4f437ae2d0 dashboard: remove mentions of ignore sets 2015-07-29 08:46:35 +00:00
Ivan Kozik
deb05d981d README: link to correct ignore sets 2015-07-29 08:45:25 +00:00
Ivan Kozik
b806316cb1 Use built-in ignore sets; don't crash if invalid ignore set is specified 2015-07-29 08:36:36 +00:00
Ivan Kozik
22835a5ddc igsets: global: don't exclude archive.org (that ignore made sense for ArchiveBot, which sent WARCs to IA) 2015-07-29 08:24:42 +00:00
Ivan Kozik
51d3b1f794 igsets: rm internetcentrum - it is long gone 2015-07-29 07:45:58 +00:00
Ivan Kozik
5276fec1a9 Convert JSON ignore sets to plain text to avoid the backslash doubling 2015-07-29 07:44:12 +00:00
Ivan Kozik
68f5fc0dd2 igsets: noonion: fix backslash 2015-07-29 07:40:20 +00:00
Ivan Kozik
4c0f60cf06 Don't try to install patched-wpull as it doesn't exist 2015-07-29 07:37:52 +00:00
Ivan Kozik
e53f4465e5 db/ignore_patterns -> libgrabsite/ignore_sets 2015-07-29 07:37:19 +00:00
Ivan Kozik
5e70cd4acc Remove questionable /(.*)/(\1/){3,} ignore 2015-07-29 07:33:51 +00:00
David Yip
62d1dbc0ad Revert "Temporarily ignore voat.co, as it is not responding"
This reverts commit f6fb34ad5b46cf730d5e07475b1c1fc73b3570a8.

voat.co is back up.
2015-07-29 07:33:51 +00:00
Ivan Kozik
04a6c18054 Add .kr TLD for blogspot 2015-07-29 07:33:51 +00:00
Ivan Kozik
a6f8d510c0 Ignore simple.reddit.com 2015-07-29 07:33:51 +00:00
Ivan Kozik
7c4c5e42cd Ignore /.mobile on reddit 2015-07-29 07:33:51 +00:00
Ivan Kozik
ed5fb60cce Temporarily ignore voat.co, as it is not responding
Please revert this when it comes back up
2015-07-29 07:33:51 +00:00
Ivan Kozik
8aae334c25 Ignore another streaming site 2015-07-29 07:33:51 +00:00
Ivan Kozik
4d2a496fbb Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
1d388ae969 Ignore another streaming site 2015-07-29 07:33:51 +00:00
Ivan Kozik
e014c48215 Fix filename 2015-07-29 07:33:51 +00:00
Ivan Kozik
45ec93cc1a Add noonion ignore set to ignore .onion sites 2015-07-29 07:33:51 +00:00
Ivan Kozik
7320865fd7 Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
f34ed18ce6 Ignore Yahoo beacon 2015-07-29 07:33:51 +00:00
Ivan Kozik
6cb0fa49f5 Ignore more ?sort= pages on reddit 2015-07-29 07:33:51 +00:00
Ivan Kozik
cec78653cb Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
9face53dba Ignore Special:Diff and Special:MobileDiff 2015-07-29 07:33:51 +00:00
Ivan Kozik
8110d41ac4 Ignore another Google Analytics endpoint 2015-07-29 07:33:51 +00:00
Start
3b86cb984e minor improvements 2015-07-29 07:33:51 +00:00
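
Commit 5e70cd4acc above removes the ignore pattern /(.*)/(\1/){3,}. As a rough illustration of what a pattern like this matches, the sketch below runs it through Python's re module. The sample URLs and the helper name is_ignored are made up for this example; how the project itself applies ignore patterns is not shown in this log, so treat this as an assumption-laden demonstration rather than the project's code.

```python
import re

# Pattern quoted in commit 5e70cd4acc. The backreference \1 requires the
# captured text to appear again, followed by "/", at least three more times.
REPEATED_SEGMENT = re.compile(r"/(.*)/(\1/){3,}")


def is_ignored(url: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in url."""
    return REPEATED_SEGMENT.search(url) is not None


if __name__ == "__main__":
    # Illustrative URLs only; none of them come from the repository.
    samples = [
        "http://example.com/foo/foo/foo/foo/",  # segment appears four times
        "http://example.com/foo/foo/foo/",      # segment appears three times
        "http://example.com/a/b/c/d/",          # no repeated segment
    ]
    for url in samples:
        print(url, is_ignored(url))
```

Under these assumptions, only the first sample URL matches: the pattern needs one captured segment plus at least three repeats of it.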