74 Commits

Author SHA1 Message Date
Ivan Kozik
28f5652404 Bump version 2015-08-12 05:29:44 +00:00
Ivan Kozik
ba823a34f8 Print ignores without doubling up backslashes 2015-08-12 05:26:11 +00:00
Ivan Kozik
668c03d5d2 Implement -i / --input-file, supporting both local input files and URLs 2015-08-12 05:24:09 +00:00
Ivan Kozik
9989eb5b70 Pretend to be Firefox 40; it's out tomorrow 2015-08-10 13:38:54 +00:00
Ivan Kozik
b7743e780a Implement --ua= for setting the User-Agent 2015-08-10 13:38:00 +00:00
Ivan Kozik
ee4dbe162e Implement --igon / --igoff 2015-08-10 13:23:43 +00:00
Ivan Kozik
bf080c7cb4 Implement --max-content-length=N for skipping large responses 2015-08-10 13:12:34 +00:00
Ivan Kozik
8b1791475d Remove unused import 2015-08-10 13:00:37 +00:00
Ivan Kozik
dfd1e8cd47 singletumblr igset: explain 2015-08-10 11:51:33 +00:00
Ivan Kozik
1cb9331939 nosortedindex igset: add comment 2015-08-10 11:49:45 +00:00
Ivan Kozik
33cc3040ed mediawiki igset: add comments 2015-08-10 11:48:53 +00:00
Ivan Kozik
40cae40dc5 blogs igset: comment more 2015-08-10 11:46:32 +00:00
Ivan Kozik
4e517e2994 blogs igset: remove ignores that are already covered by 'global' 2015-08-10 11:45:28 +00:00
Ivan Kozik
4d570d88bd Add some comments to 'blogs' ignore set 2015-08-10 11:44:20 +00:00
Ivan Kozik
6f03c5137d Move pixel.redditmedia.com from reddit to global ignore set 2015-08-10 11:42:03 +00:00
Ivan Kozik
e304c60586 Describe why various ignores are in the 'global' ignore set; add support for comments in ignore sets 2015-08-10 11:41:16 +00:00
Ivan Kozik
aa9b877843 Don't crash with "error: unrecognized arguments" if cwd contains space. Closes #32. 2015-08-02 03:51:37 +00:00
Ivan Kozik
9f071a706d setup.py: specify minimum version for all dependencies. Specifically, this solves a problem where trollius is too old to have ensure_future. 2015-08-02 01:47:03 +00:00
Ivan Kozik
e55fa13004 Make wpull write .cdx file (its impl does one .cdx covering all WARC files) 2015-07-31 23:55:27 +00:00
Ivan Kozik
19f6971261 dashboard: don't handle ctrl-f, alt-f, and other ctrl/alt- key combinations 2015-07-29 23:04:20 +00:00
Ivan Kozik
d72e4094d1 Bump version 2015-07-29 18:38:31 +00:00
Ivan Kozik
91ed7689a2 Remove unused local 2015-07-29 18:37:43 +00:00
Ivan Kozik
73d9c03e5e Remove unused import 2015-07-29 18:35:46 +00:00
Ivan Kozik
4f437ae2d0 dashboard: remove mentions of ignore sets 2015-07-29 08:46:35 +00:00
Ivan Kozik
b806316cb1 Use built-in ignore sets; don't crash if invalid ignore set is specified 2015-07-29 08:36:36 +00:00
Ivan Kozik
22835a5ddc igsets: global: don't exclude archive.org (that ignore made sense for ArchiveBot, which sent WARCs to IA) 2015-07-29 08:24:42 +00:00
Ivan Kozik
51d3b1f794 igsets: rm internetcentrum - it is long gone 2015-07-29 07:45:58 +00:00
Ivan Kozik
5276fec1a9 Convert JSON ignore sets to plain text to avoid the backslash doubling 2015-07-29 07:44:12 +00:00
Ivan Kozik
68f5fc0dd2 igsets: noonion: fix backslash 2015-07-29 07:40:20 +00:00
Ivan Kozik
e53f4465e5 db/ignore_patterns -> libgrabsite/ignore_sets 2015-07-29 07:37:19 +00:00
Ivan Kozik
85f7be1936 meta referrer: use content="no-referrer" instead of the obsolete content="never" 2015-07-29 05:36:46 +00:00
Ivan Kozik
e6f830764e Allow changing concurrency using DIR/concurrency file 2015-07-28 14:21:28 +00:00
Ivan Kozik
47c9a20ba7 Bump version 2015-07-28 14:01:42 +00:00
Ivan Kozik
7ac5b07a99 Add --delay option 2015-07-28 13:57:42 +00:00
Ivan Kozik
3c28b53620 Allow changing delay (in milliseconds) using DIR/delay file 2015-07-28 13:44:51 +00:00
Ivan Kozik
4f5fb8f108 Print IGNOR messages more nicely in the console 2015-07-28 13:26:09 +00:00
Ivan Kozik
cae516eb5d Cache these control files for 3 seconds to reduce stat calls: ignores, igsets, igoff, stop 2015-07-28 13:23:00 +00:00
Ivan Kozik
4b174ee94f Remove unused imports 2015-07-28 12:52:42 +00:00
Ivan Kozik
4c84312462 Undo my camelCase mistake 2015-07-28 12:51:59 +00:00
Ivan Kozik
4eb2805df0 Format DUPE/OF messages more nicely in terminal 2015-07-28 12:33:59 +00:00
Ivan Kozik
37d1f2e473 directory name gen: don't try and fail to create directory with > 255 chars when given a long URL 2015-07-28 12:16:32 +00:00
Ivan Kozik
a82e4017fe directory name gen: whitelist instead of blacklist characters 2015-07-28 12:12:35 +00:00
Ivan Kozik
2418ea04e8 dashboard: don't include '!ig ID' in the context menu regexp helper, since these are designed to be pasted into a DIR/ignores file 2015-07-28 12:01:20 +00:00
Ivan Kozik
0f1bdfd738 Don't spawn wpull in a subprocess, just import it and call its main() 2015-07-28 11:53:47 +00:00
Ivan Kozik
6bbe9fb3bb Fix formatting 2015-07-28 11:27:50 +00:00
Ivan Kozik
e506d6a103 Add gs-dump-urls, a utility to dump URLs from a wpull.db file 2015-07-28 11:26:10 +00:00
Ivan Kozik
991718b2e2 hooks: better ws:// connect messages, slow down reconnects exponentially 2015-07-27 14:01:22 +00:00
Ivan Kozik
36f24b03b3 hooks: print which ws:// server it can't connect to 2015-07-27 13:45:04 +00:00
Ivan Kozik
d34c1c5f34 Fix formatting 2015-07-27 07:31:43 +00:00
Ivan Kozik
472edf5ebc Put all temporary files in DIR/temp; don't let ctrl-c exit grab-site before wpull 2015-07-27 07:26:54 +00:00