Ivan Kozik
|
28f5652404
|
Bump version
|
2015-08-12 05:29:44 +00:00 |
|
Ivan Kozik
|
ba823a34f8
|
Print ignores without doubling up backslashes
|
2015-08-12 05:26:11 +00:00 |
|
Ivan Kozik
|
668c03d5d2
|
Implement -i / --input-file, supporting both local input files and URLs
|
2015-08-12 05:24:09 +00:00 |
|
Ivan Kozik
|
9989eb5b70
|
Pretend to be Firefox 40; it's out tomorrow
|
2015-08-10 13:38:54 +00:00 |
|
Ivan Kozik
|
b7743e780a
|
Implement --ua= for setting the User-Agent
|
2015-08-10 13:38:00 +00:00 |
|
Ivan Kozik
|
ee4dbe162e
|
Implement --igon / --igoff
|
2015-08-10 13:23:43 +00:00 |
|
Ivan Kozik
|
bf080c7cb4
|
Implement --max-content-length=N for skipping large responses
|
2015-08-10 13:12:34 +00:00 |
|
Ivan Kozik
|
8b1791475d
|
Remove unused import
|
2015-08-10 13:00:37 +00:00 |
|
Ivan Kozik
|
dfd1e8cd47
|
singletumblr igset: explain
|
2015-08-10 11:51:33 +00:00 |
|
Ivan Kozik
|
1cb9331939
|
nosortedindex igset: add comment
|
2015-08-10 11:49:45 +00:00 |
|
Ivan Kozik
|
33cc3040ed
|
mediawiki igset: add comments
|
2015-08-10 11:48:53 +00:00 |
|
Ivan Kozik
|
40cae40dc5
|
blogs igset: comment more
|
2015-08-10 11:46:32 +00:00 |
|
Ivan Kozik
|
4e517e2994
|
blogs igset: remove ignores that are already covered by 'global'
|
2015-08-10 11:45:28 +00:00 |
|
Ivan Kozik
|
4d570d88bd
|
Add some comments to 'blogs' ignore set
|
2015-08-10 11:44:20 +00:00 |
|
Ivan Kozik
|
6f03c5137d
|
Move pixel.redditmedia.com from reddit to global ignore set
|
2015-08-10 11:42:03 +00:00 |
|
Ivan Kozik
|
e304c60586
|
Describe why various ignores are in the 'global' ignore set; add support for comments in ignore sets
|
2015-08-10 11:41:16 +00:00 |
|
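Two commits here change the ignore-set format. 5276fec1a9 converts the JSON ignore sets to plain text to avoid backslash doubling, and e304c60586 adds support for comments. The sketch below shows what such a parser could look like. It assumes one regex per line and `#`-prefixed comment lines. `parse_ignore_set` and the example pattern are hypothetical and are not grab-site's code.

```python
import json
import re


def parse_ignore_set(text):
    """Return the regex patterns listed in a plain-text ignore set.

    Assumed format (hypothetical): one regex per line; blank lines and
    lines starting with '#' (after leading whitespace) are comments.
    """
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        patterns.append(stripped)
    return patterns


# The same illustrative pattern in both formats. JSON needs "\\." to
# encode a single backslash; the plain-text file holds the regex verbatim.
json_text = r'["^https?://example\\.com/"]'
plain_text = r"""
# Illustrative only: skip everything on example.com
^https?://example\.com/
"""

print(parse_ignore_set(plain_text) == json.loads(json_text))  # True
pattern = re.compile(parse_ignore_set(plain_text)[0])
print(bool(pattern.search("http://example.com/page")))  # True
```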
Ivan Kozik
|
aa9b877843
|
Don't crash with "error: unrecognized arguments" if cwd contains space
Closes #32.
|
2015-08-02 03:51:37 +00:00 |
|
Ivan Kozik
|
9f071a706d
|
setup.py: specify minimum version for all dependencies
Specifically, this solves a problem where trollius is too old to have
ensure_future.
|
2015-08-02 01:47:03 +00:00 |
|
Ivan Kozik
|
e55fa13004
|
Make wpull write .cdx file (its impl does one .cdx covering all WARC files)
|
2015-07-31 23:55:27 +00:00 |
|
Ivan Kozik
|
19f6971261
|
dashboard: don't handle ctrl-f, alt-f, and other ctrl/alt- key combinations
|
2015-07-29 23:04:20 +00:00 |
|
Ivan Kozik
|
d72e4094d1
|
Bump version
|
2015-07-29 18:38:31 +00:00 |
|
Ivan Kozik
|
91ed7689a2
|
Remove unused local
|
2015-07-29 18:37:43 +00:00 |
|
Ivan Kozik
|
73d9c03e5e
|
Remove unused import
|
2015-07-29 18:35:46 +00:00 |
|
Ivan Kozik
|
4f437ae2d0
|
dashboard: remove mentions of ignore sets
|
2015-07-29 08:46:35 +00:00 |
|
Ivan Kozik
|
b806316cb1
|
Use built-in ignore sets; don't crash if invalid ignore set is specified
|
2015-07-29 08:36:36 +00:00 |
|
Ivan Kozik
|
22835a5ddc
|
igsets: global: don't exclude archive.org (that ignore made sense for ArchiveBot, which sent WARCs to IA)
|
2015-07-29 08:24:42 +00:00 |
|
Ivan Kozik
|
51d3b1f794
|
igsets: rm internetcentrum - it is long gone
|
2015-07-29 07:45:58 +00:00 |
|
Ivan Kozik
|
5276fec1a9
|
Convert JSON ignore sets to plain text to avoid the backslash doubling
|
2015-07-29 07:44:12 +00:00 |
|
Ivan Kozik
|
68f5fc0dd2
|
igsets: noonion: fix backslash
|
2015-07-29 07:40:20 +00:00 |
|
Ivan Kozik
|
e53f4465e5
|
db/ignore_patterns -> libgrabsite/ignore_sets
|
2015-07-29 07:37:19 +00:00 |
|
Ivan Kozik
|
85f7be1936
|
meta referrer: use content="no-referrer" instead of the obsolete content="never"
|
2015-07-29 05:36:46 +00:00 |
|
Ivan Kozik
|
e6f830764e
|
Allow changing concurrency using DIR/concurrency file
|
2015-07-28 14:21:28 +00:00 |
|
Ivan Kozik
|
47c9a20ba7
|
Bump version
|
2015-07-28 14:01:42 +00:00 |
|
Ivan Kozik
|
7ac5b07a99
|
Add --delay option
|
2015-07-28 13:57:42 +00:00 |
|
Ivan Kozik
|
3c28b53620
|
Allow changing delay (in milliseconds) using DIR/delay file
|
2015-07-28 13:44:51 +00:00 |
|
Ivan Kozik
|
4f5fb8f108
|
Print IGNOR messages more nicely in the console
|
2015-07-28 13:26:09 +00:00 |
|
Ivan Kozik
|
cae516eb5d
|
Cache these control files for 3 seconds to reduce stat calls: ignores, igsets, igoff, stop
|
2015-07-28 13:23:00 +00:00 |
|
Ivan Kozik
|
4b174ee94f
|
Remove unused imports
|
2015-07-28 12:52:42 +00:00 |
|
Ivan Kozik
|
4c84312462
|
Undo my camelCase mistake
|
2015-07-28 12:51:59 +00:00 |
|
Ivan Kozik
|
4eb2805df0
|
Format DUPE/OF messages more nicely in terminal
|
2015-07-28 12:33:59 +00:00 |
|
Ivan Kozik
|
37d1f2e473
|
directory name gen: don't try and fail to create directory with > 255 chars when given a long URL
|
2015-07-28 12:16:32 +00:00 |
|
Ivan Kozik
|
a82e4017fe
|
directory name gen: whitelist instead of blacklist characters
|
2015-07-28 12:12:35 +00:00 |
|
Ivan Kozik
|
2418ea04e8
|
dashboard: don't include '!ig ID' in the context menu regexp helper, since these are designed to be pasted into a DIR/ignores file
|
2015-07-28 12:01:20 +00:00 |
|
Ivan Kozik
|
0f1bdfd738
|
Don't spawn wpull in a subprocess, just import it and call its main()
|
2015-07-28 11:53:47 +00:00 |
|
Ivan Kozik
|
6bbe9fb3bb
|
Fix formatting
|
2015-07-28 11:27:50 +00:00 |
|
Ivan Kozik
|
e506d6a103
|
Add gs-dump-urls, a utility to dump URLs from a wpull.db file
|
2015-07-28 11:26:10 +00:00 |
|
Ivan Kozik
|
991718b2e2
|
hooks: better ws:// connect messages, slow down reconnects exponentially
|
2015-07-27 14:01:22 +00:00 |
|
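Commits 36f24b03b3 and 991718b2e2 change how the hooks handle an unreachable ws:// server. They now report which server could not be reached, and they slow down reconnects exponentially. The sketch below shows capped exponential backoff with asyncio. It assumes a `connect` coroutine that raises `OSError` on failure. Both functions, as well as the 1 s starting delay and the 60 s cap, are hypothetical choices.

```python
import asyncio


def reconnect_delays(initial=1.0, factor=2.0, maximum=60.0):
    """Yield successive waits between reconnect attempts, capped at `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


async def connect_forever(connect, server, delays=None):
    """Retry `connect(server)` until it succeeds, sleeping between attempts.

    `connect` is any coroutine function that raises OSError on failure.
    `delays` can override the backoff schedule (useful for testing).
    """
    for delay in delays if delays is not None else reconnect_delays():
        try:
            return await connect(server)
        except OSError as e:
            # Name the server so the user knows which ws:// URL is unreachable.
            print(f"Could not connect to {server}: {e!r}; retrying in {delay:g}s")
            await asyncio.sleep(delay)
    raise ConnectionError(f"gave up connecting to {server}")
```

With the defaults, the waits go 1, 2, 4, 8, 16, 32 seconds and then stay at 60. A dashboard that is down for a long time therefore costs one connection attempt per minute rather than a tight retry loop.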
Ivan Kozik
|
36f24b03b3
|
hooks: print which ws:// server it can't connect to
|
2015-07-27 13:45:04 +00:00 |
|
Ivan Kozik
|
d34c1c5f34
|
Fix formatting
|
2015-07-27 07:31:43 +00:00 |
|
Ivan Kozik
|
472edf5ebc
|
Put all temporary files in DIR/temp; don't let ctrl-c exit grab-site before wpull
|
2015-07-27 07:26:54 +00:00 |
|