Ivan Kozik | 17c1b9caaa | Update default user agent | 2015-09-25 20:32:08 +00:00
Ivan Kozik | f1548521ec | Write URLs skipped by --max-content-length= to DIR/skipped_max_content_length | 2015-09-02 19:15:00 +00:00
Ivan Kozik | 3def2a79bc | Fix for 32-bit machines: don't crash on startup with lmdb.MemoryError (lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory) | 2015-09-02 19:04:56 +00:00
Ivan Kozik | e0ad2e9a25 | Bump version | 2015-08-28 04:29:24 +00:00
Ivan Kozik | 5a04a38f59 | Prevent twitter crawls from endlessly downloading [\?&]nav= (Credit to garyrh) | 2015-08-28 04:28:19 +00:00
Ivan Kozik | c28c593a83 | Explain imdb ignore set | 2015-08-21 08:35:57 +00:00
Ivan Kozik | b782c23389 | Add --no-video option to skip the download of videos | 2015-08-21 08:28:27 +00:00
Arkiver2 | 6f6754f81e | Add --warc-max-size=BYTES option for controlling WARC size | 2015-08-21 07:47:12 +00:00
Ivan Kozik | ee2684941d | Add support for passing multiple URLs to grab-site | 2015-08-21 07:18:31 +00:00
Ivan Kozik | 524cdf2cec | Fix blogspot search? ignore | 2015-08-21 05:35:30 +00:00
Ivan Kozik | f379264ed1 | Don't crash on --igsets=blogs even though it's gone | 2015-08-21 05:31:51 +00:00
Ivan Kozik | 6a6dff0083 | Add comment to reddit ignore set | 2015-08-21 05:18:35 +00:00
Ivan Kozik | 4b50c6de67 | Migrate all other ignores from blogs to the global set | 2015-08-21 05:16:26 +00:00
Ivan Kozik | 954ab31acb | Remove ignores that probably only wget needed | 2015-08-21 04:48:47 +00:00
Ivan Kozik | b83bce1f2a | Migrate some ignores from blogs to global set | 2015-08-21 04:48:19 +00:00
Ivan Kozik | 1b8b4b0077 | Ignore per-post and per-comment Atom feeds on blogspot.com | 2015-08-21 04:23:54 +00:00
Ivan Kozik | 291b3e939b | Fix: --offsite-links should be on by default | 2015-08-13 12:29:11 +00:00
Ivan Kozik | a3f1ff7ed9 | Cache control files for just 1.5 sec instead of 3 sec | 2015-08-12 08:52:56 +00:00
Ivan Kozik | 050dbc44d8 | Fix very recent regression: report the pattern instead of the regexp | 2015-08-12 08:51:41 +00:00
Ivan Kozik | b8b6248aab | Remove no-longer-needed workaround in ignore sets | 2015-08-12 07:55:08 +00:00
Ivan Kozik | 1d52a28fac | Increase size of compiled regexp cache; remove unused code | 2015-08-12 07:52:24 +00:00
Ivan Kozik | 26c7ea84d8 | Implement --wpull-args for passing additional arguments to wpull | 2015-08-12 06:39:49 +00:00
Ivan Kozik | 1674751b1c | Don't crash if DIR/concurrency is set to 0 | 2015-08-12 05:57:56 +00:00
Ivan Kozik | 28f5652404 | Bump version | 2015-08-12 05:29:44 +00:00
Ivan Kozik | ba823a34f8 | Print ignores without doubling up backslashes | 2015-08-12 05:26:11 +00:00
Ivan Kozik | 668c03d5d2 | Implement -i / --input-file, supporting both local input files and URLs | 2015-08-12 05:24:09 +00:00
Ivan Kozik | 9989eb5b70 | Pretend to be Firefox 40; it's out tomorrow | 2015-08-10 13:38:54 +00:00
Ivan Kozik | b7743e780a | Implement --ua= for setting the User-Agent | 2015-08-10 13:38:00 +00:00
Ivan Kozik | ee4dbe162e | Implement --igon / --igoff | 2015-08-10 13:23:43 +00:00
Ivan Kozik | bf080c7cb4 | Implement --max-content-length=N for skipping large responses | 2015-08-10 13:12:34 +00:00
Ivan Kozik | 8b1791475d | Remove unused import | 2015-08-10 13:00:37 +00:00
Ivan Kozik | dfd1e8cd47 | singletumblr igset: explain | 2015-08-10 11:51:33 +00:00
Ivan Kozik | 1cb9331939 | nosortedindex igset: add comment | 2015-08-10 11:49:45 +00:00
Ivan Kozik | 33cc3040ed | mediawiki igset: add comments | 2015-08-10 11:48:53 +00:00
Ivan Kozik | 40cae40dc5 | blogs igset: comment more | 2015-08-10 11:46:32 +00:00
Ivan Kozik | 4e517e2994 | blogs igset: remove ignores that are already covered by 'global' | 2015-08-10 11:45:28 +00:00
Ivan Kozik | 4d570d88bd | Add some comments to 'blogs' ignore set | 2015-08-10 11:44:20 +00:00
Ivan Kozik | 6f03c5137d | Move pixel.redditmedia.com from reddit to global ignore set | 2015-08-10 11:42:03 +00:00
Ivan Kozik | e304c60586 | Describe why various ignores are in the 'global' ignore set; add support for comments in ignore sets | 2015-08-10 11:41:16 +00:00
Ivan Kozik | aa9b877843 | Don't crash with "error: unrecognized arguments" if cwd contains space. Closes #32. | 2015-08-02 03:51:37 +00:00
Ivan Kozik | 9f071a706d | setup.py: specify minimum version for all dependencies. Specifically, this solves a problem where trollius is too old to have ensure_future. | 2015-08-02 01:47:03 +00:00
Ivan Kozik | e55fa13004 | Make wpull write .cdx file (its impl does one .cdx covering all WARC files) | 2015-07-31 23:55:27 +00:00
Ivan Kozik | 19f6971261 | dashboard: don't handle ctrl-f, alt-f, and other ctrl/alt- key combinations | 2015-07-29 23:04:20 +00:00
Ivan Kozik | d72e4094d1 | Bump version | 2015-07-29 18:38:31 +00:00
Ivan Kozik | 91ed7689a2 | Remove unused local | 2015-07-29 18:37:43 +00:00
Ivan Kozik | 73d9c03e5e | Remove unused import | 2015-07-29 18:35:46 +00:00
Ivan Kozik | 4f437ae2d0 | dashboard: remove mentions of ignore sets | 2015-07-29 08:46:35 +00:00
Ivan Kozik | b806316cb1 | Use built-in ignore sets; don't crash if invalid ignore set is specified | 2015-07-29 08:36:36 +00:00
Ivan Kozik | 22835a5ddc | igsets: global: don't exclude archive.org (that ignore made sense for ArchiveBot, which sent WARCs to IA) | 2015-07-29 08:24:42 +00:00
Ivan Kozik | 51d3b1f794 | igsets: rm internetcentrum - it is long gone | 2015-07-29 07:45:58 +00:00