Author | Commit | Message | Date
Ivan Kozik | fe2530e667 | global igset: ignore amp%3Bamp%3Bamp%3B loops | 2016-04-22 03:55:09 +00:00
Ivan Kozik | 01ac84da06 | global igset: tumblr serves 16px avatars on https now as well | 2016-04-04 18:38:59 +00:00
Ivan Kozik | ffecfcabda | global igset: Ignore instapaper share links | 2016-03-30 17:11:03 +00:00
Ivan Kozik | 316db6eec4 | grab-site 0.11 | 2016-02-25 01:08:17 +00:00
Ivan Kozik | 506a7604ef | Rename --which-wpull-args-full to --which-wpull-command | 2016-02-21 04:49:53 +00:00
Ivan Kozik | 5805e4c155 | Implement --which-wpull-args-partial and --which-wpull-args-full for figuring out which wpull arguments grab-site would use, without actually starting wpull | 2016-02-21 04:33:14 +00:00
Ivan Kozik | bda4d8cf6d | Pass maybe_log_ignore and print_to_terminal as globals to custom_hooks.py as well | 2016-02-21 00:53:03 +00:00
Ivan Kozik | c37b32bd1c | Implement --custom-hooks so that users can modify wpull_hook | 2016-02-21 00:23:18 +00:00
Ivan Kozik | 292682a48f | Bump version | 2016-02-16 17:25:11 +00:00
Ivan Kozik | 7c1afbefe0 | Use wpull 1.2.3 | 2016-02-05 19:15:25 +00:00
Ivan Kozik | ef5137ae86 | Update UA | 2016-02-01 16:19:49 +00:00
Ivan Kozik | 7ec2f90534 | global igset: Ignore /CSI/CSI/ loops on blogspot | 2016-01-12 22:44:57 +00:00
Ivan Kozik | 0214558d5e | global igset: ignore bogus /search/label/CSI/ links on blogspot | 2016-01-12 03:20:44 +00:00
Ivan Kozik | 3b9f8c1a4c | global igset: ignore /CaptchaImage.axd | 2016-01-09 21:46:35 +00:00
Ivan Kozik | dff87eba2f | global igset: also ignore www.digg.com/submit | 2016-01-09 02:19:18 +00:00
Ivan Kozik | 01bf6d527b | Bump version | 2016-01-05 01:08:14 +00:00
Ivan Kozik | 111ffca643 | global igset: ignore two more loops | 2016-01-03 02:00:26 +00:00
Ivan Kozik | 5f14263070 | global igset: ignore livejournal.com/identity/login.bml | 2015-12-30 05:35:57 +00:00
Ivan Kozik | 2acc826d56 | lstrip '-' to avoid creating filenames that must be --'ed or quoted | 2015-12-17 16:54:55 +00:00
Ivan Kozik | 4ea80eec80 | global igset: Ignore a loop on archive.org | 2015-12-16 13:11:16 +00:00
Ivan Kozik | 38f733f9d2 | global igset: ignore /wp-admin/ | 2015-12-16 11:00:54 +00:00
Ivan Kozik | adb35ee4e3 | Add --id=, --dir=, and --finished-warc-dir= options | 2015-12-12 19:52:32 +00:00
Ivan Kozik | f8fece9ebb | Set NCR=1 cookie for .blogspot.com to avoid getting redirected | 2015-12-12 18:29:53 +00:00
Ivan Kozik | 86e92d684c | Send an over18=1 cookie to reddit.com to avoid the age gate on many subreddits | 2015-12-12 16:49:42 +00:00
Ivan Kozik | 6a647f637e | If using Python 3.4.0, depend on an older version of aiohttp that works on Python 3.4.0 | 2015-12-12 10:08:02 +00:00
Ivan Kozik | 8a6eaea16c | Bump Firefox version in UA string | 2015-12-04 11:45:28 +00:00
Ivan Kozik | 58a2711058 | Use Roboto font if installed | 2015-11-30 05:16:29 +00:00
Ivan Kozik | e72c5fc3a7 | Don't crash if psutil is not available on non-Windows OS (it is no longer installed by wpull 1.2.2) | 2015-11-21 19:54:13 +00:00
Ivan Kozik | ec9f7bdb43 | setup.py: if GRAB_SITE_NO_CCHARDET env var set, don't require cchardet; wpull will fall back on chardet | 2015-10-28 16:11:23 +00:00
Ivan Kozik | 1bb8bcc4d8 | global igset: also ignore recaptcha /mailhide/d links | 2015-10-23 00:41:41 +00:00
Ivan Kozik | 40ca80638d | global igset: Ignore /impixu on a new tumblr domain | 2015-10-22 15:10:58 +00:00
Ivan Kozik | 4487c43c83 | Use wpull>=1.2.2 | 2015-10-21 22:38:47 +00:00
Ivan Kozik | b3c433a60b | Fix: new Click gives us () instead of None when no start_url's are given | 2015-10-03 22:53:20 +00:00
Ivan Kozik | 7a63a3dcd1 | Add --no-dupespotter for turning off dupespotter which sometimes has false positives | 2015-09-30 22:16:56 +00:00
Ivan Kozik | 17c1b9caaa | Update default user agent | 2015-09-25 20:32:08 +00:00
Ivan Kozik | f1548521ec | Write URLs skipped by --max-content-length= to DIR/skipped_max_content_length | 2015-09-02 19:15:00 +00:00
Ivan Kozik | 3def2a79bc | Fix for 32-bit machines: don't crash on startup with lmdb.MemoryError (lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory) | 2015-09-02 19:04:56 +00:00
Ivan Kozik | e0ad2e9a25 | Bump version | 2015-08-28 04:29:24 +00:00
Ivan Kozik | b782c23389 | Add --no-video option to skip the download of videos | 2015-08-21 08:28:27 +00:00
Arkiver2 | 6f6754f81e | Add --warc-max-size=BYTES option for controlling WARC size | 2015-08-21 07:47:12 +00:00
Ivan Kozik | ee2684941d | Add support for passing multiple URLs to grab-site | 2015-08-21 07:18:31 +00:00
Ivan Kozik | 1b8b4b0077 | Ignore per-post and per-comment Atom feeds on blogspot.com | 2015-08-21 04:23:54 +00:00
Ivan Kozik | 291b3e939b | Fix: --offsite-links should be on by default | 2015-08-13 12:29:11 +00:00
Ivan Kozik | a3f1ff7ed9 | Cache control files for just 1.5 sec instead of 3 sec | 2015-08-12 08:52:56 +00:00
Ivan Kozik | 1d52a28fac | Increase size of compiled regexp cache; remove unused code | 2015-08-12 07:52:24 +00:00
Ivan Kozik | 26c7ea84d8 | Implement --wpull-args for passing additional arguments to wpull | 2015-08-12 06:39:49 +00:00
Ivan Kozik | 1674751b1c | Don't crash if DIR/concurrency is set to 0 | 2015-08-12 05:57:56 +00:00
Ivan Kozik | 28f5652404 | Bump version | 2015-08-12 05:29:44 +00:00
Ivan Kozik | bf080c7cb4 | Implement --max-content-length=N for skipping large responses | 2015-08-10 13:12:34 +00:00
Ivan Kozik | e304c60586 | Describe why various ignores are in the 'global' ignore set; add support for comments in ignore sets | 2015-08-10 11:41:16 +00:00