84 Commits

Author SHA1 Message Date
Ivan Kozik
febee9c85e global igset: Add /%3Ca%20href= pattern 2016-05-29 14:00:15 +00:00
Ivan Kozik
fb6e01caa7 global igset: Add /%20https?:/ pattern 2016-05-27 14:03:28 +00:00
Ivan Kozik
aa366bbb27 Update grab-site URL in setup.py and dashboard 2016-05-27 13:53:35 +00:00
Ivan Kozik
842fab4b23 Stop listening on legacy ws port 29001 2016-05-22 11:06:38 +00:00
Ivan Kozik
f0bb696dc8 Actually install the favicon.ico 2016-05-22 11:01:53 +00:00
Ivan Kozik
b3c75b0ffb dashboard: Add a favicon 2016-05-22 10:59:32 +00:00
Ivan Kozik
7df1761bf0 dashboard: Allow for another digit in the MB stat 2016-05-22 10:22:11 +00:00
Ivan Kozik
8d93776742 dashboard: Align the req/s stat properly 2016-05-22 10:18:20 +00:00
Ivan Kozik
38877106ef Don't raise an exception if client lacks User-Agent 2016-05-03 07:16:48 +00:00
Ivan Kozik
fe2530e667 global igset: ignore amp%3Bamp%3Bamp%3B loops 2016-04-22 03:55:09 +00:00
Ivan Kozik
01ac84da06 global igset: tumblr serves 16px avatars on https now as well 2016-04-04 18:38:59 +00:00
Ivan Kozik
ffecfcabda global igset: Ignore instapaper share links 2016-03-30 17:11:03 +00:00
Ivan Kozik
316db6eec4 grab-site 0.11 2016-02-25 01:08:17 +00:00
Ivan Kozik
506a7604ef Rename --which-wpull-args-full to --which-wpull-command 2016-02-21 04:49:53 +00:00
Ivan Kozik
5805e4c155 Implement --which-wpull-args-partial and --which-wpull-args-full for figuring out which wpull arguments grab-site would use, without actually starting wpull 2016-02-21 04:33:14 +00:00
Ivan Kozik
bda4d8cf6d Pass maybe_log_ignore and print_to_terminal as globals to custom_hooks.py as well 2016-02-21 00:53:03 +00:00
Ivan Kozik
c37b32bd1c Implement --custom-hooks so that users can modify wpull_hook 2016-02-21 00:23:18 +00:00
Ivan Kozik
292682a48f Bump version 2016-02-16 17:25:11 +00:00
Ivan Kozik
7c1afbefe0 Use wpull 1.2.3 2016-02-05 19:15:25 +00:00
Ivan Kozik
ef5137ae86 Update UA 2016-02-01 16:19:49 +00:00
Ivan Kozik
7ec2f90534 global igset: Ignore /CSI/CSI/ loops on blogspot 2016-01-12 22:44:57 +00:00
Ivan Kozik
0214558d5e global igset: ignore bogus /search/label/CSI/ links on blogspot 2016-01-12 03:20:44 +00:00
Ivan Kozik
3b9f8c1a4c global igset: ignore /CaptchaImage.axd 2016-01-09 21:46:35 +00:00
Ivan Kozik
dff87eba2f global igset: also ignore www.digg.com/submit 2016-01-09 02:19:18 +00:00
Ivan Kozik
01bf6d527b Bump version 2016-01-05 01:08:14 +00:00
Ivan Kozik
111ffca643 global igset: ignore two more loops 2016-01-03 02:00:26 +00:00
Ivan Kozik
5f14263070 global igset: ignore livejournal.com/identity/login.bml 2015-12-30 05:35:57 +00:00
Ivan Kozik
2acc826d56 lstrip '-' to avoid creating filenames that must be --'ed or quoted 2015-12-17 16:54:55 +00:00
Ivan Kozik
4ea80eec80 global igset: Ignore a loop on archive.org 2015-12-16 13:11:16 +00:00
Ivan Kozik
38f733f9d2 global igset: ignore /wp-admin/ 2015-12-16 11:00:54 +00:00
Ivan Kozik
adb35ee4e3 Add --id=, --dir=, and --finished-warc-dir= options 2015-12-12 19:52:32 +00:00
Ivan Kozik
f8fece9ebb Set NCR=1 cookie for .blogspot.com to avoid getting redirected 2015-12-12 18:29:53 +00:00
Ivan Kozik
86e92d684c Send an over18=1 cookie to reddit.com to avoid the age gate on many subreddits 2015-12-12 16:49:42 +00:00
Ivan Kozik
6a647f637e If using Python 3.4.0, depend on an older version of aiohttp that works on Python 3.4.0 2015-12-12 10:08:02 +00:00
Ivan Kozik
8a6eaea16c Bump Firefox version in UA string 2015-12-04 11:45:28 +00:00
Ivan Kozik
58a2711058 Use Roboto font if installed 2015-11-30 05:16:29 +00:00
Ivan Kozik
e72c5fc3a7 Don't crash if psutil is not available on non-Windows OS (it is no longer installed by wpull 1.2.2) 2015-11-21 19:54:13 +00:00
Ivan Kozik
ec9f7bdb43 setup.py: if GRAB_SITE_NO_CCHARDET env var set, don't require cchardet; wpull will fall back on chardet 2015-10-28 16:11:23 +00:00
Ivan Kozik
1bb8bcc4d8 global igset: also ignore recaptcha /mailhide/d links 2015-10-23 00:41:41 +00:00
Ivan Kozik
40ca80638d global igset: Ignore /impixu on a new tumblr domain 2015-10-22 15:10:58 +00:00
Ivan Kozik
4487c43c83 Use wpull>=1.2.2 2015-10-21 22:38:47 +00:00
Ivan Kozik
b3c433a60b Fix: new Click gives us () instead of None when no start_url's are given 2015-10-03 22:53:20 +00:00
Ivan Kozik
7a63a3dcd1 Add --no-dupespotter for turning off dupespotter which sometimes has false positives 2015-09-30 22:16:56 +00:00
Ivan Kozik
17c1b9caaa Update default user agent 2015-09-25 20:32:08 +00:00
Ivan Kozik
f1548521ec Write URLs skipped by --max-content-length= to DIR/skipped_max_content_length 2015-09-02 19:15:00 +00:00
Ivan Kozik
3def2a79bc Fix for 32-bit machines: don't crash on startup with lmdb.MemoryError (lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory) 2015-09-02 19:04:56 +00:00
Ivan Kozik
e0ad2e9a25 Bump version 2015-08-28 04:29:24 +00:00
Ivan Kozik
b782c23389 Add --no-video option to skip the download of videos 2015-08-21 08:28:27 +00:00
Arkiver2
6f6754f81e Add --warc-max-size=BYTES option for controlling WARC size 2015-08-21 07:47:12 +00:00
Ivan Kozik
ee2684941d Add support for passing multiple URLs to grab-site 2015-08-21 07:18:31 +00:00