Ivan Kozik
|
e6d81e81c5
|
Require ludios/wpull 3.0.2 and remove the install_requires that are in wpull
|
2018-10-05 08:15:16 +00:00 |
|
Ivan Kozik
|
e1e60c6072
|
Bump version
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
f27c61bdfe
|
wpull_hooks: remove method confusion
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
82c94683d4
|
Remove unused html5lib dependency
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
4d8218db7f
|
Use wpull 1.3.1
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
8ec6659cc8
|
.travis.yml: test only on Python 3.7
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
01d0fc28b8
|
Bump version
|
2018-10-05 07:54:13 +00:00 |
|
Ivan Kozik
|
b7dfb14dd8
|
Upgrade to and require Python 3.7.0
|
2018-10-05 07:53:58 +00:00 |
|
Ivan Kozik
|
7ec5accb22
|
Remove trollius
|
2018-10-04 14:58:03 +00:00 |
|
Ivan Kozik
|
b32da83a0f
|
Install ludios/pyre2, to be used soon for processing ignores
|
2018-10-04 14:13:05 +00:00 |
|
Ivan Kozik
|
f7ed026010
|
wpull_hooks: "Picked up the changes to" -> "Imported"
|
2018-10-04 14:09:16 +00:00 |
|
Ivan Kozik
|
837551c201
|
README: link to ludios/wpull
|
2018-10-04 13:40:28 +00:00 |
|
Ivan Kozik
|
b6127f2077
|
Use ludios/wpull 1.2.5
|
2018-10-04 12:32:47 +00:00 |
|
Ivan Kozik
|
bc512c696d
|
README: fix pip3 install step
|
2018-10-04 11:27:53 +00:00 |
|
Ivan Kozik
|
7c14a909ef
|
README: Python 3.4.8 -> 3.4.9
|
2018-10-04 11:22:41 +00:00 |
|
Ivan Kozik
|
ba960e0ea8
|
README: fix pip3 install step for new setup.py
|
2018-10-04 11:22:21 +00:00 |
|
Ivan Kozik
|
eaaf0ec06e
|
Use ludios/wpull for html5-parser support
|
2018-10-04 11:13:35 +00:00 |
|
Ivan Kozik
|
29b9825dc5
|
Bump version
|
2018-10-02 04:07:16 +00:00 |
|
Ivan Kozik
|
9575ed4ec2
|
global igset: remove Google Finance ignore as the site no longer exists
|
2018-10-02 04:05:31 +00:00 |
|
Ivan Kozik
|
fa68cc68f0
|
global igset: combine some ignores
|
2018-10-02 04:05:08 +00:00 |
|
Ivan Kozik
|
2ad9d18d41
|
global igset: ignore telegram share URL
|
2018-10-02 04:02:22 +00:00 |
|
Ivan Kozik
|
6ea44ae862
|
global igset: combine some ignores
|
2018-10-02 04:01:58 +00:00 |
|
Ivan Kozik
|
a045da3b82
|
default_cookies.txt: skip the quarantine gate on reddit.com
|
2018-09-28 04:27:15 +00:00 |
|
Ivan Kozik
|
e664e4fd54
|
README: mention cookies.txt extension for Firefox
|
2018-09-08 04:30:01 +00:00 |
|
Ivan Kozik
|
424e58a173
|
README: document DIR/scrape
|
2018-08-28 01:30:35 +00:00 |
|
Ivan Kozik
|
eabcf70141
|
README: tweak wording
|
2018-08-28 01:27:28 +00:00 |
|
Ivan Kozik
|
cdd7928750
|
Use DIR/scrape file to control whether to scrape for new URLs in responses
present = do scrape
missing = don't scrape
|
2018-08-28 01:22:19 +00:00 |
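A minimal sketch of the control-file convention described in cdd7928750 ("present = do scrape, missing = don't scrape"). It assumes the check is a plain file-existence test on `DIR/scrape`. The function name `should_scrape` is hypothetical and is not grab-site's actual code.

```python
import os


def should_scrape(grab_dir: str) -> bool:
    """Hypothetical helper: decide whether to scrape responses for new URLs.

    Follows the convention from the commit message:
      DIR/scrape present -> do scrape
      DIR/scrape missing -> don't scrape
    """
    return os.path.exists(os.path.join(grab_dir, "scrape"))
```

With this convention, `touch DIR/scrape` turns scraping on and `rm DIR/scrape` turns it off. The commit message does not say how often grab-site re-checks the file during a crawl.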
|
Ivan Kozik
|
90c37526e1
|
reddit igset: apply to old.reddit.com as well
|
2018-08-25 14:29:14 +00:00 |
|
Ivan Kozik
|
bf0d7d28a9
|
README: using Googlebot UA on tumblr no longer works
|
2018-08-24 00:41:35 +00:00 |
|
Ivan Kozik
|
0ea3d40938
|
Add default get_urls hook to get :orig images on Twitter and ?share=1 pages on Quora
|
2018-08-20 15:11:10 +00:00 |
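Commit 0ea3d40938 adds a default `get_urls` hook that queues extra URLs: `:orig` images on Twitter and `?share=1` pages on Quora. The sketch below shows only the URL-rewriting part as a standalone function. Several details are assumptions and do not come from the commit: the helper name `extra_urls`, the `pbs.twimg.com/media/` image pattern, and the rule "Quora pages without a query string". The sketch does not reproduce the wpull hook signature.

```python
import re
from typing import List

# Assumed shape of a Twitter image URL that has a full-size ":orig" variant.
TWITTER_IMAGE = re.compile(r"^https?://pbs\.twimg\.com/media/[^/:?]+\.(?:jpg|png)$")
# Assumed: any quora.com page URL; "?share=1" is added only if no query exists yet.
QUORA_PAGE = re.compile(r"^https?://(?:www\.)?quora\.com/.+")


def extra_urls(url: str) -> List[str]:
    """Hypothetical helper: return additional URLs to queue for `url`."""
    extras = []
    if TWITTER_IMAGE.match(url):
        # Ask for the original-resolution version of the image.
        extras.append(url + ":orig")
    if QUORA_PAGE.match(url) and "?" not in url:
        # Queue the share variant of the page as well.
        extras.append(url + "?share=1")
    return extras
```

The `$` anchor in the Twitter pattern requires the URL to end in `.jpg` or `.png`. As a result, a URL that already ends in `:orig` is not rewritten a second time.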
|
Ivan Kozik
|
a3f1c51f55
|
global igset: ignore amazon logging
|
2018-08-16 15:48:56 +00:00 |
|
Ivan Kozik
|
4899dcd51b
|
global igset: ignore sitemeter.com counters
|
2018-08-15 17:14:18 +00:00 |
|
Ivan Kozik
|
ca8fd22c02
|
singletumblr igset: don't ignore non-tumblr domains; don't apply ignores to start URLs
https://github.com/ludios/grab-site/issues/126
|
2018-08-06 23:36:50 +00:00 |
|
Ivan Kozik
|
fbc0475157
|
dashboard: keep table aligned when a crawl has > 9 connections
|
2018-08-04 21:54:55 +00:00 |
|
Ivan Kozik
|
6d76cf5903
|
dashboard: keep stats rows aligned when using San Francisco font
|
2018-08-02 18:40:04 +00:00 |
|
Ivan Kozik
|
398c0cf8e6
|
grab-site --help: link to README.md
|
2018-07-28 12:36:29 +00:00 |
|
Ivan Kozik
|
644260c479
|
README: document how to bypass tumblr's GDPR consent page
|
2018-07-07 12:05:34 +00:00 |
|
Ivan Kozik
|
a3537c7f2c
|
Revert Googlebot UA to avoid breaking reddit crawls
With Googlebot in the UA, reddit says:
429 Too Many Requests https://www.reddit.com/...
|
2018-07-07 12:03:23 +00:00 |
|
Ivan Kozik
|
aa01eb8293
|
README: mention updated UA
|
2018-06-25 02:13:09 +00:00 |
|
Ivan Kozik
|
5bc2069d9b
|
Bump Firefox version in UA string and add Googlebot to UA to archive tumblr blogs from Europe without GDPR cookie
|
2018-06-25 01:51:57 +00:00 |
|
Ivan Kozik
|
1069dedfcd
|
global igset: ignore two more share links
|
2018-05-25 06:42:22 +00:00 |
|
Ivan Kozik
|
f47fc0a899
|
global igset: ignore beacon.wikia-services.com
|
2018-05-24 20:03:38 +00:00 |
|
Ivan Kozik
|
a2e751f9dc
|
README: Ubuntu 17.10 -> 18.04; show newer-distro instructions first
|
2018-05-19 19:31:19 +00:00 |
|
Ivan Kozik
|
e79cbac070
|
README: fix macOS install steps for PyPI now requiring TLS 1.2 support
Fixes https://github.com/ludios/grab-site/issues/121
|
2018-05-15 20:57:38 +00:00 |
|
Ivan Kozik
|
b97414c5a4
|
README: Python 3.4.7 -> 3.4.8
|
2018-05-15 20:51:02 +00:00 |
|
Ivan Kozik
|
8e8cd5895b
|
global igset: block more reddit tracking pixels
|
2018-05-06 03:28:29 +00:00 |
|
Ivan Kozik
|
1bfb5eca99
|
global igset: ignore new reddit tracking pixel
|
2018-05-05 21:09:21 +00:00 |
|
Ivan Kozik
|
42ba39afb4
|
global igset: ignore getpocket.com/edit
|
2018-04-12 08:44:40 +00:00 |
|
Ivan Kozik
|
bbe36cbe39
|
global igset: ignore jp.pinterest.com/pin/create/
|
2018-04-12 08:43:15 +00:00 |
|
Ivan Kozik
|
fe5dd47df8
|
Bump UA lie to Firefox 59
|
2018-04-08 07:09:59 +00:00 |
|