Ivan Kozik
4d127249af
reddit igset: ignore new simple.reddit.com
2018-10-09 11:57:30 +00:00
Ivan Kozik
c38e931485
mediawiki igset: use {any_start_netloc} and add [\?&]lqt_method=
2018-10-08 11:04:54 +00:00
Ivan Kozik
cbf202fb44
global igset: ignore another endless mp3 stream
https://aechai9hib.cdn.dvmr.fr/franceinfo-midfi.mp3
2018-10-08 09:52:33 +00:00
Ivan Kozik
3316c048b7
wpull_hooks: implement support for {any_start_netloc} (previously {primary_netloc})
2018-10-08 09:51:27 +00:00
Ivan Kozik
9575ed4ec2
global igset: remove Google Finance ignore as the site no longer exists
2018-10-02 04:05:31 +00:00
Ivan Kozik
fa68cc68f0
global igset: combine some ignores
2018-10-02 04:05:08 +00:00
Ivan Kozik
2ad9d18d41
global igset: ignore telegram share URL
2018-10-02 04:02:22 +00:00
Ivan Kozik
6ea44ae862
global igset: combine some ignores
2018-10-02 04:01:58 +00:00
Ivan Kozik
90c37526e1
reddit igset: apply to old.reddit.com as well
2018-08-25 14:29:14 +00:00
Ivan Kozik
a3f1c51f55
global igset: ignore amazon logging
2018-08-16 15:48:56 +00:00
Ivan Kozik
4899dcd51b
global igset: ignore sitemeter.com counters
2018-08-15 17:14:18 +00:00
Ivan Kozik
ca8fd22c02
singletumblr igset: don't ignore non-tumblr domains; don't apply ignores to start URLs
https://github.com/ludios/grab-site/issues/126
2018-08-06 23:36:50 +00:00
Ivan Kozik
1069dedfcd
global igset: ignore two more share links
2018-05-25 06:42:22 +00:00
Ivan Kozik
f47fc0a899
global igset: ignore beacon.wikia-services.com
2018-05-24 20:03:38 +00:00
Ivan Kozik
8e8cd5895b
global igset: block more reddit tracking pixels
2018-05-06 03:28:29 +00:00
Ivan Kozik
1bfb5eca99
global igset: ignore new reddit tracking pixel
2018-05-05 21:09:21 +00:00
Ivan Kozik
42ba39afb4
global igset: ignore getpocket.com/edit
2018-04-12 08:44:40 +00:00
Ivan Kozik
bbe36cbe39
global igset: ignore jp.pinterest.com/pin/create/
2018-04-12 08:43:15 +00:00
Ivan Kozik
2eeab5b2bc
reddit igset: ignore out.reddit.com; appears to be safe to ignore because the tracking links are redundant with the non-tracking links
2017-12-13 04:23:02 +00:00
Ivan Kozik
a5b13a8393
global igset: ignore another /search.*updated-(min|max)= pattern on blogspot:
*.blogspot.com/search?q=QUERY&updated-max=2011-08-23T15:10:00-07:00&max-results=20&start=79&by-date=true
2017-12-12 03:40:15 +00:00
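A rough sketch of how an ignore pattern like the one in a5b13a8393 behaves against the example URL. The regex below is an assumption reconstructed from the commit subject, not the actual line in the global igset, and `example.blogspot.com` stands in for the wildcard host.

```python
import re

# Assumed pattern, modeled on the commit subject; the real igset entry may differ.
BLOGSPOT_SEARCH_IGNORE = re.compile(
    r"^https?://[^/]+\.blogspot\.com/search.*[\?&]updated-(min|max)="
)

def is_ignored(url: str) -> bool:
    """Return True if the URL matches the assumed blogspot search ignore."""
    return BLOGSPOT_SEARCH_IGNORE.search(url) is not None

# Example URL from the commit body, with a placeholder hostname.
paginated = (
    "https://example.blogspot.com/search?q=QUERY"
    "&updated-max=2011-08-23T15:10:00-07:00&max-results=20&start=79&by-date=true"
)
# A regular post URL, which the pattern should leave alone.
post = "https://example.blogspot.com/2011/08/some-post.html"
```

Here `is_ignored(paginated)` is true because `updated-max=` follows an `&` after `/search`, while `is_ignored(post)` is false.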
Ivan Kozik
9e24731262
global igset: ignore 16x16 tumblr avatars with .pnj extension (typo-prone tumblr programmer?)
2017-12-12 03:11:09 +00:00
Ivan Kozik
703534a0ee
reddit igset: ignore URLs with [\?&]utm_
2017-12-11 02:22:01 +00:00
Ivan Kozik
c677c29aaf
global igset: ignore new facebook like.php links
e.g. https://www.facebook.com/v2.9/plugins/like.php?href=
2017-11-16 20:23:28 +00:00
Ivan Kozik
6119aef9ed
global igset: ignore pixel.wp.com tracking pixels
2017-11-09 16:52:54 +00:00
Ivan Kozik
a8a50f523c
youtube igset: remove redundant ignore
2017-11-02 02:58:50 +00:00
Ivan Kozik
5442414d28
Remove googleplus ignore set and add accounts.google.com-related ignores to global igset
2017-11-02 02:31:30 +00:00
Ivan Kozik
eefb6a3eba
global igset: ignore another unwanted medium.com URL
2017-09-30 13:36:04 +00:00
Ivan Kozik
0c2f160db6
global igset: ignore unwanted medium.com URLs
2017-09-30 13:29:52 +00:00
Ivan Kozik
a941fdfb9c
global igset: ignore more incorrectly extracted links on YouTube
e.g.
404 Not Found https://www.youtube.com/{{data}}
2017-09-22 06:05:02 +00:00
Ivan Kozik
7863f344a7
global igset: ignore incorrectly extracted YouTube links
e.g.
404 Not Found https://www.youtube.com/[[data.videoNavigationEndpoint]]
404 Not Found https://www.youtube.com/[[menuRendererData]]
404 Not Found https://www.youtube.com/[[videoReportActionResultRenderer_]]
404 Not Found https://www.youtube.com/[[speedyGData_.videoQualityPromoRenderer]]
2017-09-22 06:02:55 +00:00
Ivan Kozik
a6d5bd0227
global igset: also handle http://finance.google.com/finance
2017-09-18 01:02:41 +00:00
Ivan Kozik
f1b3501505
global igset: ignore another never-ending video stream
2017-08-29 19:02:09 +00:00
Ivan Kozik
2d7222cc5f
Remove completely ineffective protection against crawling sites on localhost
Any hostname can resolve to 127.0.0.1, 192.168.x.y, etc.
If you care about this protection, run grab-site in a container or use
iptables/ferm to block outbound traffic on loopback for the user that runs
grab-site.
2017-04-27 16:32:34 +00:00
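To illustrate the reasoning in 2d7222cc5f: a check on the hostname string cannot know what address the name resolves to, so only the resolved IP can be classified. This is a minimal sketch using Python's standard `ipaddress` module. Both function names are hypothetical and are not taken from grab-site, and the sketch is not a substitute for the container or firewall approach the commit recommends.

```python
import ipaddress

def naive_hostname_check(hostname: str) -> bool:
    """Hypothetical name-based filter: flags only obvious local names."""
    return hostname in ("localhost", "127.0.0.1")

def is_internal_address(ip: str) -> bool:
    """Return True if an already-resolved IP is loopback, private, or link-local."""
    addr = ipaddress.ip_address(ip)
    return addr.is_loopback or addr.is_private or addr.is_link_local

# A public-looking name passes the naive check even if its DNS record
# points at 127.0.0.1; the address-based check catches the resolved IP.
name_flagged = naive_hostname_check("innocent.example.com")
address_flagged = is_internal_address("127.0.0.1")
```

Even an address-based check in the crawler can be sidestepped by redirects or by DNS answers that change between lookups, which is consistent with the commit's advice to enforce the restriction outside the process.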
Ivan Kozik
90d747d9bc
global igset: Add getpocket.com/save
2016-08-17 20:03:34 +00:00
Ivan Kozik
9e360a8a13
global igset: Ignore another loop
2016-08-11 15:59:53 +00:00
Ivan Kozik
19738bf603
global igset: remove some icecast stations that are covered by the icecast skipper in wpull_hooks
2016-08-03 11:06:05 +00:00
Ivan Kozik
6685bf51a8
global igset: fix addtoany.com ignore and use https?:// for all ignores
2016-08-01 22:08:42 +00:00
Ivan Kozik
e7fcec6f85
global igset: make libsyn ignore libsyn-specific
2016-06-21 06:39:15 +00:00
Ivan Kozik
c08f229e1d
global igset: Add facebook login.php
2016-06-03 01:02:36 +00:00
Ivan Kozik
febee9c85e
global igset: Add /%3Ca%20href= pattern
2016-05-29 14:00:15 +00:00
Ivan Kozik
fb6e01caa7
global igset: Add /%20https?:/ pattern
2016-05-27 14:03:28 +00:00
Ivan Kozik
fe2530e667
global igset: ignore amp%3Bamp%3Bamp%3B loops
2016-04-22 03:55:09 +00:00
Ivan Kozik
01ac84da06
global igset: tumblr serves 16px avatars on https now as well
2016-04-04 18:38:59 +00:00
Ivan Kozik
ffecfcabda
global igset: Ignore instapaper share links
2016-03-30 17:11:03 +00:00
Ivan Kozik
7ec2f90534
global igset: Ignore /CSI/CSI/ loops on blogspot
2016-01-12 22:44:57 +00:00
Ivan Kozik
0214558d5e
global igset: ignore bogus /search/label/CSI/ links on blogspot
2016-01-12 03:20:44 +00:00
Ivan Kozik
3b9f8c1a4c
global igset: ignore /CaptchaImage.axd
2016-01-09 21:46:35 +00:00
Ivan Kozik
dff87eba2f
global igset: also ignore www.digg.com/submit
2016-01-09 02:19:18 +00:00
Ivan Kozik
111ffca643
global igset: ignore two more loops
2016-01-03 02:00:26 +00:00
Ivan Kozik
5f14263070
global igset: ignore livejournal.com/identity/login.bml
2015-12-30 05:35:57 +00:00