884 Commits

Author SHA1 Message Date
Ivan Kozik
7863f344a7 global igset: ignore incorrectly extracted YouTube links
e.g.

404 Not Found https://www.youtube.com/[[data.videoNavigationEndpoint]]
404 Not Found https://www.youtube.com/[[menuRendererData]]
404 Not Found https://www.youtube.com/[[videoReportActionResultRenderer_]]
404 Not Found https://www.youtube.com/[[speedyGData_.videoQualityPromoRenderer]]
2017-09-22 06:02:55 +00:00
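
The four example URLs above all have an unexpanded `[[...]]` template placeholder as their first path segment. A minimal Python sketch of an ignore check that would catch them follows. The regex and the names `YOUTUBE_PLACEHOLDER` and `is_bad_youtube_link` are assumptions for illustration, not the pattern this commit actually added to the global igset.

```python
import re

# Hypothetical pattern: a YouTube URL whose path begins with an unexpanded
# [[...]] template placeholder. Not the actual entry in grab-site's igset.
YOUTUBE_PLACEHOLDER = re.compile(r"^https?://www\.youtube\.com/\[\[[^\]]*\]\]")


def is_bad_youtube_link(url: str) -> bool:
    """Return True if url looks like an incorrectly extracted YouTube link."""
    return YOUTUBE_PLACEHOLDER.search(url) is not None
```

An ordinary URL such as `https://www.youtube.com/watch?v=...` does not match, because its path does not start with `[[`.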
Ivan Kozik
a6d5bd0227 global igset: also handle http://finance.google.com/finance 2017-09-18 01:02:41 +00:00
Ivan Kozik
e43bdcbf24 Bump Firefox UA 2017-08-29 19:03:16 +00:00
Ivan Kozik
f1b3501505 global igset: ignore another never-ending video stream 2017-08-29 19:02:09 +00:00
Ivan Kozik
a72272d5c3 Bump Firefox UA 2017-06-26 18:17:01 +00:00
Ivan Kozik
de32a1bfe9 Bump version 2017-06-26 18:14:05 +00:00
Ivan Kozik
0c43dc11f3 dashboard: opt out of DNS prefetching to avoid making DNS lookups on every host
Before this fix, if "Use a prediction service to load pages more quickly" was
enabled in Chrome, it would make DNS lookups on the hostname in every URL that
flew by in the dashboard.

https://www.chromium.org/developers/design-documents/dns-prefetching

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-DNS-Prefetch-Control
2017-06-26 18:12:06 +00:00
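
The MDN page linked above documents the `X-DNS-Prefetch-Control` response header, whose `off` value asks the browser not to prefetch DNS for hostnames on the page. The sketch below shows one way a server could attach it to a response. The helper name `with_dns_prefetch_off` is hypothetical, and the commit message does not say whether the dashboard opts out through this header or through an equivalent tag in the page.

```python
def with_dns_prefetch_off(headers: dict) -> dict:
    """Return a copy of headers that also opts out of DNS prefetching."""
    result = dict(headers)  # copy, so the caller's dict is not modified
    result["X-DNS-Prefetch-Control"] = "off"
    return result


# Example: headers for a hypothetical dashboard HTML response.
response_headers = with_dns_prefetch_off({"Content-Type": "text/html; charset=utf-8"})
```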
Ivan Kozik
2d7222cc5f Remove completely ineffective protection against crawling sites on localhost
Any hostname can resolve to 127.0.0.1, 192.168.x.y, etc.

If you care about this protection, run grab-site in a container or use
iptables/ferm to block outbound traffic on loopback for the user that runs
grab-site.
2017-04-27 16:32:34 +00:00
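
To illustrate why a check on the hostname text cannot work, here is a minimal Python sketch. It is not the code that was removed: both function names are hypothetical. The first function only inspects the hostname string, so any other name that resolves to an internal address passes it. The second classifies an already-resolved IP address, which is the level at which the container or firewall rules suggested above operate.

```python
import ipaddress


def hostname_looks_local(hostname: str) -> bool:
    """Naive check (hypothetical): inspects only the hostname text."""
    return hostname in ("localhost", "127.0.0.1")


def address_is_internal(address: str) -> bool:
    """Classify a resolved IP address as loopback or private."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_private
```

A name such as `intranet.example.com` passes the naive check regardless of the address it resolves to, while `address_is_internal("192.168.1.5")` is true.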
Ivan Kozik
25a19d1dc3 Update install instructions for Ubuntu 17.04 and fold Ubuntu 16.10 instructions into 16.04 instructions 2017-04-09 08:18:59 +00:00
Ivan Kozik
ae400137d3 README: update Help section 2017-03-09 07:58:28 +00:00
Ivan Kozik
69d1dab393 Mention grab-site 'URL' instead of grab-site URL to avoid issues with ? or & 2017-02-26 21:42:53 +00:00
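
The quoting matters because an unquoted `?` can be treated as a glob character and an unquoted `&` ends the command and backgrounds it. A small Python sketch of building the command line safely with the standard library's `shlex.quote` follows; the helper name `grab_site_command` and the example URL are assumptions for illustration.

```python
import shlex


def grab_site_command(url: str) -> str:
    """Build a shell command line with the URL safely single-quoted."""
    return "grab-site " + shlex.quote(url)


print(grab_site_command("http://example.com/?a=1&b=2"))
# prints: grab-site 'http://example.com/?a=1&b=2'
```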
Ivan Kozik
d88dccac27 Fix link to Python installer for OS X (there is no 3.4.5 installer) 2017-02-17 23:33:11 +00:00
Ivan Kozik
bd3c896146 Fix .travis.yml 2017-02-08 20:45:21 +00:00
Ivan Kozik
1f2d915fef Rename a metavar 2017-02-08 20:25:40 +00:00
Ivan Kozik
57e3455189 Bump Firefox UA 2017-02-08 20:24:07 +00:00
Ivan Kozik
94e486c7cf Document --permanent-error-status-codes 2017-02-08 20:23:12 +00:00
Ivan Kozik
bf6382d724 Add --permanent-error-status-codes argument
https://github.com/ludios/grab-site/issues/97
2017-02-08 20:16:31 +00:00
Ivan Kozik
4fd740e815 Point to Python 3.4.5 instead of 3.4.3 2017-02-04 13:39:46 +00:00
Ivan Kozik
32544d096e Add install instructions for Ubuntu 16.10 2017-02-04 13:36:47 +00:00
Ivan Kozik
75452363d0 README: Tweak 2016-11-12 22:42:15 +00:00
Ivan Kozik
ec5cc3f287 Opt out of Chrome's misbehaving Scroll Anchoring 2016-10-10 13:10:41 +00:00
Ivan Kozik
c30e92ee02 Bump Firefox UA 2016-10-07 08:57:51 +00:00
Ivan Kozik
76e173f1c9 chmod +x gs-dump-urls 2016-09-04 02:50:08 +00:00
Ivan Kozik
957c8f8aee chmod +x pause_resume_grab_sites.sh 2016-09-04 02:49:27 +00:00
Ivan Kozik
90d747d9bc global igset: Add getpocket.com/save 2016-08-17 20:03:34 +00:00
Ivan Kozik
9e360a8a13 global igset: Ignore another loop 2016-08-11 15:59:53 +00:00
Ivan Kozik
2136bae9be Bump version 2016-08-04 22:30:53 +00:00
Ivan Kozik
8a9a0b0d9f Revert "dashboard: Tweak font stack and size"
This reverts commit 3f31e251784fa4986751ce5613ef4884fe7656ec.
2016-08-04 22:29:33 +00:00
Ivan Kozik
4ef946300f Revert "dashboard: Improve alignment when using a font with variable-width numbers like San Francisco"
This reverts commit 154e99349ca4233d5233582c806719fa0564e1e8.
2016-08-04 22:29:23 +00:00
Ivan Kozik
154e99349c dashboard: Improve alignment when using a font with variable-width numbers like San Francisco 2016-08-04 19:26:09 +00:00
Ivan Kozik
3f31e25178 dashboard: Tweak font stack and size 2016-08-04 16:37:09 +00:00
Ivan Kozik
19738bf603 global igset: remove some icecast stations that are covered by the icecast skipper in wpull_hooks 2016-08-03 11:06:05 +00:00
Ivan Kozik
ecf9b2e717 .travis.yml: Change URL to try to get an exit code of 0 2016-08-03 10:30:10 +00:00
Ivan Kozik
659b25481e Set each timeout individually and use a session-timeout of two days
(we want to avoid hanging crawls forever, but we don't want to prevent the
downloading of large files from very slow hosts.)
2016-08-03 10:23:24 +00:00
Ivan Kozik
016a166f14 README: Advise downgrading tmux, not upgrading with some ppa 2016-08-02 18:53:07 +00:00
Ivan Kozik
6685bf51a8 global igset: fix addtoany.com ignore and use https?:// for all ignores 2016-08-01 22:08:42 +00:00
Ivan Kozik
d66011e86a README: Fix note; wpull 2.0.1 does work on Python 3.5 2016-07-15 05:41:27 +00:00
Ivan Kozik
63fdb9d5c6 Lock html5lib version to work around https://github.com/chfoo/wpull/issues/332 2016-07-15 05:38:38 +00:00
Ivan Kozik
f4230097eb .travis.yml: Upgrade setuptools to try to fix html5lib install failing due to old setuptools 2016-07-15 05:23:27 +00:00
Ivan Kozik
20bee1bdf5 .travis.yml: Upgrade pip3 to try to fix html5lib install failing due to old setuptools 2016-07-15 05:07:55 +00:00
Ivan Kozik
32b68e9342 dashboard: fix '?' shortcut key 2016-07-13 01:07:25 +00:00
Ivan Kozik
ca7bc71045 README: Add warning about tmux 2.1 2016-06-21 06:42:46 +00:00
Ivan Kozik
e7fcec6f85 global igset: make libsyn ignore libsyn-specific 2016-06-21 06:39:15 +00:00
Ivan Kozik
76f8b2cf48 Lock wpull dependency to 1.2.3 for now 2016-06-12 08:47:36 +00:00
Ivan Kozik
c313bfb2a1 README: bugs and questions 2016-06-12 08:46:04 +00:00
Ivan Kozik
c08f229e1d global igset: Add facebook login.php 2016-06-03 01:02:36 +00:00
Ivan Kozik
450a5c394f Bump Firefox UA 2016-06-01 02:16:36 +00:00
Ivan Kozik
febee9c85e global igset: Add /%3Ca%20href= pattern 2016-05-29 14:00:15 +00:00
Ivan Kozik
fb6e01caa7 global igset: Add /%20https?:/ pattern 2016-05-27 14:03:28 +00:00
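
The two patterns named in this commit and the one above target URLs produced by broken link extraction: percent-decoded, `/%3Ca%20href=` is `/<a href=` and `/%20https?:/` is a space followed by a second URL. The Python sketch below applies both as unanchored regex searches. The list name `BROKEN_LINK_PATTERNS`, the helper `looks_like_broken_link`, and the assumption that the igset uses these fragments verbatim are not confirmed by the commit messages.

```python
import re
from urllib.parse import unquote

# Pattern fragments copied from the two commit subjects; treating them as
# complete, unanchored regexes is an assumption made for this sketch.
BROKEN_LINK_PATTERNS = [
    re.compile(r"/%3Ca%20href="),
    re.compile(r"/%20https?:/"),
]


def looks_like_broken_link(url: str) -> bool:
    """Return True if url matches either broken-extraction pattern."""
    return any(p.search(url) for p in BROKEN_LINK_PATTERNS)


# Decoding shows what the first pattern stands for.
decoded_fragment = unquote("/%3Ca%20href=")  # "/<a href="
```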
Ivan Kozik
aa366bbb27 Update grab-site URL in setup.py and dashboard 2016-05-27 13:53:35 +00:00