935 Commits

Author SHA1 Message Date
Ivan Kozik
bbe36cbe39 global igset: ignore jp.pinterest.com/pin/create/ 2018-04-12 08:43:15 +00:00
Ivan Kozik
fe5dd47df8 Bump UA lie to Firefox 59 2018-04-08 07:09:59 +00:00
Ivan Kozik
5a05fa9761 Lock tornado version to 4.5.3 to avoid 5.0, which breaks with:
  File "[...]/lib/python3.4/site-packages/wpull/abstract/client.py", line 9, in <module>
    from wpull.connection import ConnectionPool
  File "[...]/lib/python3.4/site-packages/wpull/connection.py", line 11, in <module>
    from tornado.netutil import SSLCertificateError
ImportError: cannot import name 'SSLCertificateError'
2018-03-06 06:29:07 +00:00
Ivan Kozik
82de2f2b2b Add --import-ignores for starting with a non-empty DIR/ignores file 2017-12-27 13:48:20 +00:00
Ivan Kozik
6b6d5785e2 README: adjust logo size 2017-12-27 13:36:32 +00:00
Ivan Kozik
cea5a1f90d default_cookies.txt: skip the age gate on store.steampowered.com 2017-12-14 12:14:06 +00:00
Ivan Kozik
6d1b24f903 extra_docs/pause_resume_grab_sites.sh: only resume grab-sites if we paused the grab-sites 2017-12-14 05:25:48 +00:00
Ivan Kozik
97caf59705 README: add BrowserStack logo per terms 2017-12-13 23:05:41 +00:00
Ivan Kozik
fe38081834 README: thank BrowserStack 2017-12-13 23:00:26 +00:00
Ivan Kozik
2eeab5b2bc reddit igset: ignore out.reddit.com; appears to be safe to ignore because the tracking links are redundant with the non-tracking links 2017-12-13 04:23:02 +00:00
Ivan Kozik
a5b13a8393 global igset: ignore another /search.*updated-(min|max)= pattern on blogspot:
*.blogspot.com/search?q=QUERY&updated-max=2011-08-23T15:10:00-07:00&max-results=20&start=79&by-date=true
2017-12-12 03:40:15 +00:00
Ivan Kozik
9e24731262 global igset: ignore 16x16 tumblr avatars with .pnj extension (typo-prone tumblr programmer?) 2017-12-12 03:11:09 +00:00
Ivan Kozik
2f95d7f652 Bump UA lie to Firefox 57 on Windows 10 2017-12-11 11:06:37 +00:00
Ivan Kozik
703534a0ee reddit igset: ignore URLs with [\?&]utm_ 2017-12-11 02:22:01 +00:00
Ivan Kozik
ff33ab8295 dashboard: adjust color to make it more obvious that stats line is a click target 2017-12-07 05:57:44 +00:00
Ivan Kozik
4568dd46f4 dashboard: help text: job -> crawl; 'job' is ArchiveBot terminology 2017-12-07 05:54:42 +00:00
Ivan Kozik
c6c5bdefc7 dashboard: for Chrome 63+, use the faster overscroll-behavior: contain instead of attaching an onwheel event. 2017-12-07 05:19:15 +00:00
Ivan Kozik
70dc5cbe0b dashboard: add a subtle box-shadow to the log windows 2017-12-07 05:11:26 +00:00
Ivan Kozik
3b787cda83 dashboard: make the background a little less saturated 2017-12-07 05:01:25 +00:00
Ivan Kozik
4699e581fc README: add install steps for Debian 8 (jessie) 2017-12-07 02:36:14 +00:00
Ivan Kozik
26655fb28c README: switch from PPA-based python3.4 install to pyenv-based install; add install steps for Debian 9 and 10 2017-12-07 02:28:45 +00:00
Ivan Kozik
95e98ecefe README: link to wpull v1.2.3 2017-11-22 18:34:50 +00:00
Ivan Kozik
b3c83f203c README: add note about gs-server listening on all interfaces by default 2017-11-22 18:09:49 +00:00
Ivan Kozik
62d4575b0c README: point to the newer ppa:deadsnakes/ppa PPA with Python 3.4.7 2017-11-22 17:57:36 +00:00
Ivan Kozik
2276adefe8 README: be less confusing about "start a new shell" 2017-11-22 17:25:31 +00:00
Ivan Kozik
fc09d22028 README: ask users to file issues 2017-11-19 04:11:57 +00:00
Ivan Kozik
c677c29aaf global igset: ignore new facebook like.php links
e.g. https://www.facebook.com/v2.9/plugins/like.php?href=
2017-11-16 20:23:28 +00:00
Ivan Kozik
6119aef9ed global igset: ignore pixel.wp.com tracking pixels 2017-11-09 16:52:54 +00:00
Ivan Kozik
297c5b1b8d Patch dns.inet.is_multicast to not crash wpull 2017-11-09 11:33:14 +00:00
Ivan Kozik
90300f0f57 Document how to grab a website that requires login / cookies 2017-11-09 11:10:54 +00:00
Ivan Kozik
469974864e Rename some unused bindings 2017-11-09 09:44:46 +00:00
Ivan Kozik
82a5fa6650 Use wpull v3 hooks so that custom hooks get more information passed into wait_time 2017-11-09 09:37:26 +00:00
Ivan Kozik
a8a50f523c youtube igset: remove redundant ignore 2017-11-02 02:58:50 +00:00
Ivan Kozik
5442414d28 Remove googleplus ignore set and add accounts.google.com-related ignores to global igset 2017-11-02 02:31:30 +00:00
Ivan Kozik
7200878118 extra_docs/custom_hooks_sample.py: add a hook that queues additional URLs 2017-10-25 20:53:51 +00:00
Ivan Kozik
2b56a73aaa dashboard: adjust code formatting 2017-10-24 18:15:44 +00:00
Ivan Kozik
87e4bd79a6 dashboard: enable context menu for all browsers (Safari 10+ has document.execCommand('copy').) 2017-10-24 18:13:48 +00:00
Ivan Kozik
d9f75f5ae3 README: update "Install on a non-Ubuntu distribution" steps to also use a virtualenv 2017-10-24 17:55:48 +00:00
Ivan Kozik
ad5c4d2449 README: OS X -> macOS and update instructions to use virtualenv 2017-10-24 17:42:33 +00:00
Ivan Kozik
d5698bc08a README: fix TOC order 2017-10-24 17:25:33 +00:00
Ivan Kozik
112a3175c2 Bump version to 1.3.0 2017-10-24 17:24:57 +00:00
Ivan Kozik
d9b89f551b README: rework instructions to not require activating the virtualenv 2017-10-24 17:24:27 +00:00
Ivan Kozik
be5db3f397 README: rework the Ubuntu 14.04 install steps to use virtualenv; assume grab-site and related executables are in PATH 2017-10-24 17:14:57 +00:00
Ivan Kozik
0ad6bdf89f README: ancient non-LTS Ubuntu releases are not supported 2017-10-24 17:02:33 +00:00
Ivan Kozik
a954a0caca README: "Python 3.5 or newer" 2017-10-24 16:45:40 +00:00
Ivan Kozik
cd3931b5fc Add install instructions for Windows 10 2017-10-24 16:43:53 +00:00
Ivan Kozik
96e1f229dc dashboard: adjust the font stacks; add Segoe UI for Windows 2017-10-24 16:31:23 +00:00
Ivan Kozik
6680cf7e50 README: add install steps for Ubuntu 17.10 2017-10-24 01:02:42 +00:00
Ivan Kozik
eefb6a3eba global igset: ignore another unwanted medium.com URL 2017-09-30 13:36:04 +00:00
Ivan Kozik
0c2f160db6 global igset: ignore unwanted medium.com URLs 2017-09-30 13:29:52 +00:00