128 Commits (95012d1e0c233ce6ebcd448584fed45d3477b19a)

Author SHA1 Message Date
Daniel Oaks 95012d1e0c gs-server: Use env instead of py3 directly, makes virtualenvs nicer 2016-02-16 17:24:20 +00:00
Ivan Kozik 7c1afbefe0 Use wpull 1.2.3 2016-02-05 19:15:25 +00:00
Ivan Kozik ef5137ae86 Update UA 2016-02-01 16:19:49 +00:00
Ivan Kozik 7ec2f90534 global igset: Ignore /CSI/CSI/ loops on blogspot 2016-01-12 22:44:57 +00:00
Ivan Kozik 0214558d5e global igset: ignore bogus /search/label/CSI/ links on blogspot 2016-01-12 03:20:44 +00:00
Ivan Kozik 3b9f8c1a4c global igset: ignore /CaptchaImage.axd 2016-01-09 21:46:35 +00:00
Ivan Kozik dff87eba2f global igset: also ignore www.digg.com/submit 2016-01-09 02:19:18 +00:00
Ivan Kozik 01bf6d527b Bump version 2016-01-05 01:08:14 +00:00
Daniel Oaks 04179a2a99 server: Fix exception when clients visit ws port via a browser 2016-01-05 10:53:37 +10:00
Ivan Kozik 111ffca643 global igset: ignore two more loops 2016-01-03 02:00:26 +00:00
Ivan Kozik 5f14263070 global igset: ignore livejournal.com/identity/login.bml 2015-12-30 05:35:57 +00:00
Ivan Kozik 2acc826d56 lstrip '-' to avoid creating filenames that must be --'ed or quoted 2015-12-17 16:54:55 +00:00
Ivan Kozik 4ea80eec80 global igset: Ignore a loop on archive.org 2015-12-16 13:11:16 +00:00
Ivan Kozik 38f733f9d2 global igset: ignore /wp-admin/ 2015-12-16 11:00:54 +00:00
Ivan Kozik adb35ee4e3 Add --id=, --dir=, and --finished-warc-dir= options 2015-12-12 19:52:32 +00:00
Ivan Kozik f8fece9ebb Set NCR=1 cookie for .blogspot.com to avoid getting redirected 2015-12-12 18:29:53 +00:00
Ivan Kozik 86e92d684c Send an over18=1 cookie to reddit.com to avoid the age gate on many subreddits 2015-12-12 16:49:42 +00:00
Ivan Kozik 6a647f637e If using Python 3.4.0, depend on an older version of aiohttp that works on Python 3.4.0 2015-12-12 10:08:02 +00:00
Ivan Kozik 8a6eaea16c Bump Firefox version in UA string 2015-12-04 11:45:28 +00:00
Ivan Kozik 58a2711058 Use Roboto font if installed 2015-11-30 05:16:29 +00:00
Ivan Kozik e72c5fc3a7 Don't crash if psutil is not available on non-Windows OS (it is no longer installed by wpull 1.2.2) 2015-11-21 19:54:13 +00:00
Ivan Kozik 8c306aed50 Don't use --debug-manhole on Windows to avoid a crash
  File "C:\Python34\lib\site-packages\click\core.py", line 680, in main
    rv = self.invoke(ctx)
  File "C:\Python34\lib\site-packages\click\core.py", line 873, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python34\lib\site-packages\click\core.py", line 508, in invoke
    return callback(*args, **kwargs)
  File "C:\Python34\lib\site-packages\libgrabsite\main.py", line 276, in main
    wpull.__main__.main()
  File "C:\Python34\lib\site-packages\wpull\__main__.py", line 38, in main
    manhole.install()
  File "C:\Python34\lib\site-packages\manhole.py", line 565, in install
    _MANHOLE.configure(**kwargs)  # Threads might be started here
  File "C:\Python34\lib\site-packages\manhole.py", line 412, in configure
    self.patch_os_fork_functions()
  File "C:\Python34\lib\site-packages\manhole.py", line 505, in patch_os_fork_functions
    self.original_os_fork, os.fork = os.fork, self.patched_fork
AttributeError: 'module' object has no attribute 'fork'
Exception in thread Manhole:
Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 920, in _bootstrap_inner
    self.run()
  File "C:\Python34\lib\site-packages\manhole.py", line 192, in run
    sock = self.get_socket()
  File "C:\Python34\lib\site-packages\manhole.py", line 445, in get_socket
    sock = _ORIGINAL_SOCKET(socket.AF_UNIX, socket.SOCK_STREAM)
AttributeError: 'module' object has no attribute 'AF_UNIX'
2015-10-28 16:31:39 +00:00
Ivan Kozik ca5c4611fa Don't crash on lack of SIGINT support on Windows 2015-10-28 16:29:24 +00:00
Ivan Kozik 7a0c1b73bc dupes: Also catch lmdb.Error for Windows, which complains about lacking disk space 2015-10-28 16:24:28 +00:00
Ivan Kozik e6ebbd7b51 Don't try to use the unavailable --monitor- options on Windows 2015-10-28 16:19:11 +00:00
Ivan Kozik ec9f7bdb43 setup.py: if GRAB_SITE_NO_CCHARDET env var set, don't require cchardet; wpull will fall back on chardet 2015-10-28 16:11:23 +00:00
Ivan Kozik 1bb8bcc4d8 global igset: also ignore recaptcha /mailhide/d links 2015-10-23 00:41:41 +00:00
Ivan Kozik 40ca80638d global igset: Ignore /impixu on a new tumblr domain 2015-10-22 15:10:58 +00:00
Ivan Kozik 4487c43c83 Use wpull>=1.2.2 2015-10-21 22:38:47 +00:00
Ivan Kozik b3c433a60b Fix: new Click gives us () instead of None when no start_url's are given 2015-10-03 22:53:20 +00:00
Ivan Kozik 7a63a3dcd1 Add --no-dupespotter for turning off dupespotter which sometimes has false positives 2015-09-30 22:16:56 +00:00
Ivan Kozik 17c1b9caaa Update default user agent 2015-09-25 20:32:08 +00:00
Ivan Kozik f1548521ec Write URLs skipped by --max-content-length= to DIR/skipped_max_content_length 2015-09-02 19:15:00 +00:00
Ivan Kozik 3def2a79bc Fix for 32-bit machines: don't crash on startup with lmdb.MemoryError
lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory
2015-09-02 19:04:56 +00:00
Ivan Kozik e0ad2e9a25 Bump version 2015-08-28 04:29:24 +00:00
Ivan Kozik 5a04a38f59 Prevent twitter crawls from endlessly downloading [\?&]nav=
Credit to garyrh
2015-08-28 04:28:19 +00:00
Ivan Kozik c28c593a83 Explain imdb ignore set 2015-08-21 08:35:57 +00:00
Ivan Kozik b782c23389 Add --no-video option to skip the download of videos 2015-08-21 08:28:27 +00:00
Arkiver2 6f6754f81e Add --warc-max-size=BYTES option for controlling WARC size 2015-08-21 07:47:12 +00:00
Ivan Kozik ee2684941d Add support for passing multiple URLs to grab-site 2015-08-21 07:18:31 +00:00
Ivan Kozik 524cdf2cec Fix blogspot search? ignore 2015-08-21 05:35:30 +00:00
Ivan Kozik f379264ed1 Don't crash on --igsets=blogs even though it's gone 2015-08-21 05:31:51 +00:00
Ivan Kozik 6a6dff0083 Add comment to reddit ignore set 2015-08-21 05:18:35 +00:00
Ivan Kozik 4b50c6de67 Migrate all other ignores from blogs to the global set 2015-08-21 05:16:26 +00:00
Ivan Kozik 954ab31acb Remove ignores that probably only wget needed 2015-08-21 04:48:47 +00:00
Ivan Kozik b83bce1f2a Migrate some ignores from blogs to global set 2015-08-21 04:48:19 +00:00
Ivan Kozik 1b8b4b0077 Ignore per-post and per-comment Atom feeds on blogspot.com 2015-08-21 04:23:54 +00:00
Ivan Kozik 291b3e939b Fix: --offsite-links should be on by default 2015-08-13 12:29:11 +00:00
Ivan Kozik a3f1ff7ed9 Cache control files for just 1.5 sec instead of 3 sec 2015-08-12 08:52:56 +00:00
Ivan Kozik 050dbc44d8 Fix very recent regression: report the pattern instead of the regexp 2015-08-12 08:51:41 +00:00