Ivan Kozik
4ea80eec80
global igset: Ignore a loop on archive.org
2015-12-16 13:11:16 +00:00
Ivan Kozik
38f733f9d2
global igset: ignore /wp-admin/
2015-12-16 11:00:54 +00:00
Ivan Kozik
adb35ee4e3
Add --id=, --dir=, and --finished-warc-dir= options
2015-12-12 19:52:32 +00:00
Ivan Kozik
f8fece9ebb
Set NCR=1 cookie for .blogspot.com to avoid getting redirected
2015-12-12 18:29:53 +00:00
Ivan Kozik
86e92d684c
Send an over18=1 cookie to reddit.com to avoid the age gate on many subreddits
2015-12-12 16:49:42 +00:00
Ivan Kozik
6a647f637e
If using Python 3.4.0, depend on an older version of aiohttp that works on Python 3.4.0
2015-12-12 10:08:02 +00:00
Ivan Kozik
8a6eaea16c
Bump Firefox version in UA string
2015-12-04 11:45:28 +00:00
Ivan Kozik
58a2711058
Use Roboto font if installed
2015-11-30 05:16:29 +00:00
Ivan Kozik
e72c5fc3a7
Don't crash if psutil is not available on non-Windows OS (it is no longer installed by wpull 1.2.2)
2015-11-21 19:54:13 +00:00
Ivan Kozik
8c306aed50
Don't use --debug-manhole on Windows to avoid a crash
  File "C:\Python34\lib\site-packages\click\core.py", line 680, in main
    rv = self.invoke(ctx)
  File "C:\Python34\lib\site-packages\click\core.py", line 873, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python34\lib\site-packages\click\core.py", line 508, in invoke
    return callback(*args, **kwargs)
  File "C:\Python34\lib\site-packages\libgrabsite\main.py", line 276, in main
    wpull.__main__.main()
  File "C:\Python34\lib\site-packages\wpull\__main__.py", line 38, in main
    manhole.install()
  File "C:\Python34\lib\site-packages\manhole.py", line 565, in install
    _MANHOLE.configure(**kwargs)  # Threads might be started here
  File "C:\Python34\lib\site-packages\manhole.py", line 412, in configure
    self.patch_os_fork_functions()
  File "C:\Python34\lib\site-packages\manhole.py", line 505, in patch_os_fork_functions
    self.original_os_fork, os.fork = os.fork, self.patched_fork
AttributeError: 'module' object has no attribute 'fork'
Exception in thread Manhole:
Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 920, in _bootstrap_inner
    self.run()
  File "C:\Python34\lib\site-packages\manhole.py", line 192, in run
    sock = self.get_socket()
  File "C:\Python34\lib\site-packages\manhole.py", line 445, in get_socket
    sock = _ORIGINAL_SOCKET(socket.AF_UNIX, socket.SOCK_STREAM)
AttributeError: 'module' object has no attribute 'AF_UNIX'
2015-10-28 16:31:39 +00:00
Ivan Kozik
ca5c4611fa
Don't crash on lack of SIGINT support on Windows
2015-10-28 16:29:24 +00:00
Ivan Kozik
7a0c1b73bc
dupes: Also catch lmdb.Error for Windows, which complains about lacking disk space
2015-10-28 16:24:28 +00:00
Ivan Kozik
e6ebbd7b51
Don't try to use the unavailable --monitor- options on Windows
2015-10-28 16:19:11 +00:00
Ivan Kozik
ec9f7bdb43
setup.py: if GRAB_SITE_NO_CCHARDET env var set, don't require cchardet; wpull will fall back on chardet
2015-10-28 16:11:23 +00:00
Ivan Kozik
1bb8bcc4d8
global igset: also ignore recaptcha /mailhide/d links
2015-10-23 00:41:41 +00:00
Ivan Kozik
40ca80638d
global igset: Ignore /impixu on a new tumblr domain
2015-10-22 15:10:58 +00:00
Ivan Kozik
4487c43c83
Use wpull>=1.2.2
2015-10-21 22:38:47 +00:00
Ivan Kozik
b3c433a60b
Fix: new Click gives us () instead of None when no start_url's are given
2015-10-03 22:53:20 +00:00
Ivan Kozik
7a63a3dcd1
Add --no-dupespotter for turning off dupespotter which sometimes has false positives
2015-09-30 22:16:56 +00:00
Ivan Kozik
17c1b9caaa
Update default user agent
2015-09-25 20:32:08 +00:00
Ivan Kozik
f1548521ec
Write URLs skipped by --max-content-length= to DIR/skipped_max_content_length
2015-09-02 19:15:00 +00:00
Ivan Kozik
3def2a79bc
Fix for 32-bit machines: don't crash on startup with lmdb.MemoryError
lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory
2015-09-02 19:04:56 +00:00
Ivan Kozik
e0ad2e9a25
Bump version
2015-08-28 04:29:24 +00:00
Ivan Kozik
5a04a38f59
Prevent twitter crawls from endlessly downloading [\?&]nav=
Credit to garyrh
2015-08-28 04:28:19 +00:00
Ivan Kozik
c28c593a83
Explain imdb ignore set
2015-08-21 08:35:57 +00:00
Ivan Kozik
b782c23389
Add --no-video option to skip the download of videos
2015-08-21 08:28:27 +00:00
Arkiver2
6f6754f81e
Add --warc-max-size=BYTES option for controlling WARC size
2015-08-21 07:47:12 +00:00
Ivan Kozik
ee2684941d
Add support for passing multiple URLs to grab-site
2015-08-21 07:18:31 +00:00
Ivan Kozik
524cdf2cec
Fix blogspot search? ignore
2015-08-21 05:35:30 +00:00
Ivan Kozik
f379264ed1
Don't crash on --igsets=blogs even though it's gone
2015-08-21 05:31:51 +00:00
Ivan Kozik
6a6dff0083
Add comment to reddit ignore set
2015-08-21 05:18:35 +00:00
Ivan Kozik
4b50c6de67
Migrate all other ignores from blogs to the global set
2015-08-21 05:16:26 +00:00
Ivan Kozik
954ab31acb
Remove ignores that probably only wget needed
2015-08-21 04:48:47 +00:00
Ivan Kozik
b83bce1f2a
Migrate some ignores from blogs to global set
2015-08-21 04:48:19 +00:00
Ivan Kozik
1b8b4b0077
Ignore per-post and per-comment Atom feeds on blogspot.com
2015-08-21 04:23:54 +00:00
Ivan Kozik
291b3e939b
Fix: --offsite-links should be on by default
2015-08-13 12:29:11 +00:00
Ivan Kozik
a3f1ff7ed9
Cache control files for just 1.5 sec instead of 3 sec
2015-08-12 08:52:56 +00:00
Ivan Kozik
050dbc44d8
Fix very recent regression: report the pattern instead of the regexp
2015-08-12 08:51:41 +00:00
Ivan Kozik
b8b6248aab
Remove no-longer-needed workaround in ignore sets
2015-08-12 07:55:08 +00:00
Ivan Kozik
1d52a28fac
Increase size of compiled regexp cache; remove unused code
2015-08-12 07:52:24 +00:00
Ivan Kozik
26c7ea84d8
Implement --wpull-args for passing additional arguments to wpull
2015-08-12 06:39:49 +00:00
Ivan Kozik
1674751b1c
Don't crash if DIR/concurrency is set to 0
2015-08-12 05:57:56 +00:00
Ivan Kozik
28f5652404
Bump version
2015-08-12 05:29:44 +00:00
Ivan Kozik
ba823a34f8
Print ignores without doubling up backslashes
2015-08-12 05:26:11 +00:00
Ivan Kozik
668c03d5d2
Implement -i / --input-file, supporting both local input files and URLs
2015-08-12 05:24:09 +00:00
Ivan Kozik
9989eb5b70
Pretend to be Firefox 40; it's out tomorrow
2015-08-10 13:38:54 +00:00
Ivan Kozik
b7743e780a
Implement --ua= for setting the User-Agent
2015-08-10 13:38:00 +00:00
Ivan Kozik
ee4dbe162e
Implement --igon / --igoff
2015-08-10 13:23:43 +00:00
Ivan Kozik
bf080c7cb4
Implement --max-content-length=N for skipping large responses
2015-08-10 13:12:34 +00:00
Ivan Kozik
8b1791475d
Remove unused import
2015-08-10 13:00:37 +00:00