Mike Fährmann
a3924d2072
[sankaku] fix swf extraction (closes #52)
2017-12-07 15:45:43 +01:00
Mike Fährmann
ebe9b0a04c
another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'.
The previous version would have retried in several situations that
had no chance of ever succeeding (invalid URLs, etc.).
2017-12-07 15:31:14 +01:00
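The retry policy described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's actual downloader code: the exception names come from the commit message, but their definitions here are stand-ins, and `download`, `fetch`, and `retries` are hypothetical names.

```python
class DownloadError(Exception):
    """Fatal download failure (stand-in definition)."""


class DownloadRetry(Exception):
    """Transient failure that is worth another attempt (stand-in definition)."""


def download(fetch, url, retries=3):
    """Call fetch(url); retry only when it raises DownloadRetry."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except DownloadRetry:
            # Transient problem: try again unless the attempts are used up.
            if attempt == retries:
                raise DownloadError("giving up on " + url)
        # Any other exception (e.g. for an invalid URL) is not caught here,
        # so it propagates immediately and aborts the download.
```

With this shape, a permanently broken URL fails on the first attempt instead of being retried `retries` times.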
Mike Fährmann
291369eab2
various smaller changes/additions
2017-12-06 21:45:56 +01:00
Mike Fährmann
4fb6803fa6
add option to sleep before each download
2017-12-04 17:33:10 +01:00
Mike Fährmann
300346ecdf
[mangazuki] remove extractors
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
2017-12-04 13:36:04 +01:00
Mike Fährmann
d275b1d9a3
[khinsider] fix extraction
... again
2017-12-04 12:42:06 +01:00
Mike Fährmann
6b8e3003df
[hentai2read] ensure consistent extraction results
2017-12-03 02:34:35 +01:00
Mike Fährmann
a1980b16f3
[gelbooru] various improvements
- better metadata for pools
- map ratings to s/q/e like other boorus do
- skip() support
2017-12-03 01:41:30 +01:00
Mike Fährmann
93482a1f88
implement 'util.advance()'
2017-12-03 01:38:24 +01:00
Mike Fährmann
0e5057b15d
remove deprecated options
2017-12-02 15:31:57 +01:00
Mike Fährmann
8f518e03f8
add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option
This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
2017-12-02 01:47:26 +01:00
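A rough sketch of the pacing idea behind such a rate limit, assuming the simple "sleep whenever the transfer is ahead of schedule" scheme. This is not the code from the commit above: `RateLimiter`, `consume`, and the injectable `clock`/`sleep` parameters are hypothetical names chosen for illustration.

```python
import time


class RateLimiter:
    """Throttle a transfer to roughly `rate` bytes per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock      # injectable so the class can be tested
        self.sleep = sleep
        self.start = clock()
        self.total = 0

    def consume(self, nbytes):
        """Account for nbytes just received; sleep if ahead of schedule."""
        self.total += nbytes
        expected = self.total / self.rate    # seconds the transfer should have taken
        elapsed = self.clock() - self.start  # seconds it actually took
        delay = expected - elapsed
        if delay > 0:
            self.sleep(delay)
        return max(delay, 0.0)
```

A download loop would call `consume(len(chunk))` after writing each chunk. A stricter token bucket would also cap how much unused allowance can accumulate; this sketch does not.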
Mike Fährmann
a718c6c6cd
implement 'util.parse_bytes()'
2017-12-02 01:24:49 +01:00
Mike Fährmann
038e3b3369
[kissmanga] handle "AreYouHuman" redirects (#51)
2017-12-01 15:22:50 +01:00
Mike Fährmann
2b9a783fc7
[khinsider] fix extraction
2017-12-01 14:00:37 +01:00
Mike Fährmann
3dc1169736
use own mapping before relying on the 'mimetypes' module
2017-12-01 13:50:31 +01:00
Mike Fährmann
214972bc9a
[gelbooru] use manual extraction
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875)
This also adds an extractor for image-pools.
2017-11-29 20:48:17 +01:00
Mike Fährmann
55c64cad4b
[khinsider] fix filename extension and test-pattern
2017-11-28 19:35:47 +01:00
Mike Fährmann
c0bcf8e343
release version 1.0.2
2017-11-24 17:24:39 +01:00
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
- don't transform inline GIF URLs
- set 'type' parameter for API calls if only one post type is selected
2017-11-24 16:51:07 +01:00
Mike Fährmann
9296a26eae
[tumblr] add warning messages
2017-11-23 16:12:07 +01:00
Mike Fährmann
65c1c53eb8
[khinsider] fix extraction
2017-11-23 15:33:49 +01:00
Mike Fährmann
12de658937
[tumblr] add options to control extraction behavior (#48)
- posts : list of post-types to inspect
- inline : scan post bodies for inline images
- external: follow external links
2017-11-23 15:32:54 +01:00
Mike Fährmann
077f8c12be
[tumblr] original video URLs + continuous offset
2017-11-20 20:51:02 +01:00
Mike Fährmann
8eb12ebeae
[tumblr] support more post/media types (#48)
This adds support for audio and video posts (most videos are shared
from YouTube/Instagram, which aren't supported here; use youtube-dl for those),
as well as link posts and image search inside text posts.
Most of this is still WIP and will need further improvement,
plus options to enable/disable different media types etc.
2017-11-18 23:11:32 +01:00
Mike Fährmann
6c9da67581
apply selection options (filter, range) when using '-j'
2017-11-18 17:35:57 +01:00
Mike Fährmann
b8cdd42cab
[senmanga] fix extraction (again)
this is basically a re-revert of 2ace5c7
2017-11-18 17:23:32 +01:00
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
2017-11-15 14:01:33 +01:00
Mike Fährmann
6913eeaa40
[powermanga] replace manga extractor unit test
My Hero Academia is gone
2017-11-15 14:01:24 +01:00
Mike Fährmann
7e0d9257a7
[hbrowse] fix manga extraction
2017-11-15 13:59:50 +01:00
Mike Fährmann
3c576d10c0
[seiga] better metadata + 'skip()' support
2017-11-15 13:58:35 +01:00
Mike Fährmann
f72318e593
[seiga] support more than 200 images
Due to API restrictions and/or missing knowledge about and
documentation of API usage, it was only possible to retrieve the
latest 200 images of a niconico seiga user with said API.
The new approach manually visits each HTML page and gets its
information from there.
2017-11-13 20:46:24 +01:00
Mike Fährmann
baf8094868
improve Extractor.request()'s retry behavior
2017-11-13 20:37:11 +01:00
Mike Fährmann
7e7b64162b
[batoto] handle error 10031
2017-11-12 20:49:37 +01:00
Mike Fährmann
79bcaa8726
improve downloader retry behavior
- only retry download on 5xx and 429 status codes
- immediately fail on 4xx status codes
2017-11-10 21:46:18 +01:00
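The status-code rule from the commit above, written out as a small predicate. `should_retry` is a hypothetical helper name used only for illustration, not a function from the project.

```python
def should_retry(status_code):
    """Return True if a failed download with this HTTP status is worth retrying."""
    if status_code == 429:               # Too Many Requests: back off and retry
        return True
    return 500 <= status_code < 600      # server-side errors may be transient


# Other 4xx codes describe a problem with the request itself, so repeating
# the same request is not expected to help.
for code in (404, 429, 503):
    print(code, should_retry(code))
```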
Mike Fährmann
5ee8ca0319
release version 1.0.1
2017-11-10 08:54:33 +01:00
Mike Fährmann
42e948584d
fix downloader error handling
RequestException being a subclass of OSError caused all exceptions
during file downloads to be ignored/re-raised.
2017-11-07 15:23:07 +01:00
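The general pitfall behind the commit above can be reproduced without the `requests` library. In the sketch below, `RequestException` is a local stand-in that derives from `OSError`, as the commit message says the real class does. The two `classify_*` functions are hypothetical and only show how the order of `except` clauses matters; they are not a reconstruction of the project's handler.

```python
class RequestException(OSError):
    """Stand-in for a network exception that derives from OSError."""


def classify_wrong(exc):
    try:
        raise exc
    except OSError:              # also matches RequestException
        return "filesystem"
    except RequestException:     # never reached
        return "network"


def classify_right(exc):
    try:
        raise exc
    except RequestException:     # most specific class first
        return "network"
    except OSError:
        return "filesystem"


print(classify_wrong(RequestException()))   # filesystem
print(classify_right(RequestException()))   # network
```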
Mike Fährmann
92027f67f9
use consistent names for URL constants
root := <scheme>://<host>
base_url := <root>/<common path>
2017-11-06 20:56:49 +01:00
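As an illustration of the naming convention in the commit above, the snippet below derives both constants for a made-up site. The host `example.org`, the `/albums` path, and the helper `root_of` are assumptions for this example only.

```python
from urllib.parse import urlsplit


def root_of(url):
    """Return '<scheme>://<host>' for the given URL."""
    parts = urlsplit(url)
    return parts.scheme + "://" + parts.netloc


root = root_of("https://example.org/albums/123")   # <scheme>://<host>
base_url = root + "/albums"                        # <root>/<common path>
```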
Mike Fährmann
69cbc0619f
[mangastream] fix 'next-page' URLs (fixes #49)
2017-11-04 11:50:40 +01:00
Mike Fährmann
980fd3616d
[tumblr] use API v2 (#48)
2017-11-03 22:16:57 +01:00
Mike Fährmann
d6bed9f36f
[tumblr] prevent premature exit to get all images (fixes #48)
2017-11-03 14:59:31 +01:00
Mike Fährmann
305da540c3
[mangahere] fix metadata extraction
2017-11-03 14:54:46 +01:00
Mike Fährmann
2d0cfb33e1
[xvideos] add user profile extractor (#45)
2017-11-02 17:28:35 +01:00
Mike Fährmann
a393e6e538
[xvideos] add gallery extractor (#45)
2017-11-02 15:36:53 +01:00
Mike Fährmann
3a8a0c1f35
[imgbox] rewrite / fix extraction (closes #47)
2017-11-01 13:01:59 +01:00
Mike Fährmann
f97207a8e6
release version 1.0.0
2017-10-27 16:22:51 +02:00
Mike Fährmann
707b15b586
create missing directories for 'part-directory'
also some code improvements regarding downloader config values
2017-10-27 12:22:45 +02:00
Mike Fährmann
035ef655f1
[imagefap] update unit tests
old gallery/image has been deleted
2017-10-27 12:22:16 +02:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
ea8ca4cfa4
add 'util.expand_path()'
2017-10-26 00:04:28 +02:00
Mike Fährmann
9a41002b77
fix partial downloads for 'text:' URLs
Using a filesize in bytes as offset into a Python string is not
a good idea if said file contains non-ASCII characters.
2017-10-25 15:04:45 +02:00
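A small demonstration of the problem described in the commit above: the size of a partial file is measured in bytes, while indexing a Python `str` counts characters. `resume_text` is a hypothetical helper, and UTF-8 is an assumed encoding. The sketch shows one way to avoid the mismatch, not necessarily the fix the commit applied.

```python
def resume_text(content, bytes_on_disk, encoding="utf-8"):
    """Return the part of `content` that still has to be written, as bytes."""
    data = content.encode(encoding)   # encode first, so offsets are in bytes
    return data[bytes_on_disk:]


text = "añb"                 # 3 characters, but 4 bytes in UTF-8
bytes_on_disk = 3            # "añ" has already been written (1 + 2 bytes)

print(repr(text[bytes_on_disk:]))               # '' (the final "b" is lost)
print(repr(resume_text(text, bytes_on_disk)))   # b'b'
```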