3633 Commits

Author SHA1 Message Date
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search 2022-01-11 23:50:29 +01:00
Mike Fährmann
58a7921b5c
release version 1.20.1 2022-01-08 23:25:59 +01:00
Mike Fährmann
170711af7e
[mangadex] fix extraction (closes #2177) 2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
3c79c9b271
document extended blacklist/whitelist syntax (#2025)
and not just in the commit message of 010d65dc
2022-01-06 23:36:57 +01:00
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
as was the case before 010d65dc
2022-01-06 21:09:30 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor (closes #2161) 2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
dcfe08838d
restore -d/--dest functionality
change short option for --directory from -d to -D
2022-01-03 18:30:36 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format (#2157) 2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()' 2022-01-01 17:55:58 +01:00
Mike Fährmann
3b7c7daa76
improve UNC path handling (#2126)
always call 'abspath()' on the directory path to handle cases when the
current working directory is UNC and 'base-directory' is relative.
2021-12-30 22:22:19 +01:00
Mike Fährmann
47eae4c393
release version 1.20.0 2021-12-29 22:59:14 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs 2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147) 2021-12-29 22:46:17 +01:00
Mike Fährmann
fa7d92f7a9
add docs for 'extractor.generic.enabled' 2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction (fixes #2145) 2021-12-28 21:59:39 +01:00
Mike Fährmann
b5b4f5a168
use 'build_extractor_filter' in test_results.py 2021-12-28 17:25:07 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
(#2037)
2021-12-28 00:27:59 +01:00
Mike Fährmann
4edf43891c
add -d/--directory and -f/--filename command-line arguments 2021-12-27 23:31:54 +01:00
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.

before:
- filename: foobar
- id      : 12345

now:
- filename: foobar-12345
- name    : foobar
- id      : 12345
2021-12-25 17:16:45 +01:00
Mike Fährmann
f3d61de18d
[artstation] create directories per asset (closes #2136) 2021-12-25 17:16:45 +01:00
Mike Fährmann
49a50fb2eb
[500px] create directories per photo 2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor (closes #1927) 2021-12-25 17:16:45 +01:00
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects (closes #2122) 2021-12-25 17:15:39 +01:00
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination (fixes #2132) 2021-12-24 03:42:28 +01:00
Mike Fährmann
9b67e63a89
[ytdl] update to latest yt-dlp changes (fixes #2124) 2021-12-24 01:50:47 +01:00
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results (#2133)
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038)
- support zz.ht    (closes #2105)
2021-12-21 19:24:17 +01:00
Mike Fährmann
7bf1d3fd32
rename --write-infojson to --write-info-json
to be consistent with the name used in youtube-dl/yt-dlp
(the old --write-infojson still works)
2021-12-21 00:21:39 +01:00
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
2021-12-21 00:18:06 +01:00
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
(#2096)
2021-12-20 22:40:03 +01:00
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction (#2115) 2021-12-19 05:46:02 +01:00
Mike Fährmann
ac80474371
handle UNC paths (#2113) 2021-12-19 04:52:00 +01:00
Mike Fährmann
47df50a2ad
add --sleep-request and --sleep-extractor command-line options 2021-12-19 03:18:50 +01:00
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction (closes #2112) 2021-12-17 19:55:53 +01:00
Mike Fährmann
62692c6842
[exhentai] add 'source' option
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
2021-12-16 23:16:19 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
safes one HTTP request that not needed with default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks 2021-12-16 02:21:16 +01:00
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category (#1527)
like for other boorus with 'tags': true
2021-12-16 00:06:52 +01:00
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
(closes #2107, closes #1881)
2021-12-15 23:19:30 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs (#2100) 2021-12-15 16:21:05 +01:00
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found 2021-12-15 16:21:05 +01:00
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if non is given 2021-12-15 16:20:53 +01:00
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
by prefixing it with '<base-category>:'

For example:
  shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
  gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963

Available base categories are:
  mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
  reactor, foolslide, foolfuuka,  philomena
2021-12-15 00:32:17 +01:00
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors (#1527) 2021-12-14 00:27:46 +01:00