3252 Commits

Author SHA1 Message Date
Mike Fährmann
d33227fc38
[twitter] restore errors for protected timelines etc (fixes #2237) 2022-01-30 16:42:13 +01:00
Mike Fährmann
ebd3d5c1cc
[bunkr] fix .mp4 downloads (closes #2239) 2022-01-28 23:21:16 +01:00
Mike Fährmann
e2be199124
[gelbooru] improve and fix pagination (#2230, #2232)
Use 'id:<POSTID' as a tag instead of going through pages with 'pid'.

Something similar was already implemented in 93cef784,
but that got broken again in 3085aac4.
2022-01-27 17:44:47 +01:00
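The 'id:<POSTID' pagination described in this commit can be sketched roughly as follows. This is only an illustration, not the extractor's actual code: `fetch_posts`, `paginate`, and the in-memory `POSTS` list are hypothetical stand-ins for the real API, and the sketch assumes results come back newest first (descending post ID).

```python
# Hypothetical in-memory stand-in for a booru API; posts have descending ids.
POSTS = [{"id": i} for i in range(25, 0, -1)]


def fetch_posts(tags, limit):
    """Hypothetical API call: return up to 'limit' posts matching 'tags'.

    Only an 'id:<N' tag is interpreted here; results are newest first.
    """
    max_id = None
    for tag in tags.split():
        if tag.startswith("id:<"):
            max_id = int(tag[4:])
    matches = [p for p in POSTS if max_id is None or p["id"] < max_id]
    return matches[:limit]


def paginate(tags, limit=10):
    """Yield all posts by narrowing an 'id:<POSTID' tag instead of using 'pid'."""
    query = tags
    while True:
        posts = fetch_posts(query, limit)
        yield from posts
        if len(posts) < limit:
            return
        # Continue strictly below the lowest id seen so far.
        query = "{} id:<{}".format(tags, posts[-1]["id"])


ids = [post["id"] for post in paginate("example_tag")]
```

Because each request is keyed to the last post ID rather than to a page offset, posts added while the run is in progress do not shift later results.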
Mike Fährmann
806badbeec
release version 1.20.3 2022-01-26 01:10:44 +01:00
Mike Fährmann
8230f31800
[twitter] update query hashes 2022-01-26 00:49:46 +01:00
Mike Fährmann
c180806cec
[twitter] fix deleted/invalid retweets (#2225) 2022-01-25 23:57:13 +01:00
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction (#2008) 2022-01-25 23:16:13 +01:00
Mike Fährmann
2bf554a896
[twitter] fix several errors (#2212, #2216, #2225)
- fix Tweets with deleted quotes
- fix suspended Tweets without 'legacy' entry
- fix unified_cards without 'type'
2022-01-25 16:13:22 +01:00
Mike Fährmann
fbd17547f5
release version 1.20.2 2022-01-24 18:24:50 +01:00
Mike Fährmann
e5242b83bf
[twitter] define directory format for events (#2109) 2022-01-24 17:44:17 +01:00
Mike Fährmann
efb3e65a6a
[sexcom] extend URL pattern (fixes #2220) 2022-01-24 01:19:40 +01:00
vsyx
3f2b6335d7
[instagram] fix highlights extraction (#2197)
* [instagram] fix highlights extraction

* [instagram] improve highlights extraction

- 'yield' individual reels instead of collecting them in a list
  and returning them all at once
- reduce 'chunk_size' to an even safer value
  (instagram.com also uses 5)
2022-01-24 00:20:12 +01:00
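The change from collecting reels in a list to yielding them individually might look like the sketch below. `fetch_reels` and `highlights` are hypothetical names used for illustration only; the chunk size of 5 is the value mentioned in the commit message.

```python
def fetch_reels(reel_ids):
    """Hypothetical API call returning one reel dict per requested id."""
    return [{"id": reel_id} for reel_id in reel_ids]


def highlights(reel_ids, chunk_size=5):
    """Yield reels one at a time, requesting them in small chunks."""
    for offset in range(0, len(reel_ids), chunk_size):
        chunk = reel_ids[offset:offset + chunk_size]
        # Each reel is handed to the caller as soon as its chunk arrives,
        # rather than being appended to a list that is returned at the end.
        yield from fetch_reels(chunk)


reels = highlights(list(range(12)))
first = next(reels)
```

With a generator, the caller can start processing the first reel after a single request instead of waiting for every chunk to be fetched.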
Mike Fährmann
5ed26e1773
[twitter] fix pinned tweets (#2216)
caused by the changes in dffa440edef9be1e169ef1e2d6bc0a492493ffce
2022-01-23 22:52:57 +01:00
Mike Fährmann
a9f78e6527
[twitter] improve error handling
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
2022-01-23 18:01:05 +01:00
Mike Fährmann
729b07c1f5
[twitter] simplify
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
2022-01-23 01:44:55 +01:00
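A minimal sketch of the first two points, using only the standard `json` module. The variable names in `COMMON_VARIABLES` and the `build_variables` helper are hypothetical and not taken from the extractor; the sketch only shows how one shared encoder instance with compact separators shrinks the encoded 'variables' string.

```python
import json

# One shared encoder instance; compact separators drop the spaces that
# json.dumps() emits by default after ',' and ':'.
compact_encode = json.JSONEncoder(separators=(",", ":")).encode

# Hypothetical set of variables shared by several GraphQL requests.
COMMON_VARIABLES = {"count": 100, "withReplies": False}


def build_variables(**extra):
    """Merge request-specific values into a copy of the common variables."""
    variables = COMMON_VARIABLES.copy()
    variables.update(extra)
    return compact_encode(variables)


encoded = build_variables(userId="12345")
default = json.dumps({**COMMON_VARIABLES, "userId": "12345"})
```

Both strings decode to the same object; the compact form is simply shorter, which matters when the value is sent as a URL query parameter.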
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping (#2215)
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
2022-01-23 01:03:37 +01:00
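The plus-to-space replacement could be sketched as below. `tag_to_query` is a hypothetical helper, not the extractor's code, and the sketch assumes the tag portion of the URL uses ordinary percent-encoding apart from the pluses.

```python
from urllib.parse import unquote


def tag_to_query(path_segment):
    """Turn the tag part of a /tags/ URL into a search query string."""
    # Replace '+' with a space first, then decode percent-escapes, so an
    # escaped literal plus ('%2B') is not mistaken for a space.
    return unquote(path_segment.replace("+", " "))


query = tag_to_query("twilight+sparkle")
```

The order of the two steps matters: decoding first would turn '%2B' into '+', which the replacement would then wrongly convert into a space.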
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling 2022-01-22 23:09:45 +01:00
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor (closes #2109) 2022-01-22 20:55:50 +01:00
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results 2022-01-22 20:25:10 +01:00
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets (#2212) 2022-01-22 00:41:58 +01:00
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter (#2212) 2022-01-21 23:53:59 +01:00
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API (#2212)
The old REST API endpoints, which Twitter itself has not used since
summer 2021, are apparently going to be phased out at last, with
'/2/timeline/profile/USERID.json' being the first one.

Only Twitter's search doesn't have a GraphQL interface yet.
2022-01-21 23:34:41 +01:00
Mike Fährmann
de754590e0
add --source-address command-line option (closes #2206) 2022-01-21 17:07:56 +01:00
Mike Fährmann
698f35215e
[blogger] support new image domain (fixes #2204) 2022-01-20 23:13:07 +01:00
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters (#2193) 2022-01-16 03:21:50 +01:00
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery
  when 'cards' is set to "ytdl"

"cards": true   --> only download card images
"cards": "ytdl" --> download card images and
                    use youtube_dl on otherwise unsupported cards
2022-01-15 22:02:57 +01:00
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits (#2180) 2022-01-14 21:55:59 +01:00
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction (fixes #2189) 2022-01-13 16:45:46 +01:00
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards 2022-01-13 16:05:52 +01:00
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection (fixes #2188)
not all files from 'https://video-cdnN.gelbooru.com' are videos
2022-01-12 21:33:02 +01:00
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search 2022-01-11 23:50:29 +01:00
Mike Fährmann
58a7921b5c
release version 1.20.1 2022-01-08 23:25:59 +01:00
Mike Fährmann
170711af7e
[mangadex] fix extraction (closes #2177) 2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
as was the case before 010d65dc
2022-01-06 21:09:30 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor (closes #2161) 2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
dcfe08838d
restore -d/--dest functionality
change short option for --directory from -d to -D
2022-01-03 18:30:36 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format (#2157) 2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()' 2022-01-01 17:55:58 +01:00
Mike Fährmann
3b7c7daa76
improve UNC path handling (#2126)
always call 'abspath()' on the directory path to handle cases when the
current working directory is UNC and 'base-directory' is relative.
2021-12-30 22:22:19 +01:00
Mike Fährmann
47eae4c393
release version 1.20.0 2021-12-29 22:59:14 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs 2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147) 2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
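The subdomain problem described in the "Fix using g: and r:" item above can be reproduced with two simplified patterns. Both regular expressions here are hypothetical reductions written for this example, not the patterns of any actual extractor; they only contrast "[^.]+" with "[\w-]+" on the URL quoted in the commit message.

```python
import re

# Hypothetical patterns: optional scheme, wildcard subdomain, fixed domain.
TOO_GENERIC = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
SPECIFIC = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"

# "[^.]+" accepts any character except a dot, so it swallows the "r:" prefix.
generic_match = TOO_GENERIC.match(url)

# "[\w-]+" cannot match ':', so the prefixed URL is rejected and is left
# for the recursive/generic extractor to claim.
specific_match = SPECIFIC.match(url)

# A URL without the prefix (and without a scheme) still matches.
plain_match = SPECIFIC.match("foobar.tumblr.com")
```

Here `generic_match.group(1)` is "r:foobar", `specific_match` is `None`, and `plain_match.group(1)` is "foobar".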
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction (fixes #2145) 2021-12-28 21:59:39 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
(#2037)
2021-12-28 00:27:59 +01:00
Mike Fährmann
4edf43891c
add -d/--directory and -f/--filename command-line arguments 2021-12-27 23:31:54 +01:00