1640 Commits

Author SHA1 Message Date
Mike Fährmann
1b82d36ab2
[deviantart] handle decode errors for extended_fetch results (#655)
This isn't going to solve the underlying problem, but it should at
least provide the server response when those errors happen.
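
A minimal sketch of the idea, not the project's actual code: try to decode the response body as JSON and, if that fails, log the raw server response instead of raising. The function name `parse_extended_fetch` and the logger name are hypothetical.

```python
import json
import logging

log = logging.getLogger("deviantart")  # hypothetical logger name


def parse_extended_fetch(text):
    """Decode a JSON response body.

    On a decode error, log the start of the raw response so the
    server's answer is visible, then return None.
    """
    try:
        return json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        log.warning("Unable to decode response (%s): %r", exc, text[:200])
        return None
```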
2020-03-24 20:56:41 +01:00
Mike Fährmann
09f2271528
[35photo] add 'tag' extractor 2020-03-24 02:49:00 +01:00
Mike Fährmann
77fda8190c
[35photo] simplify/remove tests for the 'genre' extractor
There is still a nice genre overview page (https://35photo.pro/genre/)
but the individual sub-pages don't list photos anymore
2020-03-24 02:48:25 +01:00
Mike Fährmann
fb846c9ee5
[instagram] reduce line lengths and make flake8 happy 2020-03-23 22:56:43 +01:00
Mike Fährmann
ad2efa8509
[e621] derive from Danbooru extractors (#651)
- use extractor implementations from 'danbooru'
- use "page": "b[ID]" to paginate over results instead of
  "tags": "id:<[ID]", avoiding infinite loops with certain
  post orders
- bump User-Agent version
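
The pagination change in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the extractor's real code: `fetch` is a hypothetical callable that takes the query parameters and returns a list of post dicts with an "id" key, and `fake_fetch` is a stand-in for the API. Because the "page" value only moves to lower IDs, the loop ends once the server returns an empty page.

```python
def paginate(fetch, params):
    """Yield posts page by page using "page": "b<ID>" ("before ID")."""
    params = dict(params)  # do not modify the caller's dict
    while True:
        posts = fetch(params)
        if not posts:
            return
        yield from posts
        # Assumption: continue below the lowest ID seen on this page,
        # so the result does not depend on the order of the posts.
        lowest = min(post["id"] for post in posts)
        params["page"] = "b{}".format(lowest)


def fake_fetch(params, _ids=range(7, 0, -1), _size=3):
    """Stand-in for an API call: returns up to 3 posts with id < ID."""
    page = params.get("page")
    before = int(page[1:]) if page else None
    ids = [i for i in _ids if before is None or i < before]
    return [{"id": i} for i in ids[:_size]]


ids = [post["id"] for post in paginate(fake_fetch, {"tags": "example"})]
```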
2020-03-22 21:08:45 +01:00
Mike Fährmann
9b39e1cd7e
[e621] fix bug in API rate limiting (#651) 2020-03-22 14:01:23 +01:00
Mike Fährmann
b607d0ad7f
[twitter] fix typo in 'x-twitter-auth-type' header (#625) 2020-03-21 23:11:39 +01:00
Mike Fährmann
2c3b9e1450
[nozomi] support multiple images per post (#646)
This changes the default filename format as well as archive IDs,
since those assumed that each post would only have one image.
2020-03-19 21:07:31 +01:00
Mike Fährmann
c606d0c854
[instagram] update pattern for user profile URLs
Allow for query parameters and fragments,
for example https://www.instagram.com/instagram/?hl=en
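
As an illustration only (this is not the pattern used in the project), a user profile pattern that tolerates a trailing query string or fragment could look like this. The names `USER_PATTERN` and `match_user` are hypothetical.

```python
import re

# Hypothetical pattern: a single path segment (the user name), an
# optional trailing slash, then an optional "?query" or "#fragment".
USER_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?instagram\.com/([^/?#]+)/?(?:[?#].*)?$"
)


def match_user(url):
    """Return the user name from a profile URL, or None if it does not match."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None
```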
2020-03-18 22:24:20 +01:00
Mike Fährmann
2530db3f4d
[mangadex] transform 'date' timestamps to datetime objects 2020-03-18 02:19:19 +01:00
Mike Fährmann
ae2a33243b
[newgrounds] catch general Exceptions 2020-03-18 02:17:43 +01:00
Mike Fährmann
32e36d8f02
[sexcom] replace tests 2020-03-17 22:47:45 +01:00
Mike Fährmann
33b42dc847
[nozomi] sort search results (fixes #646) 2020-03-17 22:28:23 +01:00
Mike Fährmann
eaa60a438b
[piczel] fix extraction
- manually filter by folder_id
- extract data for single posts from embedded JSON, since the
  '/api/gallery/image/<id>' endpoint is no longer available
2020-03-17 17:12:28 +01:00
Mike Fährmann
5bcc7184c9
[danbooru][e621] increase page limits 2020-03-17 15:53:28 +01:00
Mike Fährmann
90d15e3682
[instagram] use 'itertools.chain()' 2020-03-17 15:52:44 +01:00
Leonardo Taccari
160328d21c
[instagram] Add support for user's saved medias (#644)
* [instagram] Gracefully handle possible 'HttpErrorPage' in _extract_page()

`HttpErrorPage' is returned in shared_data at least when not authenticated or
when trying to fetch other users' saved media
(i.e. `instagram.com/<user>/saved/').

Gracefully handle it by returning nothing.

* [instagram] Add support for user's saved medias

(Please note that this needs the user to be authenticated, and they can
only see their own saved media, not other users'.)

Close #643.

* [instagram] Bump copyright year
2020-03-16 21:09:14 +01:00
Mike Fährmann
d3482ace7f
[furaffinity] extract more metadata
- views
- favorites
- comments
- rating
- fa_category (since 'category' is already in use)
- theme
- species
- gender
- width
- height
2020-03-13 23:56:55 +01:00
Mike Fährmann
fdd2dd5136
[kabeuchi] add 'user' extractor (closes #561) 2020-03-13 16:45:42 +01:00
Mike Fährmann
59edcdc822
[hitomi] restore metadata fields from before f33b13a
... and add a 'metadata' option to disable
visiting the gallery page and extracting data from it
if this is not needed.
2020-03-12 23:43:41 +01:00
Mike Fährmann
2d5703c493
[twitter] use a simpler data structure to store cookies in cache
Use a dict with name-value pairs instead of an entire
RequestsCookieJar object.
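
A small sketch of the conversion, assuming only that the jar behaves like the standard library's `http.cookiejar.CookieJar` (which `RequestsCookieJar` subclasses). The helper names and the cookie names and values below are made up for illustration.

```python
from http.cookiejar import Cookie, CookieJar


def cookies_to_dict(jar):
    """Reduce a cookie jar to plain name-value pairs."""
    return {cookie.name: cookie.value for cookie in jar}


def make_cookie(name, value, domain=".twitter.com"):
    """Build a standard-library Cookie for demonstration purposes."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("ct0", "abc"))          # made-up value
jar.set_cookie(make_cookie("auth_token", "xyz"))   # made-up value
cached = cookies_to_dict(jar)
```

A plain dict like `cached` drops attributes such as domain, path, and expiry, which is the trade-off for a simpler cached value.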
2020-03-12 22:02:12 +01:00
Mike Fährmann
87d4f83597
[newgrounds] make post extraction nonfatal 2020-03-10 01:49:59 +01:00
Mike Fährmann
823fbeaae6
[newgrounds] add 'favorite' extractor (#394) 2020-03-10 01:07:09 +01:00
Mike Fährmann
a45fbc38ea
[pixiv] implement 'avatar' option (#595, #623) 2020-03-09 21:18:16 +01:00
Mike Fährmann
a63a376ad2
[mangoxo] fix login 2020-03-08 23:01:51 +01:00
Mike Fährmann
ebc70e87ce
[e621] update to new interface / API endpoints (closes #635) 2020-03-06 21:12:58 +01:00
Mike Fährmann
d1cf7ccdb3
[instagram] add 'post_shortcode' metadata field (#525) 2020-03-06 15:20:32 +01:00
Mike Fährmann
32df8d06fe
[twitter] add 'bookmark' extractor (closes #625) 2020-03-06 01:20:04 +01:00
Mike Fährmann
3fb41c34c8
[bcy] reduce requests to '/item/detail/<id>' (#613)
The former implementation would try to use the embedded data from
'/item/detail/' pages for every post, even if that wasn't really
necessary.

This commit also fixes some issues with posts only visible to
logged in users.
2020-03-04 01:37:51 +01:00
Mike Fährmann
f33b13aacf
[hitomi] simplify metadata extraction
Use the data from https://ltn.hitomi.la/galleries/<id>.js for both
image URLs and metadata and ignore any gallery or reader pages.

This removes 'artist', 'characters', 'group', and 'parody' metadata
fields since this information is, as of now, only available in
gallery pages.
2020-03-04 01:22:32 +01:00
Mike Fährmann
ce5e2a58fe
[imgbb] update test results
Image server domain changed from
https://image.ibb.co/ to https://i.ibb.co/
2020-03-01 20:38:25 +01:00
Mike Fährmann
f117e32910
[danbooru] restore 'popular' functionality 2020-02-29 23:37:53 +01:00
Mike Fährmann
39b48d665b
[hiperdex] use proper name for 'chapter_minor' 2020-02-29 00:18:54 +01:00
Mike Fährmann
8fbbaa54ff
[bcy] fix partial image URLs (#613)
Images from new posts can have incomplete/partial URLs (1)
without any filename extension when fetching their data from
'/apiv3/user/selfPosts', so now all data gets taken from
'/item/detail/ID' pages.

It is currently unknown how to get the non-watermarked original version
of these images, or if that is possible at all. (2)
Images with a watermark will have their 'filter' metadata field set to
"watermark". For original images this field is an empty string "".

Enabling the 'noop' option will, in addition to the watermarked version,
yield the '~noop.image' filter version (3),
where 'filter' is set to "noop".

(1) "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"
(2) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image"
(3) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image"
2020-02-28 22:57:10 +01:00
Mike Fährmann
86c00f9e66
[danbooru] move extractor logic from booru.py 2020-02-28 22:53:45 +01:00
Mike Fährmann
1d4a369ea2
update extractor test results 2020-02-27 22:15:40 +01:00
Mike Fährmann
7625912b31
[piczel] improve and update
- fix tag names
- fix a bug in _pagination()
- parse datetime in 'created_at' as 'date'
- rewrite main loop
- replace user profile test
2020-02-27 22:13:12 +01:00
Mike Fährmann
913b8333cc
write DeviantArt refresh-tokens to cache (#616)
Writing the token is currently disabled by default and must be
enabled with 'extractor.oauth.cache'.

'extractor.deviantart.refresh-token' must be set to '"cache"'
to use the cached token.
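
As a sketch, the two options named above could be combined like this. It assumes that dotted option names map to nested objects in the JSON configuration file; the Python dict is only a way to show and serialize that layout.

```python
import json

# Assumption: "extractor.oauth.cache" and
# "extractor.deviantart.refresh-token" correspond to nested keys.
config = {
    "extractor": {
        "oauth": {"cache": True},
        "deviantart": {"refresh-token": "cache"},
    }
}

config_json = json.dumps(config, indent=4)
print(config_json)
```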
2020-02-25 22:55:11 +01:00
Mike Fährmann
2a4f227e08
warn about expired cookies 2020-02-25 00:34:42 +01:00
Mike Fährmann
4e361b3008
add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
80ecb99089
[hitomi] fix extraction 2020-02-22 22:07:21 +01:00
Mike Fährmann
247c9e1416
[vsco] update gallery URL pattern 2020-02-22 21:39:31 +01:00
Mike Fährmann
19ae6f3fc4
update test results
- twitter:

    Don't test the whole kwdict, only the actual content, since the
    keyword hash changes whenever that user changes his display name.

- khinsider:

    Download host changed
2020-02-22 03:25:32 +01:00
Mike Fährmann
cc5079c844
[hiperdex] add chapter and manga extractors (closes #606) 2020-02-22 03:09:29 +01:00
Mike Fährmann
64bdec8430
[deviantart] check availability of intermediary URLs (fixes #609) 2020-02-21 03:10:53 +01:00
Mike Fährmann
5607dd3646
[hitomi] follow multiple redirects 2020-02-20 18:22:13 +01:00
Mike Fährmann
765b2a0527
[hentaihand] add extractors (closes #605) 2020-02-19 21:55:47 +01:00
Mike Fährmann
d94215d119
[tumblr] replace '-' with ' ' in tag searches (fixes #611)
To search for tags with actual minus signs in them
(there shouldn't be too many), manually replace those
with URL-encoded minus characters ('-' -> '%2d')
before passing them to gallery-dl:

https://s679874.tumblr.com/tagged/tag-with-minus
 ->
https://s679874.tumblr.com/tagged/tag%2dwith%2dminus
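
The behavior described here can be sketched as below; the function name `normalize_tag` is hypothetical and this is not the extractor's actual code. Literal '-' characters become spaces first, and percent-decoding afterwards turns '%2d' back into a real minus sign.

```python
from urllib.parse import unquote


def normalize_tag(tag):
    """Turn a tag taken from a '/tagged/<tag>' URL into a search term."""
    # Replace before decoding, so an encoded '%2d' survives as '-'.
    return unquote(tag.replace("-", " "))
```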
2020-02-17 23:29:13 +01:00
Mike Fährmann
e6cd49e78b
update extractor test results 2020-02-16 21:48:46 +01:00
Mike Fährmann
5d9437b398
[vsco] skip "invalid" entities 2020-02-15 23:49:44 +01:00