1653 Commits

Author SHA1 Message Date
Mike Fährmann
0b2ff406f6
[plurk] add timeline- and post-extractors (#212) 2019-04-14 21:48:38 +02:00
Mike Fährmann
dcd1bd3b6f
release version 1.8.2 2019-04-12 10:38:51 +02:00
Mike Fährmann
d6ddb74cde
update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
87b0929bec
Revert "[flickr] restore image quality"
This reverts commit 3f513f10564a10ece8650e64d2233d8482fc14c7.

Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change on
Flickr's end.
2019-04-11 20:31:05 +02:00
Mike Fährmann
e7cd5510d5
[pixnet] add extractors (closes #177)
for:
- users/blogs: http://albertayu773.pixnet.net/
- folders: https://albertayu773.pixnet.net/album/folder/1405768
- sets   : https://albertayu773.pixnet.net/album/set/15078995
- photos : https://albertayu773.pixnet.net/album/photo/159443828
2019-04-11 19:27:02 +02:00
Mike Fährmann
155e1faeaf
[imagebam] support galleries with >100 images (fixes #219) 2019-04-11 19:12:27 +02:00
Mike Fährmann
9587aea98f
[deviantart] don't rewrite URLs for newer deviations
The '/intermediary/' trick stopped working for recently posted
deviations, but it still appears to be functional for older ones.
2019-04-11 10:37:01 +02:00
Mike Fährmann
f2220938cb
[mangoxo] improve channel extraction (#184) 2019-04-10 18:56:21 +02:00
Mike Fährmann
d9b94a585d
[mangoxo] add login support (#184)
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
2019-04-10 18:55:25 +02:00
Mike Fährmann
49a6522c38
ensure consistent headers and params ordering
Necessary to avoid being labeled a bot and getting a CAPTCHA response
after solving a Cloudflare challenge.
2019-04-09 10:52:27 +02:00
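A minimal sketch of the ordering idea in 49a6522c38, using only the standard library. The header values and parameter names are hypothetical placeholders, not the ones the project sends: an OrderedDict pins the header order, and a list of pairs pins the query-string order.

```python
from collections import OrderedDict
from urllib.parse import urlencode

# Hypothetical header set; names and values are placeholders for illustration.
HEADERS = OrderedDict([
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US,en;q=0.5"),
])

# A list of (key, value) pairs is encoded in exactly the order given,
# independent of how the keys would sort.
PARAMS = [("b", "2"), ("a", "1"), ("c", "3")]


def build_query(params):
    """Encode query parameters while preserving their listed order."""
    return urlencode(params)


header_names = list(HEADERS)
query = build_query(PARAMS)
```

With 'requests', the same pairs could presumably be passed as session headers and as a params list; that part is an assumption and is not shown here.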
Mike Fährmann
9af9823067
increase required 'requests' version to 2.11.0
- uses an OrderedDict for session.headers (since 2.9.2)
- ships with urllib3 1.16, which is the first version to have an
  'allowed_gai_family()' function
2019-04-09 10:41:14 +02:00
Mike Fährmann
e730fc9045
[twitter] add login support (#214) 2019-04-09 09:27:49 +02:00
林博仁(Buo-ren Lin)
fad5833245
snap description: Change hyperlink markup (#216)
Snap Store no longer supports Markdown's titled hyperlink markup [due to
its ugliness in the `snap info` output in the terminal][1]; this patch
changes the description to reference style instead.

[1]: https://forum.snapcraft.io/t/use-of-markdown-in-snap-metadata-summary-description/2128/23
2019-04-09 09:01:25 +02:00
林博仁(Buo-ren Lin)
640fc72c75
snap: Fix scriptlet leaked into the final snap (#215)
The selective-checkout scriptlet is only used during the build step, so don't let it make it into the final snap.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-04-09 09:00:41 +02:00
Mike Fährmann
2c32dc76cb
[yaplog] update metadata structure (#190)
Put all blog post related fields in its own dict.

'image_id' -> 'id'
'post_id'  -> 'post[id]'
'title'    -> 'post[title]'
etc ...
2019-04-06 16:40:07 +02:00
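A rough sketch of the field renaming described in 2c32dc76cb. The function name and the sample values are hypothetical; only the three field mappings listed in the message are covered.

```python
def restructure(old):
    """Return new-style metadata with blog post fields nested under 'post'."""
    return {
        "id": old["image_id"],          # 'image_id' -> 'id'
        "post": {
            "id": old["post_id"],       # 'post_id'  -> 'post[id]'
            "title": old["title"],      # 'title'    -> 'post[title]'
        },
    }


old = {"image_id": 7, "post_id": 42, "title": "example"}
new = restructure(old)

# Python's str.format addresses nested fields with the same 'post[id]' notation.
name = "{post[id]}_{id}".format_map(new)
```

Here `name` evaluates to "42_7", which shows how a format string written against the old flat fields has to be adjusted.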
Mike Fährmann
35919a9bb8
[livedoor] add blog- and post-extractors (#190) 2019-04-06 16:27:48 +02:00
Mike Fährmann
3f513f1056
[flickr] restore image quality
Flickr started serving images from live.staticflickr.com (see ec88ff1),
but the old farmN.staticflickr.com URLs still work - at least for the
time being.
Filesize (and most likely quality as well) for images from live.…  is
severely reduced compared to images from farmN.… for non-original files,
so all live URLs are replaced to point to a randomly chosen farm server.
2019-04-06 11:26:10 +02:00
Mike Fährmann
060859cc68
fix URL patterns
allow https:// as well as http://
2019-04-05 23:15:19 +02:00
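For illustration, a URL pattern that accepts both schemes, as 060859cc68 describes. The site, path, and helper name are made up for this sketch and are not taken from any extractor.

```python
import re

# Hypothetical pattern for a made-up site; 'https?' matches http:// and https://.
PATTERN = re.compile(r"https?://(?:www\.)?example\.org/gallery/(\d+)")


def gallery_id(url):
    """Return the numeric gallery ID from a matching URL, or None."""
    match = PATTERN.match(url)
    return match.group(1) if match else None
```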
Mike Fährmann
13526f3624
[yaplog] fix archive_id and posts with more than 24 images
- 'post_id' and 'image_id' are only unique per user
- /image/ pages only show a maximum of 24 images, but there can be more
  images than that in a blog post
- let extraction run in its own thread and maybe improve speed
- #190
2019-04-05 23:15:03 +02:00
Mike Fährmann
2ff043edfa
[yaplog] add user- and post-extractors (#190) 2019-04-04 17:56:56 +02:00
Mike Fährmann
790f15a56f
[photobucket] use HTTPS 2019-04-03 18:30:45 +02:00
Mike Fährmann
6da665f32e
[mangoxo] add album- and channel-extractors (closes #184) 2019-04-03 07:55:51 +02:00
Mike Fährmann
21e80d60ff
[wikiart] docstring fixes 2019-04-03 07:28:10 +02:00
Mike Fährmann
c70b21248d
[wikiart] add extractors (#179)
for
- artists:          https://www.wikiart.org/en/thomas-cole
- artist-listings:  https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
2019-04-02 17:34:57 +02:00
Mike Fährmann
9ebd29fcc1
update cloudflare bypass (wip)
This commit adds support for the two new JS expressions embedded in the
overall challenge code.

It does compute the correct 'js_answer' value, but the HTTP request to
/cdn-cgi/l/chk_jschl to get the 'cf_clearance' cookie always results in
a 403 response with a CAPTCHA inside (hence 'wip')

All steps to make this HTTP request indistinguishable from a regular web
browser (which passes the test) show no effect. This includes:
- using the exact same HTTP headers as a web browser
- following query argument order
- different wait times
2019-04-01 15:14:59 +02:00
Mike Fährmann
0f02e85961
[reactor] use "/full/" URLs (closes #210)
Putting a "/full/" in image URLs potentially gives higher resolution
and better quality.
2019-03-30 22:14:57 +01:00
Mike Fährmann
17c11393f5
[weibo] allow user-ids in status URLs 2019-03-30 18:38:58 +01:00
Mike Fährmann
ec88ff1562
[flickr] relax unit test results
Images are now randomly served from the 'live.staticflickr.com' domain
instead of the "old" 'farmN.staticflickr.com' one, making it impossible
to use static 'url' and 'keyword' hashes as results.

Image quality doesn't appear to be affected by which image-server is
used. Files from 'farmN' and 'live' are the same.
2019-03-30 18:31:59 +01:00
Mike Fährmann
bc2020e86c
release version 1.8.1 2019-03-29 17:37:11 +01:00
Mike Fährmann
00d604cafb
[luscious] fix SearchExtractor URL-pattern 2019-03-29 15:58:08 +01:00
Mike Fährmann
0c991a3155
add convenience targets to Makefile 2019-03-29 15:35:00 +01:00
Mike Fährmann
1384ebf907
[luscious] fix metadata extraction
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
2019-03-29 13:06:02 +01:00
林博仁(Buo-ren Lin)
c3a75a0c40
fixup! Snap packaging improvements (#207) (#208)
The build failed due to a missing `requests` build dependency; this patch
drops the unused component from the build to avoid the problem.

The manpages are still built for the upcoming read-manual workaround.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-03-29 13:05:26 +01:00
林博仁(Buo-ren Lin)
81d4d49234
Snap packaging improvements (#207)
* fixup! snap: Support official config paths via *-files confinement interfaces (#197)

* FIXME no longer applied
* Obsoleted HOME environment variable assignment

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>

* snap: Migrate to selective-checkout

The selective-pull stage snap is superseded by selective-checkout; prefer the new one.

Refer-to: Selective-checkout: Check out the tagged release revision if it isn't promoted to the stable channel <https://forum.snapcraft.io/t/the-selective-pull-scriptlet-stage-snap-workaround/10389>
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>

* snap: Support bash completion

Refer-to: Scriptlets <https://docs.snapcraft.io/scriptlets/4892>
Refer-to: Tab completion for snaps <https://docs.snapcraft.io/tab-completion-for-snaps/2261>
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>

* snap: Implement interface connection warning in the launcher

This patch ensures that the user is notified of the missing
connection to the `removable-media` interface.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-03-29 09:55:43 +01:00
林博仁(Buo-ren Lin)
c689bc2971
Ignore generated manpages and bash completion data (#206)
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-03-29 09:55:27 +01:00
Mike Fährmann
5398bfbd69
[exhentai] fix search and favorite extraction
removes basically all metadata, but that can be compensated for with the
right search query. writing "parsers" for all 4 possible views that have
been introduced in the latest changes is too much of a hassle ...
2019-03-28 16:22:02 +01:00
Mike Fährmann
369eb66125
consistently use '*' for rst lists 2019-03-28 16:21:41 +01:00
Mike Fährmann
089923e3dd
parse configuration.rst to build gallery-dl.conf.5 (#150)
… a man-page containing all of gallery-dl's configuration file options.

This implementation relies on Python dicts preserving their insertion
order. Python 3.4 and 3.5 need to use OrderedDict or they produce
randomly ordered man-page sections.

The man-page formatting is a bit rough around the edges, but it works
for the most part. The only real "problem" is inline-links, but it's
better if they are left in there.
2019-03-28 16:20:52 +01:00
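The ordering caveat in 089923e3dd can be sketched as follows. The function and the sample section names are hypothetical; the point is only that OrderedDict keeps sections in first-seen order on every Python 3 version, whereas a plain dict does not on 3.4 and 3.5.

```python
from collections import OrderedDict


def collect_sections(pairs):
    """Group (section, option) pairs by section, keeping first-seen order."""
    sections = OrderedDict()   # ordered on all Python 3 versions
    for section, option in pairs:
        sections.setdefault(section, []).append(option)
    return sections


sections = collect_sections([
    ("extractor", "option-a"),
    ("downloader", "option-b"),
    ("extractor", "option-c"),
])
section_order = list(sections)
```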
Mike Fährmann
5476404a5c
update and fix Cloudflare bypass 2019-03-25 22:53:36 +01:00
Mike Fährmann
0df4edc20a
skip missing data_files in setup.py (#204)
[ci skip]
2019-03-24 18:05:54 +01:00
Leonardo Taccari
790b1336a6
[instagram] Add support for hashtags
Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs.

This also introduces a get_metadata() method in order to append
possible further metadata per-(sub)extractor.

Refactor and generalize _extract_profilepage() to _extract_page()
in order to be reused by _extract_profilepage() and _extract_tagpage()
simply by passing the type of page (`ProfilePage' or `TagPage') and picking up
the respective fields in shared data.
2019-03-24 14:05:34 +01:00
Mike Fährmann
114b8eecc5
[downloader;ytdl] utilize '_ytdl_index' metadata fields 2019-03-24 11:27:20 +01:00
Mike Fährmann
a9bdd0f153
[instagram] fix syntax for Python 3.4
Python 3.4 doesn't like '**common' in dict literals.
This also makes '_ytdl_index' zero-based.
2019-03-24 11:25:42 +01:00
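Dict unpacking inside a dict literal ('{**common, ...}') was added in Python 3.5 and is a syntax error on 3.4. A minimal 3.4-compatible alternative, assuming a hypothetical helper name and sample values, copies the shared dict and updates the copy; enumerate() supplies the zero-based index.

```python
def with_common(common, **extra):
    """Merge shared metadata with per-item fields without '**' in a literal."""
    data = dict(common)    # shallow copy, so 'common' itself is not modified
    data.update(extra)
    return data


common = {"shortcode": "abc"}   # placeholder value
items = [
    with_common(common, _ytdl_index=index)
    for index, _ in enumerate(["video-1", "video-2"])   # index starts at 0
]
```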
Mike Fährmann
e5f44a5160
add Makefile; include manpage&completion in setup.py (#150) 2019-03-24 11:03:02 +01:00
Mike Fährmann
eacebf41e4
fix typo in README 2019-03-24 11:03:02 +01:00
Mike Fährmann
e47a24afc7
script to generate a simple man page (#150) 2019-03-24 11:03:01 +01:00
Leonardo Taccari
1e38f65996
[instagram] Add support for GraphSidecar media types (#201)
* [instagram] Add support for GraphSidecar media types

Refactor _extract_postpage() to always return a list of medias.

Fetch common keywords and gracefully handle GraphSidecar media type
by extracting each single media and adding `sidecar_media_id' and
`sidecar_shortcode' keywords to indicate the parent of sidecar
children.

While here, join the copyright comment lines into a single one.

Closes #178.

* [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)!

* [instagram] Adjust filename for GraphSidecar medias

Add a possible leading `media_id' of the sidecar for GraphSidecar
media.

Thanks to @mikf for the suggestion!

* [instagram] Add extra metadata for youtube-dl in GraphSidecar children

GraphSidecar children ytdl: URLs, when consumed by youtube-dl,
redirect to the URL of their parent.  In GraphSidecar-s with
multiple GraphVideo-s this leads to downloading the same video
multiple times.

Add a `_ytdl_index' field to indicate the index of the youtube-dl
playlist corresponding to the children of the sidecar.

This will be used by the `ytdl' downloader.
2019-03-24 11:02:32 +01:00
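The `yield from' change mentioned in 1e38f65996 amounts to the following equivalence. Both generator names and the sample data are hypothetical, not the extractor's actual code.

```python
def media_loop(children):
    """Explicit form: re-yield each item of the inner iterable."""
    for child in children:
        yield child


def media_delegating(children):
    """Same result, delegating to the inner iterable with 'yield from'."""
    yield from children


sample = ["image-1", "video-1", "image-2"]
loop_result = list(media_loop(sample))
delegating_result = list(media_delegating(sample))
```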
Mike Fährmann
e7d0d98c88
improve FFmpeg arguments for --ugoira-conv 2019-03-23 09:50:39 +01:00
Mike Fährmann
b0f88c2ab5
script to generate a simple bash completion file (#150) 2019-03-23 09:50:39 +01:00
Mike Fährmann
6ba67b0537
[hypnohub] add extractors (closes #196) 2019-03-23 09:50:39 +01:00