1936 Commits

Author SHA1 Message Date
Mike Fährmann
35f343206c
update default SSL cipher list in urllib3 < 1.25
Cloudflare now also checks the client's SSL/TLS cipher capabilities and
produces a 403: Forbidden response with CAPTCHA if they are insufficient.

This commit replaces the default cipher list in urllib3 < 1.25 with the
one from 1.25 (1), which doesn't cause problems as long as the client
platform actually supports these ciphers. On some platforms (tested with
Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is
necessary to install pyOpenSSL to get everything to work.

Explicitly setting a minimum/maximum version for urllib3 is also no
longer necessary, and installing gallery-dl will therefore not pull in an
incompatible urllib3 version (#229).

Fixes the "403: Forbidden" error on Artstation (#227)

(1) 0cedb3b0f1
2019-05-03 22:40:04 +02:00
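A minimal sketch of the idea behind 35f343206c, using only the standard library `ssl` module rather than urllib3 itself. The cipher string below is a hypothetical placeholder and not the list taken from urllib3 1.25; `make_context` is an illustrative helper name.

```python
import ssl

# Hypothetical cipher string for illustration only; the commit uses the
# default list from urllib3 1.25, which is not reproduced here.
CIPHERS = "ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!MD5"


def make_context(ciphers=CIPHERS):
    """Return a client SSL context restricted to the given cipher string."""
    context = ssl.create_default_context()
    # set_ciphers() raises ssl.SSLError if the platform's OpenSSL
    # supports none of the requested ciphers.
    context.set_ciphers(ciphers)
    return context


context = make_context()
# Names of the ciphers the local platform actually enabled.
names = [cipher["name"] for cipher in context.get_ciphers()]
```

Inspecting `names` shows which ciphers the client platform really offers, which is the part that differs between systems.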
林博仁(Buo-ren Lin)
77eae04bcf snap: Use descriptive interface reference for *-files plugs
The new Snap Store policy requires *-files interface plugs to follow a specific naming scheme.

Fixes #241.

Refer-to: The personal-files interface - doc - snapcraft.io <https://forum.snapcraft.io/t/the-personal-files-interface/9357>
Refer-to: The system-files interface - doc - snapcraft.io <https://forum.snapcraft.io/t/the-system-files-interface/9358>
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-05-02 10:29:57 +02:00
Mike Fährmann
fc5e4f2b21
[hitomi] simplify data extraction code 2019-05-01 11:14:21 +02:00
Mike Fährmann
2756cc8dde
[hitomi] set Referer header (fixes #239) 2019-05-01 10:56:00 +02:00
Mike Fährmann
5582b06ae4
fix tests with 'urllist' messages 2019-04-30 16:31:48 +02:00
Mike Fährmann
dcc1592dbf
[twitter] add fallback URLs (#237) 2019-04-30 15:57:21 +02:00
Mike Fährmann
1c665fd4bd
[mangoxo] fix login 2019-04-30 15:57:06 +02:00
林博仁(Buo-ren Lin)
8acbe863cb Support snap building and simple testrun in Travis CI
This ensures that snap build-time and run-time problems, like issue #229,
will be noticed more promptly by the maintainers.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-04-30 15:55:50 +02:00
林博仁(Buo-ren Lin)
2df3aaf966 Migrate to core base
Snapcraft now supports a special `core` base for modern snaps to target,
which has the same effect as using the Ubuntu 16.04 based `core` snap
in the past (it is special because the base snap intended to support
Ubuntu 16.04 is actually `core16`, which isn't ready yet).

This patch migrates the gallery-dl snap to `core` base, which allows the
user to avoid installing the additional `core18` base snap in order to
run it.  It also drops the redundant stage-packages entries that were
collected by Snapcraft automatically in the first place.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-04-30 15:55:14 +02:00
Mike Fährmann
add7e693d0
[tumblr] provide parsed 'date' metadata (#232) 2019-04-29 17:30:42 +02:00
Mike Fährmann
9544683d56
[deviantart] provide 'date' metadata (#232) 2019-04-29 17:30:24 +02:00
Mike Fährmann
5018781898
allow type tests by name 2019-04-29 17:27:59 +02:00
Mike Fährmann
df7cdb648a
specify maximum versions for requests & urllib3 (#229)
Wouldn't be necessary if pip did proper version management.
As things are right now, pip ignores the urllib3 version requirements
from requests because gallery-dl is specifying its own.
2019-04-29 17:24:30 +02:00
Mike Fährmann
76df628b13
rewrite invalid cloudflare redirect locations
After solving a challenge on komikcast.com, Cloudflare would redirect to
https:/komikcast.com (with only one '/') when testing on TravisCI.
2019-04-27 16:22:42 +02:00
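The rewrite described in 76df628b13 could look roughly like this. It is a sketch under the assumption that only the missing second slash after the scheme needs repairing; `fix_location` is a hypothetical helper name, not a function from the project.

```python
import re


def fix_location(url):
    """Insert the missing '/' when a scheme is followed by a single slash."""
    # Matches 'http:/' or 'https:/' only when no second '/' follows.
    return re.sub(r"^(https?:)/(?!/)", r"\1//", url)


fixed = fix_location("https:/komikcast.com")
untouched = fix_location("https://komikcast.com")
```

Well-formed locations pass through unchanged, so the helper can be applied to every redirect target.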
Mike Fährmann
0d7e8be987
[dynastyscans] simplify image extractor 2019-04-27 13:24:30 +02:00
Mike Fährmann
9aa0bb5afe
[dynastyscans] encode "[]" in search queries
urllib3 1.25 classifies URLs with unencoded "[" or "]" as invalid
and raises an exception
2019-04-27 13:22:40 +02:00
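A small sketch of the encoding step from 9aa0bb5afe. The query string is a made-up example, and `encode_brackets` is a hypothetical helper; only the need to percent-encode "[" and "]" comes from the commit message.

```python
from urllib.parse import quote


def encode_brackets(query):
    """Percent-encode '[' and ']' and leave the rest of the query as-is."""
    return query.replace("[", "%5B").replace("]", "%5D")


# Hypothetical search query for illustration.
encoded = encode_brackets("q=test&with[]=3")

# quote() from the standard library produces the same escapes.
quoted = quote("[]")
```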
Mike Fährmann
fe849382d8
[komikcast] improve extraction 2019-04-26 15:14:10 +02:00
Mike Fährmann
c35217e9a3
specify version requirements for urllib3
urllib3 versions 1.24.1 and 1.24.2 cause HTTP requests to
https://www.artstation.com/users/<username>/quick.json
to fail with a 403: Forbidden status code (#227),
and provoke a CAPTCHA response after solving a Cloudflare challenge.
2019-04-26 12:58:20 +02:00
Mike Fährmann
bc26fc2439
implement '--clear-cache'
Effectively clears all cached values from the cache database by
executing "DELETE FROM data" without any further user input.
2019-04-25 21:31:01 +02:00
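What bc26fc2439 describes can be sketched with the standard `sqlite3` module. Only the table name `data` and the statement "DELETE FROM data" are taken from the commit message; the column layout, the in-memory database, and the name `clear_cache` are assumptions for illustration.

```python
import sqlite3


def clear_cache(connection):
    """Delete every row from the 'data' table without asking for confirmation."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DELETE FROM data")


# Hypothetical in-memory database with an assumed column layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (key TEXT PRIMARY KEY, value TEXT, expires INTEGER)")
db.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", "1", 0), ("b", "2", 0)],
)
db.commit()

before = db.execute("SELECT COUNT(*) FROM data").fetchone()[0]
clear_cache(db)
remaining = db.execute("SELECT COUNT(*) FROM data").fetchone()[0]
```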
Mike Fährmann
0318c610dc
[sexcom] add extractor for search results (#147) 2019-04-24 22:10:01 +02:00
Mike Fährmann
a247c94c34
[sexcom] add pin and board extractors (#147) 2019-04-24 22:09:19 +02:00
Mike Fährmann
6264a46212
use 'utcfromtimestamp()'
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
2019-04-21 16:22:53 +02:00
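To illustrate the point of 6264a46212: a conversion that is pinned to UTC gives the same result on every machine, whereas 'fromtimestamp()' without a timezone depends on local settings. The sketch below uses a timezone-aware call that yields the same naive value as 'utcfromtimestamp()' (which newer Python versions mark as deprecated); `utc_datetime` is a hypothetical name.

```python
from datetime import datetime, timezone


def utc_datetime(timestamp):
    """Convert a Unix timestamp to a naive datetime expressed in UTC."""
    # Same result as datetime.utcfromtimestamp(timestamp), independent of
    # the local timezone of the machine running the code.
    aware = datetime.fromtimestamp(int(timestamp), timezone.utc)
    return aware.replace(tzinfo=None)


epoch = utc_datetime(0)
next_day = utc_datetime(86400)
```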
Mike Fährmann
d84e7c6861
[twitter] extract 'date' metadata (#224) 2019-04-21 15:41:22 +02:00
Mike Fährmann
d670de0344
implement 'text.parse_timestamp()' 2019-04-21 15:28:27 +02:00
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places 2019-04-21 15:19:20 +02:00
Mike Fährmann
21a7e395a7
implement convenience wrapper for text.extract functionality 2019-04-19 22:30:11 +02:00
Mike Fährmann
8f249f1d54
improve text.extract_iter() performance
by roughly 40% through
- inlining code
- pre-calculating reused values
- entering a try-except block only once
2019-04-18 23:37:17 +02:00
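The three optimizations listed in 8f249f1d54 can be illustrated with a generator along these lines. This is a sketch of the technique and not necessarily the project's actual code; the signature is an assumption.

```python
def extract_iter(txt, begin, end, pos=0):
    """Yield every substring of 'txt' enclosed by 'begin' and 'end'."""
    index = txt.index    # bound method looked up once
    lbegin = len(begin)  # pre-calculated, reused on every iteration
    lend = len(end)
    try:                 # the try block is entered only once
        while True:
            first = index(begin, pos) + lbegin
            last = index(end, first)
            pos = last + lend
            yield txt[first:last]
    except ValueError:
        # str.index() raises ValueError when no further match exists
        return


items = list(extract_iter("<a><b><c>", "<", ">"))
nothing = list(extract_iter("no markers here", "<", ">"))
```

The search logic is inlined in the loop instead of calling a helper per match, and the single try-except wraps the whole loop rather than each iteration.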
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
166a721c19
include PyInstaller executable in release script 2019-04-17 12:44:45 +02:00
Mike Fährmann
18345381f3
add PyInstaller script and hook (#166) 2019-04-17 12:43:29 +02:00
Mike Fährmann
96c7ccd380
update/cleanup Python dev scripts
- put common code in its own util.py file
- same Python3 shebang for all scripts
- add file docstrings
- fix format string replacement fields in man page template
2019-04-16 21:21:34 +02:00
Mike Fährmann
7973419b54
restrict downloader and postprocessor module imports 2019-04-16 18:09:30 +02:00
Mike Fährmann
70be494161
[plurk] add a 'comments' option (#212) 2019-04-14 22:12:46 +02:00
Mike Fährmann
0b2ff406f6
[plurk] add timeline- and post-extractors (#212) 2019-04-14 21:48:38 +02:00
Mike Fährmann
dcd1bd3b6f
release version 1.8.2 2019-04-12 10:38:51 +02:00
Mike Fährmann
d6ddb74cde
update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
87b0929bec
Revert "[flickr] restore image quality"
This reverts commit 3f513f10564a10ece8650e64d2233d8482fc14c7.

Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change on
Flickr's end.
2019-04-11 20:31:05 +02:00
Mike Fährmann
e7cd5510d5
[pixnet] add extractors (closes #177)
for:
- users/blogs: http://albertayu773.pixnet.net/
- folders: https://albertayu773.pixnet.net/album/folder/1405768
- sets   : https://albertayu773.pixnet.net/album/set/15078995
- photos : https://albertayu773.pixnet.net/album/photo/159443828
2019-04-11 19:27:02 +02:00
Mike Fährmann
155e1faeaf
[imagebam] support galleries with >100 images (fixes #219) 2019-04-11 19:12:27 +02:00
Mike Fährmann
9587aea98f
[deviantart] don't rewrite URLs for newer deviations
The '/intermediary/' trick stopped working for recently posted
deviations, but it still appears to be functional for older ones.
2019-04-11 10:37:01 +02:00
Mike Fährmann
f2220938cb
[mangoxo] improve channel extraction (#184) 2019-04-10 18:56:21 +02:00
Mike Fährmann
d9b94a585d
[mangoxo] add login support (#184)
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
2019-04-10 18:55:25 +02:00
Mike Fährmann
49a6522c38
ensure consistent headers and params ordering
Necessary to avoid being labeled a bot and getting a CAPTCHA response
after solving a Cloudflare challenge.
2019-04-09 10:52:27 +02:00
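One way to get the deterministic ordering that 49a6522c38 asks for, sketched with the standard library only. The preferred header order, `order_headers`, and `order_params` are all hypothetical; the commit message does not say which order is used.

```python
from collections import OrderedDict
from urllib.parse import urlencode

# Hypothetical preferred order; not taken from the commit.
HEADER_ORDER = ("User-Agent", "Accept", "Accept-Language", "Referer")


def order_headers(headers):
    """Return headers in a fixed order: known names first, the rest sorted."""
    ordered = OrderedDict()
    for name in HEADER_ORDER:
        if name in headers:
            ordered[name] = headers[name]
    for name in sorted(headers):
        if name not in ordered:
            ordered[name] = headers[name]
    return ordered


def order_params(params):
    """Encode query parameters sorted by name so every request looks alike."""
    return urlencode(sorted(params.items()))


headers = order_headers({"Referer": "r", "X-Test": "1", "User-Agent": "ua"})
query = order_params({"b": "2", "a": "1"})
```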
Mike Fährmann
9af9823067
increase required 'requests' version to 2.11.0
- uses an OrderedDict for session.headers (since 2.9.2)
- ships with urllib3 1.16, which is the first version to have an
  'allowed_gai_family()' function
2019-04-09 10:41:14 +02:00
Mike Fährmann
e730fc9045
[twitter] add login support (#214) 2019-04-09 09:27:49 +02:00
林博仁(Buo-ren Lin)
fad5833245 snap description: Change hyperlink markup (#216)
The Snap Store no longer supports Markdown's titled hyperlink markup [due to
its ugliness in the `snap info` output in the terminal][1]. This patch
changes the description to reference style instead.

[1]:
https://forum.snapcraft.io/t/use-of-markdown-in-snap-metadata-summary-description/2128/23
2019-04-09 09:01:25 +02:00
林博仁(Buo-ren Lin)
640fc72c75 snap: Fix scriptlet leaked into the final snap (#215)
The selective-checkout scriptlet is only used during the build step; don't let it make it into the final snap.

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-04-09 09:00:41 +02:00
Mike Fährmann
2c32dc76cb
[yaplog] update metadata structure (#190)
Put all blog post related fields in their own dict.

'image_id' -> 'id'
'post_id'  -> 'post[id]'
'title'    -> 'post[title]'
etc ...
2019-04-06 16:40:07 +02:00
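The renaming in 2c32dc76cb amounts to a mapping like the one below. Only the three fields named in the commit message are covered; `restructure` and the sample values are hypothetical.

```python
def restructure(old):
    """Map the flat metadata fields to the nested 'post' layout."""
    return {
        "id": old["image_id"],       # 'image_id' -> 'id'
        "post": {
            "id": old["post_id"],    # 'post_id'  -> 'post[id]'
            "title": old["title"],   # 'title'    -> 'post[title]'
        },
    }


# Hypothetical sample values for illustration.
new = restructure({"image_id": 7, "post_id": 42, "title": "example"})
```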
Mike Fährmann
35919a9bb8
[livedoor] add blog- and post-extractors (#190) 2019-04-06 16:27:48 +02:00
Mike Fährmann
3f513f1056
[flickr] restore image quality
Flickr started serving images from live.staticflickr.com (see ec88ff1),
but the old farmN.staticflickr.com URLs still work - at least for the
time being.
Filesize (and most likely quality as well) for images from live.…  is
severely reduced compared to images from farmN.… for non-original files,
so all live URLs are replaced to point to a randomly chosen farm server.
2019-04-06 11:26:10 +02:00
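The URL replacement described in 3f513f1056 (later reverted by 87b0929bec) might be sketched as follows. The set of farm numbers, the sample path, and the name `to_farm_url` are assumptions; the commit message only names the two host patterns.

```python
import random
import re

# Hypothetical farm numbers; the commit does not list the valid range.
FARMS = (1, 2, 3)


def to_farm_url(url, rng=random):
    """Point a live.staticflickr.com URL at a randomly chosen farm server."""
    farm = rng.choice(FARMS)
    replacement = r"\g<1>farm{}.staticflickr.com/".format(farm)
    return re.sub(r"^(https?://)live\.staticflickr\.com/", replacement, url)


# Hypothetical image path for illustration.
rewritten = to_farm_url("https://live.staticflickr.com/path/img.jpg")
other = to_farm_url("https://example.org/path/img.jpg")
```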