Mike Fährmann
add7e693d0
[tumblr] provide parsed 'date' metadata ( #232 )
2019-04-29 17:30:42 +02:00
Mike Fährmann
9544683d56
[deviantart] provide 'date' metadata ( #232 )
2019-04-29 17:30:24 +02:00
Mike Fährmann
5018781898
allow type tests by name
2019-04-29 17:27:59 +02:00
Mike Fährmann
df7cdb648a
specify maximum versions for requests & urllib3 ( #229 )
...
This wouldn't be necessary if pip did proper version management.
As things are right now, pip ignores the urllib3 version requirements
from requests because gallery-dl is specifying its own.
2019-04-29 17:24:30 +02:00
Mike Fährmann
76df628b13
rewrite invalid cloudflare redirect locations
...
After solving a challenge on komikcast.com, cloudflare would redirect to
https:/komikcast.com (with only one '/') when testing on TravisCI.
2019-04-27 16:22:42 +02:00
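A minimal sketch of the kind of rewrite described in 76df628b13, assuming the only defect is a single '/' after the URL scheme. The helper name fix_redirect_location is hypothetical and is not taken from the gallery-dl source.

```python
import re


def fix_redirect_location(location):
    """Insert the missing '/' when a URL scheme is followed by one slash."""
    # "https:/host" -> "https://host"; the negative lookahead leaves
    # already valid "https://host" URLs and relative paths untouched.
    return re.sub(r"^(https?:)/(?!/)", r"\1//", location)
```

Applied to the Location header value before following the redirect, this turns https:/komikcast.com into https://komikcast.com.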
Mike Fährmann
0d7e8be987
[dynastyscans] simplify image extractor
2019-04-27 13:24:30 +02:00
Mike Fährmann
9aa0bb5afe
[dynastyscans] encode "[]" in search queries
...
urllib3 1.25 classifies URLs with unencoded "[" or "]" as invalid
and raises an exception
2019-04-27 13:22:40 +02:00
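The encoding step from 9aa0bb5afe could look roughly like this. It is a sketch only: the function name and the example.org URL are hypothetical, and the actual extractor may build its query differently.

```python
def encode_brackets(url):
    """Percent-encode literal '[' and ']' so strict URL parsers accept them."""
    return url.replace("[", "%5B").replace("]", "%5D")


# Hypothetical search URL with an array-style query parameter
print(encode_brackets("https://example.org/search?with[]=123"))
```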
Mike Fährmann
fe849382d8
[komikcast] improve extraction
2019-04-26 15:14:10 +02:00
Mike Fährmann
c35217e9a3
specify version requirements for urllib3
...
urllib3 versions 1.24.1 and 1.24.2 cause HTTP requests to
https://www.artstation.com/users/<username>/quick.json
to fail with a 403: Forbidden status code (#227),
and provoke a CAPTCHA response after solving a Cloudflare challenge.
2019-04-26 12:58:20 +02:00
Mike Fährmann
bc26fc2439
implement '--clear-cache'
...
Effectively clears all cached values from the cache database by
executing "DELETE FROM data" without any further user input.
2019-04-25 21:31:01 +02:00
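As an illustration of what bc26fc2439 describes, the sketch below runs the quoted statement against an SQLite connection. The table name 'data' comes from the commit message; the function name and the way the connection is obtained are assumptions.

```python
import sqlite3


def clear_cache(conn):
    """Delete every cached row from the 'data' table without prompting."""
    # Using the connection as a context manager commits on success
    # and rolls back if the statement raises.
    with conn:
        conn.execute("DELETE FROM data")
```

The table itself is kept, so later cache writes keep working without recreating the schema.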
Mike Fährmann
0318c610dc
[sexcom] add extractor for search results ( #147 )
2019-04-24 22:10:01 +02:00
Mike Fährmann
a247c94c34
[sexcom] add pin and board extractors ( #147 )
2019-04-24 22:09:19 +02:00
Mike Fährmann
6264a46212
use 'utcfromtimestamp()'
...
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
2019-04-21 16:22:53 +02:00
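The difference behind 6264a46212 can be sketched as follows. The helper name is hypothetical; it goes through an explicit UTC timezone, which yields the same naive datetime as datetime.utcfromtimestamp() while avoiding the deprecation warning that newer Python versions emit for that function.

```python
from datetime import datetime, timezone


def timestamp_to_utc(ts):
    """Convert a Unix timestamp to a naive datetime expressed in UTC."""
    # datetime.fromtimestamp(ts) without a tz argument would use the
    # machine's local timezone, so results would differ between hosts.
    return datetime.fromtimestamp(ts, timezone.utc).replace(tzinfo=None)
```

Because the result no longer depends on the local timezone, test expectations stay the same on every machine.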
Mike Fährmann
d84e7c6861
[twitter] extract 'date' metadata ( #224 )
2019-04-21 15:41:22 +02:00
Mike Fährmann
d670de0344
implement 'text.parse_timestamp()'
2019-04-21 15:28:27 +02:00
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
2019-04-21 15:19:20 +02:00
Mike Fährmann
21a7e395a7
implement convenience wrapper for text.extract functionality
2019-04-19 22:30:11 +02:00
Mike Fährmann
8f249f1d54
improve text.extract_iter() performance
...
by roughly 40% through
- inlining code
- pre-calculating reused values
- entering a try-except block only once
2019-04-18 23:37:17 +02:00
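The three points listed in 8f249f1d54 can be illustrated with a generator along these lines. This is a sketch of the technique, not necessarily the exact code in gallery-dl's text module; the signature is an assumption.

```python
def extract_iter(txt, begin, end, pos=0):
    """Yield every substring of 'txt' found between 'begin' and 'end'."""
    # Pre-calculate values that are reused on every iteration
    index = txt.index
    lbeg = len(begin)
    lend = len(end)
    # Enter the try block once; str.index() raises ValueError when
    # no further match exists, which ends the generator.
    try:
        while True:
            first = index(begin, pos) + lbeg
            last = index(end, first)
            pos = last + lend
            yield txt[first:last]
    except ValueError:
        return
```

For example, list(extract_iter("<a>1</a><a>2</a>", "<a>", "</a>")) collects the text between each pair of markers.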
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
...
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
166a721c19
include PyInstaller executable in release script
2019-04-17 12:44:45 +02:00
Mike Fährmann
18345381f3
add PyInstaller script and hook ( #166 )
2019-04-17 12:43:29 +02:00
Mike Fährmann
96c7ccd380
update/cleanup Python dev scripts
...
- put common code in its own util.py file
- same Python3 shebang for all scripts
- add file docstrings
- fix format string replacement fields in man page template
2019-04-16 21:21:34 +02:00
Mike Fährmann
7973419b54
restrict downloader and postprocessor module imports
2019-04-16 18:09:30 +02:00
Mike Fährmann
70be494161
[plurk] add a 'comments' option ( #212 )
2019-04-14 22:12:46 +02:00
Mike Fährmann
0b2ff406f6
[plurk] add timeline- and post-extractors ( #212 )
2019-04-14 21:48:38 +02:00
Mike Fährmann
dcd1bd3b6f
release version 1.8.2
2019-04-12 10:38:51 +02:00
Mike Fährmann
d6ddb74cde
update test results
...
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
87b0929bec
Revert "[flickr] restore image quality"
...
This reverts commit 3f513f10564a10ece8650e64d2233d8482fc14c7.
Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change on
Flickr's end.
2019-04-11 20:31:05 +02:00
Mike Fährmann
e7cd5510d5
[pixnet] add extractors ( closes #177 )
...
for:
- users/blogs: http://albertayu773.pixnet.net/
- folders: https://albertayu773.pixnet.net/album/folder/1405768
- sets : https://albertayu773.pixnet.net/album/set/15078995
- photos : https://albertayu773.pixnet.net/album/photo/159443828
2019-04-11 19:27:02 +02:00
Mike Fährmann
155e1faeaf
[imagebam] support galleries with >100 images ( fixes #219 )
2019-04-11 19:12:27 +02:00
Mike Fährmann
9587aea98f
[deviantart] don't rewrite URLs for newer deviations
...
The '/intermediary/' trick stopped working for recently posted
deviations, but it still appears to be functional for older ones.
2019-04-11 10:37:01 +02:00
Mike Fährmann
f2220938cb
[mangoxo] improve channel extraction ( #184 )
2019-04-10 18:56:21 +02:00
Mike Fährmann
d9b94a585d
[mangoxo] add login support ( #184 )
...
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
2019-04-10 18:55:25 +02:00
Mike Fährmann
49a6522c38
ensure consistent headers and params ordering
...
Necessary to avoid being labeled a bot and getting a CAPTCHA response
after solving a Cloudflare challenge.
2019-04-09 10:52:27 +02:00
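One way to keep header and query parameter order stable, as 49a6522c38 calls for, is sketched below. The header values and function names are hypothetical placeholders; only the ordering technique is the point.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def build_headers(user_agent):
    """Return request headers in a fixed, browser-like order."""
    # An OrderedDict makes the intended order explicit on every
    # supported Python version.
    return OrderedDict([
        ("User-Agent", user_agent),
        ("Accept", "*/*"),
        ("Accept-Language", "en-US,en;q=0.5"),
    ])


def build_query(params):
    """Encode a sequence of (name, value) pairs, preserving their order."""
    return urlencode(params)
```

Passing parameters as a list of pairs rather than a set or unordered mapping guarantees that the query string is generated in the order given.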
Mike Fährmann
9af9823067
increase required 'requests' version to 2.11.0
...
- uses an OrderedDict for session.headers (since 2.9.2)
- ships with urllib3 1.16, which is the first version to have an
'allowed_gai_family()' function
2019-04-09 10:41:14 +02:00
Mike Fährmann
e730fc9045
[twitter] add login support ( #214 )
2019-04-09 09:27:49 +02:00
林博仁(Buo-ren Lin)
fad5833245
snap description: Change hyperlink markup ( #216 )
...
Snap Store no longer supports Markdown's titled hyperlink markup [due to
its ugliness in the `snap info` output in the terminal][1]; this patch
changes the description to reference style instead.
[1]: https://forum.snapcraft.io/t/use-of-markdown-in-snap-metadata-summary-description/2128/23
2019-04-09 09:01:25 +02:00
林博仁(Buo-ren Lin)
640fc72c75
snap: Fix scriptlet leaked into the final snap ( #215 )
...
The selective-checkout scriptlet is only used during the build step; don't let it make it into the final snap.
Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2019-04-09 09:00:41 +02:00
Mike Fährmann
2c32dc76cb
[yaplog] update metadata structure ( #190 )
...
Put all blog post related fields in their own dict.
'image_id' -> 'id'
'post_id' -> 'post[id]'
'title' -> 'post[title]'
etc ...
2019-04-06 16:40:07 +02:00
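The renaming in 2c32dc76cb amounts to nesting post fields under a 'post' key. The sketch below covers only the three mappings spelled out in the commit message; the function name and the sample values are hypothetical.

```python
def restructure(old):
    """Map the flat yaplog metadata layout to the nested one."""
    return {
        "id": old["image_id"],          # 'image_id' -> 'id'
        "post": {
            "id": old["post_id"],       # 'post_id'  -> 'post[id]'
            "title": old["title"],      # 'title'    -> 'post[title]'
        },
    }


# Hypothetical sample values
print(restructure({"image_id": 7, "post_id": 42, "title": "example"}))
```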
Mike Fährmann
35919a9bb8
[livedoor] add blog- and post-extractors ( #190 )
2019-04-06 16:27:48 +02:00
Mike Fährmann
3f513f1056
[flickr] restore image quality
...
Flickr started serving images from live.staticflickr.com (see ec88ff1),
but the old farmN.staticflickr.com URLs still work - at least for the
time being.
Filesize (and most likely quality as well) for images from live.… is
severely reduced compared to images from farmN.… for non-original files,
so all live URLs are replaced to point to a randomly chosen farm server.
2019-04-06 11:26:10 +02:00
Mike Fährmann
060859cc68
fix URL patterns
...
allow https:// as well as http://
2019-04-05 23:15:19 +02:00
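A pattern fix like the one in 060859cc68 typically comes down to making the 's' in the scheme optional. The pattern below uses a hypothetical example.org gallery URL and is not one of the patterns changed by the commit.

```python
import re

# 'https?' matches both "http" and "https"
PATTERN = re.compile(r"https?://(?:www\.)?example\.org/gallery/(\d+)")


def gallery_id(url):
    """Return the numeric gallery ID, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None
```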
Mike Fährmann
13526f3624
[yaplog] fix archive_id and posts with more than 24 images
...
- 'post_id' and 'image_id' are only unique per user
- /image/ pages only show a maximum of 24 images, but there can be more
images than that in a blog post
- let extraction run in its own thread and maybe improve speed
- #190
2019-04-05 23:15:03 +02:00
Mike Fährmann
2ff043edfa
[yaplog] add user- and post-extractors ( #190 )
2019-04-04 17:56:56 +02:00
Mike Fährmann
790f15a56f
[photobucket] use HTTPS
2019-04-03 18:30:45 +02:00
Mike Fährmann
6da665f32e
[mangoxo] add album- and channel-extractors ( closes #184 )
2019-04-03 07:55:51 +02:00
Mike Fährmann
21e80d60ff
[wikiart] docstring fixes
2019-04-03 07:28:10 +02:00
Mike Fährmann
c70b21248d
[wikiart] add extractors ( #179 )
...
for
- artists: https://www.wikiart.org/en/thomas-cole
- artist-listings: https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
2019-04-02 17:34:57 +02:00
Mike Fährmann
9ebd29fcc1
update cloudflare bypass (wip)
...
This commit adds support for the two new JS expressions embedded in the
overall challenge code.
It does compute the correct 'js_answer' value, but the HTTP request to
/cdn-cgi/l/chk_jschl to get the 'cf_clearance' cookie always results in
a 403 response with a CAPTCHA inside (hence 'wip').
All steps to make this HTTP request indistinguishable from a regular web
browser (which passes the test) show no effect. This includes:
- using the exact same HTTP headers as a web browser
- following the same query argument order
- trying different wait times
2019-04-01 15:14:59 +02:00
Mike Fährmann
0f02e85961
[reactor] use "/full/" URLs ( closes #210 )
...
Putting a "/full/" in image URLs potentially gives higher resolution
and better quality.
2019-03-30 22:14:57 +01:00