Mike Fährmann
049a9575c4
[tumblr] fix inline extraction #2
...
Using only the "comment" field isn't enough ...
[ci skip]
2018-12-11 21:57:20 +01:00
Mike Fährmann
f6bf66f72c
[pixiv] create directory for each "work" item (#136)
2018-12-11 20:37:47 +01:00
Mike Fährmann
79f6755c60
[postprocessor:classify] handle missing "extension" (#138)
2018-12-11 20:10:02 +01:00
Mike Fährmann
b7a9f6cc49
[tumblr] improve inline extraction (#137)
2018-12-11 20:02:48 +01:00
Mike Fährmann
010da8372a
[instagram] relax test pattern
2018-12-11 19:59:28 +01:00
Mike Fährmann
1c6b9ba322
[readcomiconline] use HTTPS
2018-12-09 14:54:55 +01:00
Leonardo Taccari
2655a2ea02
Add support for instagram.com user profiles and pages (#134)
...
* [instagram] Add extractor for instagram.com user profiles and pages
The extractor scrapes `instagram.com/<user>' timelines and
`instagram.com/p/<shortcode>' by mimicking the behaviour of a web
browser and extracting the sharedData JSON of the single pages.
Please note that this means that for user timelines we also make an
extra request to the `instagram.com/p/<shortcode>' page, but this
lets us get consistent (and complete) information about the media
fetched.
The MD5 logic used for X-Instagram-GIS was documented in
<https://stackoverflow.com/questions/49786980/>
* [instagram] Test for keywords, not url for GraphImage and GraphSidecar
URLs returned by Instagram do not seem stable, so avoid testing for
them and instead test for the keywords returned.
* [instagram] Improve test of InstagramProfilepageExtractor
Also check the count of media returned.
* [instagram] Several cleanups and improvements
- Change description, subcategories to generate a better description in
docs/supportedsites.rst
- Remove unneeded InstagramExtractor.__init__()
- Use text.parse_int() instead of directly using int() (the former is more
robust)
- Use self.request().json() instead of calling json.loads() on
self.request().text
- Add `pattern:' to check the URLs where we do not have stable URLs.
It seems that only the subdomain is not stable.
Thanks to @mikf!
2018-12-09 12:52:14 +01:00
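The Instagram commit above prefers text.parse_int() over a bare int() because it is more robust. A minimal sketch of what such a tolerant helper might look like follows; it is an illustration only, not gallery-dl's actual implementation, and the `default` parameter is an assumption.

```python
def parse_int(value, default=0):
    """Convert 'value' to int; return 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # int("") and int("abc") raise ValueError, int(None) raises TypeError
        return default
```

With a helper like this, a missing or malformed count scraped from a page yields the default instead of aborting the extraction.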
HRXN
e80ee77d71
tumblr.py: update regex for video (#133)
...
There seems to be another sub-domain for videos, apparently.
Not just
`vt(.media).tumblr`
`vtt(media).tumblr`
But also
`ve(.media).tumblr`
2018-12-09 09:07:46 +01:00
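The sub-domain variants listed in the tumblr.py commit above can be covered by a single alternation. The following is only a sketch: the exact regex used in tumblr.py is not shown here, and the pattern, the `.tumblr.com` host suffix and the name `is_video_url` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: hosts 'vt', 'vtt' or 've', optionally followed by '.media'
VIDEO_HOST = re.compile(r"https?://(?:vtt?|ve)(?:\.media)?\.tumblr\.com/")


def is_video_url(url):
    """Return True if 'url' points at one of the video sub-domains."""
    return VIDEO_HOST.match(url) is not None
```

Keeping the host prefixes in one group means a further sub-domain only needs one more alternative rather than a new pattern.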
Mike Fährmann
9a98b6769d
use extractor.request for API calls (#130)
...
... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)
2018-12-04 21:29:06 +01:00
Mike Fährmann
0225d90078
add exception name and traceback for OSErrors
2018-12-04 19:24:50 +01:00
Mike Fährmann
ad2cefda6b
[tumblr] in case of exception use filename as 'hash' (#129)
...
While a filename might not be a real 'hash', or comparable to what
tumblr usually provides, it is still better than an empty string.
At least as long as "alternatives" in format strings aren't implemented.
2018-12-04 19:15:23 +01:00
Mike Fährmann
95636418ad
[tumblr] catch exception for 'hash' extraction (fixes #129)
2018-12-02 19:48:09 +01:00
Mike Fährmann
40e30694f3
[pinterest] fix pin.it redirects
2018-12-02 19:38:50 +01:00
Mike Fährmann
770200888e
[gfycat] use public API endpoint
2018-12-02 18:56:53 +01:00
Mike Fährmann
b1e22e8354
release version 1.6.1
2018-11-28 15:34:01 +01:00
Mike Fährmann
be52069cbc
update CHANGELOG and docs/supportedsites
2018-11-28 14:53:27 +01:00
Mike Fährmann
5d6e219fb2
[joyreactor] update tests
2018-11-28 14:52:19 +01:00
Mike Fährmann
c59f56fe7e
[gfycat] fix extraction
...
/cajax/get/<id> doesn't work anymore
2018-11-28 13:26:21 +01:00
Mike Fährmann
ba56827f36
[newgrounds] add user-, video-, image-extractors (#119)
2018-11-27 15:44:53 +01:00
Mike Fährmann
15890930ea
[mangafox] fix extraction
...
use mobile version since desktop version is obfuscated
2018-11-26 16:13:41 +01:00
Mike Fährmann
a4263fb253
[luscious] add extractor for search results (closes #127)
2018-11-25 18:57:51 +01:00
Mike Fährmann
fb53b5dd55
fix control+c during -j and range tests
2018-11-25 18:54:05 +01:00
Mike Fährmann
a0ae156edc
[pornreactor] add tag-, user-, post-extractors (#114)
2018-11-23 14:41:26 +01:00
Mike Fährmann
bacbc2e7bd
[joyreactor] try to prevent JsonDecodeErrors (#114)
2018-11-23 14:32:37 +01:00
Mike Fährmann
503d42a1c2
[joyreactor] add tag-, user-, post-extractors (#114)
2018-11-23 09:25:02 +01:00
Mike Fährmann
59bb434ba5
[flickr] add ability to download all albums of a user
...
for example with 'https://www.flickr.com/photos/shona_s/albums'
2018-11-23 09:09:37 +01:00
Mike Fährmann
13cb270326
set target directory before postprocessor init (fixes #126)
2018-11-21 22:21:26 +01:00
Mike Fährmann
9e188f6a21
[4chan] support 4channel.org domain
2018-11-21 17:40:38 +01:00
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
...
This fixes the POST requests to /site/filters
2018-11-19 21:46:17 +01:00
Mike Fährmann
b828473aa3
retry HTTP requests for more exception classes
2018-11-19 15:49:13 +01:00
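Retrying on a wider set of exception classes usually comes down to passing a tuple to `except`. The sketch below shows the general idea only; the function name `retry`, its parameters, and the built-in exception classes are stand-ins, not the exceptions or code gallery-dl actually uses.

```python
import time


def retry(func, retries=3, delay=0.0, exceptions=(ConnectionError, TimeoutError)):
    """Call 'func' until it succeeds or 'retries' attempts have failed."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)
```

Exceptions outside the tuple are not caught, so programming errors still surface immediately instead of being retried.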
Mike Fährmann
c2e59b9a7d
update CHANGELOG.md
...
[ci skip]
2018-11-18 22:33:35 +01:00
Mike Fährmann
d4b2b73bef
release version 1.6.0
2018-11-17 18:28:02 +01:00
Mike Fährmann
ea9d1b6501
update README.rst
...
- point to pip3/python3 in installation-instructions (#118, #121)
- add dependency list
- update URLs to external resources
- remove incomplete list of supported sites
2018-11-17 17:46:19 +01:00
Mike Fährmann
c47482b110
smaller changes, missing docs, etc.
...
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
b17a5d6f3b
give downloader classes proper names
2018-11-16 14:40:05 +01:00
Mike Fährmann
3c25fa2dad
update build_testresult_db.py script
2018-11-15 22:58:14 +01:00
Mike Fährmann
7f6a0be982
adjust some tests
2018-11-15 22:50:04 +01:00
Mike Fährmann
baad7b0fa5
[twitter] unpack API responses when logged in (closes #123)
2018-11-14 11:49:35 +01:00
Mike Fährmann
3bdfc15be1
[pinterest] don't crash on pins without image info
2018-11-14 11:46:14 +01:00
Mike Fährmann
8ef84a6823
add option to enable/disable specific downloader modules
...
... and write URLs with no (active) downloader to unsupported-file
2018-11-13 18:06:36 +01:00
Mike Fährmann
14ee6bf611
[behance] handle external URLs with youtube-dl
2018-11-13 15:10:23 +01:00
Mike Fährmann
36425122ff
[artstation] handle external URLs with youtube-dl
2018-11-13 14:27:02 +01:00
Mike Fährmann
bd8670d925
[gfycat] extend URL pattern
2018-11-11 21:19:11 +01:00
Mike Fährmann
2fa28a2609
update default user-agent string (closes #122)
2018-11-11 10:07:10 +01:00
Mike Fährmann
7e2d6bcd62
[deviantart] fix original image downloads
2018-11-10 19:16:10 +01:00
Mike Fährmann
9e12e073ab
[2chan] fix extraction
2018-11-10 19:15:21 +01:00
Mike Fährmann
966a9ca3a0
update test results
2018-11-10 19:14:54 +01:00
Mike Fährmann
e26ba682a2
enforce utf-8 encoding for input files (#120)
2018-11-10 18:27:01 +01:00
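Enforcing UTF-8 for input files generally means passing an explicit `encoding` to `open()` rather than relying on the platform default. A minimal sketch, assuming an input file with one URL per line; the helper name `read_urls` is hypothetical and not taken from the project:

```python
def read_urls(path):
    """Read one URL per line from 'path', always decoding as UTF-8."""
    # Without 'encoding', open() falls back to a locale-dependent default
    with open(path, encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]
```

This keeps non-ASCII URLs intact on systems whose default encoding is not UTF-8.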
Mike Fährmann
a36259d8f1
update setup.py
...
- add Python version check
- add classifiers
- simplify sys.exit() usage
2018-10-24 14:43:37 +02:00
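A Python version check in a setup script can be sketched as below. The minimum version `(3, 4)` and the names `MINIMUM` and `version_ok` are assumptions chosen for illustration; the actual check in setup.py is not shown here.

```python
import sys

MINIMUM = (3, 4)  # hypothetical minimum version, for illustration only


def version_ok(version_info, minimum=MINIMUM):
    """Return True if 'version_info' is at least 'minimum'."""
    return tuple(version_info[:2]) >= minimum


if not version_ok(sys.version_info):
    # sys.exit() with a string prints it to stderr and exits with status 1
    sys.exit("Python {}.{}+ is required".format(*MINIMUM))
```

Passing the message straight to `sys.exit()` avoids a separate print followed by an explicit exit code.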
Mike Fährmann
fd8ed35591
[turboimagehost] fix extraction
2018-10-23 21:08:24 +02:00