37 Commits

Author SHA1 Message Date
Mike Fährmann
09f37fde39
[reddit] move date-min/-max handling into Extractor class 2019-07-16 22:54:39 +02:00
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
a2af2d2965
adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of extracting a complete 'filename' from a URL and splitting it
into 'name' and 'extension', the new approach drops the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
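The before/now listing above can be illustrated with a minimal sketch. The function name `nameext_sketch` is hypothetical and this is not the actual implementation of text.nameext_from_url(); it only reproduces the "now" result for the example URL, assuming the last path segment is split at its final dot.

```python
from urllib.parse import urlsplit, unquote


def nameext_sketch(url):
    """Hypothetical stand-in that mimics the 'now' behaviour described above."""
    # Take the last path segment, e.g. "filename.ext"
    basename = unquote(urlsplit(url).path).rpartition("/")[2]
    # Split at the final dot into name and extension
    name, _, ext = basename.rpartition(".")
    if not name:
        # No usable dot found: treat the whole segment as the filename
        return {"filename": basename, "extension": ""}
    return {"filename": name, "extension": ext}


result = nameext_sketch("https://example.org/path/filename.ext")
print(result)
```

Only 'filename' and 'extension' are returned; the former complete 'filename.ext' value has no key of its own.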
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url 2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
2019-02-08 20:08:16 +01:00
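A rough sketch of the idea, not the project's actual code: the class `ExampleExtractor`, its pattern, and this `find()` function are hypothetical. It assumes that a string 'pattern' is compiled once, on first use, and stored back on the class, and that the resulting match object is passed to the constructor.

```python
import re


class ExampleExtractor:
    # Hypothetical extractor; 'pattern' starts out as a plain string
    pattern = r"(?:https?://)?example\.org/post/(\d+)"

    def __init__(self, match):
        self.match = match


def find(url, classes):
    """Return an extractor instance for 'url', or None if nothing matches."""
    for cls in classes:
        if isinstance(cls.pattern, str):
            # Replace the string with its compiled version, once per class
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls(match)
    return None


extractor = find("https://example.org/post/42", [ExampleExtractor])
print(extractor.match.group(1))
```

After the first call, later lookups reuse the compiled object instead of compiling the string again.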
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
6126615698
update URLs for supportedsites.rst 2019-01-30 16:18:22 +01:00
Mike Fährmann
4ab0960083
[reddit] add metadata to extracted URLs 2018-12-29 17:52:43 +01:00
Mike Fährmann
7471933d5f
use extractor.request for all other API calls
- deviantart
- pawoo
- pixiv
- reddit
2018-12-22 14:42:23 +01:00
Mike Fährmann
966a9ca3a0
update test results 2018-11-10 19:14:54 +01:00
Mike Fährmann
c9b8e6aefc
[reddit] fix submission-ID parsing (#104)
Uppercase characters caused a ValueError exception
2018-09-07 18:27:54 +02:00
Mike Fährmann
4313c95bc9
improve error message for OAuth2 authentication 2018-08-11 23:54:25 +02:00
Mike Fährmann
92fc199b07
[reddit] allow arbitrary subdomains 2018-05-13 11:23:23 +02:00
Mike Fährmann
3cec533c28
Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
20af86b2ea
add more extractor tests
for mangastream, reddit and imgur
2018-02-12 17:07:18 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images (closes #68) 2018-01-14 18:55:42 +01:00
Mike Fährmann
676602056c
[reddit] unescape output URLs 2017-12-19 22:22:43 +01:00
Mike Fährmann
864a63ed33
fix typo
[skip ci]
2017-10-10 17:42:06 +02:00
Mike Fährmann
f3fbaa5c3e
[reddit] allow users to override the API User-Agent
Only overriding the Client-ID is not enough if you want to follow
Reddit's API access rules [1].

[1] https://github.com/reddit/reddit/wiki/API#rules
2017-10-10 17:29:46 +02:00
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
The same filter infrastructure that can be applied to image URLs now
also works for manga chapters and other delegated URLs.

TODO: actually provide metadata (currently only deviantart and
imagefap are supported).
2017-09-12 16:19:00 +02:00
Mike Fährmann
54c0715135
allow users to set their own API access_tokens/client_ids 2017-09-09 17:50:19 +02:00
Mike Fährmann
85696d0b3b
[reddit] fix issue with datetime errors 2017-07-02 08:19:45 +02:00
Mike Fährmann
80c2e03aaa
[reddit] allow 'date-min/max' to be human readable dates
If the date-min/max config value is a string, try parsing it using
datetime.strptime [1] with 'date-format' as format string [2]
(default: "%Y-%m-%dT%H:%M:%S")

Example: get all submissions posted in 2016

$ gallery-dl reddit.com/r/... \
    -o date-format=%Y \
    -o date-min=\"2016\" \
    -o date-max=\"2017\"

[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
[2] https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
2017-07-01 18:46:38 +02:00
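A minimal sketch of the parsing step described above. The function name `parse_date_option` is hypothetical, and interpreting the parsed date as UTC is an assumption (the commit message does not state a timezone). Numeric values are passed through unchanged.

```python
from datetime import datetime, timezone

DEFAULT_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_date_option(value, fmt=DEFAULT_FORMAT):
    """Convert a date-min/date-max value to a timestamp (hypothetical helper)."""
    if isinstance(value, str):
        # Assumption: the human-readable date is interpreted as UTC
        dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        return int(dt.timestamp())
    # Numbers are taken to be timestamps already
    return value


print(parse_date_option("2016", "%Y"))
print(parse_date_option("2017", "%Y"))
```

With date-format=%Y, the strings "2016" and "2017" then bound the whole of 2016.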
Mike Fährmann
f3d0373120
[reddit] add ability to filter by submission id
'extractor.reddit.id-min' and '….id-max' specify the lowest and
highest submission-/post-id to consider, similar to 'date-min' and
'date-max'
2017-06-29 17:39:22 +02:00
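One way such a range check could look, as a sketch only: it assumes submission IDs are compared as base-36 integers, and the function name and the sample IDs are made up for illustration.

```python
def submission_in_id_range(submission_id, id_min=None, id_max=None):
    """Hypothetical check: is the ID inside [id_min, id_max]?"""
    # Assumption: IDs are base-36 strings and are compared numerically
    value = int(submission_id, 36)
    if id_min is not None and value < int(id_min, 36):
        return False
    if id_max is not None and value > int(id_max, 36):
        return False
    return True


print(submission_in_id_range("6kmzv2", id_min="6kmzv0", id_max="6kmzv9"))
print(submission_in_id_range("6kmzuz", id_min="6kmzv0"))
```

Leaving either bound unset means that side of the range is open.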
Mike Fährmann
2993206c4b
smaller fixes and "security" measures
- move the OAuthSession class into util.py
- block special extractors for reddit and recursive
- ignore 'only matching' tests for testresults script
2017-06-16 21:01:40 +02:00
Mike Fährmann
56bec79e6a
[reddit] add ability to load more comments (#15)
The 'extractor.reddit.morecomments' option enables the use of
the '/api/morechildren' API endpoint (1) to load even more
comments than the usual submission-request provides.
Possible values are the booleans 'true' and 'false' (default).

Note: this feature comes at the cost of 1 extra API call towards
the rate limit for every 100 extra comments.

(1) https://www.reddit.com/dev/api/#GET_api_morechildren
2017-06-13 18:49:07 +02:00
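The rate-limit cost mentioned in the note can be estimated with a small calculation. This is a sketch: the function name is hypothetical, and rounding a partial batch up to a full call is an assumption.

```python
def extra_api_calls(extra_comments, batch_size=100):
    """Estimate additional API calls for loading 'extra_comments' comments."""
    # Assumption: a partially filled batch still costs one full call
    return -(-extra_comments // batch_size)


for count in (0, 100, 101, 250):
    print(count, extra_api_calls(count))
```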
Mike Fährmann
090e11b35d
[reddit] enable user authentication with OAuth2 (#15)
Call '$ gallery-dl oauth:reddit' to get a refresh_token
for your account.
2017-06-08 16:17:13 +02:00
Mike Fährmann
8456b84a12
fix tests and small stuff 2017-06-06 14:22:09 +02:00
Mike Fährmann
fbfc8d0f78
[reddit] ignore Authorization errors for subreddits
- also made the limit for retrieved comments customizable via
  the 'extractor.reddit.comments' config value
- default is 500;  0 ignores comments completely
2017-06-05 18:43:08 +02:00
Mike Fährmann
5f05543f23
[reddit] support filtering by timestamp (#15)
- Added the 'extractor.reddit.date-min' and '….date-max'
  config options. These values should be UTC timestamps.
- All submissions whose timestamp T does not satisfy
  date-min <= T <= date-max will be ignored.

- Fixed the limit parameter for submission comments by setting
  it to its apparent max value (500).
2017-06-03 13:33:48 +02:00
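The date-min <= T <= date-max rule can be sketched as follows. The function name and the sample data are hypothetical; the sketch assumes each submission carries its UTC timestamp under a 'created_utc' key and that unset bounds leave the range open.

```python
def filter_by_date(submissions, date_min=0, date_max=float("inf")):
    """Keep only submissions with date_min <= T <= date_max (inclusive)."""
    return [
        sub for sub in submissions
        if date_min <= sub["created_utc"] <= date_max
    ]


# Toy data for illustration only
submissions = [
    {"id": "a", "created_utc": 1451606399},  # last second of 2015
    {"id": "b", "created_utc": 1451606400},  # first second of 2016
    {"id": "c", "created_utc": 1483228800},  # first second of 2017
]
kept = filter_by_date(submissions, date_min=1451606400, date_max=1483228799)
print([sub["id"] for sub in kept])
```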
Mike Fährmann
bce51e90a5
[reddit] support sorting options and sub-options (#15)
Example:
    https://www.reddit.com/r/<subreddit>/top/?sort=top&t=month
    (the 'sort=top' parameter is irrelevant and can be omitted)
2017-05-29 12:45:35 +02:00
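A sketch of how such a URL might be split into its sorting option and sub-options. The function name, the list of recognised orderings, the default of "hot", and the subreddit name "pics" are all assumptions made for illustration; only the URL shape comes from the example above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: these are the orderings that may appear in the path
PATH_RE = re.compile(r"/r/([^/]+)(?:/(hot|new|top|rising|controversial))?/?$")


def parse_subreddit_url(url):
    """Split a subreddit URL into name, ordering and remaining parameters."""
    parts = urlsplit(url)
    match = PATH_RE.match(parts.path)
    if not match:
        return None
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # The ordering is taken from the path, so 'sort' is redundant
    params.pop("sort", None)
    return {
        "subreddit": match.group(1),
        "order": match.group(2) or "hot",
        "params": params,
    }


print(parse_subreddit_url("https://www.reddit.com/r/pics/top/?sort=top&t=month"))
```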
Mike Fährmann
99b72130ee
[reddit] enable recursion (#15)
reddit extractors now recursively visit other submissions/posts
linked to in the initial set of submissions.
This behaviour can be configured via the 'extractor.reddit.recursion'
key in the configuration file or by `-o recursion=<value>`.

Example:
{"extractor": {
  "reddit": {
    "recursion": <value>
  }
}}

Possible values:
* -1 - infinite recursion (don't do this)
*  0 - recursion is disabled (default)
*  1 and higher - maximum recursion level
2017-05-26 17:01:27 +02:00
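The three kinds of 'recursion' values listed above can be sketched as a depth-limited traversal. This is not the extractor's code: the function name and the toy link graph are hypothetical, and the 'seen' set that stops repeated visits is an addition for this sketch so that the -1 case terminates.

```python
def visit(start, links, recursion=0):
    """Return the submission IDs visited from 'start' under a recursion limit.

    'links' maps a submission ID to the IDs it links to (toy data).
    """
    seen = set()
    order = []

    def walk(submission_id, depth):
        if submission_id in seen:
            return
        seen.add(submission_id)
        order.append(submission_id)
        # -1 means no limit; otherwise stop once the maximum level is reached
        if recursion != -1 and depth >= recursion:
            return
        for linked in links.get(submission_id, ()):
            walk(linked, depth + 1)

    walk(start, 0)
    return order


links = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(visit("a", links, recursion=0))
print(visit("a", links, recursion=1))
print(visit("a", links, recursion=-1))
```

With 0 only the initial submission is handled, with 1 its directly linked submissions are added, and with -1 the traversal continues until no unvisited links remain.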
Mike Fährmann
e425243b1e
[reddit] some small fixes
- filter or complete some URLs
- remove the 'nofollow:' scheme before printing URLs
- (#15)
2017-05-23 11:48:00 +02:00
Mike Fährmann
a22892f494
[reddit] add subreddit- and submission-extractor
- these extractors scan submissions and their comments for
  (external) URLs and defer them to other extractors
- (#15)
2017-05-23 09:38:50 +02:00