464 Commits

Author SHA1 Message Date
Mike Fährmann
f3793660ef
update tests 2018-08-02 14:57:28 +02:00
Mike Fährmann
42a346413b
fix "re:" prefix for keyword tests 2018-08-02 14:48:51 +02:00
Mike Fährmann
e0dd8dff5f
implement L<maxlen>/<replacement>/ format option
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
2018-07-29 13:52:07 +02:00
Mike Fährmann
bb89a1e6d7
[mangahere] use http://
invalid SSL cert for quite some time now
2018-07-26 18:11:31 +02:00
Mike Fährmann
ce34d82cb4
fix skipping tests on 5xx status codes 2018-07-19 18:47:23 +02:00
Mike Fährmann
a6fe2bb594
[whatisthisimnotgoodwithcomputers] remove extractor 2018-07-14 09:53:16 +02:00
Mike Fährmann
0ba93650e0
[8chan] replace unit test URL
the other thread is no longer accessible
2018-07-14 09:53:16 +02:00
Mike Fährmann
8fe9056b16
implement string slicing for format strings
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.

"{digits}"       -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}"   -> "01234"

The optional third parameter (step) has been left out to simplify things.
2018-07-14 09:53:15 +02:00
Mike Fährmann
269dc2bbd5
[sankaku] add 'tags' option (#94) 2018-07-14 09:53:01 +02:00
Mike Fährmann
764331823b
release version 1.4.2 2018-07-06 16:02:40 +02:00
Mike Fährmann
2eefaa99a3
[mangapark] support .net and .com mirrors 2018-07-05 14:45:05 +02:00
Mike Fährmann
188e956c4e
[imagefap] use HTTPS + update test results 2018-06-30 19:40:46 +02:00
Mike Fährmann
a699787d01
[deviantart] update URL patterns to new format
DeviantArt changed its URL format from
https://<name>.deviantart.com/...
to
https://www.deviantart.com/<name>/...

With this change both formats will be supported.
2018-06-28 20:21:59 +02:00
Mike Fährmann
0055fdd714
change OAuth test server
DNS record for oauthbin.com expired
2018-06-28 14:32:02 +02:00
Mike Fährmann
b8c97d2295
use 'extractor.request()' for more HTTP requests 2018-06-25 23:40:59 +02:00
Mike Fährmann
7a98cc9798
[smugmug] update tests
My test account expired and all uploaded images got deleted.
2018-06-22 15:04:31 +02:00
Mike Fährmann
4eb94aca17
[postprocessor:ugoira] pass '-f' if not present 2018-06-22 13:26:17 +02:00
Mike Fährmann
a9e276bc37
reset delete-flag
Since 'PathFormat' objects are being reused, setting `delete`
to True once caused all files downloaded after to be deleted as well.
2018-06-20 18:12:59 +02:00
Mike Fährmann
6ac403c5d3
add postprocessor config example 2018-06-08 18:31:59 +02:00
Mike Fährmann
2403c405e3
Merge branch 'postprocessor' 2018-06-08 17:43:11 +02:00
Mike Fährmann
b344f2290f
fix downloader tests 2018-06-07 22:27:36 +02:00
Mike Fährmann
a47c6136cd
[simplyhentai] avoid redirects for all-pages.json (#89) 2018-06-01 22:06:34 +02:00
Mike Fährmann
ae9a37a528
implement text.split_html() 2018-05-27 15:00:41 +02:00
Mike Fährmann
0a1863fce3
[pixiv] respect more query parameters for user URLs
The API endpoint responsible for user illustrations does not
provide sufficient filter capabilities* to match the actual
website, so we are spinning our own filters.

Respected parameters are
    'type': illust, manga, ugoira
    'tag' : any image tag (this was already supported)
    'p'   : the page to start on

*
- API can filter for illustrations and manga, but not for ugoira.
- 'offset' is applied before filtering
- no 'tag' filter
2018-05-18 15:36:30 +02:00
Mike Fährmann
7f899bd5d8
Merge branch 'master' into 1.4-dev 2018-05-14 14:50:02 +02:00
Mike Fährmann
4cea886177
[imgur] allow longer album hashes 2018-05-13 11:21:51 +02:00
Mike Fährmann
e1e23165a0
[pinterest] catch JSON decode errors 2018-05-11 17:37:27 +02:00
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
2018-05-10 18:47:05 +02:00
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction (closes #84)
Chapter listings for manga now use
https://mangadex.org/manga/<id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/<id>/_//2/
2018-05-06 17:43:50 +02:00
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev 2018-05-04 10:19:02 +02:00
Mike Fährmann
3fe653d940
fix test_results for empty sets
{} is an empty dict and doesn't support set operations
2018-04-29 22:43:37 +02:00
Mike Fährmann
d96b3474e5
[puremashiro] remove module
site has been unreachable for a couple of weeks
and now the DNS record is gone as well
2018-04-28 14:24:20 +02:00
Mike Fährmann
b44a296404
[gomanga] remove module
site has been unreachable for a couple of weeks
and the cloudflare status page shows host errors
2018-04-28 14:24:21 +02:00
Mike Fährmann
2395d870dd
[pinterest] unquote board and user names, better errors 2018-04-26 16:38:12 +02:00
Mike Fährmann
55d4d23860
[pinterest] use Pinterest's "Web" API (#83)
no access tokens, no user credentials of any kind ...
2018-04-24 22:28:10 +02:00
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev 2018-04-21 12:15:40 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
10cc59f3b5
fix extractor names 2018-04-18 18:12:57 +02:00
Mike Fährmann
b1325d4d2c
fix extractor docstrings 2018-04-18 18:03:43 +02:00
Mike Fährmann
df7e18399e
[luscious] fix image order 2018-04-17 17:32:21 +02:00
Mike Fährmann
d10579edb5
[pinterest] improve PinterestAPI code; remove OAuth mentions
on another note: access_tokens have been set to only allow for
10 requests per hour (from 200 yesterday)
2018-04-17 17:12:42 +02:00
Mike Fährmann
4bd182c107
[pinterest] implement oauth:pinterest (#83)
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
2018-04-16 20:03:28 +02:00
Mike Fährmann
dbe250f7e5
[pinterest] update access_token (#83) 2018-04-16 09:46:45 +02:00
Mike Fährmann
5c487300ee
improve 'parse_query()' and add tests
- another irrelevant micro-optimization !
- use urllib.parse.parse_qsl directly instead of parse_qs, which
  just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
  created
2018-04-15 19:05:29 +02:00
Mike Fährmann
4ffa94f634
remove 'shorten_path()' and 'shorten_filename()' 2018-04-15 18:44:13 +02:00
Mike Fährmann
27eab4e467
rewrite text tests and improve functions
- test more edge cases
- consistently return an empty string for invalid arguments
- remove the ungreedy-flag in 'remove_html()'
2018-04-15 18:13:46 +02:00
Mike Fährmann
e3f2bd4087
add tests for 'text.clean_xml()' and improve it 2018-04-14 22:07:01 +02:00
Mike Fährmann
6d8b191ea7
improve 'parse_query()' and add tests
- another irrelevant micro-optimization !
- use urllib.parse.parse_qsl directly instead of parse_qs, which
  just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
  created
2018-04-13 19:21:32 +02:00
Mike Fährmann
51ea699083
add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
Mike Fährmann
48a83a89e9
[loveisover] remove module
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
2018-04-09 16:05:15 +02:00