508 Commits

Author SHA1 Message Date
Mike Fährmann
b5b4f5a168
use 'build_extractor_filter' in test_results.py 2021-12-28 17:25:07 +01:00
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
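A minimal sketch of how such a string value could be interpreted. The helper name `parse_sleep()` is hypothetical and not taken from this commit; it assumes a single number means a fixed delay and "a - b" means a delay drawn uniformly from that range.

```python
import random


def parse_sleep(value):
    """Return a zero-argument callable that yields a delay in seconds.

    Hypothetical helper for illustration, not the project's actual code.
    Accepts a number, a numeric string ("3.5"), or a range ("2.1 - 5.0").
    """
    if isinstance(value, (int, float)):
        fixed = float(value)
        return lambda: fixed

    low, sep, high = value.partition("-")
    if not sep:
        # single value, e.g. "3.5"
        fixed = float(low)
        return lambda: fixed

    # range, e.g. "2.1 - 5.0"; float() ignores surrounding whitespace
    low, high = float(low), float(high)
    return lambda: random.uniform(low, high)
```

Calling `parse_sleep("2.1 - 5.0")()` repeatedly would then yield a different delay between 2.1 and 5.0 seconds each time.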
Mike Fährmann
010d65dcec
extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
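The matching rules described above can be sketched as follows. `parse_filter()` and `matches()` are hypothetical names chosen for illustration, and treating '*' and the empty string as the same wildcard is an assumption based on the message text, not the project's implementation.

```python
def parse_filter(entries):
    """Turn "imgur,*:tag" or ["imgur", "*:tag"] into (category, subcategory) pairs.

    Hypothetical helper. A value of None stands for "match anything".
    """
    if isinstance(entries, str):
        entries = entries.split(",")

    pairs = []
    for entry in entries:
        category, _, subcategory = entry.strip().partition(":")
        pairs.append((
            None if category in ("", "*") else category,
            None if subcategory in ("", "*") else subcategory,
        ))
    return pairs


def matches(pairs, category, subcategory):
    """Return True if any pair covers the given extractor."""
    return any(
        (cat is None or cat == category) and
        (sub is None or sub == subcategory)
        for cat, sub in pairs
    )
```

With the example list from the message, `matches(pairs, "danbooru", "tag")` is true because of the '*:tag' entry, while `matches(pairs, "gfycat", "image")` is false.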
Mike Fährmann
af6424f398
allow testing metadata in list elements 2021-11-21 22:46:34 +01:00
Mike Fährmann
3842cdcd8f
[formatter] implement 'D' format specifier
To be able to parse any string into a 'datetime' object
and format it as necessary.

Example:

{created_at:D%Y-%m-%dT%H:%M:%S%z}
->
"2010-01-01 00:00:00"

{created_at:D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p}
->
"Jan 01 2010 12:00 AM"

with 'created_at' == "2010-01-01T01:00:00+0100"
2021-11-20 23:04:34 +01:00
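A rough sketch of the behaviour shown in the example, written with the standard library only. The function name `parse_and_format()` is hypothetical, and two details are assumptions inferred from the example output: the parse and output formats are split at the first '/', and timezone-aware results are converted to naive UTC.

```python
from datetime import datetime, timezone


def parse_and_format(value, spec):
    """Parse `value` with strptime() and optionally re-format it.

    `spec` is "<parse format>" or "<parse format>/<output format>".
    Hypothetical helper, not the project's formatter code.
    """
    parse_fmt, _, out_fmt = spec.partition("/")
    dt = datetime.strptime(value, parse_fmt)

    if dt.tzinfo is not None:
        # normalize to naive UTC, matching the example output above
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)

    return dt.strftime(out_fmt) if out_fmt else str(dt)
```

Note that a parse format that itself contains '/' would not work with this simple split.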
Mike Fährmann
2ab190ce08
add tests for special format strings 2021-11-01 23:26:18 +01:00
Mike Fährmann
46e17c5e61
support accessing the current local datetime in format strings
{_now}, {_now:%Y-%m-%d}, etc
(#1968)
2021-10-30 21:41:09 +02:00
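For illustration, plain 'str.format()' already handles a 'datetime' value in this way, since 'datetime' objects accept strftime-style format specs. The `render()` wrapper below is a hypothetical stand-in, not the project's formatter.

```python
from datetime import datetime


def render(template, **fields):
    """Format `template`, supplying the current local time as '_now'.

    Hypothetical wrapper; a caller-provided '_now' takes precedence,
    which keeps the function easy to test.
    """
    fields.setdefault("_now", datetime.now())
    return template.format(**fields)
```

`render("{_now:%Y-%m-%d}")` would produce the current local date, and `render("{_now}")` the default 'str()' form of the timestamp.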
Mike Fährmann
38193dba46
support accessing environment variables in format strings (#1968)
{_env[HOME]} to get the value of $HOME
every other format string feature is supported as well
2021-10-28 19:18:55 +02:00
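The '{_env[HOME]}' syntax maps directly onto Python's index access in format strings. A small sketch with the standard library, where `render_env()` and its optional `environ` parameter are hypothetical and only exist to make the example testable:

```python
import os


def render_env(template, environ=None):
    """Format `template` with '_env' bound to the process environment.

    Hypothetical helper; pass `environ` to substitute a custom mapping.
    """
    env = os.environ if environ is None else environ
    return template.format(_env=env)
```

Because the lookup result is an ordinary string, the usual format specs still apply, for example '{_env[USER]:>6}' for right-aligned output.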
Mike Fährmann
f2d6b3e6b4
run tests without using 'nose'
run_tests.sh -> run_tests.py
2021-10-13 04:07:41 +02:00
Mike Fährmann
12fc646c53
fix filename formatting tests 2021-09-29 23:39:02 +02:00
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor (closes #1898) 2021-09-28 23:35:29 +02:00
Mike Fährmann
c22ff97743
remove 'unit' argument from 'util.format_value()' 2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de
move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd
move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
9377543162
[mastodon] add 'following' extractor (#1891) 2021-09-26 00:12:34 +02:00
Mike Fährmann
bd845303ad
implement a way to shorten filenames with east-asian characters
(#1377)

Setting 'output.shorten' to "eaw" (East-Asian Width) uses a slower
algorithm that also considers characters with a width > 1.
2021-09-13 21:38:33 +02:00
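A sketch of what width-aware shortening can look like, using 'unicodedata.east_asian_width()' from the standard library. The function names are hypothetical, and this version simply truncates at the end, which is an assumption for illustration and may differ from what the project does.

```python
import unicodedata


def char_width(char):
    """Return 2 for wide/fullwidth characters, 1 otherwise."""
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def display_width(text):
    """Number of terminal columns `text` is expected to occupy."""
    return sum(char_width(char) for char in text)


def truncate_eaw(text, limit):
    """Cut `text` so that its display width does not exceed `limit`."""
    width = 0
    for index, char in enumerate(text):
        width += char_width(char)
        if width > limit:
            return text[:index]
    return text
```

The extra per-character lookup is what makes this slower than a plain length-based cut.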
Mike Fährmann
292fffc83c
add 'j' format string conversion
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
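One way to add such a conversion on top of the standard library is to override 'string.Formatter.convert_field()'. The class name `SketchFormatter` is hypothetical and this is not the project's formatter, only an illustration of the idea.

```python
import json
import string


class SketchFormatter(string.Formatter):
    """Formatter that understands an extra '!j' conversion."""

    def convert_field(self, value, conversion):
        if conversion == "j":
            # serialize the value as a JSON formatted string
            return json.dumps(value)
        # fall back to the built-in '!s', '!r' and '!a' conversions
        return super().convert_field(value, conversion)
```

`SketchFormatter().format("{tags!j}", tags=["a", "b"])` then yields the JSON text of the list instead of its Python 'repr'.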
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies (#1779)
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00
Mike Fährmann
2792ed6e4b
implement 'util.format_value()' 2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea
replace ChainPredicate class with 'functools.partial' 2021-07-20 20:21:32 +02:00
Mike Fährmann
36ac2197db
[ytdl] add extractor for sites supported by youtube-dl
(#1680, #878)

Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42
[imagevenue] fix extraction
(closes #1677)
2021-07-09 20:13:18 +02:00
Mike Fährmann
0179581340
add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
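A minimal sketch of a datetime-to-timestamp conversion. `to_timestamp()` is a hypothetical name, and treating naive 'datetime' objects as UTC is an assumption made here for illustration.

```python
import calendar
from datetime import datetime


def to_timestamp(dt):
    """Convert a 'datetime' to integer seconds since the Unix epoch.

    Naive values are interpreted as UTC (assumption); aware values
    are converted to UTC by utctimetuple().
    """
    return calendar.timegm(dt.utctimetuple())
```

For example, `to_timestamp(datetime(2010, 1, 1))` gives 1262304000.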
Mike Fährmann
f74cf52e2b
[seisoparty] add 'user' and 'post' extractors (#1635) 2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02
[kemonoparty] fix 'username' extraction (fixes #1652)
The site's <title> content changed from

<title>NAME | Kemono</title>

to

<title>
    NAME | Kemono
</title>
2021-06-25 15:35:20 +02:00
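An extraction that tolerates both <title> layouts could look like the sketch below. `extract_username()` is a hypothetical function based only on the two snippets above; it is not the extractor's actual code.

```python
import re


def extract_username(page):
    """Return the NAME part of '<title>NAME | Kemono</title>'.

    Works whether or not the title content is wrapped in whitespace
    or line breaks. Returns None if no usable title is found.
    """
    match = re.search(r"<title>(.*?)</title>", page, re.DOTALL)
    if not match:
        return None

    # split at the last ' | ' so names containing ' | ' stay intact
    name, sep, _ = match.group(1).rpartition(" | ")
    return name.strip() if sep else None
```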
Mike Fährmann
07c8adbd8b
[mangadex] implement login with username & password (#1535) 2021-06-08 02:12:57 +02:00
Mike Fährmann
4a747a31a3
[postprocessor:metadata] handle dicts in mode:tags (fixes #1598) 2021-06-04 22:37:43 +02:00
Mike Fährmann
3cbbefd4ed
support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
0abad8bc12
implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
da6806a161
fix job tests for Python 3.4 and 3.5
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
Mike Fährmann
8fd8126117
fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684
add DataJob tests 2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3
use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a
[pillowfort] implement login with username & password (#846) 2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d
add some tests for job.py 2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce
add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8
consider apparently long extensions as part of the filename
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1
update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
387fe415d5
unescape items in text.split_html() 2021-03-29 02:12:29 +02:00
Mike Fährmann
78fd63b8f0
remove 'text.clean_xml()'
was not used anywhere
2021-03-28 04:05:16 +02:00
Mike Fährmann
8553b218d9
replace calls to 'os.path.splitext()' with 'str.rpartition()'
Makes functions that used it more than twice as fast
and we can get rid of an import as well.
2021-03-28 04:01:27 +02:00
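For comparison, a sketch of the 'str.rpartition()' approach. `split_ext()` is a hypothetical helper name. Unlike 'os.path.splitext()', it returns the extension without the leading dot and does not special-case names that start with a dot, so the two are not drop-in equivalents.

```python
def split_ext(filename):
    """Split 'name.ext' into ('name', 'ext') at the last dot.

    Returns (filename, '') when there is no dot at all.
    """
    name, sep, ext = filename.rpartition(".")
    if not sep:
        return filename, ""
    return name, ext
```

`split_ext("archive.tar.gz")` returns `("archive.tar", "gz")`, whereas `os.path.splitext("archive.tar.gz")` returns `("archive.tar", ".gz")`.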
Mike Fährmann
bff71cde80
implement 'util.unique_sequence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
5f1a6ff6fa
remove unneeded 'TRAVIS_SKIP' from test_results.py 2021-03-01 01:38:18 +01:00
Mike Fährmann
8821dceb79
use __import__() to dynamically load modules 2021-03-01 01:27:02 +01:00
Mike Fährmann
36bf76fa44
update 'oauth:mastodon:<instance>' code 2021-01-28 02:20:12 +01:00
Mike Fährmann
91308140ec
make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91
rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
0fdaea00a3
[postprocessor:metadata] sanitize filenames 2021-01-10 00:13:20 +01:00
Mike Fährmann
aac00a2024
add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
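The two steps in the example above (convert, then format) can be reproduced with the standard library. `convert_d()` is a hypothetical name, and interpreting the timestamp as UTC is an assumption that matches the values shown in the message.

```python
from datetime import datetime, timezone


def convert_d(timestamp):
    """Convert a Unix timestamp to a naive 'datetime' in UTC."""
    aware = datetime.fromtimestamp(timestamp, timezone.utc)
    return aware.replace(tzinfo=None)


# step 1: timestamp -> datetime, step 2: apply the format spec
created_at = convert_d(1262304000)
formatted = "{:%Y-%m-%d}".format(created_at)
```

Here `created_at` equals `datetime(2010, 1, 1)` and `formatted` is "2010-01-01".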
Mike Fährmann
912eea29bc
update extractor test results 2020-12-27 17:41:08 +01:00