Mike Fährmann
b5b4f5a168
use 'build_extractor_filter' in test_results.py
2021-12-28 17:25:07 +01:00
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
...
either as single value or as range: "3.5", "2.1 - 5.0"
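
A minimal sketch of how such a value could be interpreted. The helper names `parse_sleep` and `pick_sleep` are hypothetical and not taken from the project; negative numbers are not handled.

```python
import random


def parse_sleep(value):
    """Return (lower, upper) bounds in seconds for a number or a string.

    Accepts a plain number, a single value such as "3.5",
    or a range such as "2.1 - 5.0".
    """
    if isinstance(value, (int, float)):
        return float(value), float(value)
    lower, sep, upper = value.partition("-")
    lower = float(lower)  # float() ignores surrounding whitespace
    upper = float(upper) if sep else lower
    return lower, upper


def pick_sleep(value):
    """Pick a random duration inside the configured bounds."""
    lower, upper = parse_sleep(value)
    return random.uniform(lower, upper)
```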
2021-12-18 23:28:56 +01:00
Mike Fährmann
010d65dcec
extend blacklist/whitelist syntax (#2025)
...
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'
For example
"blacklist": "imgur,*:tag,gfycat:user" or
"blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
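
The matching rules described above can be sketched as follows. This is an illustration only: `parse_entries` and `is_listed` are hypothetical names, and the real filter in the project may be built differently.

```python
def parse_entries(spec):
    """Turn "imgur,*:tag" or ["imgur", "*:tag"] into (category, subcategory) pairs."""
    if isinstance(spec, str):
        spec = spec.split(",")
    pairs = []
    for entry in spec:
        category, _, subcategory = entry.strip().partition(":")
        # '*' and the empty string both act as a placeholder (stored as None)
        pairs.append((
            None if category in ("", "*") else category,
            None if subcategory in ("", "*") else subcategory,
        ))
    return pairs


def is_listed(pairs, category, subcategory):
    """Return True if any entry matches the given extractor."""
    return any(
        (c is None or c == category) and (s is None or s == subcategory)
        for c, s in pairs
    )
```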
2021-11-23 20:31:43 +01:00
Mike Fährmann
af6424f398
allow testing metadata in list elements
2021-11-21 22:46:34 +01:00
Mike Fährmann
3842cdcd8f
[formatter] implement 'D' format specifier
...
To be able to parse any string into a 'datetime' object
and format it as necessary.
Example:
{created_at:D%Y-%m-%dT%H:%M:%S%z}
->
"2010-01-01 00:00:00"
{created_at:D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p}
->
"Jan 01 2010 12:00 AM"
with 'created_at' == "2010-01-01T01:00:00+0100"
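
The example can be reproduced with the standard library. The sketch below assumes that the parsed value is converted to UTC and made naive before formatting, which matches the outputs shown; `parse_and_format` is a hypothetical helper, not the project's code.

```python
from datetime import datetime, timezone


def parse_and_format(value, parse_fmt, output_fmt="%Y-%m-%d %H:%M:%S"):
    """Parse 'value' with 'parse_fmt' and render it with 'output_fmt'."""
    dt = datetime.strptime(value, parse_fmt)
    if dt.tzinfo is not None:
        # assumption: normalize aware datetimes to naive UTC
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(output_fmt)


created_at = "2010-01-01T01:00:00+0100"
print(parse_and_format(created_at, "%Y-%m-%dT%H:%M:%S%z"))
# 2010-01-01 00:00:00
print(parse_and_format(created_at, "%Y-%m-%dT%H:%M:%S%z", "%b %d %Y %I:%M %p"))
# Jan 01 2010 12:00 AM (with the default C locale)
```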
2021-11-20 23:04:34 +01:00
Mike Fährmann
2ab190ce08
add tests for special format strings
2021-11-01 23:26:18 +01:00
Mike Fährmann
46e17c5e61
support accessing the current local datetime in format strings
...
{_now}, {_now:%Y-%m-%d}, etc
(#1968)
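
Python's own format strings already accept a strftime template after the colon for 'datetime' values, so the idea can be sketched by passing the current local time under the key '_now'. The function name `render_with_now` is hypothetical.

```python
from datetime import datetime


def render_with_now(template, fields):
    """Format 'template' with 'fields' plus the current local datetime as '_now'."""
    values = dict(fields)
    values["_now"] = datetime.now()
    return template.format_map(values)


print(render_with_now("{category}_{_now:%Y-%m-%d}", {"category": "example"}))
```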
2021-10-30 21:41:09 +02:00
Mike Fährmann
38193dba46
support accessing environment variables in format strings (#1968)
...
{_env[HOME]} to get the value of $HOME
every other format string feature is supported as well
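
Standard Python format strings support this kind of lookup directly, since '[HOME]' is treated as a string key. A small sketch, with `render_with_env` as a hypothetical helper and the 'env' parameter added only to make the example testable:

```python
import os


def render_with_env(template, fields, env=None):
    """Format 'template' with 'fields' plus a mapping of environment variables."""
    values = dict(fields)
    values["_env"] = os.environ if env is None else env
    return template.format_map(values)


# explicit mapping used here so the result does not depend on the machine
print(render_with_env("{_env[HOME]}/{category}", {"category": "example"},
                      env={"HOME": "/home/user"}))
# /home/user/example
```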
2021-10-28 19:18:55 +02:00
Mike Fährmann
f2d6b3e6b4
run tests without using 'nose'
...
run_tests.sh -> run_tests.py
2021-10-13 04:07:41 +02:00
Mike Fährmann
12fc646c53
fix filename formatting tests
2021-09-29 23:39:02 +02:00
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor (closes #1898)
2021-09-28 23:35:29 +02:00
Mike Fährmann
c22ff97743
remove 'unit' argument from 'util.format_value()'
2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de
move 'util.PathFormat' into its own 'path' module
...
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd
move 'util.Formatter' into its own 'formatter' module
2021-09-27 02:37:04 +02:00
Mike Fährmann
9377543162
[mastodon] add 'following' extractor (#1891)
2021-09-26 00:12:34 +02:00
Mike Fährmann
bd845303ad
implement a way to shorten filenames with east-asian characters
...
(#1377)
Setting 'output.shorten' to "eaw" (East-Asian Width) uses a slower
algorithm that also considers characters with a width > 1.
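
A rough sketch of width-aware shortening, assuming that characters classified as wide ("W") or fullwidth ("F") count as two columns and everything else as one. The names `display_width` and `shorten_eaw` are illustrative, not the project's implementation.

```python
import unicodedata


def display_width(char):
    """Return 2 for wide/fullwidth characters, 1 otherwise."""
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def shorten_eaw(text, limit):
    """Cut 'text' so that its total display width does not exceed 'limit'."""
    width = 0
    for index, char in enumerate(text):
        width += display_width(char)
        if width > limit:
            return text[:index]
    return text
```

Checking every character individually is what makes this slower than a plain length-based cut.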
2021-09-13 21:38:33 +02:00
Mike Fährmann
292fffc83c
add 'j' format string conversion
...
to convert to a JSON formatted string
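
One way to illustrate such a conversion is a 'string.Formatter' subclass that handles the extra conversion character. `SketchFormatter` is a hypothetical class for this example and does not reflect how the project implements it.

```python
import json
import string


class SketchFormatter(string.Formatter):
    """Formatter that understands '!j' in addition to the built-in conversions."""

    def convert_field(self, value, conversion):
        if conversion == "j":
            return json.dumps(value)
        return super().convert_field(value, conversion)


print(SketchFormatter().format("{tags!j}", tags=["a", "b"]))
# ["a", "b"]
```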
2021-08-28 01:19:36 +02:00
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies (#1779)
...
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00
Mike Fährmann
2792ed6e4b
implement 'util.format_value()'
2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea
replace ChainPredicate class with 'functools.partial'
2021-07-20 20:21:32 +02:00
Mike Fährmann
36ac2197db
[ytdl] add extractor for sites supported by youtube-dl
...
(#1680, #878)
Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42
[imagevenue] fix extraction
...
(closes #1677)
2021-07-09 20:13:18 +02:00
Mike Fährmann
0179581340
add 'T' format string conversion (#1646)
...
to convert 'date'/datetime to timestamp
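
A sketch of the conversion, assuming naive 'datetime' values are interpreted as UTC. `to_timestamp` is a hypothetical name.

```python
import calendar
from datetime import datetime


def to_timestamp(dt):
    """Convert a datetime to a Unix timestamp (naive values treated as UTC)."""
    return calendar.timegm(dt.utctimetuple())


print(to_timestamp(datetime(2010, 1, 1)))
# 1262304000
```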
2021-06-25 22:35:45 +02:00
Mike Fährmann
f74cf52e2b
[seisoparty] add 'user' and 'post' extractors (#1635)
2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02
[kemonoparty] fix 'username' extraction (fixes #1652)
...
The site's <title> content changed from
<title>NAME | Kemono</title>
to
<title>
NAME | Kemono
</title>
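
A whitespace-tolerant extraction that copes with both layouts might look like this. The function `extract_username` is a hypothetical sketch and not the extractor's actual code.

```python
def extract_username(page):
    """Return the NAME part of '<title>NAME | Kemono</title>'."""
    start = page.index("<title>") + len("<title>")
    end = page.index("</title>", start)
    # strip() removes the newlines and indentation of the new layout
    title = page[start:end].strip()
    return title.rpartition(" | ")[0]
```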
2021-06-25 15:35:20 +02:00
Mike Fährmann
07c8adbd8b
[mangadex] implement login with username & password (#1535)
2021-06-08 02:12:57 +02:00
Mike Fährmann
4a747a31a3
[postprocessor:metadata] handle dicts in mode:tags (fixes #1598)
2021-06-04 22:37:43 +02:00
Mike Fährmann
3cbbefd4ed
support 'filter' option for post processors (#1460)
2021-06-04 18:23:32 +02:00
Mike Fährmann
0abad8bc12
implement 'compile_expression()'
2021-06-03 22:34:58 +02:00
Mike Fährmann
da6806a161
fix job tests for Python 3.4 and 3.5
...
assert_called() and assert_not_called() got added in Python 3.6
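
On interpreters that lack these helpers, the same checks can be expressed through the 'call_count' attribute of a mock. The helper names below are illustrative only.

```python
from unittest import mock


def was_called(mock_obj):
    """Portable stand-in for mock_obj.assert_called()."""
    return mock_obj.call_count > 0


def was_not_called(mock_obj):
    """Portable stand-in for mock_obj.assert_not_called()."""
    return mock_obj.call_count == 0


used = mock.Mock()
used("argument")
unused = mock.Mock()

assert was_called(used)
assert was_not_called(unused)
```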
2021-05-22 21:40:52 +02:00
Mike Fährmann
8fd8126117
fix ISO 639-1 code for Japanese
...
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684
add DataJob tests
2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3
use '_extractor' info in UrlJobs
2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a
[pillowfort] implement login with username & password (#846)
2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d
add some tests for job.py
2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce
add 'noop()' and 'identity()' functions
2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8
consider apparently long extensions as part of the filename
...
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1
update extractor test results
2021-04-18 20:28:15 +02:00
Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00
Mike Fährmann
78fd63b8f0
remove 'text.clean_xml()'
...
was not used anywhere
2021-03-28 04:05:16 +02:00
Mike Fährmann
8553b218d9
replace calls to 'os.path.splitext()' with 'str.rpartition()'
...
Makes functions that used it more than twice as fast
and we can get rid of an import as well.
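
A sketch of the replacement pattern, with `split_extension` as a hypothetical helper. Note that the two approaches are not equivalent in every case: 'os.path.splitext()' keeps the leading dot in the extension and treats names like ".bashrc" as having no extension, while 'str.rpartition()' simply splits at the last dot.

```python
def split_extension(filename):
    """Split 'filename' at its last dot into (name, extension-without-dot)."""
    name, sep, ext = filename.rpartition(".")
    if not sep:
        # no dot at all: everything is the name
        return filename, ""
    return name, ext


print(split_extension("image.jpg"))
# ('image', 'jpg')
```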
2021-03-28 04:01:27 +02:00
Mike Fährmann
bff71cde80
implement 'util.unique_sequence()'
2021-03-02 23:11:08 +01:00
Mike Fährmann
5f1a6ff6fa
remove unneeded 'TRAVIS_SKIP' from test_results.py
2021-03-01 01:38:18 +01:00
Mike Fährmann
8821dceb79
use __import__() to dynamically load modules
2021-03-01 01:27:02 +01:00
Mike Fährmann
36bf76fa44
update 'oauth:mastodon:<instance>' code
2021-01-28 02:20:12 +01:00
Mike Fährmann
91308140ec
make 'generate_token()' compatible with Python 3.4
2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91
rename 'generate_csrf_token()' to just 'generate_token()'
...
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
0fdaea00a3
[postprocessor:metadata] sanitize filenames
2021-01-10 00:13:20 +01:00
Mike Fährmann
aac00a2024
add 'd' conversion for format strings
...
to convert a timestamp to a formattable 'datetime' object.
For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.
1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
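
The chain above can be checked with the standard library. This sketch assumes the timestamp is interpreted as UTC; `from_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone


def from_timestamp(timestamp):
    """Convert a Unix timestamp to a naive UTC datetime."""
    return datetime.fromtimestamp(timestamp, timezone.utc).replace(tzinfo=None)


created_at = from_timestamp(1262304000)
print("{:%Y-%m-%d}".format(created_at))
# 2010-01-01
```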
2021-01-09 01:58:44 +01:00
Mike Fährmann
912eea29bc
update extractor test results
2020-12-27 17:41:08 +01:00