153 Commits

Author SHA1 Message Date
Mike Fährmann
1b2f9050fb
rename all instances of 'kwds' to 'kwdict' 2021-07-20 20:21:19 +02:00
Mike Fährmann
0179581340
add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
befe635022
cache parsed Formatter functions 2021-06-22 19:46:04 +02:00
Mike Fährmann
79b7ee2712
use 'functools.partial' in '_build_cleanfunc' when possible
makes calls to the returned function a slight bit faster (~10%)
2021-06-20 23:34:41 +02:00
Mike Fährmann
ceaf7fd989
optimize 'base-directory' initialization and usage
apply 'clean_path()' only once
2021-06-20 21:35:43 +02:00
Mike Fährmann
2ca011dfa8
add 'kwdict' argument to PathFormat.build_filename() 2021-06-20 20:26:38 +02:00
Mike Fährmann
fd00d47116
implement conditional directories (#1394)
They work the same way as conditional filenames (84d2e640), e.g.

"directory": {
    "score >= 20": ["high score"],
    "score >= 5" : ["mid score"],
    ""           : ["{category}", "default"]
}
2021-06-20 20:09:35 +02:00
Mike Fährmann
def0148582
restructure code in PathFormat constructor 2021-06-08 18:05:07 +02:00
Mike Fährmann
84d2e64024
combine conditional filenames into filename option (#1394) 2021-06-08 18:00:06 +02:00
Mike Fährmann
4cf40434d7
initial support for conditional filenames (#1394) 2021-06-04 16:45:32 +02:00
Mike Fährmann
0abad8bc12
implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
8fd8126117
fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
c5ca7905ce
add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bff71cde80
implement 'util.unique_squence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
92071d02f4
fix crash when 'base-directory' is an empty string (#1339) 2021-02-24 14:49:17 +01:00
Mike Fährmann
970fc2b2b5
allow setting 'filename' & '(base-)directory' to default
by setting them to 'None'/'null'
2021-02-24 02:24:22 +01:00
Mike Fährmann
91308140ec
make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91
rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
79501a356f
fix crash when 'path-restrict' is an object/dict
This basically reverts commit 5818c928

(#1234)
2021-01-10 00:13:48 +01:00
Mike Fährmann
5d4494b15f
add "ascii" as a special 'path-restrict' value 2021-01-09 02:41:20 +01:00
Mike Fährmann
5818c928c4
refactor 'path-restrict' parsing 2021-01-09 02:33:42 +01:00
Mike Fährmann
aac00a2024
add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
Mike Fährmann
511d8d3fa3
increase SQLite connection timeouts (#1173) 2020-12-19 20:15:07 +01:00
Mike Fährmann
9b1bd09454
change 'extension-map' default
Replace all JPEG filename extensions with 'jpg'.
2020-11-14 22:40:31 +01:00
Mike Fährmann
e3480bc8de
implement 'extension-map' option (#318) 2020-11-02 15:27:07 +01:00
Mike Fährmann
c3f01dc4e6
implement 'util.unique()' 2020-10-29 23:33:41 +01:00
Mike Fährmann
de4a1e45c9
improve 'generate_csrf_token()'
no need to use hashlib.md5()
2020-10-24 02:56:40 +02:00
Mike Fährmann
ec61696316
add 't' format string conversion (closes #1065)
to Trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == "  \nfoo\t\r"
2020-10-16 00:37:22 +02:00
Mike Fährmann
1b1cf01d0d
add a general 'generate_csrf_token()' function 2020-10-15 15:14:18 +02:00
Mike Fährmann
d5fa716d89
fix crash when using 'skip=false' and archive (fixes #1023)
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.

It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
2020-09-23 19:07:40 +02:00
Mike Fährmann
65744a7a31
use alternative for all falsey values in format strings
… and not just None (#525)

It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
2020-09-19 22:02:47 +02:00
Mike Fährmann
b5243297ff
write skipped files to archive (closes #550) 2020-09-03 18:37:38 +02:00
Mike Fährmann
4d8b3e4f70
defer directory creation (fixes #722)
Only call os.makedirs() before a file is getting downloaded,
and not immediately for every Directory message.
2020-07-04 22:15:23 +02:00
Mike Fährmann
1ae1df0d27
update '--write-pages' (#737)
- fix infinite recursion for responses with multiple entries in
  'history'
- hide values of Set-Cookie headers
- only write the response content by default
  (use '-o write-pages=all' to also include HTTP headers)
2020-06-18 15:07:30 +02:00
Mike Fährmann
1fcf938f9c
implement a general 'delete_items()' function 2020-06-06 23:49:49 +02:00
Mike Fährmann
ddc253cf9a
implement a 'path-replace' option (#662, #755) 2020-05-25 22:21:58 +02:00
Mike Fährmann
15c3d29062
move dump_response() into a separate function (#737) 2020-05-25 22:21:58 +02:00
Mike Fährmann
bc53302ad6
extend 'path-restrict' option
Allow its value to be a JSON object / Python dict that specifies
a mapping from invalid/unwanted input characters to specific
output characters.

For example {"/": "-", "*": "+"} will transform
"foo / ***bar***" into "foo - +++bar+++"

(closes #662, #755)
2020-05-25 22:21:56 +02:00
Mike Fährmann
3201fe3521
add global SENTINEL object 2020-05-19 22:32:53 +02:00
Mike Fährmann
c8787647ed
add global WINDOWS bool 2020-05-19 22:32:53 +02:00
Mike Fährmann
ece73b5b2a
make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
Mike Fährmann
abbd8fbbd9
reset filenames on empty file extensions (#733) 2020-05-18 19:04:50 +02:00
Mike Fährmann
38bc6430d3
[downloader:http] don't overwrite existing '_mtime' fields 2020-04-10 23:08:03 +02:00
Mike Fährmann
9159cb8fb3
remove trailing dots and spaces from directory names (#647) 2020-03-19 21:12:18 +01:00
Mike Fährmann
90e4c645ba
[formatter] allow multiple "special" format specifiers (#595)
It is now, for example, possible to specify multiple replacement
operations per format replacement field: {name:Ra/b/Rc/d/}
2020-02-16 21:47:08 +01:00
Mike Fährmann
219c4cc78c
[formatter] allow for numeric list and string indices 2020-02-15 22:46:22 +01:00
Mike Fährmann
7d1da614d9
[formatter] implement field name alternatives (#525)
The format string '{a|b|c}' will now try to use the value from 'a' and
fall back to 'b' and 'c' if accessing a field raises an exception or
if its value is None.
2020-02-15 17:58:21 +01:00
Mike Fährmann
56f1c96168
implement 'parent-directory' option (#551) 2020-01-29 18:32:37 +01:00
Mike Fährmann
2a9be48511
improve util.load/save_cookiestxt() and add tests
- take a file object as argument instead of an filename
- accept whitespace before comments ("   # comment")
- map expiration "0" to None and not the number 0
2020-01-25 23:02:15 +01:00
Mike Fährmann
c1a6862863
implement functions to load/save cookies.txt files (closes #586)
The methods of the standard libraries' MozillaCookieJar have
several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.)
and require construction of an ultimately pointless CookieJar object.
2020-01-21 21:59:36 +01:00