118 Commits

Author SHA1 Message Date
Mike Fährmann
f5604492c3
update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
i.e. keys starting with an underscore
2019-11-21 16:58:44 +01:00
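The commit message defines private entries as keys starting with an underscore. Below is a minimal sketch of that filter. It assumes a flat dict with string keys, and `filter_private` is a hypothetical name, not the project's function.

```python
def filter_private(kwdict):
    """Return a copy of kwdict without "private" entries.

    Hypothetical helper: a key counts as private if it starts with "_".
    Nested dicts are not filtered in this sketch.
    """
    return {key: value for key, value in kwdict.items()
            if not key.startswith("_")}


metadata = {"id": 42, "title": "example", "_extractor": object()}
print(filter_private(metadata))  # {'id': 42, 'title': 'example'}
```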
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve (#421, #413)
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
2019-11-03 21:45:45 +01:00
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status 2019-10-29 15:56:54 +01:00
Mike Fährmann
322c2e7ed4
renaming variables
mostly 'keyword(s)' to 'kwdict'
2019-10-29 15:46:35 +01:00
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
c887493a80
overhaul exception stuff 2019-10-27 23:53:37 +01:00
Mike Fährmann
389d2d7e38
implement 'cookies-update' option (#445) 2019-10-19 15:23:55 +02:00
Mike Fährmann
03bc8adfc7
[postprocessor:exec] run after file moved to target location
(#421)
2019-10-06 23:12:22 +02:00
Mike Fährmann
776e9e073f
close archive on job completion (#417) 2019-09-10 22:43:51 +02:00
Mike Fährmann
9178b54eae
handle errors when opening download archive file (#417) 2019-09-10 16:44:47 +02:00
Mike Fährmann
682105b8ee
prevent crash when loading unavailable downloader (#405) 2019-08-31 21:58:33 +02:00
Mike Fährmann
5f8621b29d
improve output of active post processor modules 2019-08-15 13:31:04 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
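The last bullet places the enumeration index in front of the extension, not after it. Here is a small sketch of that placement. The helper name and the "." separator before the index are assumptions.

```python
import os.path


def enumerate_filename(filename, index):
    """Insert an enumeration index in front of the file extension.

    Hypothetical helper; the "." separator before the index is an assumption.
    """
    root, ext = os.path.splitext(filename)
    return "{}.{}{}".format(root, index, ext)


print(enumerate_filename("image.jpg", 2))  # image.2.jpg
print(enumerate_filename("README", 2))     # README.2
```

Because the index comes before the extension, the enumerated file still ends in its original extension and stays recognizable by type.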
Mike Fährmann
8dc42bb178
implement 'enumerate' for 'extractor.skip' (#306)
[ci skip]
2019-08-08 18:37:54 +02:00
Mike Fährmann
20f7b07312
ensure postproc finalize() is called during C-c or crash (#355) 2019-07-27 11:14:52 +02:00
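The commit title states only the goal. In Python, the usual way to make cleanup run on KeyboardInterrupt (Ctrl+C) and on other exceptions is a try/finally around the work loop. The sketch below assumes that approach, and all names in it are hypothetical.

```python
class PostProcessor:
    """Hypothetical post-processor that records whether finalize() ran."""

    def __init__(self):
        self.finalized = False

    def run(self, url):
        pass

    def finalize(self):
        self.finalized = True


def run_job(urls, postprocessors):
    """Download loop sketch: finalize() runs even if the loop is interrupted."""
    try:
        for url in urls:
            if url == "crash":
                raise KeyboardInterrupt  # simulate Ctrl+C mid-download
            for pp in postprocessors:
                pp.run(url)
    finally:
        # finally blocks also execute for KeyboardInterrupt and other exceptions
        for pp in postprocessors:
            pp.finalize()


pp = PostProcessor()
try:
    run_job(["a", "crash", "b"], [pp])
except KeyboardInterrupt:
    pass
print(pp.finalized)  # True
```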
Mike Fährmann
7b77ecc35a
fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
62097284fe
add 'download' option (#220) 2019-07-14 18:48:18 +02:00
Mike Fährmann
fe7805de7c
improve attribute access in DownloadJob.handle_url()
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
2019-07-13 21:42:07 +02:00
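Here is a sketch of the pattern described above, using a hypothetical class. Both methods return the same list. The only difference is whether `self.hooks` is looked up on every pass or once. Actual timings depend on the interpreter version, so they are printed rather than asserted.

```python
import timeit


class Job:
    """Hypothetical job object; only the attribute-access pattern matters."""

    def __init__(self):
        self.hooks = [str.upper, str.lower]

    def via_self(self, url, repeat=1000):
        results = []
        for _ in range(repeat):
            for hook in self.hooks:   # 'self.hooks' is looked up on every pass
                results.append(hook(url))
        return results

    def via_local(self, url, repeat=1000):
        hooks = self.hooks            # one attribute lookup ...
        results = []
        for _ in range(repeat):
            for hook in hooks:        # ... then cheap local-variable reads
                results.append(hook(url))
        return results


job = Job()
assert job.via_self("Url") == job.via_local("Url")  # same result either way
print("self :", timeit.timeit(lambda: job.via_self("Url"), number=100))
print("local:", timeit.timeit(lambda: job.via_local("Url"), number=100))
```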
Mike Fährmann
f2000a69aa
implement 'image-unique' and 'chapter-unique' options (#303)
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.

The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
2019-06-29 22:50:17 +02:00
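This sketch shows how such an option could be applied. With the flag off (the stated default), every URL is passed through. With it on, a set of already-seen URLs filters out repeats. The function name and its `unique` parameter are hypothetical stand-ins for 'image-unique' / 'chapter-unique'.

```python
def iter_downloads(urls, unique=False):
    """Yield URLs to download; with unique=True, repeated URLs are skipped.

    'unique' stands in for an option like 'image-unique' (default False).
    """
    seen = set()
    for url in urls:
        if unique:
            if url in seen:
                continue
            seen.add(url)
        yield url


urls = ["https://example.org/1.jpg", "https://example.org/2.jpg",
        "https://example.org/1.jpg"]
print(list(iter_downloads(urls)))               # all three URLs
print(list(iter_downloads(urls, unique=True)))  # duplicate dropped
```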
Mike Fährmann
ee4d7c3d89
update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
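The idea is to resolve the https-to-http alias once when the lookup table is built, instead of rewriting each URL on lookup. This is a minimal sketch under that reading. The registry class and downloader class are hypothetical.

```python
class HttpDownloader:
    """Hypothetical downloader class that serves both http and https."""
    scheme = "http"


class DownloaderRegistry:
    """Sketch: resolve the https->http alias once, not per URL."""

    def __init__(self, classes):
        self.by_scheme = {cls.scheme: cls for cls in classes}
        # done once at initialization instead of rewriting every URL
        if "http" in self.by_scheme:
            self.by_scheme.setdefault("https", self.by_scheme["http"])

    def get_downloader(self, url):
        scheme, sep, _ = url.partition(":")
        return self.by_scheme.get(scheme) if sep else None


registry = DownloaderRegistry([HttpDownloader])
print(registry.get_downloader("https://example.org/a.png"))
print(registry.get_downloader("ftp://example.org/a.png"))
```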
Mike Fährmann
523ebc9b0b
Fix serialization of 'datetime' objects in '--write-metadata'
Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.

This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.

(#251, #252)
2019-05-09 16:49:22 +02:00
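The fix relies on the `default` parameter of json.dump(). The encoder calls it for any object it cannot serialize, so passing `str` turns datetime values into strings. Below is a minimal shared helper along those lines. Its name echoes util.dump_json(), but the signature here is an assumption.

```python
import io
import json
import sys
from datetime import datetime


def dump_json(obj, fp=None, ensure_ascii=True, indent=4):
    """Write obj as JSON; default=str turns unsupported types into strings.

    Sketch in the spirit of util.dump_json(); parameters are assumptions.
    """
    if fp is None:
        fp = sys.stdout
    json.dump(obj, fp, ensure_ascii=ensure_ascii, indent=indent, default=str)
    fp.write("\n")


buffer = io.StringIO()
dump_json({"id": 1, "date": datetime(2019, 5, 9, 16, 49, 22)}, buffer)
print(buffer.getvalue())  # "date" is written as "2019-05-09 16:49:22"
```

Without `default=str`, the same call raises TypeError because datetime is not JSON serializable.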
Mike Fährmann
b09a8184ca
move TestJob into test module; test _extractor values 2019-02-17 18:18:31 +01:00
Mike Fährmann
ae353ed3b0
provide "extractor" and "job" keys for logging output
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
2019-02-14 11:09:58 +01:00
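One way to get this behavior with the standard logging module is a filter that attaches a placeholder object whenever a record lacks those fields. Any attribute lookup on the placeholder yields None, and formatting it prints "None". This is a sketch. The filter and placeholder classes are hypothetical, and the project may implement this differently, for example via a LogRecord factory.

```python
import logging
from types import SimpleNamespace


class _Absent:
    """Stand-in for a missing field: every attribute lookup yields None."""

    def __getattr__(self, name):
        return None

    def __str__(self):
        return "None"


class DefaultFieldsFilter(logging.Filter):
    """Hypothetical filter that guarantees 'extractor' and 'job' exist."""

    def filter(self, record):
        for key in ("extractor", "job"):
            if not hasattr(record, key):
                setattr(record, key, _Absent())
        return True


formatter = logging.Formatter("[{extractor.category}] {message}", style="{")
dflt = DefaultFieldsFilter()

# general message: no extractor attached
record = logging.LogRecord("demo", logging.INFO, "demo.py", 1, "hello", None, None)
dflt.filter(record)
print(formatter.format(record))  # [None] hello

# message emitted with an extractor attached
record = logging.LogRecord("demo", logging.INFO, "demo.py", 1, "hello", None, None)
record.extractor = SimpleNamespace(category="tumblr")
dflt.filter(record)
print(formatter.format(record))  # [tumblr] hello
```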
Mike Fährmann
89ee8cd7e4
filter "private" kwdict entries 2019-02-13 13:22:11 +01:00
Mike Fährmann
61741d7333
provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
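When a queue message carries the target class, the fast path described above becomes possible. This sketch makes two assumptions: the class travels in a kwdict entry named `_extractor` (the name also appears in another commit title in this log, but its use here is assumed), and URL matching is a simple prefix check. All classes and functions are hypothetical.

```python
class Extractor:
    """Hypothetical base class with a pattern-based constructor."""
    prefix = ""

    def __init__(self, url):
        self.url = url

    @classmethod
    def from_url(cls, url):
        # return an instance only if the URL matches this class
        return cls(url) if url.startswith(cls.prefix) else None


class GalleryExtractor(Extractor):
    prefix = "https://example.org/gallery/"


class ImageExtractor(Extractor):
    prefix = "https://example.org/image/"


ALL_EXTRACTORS = [GalleryExtractor, ImageExtractor]


def find(url):
    """Slow path: try every known extractor class in turn."""
    for cls in ALL_EXTRACTORS:
        extr = cls.from_url(url)
        if extr:
            return extr
    return None


def handle_queue(url, kwdict):
    """Use the class carried in the message if present, else search."""
    cls = kwdict.get("_extractor")
    return cls.from_url(url) if cls else find(url)
```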
Mike Fährmann
277b52101a
add 'category-transfer' option
[ci skip]
2019-01-19 20:28:19 +01:00
Mike Fährmann
5f38ac9609
[postprocessor:exec] add a better error message (#155) 2019-01-13 13:59:11 +01:00
Mike Fährmann
0225d90078
add exception name and traceback for OSErrors 2018-12-04 19:24:50 +01:00
Mike Fährmann
fb53b5dd55
fix control+c during -j and range tests 2018-11-25 18:54:05 +01:00
Mike Fährmann
13cb270326
set target directory before postprocessor init (fixes #126) 2018-11-21 22:21:26 +01:00
Mike Fährmann
b828473aa3
retry HTTP requests for more exception classes 2018-11-19 15:49:13 +01:00
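Retrying on a broader set of errors usually means widening the tuple of exception classes that a retry loop catches. Here is a generic sketch. The builtin exception types are chosen purely for illustration, since the commit does not list the actual classes or the retry policy.

```python
import time

# Hypothetical set of transient errors worth retrying; the real list of
# exception classes is not given in the commit message.
RETRY_EXCEPTIONS = (ConnectionError, TimeoutError)


def request_with_retries(send, retries=3, delay=0.0):
    """Call send(), retrying up to 'retries' times (retries + 1 attempts)."""
    for attempt in range(retries + 1):
        try:
            return send()
        except RETRY_EXCEPTIONS:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            time.sleep(delay)
```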
Mike Fährmann
c47482b110
smaller changes, missing docs, etc.
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
3c25fa2dad
update build_testresult_db.py script 2018-11-15 22:58:14 +01:00
Mike Fährmann
8ef84a6823
add option to enable/disable specific downloader modules
... and write URLs with no (active) downloader to unsupported-file
2018-11-13 18:06:36 +01:00
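This sketch shows the selection logic: look up a downloader by URL scheme, honor a per-module 'enabled' flag (the option name from a later commit in this log), and collect URLs that end up with no active downloader. The mapping layout is hypothetical, and a list stands in for the unsupported-file.

```python
def select_downloader(url, downloaders, enabled, unsupported):
    """Return the downloader name for url, or record url as unsupported.

    downloaders: mapping of scheme -> downloader name (hypothetical)
    enabled:     mapping of downloader name -> bool (missing means True)
    unsupported: list collecting URLs with no active downloader
    """
    scheme = url.partition(":")[0]
    name = downloaders.get(scheme)
    if name is None or not enabled.get(name, True):
        unsupported.append(url)
        return None
    return name
```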
Mike Fährmann
d3d7f01543
add 'prepare()' step for post-processors
This allows post-processors to modify the destination path before
checking if a file already exists.
2018-10-18 22:32:03 +02:00
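The ordering is what matters: prepare() runs for every post-processor first, and only then is the (possibly changed) path checked for an existing file. This sketch demonstrates that order. The classes and the subfolder behavior are hypothetical, and `exists` is injectable so the check can be exercised without touching the filesystem.

```python
import os


class PathFormat:
    """Hypothetical minimal path holder."""

    def __init__(self, path):
        self.path = path


class SubfolderPostProcessor:
    """Hypothetical post-processor that redirects files into a subfolder."""

    def prepare(self, pathfmt):
        # runs before the existence check, so it may change the target path
        head, tail = os.path.split(pathfmt.path)
        pathfmt.path = os.path.join(head, "archive", tail)

    def run(self, pathfmt):
        pass


def handle_file(pathfmt, postprocessors, exists=os.path.exists):
    """Let post-processors adjust the path, then decide whether to skip."""
    for pp in postprocessors:
        pp.prepare(pathfmt)
    if exists(pathfmt.path):
        return "skipped"
    for pp in postprocessors:
        pp.run(pathfmt)
    return "downloaded"
```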
Mike Fährmann
6ed629f2b6
allow specifying number of skips before abort/exit (closes #115)
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
2018-10-13 17:21:55 +02:00
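Here is a sketch of parsing 'abort:N' / 'exit:N' and counting consecutive skips. Two parts are assumptions based on the wording above: a bare 'abort' or 'exit' behaves like N=1, and a completed download resets the counter ("consecutively"). All names are hypothetical.

```python
def parse_skip(value):
    """Split a 'skip' value into (action, limit).

    'abort:3' -> ('abort', 3); a bare 'abort' is assumed to mean a limit of 1.
    Other values (e.g. True) are returned with no limit.
    """
    if isinstance(value, str):
        action, _, count = value.partition(":")
        if action in ("abort", "exit"):
            return action, (int(count) if count else 1)
    return value, None


class SkipCounter:
    """Count consecutive skips; a completed download resets the counter."""

    def __init__(self, value):
        self.action, self.limit = parse_skip(value)
        self.count = 0

    def skipped(self):
        """Return the action to take after a skip, or None to continue."""
        self.count += 1
        if self.limit is not None and self.count >= self.limit:
            return self.action
        return None

    def downloaded(self):
        self.count = 0
```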
Mike Fährmann
48a8717a7c
add 'output.num-to-str' option
... to convert any numeric values to string when outputting them as JSON
(during '--dump-json' or otherwise)
2018-10-08 20:28:54 +02:00
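A recursive conversion pass before JSON output is one way to implement such an option. In this sketch, bool values are excluded explicitly because bool is a subclass of int in Python. Whether the real option does the same is an assumption, as is the helper name.

```python
import json


def numbers_to_str(obj):
    """Recursively convert int and float values to str (bools are left alone).

    Hypothetical helper illustrating what 'output.num-to-str' could do.
    """
    if isinstance(obj, dict):
        return {key: numbers_to_str(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [numbers_to_str(value) for value in obj]
    if isinstance(obj, (int, float)) and not isinstance(obj, bool):
        return str(obj)
    return obj


print(json.dumps(numbers_to_str({"id": 42, "size": [1.5, 2], "nsfw": False})))
# {"id": "42", "size": ["1.5", "2"], "nsfw": false}
```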
Mike Fährmann
0514d6a0ae
make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range is now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
4a348990f4
adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
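This sketch shows the fallback order described above: an explicitly set 'downloader.http' value wins, otherwise the extractor-level value applies. It assumes a hypothetical nested-dict config, and it flattens 'extractor.*' into a single 'extractor' section instead of resolving per-category values.

```python
def resolve(config, key, default=None):
    """Look up a downloader option, falling back to the extractor value.

    'downloader.http.<key>' wins if explicitly set, otherwise
    'extractor.<key>' is used, then 'default'.
    """
    http = config.get("downloader", {}).get("http", {})
    if key in http:
        return http[key]
    return config.get("extractor", {}).get(key, default)


config = {
    "extractor": {"retries": 5, "timeout": 30.0},
    "downloader": {"http": {"timeout": 120.0}},
}
print(resolve(config, "timeout"))       # 120.0 (download-specific override)
print(resolve(config, "retries"))       # 5 (falls back to extractor value)
print(resolve(config, "verify", True))  # True (neither is set)
```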
Mike Fährmann
ca6ac4db6a
fix 'content' tests 2018-10-05 21:10:33 +02:00
Mike Fährmann
188876d814
implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.

TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00
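Routing on the 'ytdl:' prefix can be sketched as a simple string check. Two details are assumptions: the prefix is stripped before the URL is handed to youtube-dl, and every other URL is routed by its scheme. The function name is hypothetical.

```python
def route(url):
    """Pick a downloader name and the URL to hand to it.

    Hypothetical routing: 'ytdl:' URLs go to youtube-dl with the prefix
    stripped; everything else goes to the downloader named after its scheme.
    """
    prefix = "ytdl:"
    if url.startswith(prefix):
        return "ytdl", url[len(prefix):]
    return url.partition(":")[0], url
```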
Mike Fährmann
8c8da11bb8
do not create directory structures when using '-s' 2018-09-21 17:55:04 +02:00
Mike Fährmann
41249f3ead
improve extractor.get_downloader() 2018-09-05 18:17:16 +02:00
Mike Fährmann
712b58a93b
[postprocessor] add black-/whitelist options
Each post-processor config dict now supports a list of extractor
categories for which it should/shouldn't be active.

For example:
"postprocessors": [
    {"name": "classify",
     "whitelist": ["tumblr", "deviantart"],
     ...
    }
]
2018-09-03 14:53:43 +02:00
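Given a post-processor config dict like the one above, the activation check can be sketched as follows. Two parts are assumptions: 'whitelist' takes precedence when both lists are present, and a post-processor with neither list is always active. The function name is hypothetical.

```python
def is_active(pp_options, category):
    """Decide whether a post-processor applies to an extractor category.

    Assumes 'whitelist' takes precedence when both lists are present;
    without either list, the post-processor is always active.
    """
    whitelist = pp_options.get("whitelist")
    if whitelist is not None:
        return category in whitelist
    blacklist = pp_options.get("blacklist")
    if blacklist is not None:
        return category not in blacklist
    return True


options = {"name": "classify", "whitelist": ["tumblr", "deviantart"]}
print(is_active(options, "tumblr"))  # True
print(is_active(options, "pixiv"))   # False
```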
Mike Fährmann
4313c95bc9
improve error message for OAuth2 authentication 2018-08-11 23:54:25 +02:00
Mike Fährmann
973cf98e88
fix download skip for files without extension 2018-06-27 17:16:07 +02:00
Mike Fährmann
2403c405e3
Merge branch 'postprocessor' 2018-06-08 17:43:11 +02:00
Mike Fährmann
baccf8a958
improve postprocessor handling
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
2018-06-08 17:39:02 +02:00
Mike Fährmann
7646bdbcfd
improve postprocessor initialization code 2018-06-07 22:29:54 +02:00