102 Commits

Author SHA1 Message Date
Mike Fährmann
5bcf28de93
add an 'extractor.modules' option 2020-10-25 03:05:10 +01:00
Mike Fährmann
6ecb0a19cf
handle sys.stdin being None when using '-' as input file (#653) 2020-03-25 22:33:39 +01:00
Mike Fährmann
4bc161ca0f
prevent crash when sys.stdout and co. are None (#653) 2020-03-23 23:38:55 +01:00
Mike Fährmann
383795b550
prevent superfluous calls to Logger.makeRecord()
… by setting an appropriate minimal logging level for the root Logger.
2020-01-30 15:19:06 +01:00
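Here is a minimal Python sketch of the idea, using a hypothetical helper `setup_root_logger()` (not gallery-dl's actual code). It sets the root Logger's level to the most verbose level among the attached handlers, so `Logger.isEnabledFor()` drops messages that no handler would emit before `Logger.makeRecord()` is ever reached.

```python
import logging

def setup_root_logger(handlers):
    # Hypothetical helper: attach all handlers to the root logger and set
    # its level to the lowest (most verbose) handler level. Records below
    # that level are rejected by isEnabledFor(), so makeRecord() is skipped.
    root = logging.getLogger()
    minlevel = min((h.level for h in handlers), default=logging.WARNING)
    for handler in handlers:
        root.addHandler(handler)
    root.setLevel(minlevel)
    return root
```

Note that a handler left at `logging.NOTSET` (0) would make every message pass, so each handler should get an explicit level first.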
Mike Fährmann
f5604492c3
update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status 2019-10-29 15:56:54 +01:00
Mike Fährmann
03e0cec715
return with non-zero exit status on error 2019-10-27 23:54:18 +01:00
Mike Fährmann
5ac9732adc
call 'sys.exit()' on Ctrl+c 2019-09-10 16:53:21 +02:00
Mike Fährmann
6393b47db2
add '-A/--abort'; deprecate '--abort-on-skip' 2019-06-30 14:28:28 +02:00
Mike Fährmann
bd9cb3d191
improve job class selection code
+ consistent argument order for add_argument() calls
2019-05-10 22:05:57 +02:00
Mike Fährmann
e64773ffdd
allow multiple post-processor command-line options (#253)
... without overwriting any previous ones
2019-05-10 15:32:23 +02:00
Mike Fährmann
bc26fc2439
implement '--clear-cache'
Effectively clears all cached values from the cache database by
executing "DELETE FROM data" without any further user input.
2019-04-25 21:31:01 +02:00
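The same operation can be sketched with Python's `sqlite3` module. Only the table name `data` and the `DELETE FROM data` statement come from the commit message. The `clear_cache()` helper, and the idea of passing it an open connection, are assumptions made for illustration.

```python
import sqlite3

def clear_cache(conn: sqlite3.Connection) -> None:
    # Hypothetical helper: remove every cached value without asking for
    # confirmation. Using the connection as a context manager commits the
    # DELETE on success and rolls it back if an exception is raised.
    with conn:
        conn.execute("DELETE FROM data")
```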
Mike Fährmann
176b7253a1
update function signature for config.load() 2019-03-01 14:13:34 +01:00
Mike Fährmann
ae353ed3b0
provide "extractor" and "job" keys for logging output
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
2019-02-14 11:09:58 +01:00
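One way to get the described behaviour with the standard `logging` module is a handler filter that fills in a placeholder object for records lacking these fields. Every attribute lookup on that placeholder returns `None`. This is a sketch under that assumption; `NullObject` and `DefaultKeysFilter` are hypothetical names, not gallery-dl's implementation.

```python
import logging

class NullObject:
    # Hypothetical placeholder: any attribute or item access yields None,
    # and the object itself renders as "None" in format strings.
    def __getattr__(self, name):
        return None

    def __getitem__(self, key):
        return None

    def __str__(self):
        return "None"

NULL = NullObject()

class DefaultKeysFilter(logging.Filter):
    # Adds 'extractor' and 'job' to records that were logged without them,
    # so "{extractor.url}" or "{job}" in a format string never fails.
    def filter(self, record):
        for key in ("extractor", "job"):
            if not hasattr(record, key):
                setattr(record, key, NULL)
        return True
```

Attached via `handler.addFilter(DefaultKeysFilter())`, general log messages then show "None" wherever these fields appear in the format string.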
Mike Fährmann
bc0951d974
allow for simplified test data structures
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
2019-02-06 17:24:44 +01:00
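A normalization step like the one below would let test code accept all three shapes. The helper name, and the use of `None` as the RESULTS value for "only matching" tests, are assumptions.

```python
def normalize_tests(tests):
    # Hypothetical helper: turn the simplified forms into a list of
    # (URL, RESULTS) tuples. A bare string is an "only matching" test,
    # represented here with RESULTS = None (an assumption).
    if isinstance(tests, str):
        return [(tests, None)]
    if isinstance(tests, tuple):
        return [tests]
    return [(t, None) if isinstance(t, str) else t for t in tests]
```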
Mike Fährmann
344bbaa71a
remove useless line
A remnant from when `filter` and `range` were global and only
available as command line options.
2019-01-11 12:25:49 +01:00
Mike Fährmann
e26ba682a2
enforce utf-8 encoding for input files (#120) 2018-11-10 18:27:01 +01:00
Mike Fährmann
a36259d8f1
update setup.py
- add Python version check
- add classifiers
- simplify sys.exit() usage
2018-10-24 14:43:37 +02:00
Mike Fährmann
0514d6a0ae
make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range is now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
39f609b4c6
include current Git HEAD in debug output 2018-07-17 22:44:32 +02:00
Mike Fährmann
e8311eb1ed
drop Python 3.3 support 2018-07-17 21:21:27 +02:00
Mike Fährmann
12797e3b1f
update configuration.rst
... again

- some more 'Path' references
- fixed some inconsistencies and errors
- added note about logging config for files
2018-05-28 22:14:38 +02:00
Mike Fährmann
b08d95ebe4
add an 'encoding' option for logging files (default 'utf-8') 2018-05-25 16:29:45 +02:00
Mike Fährmann
2df1a15fb8
add '-s/--simulate' to run data extraction without download
Useful for quick testing (even though -g and -j kind of do the same)
and to fill a download archive without actually downloading the files.

-s does the same as the default behaviour, except downloading stuff.
Maybe it should get a more fitting name, as it does actually write to
disk (cache, archive)?
2018-05-25 16:07:18 +02:00
Mike Fährmann
8bf3cdd82b
implement logging options
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:

- format:
    format string for logging messages
    available keys: see [1]
    default: "[{name}][{levelname}] {message}"
- format-date:
    format string for {asctime} fields in logging messages
    available keys: see [2]
    default: "%Y-%m-%d %H:%M:%S"
- level:
    lowercase name of the minimum level at which messages are logged;
    available levels are debug, info, warning, error, exception
    default: "info"
- path:
    path of the file to be written to
- mode:
    'mode' argument when opening the specified file
    can be either "w" to truncate the file or "a" to append to it (see [3])

If 'output.log', '.logfile', or '.unsupportedfile' is a string, it is
interpreted as before: as the file path (or, for .log, as the format
string).

[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
2018-05-01 17:54:52 +02:00
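Below is a rough Python sketch of how such an option dict could be turned into a `logging.FileHandler`. The helper `make_file_handler()` and the `LEVELS` table are hypothetical. The format defaults come from the list above. The "w" mode default comes from commit c9a9664a65, and the 'encoding' key (default 'utf-8') from commit b08d95ebe4.

```python
import logging

# Assumption: the standard logging module has no EXCEPTION level;
# Logger.exception() logs at ERROR, so "exception" is mapped to ERROR here.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "exception": logging.ERROR,
}

def make_file_handler(opts):
    # Hypothetical helper: a plain string is treated as the file path.
    if isinstance(opts, str):
        opts = {"path": opts}
    handler = logging.FileHandler(
        opts["path"],
        opts.get("mode", "w"),
        encoding=opts.get("encoding", "utf-8"),
        delay=True,  # open the file lazily, on the first emitted record
    )
    handler.setLevel(LEVELS[opts.get("level", "info")])
    handler.setFormatter(logging.Formatter(
        opts.get("format", "[{name}][{levelname}] {message}"),
        opts.get("format-date", "%Y-%m-%d %H:%M:%S"),
        style="{",
    ))
    return handler
```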
Mike Fährmann
0381ae5318
replace error handlers for stdout and co.
Python 3.5 and lower raise a UnicodeEncodeError when printing characters
that cannot be encoded in a stream encoding other than 'utf-8'.
Setting their error handlers to 'replace' should help.
2018-04-04 17:30:42 +02:00
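The following sketch shows one way to switch a text stream to the 'replace' error handler. The helper `set_replace_errors()` is hypothetical. `TextIOWrapper.reconfigure()` only exists on Python 3.7+, so older versions fall back to rewrapping the underlying binary buffer.

```python
import io

def set_replace_errors(stream):
    # Hypothetical helper. Standard streams can be None, e.g. for Windows
    # GUI applications started with pythonw.
    if stream is None:
        return None
    if hasattr(stream, "reconfigure"):  # Python 3.7+
        stream.reconfigure(errors="replace")
        return stream
    if not hasattr(stream, "buffer"):
        return stream
    # Older Pythons: wrap the same binary buffer in a new TextIOWrapper.
    # The old wrapper must stay referenced (sys.__stdout__ does this for
    # sys.stdout), otherwise garbage collection closes the shared buffer.
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="replace",
        line_buffering=stream.line_buffering,
    )
```

Usage would look like `sys.stdout = set_replace_errors(sys.stdout)`, and the same for `sys.stderr`.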
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.

See the docstring of parse_inputfile() for details.

Example option specifiers:

- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
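The following is a hypothetical parser for a single option line in the syntax above. It is not gallery-dl's `parse_inputfile()`. It assumes that values are JSON and that dotted keys address nested config entries. With this naive prefix check, a per-URL key that itself begins with "G" would be misread as global.

```python
import json

def parse_option_line(line):
    # Hypothetical sketch: returns (scope, key_path, value), or None for
    # lines that are not option specifiers (e.g. plain URLs).
    line = line.strip()
    if line.startswith("-G"):
        scope, rest = "global", line[2:]
    elif line.startswith("-"):
        scope, rest = "local", line[1:]
    else:
        return None
    key, sep, value = rest.partition("=")
    if not sep:
        raise ValueError("missing '=' in option line: " + line)
    return scope, key.strip().split("."), json.loads(value.strip())
```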
Mike Fährmann
7f7c16ae37
add option to specify additional key-value pairs 2018-02-08 23:10:58 +01:00
Mike Fährmann
057668e17e
extend input-file format with per-URL config and comments
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
  {"extractor": {"key": "value"}} will override the whole "extractor"
  branch instead of merging {"key": "value"} into the already existing
  dictionary)
2018-02-07 21:47:27 +01:00
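The recursion mentioned in the TODO could look like the following sketch. A hypothetical `merge()` helper folds nested dictionaries into the existing config instead of replacing the whole branch.

```python
def merge(target, source):
    # Hypothetical helper: recursively merge 'source' into 'target'.
    # Nested dicts are merged key by key; any other value overwrites.
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value
    return target
```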
Mike Fährmann
d951f13e37
add config option for unsupported-URL file
for consistency's sake
2018-01-28 18:42:10 +01:00
Mike Fährmann
364e335440
smaller adjustments and improvements
- requests and urllib3 versions on one line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
2018-01-27 01:05:17 +01:00
Mike Fährmann
c9a9664a65
change --write-log behaviour
- log files now get truncated when opening them
  (mode "w" instead of "a")
- log verbosity to file depends on -q/-v
  (same as logging to stderr)
2018-01-27 00:51:40 +01:00
Mike Fährmann
97f4f15ec0
add option to write logging output to a file
- '--write-log FILE' as cmdline argument
- 'output.logfile' as config file option
2018-01-26 18:51:51 +01:00
Mike Fährmann
5488643fac
add requests and urllib3 versions to debug output 2017-12-27 22:12:40 +01:00
Mike Fährmann
0e5057b15d
remove deprecated options 2017-12-02 15:31:57 +01:00
Mike Fährmann
8a97bd0433
rename '--images' and '--chapters'
... to '--range' and '--chapter-range' to be consistent with
'--filter' and '--chapter-filter'
2017-09-23 17:31:40 +02:00
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
The same filter infrastructure that can be applied to image URLs now
also works for manga chapters and other delegated URLs.

TODO: actually provide any metadata (currently only deviantart and
imagefap are supported).
2017-09-12 16:19:00 +02:00
Mike Fährmann
470bbe9d8c
fix smaller stuff
- change filename option in example config file
- adapt default filename format for mangafox
- remove unnecessary newline

[skip ci]
2017-09-11 17:07:29 +02:00
Mike Fährmann
9b21d3f13c
add '--filter' command-line option
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
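Here is a minimal sketch of such an eval()-based filter. The helper `make_filter()` is hypothetical. The expression is compiled once and then evaluated with each file's metadata as its namespace. As the commit notes, this is not a sandbox: the expression can run arbitrary code.

```python
def make_filter(expr):
    # Hypothetical helper: compile the expression once, then evaluate it
    # against a copy of each metadata dict (eval() adds '__builtins__').
    code = compile(expr, "<filter>", "eval")

    def check(metadata):
        return bool(eval(code, dict(metadata)))

    return check
```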
Mike Fährmann
f7de048980
add additional debug output 2017-08-13 20:35:44 +02:00
Mike Fährmann
06c4cae05b
extend the output of '--list-extractors'
It now includes category and subcategory values for
each extractor class.
2017-06-28 18:51:47 +02:00
Mike Fährmann
d5a70f2580
add simple progress indicator for multiple URLs (#19)
The output can be configured via the 'output.progress'
config value.

Possible values:
    - true:     Show the default progress indicator
                "[{current}/{total}] {url}" (default)
    - false:    Never show the progress indicator
    - <string>: Show the progress indicator using this
                as a custom format string(1).
                Possible replacement keys are:
                - current: current URL index
                - total  : total number of URLs
                - url    : current URL

(1) https://docs.python.org/3/library/string.html#formatstrings
2017-06-09 20:12:15 +02:00
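Here is a small sketch of how the three documented 'output.progress' values could be handled. The function name `progress_message()` is hypothetical. It returns None when the indicator is disabled.

```python
DEFAULT_PROGRESS = "[{current}/{total}] {url}"

def progress_message(setting, current, total, url):
    # Hypothetical helper mirroring the 'output.progress' values above:
    # True -> default format, False -> no indicator, str -> custom format.
    if setting is False:
        return None
    fmt = DEFAULT_PROGRESS if setting is True else setting
    return fmt.format(current=current, total=total, url=url)
```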
Mike Fährmann
25bcdc8aa9
add --write-unsupported option (#15) 2017-05-27 16:16:57 +02:00
Mike Fährmann
701c016b97
add '-q/--quiet' option 2017-04-26 11:33:19 +02:00
Mike Fährmann
f0aa35ac84
add '--ignore-config' option 2017-04-25 17:09:10 +02:00
Mike Fährmann
5af35ea150
add -v/--verbose option and reduce error verbosity (#12)
2017-04-18 11:38:48 +02:00
Mike Fährmann
b43cd88101
add '-j/--dump-json' option
this outputs the extractor results in JSON format rather than
downloading files
2017-04-12 18:43:41 +02:00
Mike Fährmann
e4b3077168
improve config module
- speed improvements, especially in the 'interpolate' function
- 'interpolate' now prioritizes base-level values if they exist
  - "username" is chosen before "extractor.<category>.username"
  - -u/--username & co can now override config-file values
2017-03-27 11:59:27 +02:00
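Below is a sketch of the lookup order described above, assuming the config is a plain nested dict. It is not the config module's actual `interpolate()`. A base-level key wins outright. Otherwise the path is walked, and deeper levels override shallower ones (the latter rule is an assumption).

```python
def interpolate(config, path, key, default=None):
    # Hypothetical sketch: a base-level value (e.g. set via -u/--username)
    # takes priority over anything under "extractor.<category>".
    if key in config:
        return config[key]
    value = default
    node = config
    for part in path:
        node = node.get(part)
        if not isinstance(node, dict):
            break
        if key in node:
            value = node[key]  # more specific levels override
    return value
```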
Mike Fährmann
11d5c6f717
move option parsing to separate module 2017-03-23 16:29:40 +01:00
Mike Fährmann
abfe7456d6
add '-R/--retries' and '--http-timeout' options (#10)
2017-03-16 04:28:40 +01:00