Mike Fährmann
179d112083
[downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
6da3e21237
[downloader:ytdl] provide 'filename' metadata (closes #291)
2019-05-31 14:56:45 +02:00
Mike Fährmann
7973419b54
restrict downloader and postprocessor module imports
2019-04-16 18:09:30 +02:00
Mike Fährmann
114b8eecc5
[downloader:ytdl] utilize '_ytdl_index' metadata fields
2019-03-24 11:27:20 +01:00
Mike Fährmann
c14d44e1bc
[downloader:common] retry downloads on SSL errors (#130)
2018-12-14 16:33:04 +01:00
Mike Fährmann
b17a5d6f3b
give downloader classes proper names
2018-11-16 14:40:05 +01:00
Mike Fährmann
655549df7c
[downloader:ytdl] add several options
The "default" downloader options (rate, retries, timeout, verify) are
mapped to corresponding youtube-dl options.
downloader.ytdl.logging tells the downloader to pass youtube-dl's output
to a Logger object.
downloader.ytdl.raw-options allows passing arbitrary options to the
YoutubeDL constructor.
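
A minimal sketch of how such a mapping could look. The function name and the config keys are assumptions made for illustration; the target keys (ratelimit, retries, socket_timeout, nocheckcertificate) are regular YoutubeDL options, but the commit's actual mapping is not shown here.

```python
def build_ytdl_options(config):
    """Translate generic downloader settings into a YoutubeDL options dict.

    Hypothetical helper: 'config' is a plain dict using the option names
    mentioned in the commit message (rate, retries, timeout, verify).
    """
    options = {}

    rate = config.get("rate")
    if rate is not None:
        options["ratelimit"] = rate            # bytes per second

    retries = config.get("retries")
    if retries is not None:
        options["retries"] = retries

    timeout = config.get("timeout")
    if timeout is not None:
        options["socket_timeout"] = timeout    # seconds

    if config.get("verify") is False:
        options["nocheckcertificate"] = True   # skip TLS verification

    # Raw options are applied last so they can override the mapped values.
    options.update(config.get("raw-options") or {})
    return options
```

The resulting dict would then be handed to the YoutubeDL constructor.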
2018-10-20 18:26:49 +02:00
Mike Fährmann
4a348990f4
adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.
'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.
Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
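
The fallback order described above can be sketched roughly as follows. This is an illustration under simplifying assumptions: the config is a nested dict with flat "downloader.http" and "extractor" sections, and resolve() is a hypothetical helper, not the project's config API.

```python
_MISSING = object()  # sentinel so that explicit False/None values still count


def resolve(config, name, default):
    """Look up 'name' for file downloads, most specific section first."""
    for section in ("downloader.http", "extractor"):
        value = config.get(section, {}).get(name, _MISSING)
        if value is not _MISSING:
            return value
    return default


config = {
    "extractor": {"retries": 5, "timeout": 60},
    "downloader.http": {"timeout": 10},
}
retries = resolve(config, "retries", 4)    # taken from the extractor section
timeout = resolve(config, "timeout", 30)   # overridden for file downloads
verify = resolve(config, "verify", True)   # set nowhere, default applies
```

A sentinel is used instead of None so that an explicitly configured 'verify: false' is not mistaken for a missing value.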
2018-10-07 21:13:39 +02:00
Mike Fährmann
188876d814
implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.
TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00
Mike Fährmann
e9ae6fd080
improve downloader/postprocessor module loading
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
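
One way to get both properties is an allowlist check before importing. The sketch below is an assumption-laden illustration: find_module() and the allowlist contents are hypothetical, and standard-library modules stand in for the real downloader modules.

```python
import importlib

# Stand-ins for the permitted module names (hypothetical allowlist).
_ALLOWED = {"json", "base64"}


def find_module(name):
    """Return the module for an allowed name, or None for anything else."""
    # The type check runs first so unhashable or non-string arguments
    # (None, lists, ...) are rejected without raising.
    if not isinstance(name, str) or name not in _ALLOWED:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

Because only exact allowlisted names reach import_module(), strings such as "..config" never trigger an import.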
2018-09-05 16:39:40 +02:00
Mike Fährmann
973cf98e88
fix download skip for files without extension
2018-06-27 17:16:07 +02:00
Mike Fährmann
821535b458
adjust PathFormat class
2018-06-06 20:17:17 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
1d54a8e07d
fix logging output during downloads
from:
filename.ext[download][warning] ...
to:
filename.ext
[download][warning] ...
2018-03-01 18:43:43 +01:00
Mike Fährmann
915807dd77
log HTTP errors as warnings
2018-01-29 21:55:46 +01:00
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads
2018-01-26 18:11:13 +01:00
Mike Fährmann
b837420291
fix minor urllist issues
2018-01-19 22:54:15 +01:00
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch (closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
ebe9b0a04c
another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'
The previous version would have retried on several states which
would have no chance of ever succeeding (invalid URLs, etc.)
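
A minimal sketch of the new behavior. DownloadRetry is named in the message above; the download() wrapper, its signature, and the 'attempt' callable are assumptions for illustration.

```python
class DownloadRetry(Exception):
    """Signals a failure that may succeed on a later attempt."""


def download(attempt, retries=3):
    """Call 'attempt' until it succeeds or the retries are used up.

    Only DownloadRetry triggers another try; any other exception
    propagates immediately and aborts the download.
    """
    for _ in range(retries):
        try:
            return attempt()
        except DownloadRetry:
            continue
    return False  # retries exhausted
```

An invalid URL would surface as some other exception type and stop the loop after the first attempt.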
2017-12-07 15:31:14 +01:00
Mike Fährmann
8f518e03f8
add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option
This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
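
The general idea can be sketched as a simple throttle that sleeps whenever the transfer is ahead of schedule. This is a simplified illustration in the spirit of the Wget approach, not a full token bucket and not the code from this commit; the RateLimiter class and its parameters are hypothetical.

```python
import time


class RateLimiter:
    """Keep the average transfer rate at or below 'rate' bytes per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.start = clock()
        self.received = 0

    def consume(self, nbytes):
        """Account for a received chunk and sleep if we are too fast."""
        self.received += nbytes
        expected = self.received / self.rate   # time the data should take
        elapsed = self.clock() - self.start    # time it actually took
        delay = expected - elapsed
        if delay > 0:
            self.sleep(delay)
        return max(delay, 0.0)
```

A download loop would call consume(len(chunk)) after writing each chunk.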
2017-12-02 01:47:26 +01:00
Mike Fährmann
3dc1169736
use own mapping before relying on the 'mimetypes' module
2017-12-01 13:50:31 +01:00
Mike Fährmann
79bcaa8726
improve downloader retry behavior
- only retry download on 5xx and 429 status codes
- immediately fail on 4xx status codes
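
The two rules above amount to a small classification of status codes. A sketch, with classify_status() as a hypothetical helper name:

```python
def classify_status(status_code):
    """Map an HTTP status code to 'ok', 'retry' or 'fail'."""
    if status_code == 429 or 500 <= status_code < 600:
        return "retry"   # rate limiting or server error: may work later
    if 400 <= status_code < 500:
        return "fail"    # client error: retrying cannot help
    return "ok"
```

Note that 429 has to be checked before the general 4xx rule, since it is the one client error worth retrying.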
2017-11-10 21:46:18 +01:00
Mike Fährmann
42e948584d
fix downloader error handling
RequestException being a subclass of OSError caused all exceptions
during file downloads to be ignored/re-raised.
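
The pitfall can be reproduced with a stand-in class. The sketch below only assumes what the message states, that RequestException derives from OSError; the handler functions are hypothetical and do not reproduce the original code.

```python
class RequestException(OSError):
    """Stand-in with the same base class as requests' RequestException."""


def handle_wrong_order(action):
    try:
        action()
    except OSError:
        return "filesystem"      # also swallows every RequestException
    except RequestException:     # unreachable: OSError matched first
        return "network"
    return "ok"


def handle_right_order(action):
    try:
        action()
    except RequestException:     # most specific handler first
        return "network"
    except OSError:
        return "filesystem"
    return "ok"


def failing_request():
    raise RequestException("connection reset")
```

With the handlers in the wrong order, a network failure is reported as a filesystem problem.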
2017-11-07 15:23:07 +01:00
Mike Fährmann
707b15b586
create missing directories for 'part-directory'
also some code improvements regarding downloader config values
2017-10-27 12:22:45 +02:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
9a41002b77
fix partial downloads for 'text:' URLs
Using a filesize in bytes as offset into a Python string is not
a good idea if said file contains non-ASCII characters.
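
A short illustration of the mismatch, assuming UTF-8 output; remaining_bytes() is a hypothetical helper and not the fix applied in this commit.

```python
def remaining_bytes(content, bytes_written, encoding="utf-8"):
    """Return the part of 'content' that still has to be written.

    The offset is applied to the encoded bytes, because a file size
    counts bytes while a str index counts characters.
    """
    return content.encode(encoding)[bytes_written:]


content = "héllo wörld"
# Three bytes are already on disk: "h" (1 byte) and "é" (2 bytes).
wrong = content[3:]                                   # skips "h", "é", "l"
right = remaining_bytes(content, 3).decode("utf-8")   # skips "h", "é" only
```

Using the byte count as a string index silently drops one character for every multi-byte character in the part that was already written.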
2017-10-25 15:04:45 +02:00
Mike Fährmann
963670d73b
add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options
Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d
rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
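
The three points above could fit together roughly like this. The sketch leaves out the actual HTTP transfer; resume_headers() and finalize() are hypothetical helpers, and the use of a Range request header for continuation is an assumption about how resuming would be done.

```python
import os


def resume_headers(part_path):
    """Return (offset, headers) for continuing an incomplete download."""
    try:
        offset = os.path.getsize(part_path)
    except OSError:
        offset = 0  # no .part file yet: start from the beginning
    headers = {"Range": "bytes={}-".format(offset)} if offset else {}
    return offset, headers


def finalize(part_path, final_path, expected_size):
    """Rename the .part file once its size matches the server's report."""
    if expected_size is not None:
        if os.path.getsize(part_path) != expected_size:
            return False  # incomplete: keep the .part file for a later run
    os.replace(part_path, final_path)
    return True
```

Keeping the data under a '.part' name until the size check passes means an interrupted run never leaves a truncated file under the final filename.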
2017-10-24 12:53:03 +02:00
Mike Fährmann
2e982f56af
use 'Content-Length' to determine incomplete downloads (#29)
2017-10-20 18:56:18 +02:00
Mike Fährmann
b8862ff15e
add 'downloader.http.verify' option
(also: change the default 'timeout' from None to 30)
2017-08-31 15:21:08 +02:00
Mike Fährmann
d70c66c516
fix "text:" downloader
2017-08-16 12:11:47 +02:00
Mike Fährmann
58e95a7487
share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
fac6c02224
[downloader] fix extension from content-type
2017-06-19 09:24:00 +02:00
Mike Fährmann
107d29ad8a
improve handling of text:... URLs
- don't require // after the colon
- open output files in text mode
2017-05-12 14:10:25 +02:00
Mike Fährmann
48a5b11204
fix error if no file extension is found
2017-04-26 12:31:42 +02:00
Mike Fährmann
e3212dd98f
fix some smaller stuff
- remove support for old Windows config paths
- catch exception if cache-database can't be opened
- fix username/password settings for unit tests
- rename variable 'max_tries' to 'retries'
2017-03-27 14:30:32 +02:00
Mike Fährmann
e2b5cd9918
change config-path for 'retries' and 'timeout'
2017-03-26 18:24:46 +02:00
Mike Fährmann
0b5076815d
always delete incompletely downloaded files
2017-03-21 15:53:43 +01:00
Mike Fährmann
22910f9562
improve error handling of http file downloads (#10)
2017-03-16 04:17:35 +01:00
Mike Fährmann
4f123b8513
code adjustments according to pep8
2017-01-30 19:40:15 +01:00
Mike Fährmann
3c1daef839
don't delete downloaded files in certain edge cases
2016-11-27 23:43:25 +01:00
Mike Fährmann
2b2bdce366
don't raise an exception if a download fails (#5)
2016-11-23 13:07:44 +01:00
Mike Fährmann
dd8236e733
enable non-standard MIME types
2016-09-30 16:41:49 +02:00
Mike Fährmann
29692c5784
get extension from Content-Type header if not provided
2016-09-30 12:32:48 +02:00
Mike Fährmann
ecc6542fc8
change required parameter type to file-like objects
2015-12-21 22:46:49 +01:00
Mike Fährmann
a8c0b4531d
fix issue with Ctrl+C on Windows
2015-12-02 01:01:33 +01:00
Mike Fährmann
4b377ccc09
use output-module during downloads
2015-12-01 21:22:58 +01:00
Mike Fährmann
352950eebe
new method to import downloaders
2015-11-12 02:29:59 +01:00
Mike Fährmann
28fa7c53b4
docstrings and other small fixes for downloaders
2015-04-10 21:45:41 +02:00
Mike Fährmann
5545624da1
use separate session in http downloader
2015-04-10 19:19:12 +02:00