gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	6ed629f2b6	allow specifying number of skips before abort/exit (closes #115) In addition to 'abort' and 'exit', it is now possible to specify 'abort:N' and 'exit:N' (where N is any integer) as value for 'skip' to abort/exit after consecutively skipping N downloads.	2018-10-13 17:21:55 +02:00
Mike Fährmann	48a8717a7c	add 'output.num-to-str' option ... to convert any numeric values to string when outputting them as JSON (during '--dump-json' or otherwise)	2018-10-08 20:28:54 +02:00
Mike Fährmann	0514d6a0ae	make --filter and --range config-file options The functionality of --(chapter-)filter and --(chapter-)range is now also exposed as the following config-file options: - extractor..image-filter - extractor..image-range - extractor..chapter-filter - extractor..chapter-range TODO: update configuration.rst	2018-10-07 21:39:56 +02:00
Mike Fährmann	4a348990f4	adjust value resolution for retries/timeout/verify options This change introduces 'extractor..retries/timeout/verify' options as a general way to set these values for all HTTP requests. 'downloader.http.retries/timeout/verify' is a way to override these options for file downloads only and will fall back to 'extractor..…* values if they haven't been explicitly set. Also: downloader classes now take an extractor object as first argument instead of a requests.session.	2018-10-07 21:13:39 +02:00
Mike Fährmann	ca6ac4db6a	fix 'content' tests	2018-10-05 21:10:33 +02:00
Mike Fährmann	188876d814	implement youtube-dl downloader module URLs starting with 'ytdl:' will now be handled by youtube-dl. There is probably a lot to fix and improve, but the basic use case works. TODO: - format selection and ytdl options in general - better filename/path handling - ytdl support for "unsupported URLs" - ...	2018-10-05 18:05:11 +02:00
Mike Fährmann	8c8da11bb8	do not create directory structures when using '-s'	2018-09-21 17:55:04 +02:00
Mike Fährmann	41249f3ead	improve extractor.get_downloader()	2018-09-05 18:17:16 +02:00
Mike Fährmann	712b58a93b	[postprocessor] add black-/whitelist options Each post-processor config dict now supports a list of extractor categories for which it should/shouldn't be active for. For example: "postprocessors": [ {"name": "classify", "whitelist": ["tumblr", "deviantart"], ... } ]	2018-09-03 14:53:43 +02:00
Mike Fährmann	4313c95bc9	improve error message for OAuth2 authentication	2018-08-11 23:54:25 +02:00
Mike Fährmann	973cf98e88	fix download skip for files without extension	2018-06-27 17:16:07 +02:00
Mike Fährmann	2403c405e3	Merge branch 'postprocessor'	2018-06-08 17:43:11 +02:00
Mike Fährmann	baccf8a958	improve postprocessor handling - add pathfmt argument for __init__() - add finalization step - add option to keep or delete zipped files	2018-06-08 17:39:02 +02:00
Mike Fährmann	7646bdbcfd	improve postprocessor initialization code	2018-06-07 22:29:54 +02:00
Mike Fährmann	821535b458	adjust PathFormat class	2018-06-06 20:17:17 +02:00
Mike Fährmann	2df1a15fb8	add '-s/--simulate' to run data extraction without download Useful for quick testing (even though -g and -j kind of do the same) and to fill a download archive without actually downloading the files. -s does the same as the default behaviour, except downloading stuff. Maybe it should get a more fitting name, as it does actually write to disk (cache, archive)?	2018-05-25 16:07:18 +02:00
Mike Fährmann	76c32d58e5	[postprocessor] initial code	2018-05-22 14:59:22 +02:00
Mike Fährmann	8bf3cdd82b	implement logging options Standard logging to stderr, logfiles, and unsupported URL files (which are now handled through the logging module) can now be configured by setting their respective option keys (log, logfile, unsupportedfile) to a dict and specifying the following options: - format: format string for logging messages available keys: see [1] default: "[{name}][{levelname}] {message}" - format-date: format string for {asctime} fields in logging messages available keys: see [2] default: "%Y-%m-%d %H:%M:%S" - level: the lowercase levelname until which the logger should activate; available levels are debug, info, warning, error, exception default: "info" - path: path of the file to be written to - mode: 'mode' argument when opening the specified file can be either "w" to truncate the file or "a" to append to it (see [3]) If 'output.log', '.logfile', or '.unsupportedfile' is a string, it will be interpreted, as it has been, as the filepath (or as format string for .log) [1] https://docs.python.org/3/library/logging.html#logrecord-attributes [2] https://docs.python.org/3/library/time.html#time.strftime [3] https://docs.python.org/3/library/functions.html#open	2018-05-01 17:54:52 +02:00
Mike Fährmann	9fb82e6b43	apply expand_path() to archive paths	2018-03-08 18:06:39 +01:00
Mike Fährmann	f970a8f13c	fix adding keys to download archive when using skip=false	2018-02-13 23:45:30 +01:00
Mike Fährmann	be3ea4425d	test archive-id creation and uniqueness	2018-02-12 23:02:09 +01:00
Mike Fährmann	3cec533c28	Merge branch 'archive'	2018-02-12 18:07:58 +01:00
Mike Fährmann	4d2fadfb6f	restore skip actions with download archive	2018-02-12 16:56:45 +01:00
Mike Fährmann	7f7c16ae37	add option to specify additional key-value pairs	2018-02-08 23:10:58 +01:00
Mike Fährmann	8c3b713362	rework DownloadJob.handle_url(); include archive functionality todo: "abort" and "exit" skip modes if download is skipped because of archive	2018-02-01 20:49:41 +01:00
Mike Fährmann	db7f04dd97	emit log messages on download failure and when retrying with fallback URLs	2018-01-28 18:44:10 +01:00
Mike Fährmann	27fce6f600	fix UrlJob behavior	2018-01-23 15:42:26 +01:00
Mike Fährmann	b837420291	fix minor urllist issues	2018-01-19 22:54:15 +01:00
Mike Fährmann	9d69401391	initial support for multiple URLs per image	2018-01-17 22:08:19 +01:00
Mike Fährmann	6174a5c4ef	[download] adjust filename extension on filetype mismatch (closes #63)	2018-01-17 18:37:06 +01:00
Mike Fährmann	1a70857a12	update extractor-unittest capabilities - "count" can now be a string defining a comparison in the form of '<operator> <value>', for example: '> 12' or '!= 1'. If its value is not a string, it is assumed to be a concrete integer as before. - "keyword" can now be a dictionary defining tests for individual keys. These tests can either be a type, a concrete value or a regex starting with "re:". Dictionaries can be stacked inside each other. Optional keys can be indicated with a "?" before their name. For example: "keyword": { "image_id": int, "gallery_id": 123, "name": "re:pattern", "user": { "id": 321, }, "?optional": None, }	2017-12-30 19:05:37 +01:00
Mike Fährmann	88bb0798fd	delay initialization of PathFormat objects This allows the DeviantArt group-check to be moved inside the Extractor.items() method which in turn allows for better exception handling. As a new general rule: Never raise exceptions during extractor initialization.	2017-12-29 22:15:57 +01:00
Mike Fährmann	9d73ed4772	fix issue with using 'skip()' when a filter is present calling skip() skips over unfiltered items and does not apply the filter expression to them, which is not what should happen	2017-12-27 22:09:10 +01:00
Mike Fährmann	291369eab2	various smaller changes/additions	2017-12-06 21:45:56 +01:00
Mike Fährmann	4fb6803fa6	add option to sleep before each download	2017-12-04 17:33:10 +01:00
Mike Fährmann	6c9da67581	apply selection options (filter, range) when using '-j'	2017-11-18 17:35:57 +01:00
Mike Fährmann	27c026543f	re-enable download unit tests	2017-10-25 12:55:36 +02:00
Mike Fährmann	2e982f56af	use 'Content-Length' to determine incomplete downloads (#29)	2017-10-20 18:56:18 +02:00
Mike Fährmann	2ef3c35c98	smaller textual changes - swapped doc for deviantart.mature and .original - updated gallery-dl.conf - "transferred" -> "delegated"	2017-10-09 23:23:19 +02:00
Mike Fährmann	0386503c80	fix (sub)category-transfer for DownloadJob instances (#41) ... and extend "parent" parameters to TestJob- and DataJob-classes as well.	2017-10-06 15:38:35 +02:00
Mike Fährmann	b319f4bab3	smaller code and text changes	2017-10-01 18:23:40 +02:00
Mike Fährmann	26a866e7d8	implement (sub)category-transfer between extractors (#41) ImageFap- and all Manga-Extractors will transfer their (sub)category values to other extractors instantiated by them, which will in turn allow those to use options set for their parents. Example: ImagefapGalleryExtractors will use options set under extractor.imagefap.user, if (and only if) they have been instantiated by an ImagefapUserExtractor; and options from extractor.imagefap.gallery otherwise.	2017-09-26 21:05:11 +02:00
Mike Fährmann	9c138dfc1f	[common] detect empty HTTP response bodies	2017-09-26 16:49:58 +02:00
Mike Fährmann	0dedbe759c	enable '--chapter-filter' The same filter infrastructure that can be applied to image URLs now also works for manga chapters and other delegated URLs. TODO: actually provide any metadata (currently supported is only deviantart and imagefap).	2017-09-12 16:19:00 +02:00
Mike Fährmann	5704c709fa	apply filter before range	2017-09-09 14:51:31 +02:00
Mike Fährmann	9b21d3f13c	add '--filter' command-line option This allows for image filtering via Python expressions by the same metadata that is also used to build filenames (--list-keywords). The usually shunned eval() function is used to evaluate filter-expressions, but it seemed quite appropriate in this case and shouldn't introduce any new security issues, as any attacker that could do > gallery-dl --filter "delete-everything()" ... could as well do > python -c "delete-everything()"	2017-09-08 17:52:00 +02:00
Mike Fährmann	268cfa3cfe	filter duplicate URLs (#36) Duplicate URLs might occur if, for example, an artist adds another image to his gallery while an extractor is running and images are being downloaded on sites like pixiv/nijie/hentaifoundry. The next image on the next page will have already been downloaded and will cause a premature end if '--abort-on-skip' is being used.	2017-09-06 17:08:50 +02:00
Mike Fährmann	47bcf53ec1	implement support for additional unit test result types - "pattern" matches all resulting URLs against the given regex - "count" allows to specify the amount of returned URLs	2017-08-25 22:01:14 +02:00
Mike Fährmann	ae2d61e5b3	handle format string exceptions separately	2017-08-11 21:48:37 +02:00
Mike Fährmann	3c9f190757	extend output of --list-keywords	2017-08-10 17:36:21 +02:00
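
Commit 6ed629f2b6 describes 'abort:N' and 'exit:N' values for 'skip'. The following is a minimal sketch of how such a value could be parsed and how consecutive skips could be counted; it is not taken from the gallery-dl source. The names `parse_skip` and `SkipCounter` are hypothetical, and two behaviours are assumptions: a plain 'abort' or 'exit' is treated as a limit of 1, and a successful download resets the count.

```python
def parse_skip(value):
    """Split a 'skip' option value into (action, limit).

    Hypothetical helper, not gallery-dl's own code. Accepts True/False,
    'abort', 'exit', 'abort:N' and 'exit:N'.
    """
    if isinstance(value, str):
        action, _, count = value.partition(":")
        if action not in ("abort", "exit"):
            raise ValueError("unsupported skip value: " + value)
        # Assumption: without ':N' the action triggers on the first skip.
        return action, int(count) if count else 1
    # Booleans only decide whether existing files are skipped at all.
    return ("skip" if value else "download"), None


class SkipCounter:
    """Count consecutive skips and report when the limit is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def skipped(self):
        """Register a skipped file; return True once the limit is hit."""
        self.count += 1
        return self.limit is not None and self.count >= self.limit

    def downloaded(self):
        """Assumption: a completed download resets the consecutive count."""
        self.count = 0
```

With `parse_skip("abort:3")` the caller would get `("abort", 3)` and could stop the current job once `SkipCounter(3).skipped()` returns `True`.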

82 Commits
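
Commit 1a70857a12 lets a unit test's "count" be either a concrete integer or a string of the form '<operator> <value>'. The sketch below shows one way such a check could be evaluated. `check_count` and the operator table are hypothetical names chosen for illustration, and the set of supported operators is an assumption; the commit message only shows '>' and '!='.

```python
import operator

# Assumed operator set; the commit message itself only shows '>' and '!='.
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def check_count(actual, expected):
    """Return True if 'actual' satisfies the 'count' specification.

    'expected' is either a concrete integer or a string such as '> 12'.
    """
    if isinstance(expected, str):
        # '<operator> <value>': split at the first space
        symbol, _, value = expected.partition(" ")
        return _OPERATORS[symbol](actual, int(value))
    return actual == expected
```

A test declaring `"count": "> 12"` would then pass for an extractor that returns 13 URLs and fail for one that returns 12.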