gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	3201fe3521	add global SENTINEL object	2020-05-19 22:32:53 +02:00
Mike Fährmann	c8787647ed	add global WINDOWS bool	2020-05-19 22:32:53 +02:00
Mike Fährmann	ece73b5b2a	make 'path' and 'keywords' available in logging messages Wrap all loggers used by job, extractor, downloader, and postprocessor objects into a (custom) LoggerAdapter that provides access to the underlying job, extractor, pathfmt, and kwdict objects and their properties. __init__() signatures for all downloader and postprocessor classes have been changed to take the current Job object as their first argument, instead of the current extractor or pathfmt. (#574, #575)	2020-05-18 19:04:51 +02:00
Mike Fährmann	abbd8fbbd9	reset filenames on empty file extensions (#733)	2020-05-18 19:04:50 +02:00
Mike Fährmann	38bc6430d3	[downloader:http] don't overwrite existing '_mtime' fields	2020-04-10 23:08:03 +02:00
Mike Fährmann	9159cb8fb3	remove trailing dots and spaces from directory names (#647)	2020-03-19 21:12:18 +01:00
Mike Fährmann	90e4c645ba	[formatter] allow multiple "special" format specifiers (#595) It is now, for example, possible to specify multiple replacement operations per format replacement field: {name:Ra/b/Rc/d/}	2020-02-16 21:47:08 +01:00
Mike Fährmann	219c4cc78c	[formatter] allow for numeric list and string indices	2020-02-15 22:46:22 +01:00
Mike Fährmann	7d1da614d9	[formatter] implement field name alternatives (#525) The format string '{a|b|c}' will now try to use the value from 'a' and fall back to 'b' and 'c' if accessing a field raises an exception or if its value is None.	2020-02-15 17:58:21 +01:00
Mike Fährmann	56f1c96168	implement 'parent-directory' option (#551)	2020-01-29 18:32:37 +01:00
Mike Fährmann	2a9be48511	improve util.load/save_cookiestxt() and add tests - take a file object as argument instead of a filename - accept whitespace before comments (" # comment") - map expiration "0" to None and not the number 0	2020-01-25 23:02:15 +01:00
Mike Fährmann	c1a6862863	implement functions to load/save cookies.txt files (closes #586) The methods of the standard library's MozillaCookieJar have several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.) and require construction of an ultimately pointless CookieJar object.	2020-01-21 21:59:36 +01:00
Mike Fährmann	760b9b4db4	add remove_file() and remove_directory() helpers these functions call os.unlink() or os.rmdir() while catching and suppressing potential OSErrors	2020-01-18 00:21:26 +01:00
Mike Fährmann	b2d542ad40	improve PathFormat._enum_file() open only one try-except block for the whole loop, instead of one for each iteration in os.path.exists()	2020-01-18 00:21:25 +01:00
Mike Fährmann	025f6e3398	add fallback for missing WITHOUT ROWID support (#553)	2020-01-03 22:58:28 +01:00
Mike Fährmann	58391d492d	cache archive keys generated in __contains__() (#524) To avoid writing a different key to the archive than what was checked against before the file download.	2019-12-20 16:43:08 +01:00
Mike Fährmann	0f1538af78	split filename formatting into its own function	2019-11-29 22:32:07 +01:00
Mike Fährmann	3fc1e12949	[postprocessor:metadata] filter private entries i.e. keys starting with an underscore	2019-11-21 16:58:44 +01:00
Mike Fährmann	d5e3910270	adjust 'util.raises()'	2019-10-28 15:06:17 +01:00
Mike Fährmann	c887493a80	overhaul exception stuff	2019-10-27 23:53:37 +01:00
Mike Fährmann	776e9e073f	close archive on job completion (#417)	2019-09-10 22:43:51 +02:00
Mike Fährmann	0ce98169b8	improve path generation - fix 'abspath()' results for Python <3.7 (closes #402) - 'abspath()' in Python 3.7+ removes trailing path separators - in Python <3.7 it doesn't - filter empty path segments	2019-08-28 23:25:18 +02:00
Mike Fährmann	3284c62f22	ensure PathFormat.directory ends with a path separator ... plus some other small optimizations	2019-08-20 00:25:13 +02:00
Mike Fährmann	e77a656437	optimize directory path generation - use str.join() instead of os.path.join() (less "features", but 10x as fast) - cache directory formatters - detect and optimize field access for 1-element format strings	2019-08-19 15:56:20 +02:00
Mike Fährmann	454bf1ebf9	preserve enumeration index after 'set_extension()' (#306)	2019-08-16 23:12:33 +02:00
Mike Fährmann	f5039b897f	replace DownloadArchive.check() with __contains__() Interestingly enough, 'a in obj' is slightly faster than 'obj.check(a)' and is also nicer to look at, I think.	2019-08-16 23:12:32 +02:00
Mike Fährmann	5a210991b6	Remove control characters from filesystem paths - add 'path-remove' option to specify the set of characters that should be removed - rename 'restrict-filenames' to 'path-restrict' - #348, #380	2019-08-16 23:12:16 +02:00
Mike Fährmann	0bb873757a	update PathFormat class - change 'has_extension' from a simple flag/bool to a field that contains the original filename extension - rename 'keywords' to 'kwdict' and some other stuff as well - inline 'adjust_path()' - put enumeration index before filename extension (#306)	2019-08-12 21:40:37 +02:00
Mike Fährmann	8dc42bb178	implement 'enumerate' for 'extractor.skip' (#306) [ci skip]	2019-08-08 18:37:54 +02:00
Mike Fährmann	b1bea8aaeb	add 'restrict-filenames' option (#348)	2019-07-23 17:41:24 +02:00
Mike Fährmann	7b77ecc35a	fix paths for files without extension (#220)	2019-07-15 16:39:03 +02:00
Mike Fährmann	16c582aaf9	implement 'mtime' post-processor (#332) This can set a file's modification time according to a UNIX timestamp or a datetime object from its metadata.	2019-07-14 22:39:17 +02:00
Mike Fährmann	40da44b17f	Merge branch 'v1.9.0'	2019-06-29 15:39:52 +02:00
Mike Fährmann	95b1e4c3c0	implement R<old>/<new>/ format option (#318)	2019-06-23 22:45:44 +02:00
Mike Fährmann	f4ba98771d	use Last-Modified header to set file modification time (#236, #277)	2019-06-19 23:16:32 +02:00
Mike Fährmann	523ebc9b0b	Fix serialization of 'datetime' objects in '--write-metadata' Simplified universal serialization support in json.dump() can be achieved by passing 'default=str', which was already the case in DataJob.run() for -j/--dump-json, but not for the 'metadata' post-processor. This commit introduces util.dump_json() that (more or less) unifies the JSON output procedure of both --write-metadata and --dump-json. (#251, #252)	2019-05-09 16:49:22 +02:00
Mike Fährmann	23baecb29e	fix 'CONVERSIONS' variable name	2019-03-05 22:50:56 +01:00
Mike Fährmann	105097ddcf	add 'S' conversion options for format string fields Same as 's' (convert to string), but has a better, human-readable conversion for lists.	2019-03-04 21:13:34 +01:00
Mike Fährmann	148b8f15d0	update tests for util.py	2019-02-14 11:15:19 +01:00
Mike Fährmann	ae353ed3b0	provide "extractor" and "job" keys for logging output This allows for stuff like "{extractor.url}" and "{extractor.category}" in logging format strings. Accessing 'extractor' and 'job' in any way will return "None" if those fields aren't defined, i.e. in general logging messages.	2019-02-14 11:09:58 +01:00
Mike Fährmann	79c01ec7ae	implement J<separator>/ format option J joins list elements by calling <separator>.join(list): Example: {f:J - /} -> "a - b - c" (if "f" is ["a", "b", "c"])	2019-01-17 17:01:58 +01:00
Mike Fährmann	c5d4f558c9	allow missing field access keys in format strings (#136)	2018-12-22 13:54:14 +01:00
Mike Fährmann	d3d7f01543	add 'prepare()' step for post-processors This allows post-processors to modify the destination path before checking if a file already exists.	2018-10-18 22:32:03 +02:00
Mike Fährmann	6ed629f2b6	allow specifying number of skips before abort/exit (closes #115) In addition to 'abort' and 'exit', it is now possible to specify 'abort:N' and 'exit:N' (where N is any integer) as value for 'skip' to abort/exit after consecutively skipping N downloads.	2018-10-13 17:21:55 +02:00
Mike Fährmann	48a8717a7c	add 'output.num-to-str' option ... to convert any numeric values to string when outputting them as JSON (during '--dump-json' or otherwise)	2018-10-08 20:28:54 +02:00
Mike Fährmann	0514d6a0ae	make --filter and --range config-file options The functionality of --(chapter-)filter and --(chapter-)range are now also exposed as the following config-file options: - extractor..image-filter - extractor..image-range - extractor..chapter-filter - extractor..chapter-range TODO: update configuration.rst	2018-10-07 21:39:56 +02:00
Mike Fährmann	590c0b3ad5	re-implement and improve filename formatter A format string now gets parsed only once instead of re-parsing it each time it is applied to a set of data. The initial parsing causes directory path creation to be about 2x slower than before, since each format string there is used only once, but building a filename, the more common operation, is at least 2x faster. The "directory slowness" cancels out at about 5 filenames and everything above that is significantly faster.	2018-08-25 10:45:14 +02:00
Mike Fährmann	c83fc62abc	prioritize archive over disk access (#87)	2018-07-30 17:48:23 +02:00
Mike Fährmann	e0dd8dff5f	implement L<maxlen>/<replacement>/ format option The L option allows for the contents of a format field to be replaced with <replacement> if its length is greater than <maxlen>. Example: {f:L5/too long/} -> "foo" (if "f" is "foo") -> "too long" (if "f" is "foobar") (#92) (#94)	2018-07-29 13:52:07 +02:00
Mike Fährmann	8fe9056b16	implement string slicing for format strings It is now possible to slice string (or list) values of format string replacement fields with the same syntax as in regular Python code. "{digits}" -> "0123456789" "{digits[2:-2]}" -> "234567" "{digits[:5]}" -> "01234" The optional third parameter (step) has been left out to simplify things.	2018-07-14 09:53:15 +02:00
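
Several of the commits above describe format-string behavior: slicing, the L, J and R options, and '{a|b|c}' field alternatives. The following is a minimal Python sketch of that behavior as the commit messages describe it. It is not gallery-dl's formatter; every function name here is hypothetical, and returning None when no alternative matches is an assumption, since the log does not say what happens in that case.

```python
def slice_value(value, start=None, stop=None):
    """'{digits[2:-2]}': slice a string or list (no step, as in the commit)."""
    return value[start:stop]


def apply_maxlen(value, maxlen, replacement):
    """L<maxlen>/<replacement>/: use replacement if value is longer than maxlen."""
    return replacement if len(value) > maxlen else value


def apply_join(values, separator):
    """J<separator>/: join list elements with the separator."""
    return separator.join(values)


def apply_replacements(value, pairs):
    """R<old>/<new>/, possibly repeated: apply each replacement in order."""
    for old, new in pairs:
        value = value.replace(old, new)
    return value


def first_available(kwdict, names):
    """'{a|b|c}': first field that can be accessed and is not None.

    Assumption: returns None if every alternative fails.
    """
    for name in names:
        try:
            value = kwdict[name]
        except KeyError:
            continue
        if value is not None:
            return value
    return None


digits = "0123456789"
print(slice_value(digits, 2, -2))                          # 234567
print(slice_value(digits, None, 5))                        # 01234
print(apply_maxlen("foo", 5, "too long"))                  # foo
print(apply_maxlen("foobar", 5, "too long"))               # too long
print(apply_join(["a", "b", "c"], " - "))                  # a - b - c
print(apply_replacements("a-c", [("a", "b"), ("c", "d")]))  # b-d
print(first_available({"a": None, "b": "x"}, ["a", "b", "c"]))  # x
```

The printed values for slicing, L and J match the examples given in the commit messages themselves; the R and alternatives inputs are made up for illustration.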

115 Commits
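
Two of the utility changes in the log are easy to illustrate: the remove_file() and remove_directory() helpers (commit 760b9b4db4), which call os.unlink() or os.rmdir() while suppressing OSError, and the use of 'default=str' for JSON output (commit 523ebc9b0b). The sketch below follows those descriptions only. The bodies are assumptions rather than gallery-dl's actual code, and dumps_with_str_fallback() is a hypothetical name, not util.dump_json().

```python
import datetime
import json
import os


def remove_file(path):
    """Delete a file, ignoring any OSError (e.g. the file does not exist)."""
    try:
        os.unlink(path)
    except OSError:
        pass


def remove_directory(path):
    """Delete an empty directory, ignoring any OSError (e.g. not empty)."""
    try:
        os.rmdir(path)
    except OSError:
        pass


def dumps_with_str_fallback(obj):
    """Serialize to JSON, converting unsupported types with str()."""
    return json.dumps(obj, default=str)


# A datetime is not JSON-serializable by default; default=str converts it.
print(dumps_with_str_fallback({"date": datetime.datetime(2019, 5, 9, 16, 49, 22)}))
# {"date": "2019-05-09 16:49:22"}
```

Suppressing OSError keeps cleanup code short at the call site, at the cost of hiding failures such as permission errors.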