gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	d02f7c1118	improve Extractor.wait() - allow 'until' to be a datetime object - do "time calculations" with UTC timestamps - set a default 'reason'	2020-04-05 21:23:05 +02:00
Mike Fährmann	2a4f227e08	warn about expired cookies	2020-02-25 00:34:42 +01:00
Mike Fährmann	56f1c96168	implement 'parent-directory' option (#551)	2020-01-29 18:32:37 +01:00
Mike Fährmann	2a9be48511	improve util.load/save_cookiestxt() and add tests - take a file object as argument instead of a filename - accept whitespace before comments (" # comment") - map expiration "0" to None and not the number 0	2020-01-25 23:02:15 +01:00
Mike Fährmann	c1a6862863	implement functions to load/save cookies.txt files (closes #586) The methods of the standard library's MozillaCookieJar have several shortcomings (#HttpOnly_ cookies, 0 expiration timestamps, etc.) and require construction of an ultimately pointless CookieJar object.	2020-01-21 21:59:36 +01:00
Mike Fährmann	bd5ce9855c	allow GalleryExtractors to set URL-independent extensions	2020-01-14 11:53:32 +01:00
Mike Fährmann	3811fd8a25	fix time formatting for Python 3.4 and 3.5 'datetime.time.isoformat()' only has an optional 'timespec' argument since Python 3.6.	2020-01-05 00:47:10 +01:00
Mike Fährmann	569747a78d	implement extractor.wait()	2020-01-04 23:42:07 +01:00
Mike Fährmann	ce54b8c04c	let extractors opt-out of cookie option usage useful to avoid sending unnecessary cookies when all authentication is done through OAuth tokens	2020-01-01 21:12:37 +01:00
Mike Fährmann	d3e44e899d	raise NotFoundErrors for 404 responses in GalleryExtractors	2019-12-13 18:42:04 +01:00
Mike Fährmann	a4dd8b3dab	improve _check_cookies() Only loop over all cookies once instead of calling cookiejar._find() for each cookie name.	2019-12-13 15:51:20 +01:00
Mike Fährmann	15f9bb3d14	add option to disable pyOpenSSL usage (#508) (pyOpenSSL is now disabled by default)	2019-12-08 21:21:00 +01:00
Mike Fährmann	e17907ee2a	change default value of 'cookies-update' to 'true'	2019-12-05 23:43:49 +01:00
Mike Fährmann	e2710702d4	fix Cloudflare bypass	2019-12-01 01:07:24 +01:00
Mike Fährmann	ae09f87602	improve SharedConfigMixin config lookups	2019-11-25 18:31:38 +01:00
Mike Fährmann	f5604492c3	update interface of config functions	2019-11-24 00:42:28 +01:00
Mike Fährmann	d45fabb79d	match user profile handling on deviantart and newgrounds	2019-11-22 23:20:21 +01:00
Mike Fährmann	1a197d2195	store the original cookiejar as Extractor._cookiejar	2019-11-05 21:53:22 +01:00
Mike Fährmann	de83ae4576	make 'method' argument of Extractor.request keyword-only	2019-11-05 17:28:09 +01:00
Mike Fährmann	d44f790e81	adjust output for HTTP status related errors	2019-10-27 23:55:02 +01:00
Mike Fährmann	389d2d7e38	implement 'cookies-update' option (#445)	2019-10-19 15:23:55 +02:00
Mike Fährmann	1693d97bd3	update extractor class hierarchies - let the GalleryExtractor class inherit directly from Extractor - make ChapterExtractor a subclass of GalleryExtractor - change enumeration field names of GalleryExtractors to 'num'	2019-10-16 18:15:29 +02:00
Mike Fährmann	f4bc75e854	fix rate limit handling for OAuth APIs (#368)	2019-08-03 13:43:00 +02:00
Mike Fährmann	21991acc49	add 'ciphers' option; update default User-Agent	2019-07-19 17:14:40 +02:00
Mike Fährmann	84f4d3bc0b	replace urllib3's default cipher list with Firefox's (#342) Avoids Cloudflare CAPTCHAs on both Linux and Windows without pyOpenSSL installed.	2019-07-18 19:42:13 +02:00
Mike Fährmann	09f37fde39	[reddit] move date-min/-max handling into Extractor class	2019-07-16 22:54:39 +02:00
Mike Fährmann	56c7a66a4a	detect Cloudflare CAPTCHAs and update cipher list	2019-07-10 15:18:20 +02:00
Mike Fährmann	fdec59f8e2	replace extractor.request() 'expect' argument with - 'fatal': allow 4xx status codes - 'notfound': raise NotFoundError on 404	2019-07-05 00:42:16 +02:00
Mike Fährmann	69205df68d	allow '-1' for infinite retries (#300)	2019-06-30 23:10:47 +02:00
Mike Fährmann	f7b5c4c3e7	use values of 'retries' options correctly The RE-tries option now specifies exactly that: the maximum number a failed HTTP request is re-tried. For example a value of 2 will now correctly stop after 3 attempts: the initial one + 2 re-tries. The maximum wait-time now also caps at 30min and increases exponentially for both extractor.request() and downloader.http.download().	2019-06-30 23:10:18 +02:00
Mike Fährmann	399e8e965a	also update urllib3's cipher list for versions >= 1.25	2019-05-21 23:02:20 +02:00
Mike Fährmann	c02f12ce2f	avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1 see https://github.com/Anorov/cloudflare-scrape/pull/242	2019-05-15 12:25:20 +02:00
Mike Fährmann	5fd94c6b83	import urllib3 from requests.packages	2019-05-04 22:28:07 +02:00
Mike Fährmann	35f343206c	update default SSL cipher list in urllib3 < 1.25 Cloudflare now also checks the client's SSL/TLS cipher capabilities and produces a 403: Forbidden response with CAPTCHA if they are insufficient. This commit replaces the default cipher list in urllib3 < 1.25 with the one from 1.25 (1), which doesn't cause problems as long as the client platform actually supports these ciphers. On some platforms (tested with Python 3.4 on Linux and Python 3.7 on an outdated Windows 7 VM) it is necessary to install pyOpenSSL to get everything to work. Explicitly setting a minimum/maximum version for urllib3 is also no longer necessary and installing gallery-dl will therefore not pull an incompatible urllib3 version (#229) Fixes the "403: Forbidden" error on Artstation (#227) (1) `0cedb3b0f1`	2019-05-03 22:40:04 +02:00
Mike Fährmann	e25ebc4bff	don't disable certificate checks anymore Executables generated with PyInstaller auto-include the root certificate file and certificate checks now work out-of-the-box.	2019-04-17 13:27:19 +02:00
Mike Fährmann	49a6522c38	ensure consistent headers and params ordering Necessary to avoid being labeled a bot and getting a CAPTCHA response after solving a Cloudflare challenge.	2019-04-09 10:52:27 +02:00
Mike Fährmann	f612284d24	cache cfclearance cookies	2019-03-14 16:14:29 +01:00
Mike Fährmann	591a07f20c	small code changes and cleanups	2019-03-13 22:03:02 +01:00
Mike Fährmann	6dae6bee37	automatically detect and bypass cloudflare challenge pages TODO: cache and re-apply cfclearance cookies	2019-03-10 15:31:33 +01:00
Mike Fährmann	4ca4631bad	simplify auto-disabling certificate verification if no certificate bundle is found	2019-03-08 16:34:01 +01:00
Mike Fährmann	09d872a2b1	generalize extractor creation code	2019-03-07 22:55:26 +01:00
Mike Fährmann	3595cd582f	use GalleryExtractor as common base class	2019-03-01 14:13:16 +01:00
Mike Fährmann	5530871b5a	change results of text.nameext_from_url() Instead of getting a complete 'filename' from a URL and splitting that into 'name' and 'extension', the new approach gets rid of the complete version and renames 'name' to 'filename'. (Using anything other than {extension} for a filename extension doesn't really work anyway) Example: "https://example.org/path/filename.ext" before: - filename : filename.ext - name : filename - extension: ext now: - filename : filename - extension: ext	2019-02-14 16:07:17 +01:00
Mike Fährmann	32edf4fc7b	add '_extractor' info to manga extractor results	2019-02-13 13:23:36 +01:00
Mike Fährmann	2e516a1e3e	store the full original URL in Extractor.url	2019-02-12 18:46:48 +01:00
Mike Fährmann	580baef72c	change Chapter and MangaExtractor classes - unify and simplify constructors - rename get_metadata and get_images to just metadata() and images() - rename self.url to chapter_url and manga_url	2019-02-11 18:38:47 +01:00
Mike Fährmann	4b1880fa5e	propagate 'match' to base extractor constructor	2019-02-11 13:31:10 +01:00
Mike Fährmann	9a9cd32461	implement alternative constructor for extractors	2019-02-09 14:42:25 +01:00
Mike Fährmann	6284731107	simplify extractor constants - single strings for URL patterns - tuples instead of lists for 'directory_fmt' and 'test' - single-tuple tests where applicable	2019-02-08 13:45:40 +01:00
Mike Fährmann	bc0951d974	allow for simplified test data structures Instead of a strict list of (URL, RESULTS)-tuples, extractor result tests can now be a single (URL, RESULTS)-tuple, if it's just one test, and "only matching" tests can now be a simple string.	2019-02-06 17:24:44 +01:00
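
The retry semantics described in commits f7b5c4c3e7 and 69205df68d can be sketched roughly as follows. This is an illustration only, not gallery-dl's actual code: the names wait_times(), request_with_retries(), send and sleep are hypothetical, and the 1-second starting delay with doubling is an assumption. Only the "initial attempt + N re-tries" rule, the exponential growth, the 30-minute cap and the meaning of -1 come from the commit messages.

```python
import itertools

MAX_WAIT = 30 * 60  # 30-minute cap mentioned in commit f7b5c4c3e7


def wait_times(retries, base=1.0):
    """Yield the delay in seconds before each re-try.

    'retries' counts re-tries only; a negative value means "retry forever"
    (commit 69205df68d). The starting delay 'base' and the doubling factor
    are assumptions made for this sketch.
    """
    delay = base
    counter = itertools.count() if retries < 0 else range(retries)
    for _ in counter:
        yield min(delay, MAX_WAIT)
        delay = min(delay * 2, MAX_WAIT)


def request_with_retries(send, retries, sleep=lambda seconds: None):
    """Call send() until it succeeds or all re-tries are used up.

    Returns (result, attempts). 'send' and 'sleep' are hypothetical hooks;
    OSError stands in for whatever error a failed request would raise.
    """
    attempts = 0
    delays = wait_times(retries)
    while True:
        attempts += 1
        try:
            return send(), attempts
        except OSError:
            delay = next(delays, None)
            if delay is None:
                raise  # re-tries exhausted: propagate the last error
            sleep(delay)
```

With retries=2, a request that always fails is attempted three times in total: the initial attempt plus two re-tries.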

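The result change described in commit 5530871b5a can be illustrated with a small stand-in function. The name split_name_and_extension() is hypothetical and the parsing shown here is an assumption; it is not the implementation of text.nameext_from_url(). It only reproduces the "now" shape from the commit message: 'filename' without the extension, plus 'extension'.

```python
from urllib.parse import unquote, urlsplit


def split_name_and_extension(url):
    """Return {'filename', 'extension'} for the last path segment of 'url'.

    Hypothetical stand-in that mirrors the "now" result format from
    commit 5530871b5a; the real function may treat edge cases differently.
    """
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, extension = name.rpartition(".")
    if not filename:
        # no dot at all, or only a leading dot: treat the whole name
        # as the filename and leave the extension empty
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": extension}


result = split_name_and_extension("https://example.org/path/filename.ext")
```

For the example URL from the commit message, result holds 'filename' for the filename field and 'ext' for the extension field; the former 'name' key no longer exists.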

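The cookies.txt behaviour listed in commits c1a6862863 and 2a9be48511 (file object as argument, whitespace allowed before comments, '#HttpOnly_' lines kept as cookies, expiration "0" mapped to None) could look roughly like this. parse_cookiestxt() is a hypothetical name and the returned list of dicts is an assumption; the actual util.load_cookiestxt() may return different objects. The seven tab-separated fields follow the usual Netscape cookies.txt layout.

```python
import io


def parse_cookiestxt(fp):
    """Parse a Netscape-format cookies.txt file object into a list of dicts.

    Hypothetical sketch, not gallery-dl's util.load_cookiestxt().
    """
    cookies = []
    for line in fp:
        line = line.lstrip()  # allow whitespace before comments
        if line.startswith("#HttpOnly_"):
            # HttpOnly cookie, not a comment: strip the marker and keep it
            line = line[len("#HttpOnly_"):]
        elif line.startswith("#"):
            continue
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip malformed lines in this sketch
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "include_subdomains": subdomains == "TRUE",
            "path": path,
            "secure": secure == "TRUE",
            # "0" (or an empty field) means no expiration, so use None
            "expires": int(expires) if expires and expires != "0" else None,
            "name": name,
            "value": value,
        })
    return cookies


sample = io.StringIO(
    "# Netscape HTTP Cookie File\n"
    "  # indented comment\n"
    "\n"
    ".example.org\tTRUE\t/\tFALSE\t0\tsession\tabc\n"
    "#HttpOnly_example.org\tFALSE\t/\tTRUE\t1700000000\ttoken\txyz\n"
)
cookies = parse_cookiestxt(sample)
```

Taking a file object rather than a filename, as the commit describes, makes a function like this easy to exercise with io.StringIO in tests.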
114 Commits