615 Commits

Author SHA1 Message Date
Mike Fährmann
67791e1b36
[imgur] improve and add image extractor 2017-05-26 22:30:09 +02:00
Mike Fährmann
99b72130ee
[reddit] enable recursion (#15)
reddit extractors now recursively visit other submissions/posts
linked to in the initial set of submissions.
This behaviour can be configured via the 'extractor.reddit.recursion'
key in the configuration file or by `-o recursion=<value>`.

Example:
{"extractor": {
  "reddit": {
   "recursion": <value>
}}}

Possible values:
* -1 - infinite recursion (don't do this)
*  0 - recursion is disabled (default)
*  1 and higher - maximum recursion level
2017-05-26 17:01:27 +02:00
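
The three kinds of 'recursion' values listed above can be illustrated with a small depth-limited traversal. This is only a sketch and not the project's actual code: the `visit` function, the `links` mapping, and the submission ids are hypothetical stand-ins for real reddit data.

```python
from collections import deque


def visit(start_ids, links, recursion=0):
    """Return submission ids in the order they would be visited.

    links:     hypothetical mapping of submission id -> ids it links to
    recursion: -1 = no limit, 0 = disabled, N >= 1 = maximum level
    """
    seen = set(start_ids)
    order = []
    queue = deque((sid, 0) for sid in start_ids)
    while queue:
        sid, depth = queue.popleft()
        order.append(sid)
        if recursion != -1 and depth >= recursion:
            continue  # level limit reached: do not follow links from here
        for linked in links.get(sid, ()):
            if linked not in seen:  # avoid visiting a submission twice
                seen.add(linked)
                queue.append((linked, depth + 1))
    return order
```

With -1 the sketch only stops once no unseen links remain, which on a real site may take a very long time.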
Mike Fährmann
ae686c4c08
run queue items immediately 2017-05-24 15:15:06 +02:00
Mike Fährmann
691c4dd709
support direct image links 2017-05-24 12:51:18 +02:00
Mike Fährmann
d2dceb35b7
implement context-manager to blacklist extractors 2017-05-24 12:42:37 +02:00
Mike Fährmann
30eef527d8
update output logic on error
[ci skip]
2017-05-23 20:12:57 +02:00
Mike Fährmann
e425243b1e
[reddit] some small fixes
- filter or complete some URLs
- remove the 'nofollow:' scheme before printing URLs
- (#15)
2017-05-23 11:48:00 +02:00
Mike Fährmann
a22892f494
[reddit] add subreddit- and submission-extractor
- these extractors scan submissions and their comments for
  (external) URLs and defer them to other extractors
- (#15)
2017-05-23 09:38:50 +02:00
Mike Fährmann
398506da45
update release script 2017-05-22 08:47:58 +02:00
Mike Fährmann
8db3a2fea8
release version 0.8.4 2017-05-21 10:52:35 +02:00
Mike Fährmann
832a4a8ee9
[fallenangels] add manga extractor 2017-05-21 10:37:38 +02:00
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class 2017-05-20 11:27:43 +02:00
Mike Fährmann
2974d782a3
[yomanga] remove module
site has been shut down
2017-05-20 11:18:44 +02:00
Mike Fährmann
cbb4323f66
add setup.cfg to configure flake8 2017-05-19 19:22:39 +02:00
Mike Fährmann
232fe2dd08
improve the test extractor 2017-05-19 14:04:52 +02:00
Mike Fährmann
b0131ea402
[fallenangels] support this site's Vietnamese version
- https://truyen.fascans.com/
2017-05-18 15:22:25 +02:00
Mike Fährmann
a90c6acc9c
code cleanup + fixes 2017-05-18 15:18:18 +02:00
Mike Fährmann
4c88c0d496
rework the output format for --list-keywords 2017-05-15 18:30:47 +02:00
Mike Fährmann
b6b214f7e9
[deviantart] fix headers for custom-style journals
example: http://shimoda7.deviantart.com/journal/Temporary-absence-231936282
2017-05-15 15:58:06 +02:00
Mike Fährmann
e9a2738257
[deviantart] support images on top of journal entries
example: http://raxnae.deviantart.com/art/Kami-s-Journal-679482236
2017-05-13 21:42:29 +02:00
Mike Fährmann
92597f46d4
[deviantart] add title to journals 2017-05-13 15:36:52 +02:00
Mike Fährmann
107d29ad8a
improve handling of text:... URLs
- don't require // after the colon
- open output files in text mode
2017-05-12 14:10:25 +02:00
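
A minimal sketch of accepting text:... URLs with or without '//' after the colon. The function name `text_payload` is hypothetical, and treating everything after the scheme as the payload is an assumption made for illustration.

```python
def text_payload(url):
    """Return the part after 'text:' (an optional '//' is dropped),
    or None if the URL does not use the 'text' scheme."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme != "text":
        return None
    if rest.startswith("//"):
        rest = rest[2:]  # 'text://abc' and 'text:abc' are treated alike
    return rest
```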
Mike Fährmann
677c8ced11
[deviantart] add "journal" extractor
(#14)
2017-05-10 17:21:33 +02:00
Mike Fährmann
e5f79ae839
[deviantart] add support for all media types
- this includes
  - images
  - videos
  - flash-animations
  - journals

- also renamed some of the extractors
  - User  -> Gallery
  - Image -> Deviation
2017-05-10 16:45:45 +02:00
Mike Fährmann
9f1c83297f
[pinterest] allow URLs with any TLD 2017-05-08 15:08:39 +02:00
Mike Fährmann
b3b92ac243
[deviantart] support "All" favorites and add "mature" option
- since there is apparently no actual way to get the "All" favorites
  listing via API, corresponding URLs (.../favourites/?catpath=/) will
  be handled by yielding all deviations from all favorite collections of
  that user

- the "mature" config key works on a per-extractor basis (like "username"
  or "password"); values can be the strings "true" or "false", or the
  booleans true or false.

- (#14)
2017-05-06 21:26:27 +02:00
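
One way to accept both the strings "true"/"false" and real booleans for an option like "mature" is sketched below. The helper name `as_bool`, its `default` parameter, and the case-insensitive comparison are assumptions and are not taken from the commit.

```python
def as_bool(value, default=True):
    """Map True/False or the strings "true"/"false" to a bool.

    Anything else falls back to `default` (an assumed behaviour).
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    return default
```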
Mike Fährmann
7376ad7f3d
[deviantart] turn the "Mature Content Filter" off
(#14)
2017-05-06 14:56:41 +02:00
Mike Fährmann
ef90a2de2f
implement the "exit" option for the "skip" config-key 2017-05-05 15:49:58 +02:00
Mike Fährmann
cfbf79d788
[pixiv] fix login 2017-05-05 10:38:22 +02:00
Mike Fährmann
85a46ed700
[booru] fix issue with multiple tags 2017-05-04 11:58:51 +02:00
Mike Fährmann
fc9223c072
add '--abort-on-skip' option and ability to control skip behavior
the 'skip' config option controls skipping behavior:
    true    - skip download if the file already exists (default)
    false   - download and overwrite files even if they exist
    "abort" - abort extractor run if a download would be skipped
              (same as '--abort-on-skip')
2017-05-03 15:26:04 +02:00
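
The three 'skip' values described above could be handled roughly as follows. This is a sketch under stated assumptions: `should_download` and the `AbortRun` exception are hypothetical names and do not come from the project's code.

```python
import os


class AbortRun(Exception):
    """Hypothetical signal that stops the current extractor run."""


def should_download(path, skip=True):
    """Decide whether to download to `path` for a given 'skip' value."""
    if not os.path.exists(path):
        return True           # nothing to skip: always download
    if skip == "abort":
        raise AbortRun(path)  # a download would be skipped: stop the run
    return not skip           # True -> skip the file, False -> overwrite it
```

A caller would catch `AbortRun` around the loop over a single extractor's files, so that only that run ends.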
Mike Fährmann
7c8f61a116
release version 0.8.3 2017-05-01 13:30:09 +02:00
Mike Fährmann
d948ba1322
[readcomics] remove module
- site has been unavailable for two weeks
- (#12)
2017-05-01 11:44:12 +02:00
Mike Fährmann
a610b35a0d
[mangashare] remove module
this site has been unavailable for at least two months
2017-05-01 11:06:38 +02:00
Mike Fährmann
4e8587bad4
[pixiv] add support for https://i.pximg.net URLs 2017-04-30 22:54:49 +02:00
Mike Fährmann
e41efbd2d9
[kissmanga] fix edge-case 2017-04-30 11:02:32 +02:00
Mike Fährmann
ffd72424bf
[kissmanga] another attempt at getting the AES key 2017-04-29 15:58:33 +02:00
Mike Fährmann
af56887a47
[exhentai] fall back to e-hentai if no username is given 2017-04-28 15:59:56 +02:00
Mike Fährmann
48a5b11204
fix error if no file extension is found 2017-04-26 12:31:42 +02:00
Mike Fährmann
701c016b97
add '-q/--quiet' option 2017-04-26 11:33:19 +02:00
Mike Fährmann
4b967fa189
implement and use extractor.config() method 2017-04-25 17:12:48 +02:00
Mike Fährmann
f0aa35ac84
add '--ignore-config' option 2017-04-25 17:09:10 +02:00
Mike Fährmann
82ab1fca07
[seiga] reduce cache maxage to one week 2017-04-24 15:25:20 +02:00
Mike Fährmann
ec48d25afc
[pawoo] fix extraction results 2017-04-22 11:14:20 +02:00
Mike Fährmann
244ab75cad
[kissmanga] update AES key retrieval 2017-04-21 20:36:47 +02:00
Chen John L
a5485a46cb
fixed the module for pixhost 2017-04-21 19:54:10 +08:00
Mike Fährmann
13dc5d72bc
update some extractors to use https 2017-04-20 13:32:40 +02:00
Mike Fährmann
342371086b
[pawoo] add extractors for accounts and statuses
https://pawoo.net is a Mastodon[1] instance hosted by Pixiv
[1] https://github.com/tootsuite/mastodon
2017-04-19 10:17:43 +02:00
Mike Fährmann
5af35ea150
add -v/--verbose option and reduce error verbosity
(#12)
2017-04-18 11:38:48 +02:00
Mike Fährmann
0770de0ea1
[deviantart:image] add support for sta.sh URLs 2017-04-17 11:52:16 +02:00