1066 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Ivan Kozik | f815920a83 | Document file formats | 2015-02-05 05:37:34 +00:00 |
| Ivan Kozik | d73ee5ba27 | Make it real obvious | 2015-02-05 05:34:49 +00:00 |
| Ivan Kozik | 2f7ae834bb | Add ArchiveBot LICENSE | 2015-02-05 05:22:18 +00:00 |
| Ivan Kozik | 0699689a14 | Add igoff feature | 2015-02-05 05:19:34 +00:00 |
| Ivan Kozik | 2ccb8b4d6f | Add support for --no-offsite-links | 2015-02-05 05:15:46 +00:00 |
| Ivan Kozik | 52d0acc3b5 | Another html5lib comment | 2015-02-05 05:08:03 +00:00 |
| Ivan Kozik | 64d027da2c | Fix comment | 2015-02-05 05:07:05 +00:00 |
| Ivan Kozik | 6f8ef82efb | Rename script | 2015-02-05 05:05:53 +00:00 |
| Ivan Kozik | 979b843458 | Load changes from DIR/ignores and DIR/ignore_sets while the crawl is running | 2015-02-05 04:59:28 +00:00 |
| Ivan Kozik | 5f7593fda2 | Refactor | 2015-02-05 04:39:52 +00:00 |
| Ivan Kozik | 429b2032ff | Improve README | 2015-02-05 04:27:38 +00:00 |
| Ivan Kozik | 1705174fb2 | CRLF -> LF | 2015-02-05 04:25:49 +00:00 |
| Ivan Kozik | eea440422d | Allow specifying --ignore-sets NAME1,NAME2,... | 2015-02-05 04:24:05 +00:00 |
| Ivan Kozik | a61ed949ca | Use global ignore set and also ignore Icecast sites like ArchiveBot | 2015-02-05 04:03:19 +00:00 |
| Ivan Kozik | 2986ae8a31 | Use cookies.txt | 2015-02-05 03:45:34 +00:00 |
| Ivan Kozik | 91fd89be5d | Add a site-grabber based on ArchiveBot's use of wpull | 2015-02-05 03:43:50 +00:00 |