Ivan Kozik | 5229ddf5dc | Start work on websocket server for future dashboard integration | 2015-07-17 22:42:25 +00:00
Ivan Kozik | 03d1efc2ce | Clarify argument order requirement | 2015-07-17 03:59:42 +00:00
Ivan Kozik | 62cba3a0e7 | Update UA | 2015-05-19 20:52:29 +00:00
Ivan Kozik | 2d9b1395f1 | Put the date into the DIR name and WARC name | 2015-05-19 18:13:03 +00:00
Ivan Kozik | 66d22b1556 | Update UA | 2015-04-08 20:42:10 +00:00
Ivan Kozik | c83c89b0cf | Remove --no-skip-getaddrinfo to match ArchiveBot | 2015-04-07 07:26:54 +00:00
Ivan Kozik | 2983615d50 | Send Accept-Language to avoid 500 Internal Server Error when sending Firefox UA to reddit.com | 2015-03-30 03:07:23 +00:00
Ivan Kozik | 8bf1b00c46 | Copy in latest dupespotter | 2015-03-14 20:36:16 +00:00
Ivan Kozik | 040afe0d92 | Copy in latest dupespotter | 2015-03-14 19:35:55 +00:00
Ivan Kozik | 9fbe0bdb6f | Copy in latest dupespotter | 2015-03-11 22:54:35 +00:00
Ivan Kozik | d14d8135e6 | Copy in latest dupespotter | 2015-03-11 20:22:58 +00:00
Ivan Kozik | c140f1caf0 | Copy in latest dupespotter | 2015-03-09 07:35:55 +00:00
Ivan Kozik | b785e9b1b7 | Pause the crawl when running low on disk or memory | 2015-03-09 06:02:24 +00:00
Ivan Kozik | cbaccd9e02 | Avoid creating directories with ? or & in the filename, which breaks sqlalchemy when it tries to parse arguments from the filename. Fixes https://github.com/ludios/grab-site/issues/1 | 2015-03-09 05:16:54 +00:00
Ivan Kozik | f80df6944f | Describe arguments more | 2015-03-09 05:06:44 +00:00
Ivan Kozik | 611a0be845 | Cleanup | 2015-03-09 04:53:38 +00:00
Ivan Kozik | 820e2aeef4 | Mention WARC files; clarify | 2015-03-09 04:52:18 +00:00
Ivan Kozik | a1cbcb9ea9 | Describe what this is | 2015-03-09 04:48:27 +00:00
Ivan Kozik | 3fe4774c2c | Copy in latest dupespotter | 2015-03-04 04:37:39 +00:00
Ivan Kozik | 1f9f80dff0 | Copy in latest dupespotter | 2015-03-04 04:23:06 +00:00
Ivan Kozik | fbbfa3c0b4 | Copy in latest dupespotter | 2015-03-01 23:43:53 +00:00
Ivan Kozik | 5b2b68061d | Copy in latest dupespotter | 2015-02-24 05:03:45 +00:00
Ivan Kozik | 85f02d2055 | Include path/query components in directory name | 2015-02-23 03:03:15 +00:00
Ivan Kozik | 62866f5336 | Copy in latest dupespotter | 2015-02-17 01:58:56 +00:00
Ivan Kozik | ccaee25497 | Link to global ignore set | 2015-02-05 19:32:47 +00:00
Ivan Kozik | e2118bbea4 | Clarify | 2015-02-05 19:31:50 +00:00
Ivan Kozik | 4a22b4d593 | Tell user to install git as well | 2015-02-05 19:27:19 +00:00
Ivan Kozik | 65e096a035 | Support --ignore-sets= instead of the space-separated version | 2015-02-05 06:05:54 +00:00
Ivan Kozik | 2d7125951f | Link to pythex | 2015-02-05 05:39:44 +00:00
Ivan Kozik | f815920a83 | Document file formats | 2015-02-05 05:37:34 +00:00
Ivan Kozik | d73ee5ba27 | Make it real obvious | 2015-02-05 05:34:49 +00:00
Ivan Kozik | 2f7ae834bb | Add ArchiveBot LICENSE | 2015-02-05 05:22:18 +00:00
Ivan Kozik | 0699689a14 | Add igoff feature | 2015-02-05 05:19:34 +00:00
Ivan Kozik | 2ccb8b4d6f | Add support for --no-offsite-links | 2015-02-05 05:15:46 +00:00
Ivan Kozik | 52d0acc3b5 | Another html5lib comment | 2015-02-05 05:08:03 +00:00
Ivan Kozik | 64d027da2c | Fix comment | 2015-02-05 05:07:05 +00:00
Ivan Kozik | 6f8ef82efb | Rename script | 2015-02-05 05:05:53 +00:00
Ivan Kozik | 979b843458 | Load changes from DIR/ignores and DIR/ignore_sets while the crawl is running | 2015-02-05 04:59:28 +00:00
Ivan Kozik | 5f7593fda2 | Refactor | 2015-02-05 04:39:52 +00:00
Ivan Kozik | 429b2032ff | Improve README | 2015-02-05 04:27:38 +00:00
Ivan Kozik | 1705174fb2 | CRLF -> LF | 2015-02-05 04:25:49 +00:00
Ivan Kozik | eea440422d | Allow specifying --ignore-sets NAME1,NAME2,... | 2015-02-05 04:24:05 +00:00
Ivan Kozik | a61ed949ca | Use global ignore set and also ignore Icecast sites like ArchiveBot | 2015-02-05 04:03:19 +00:00
Ivan Kozik | 2986ae8a31 | Use cookies.txt | 2015-02-05 03:45:34 +00:00
Ivan Kozik | 91fd89be5d | Add a site-grabber based on ArchiveBot's use of wpull | 2015-02-05 03:43:50 +00:00