Ivan Kozik
ed869864d4
README: link to ArchiveBot
2015-07-31 03:52:42 +00:00
Ivan Kozik
6cd50f9688
README: tweak
2015-07-31 03:50:39 +00:00
Ivan Kozik
412ea7791f
README: changes to ignores may take up to 3 seconds to apply
2015-07-30 23:36:12 +00:00
Ivan Kozik
19f6971261
dashboard: don't handle ctrl-f, alt-f, and other ctrl/alt- key combinations
2015-07-29 23:04:20 +00:00
Ivan Kozik
d72e4094d1
Bump version
2015-07-29 18:38:31 +00:00
Ivan Kozik
91ed7689a2
Remove unused local
2015-07-29 18:37:43 +00:00
Ivan Kozik
73d9c03e5e
Remove unused import
2015-07-29 18:35:46 +00:00
Ivan Kozik
a418beaff8
README: tweak for the non-ArchiveBot audience
2015-07-29 08:55:06 +00:00
Ivan Kozik
4f437ae2d0
dashboard: remove mentions of ignore sets
2015-07-29 08:46:35 +00:00
Ivan Kozik
deb05d981d
README: link to correct ignore sets
2015-07-29 08:45:25 +00:00
Ivan Kozik
b806316cb1
Use built-in ignore sets; don't crash if invalid ignore set is specified
2015-07-29 08:36:36 +00:00
Ivan Kozik
22835a5ddc
igsets: global: don't exclude archive.org (that ignore made sense for ArchiveBot, which sent WARCs to IA)
2015-07-29 08:24:42 +00:00
Ivan Kozik
51d3b1f794
igsets: rm internetcentrum - it is long gone
2015-07-29 07:45:58 +00:00
Ivan Kozik
5276fec1a9
Convert JSON ignore sets to plain text to avoid the backslash doubling
2015-07-29 07:44:12 +00:00
Ivan Kozik
68f5fc0dd2
igsets: noonion: fix backslash
2015-07-29 07:40:20 +00:00
Ivan Kozik
4c0f60cf06
Don't try to install patched-wpull as it doesn't exist
2015-07-29 07:37:52 +00:00
Ivan Kozik
e53f4465e5
db/ignore_patterns -> libgrabsite/ignore_sets
2015-07-29 07:37:19 +00:00
Ivan Kozik
5e70cd4acc
Remove questionable /(.*)/(\1/){3,} ignore
2015-07-29 07:33:51 +00:00
David Yip
62d1dbc0ad
Revert "Temporarily ignore voat.co, as it is not responding"
This reverts commit f6fb34ad5b46cf730d5e07475b1c1fc73b3570a8.
voat.co is back up.
2015-07-29 07:33:51 +00:00
Ivan Kozik
04a6c18054
Add .kr TLD for blogspot
2015-07-29 07:33:51 +00:00
Ivan Kozik
a6f8d510c0
Ignore simple.reddit.com
2015-07-29 07:33:51 +00:00
Ivan Kozik
7c4c5e42cd
Ignore /.mobile on reddit
2015-07-29 07:33:51 +00:00
Ivan Kozik
ed5fb60cce
Temporarily ignore voat.co, as it is not responding
Please revert this when it comes back up
2015-07-29 07:33:51 +00:00
Ivan Kozik
8aae334c25
Ignore another streaming site
2015-07-29 07:33:51 +00:00
Ivan Kozik
4d2a496fbb
Ignore another share link
2015-07-29 07:33:51 +00:00
Ivan Kozik
1d388ae969
Ignore another streaming site
2015-07-29 07:33:51 +00:00
Ivan Kozik
e014c48215
Fix filename
2015-07-29 07:33:51 +00:00
Ivan Kozik
45ec93cc1a
Add noonion ignore set to ignore .onion sites
2015-07-29 07:33:51 +00:00
Ivan Kozik
7320865fd7
Ignore another share link
2015-07-29 07:33:51 +00:00
Ivan Kozik
f34ed18ce6
Ignore Yahoo beacon
2015-07-29 07:33:51 +00:00
Ivan Kozik
6cb0fa49f5
Ignore more ?sort= pages on reddit
2015-07-29 07:33:51 +00:00
Ivan Kozik
cec78653cb
Ignore another share link
2015-07-29 07:33:51 +00:00
Ivan Kozik
9face53dba
Ignore Special:Diff and Special:MobileDiff
2015-07-29 07:33:51 +00:00
Ivan Kozik
8110d41ac4
Ignore another Google Analytics endpoint
2015-07-29 07:33:51 +00:00
Start
3b86cb984e
minor improvements
2015-07-29 07:33:51 +00:00
Start
fcbe206eed
ignore ?sort= for users
2015-07-29 07:33:51 +00:00
Ivan Kozik
25d1da749e
Ignore an Icecast server that doesn't send Icecast headers
2015-07-29 07:33:50 +00:00
Ivan Kozik
efd4658744
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
5652cb23b0
Ignore m.reddit.com
2015-07-29 07:33:50 +00:00
Ivan Kozik
ba9995e799
Ignore /.compact on reddit
2015-07-29 07:33:50 +00:00
Start
76e531ae4a
allow ignore to work on twitter.com
2015-07-29 07:33:50 +00:00
Start
fb00df5f17
add twitter ignore set
2015-07-29 07:33:50 +00:00
Ivan Kozik
2452261272
Ignore loop on media.opb.org/clips/embed/
2015-07-29 07:33:50 +00:00
Ivan Kozik
bc5bad1b16
Ignore a non-Icecast streaming site
2015-07-29 07:33:50 +00:00
Ivan Kozik
1348234ac4
Ignore loop on tm.uol.com.br
e.g.
http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/xpg.js
http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/xpg.js
2015-07-29 07:33:50 +00:00
Ivan Kozik
ee63f6b252
Ignore a non-Icecast streaming site
2015-07-29 07:33:50 +00:00
Nicolas SAPA
00e0d3e586
Ignore broken link to warnerbros.com
warnerbros.com/[number] always redirects to a 404 page.
Something on the Internet generates a lot of these links, and ArchiveBot wastes time fetching the same error page again and again.
2015-07-29 07:33:50 +00:00
Ivan Kozik
6011648388
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
10f204f1c3
Ignore more flickr 404s
2015-07-29 07:33:50 +00:00
Ivan Kozik
1a53ecb6ec
Ignore a Google Analytics endpoint
2015-07-29 07:33:50 +00:00