635 Commits

Author SHA1 Message Date
Ivan Kozik
5276fec1a9 Convert JSON ignore sets to plain text to avoid the backslash doubling 2015-07-29 07:44:12 +00:00
Ivan Kozik
68f5fc0dd2 igsets: noonion: fix backslash 2015-07-29 07:40:20 +00:00
Ivan Kozik
4c0f60cf06 Don't try to install patched-wpull as it doesn't exist 2015-07-29 07:37:52 +00:00
Ivan Kozik
e53f4465e5 db/ignore_patterns -> libgrabsite/ignore_sets 2015-07-29 07:37:19 +00:00
Ivan Kozik
5e70cd4acc Remove questionable /(.*)/(\1/){3,} ignore 2015-07-29 07:33:51 +00:00
David Yip
62d1dbc0ad Revert "Temporarily ignore voat.co, as it is not responding"
This reverts commit f6fb34ad5b46cf730d5e07475b1c1fc73b3570a8.

voat.co is back up.
2015-07-29 07:33:51 +00:00
Ivan Kozik
04a6c18054 Add .kr TLD for blogspot 2015-07-29 07:33:51 +00:00
Ivan Kozik
a6f8d510c0 Ignore simple.reddit.com 2015-07-29 07:33:51 +00:00
Ivan Kozik
7c4c5e42cd Ignore /.mobile on reddit 2015-07-29 07:33:51 +00:00
Ivan Kozik
ed5fb60cce Temporarily ignore voat.co, as it is not responding
Please revert this when it comes back up
2015-07-29 07:33:51 +00:00
Ivan Kozik
8aae334c25 Ignore another streaming site 2015-07-29 07:33:51 +00:00
Ivan Kozik
4d2a496fbb Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
1d388ae969 Ignore another streaming site 2015-07-29 07:33:51 +00:00
Ivan Kozik
e014c48215 Fix filename 2015-07-29 07:33:51 +00:00
Ivan Kozik
45ec93cc1a Add noonion ignore set to ignore .onion sites 2015-07-29 07:33:51 +00:00
Ivan Kozik
7320865fd7 Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
f34ed18ce6 Ignore Yahoo beacon 2015-07-29 07:33:51 +00:00
Ivan Kozik
6cb0fa49f5 Ignore more ?sort= pages on reddit 2015-07-29 07:33:51 +00:00
Ivan Kozik
cec78653cb Ignore another share link 2015-07-29 07:33:51 +00:00
Ivan Kozik
9face53dba Ignore Special:Diff and Special:MobileDiff 2015-07-29 07:33:51 +00:00
Ivan Kozik
8110d41ac4 Ignore another Google Analytics endpoint 2015-07-29 07:33:51 +00:00
Start
3b86cb984e minor improvements 2015-07-29 07:33:51 +00:00
Start
fcbe206eed ignore ?sort= for users 2015-07-29 07:33:51 +00:00
Ivan Kozik
25d1da749e Ignore an Icecast server that doesn't send Icecast headers 2015-07-29 07:33:50 +00:00
Ivan Kozik
efd4658744 Ignore another share link 2015-07-29 07:33:50 +00:00
Ivan Kozik
5652cb23b0 Ignore m.reddit.com 2015-07-29 07:33:50 +00:00
Ivan Kozik
ba9995e799 Ignore /.compact on reddit 2015-07-29 07:33:50 +00:00
Start
76e531ae4a allow ignore to work on twitter.com 2015-07-29 07:33:50 +00:00
Start
fb00df5f17 add twitter ignore set 2015-07-29 07:33:50 +00:00
Ivan Kozik
2452261272 Ignore loop on media.opb.org/clips/embed/ 2015-07-29 07:33:50 +00:00
Ivan Kozik
bc5bad1b16 Ignore a non-Icecast streaming site 2015-07-29 07:33:50 +00:00
Ivan Kozik
1348234ac4 Ignore loop on tm.uol.com.br
e.g.

http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/xpg.js
http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/xpg.js
2015-07-29 07:33:50 +00:00
Ivan Kozik
ee63f6b252 Ignore a non-Icecast streaming site 2015-07-29 07:33:50 +00:00
Nicolas SAPA
00e0d3e586 Ignore broken link to warnerbros.com
warnerbros.com/[number] always redirects to a 404 page.
Something on the Internet generates a lot of these links, and ArchiveBot wastes time fetching the same error page again and again.
2015-07-29 07:33:50 +00:00
Ivan Kozik
6011648388 Ignore another share link 2015-07-29 07:33:50 +00:00
Ivan Kozik
10f204f1c3 Ignore more flickr 404s 2015-07-29 07:33:50 +00:00
Ivan Kozik
1a53ecb6ec Ignore a Google Analytics endpoint 2015-07-29 07:33:50 +00:00
David Yip
7d36d72086 db: ic.cz: remove Drupal-specific repeated component ignore 2015-07-29 07:33:50 +00:00
David Yip
7c9812d32c db: ic.cz: add common patterns from #archivebot 2015-07-29 07:33:50 +00:00
Sanky Sanqui
cfa1fb52c4 ic.cz: remove typo in ignores 2015-07-29 07:33:50 +00:00
Sanky Sanqui
de56bd2eb2 ic.cz: ignore another calendar 2015-07-29 07:33:50 +00:00
Sanky Sanqui
54b6a9fac9 correct escapes in ic.cz ignore 2015-07-29 07:33:50 +00:00
Sanky Sanqui
5920615b10 ic.cz: ignore order, more language variants, more statistics, random_num 2015-07-29 07:33:50 +00:00
Sanky Sanqui
072fdf83c6 ic.cz: ignore broken & escapes 2015-07-29 07:33:50 +00:00
Sanky Sanqui
6ac62ac495 ignore irrelevant languages and .pl spam sites 2015-07-29 07:33:50 +00:00
David Yip
9618bb2f6a db: ic.cz: ignore prev/next links on web boards 2015-07-29 07:33:50 +00:00
David Yip
0bea1ba215 db: ic.cz: ignore web poll thing 2015-07-29 07:33:50 +00:00
David Yip
d04fc446e6 db: ic.cz: ignore all site statistics.
Normally I'd be interested, but we just don't have enough time for
these.
2015-07-29 07:33:50 +00:00
David Yip
37c59bdb44 db: ic.cz: ignore targetx&targety= pairs that come from clicking maps 2015-07-29 07:33:50 +00:00
David Yip
d8ea1afd50 db: ic.cz: ignore more reply/UI-state-change actions. 2015-07-29 07:33:50 +00:00