Ivan Kozik
5652cb23b0
Ignore m.reddit.com
2015-07-29 07:33:50 +00:00
Ivan Kozik
ba9995e799
Ignore /.compact on reddit
2015-07-29 07:33:50 +00:00
Start
76e531ae4a
allow ignore to work on twitter.com
2015-07-29 07:33:50 +00:00
Start
fb00df5f17
add twitter ignore set
2015-07-29 07:33:50 +00:00
Ivan Kozik
2452261272
Ignore loop on media.opb.org/clips/embed/
2015-07-29 07:33:50 +00:00
Ivan Kozik
bc5bad1b16
Ignore a non-Icecast streaming site
2015-07-29 07:33:50 +00:00
Ivan Kozik
1348234ac4
Ignore loop on tm.uol.com.br
e.g.
http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/xpg.js
http://tm.uol.com.br/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/h/par/h/bol/h/par/h/pd/h/bol/h/par/h/bol/h/par/h/pd/h/bol/h/pd/h/par/xpg.js
2015-07-29 07:33:50 +00:00
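A loop like the one in the two example URLs above can be recognized by its repeating "/h/<name>" path unit. The following is only a sketch of that idea: the pattern, the repeat threshold of three, and the function name are assumptions for illustration, not the ignore pattern committed in 1348234ac4.

```python
import re

# Hypothetical pattern (not the committed one): match tm.uol.com.br URLs
# whose path starts with three or more consecutive "/h/<name>" units.
# The threshold of 3 is an assumption chosen for this sketch.
TM_UOL_LOOP = re.compile(r"^https?://tm\.uol\.com\.br(?:/h/[a-z]+){3,}/")


def is_tm_uol_loop(url: str) -> bool:
    """Return True if the URL looks like the repeated-component loop."""
    return TM_UOL_LOOP.search(url) is not None
```

Anchoring the match to the host keeps the pattern from catching unrelated sites that happen to use "/h/" in their paths.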
Ivan Kozik
ee63f6b252
Ignore a non-Icecast streaming site
2015-07-29 07:33:50 +00:00
Nicolas SAPA
00e0d3e586
Ignore broken link to warnerbros.com
warnerbros.com/[number] always redirects to a 404 page.
Something on the Internet generates a lot of these links, and ArchiveBot wastes time fetching the same error page again and again.
2015-07-29 07:33:50 +00:00
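The warnerbros.com/[number] case described above could be expressed as a regular expression along these lines. This is a sketch only: the optional "www." prefix, the optional trailing slash, and the names used here are assumptions, and the pattern actually committed in 00e0d3e586 may differ.

```python
import re

# Hypothetical pattern: a warnerbros.com URL whose whole path is a number.
# The "www." prefix and trailing "/" are assumed to be optional.
WB_NUMERIC = re.compile(r"^https?://(?:www\.)?warnerbros\.com/\d+/?$")


def is_wb_numeric_link(url: str) -> bool:
    """Return True for warnerbros.com/<number> links that lead to the 404 page."""
    return WB_NUMERIC.search(url) is not None
```

The trailing `$` keeps real paths that merely start with digits from being ignored.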
Ivan Kozik
6011648388
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
10f204f1c3
Ignore more flickr 404s
2015-07-29 07:33:50 +00:00
Ivan Kozik
1a53ecb6ec
Ignore a Google Analytics endpoint
2015-07-29 07:33:50 +00:00
David Yip
7d36d72086
db: ic.cz: remove Drupal-specific repeated component ignore
2015-07-29 07:33:50 +00:00
David Yip
7c9812d32c
db: ic.cz: add common patterns from #archivebot
2015-07-29 07:33:50 +00:00
Sanky Sanqui
cfa1fb52c4
ic.cz: remove typo in ignores
2015-07-29 07:33:50 +00:00
Sanky Sanqui
de56bd2eb2
ic.cz: ignore another calendar
2015-07-29 07:33:50 +00:00
Sanky Sanqui
54b6a9fac9
correct escapes in ic.cz ignore
2015-07-29 07:33:50 +00:00
Sanky Sanqui
5920615b10
ic.cz: ignore order, more language variants, more statistics, random_num
2015-07-29 07:33:50 +00:00
Sanky Sanqui
072fdf83c6
ic.cz: ignore broken & escapes
2015-07-29 07:33:50 +00:00
Sanky Sanqui
6ac62ac495
ignore irrelevant languages and .pl spam sites
2015-07-29 07:33:50 +00:00
David Yip
9618bb2f6a
db: ic.cz: ignore prev/next links on web boards
2015-07-29 07:33:50 +00:00
David Yip
0bea1ba215
db: ic.cz: ignore web poll thing
2015-07-29 07:33:50 +00:00
David Yip
d04fc446e6
db: ic.cz: ignore all site statistics.
Normally I'd be interested, but we just don't have enough time for
these.
2015-07-29 07:33:50 +00:00
David Yip
37c59bdb44
db: ic.cz: ignore targetx&targety= pairs that come from clicking maps
2015-07-29 07:33:50 +00:00
David Yip
d8ea1afd50
db: ic.cz: ignore more reply/UI-state-change actions.
2015-07-29 07:33:50 +00:00
David Yip
157588e2e1
db: ic.cz: ignore negative indices for image galleries.
These don't yield anything useful.
2015-07-29 07:33:50 +00:00
David Yip
0f46c5ddb8
db: Also ignore album sort on coppermine thumbnail pages.
2015-07-29 07:33:50 +00:00
David Yip
7355b63f1d
db: ic.cz: even more calendars.
2015-07-29 07:33:50 +00:00
David Yip
ff8b6de2e5
db: ic.cz: Ignore sorts on shops, write-product-review pages
2015-07-29 07:33:50 +00:00
David Yip
8d78e2eeae
db: How many ways can _you_ write "calendar"?
2015-07-29 07:33:50 +00:00
David Yip
97df444126
db: Remove incorrect mode= string from ic.cz Phorum ignores.
2015-07-29 07:33:50 +00:00
David Yip
fa9dbd8304
db: Ignore sort-order-in-query-string thing on Phorum boards
2015-07-29 07:33:50 +00:00
David Yip
613591b30e
db: ignore more infinite-calendar-things on ic.cz.
2015-07-29 07:33:50 +00:00
Christopher Foo
2d5e36b662
ignore_patterns.singletumblr: Allow a.tumblr.com
Allow things like https://a.tumblr.com/tumblr_njvn2jIkir1unm52po1.mp3
served from http://dmcasafe.tumblr.com/post/111195617071/
2015-07-29 07:33:50 +00:00
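One way to picture the change above: a single-blog Tumblr ignore that skips every other *.tumblr.com host, with a.tumblr.com added as an exception so media files stay in scope. This is a sketch under assumptions; the function name and the negative-lookahead approach are illustrative and are not taken from ignore_patterns.singletumblr itself.

```python
import re


def make_single_tumblr_ignore(blog: str) -> "re.Pattern[str]":
    """Build a hypothetical ignore pattern for a single-blog Tumblr job.

    It matches (ignores) any *.tumblr.com host except the blog being
    archived and a.tumblr.com, which serves media such as .mp3 files.
    """
    return re.compile(
        r"^https?://(?!(?:%s|a)\.tumblr\.com/)[^/]+\.tumblr\.com/" % re.escape(blog)
    )
```

With `make_single_tumblr_ignore("dmcasafe")`, the a.tumblr.com MP3 URL and the dmcasafe post URL quoted in the commit are left alone, while posts on other blogs match the ignore.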
David Yip
d5ca9e0ce9
db: More ic.cz patterns.
In particular:
- harizzzma.com and nahraj.net no longer resolve, so don't waste time
trying
- ignore new/register links for forums
- ignore another "add to cart" link
2015-07-29 07:33:50 +00:00
David Yip
174b1815ef
db: ic.cz ignore set - further refinements.
In particular:
- ignore more guestbook links
- remove viewtopic.php.*start= from set, because as it turns out that's
a totally valid method for paging through a thread (one way of many,
sigh)
2015-07-29 07:33:50 +00:00
David Yip
ebc858ae32
db: ic.cz: Also ignore &start=\d+ on forums.
This appears to be a pagination thing that we don't need.
2015-07-29 07:33:50 +00:00
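The `&start=\d+` fragment named in the commit above could be applied roughly as follows. This is a sketch: scoping the match to *.ic.cz hosts and the names used are assumptions, and "example.ic.cz" in the usage note is a made-up host. Note that the later commit 174b1815ef removed viewtopic.php.*start= from the set because it turned out to be a valid way of paging through a thread.

```python
import re

# Hypothetical pattern: the "&start=\d+" fragment from the commit subject,
# scoped (by assumption) to hosts under ic.cz.
FORUM_START = re.compile(r"^https?://[^/]+\.ic\.cz/.*&start=\d+")


def is_forum_start_link(url: str) -> bool:
    """Return True if the URL carries an &start=<number> parameter on ic.cz."""
    return FORUM_START.search(url) is not None
```

For instance, `is_forum_start_link("http://example.ic.cz/forum/viewforum.php?f=2&start=50")` matches, while the same URL without the `start` parameter does not.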
David Yip
d19bea710a
db: More troublesome infinite-calendar loops on ic.cz.
2015-07-29 07:33:50 +00:00
David Yip
a709dfa6c2
db: An ignore set for unwanted URLs on ic.cz.
This could be broken up later, but this is much more convenient for now.
2015-07-29 07:33:50 +00:00
David Yip
089faa5cf9
db: coppermine: also ignore last-commented-by order.
2015-07-29 07:33:50 +00:00
David Yip
a3e21ad5fc
db: Restrict Coppermine album selector to displayimage.php.
2015-07-29 07:33:50 +00:00
David Yip
2ba9dc0187
db: Also ignore Coppermine's lastupby pseudo-album.
2015-07-29 07:33:50 +00:00
David Yip
da76445850
db: Also ignore addfav.php for Coppermine.
2015-07-29 07:33:50 +00:00
David Yip
85e8113f6a
db: Add an ignore set for Coppermine Photo Gallery.
ic.cz has TONS of these things.
2015-07-29 07:33:50 +00:00
Ivan Kozik
4ad23c6118
Ignore more twitter share links
2015-07-29 07:33:50 +00:00
Ivan Kozik
661f8be5a7
Ignore non-Icecast mp3 streaming sites
2015-07-29 07:33:50 +00:00
Ivan Kozik
97db1927ac
Ignore more dokuwiki nonsense
2015-07-29 07:33:50 +00:00
Ivan Kozik
aacc472354
Ignore some junk wordpress URLs
2015-07-29 07:33:50 +00:00
Ivan Kozik
0f9ccc4846
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
3f7b022e7c
Ignore another share link
2015-07-29 07:33:49 +00:00