David Yip
d5ca9e0ce9
db: More ic.cz patterns.
...
In particular:
- harizzzma.com and nahraj.net no longer resolve, so don't waste time
trying
- ignore new/register links for forums
- ignore another "add to cart" link
2015-07-29 07:33:50 +00:00
David Yip
174b1815ef
db: ic.cz ignore set - further refinements.
...
In particular:
- ignore more guestbook links
- remove viewtopic.php.*start= from set, because as it turns out that's
a totally valid method for paging through a thread (one way of many,
sigh)
2015-07-29 07:33:50 +00:00
David Yip
ebc858ae32
db: ic.cz: Also ignore &start=\d+ on forums.
...
This appears to be a pagination thing that we don't need.
2015-07-29 07:33:50 +00:00
David Yip
d19bea710a
db: More troublesome infinite-calendar loops on ic.cz.
2015-07-29 07:33:50 +00:00
David Yip
a709dfa6c2
db: An ignore set for unwanted URLs on ic.cz.
...
This could be broken up later, but this is much more convenient for now.
2015-07-29 07:33:50 +00:00
David Yip
089faa5cf9
db: coppermine: also ignore last-commented-by order.
2015-07-29 07:33:50 +00:00
David Yip
a3e21ad5fc
db: Restrict Coppermine album selector to displayimage.php.
2015-07-29 07:33:50 +00:00
David Yip
2ba9dc0187
db: Also ignore Coppermine's lastupby pseudo-album.
2015-07-29 07:33:50 +00:00
David Yip
da76445850
db: Also ignore addfav.php for Coppermine.
2015-07-29 07:33:50 +00:00
David Yip
85e8113f6a
db: Add an ignore set for Coppermine Photo Gallery.
...
ic.cz has TONS of these things.
2015-07-29 07:33:50 +00:00
Ivan Kozik
4ad23c6118
Ignore more twitter share links
2015-07-29 07:33:50 +00:00
Ivan Kozik
661f8be5a7
Ignore non-Icecast mp3 streaming sites
2015-07-29 07:33:50 +00:00
Ivan Kozik
97db1927ac
Ignore more dokuwiki nonsense
2015-07-29 07:33:50 +00:00
Ivan Kozik
aacc472354
Ignore some junk wordpress URLs
2015-07-29 07:33:50 +00:00
Ivan Kozik
0f9ccc4846
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
3f7b022e7c
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
b55a89ecb0
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
Ivan Kozik
12c8536cd3
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
Ivan Kozik
3817170f6d
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
David Yip
4b192e63c5
db: Use correct delimiter for {primary_netloc} in singletumblr. #104 .
2015-07-29 07:33:49 +00:00
David Yip
483c9ac2d2
db: Remove trailing space in singletumblr ignore set. #104 .
2015-07-29 07:33:49 +00:00
David Yip
6be228fe0b
pipeline: Switch to templates for placeholders. #104 .
...
str.format() replaces every {token} with the matching value from the
formatting map. Unfortunately, {m,} is also regex syntax for "match m
or more repetitions of the preceding regex", and we use {3,} in a
global ignore.
Solution: Use a different delimiter. Python's string templates look
like they give us enough power to do what we need to do, and they won't
clobber repetition ranges.
Unfortunately, we can't use the default $ delimiter, because $ is a
regex metacharacter. %# seems sufficiently unlikely to appear in URLs.
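A minimal sketch of the idea described above, not the actual pipeline code. The class name IgnoreTemplate, the sample pattern, and the hostname are hypothetical; escaping the substituted value with re.escape() is also an assumption about how the placeholder would be filled in.

```python
import re
from string import Template


class IgnoreTemplate(Template):
    """string.Template variant whose placeholders start with %# instead of $."""
    delimiter = "%#"


# Hypothetical ignore pattern: one placeholder plus a regex repetition range.
raw_pattern = r"^https?://%#{primary_netloc}/a{3,}$"

# With str.format()-style placeholders, {3,} is parsed as a replacement
# field, so formatting the pattern raises instead of leaving it alone.
try:
    raw_pattern.replace("%#", "").format(primary_netloc="example.tumblr.com")
    format_failed = False
except (KeyError, IndexError, ValueError):
    format_failed = True

# The template only touches text introduced by the %# delimiter, so the
# {3,} repetition range passes through unchanged.
expanded = IgnoreTemplate(raw_pattern).substitute(
    primary_netloc=re.escape("example.tumblr.com")
)
compiled = re.compile(expanded)
```

Because $ never acts as a delimiter here, the end-of-line anchor in the pattern is also left untouched.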
2015-07-29 07:33:49 +00:00
David Yip
fd1d4f74d3
db: Add an ignore set to restrict !a *.tumblr.com to the target. #104 .
...
(This is the sort of thing that #104 is useful for.)
2015-07-29 07:33:49 +00:00
Ivan Kozik
673f23960c
Fix typo in /js/chartbeat.js
2015-07-29 07:33:49 +00:00
Ivan Kozik
1126169737
Ignore Special:Log/
2015-07-29 07:33:49 +00:00
Ivan Kozik
1114e93271
Ignore another streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
6366e07906
Ignore more of streamtheworld.com
...
Sample URL:
http://7579.live.streamtheworld.com/977_90?type=.flv
2015-07-29 07:33:49 +00:00
Ivan Kozik
cc13f8f7cc
Ignore imageshack.com/lost
2015-07-29 07:33:49 +00:00
David Yip
543c0ca86d
Ignore sets: fix JSON errors.
2015-07-29 07:33:49 +00:00
Start
ae33daa88d
fix ignore
2015-07-29 07:33:49 +00:00
PressStartandSelect
13d921a2a0
add social media ignores and safari user agent
2015-07-29 07:33:49 +00:00
Ivan Kozik
46aae55eaa
Add blogspot.sg
2015-07-29 07:33:49 +00:00
David Yip
c46406bb43
Add Meetup Everywhere ignore set.
...
Added to help out with a bunch of Meetup Everywhere jobs.
2015-07-29 07:33:49 +00:00
Ivan Kozik
6cb33929b2
Ignore Windows 7 .iso files that we've already grabbed
2015-07-29 07:33:49 +00:00
Ivan Kozik
5cb7e2acca
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
51dfe02202
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
27b64dd2a7
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
584746b60f
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
7748204e2f
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
fc51c61050
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
89565717af
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
ca85f5f803
Ignore more share links
2015-07-29 07:33:49 +00:00
Ivan Kozik
e3c8b96b82
Ignore more do=markread
2015-07-29 07:33:49 +00:00
Ivan Kozik
7ea9331fd6
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
2179192043
Ignore some vbulletin loops
2015-07-29 07:33:49 +00:00
Ivan Kozik
46a45eb391
Ignore /ucp\.php\?mode=delete_cookies
2015-07-29 07:33:49 +00:00
Ivan Kozik
644f787151
Fix licdn.com ignore for new wpull URL encoding behavior
2015-07-29 07:33:49 +00:00
Ivan Kozik
d46def8308
Ignore blogger.com/blog_this.pyra
2015-07-29 07:33:49 +00:00
Ivan Kozik
ec8151fcb6
Move blogger.com ignore to global
2015-07-29 07:33:49 +00:00
Ivan Kozik
7483dcbae7
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00