Ivan Kozik
aacc472354
Ignore some junk WordPress URLs
2015-07-29 07:33:50 +00:00
Ivan Kozik
0f9ccc4846
Ignore another share link
2015-07-29 07:33:50 +00:00
Ivan Kozik
3f7b022e7c
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
b55a89ecb0
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
Ivan Kozik
12c8536cd3
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
Ivan Kozik
3817170f6d
Work around https://github.com/ArchiveTeam/ArchiveBot/issues/138#issuecomment-68352100
2015-07-29 07:33:49 +00:00
David Yip
4b192e63c5
db: Use correct delimiter for {primary_netloc} in singletumblr. #104.
2015-07-29 07:33:49 +00:00
David Yip
483c9ac2d2
db: Remove trailing space in singletumblr ignore set. #104.
2015-07-29 07:33:49 +00:00
David Yip
6be228fe0b
pipeline: Switch to templates for placeholders. #104.
str.format() replaces every {token} with the matching value from the
formatting map. However, {m,} is also regex syntax for "match m or more
repetitions of the preceding regex", and we use {3,} in a global ignore.
Solution: use a different delimiter. Python's string templates
(string.Template) appear to give us enough power for what we need, and
they won't clobber repetition ranges.
We can't use the default $ delimiter, because $ is a regex
metacharacter. %# seems sufficiently unlikely to appear in URLs.
2015-07-29 07:33:49 +00:00
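The delimiter problem described in commit 6be228fe0b can be illustrated with a minimal sketch. This is not the pipeline's actual code: the class name `IgnoreTemplate`, the sample pattern, and the `%#{primary_netloc}` placeholder spelling are assumptions made for illustration. Only the `%#` delimiter and the `{3,}` repetition range come from the commit message.

```python
import re
import string


class IgnoreTemplate(string.Template):
    # Hypothetical subclass: use "%#" instead of the default "$",
    # which is itself a regex metacharacter.
    delimiter = "%#"


netloc = "example.tumblr.com"  # assumed sample value

# With str.format(), the regex repetition range {3,} is parsed as a
# replacement field, so formatting fails instead of leaving it alone.
try:
    r"^https?://{primary_netloc}/(?:[^/]+/){3,}$".format(
        primary_netloc=re.escape(netloc)
    )
    format_failed = False
except (KeyError, IndexError, ValueError):
    format_failed = True

# With a custom-delimiter template, only %#{...} placeholders are
# substituted; {3,} and the trailing $ pass through untouched.
raw_pattern = r"^https?://%#{primary_netloc}/(?:[^/]+/){3,}$"
expanded = IgnoreTemplate(raw_pattern).substitute(
    primary_netloc=re.escape(netloc)
)
ignore = re.compile(expanded)

print(format_failed)
print(expanded)
print(bool(ignore.search("http://example.tumblr.com/a/b/c/")))
```

The value is passed through `re.escape` before substitution so that the dots in the hostname match literally once the expanded string is compiled as a regex.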
David Yip
fd1d4f74d3
db: Add an ignore set to restrict !a *.tumblr.com to the target. #104.
(This is the sort of thing that #104 is useful for.)
2015-07-29 07:33:49 +00:00
Ivan Kozik
673f23960c
Fix typo in /js/chartbeat.js
2015-07-29 07:33:49 +00:00
Ivan Kozik
1126169737
Ignore Special:Log/
2015-07-29 07:33:49 +00:00
Ivan Kozik
1114e93271
Ignore another streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
6366e07906
Ignore more of streamtheworld.com
Sample URL:
http://7579.live.streamtheworld.com/977_90?type=.flv
2015-07-29 07:33:49 +00:00
Ivan Kozik
cc13f8f7cc
Ignore imageshack.com/lost
2015-07-29 07:33:49 +00:00
David Yip
543c0ca86d
Ignore sets: fix JSON errors.
2015-07-29 07:33:49 +00:00
Start
ae33daa88d
fix ignore
2015-07-29 07:33:49 +00:00
PressStartandSelect
13d921a2a0
add social media ignores and Safari user agent
2015-07-29 07:33:49 +00:00
Ivan Kozik
46aae55eaa
Add blogspot.sg
2015-07-29 07:33:49 +00:00
David Yip
c46406bb43
Add Meetup Everywhere ignore set.
Added to help out with a bunch of Meetup Everywhere jobs.
2015-07-29 07:33:49 +00:00
Ivan Kozik
6cb33929b2
Ignore Windows 7 .iso files that we've already grabbed
2015-07-29 07:33:49 +00:00
Ivan Kozik
5cb7e2acca
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
51dfe02202
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
27b64dd2a7
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
584746b60f
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
7748204e2f
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
fc51c61050
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
89565717af
Ignore another share link
2015-07-29 07:33:49 +00:00
Ivan Kozik
ca85f5f803
Ignore more share links
2015-07-29 07:33:49 +00:00
Ivan Kozik
e3c8b96b82
Ignore more do=markread
2015-07-29 07:33:49 +00:00
Ivan Kozik
7ea9331fd6
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
2179192043
Ignore some vBulletin loops
2015-07-29 07:33:49 +00:00
Ivan Kozik
46a45eb391
Ignore /ucp\.php\?mode=delete_cookies
2015-07-29 07:33:49 +00:00
Ivan Kozik
644f787151
Fix licdn.com ignore for new wpull URL encoding behavior
2015-07-29 07:33:49 +00:00
Ivan Kozik
d46def8308
Ignore blogger.com/blog_this.pyra
2015-07-29 07:33:49 +00:00
Ivan Kozik
ec8151fcb6
Move blogger.com ignore to global
2015-07-29 07:33:49 +00:00
Ivan Kozik
7483dcbae7
Ignore another mp3 streaming site
2015-07-29 07:33:49 +00:00
Ivan Kozik
74b96843c5
Ignore more JavaScript non-URLs
2015-07-29 07:33:49 +00:00
Ivan Kozik
c05ecaf70e
Ignore more mp3 streaming sites
2015-07-29 07:33:49 +00:00
Ivan Kozik
5dc41cf274
Ignore *.corp.ne1.yahoo.com - drops traffic
2015-07-29 07:33:49 +00:00
Ivan Kozik
02bb21afd2
Ignore more mp3 streaming sites
2015-07-29 07:33:49 +00:00
Ivan Kozik
fef513ef9d
Ignore more mp3 streaming sites
2015-07-29 07:33:49 +00:00
Ivan Kozik
27451df729
Ignore more mp3 streaming sites
2015-07-29 07:33:49 +00:00
Ivan Kozik
71164d0f8a
Update global.json
2015-07-29 07:33:49 +00:00
Ivan Kozik
4f0295f473
Ignore another Icecast site
2015-07-29 07:33:49 +00:00
Ivan Kozik
60c8f47f72
Remove unnecessary ignore
" is quoted
2015-07-29 07:33:49 +00:00
Ivan Kozik
dd85e1f295
Add ignores for wpull@develop
It does not quote as many URLs
2015-07-29 07:33:49 +00:00
Ivan Kozik
5021267c8c
Ignore bad /js/chartbeat.js links
2015-07-29 07:33:49 +00:00
Ivan Kozik
884dac1e51
Ignore bad LinkedIn URLs found by wpull
2015-07-29 07:33:49 +00:00
Ivan Kozik
a1da3de9af
Ignore more Twitter share links
2015-07-29 07:33:49 +00:00