255 Commits

Author SHA1 Message Date
Ivan Kozik
9f59687c20 Ignore another reddit share URL 2015-07-29 07:33:45 +00:00
Ivan Kozik
5fe8496284 Ignore linkedin 'share' URL 2015-07-29 07:33:45 +00:00
Ivan Kozik
97f1d28b92 Ignore all twitter.com/share? 2015-07-29 07:33:45 +00:00
Ivan Kozik
c9a83e4da0 Ignore another twitter 'tweet' link 2015-07-29 07:33:45 +00:00
Ivan Kozik
ed8fe96db0 Ignore reddit share buttons 2015-07-29 07:33:45 +00:00
Ivan Kozik
edfa9043bf Add missing escapes 2015-07-29 07:33:45 +00:00
Ivan Kozik
d21050d2e4 Fix default gravatar ignore 2015-07-29 07:33:45 +00:00
Ivan Kozik
b755e28355 Ignore http variant as well 2015-07-29 07:33:45 +00:00
Ivan Kozik
c863719a8e Ignore default gravatar; ignore tweet buttons; don't ignore facebook login because we shouldn't be hitting it 2015-07-29 07:33:45 +00:00
Ivan Kozik
00353d9c4a Ignore some https:// facebook pages as well 2015-07-29 07:33:45 +00:00
Ivan Kozik
e53b933602 Add more ignores to blogs set, needed now that linked pages are being grabbed 2015-07-29 07:33:45 +00:00
Ivan Kozik
66a183a4dc Add another tumblr ignore 2015-07-29 07:33:45 +00:00
Ivan Kozik
60e383b685 Also ignore feedformat= on mediawiki 2015-07-29 07:33:45 +00:00
Ivan Kozik
fe6a679dc0 Add more mediawiki ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
7fecfebba2 Add another tumblr ignore 2015-07-29 07:33:45 +00:00
Ivan Kozik
57caa51825 Ignore curid= 2015-07-29 07:33:45 +00:00
Ivan Kozik
7ea290b4b6 Add another forums ignore 2015-07-29 07:33:45 +00:00
Ivan Kozik
2d3ddb9178 Add more mediawiki ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
ada958c3c4 Start mediawiki ignore patterns 2015-07-29 07:33:45 +00:00
Ivan Kozik
6f2c746d71 Ignore 16x16 tumblr avatars
There are sometimes a million of these on a blog
2015-07-29 07:33:45 +00:00
Ivan Kozik
544e4a3838 Ignore /CSI/$ on blogspot 2015-07-29 07:33:45 +00:00
Ivan Kozik
3f83c998eb Combine some ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
e458aeba8a Add another ignore for tumblr 2015-07-29 07:33:45 +00:00
Ivan Kozik
f7a6104f13 Add tumblr ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
71c81557e3 Add &share= to blog ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
93b1dc9714 Add tumblr junk to blog ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
71fa12d78d Add ?showComment%5C to blog ignores 2015-07-29 07:33:45 +00:00
Ivan Kozik
04b2d31740 Ignore ?like_comment=\d+ 2015-07-29 07:33:45 +00:00
Ivan Kozik
8afe4d91e7 Add some social buttons to blogs ignore patterns 2015-07-29 07:33:45 +00:00
David Yip
138e2e7c32 mode=reply can occur in first query position. 2015-07-29 07:33:45 +00:00
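The point of this fix can be illustrated with a small sketch. The two patterns below are hypothetical stand-ins, since the real ones are not shown in this log, and the URLs are made up: a pattern anchored on `&` misses `mode=reply` when it is the first query parameter, while one that accepts either `?` or `&` catches both positions.

```python
import re

# Hypothetical patterns for illustration; not the ones in the actual ignore set.
ONLY_LATER = re.compile(r"&mode=reply")       # misses the first query position
ANY_POSITION = re.compile(r"[?&]mode=reply")  # matches after '?' or '&'

# Made-up example URLs.
first = "http://example.com/1234.html?mode=reply"
later = "http://example.com/1234.html?thread=5&mode=reply"

for url in (first, later):
    print(url, bool(ONLY_LATER.search(url)), bool(ANY_POSITION.search(url)))
```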
Ivan Kozik
78247040c8 Ignore &replytocom= as well 2015-07-29 07:33:45 +00:00
Ivan Kozik
b3402433d3 Add /jetpack-comment/ ignore pattern 2015-07-29 07:33:45 +00:00
Ivan Kozik
5d4eb6e047 Fix trailing comma 2015-07-29 07:33:45 +00:00
Ivan Kozik
bc2a1cc798 Ignore more IMDb /videogallery/ 2015-07-29 07:33:45 +00:00
Ivan Kozik
c26e6d8c2e and another IMDb ignore pattern 2015-07-29 07:33:45 +00:00
Ivan Kozik
d2f06d22a1 Add another IMDb ignore pattern 2015-07-29 07:33:45 +00:00
Ivan Kozik
9e5b2ef8fa Add IMDb ignore patterns
/board/nest/ has everything, no need to grab other /board/ formats
2015-07-29 07:33:45 +00:00
Ivan Kozik
5f662492ab Add more forum ignore patterns 2015-07-29 07:33:45 +00:00
Ivan Kozik
d99a7d48be Add more phpBB ignore patterns 2015-07-29 07:33:45 +00:00
Ivan Kozik
4ba9cc3a78 Add phpBB patterns; make patterns stricter 2015-07-29 07:33:45 +00:00
Ivan Kozik
0c256a272a Add twitter.com/intent/tweet; add blogspot TLDs 2015-07-29 07:33:45 +00:00
David Yip
054722c334 Ignore patterns: Lua pattern syntax -> regex syntax. 2015-07-29 07:33:45 +00:00
Ivan Kozik
cb5aa1f2ca blogs ignore set: ignore http://www.tumblr.com/impixu 2015-07-29 07:33:45 +00:00
David Yip
fa54c01f56 Fix syntax error in forums ignore set. 2015-07-29 07:33:45 +00:00
David Yip
81a4e2b4b6 Also ignore registration, RSS, and some odd cronjob runner. 2015-07-29 07:33:45 +00:00
David Yip
b3d97ebb67 Start a forums ignore set.
These ignore patterns are derived from vBulletin; more work is needed to
derive a good set for e.g. phpBB, IPS, and IBB. I don't think we'll run
into a situation where one set of URLs is valid for one forum software
but not another, but if we do, you can expect this set to be split out
by software name.
2015-07-29 07:33:45 +00:00
David Yip
eed67549f1 Ignore another "open with reply form" LJ URL. 2015-07-29 07:33:45 +00:00
David Yip
e14dbf5261 Add patterns useful for archiving LiveJournal sites.
The rundown:

livejournal%.com/ljcounter%?: LJ's hit counter thing
%?replyto=%d+: reply-to links that just generate a reply box; useless
               for anonymous archival
xiti%.com/hit%.xiti%?: another hit counter thing
2015-07-29 07:33:45 +00:00
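As a rough sketch of how the three patterns in this rundown behave, the snippet below converts them from Lua pattern syntax to Python regular expressions and checks a few URLs. The helper names `lua_to_regex` and `is_ignored`, the sample URLs, and the assumption that a pattern may match anywhere in the URL are all illustrative; they are not taken from the repository.

```python
import re

# Patterns exactly as listed in the commit message (Lua pattern syntax).
LUA_PATTERNS = [
    "livejournal%.com/ljcounter%?",
    "%?replyto=%d+",
    "xiti%.com/hit%.xiti%?",
]


def lua_to_regex(pattern):
    """Naive conversion: turn Lua's '%' escape into a regex backslash.

    Only sufficient for the simple patterns above. Lua features such as
    '-' (lazy repetition), '%b' and '%f' are not handled.
    """
    return re.sub(r"%(.)", r"\\\1", pattern)


REGEXES = [re.compile(lua_to_regex(p)) for p in LUA_PATTERNS]


def is_ignored(url):
    """Return True if any ignore pattern matches anywhere in the URL.

    Matching anywhere in the URL is an assumption made for this sketch.
    """
    return any(r.search(url) for r in REGEXES)


# Made-up example URLs.
for url in (
    "http://www.livejournal.com/ljcounter?id=1",
    "http://example.livejournal.com/1234.html?replyto=5678",
    "http://example.livejournal.com/1234.html",
):
    print(url, is_ignored(url))
```

The conversion is deliberately minimal: for these three patterns the only difference between the two syntaxes is the escape character.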
David Yip
1c9d0af35c Remove _id from blogs ignore pattern. #40.
_id is now automatically calculated.
2015-07-29 07:33:45 +00:00
Ivan Kozik
694885b733 Also ignore http://r-login.wordpress.com/remote-login.php 2015-07-29 07:33:45 +00:00