Ivan Kozik
d21050d2e4
Fix default gravatar ignore
2015-07-29 07:33:45 +00:00
Ivan Kozik
b755e28355
Ignore http variant as well
2015-07-29 07:33:45 +00:00
Ivan Kozik
c863719a8e
Ignore default gravatar; ignore tweet buttons; don't ignore facebook login because we shouldn't be hitting it
2015-07-29 07:33:45 +00:00
Ivan Kozik
00353d9c4a
Ignore some https:// facebook pages as well
2015-07-29 07:33:45 +00:00
Ivan Kozik
e53b933602
Add more ignores to blogs set, needed now that linked pages are being grabbed
2015-07-29 07:33:45 +00:00
Ivan Kozik
66a183a4dc
Add another tumblr ignore
2015-07-29 07:33:45 +00:00
Ivan Kozik
60e383b685
Also ignore feedformat= on mediawiki
2015-07-29 07:33:45 +00:00
Ivan Kozik
fe6a679dc0
Add more mediawiki ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
7fecfebba2
Add another tumblr ignore
2015-07-29 07:33:45 +00:00
Ivan Kozik
57caa51825
Ignore curid=
2015-07-29 07:33:45 +00:00
Ivan Kozik
7ea290b4b6
Add another forums ignore
2015-07-29 07:33:45 +00:00
Ivan Kozik
2d3ddb9178
Add more mediawiki ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
ada958c3c4
Start mediawiki ignore patterns
2015-07-29 07:33:45 +00:00
Ivan Kozik
6f2c746d71
Ignore 16x16 tumblr avatars
There are sometimes a million of these on a blog
2015-07-29 07:33:45 +00:00
Ivan Kozik
544e4a3838
Ignore /CSI/$ on blogspot
2015-07-29 07:33:45 +00:00
Ivan Kozik
3f83c998eb
Combine some ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
e458aeba8a
Add another ignore for tumblr
2015-07-29 07:33:45 +00:00
Ivan Kozik
f7a6104f13
Add tumblr ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
71c81557e3
Add &share= to blog ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
93b1dc9714
Add tumblr junk to blog ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
71fa12d78d
Add ?showComment%5C to blog ignores
2015-07-29 07:33:45 +00:00
Ivan Kozik
04b2d31740
Ignore ?like_comment=\d+
2015-07-29 07:33:45 +00:00
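The message gives this pattern in regex form. The following is a minimal Python check of what it catches. It assumes the stored pattern escapes the literal `?`, which the commit message does not show. Left unescaped at the start of a pattern, `?` is a quantifier with nothing to repeat, and `re.compile` rejects it. The URLs are made-up examples.

```python
import re

# Assumption: the literal "?" is escaped. Unescaped at the start of a
# pattern, "?" is a quantifier with nothing to repeat and re.compile fails.
LIKE_COMMENT = re.compile(r"\?like_comment=\d+")

def is_ignored(url: str) -> bool:
    """True if the URL carries a ?like_comment=<digits> query."""
    return LIKE_COMMENT.search(url) is not None

# Made-up example URLs
print(is_ignored("http://example.wordpress.com/2015/07/a-post/?like_comment=123"))  # True
print(is_ignored("http://example.wordpress.com/2015/07/a-post/"))                    # False
```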
Ivan Kozik
8afe4d91e7
Add some social buttons to blogs ignore patterns
2015-07-29 07:33:45 +00:00
David Yip
138e2e7c32
mode=reply can occur in first query position.
2015-07-29 07:33:45 +00:00
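A pattern that looks only for `&mode=reply` misses URLs where `mode=reply` is the first query parameter, because that parameter is introduced by `?` rather than `&`. A character class that accepts either separator covers both cases. The sketch below illustrates this. Both patterns are hypothetical, since the commit does not show the exact pattern, and the URLs are made up.

```python
import re

# Hypothetical patterns; the commit does not show the exact ones used.
AMP_ONLY = re.compile(r"&mode=reply")
EITHER_POSITION = re.compile(r"[?&]mode=reply")

urls = [
    "http://example.livejournal.com/123.html?mode=reply",           # first parameter
    "http://example.livejournal.com/123.html?thread=4&mode=reply",  # later parameter
]

for url in urls:
    print(bool(AMP_ONLY.search(url)), bool(EITHER_POSITION.search(url)))
# False True
# True True
```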
Ivan Kozik
78247040c8
Ignore &replytocom= as well
2015-07-29 07:33:45 +00:00
Ivan Kozik
b3402433d3
Add /jetpack-comment/ ignore pattern
2015-07-29 07:33:45 +00:00
Ivan Kozik
5d4eb6e047
Fix trailing comma
2015-07-29 07:33:45 +00:00
Ivan Kozik
bc2a1cc798
Ignore more IMDb /videogallery/
2015-07-29 07:33:45 +00:00
Ivan Kozik
c26e6d8c2e
and another IMDb ignore pattern
2015-07-29 07:33:45 +00:00
Ivan Kozik
d2f06d22a1
Add another IMDb ignore pattern
2015-07-29 07:33:45 +00:00
Ivan Kozik
9e5b2ef8fa
Add IMDb ignore patterns
/board/nest/ has everything, no need to grab other /board/ formats
2015-07-29 07:33:45 +00:00
Ivan Kozik
5f662492ab
Add more forum ignore patterns
2015-07-29 07:33:45 +00:00
Ivan Kozik
d99a7d48be
Add more phpBB ignore patterns
2015-07-29 07:33:45 +00:00
Ivan Kozik
4ba9cc3a78
Add phpBB patterns; make patterns stricter
2015-07-29 07:33:45 +00:00
Ivan Kozik
0c256a272a
Add twitter.com/intent/tweet; add blogspot TLDs
2015-07-29 07:33:45 +00:00
David Yip
054722c334
Ignore patterns: Lua pattern syntax -> regex syntax.
2015-07-29 07:33:45 +00:00
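The main mechanical difference in this conversion is the escape character. Lua patterns escape with `%` (`%.`, `%?`) and spell classes as `%d`, `%w`, and so on, while regex uses `\`. Some characters, such as `\`, `{`, `}` and `|`, are plain literals in Lua but special in regex. Below is a rough translator for that simple subset, written as a hypothetical helper rather than the tool actually used for this commit. It deliberately does not handle bracketed sets, Lua's lazy `-` quantifier, `%b`, `%f`, or captures.

```python
import re

# Lua character classes mapped to ASCII-only regex equivalents.
# Only a subset is covered; %c, %p, %x and friends are rejected below.
LUA_CLASSES = {
    "a": "[A-Za-z]",
    "d": "[0-9]",
    "l": "[a-z]",
    "u": "[A-Z]",
    "w": "[A-Za-z0-9]",   # unlike regex \w, Lua's %w excludes "_"
    "s": r"[ \t\n\r\f\v]",
}

def lua_to_regex(pattern: str) -> str:
    """Translate a simple Lua pattern into Python regex syntax.

    Handles %-escapes and characters that are literal in Lua but special
    in regex. Does NOT handle bracketed sets, the lazy "-" quantifier,
    %b, %f, or captures.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in LUA_CLASSES:
                out.append(LUA_CLASSES[nxt])
            elif nxt.isalnum():
                raise ValueError(f"unsupported Lua class %{nxt}")
            else:
                out.append(re.escape(nxt))  # %. -> \.   %? -> \?
            i += 2
        else:
            # "\", "{", "}" and "|" have no special meaning in Lua patterns
            out.append(re.escape(ch) if ch in "\\{}|" else ch)
            i += 1
    return "".join(out)

print(lua_to_regex("livejournal%.com/ljcounter%?"))  # livejournal\.com/ljcounter\?
```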
Ivan Kozik
cb5aa1f2ca
blogs ignore set: ignore http://www.tumblr.com/impixu
2015-07-29 07:33:45 +00:00
David Yip
fa54c01f56
Fix syntax error in forums ignore set.
2015-07-29 07:33:45 +00:00
David Yip
81a4e2b4b6
Also ignore registration, RSS, and some odd cronjob runner.
2015-07-29 07:33:45 +00:00
David Yip
b3d97ebb67
Start a forums ignore set.
These ignore patterns are derived from vBulletin; more work is needed to
derive a good set for e.g. phpBB, IPS, and IBB. I don't think we'll run
into a situation where one set of URLs is valid for one forum software
but not another, but if we do, you can expect this set to be split out
by software name.
2015-07-29 07:33:45 +00:00
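An ignore set like this one is essentially a list of patterns, and a URL is skipped if any of them matches. The sketch below shows that idea under two assumptions: matching uses `re.search` (match anywhere in the URL), and patterns are compiled once up front. The three patterns are illustrative only. They are loosely modeled on the registration, RSS and cron URLs mentioned in these commits and are not the actual contents of the set.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# Illustrative patterns only, loosely modeled on the vBulletin URLs these
# commits mention (registration, RSS, a cron runner). They are NOT the
# actual contents of the forums ignore set.
FORUMS_IGNORES = [
    r"/register\.php",
    r"/external\.php\?type=",
    r"/cron\.php",
]

def compile_ignores(patterns: Iterable[str]) -> list[re.Pattern[str]]:
    """Compile each pattern once so it can be reused for every URL."""
    return [re.compile(p) for p in patterns]

def should_ignore(url: str, ignores: list[re.Pattern[str]]) -> bool:
    """Skip the URL if any pattern matches anywhere in it (re.search)."""
    return any(p.search(url) for p in ignores)

ignores = compile_ignores(FORUMS_IGNORES)
print(should_ignore("http://forum.example.com/register.php", ignores))         # True
print(should_ignore("http://forum.example.com/showthread.php?t=42", ignores))  # False
```

Keeping the set as plain pattern strings makes it easy to split per forum package later, as the commit anticipates.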
David Yip
eed67549f1
Ignore another "open with reply form" LJ URL.
2015-07-29 07:33:45 +00:00
David Yip
e14dbf5261
Add patterns useful for archiving LiveJournal sites.
The rundown:
livejournal%.com/ljcounter%?: LJ's hit counter thing
%?replyto=%d+: reply-to links that just generate a reply box; useless
for anonymous archival
xiti%.com/hit%.xiti%?: another hit counter thing
2015-07-29 07:33:45 +00:00
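These three entries are Lua patterns, as the set used at the time. Here they are translated by hand into Python regex syntax (`%.` becomes `\.`, `%?` becomes `\?`, `%d` becomes `\d`) and run against made-up URLs to show what each one catches.

```python
import re

# Hand translations of the three Lua patterns listed above
# (%. -> \.   %? -> \?   %d -> \d).
LJ_IGNORES = [
    re.compile(r"livejournal\.com/ljcounter\?"),  # LJ hit counter
    re.compile(r"\?replyto=\d+"),                 # reply-form links
    re.compile(r"xiti\.com/hit\.xiti\?"),         # XiTi hit counter
]

def ignored(url: str) -> bool:
    return any(p.search(url) for p in LJ_IGNORES)

# Made-up URLs for illustration
for url in [
    "http://www.livejournal.com/ljcounter?u=example",
    "http://example.livejournal.com/1234.html?replyto=5678",
    "http://logv1.xiti.com/hit.xiti?s=1",
    "http://example.livejournal.com/1234.html",
]:
    print(ignored(url), url)
```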
David Yip
1c9d0af35c
Remove _id from blogs ignore pattern. #40 .
_id is now automatically calculated.
2015-07-29 07:33:45 +00:00
Ivan Kozik
694885b733
Also ignore http://r-login.wordpress.com/remote-login.php
2015-07-29 07:33:45 +00:00
Ivan Kozik
d981f64e3d
Ignore all ?share=
2015-07-29 07:33:45 +00:00
Ivan Kozik
580120eee7
Add showComment=; add /search/label/; fix . -> %.
2015-07-29 07:33:44 +00:00
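The `. -> %.` fix matters because an unescaped `.` matches any character in both Lua patterns and regex, so a host name written with a bare dot can match more than intended. Below is a small illustration in regex syntax, where `\.` plays the role of Lua's `%.`. The pattern and the URL are hypothetical.

```python
import re

# Hypothetical pattern written two ways. An unescaped "." matches any
# character; the escaped form (Lua %., regex \.) matches only a literal dot.
loose = re.compile(r"blogspot.com")
strict = re.compile(r"blogspot\.com")

url = "http://blogspot-com.example.net/"  # made-up host that is not blogspot.com
print(bool(loose.search(url)))   # True: "." matched the "-"
print(bool(strict.search(url)))  # False
```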
David Yip
17a13ea5f5
Fix mistakenly escaped . in blogs ignore set.
2015-07-29 07:33:44 +00:00
David Yip
c1a52c3321
Fix unescaped ( in blogs ignore set.
2015-07-29 07:33:44 +00:00
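An unescaped `(` opens a group that never closes. At this commit the set still used Lua patterns, where a literal parenthesis is written `%(`; after the later switch to regex the escape is `\(`. Below is a small Python check of why the escape is needed in regex syntax, using a hypothetical helper and pattern.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if pattern is valid Python regex syntax."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles(r"/feed/(atom"))   # False: "(" opens a group that never closes
print(compiles(r"/feed/\(atom"))  # True: escaped, so it is a literal "("
```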
David Yip
7e5ecf25ce
Add the blogs ignore set in #21 .
2015-07-29 07:33:44 +00:00