20 Commits

Author SHA1 Message Date
Ivan Kozik
d2f06d22a1 Add another IMDb ignore pattern 2015-07-29 07:33:45 +00:00
Ivan Kozik
9e5b2ef8fa Add IMDb ignore patterns
/board/nest/ has everything, no need to grab other /board/ formats
2015-07-29 07:33:45 +00:00
Ivan Kozik
5f662492ab Add more forum ignore patterns 2015-07-29 07:33:45 +00:00
Ivan Kozik
d99a7d48be Add more phpBB ignore patterns 2015-07-29 07:33:45 +00:00
Ivan Kozik
4ba9cc3a78 Add phpBB patterns; make patterns stricter 2015-07-29 07:33:45 +00:00
Ivan Kozik
0c256a272a Add twitter.com/intent/tweet; add blogspot TLDs 2015-07-29 07:33:45 +00:00
David Yip
054722c334 Ignore patterns: Lua pattern syntax -> regex syntax. 2015-07-29 07:33:45 +00:00
Ivan Kozik
cb5aa1f2ca blogs ignore set: ignore http://www.tumblr.com/impixu 2015-07-29 07:33:45 +00:00
David Yip
fa54c01f56 Fix syntax error in forums ignore set. 2015-07-29 07:33:45 +00:00
David Yip
81a4e2b4b6 Also ignore registration, RSS, and some odd cronjob runner. 2015-07-29 07:33:45 +00:00
David Yip
b3d97ebb67 Start a forums ignore set.
These ignore patterns are derived from vBulletin; more work is needed to
derive a good set for e.g. phpBB, IPS, and IBB. I don't think we'll run
into a situation where one set of URLs is valid for one forum software
but not another, but if we do, you can expect this set to be split out
by software name.
2015-07-29 07:33:45 +00:00
David Yip
eed67549f1 Ignore another "open with reply form" LJ URL. 2015-07-29 07:33:45 +00:00
David Yip
e14dbf5261 Add patterns useful for archiving LiveJournal sites.
The rundown:

livejournal%.com/ljcounter%?: LJ's hit counter thing
%?replyto=%d+: reply-to links that just generate a reply box; useless
               for anonymous archival
xiti%.com/hit%.xiti%?: another hit counter thing
2015-07-29 07:33:45 +00:00
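
The three patterns in the rundown above use Lua pattern syntax, where `%` is the escape character and `%d` matches a digit. Commit 054722c334 later switched the ignore sets to regex syntax. The block below is a minimal sketch of what the regex equivalents could look like, assuming Python's `re` module and unanchored matching. The helper name `is_ignored` and the sample URLs are hypothetical and are not taken from the repository.

```python
import re

# Regex translations of the three Lua patterns from the commit message.
# Lua escapes literals with "%"; regex uses a backslash. Lua "%d" becomes "\d".
IGNORE_PATTERNS = [
    r"livejournal\.com/ljcounter\?",  # Lua: livejournal%.com/ljcounter%?
    r"\?replyto=\d+",                 # Lua: %?replyto=%d+
    r"xiti\.com/hit\.xiti\?",         # Lua: xiti%.com/hit%.xiti%?
]

COMPILED = [re.compile(pattern) for pattern in IGNORE_PATTERNS]


def is_ignored(url: str) -> bool:
    """Return True if any ignore pattern matches anywhere in the URL.

    Hypothetical helper: unanchored search is an assumption about how the
    ignore set is applied, not something stated in the commit log.
    """
    return any(pattern.search(url) for pattern in COMPILED)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "http://www.livejournal.com/ljcounter?id=1",
        "http://example.livejournal.com/1234.html?replyto=5678",
        "http://example.livejournal.com/1234.html",
    ):
        print(url, is_ignored(url))
```

Escaping the `.` and `?` matters in both syntaxes: unescaped, `.` matches any character and `?` acts as a quantifier, so the patterns would match more URLs than intended.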
David Yip
1c9d0af35c Remove _id from blogs ignore pattern. #40.
_id is now automatically calculated.
2015-07-29 07:33:45 +00:00
Ivan Kozik
694885b733 Also ignore http://r-login.wordpress.com/remote-login.php 2015-07-29 07:33:45 +00:00
Ivan Kozik
d981f64e3d Ignore all ?share= 2015-07-29 07:33:45 +00:00
Ivan Kozik
580120eee7 Add showComment=; add /search/label/; fix . -> %. 2015-07-29 07:33:44 +00:00
David Yip
17a13ea5f5 Fix mistakenly escaped . in blogs ignore set. 2015-07-29 07:33:44 +00:00
David Yip
c1a52c3321 Fix unescaped ( in blogs ignore set. 2015-07-29 07:33:44 +00:00
David Yip
7e5ecf25ce Add the blogs ignore set in #21. 2015-07-29 07:33:44 +00:00