From 33cc3040ed7b80ef9dc064b0cac4d455b66c7ad3 Mon Sep 17 00:00:00 2001
From: Ivan Kozik
Date: Mon, 10 Aug 2015 11:48:53 +0000
Subject: [PATCH] mediawiki igset: add comments

---
 libgrabsite/ignore_sets/mediawiki | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libgrabsite/ignore_sets/mediawiki b/libgrabsite/ignore_sets/mediawiki
index e480028..8e5b4dc 100644
--- a/libgrabsite/ignore_sets/mediawiki
+++ b/libgrabsite/ignore_sets/mediawiki
@@ -1,3 +1,5 @@
+# This ignore set avoids grabbing the full history of each page, because there
+# are generally far too many ?oldid= pages to crawl completely.
 [\?&]oldid=\d+
 [\?&]curid=\d+
 [\?&]limit=(20|100|250|500)
@@ -10,11 +12,15 @@
 ([\?&]title=|/)Special:Log/
 [\?&]action=edit&section=(\d+|new)
 [\?&]feed(format)?=atom
-[\?&]redlink=1
 [\?&]printable=yes
 [\?&]mobileaction=
 [\?&]undo(after)?=\d+
 ^http://a\.wikia-beacon\.com/__track/
+
+# Links to pages that don't exist
+[\?&]redlink=1
+
+# Loops
 /User_talk:.+/User_talk:
 /User_blog:.+/User_blog:
 /User:.+/User:
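
To illustrate what the patterns touched by this patch match, here is a minimal Python sketch. It assumes each line of the ignore set is treated as a regular expression searched anywhere in the URL; that matching rule, the `is_ignored` helper, and the `wiki.example.org` URLs are assumptions made for illustration, not grab-site's actual implementation.

```python
import re

# A few patterns copied from the mediawiki ignore set shown in the patch.
IGNORE_PATTERNS = [
    r"[\?&]oldid=\d+",            # old page revisions
    r"[\?&]redlink=1",            # links to pages that don't exist
    r"/User_talk:.+/User_talk:",  # looping User_talk paths
]


def is_ignored(url, patterns=IGNORE_PATTERNS):
    """Return True if any pattern matches somewhere in the URL.

    Assumption: patterns are applied with search semantics (unanchored),
    which may differ from how the crawler applies them.
    """
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical URLs, used only to exercise the patterns.
urls = [
    "http://wiki.example.org/index.php?title=Main_Page&oldid=12345",
    "http://wiki.example.org/index.php?title=Missing&action=edit&redlink=1",
    "http://wiki.example.org/wiki/User_talk:A/User_talk:A/Archive",
    "http://wiki.example.org/wiki/Main_Page",
]

for url in urls:
    print(is_ignored(url), url)
```

Under these assumptions the first three URLs are ignored and the plain article URL is not, which matches the intent stated in the new comments: skip revision history, links to nonexistent pages, and looping user-page paths.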