igoff by default

master
Ivan Kozik 2015-07-18 08:23:56 +00:00
parent 0d8288e6fb
commit f4f445b7dd
2 changed files with 4 additions and 3 deletions

@@ -64,8 +64,8 @@ changes will be applied as soon as the next URL is grabbed.
 `DIR/ignores` is a newline-separated list of [Python 3 regular expressions](http://pythex.org/)
 to use in addition to the ignore sets.
-You can `touch DIR/igoff` to stop `IGNOR` message spew, and `rm DIR/igoff`
-to turn it back on again.
+You can `rm DIR/igoff` to display all URLs that are being filtered out
+by the ignores, and `touch DIR/igoff` to turn it back off.
 Monitoring all of your crawls with the dashboard

@@ -30,10 +30,11 @@ for arg in "$@"; do
 shift
 done
-echo
+echo
 echo "$id" > "$dir/id"
 echo "$url" > "$dir/start_url"
 echo "global,$igsets" > "$dir/igsets"
+touch "$dir/igoff"
 touch "$dir/ignores"
 # Note: we use the default html5lib parser instead of the lxml that ArchiveBot uses