On Ubuntu 14.04.1 or newer:
```
sudo apt-get install --no-install-recommends build-essential python3-dev python3-pip
pip3 install --user wpull manhole lmdb
git clone https://github.com/ludios/grab-site
```
Usage:
```
./grab-site URL
./grab-site URL --ignore-sets blogs,forums
```
Note: pass the set names as a separate argument. `--ignore-sets=blogs,forums` (with `=`) will **not** work.

While the crawl is running, you can edit `DIR/ignores` and `DIR/ignore_sets`; the
changes will be applied as soon as the next URL is grabbed.
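
A minimal sketch of such an edit follows. This README does not describe the file formats, so the sketch assumes that `DIR/ignores` holds one regular expression per line and that `DIR/ignore_sets` holds a comma-separated list of set names, as passed to `--ignore-sets`. The `example.com` pattern is hypothetical, and `DIR` is a throwaway stand-in directory so that the commands are safe to run anywhere.

```shell
# Stand-in for the crawl directory; point DIR at a real running crawl instead.
DIR="$(mktemp -d)"

# Assumed format: one regular expression per line (hypothetical pattern).
printf '%s\n' '^https?://example\.com/calendar/' >> "$DIR/ignores"

# Assumed format: comma-separated set names, as given to --ignore-sets.
printf '%s\n' 'blogs,forums' > "$DIR/ignore_sets"

# Show what the crawler would read on its next URL.
cat "$DIR/ignores" "$DIR/ignore_sets"
```

Appending with `>>` keeps any patterns already in `DIR/ignores`, while `>` replaces the whole set list in `DIR/ignore_sets`.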

License:

This repo is almost entirely code from ArchiveBot; please see the ArchiveBot license.