On Ubuntu 14.04.1 or newer:

```
sudo apt-get install --no-install-recommends build-essential python3-dev python3-pip
pip3 install --user wpull manhole lmdb
git clone https://github.com/ludios/grab-site
cd grab-site
```

Usage:

```
./grab-site URL
./grab-site URL --ignore-sets blogs,forums
```

Note: pass the ignore sets as a separate argument (`--ignore-sets blogs,forums`); the `--ignore-sets=blogs,forums` form with `=` will **not** work.

While the crawl is running, you can edit `DIR/ignores` and `DIR/ignore_sets`, where `DIR` is the directory grab-site created for the crawl; the changes will be applied as soon as the next URL is grabbed.
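
A minimal sketch of adding an ignore to a running crawl. It assumes `DIR/ignores` holds one pattern per line (ArchiveBot-style ignores are regular expressions matched against URLs); `example-crawl-dir` and the `/calendar/` pattern are hypothetical placeholders, not names grab-site uses.

```shell
# Hypothetical crawl directory: substitute the DIR of your running crawl.
DIR=./example-crawl-dir

# Only needed so this sketch runs standalone; a real crawl's DIR already exists.
mkdir -p "$DIR"

# Assumption: one ignore pattern per line. Appending (>>) rather than
# overwriting (>) keeps the ignores already in the file; the new pattern
# should be picked up when the next URL is grabbed.
printf '%s\n' '/calendar/' >> "$DIR/ignores"
```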

License:

This repo is almost entirely code from ArchiveBot; please see the ArchiveBot license.