README: Fix headings

Author: Ivan Kozik
Date:   2015-12-11 08:40:30 +00:00
Parent: 140efe082a
Commit: 6d3f0f901d


@ -232,13 +232,13 @@ Options can come before or after the URL.
The defaults usually work fine.
-### Blogger / blogspot.com blogs
+#### Blogger / blogspot.com blogs
If you want to archive X.blogspot.com from outside the US, start the crawl on http://X.blogspot.com/ncr (ncr = no country redirect) to avoid getting redirected to another TLD. Note that /ncr sets an `NCR` cookie that expires after a few weeks.
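For example, a crawl might be started like this (X.blogspot.com is a placeholder blog name):

```
# Start on /ncr so the NCR cookie is set before any country redirect happens.
grab-site http://X.blogspot.com/ncr
```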
Some blogspot.com blogs use "[Dynamic Views](https://support.google.com/blogger/answer/1229061?hl=en)" themes that require JavaScript and serve absolutely no HTML content. In rare cases, you can get JavaScript-free pages by appending `?m=1` (e.g. http://happinessbeyondthought.blogspot.com/?m=1). Otherwise, you can archive parts of these blogs through Google Cache instead (e.g. https://webcache.googleusercontent.com/search?q=cache:http://blog.datomic.com/) or by using http://archive.is/ instead of grab-site. If neither of these options works, try [using grab-site with phantomjs](https://github.com/ludios/grab-site/issues/55#issuecomment-162118702).
-### Tumblr blogs
+#### Tumblr blogs
Use `--igsets=singletumblr` to avoid crawling the homepages of other tumblr blogs.
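A sketch of a typical invocation (X.tumblr.com is a placeholder):

```
# singletumblr keeps the crawl on this blog instead of wandering into
# the homepages of every other tumblr blog it links to.
grab-site http://X.tumblr.com/ --igsets=singletumblr
```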
@ -246,27 +246,27 @@ If you don't care about who liked or reblogged a post, add `\?from_c=` to the cr
Some tumblr blogs appear to require JavaScript, but they are actually just hiding the page content with CSS. You are still likely to get a complete crawl. (See the links in the page source for http://X.tumblr.com/archive).
-### Directory listings ("Index of ...")
+#### Directory listings ("Index of ...")
Use `--no-dupespotter` to avoid triggering false positives on the duplicate page detector. Without it, the crawl may miss large parts of the directory tree.
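For example (the URL is a placeholder):

```
# Directory listings contain many near-identical pages, which can trigger
# false positives in the duplicate-page detector; this turns it off.
grab-site 'http://www.example.com/pub/' --no-dupespotter
```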
-### Very large websites
+#### Very large websites
Use `--no-offsite-links` to stay on the main website and avoid crawling linked pages on other domains.
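For example (the URL is a placeholder):

```
# Stay on the start domain; do not crawl linked pages on other domains.
grab-site http://www.example.com/ --no-offsite-links
```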
-### Websites that are likely to ban you for crawling fast
+#### Websites that are likely to ban you for crawling fast
Use `--concurrency=1 --delay=500-1500`.
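For example (the URL is a placeholder; the delay range is in milliseconds):

```
# One request at a time, with a random 500-1500 ms pause between requests.
grab-site http://www.example.com/ --concurrency=1 --delay=500-1500
```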
-### MediaWiki sites with English language
+#### MediaWiki sites with English language
Use `--igsets=mediawiki`. Note that this ignore set ignores old page revisions.
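For example (the URL is a placeholder):

```
# The mediawiki ignore set skips wiki plumbing such as old page revisions.
grab-site https://wiki.example.com/ --igsets=mediawiki
```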
-### MediaWiki sites with non-English language
+#### MediaWiki sites with non-English language
You will probably have to add ignores with translated `Special:*` URLs based on [ignore_sets/mediawiki](https://github.com/ludios/grab-site/blob/master/libgrabsite/ignore_sets/mediawiki).
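For example, for a German-language wiki you could append a translated pattern to the crawl's `ignores` file (DIR is the directory grab-site created for the crawl; `Spezial:` is the German equivalent of `Special:` and is only an illustration):

```
# One regex per line; grab-site picks up edits to DIR/ignores while running.
echo 'Spezial:' >> DIR/ignores
```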
-### Forums
+#### Forums
Forums require more manual intervention with ignore patterns. `--igsets=forums` ([ignore_sets/forums](https://github.com/ludios/grab-site/blob/master/libgrabsite/ignore_sets/forums)) is often useful for non-SMF forums, but you will have to add other ignore patterns, including one to ignore individual-forum-post pages if there are too many posts to crawl. (Generally, crawling the thread pages is enough.)
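A sketch of a starting point (the URL is a placeholder; expect to add site-specific patterns as the crawl runs):

```
# Begin with the forums ignore set, then add patterns (e.g. for
# individual-post permalinks) to DIR/ignores as needed.
grab-site http://forum.example.com/ --igsets=forums
```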