Contents
- 1 WordPress Robots.txt
- 2 Adding Sitemaps to WordPress Robots.txt
- 3 Explanation
- 3.1 Allowing All Bots
- 3.2 Not Allowing Any Bots
- 3.3 Block a Folder
- 3.4 Block a File
- 3.5 Block a Page and/or a Directory Named private
- 3.6 Block All Subfolders Starting with private
- 3.7 Block URLs Ending with a Specific Extension
- 3.8 Block URLs That Include a Question Mark (?)
- 3.9 Block a File Type
- 3.10 Block All Paginated Pages That Do Not End with "?"
- 3.11 Using Hash
- 4 Bots / User Agents
WordPress Robots.txt
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```
Adding Sitemaps to WordPress Robots.txt
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: http://www.example.com/post-sitemap.xml
```
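Sitemap: lines are independent of User-agent groups, so crawlers will pick them up from anywhere in the file. As an illustration, here is a minimal Python sketch (the `extract_sitemaps` helper is hypothetical, not part of any library) that pulls every Sitemap entry out of a robots.txt string:

```python
def extract_sitemaps(robots_txt: str) -> list:
    """Return every Sitemap URL declared in a robots.txt string."""
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("sitemap:"):
            # Keep everything after the first colon: the full URL.
            sitemaps.append(line.split(":", 1)[1].strip())
    return sitemaps

robots = """User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.example.com/post-sitemap.xml
"""
print(extract_sitemaps(robots))  # ['http://www.example.com/post-sitemap.xml']
```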
Explanation
Allowing All Bots
- Allows all bots to crawl the entire site

```
User-agent: *
Disallow:
```
Not Allowing Any Bots
- Blocks all bots from crawling the site

```
User-agent: *
Disallow: /
```
Block a Folder

```
User-agent: *
Disallow: /Folder/
```

Note that paths in robots.txt are case-sensitive, so /Folder/ and /folder/ are different rules.
Block a File
```
User-agent: *
Disallow: /file.html
```
Block a Page and/or a Directory Named private

```
User-agent: *
Disallow: /private
```

With no trailing slash, this matches any URL that begins with /private, such as /private.html and /private/.
Block All Subfolders Starting with private

```
User-agent: *
Disallow: /private*/
```
Block URLs Ending with a Specific Extension

For example, to block all URLs ending in .asp (the $ wildcard anchors the match to the end of the URL):

```
User-agent: *
Disallow: /*.asp$
```
Block URLs That Include a Question Mark (?)

```
User-agent: *
Disallow: /*?*
```
Block a File Type
```
User-agent: *
Disallow: /*.jpeg$
```
Block All Paginated Pages That Do Not End with "?"

- http://www.example.com/blog/? (allowed)
- http://www.example.com/blog/?page=2 (blocked)

This combination blocks paginated pages (URLs carrying query parameters) from being crawled, while still allowing URLs that end in a bare "?":

```
User-agent: *
Disallow: /*?   # block any URL that includes ?
Allow: /*?$     # allow any URL that ends in ?
```
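The * and $ wildcards above follow Googlebot-style matching, where the most specific (longest) matching rule wins and ties go to Allow. As a rough illustration, not Google's actual implementation, this logic can be modeled in Python (`pattern_to_regex` and `is_allowed` are hypothetical names for this sketch):

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern":
    # Escape regex metacharacters, then restore robots.txt wildcards:
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(path: str, rules) -> bool:
    # rules: list of ("allow" | "disallow", pattern) tuples.
    # Longest matching pattern wins; on a tie, Allow wins; no match => allowed.
    best_kind, best_pattern = "allow", ""
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > len(best_pattern) or (
                len(pattern) == len(best_pattern) and kind == "allow"
            ):
                best_kind, best_pattern = kind, pattern
    return best_kind == "allow"

rules = [("disallow", "/*?"), ("allow", "/*?$")]
print(is_allowed("/blog/?", rules))        # True  (ends in ?, Allow wins)
print(is_allowed("/blog/?page=2", rules))  # False (contains ?, Disallow wins)
```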
Using Hash

The hash character starts a comment; crawlers ignore everything after # on a line:

```
# The hash is used for comments
```
Bots / User Agents
Top 10 Bots
- bingbot
- Googlebot
- Googlebot Mobile
- AhrefsBot
- Baidu
- MJ12bot
- proximic
- A6
- ADmantX
- msnbot/2.0b
Individual crawl rules for each bot
```
User-agent: Googlebot
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: msnbot
Allow: /

User-agent: bingbot
Allow: /

# AdSense
User-agent: Mediapartners-Google
Disallow: /

# Blekko
User-agent: ScoutJet
Allow: /

User-agent: Yandex
Allow: /

# CommonCrawl
User-agent: ccbot
Allow: /

User-agent: baiduspider
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: *
Disallow: /
```
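To check how a given bot is treated by a per-agent file like this one, Python's standard library `urllib.robotparser` can be used. Note that it follows the original robots.txt convention of prefix matching and per-agent groups, and does not understand the * and $ wildcards shown earlier:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot matches its own group; every other bot falls through to *.
print(rp.can_fetch("Googlebot", "/blog/post"))     # True
print(rp.can_fetch("SomeOtherBot", "/blog/post"))  # False
```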