Excluding a testing subdomain from being crawled by search engines (w/ SVN Repository)

I have:

domain.com
testing.domain.com

I want domain.com to be crawled and indexed by search engines, but not testing.domain.com.

The testing domain and the main domain share the same SVN repository, so I'm not sure whether separate robots.txt files would work...

1) Create a separate robots.txt file (named, for example, robots_testing.txt).

2) Add this rule to the .htaccess file in the website root:

RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

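If the host is testing.example.com, any request for robots.txt is rewritten (internally redirected) to robots_testing.txt.

Alternatively, do the opposite: rewrite all requests for robots.txt to robots_disabled.txt for every host except example.com: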

RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
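
One thing worth checking with the second variant is which hosts end up on the disabled file. The `!=` test in RewriteCond is an exact string comparison, so a host such as www.example.com would also be served robots_disabled.txt. The following is only a rough Python model of that condition, not Apache itself; the function name `robots_file_for` and the sample hosts are made up for illustration.

```python
def robots_file_for(host: str, main_host: str = "example.com") -> str:
    """Model of `RewriteCond %{HTTP_HOST} !=example.com` (exact comparison).

    Hypothetical helper: returns the file that would answer a request
    for /robots.txt on the given host under the second rule set.
    """
    if host != main_host:
        # Condition matched: the request is rewritten internally.
        return "/robots_disabled.txt"
    # Condition not matched: the regular robots.txt is served.
    return "/robots.txt"


if __name__ == "__main__":
    for host in ("example.com", "testing.example.com", "www.example.com"):
        print(host, "->", robots_file_for(host))
```

If the main site is also reachable under www or with an explicit port in the Host header, the condition would probably need to account for those variants.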

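testing.domain.com should have its own robots.txt file, as follows:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /
Noindex: /

located at http://testing.domain.com/robots.txt
This disallows all bot user agents. Google has also looked at Noindex in robots.txt, so it is included for good measure; note that this directive was never officially documented, and Google announced it would stop honoring it as of September 1, 2019. The Googlebot group repeats Disallow: / because a crawler follows only the most specific group that matches it.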
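
As a quick sanity check, the blanket Disallow rule can be run through Python's standard `urllib.robotparser`. This is a minimal sketch: the rule text is pasted into a string rather than fetched from the server, the helper name `is_blocked` is made up, and Python's parser only approximates how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Blanket rule for the testing host, inlined for this sketch.
ROBOTS_TESTING = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        blocked = is_blocked(ROBOTS_TESTING, agent, "http://testing.domain.com/page")
        print(agent, "blocked:", blocked)
```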

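You can also add your subdomain to Webmaster Tools, block it with robots.txt, and submit a site removal request (though this will only apply to Google). For more information, see http://googlewebmastercentral.blogspot.com/2010/03/url-removal-explained-part-i-urls.html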
