Why does a request throw an exception when Googlebot crawls it, but not when I paste the URL myself?



I have been getting a large number of these exceptions in my event log.

EVENT ID: 1309
Event code: 3005 
Event message: An unhandled exception has occurred. 
Event time: 12/12/2011 1:40:41 PM 
Event time (UTC): 12/12/2011 8:40:41 PM 
Event ID: f85f113a40d349f5a1fe9ef481038281 
Event sequence: 8993 
Event occurrence: 1463 
Event detail code: 0 
Application information: 
    Application domain: /LM/W3SVC/12/ROOT-1-129681577057031250 
    Trust level: Full 
    Application Virtual Path: / 
    Application Path: C:\inetpub\wwwroot\gouki 
    Machine name: GOUKIPRIME 
Process information: 
    Process ID: 7508 
    Process name: w3wp.exe 
    Account name: IIS APPPOOL\gouki 
Exception information: 
    Exception type: HttpException 
    Exception message: A potentially dangerous Request.Path value was detected from the client (?).
   at System.Web.HttpRequest.ValidateInputIfRequiredByConfig()
   at System.Web.HttpApplication.PipelineStepManager.ValidateHelper(HttpContext context)

Request information: 
    Request URL: http://gouki.com/Story/?page=8&orderby=views&tagged=&subject=&author=?page=10&orderby=views,views,views,&tagged=,,,,,,,,,,,,&subject=,,,,,,,,,,,,,,,,,,&author=,,,,,,,,,,,,,, 
    Request path: /Story/?page=8&orderby=views&tagged=&subject=&author= 
    User host address: 66.249.68.81 
    User:  
    Is authenticated: False 
    Authentication Type:  
    Thread account name: IIS APPPOOL\gouki 
Thread information: 
    Thread ID: 142 
    Thread account name: IIS APPPOOL\gouki 
    Is impersonating: False 
    Stack trace:    at System.Web.HttpRequest.ValidateInputIfRequiredByConfig()
   at System.Web.HttpApplication.PipelineStepManager.ValidateHelper(HttpContext context)

Custom event details: 
Connection: Keep-alive
Accept: */*
Accept-Encoding: gzip,deflate
From: googlebot(at)googlebot.com
Host: gouki.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

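I'm not sure where Googlebot picked up the malformed URL (I have searched my site for it, to no avail), but what I'm more curious about is why this exception gets logged to the event log, whereas when I copy and paste the URL myself (go ahead, try it) I get no error. Yes, the page is somewhat broken because the parameter values make no sense, and I can see why the double question mark might cause problems, but no exception is thrown. I have tried changing my user agent to Googlebot's, and I still don't see the error.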

For some reason, ASP.NET MVC treats the first ? as part of the path rather than as the start of the query string, but only when Googlebot requests the page.

Is some kind of escaping going on here that I'm not seeing in the event log?

Note:

Request path: /Story/?page=8&orderby=views&tagged=&subject=&author=

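The server thinks the query string parameters are part of the page name, which probably means the first question mark was actually escaped as %3f, even though the error message does not show it that way. A question mark is allowed as the query string delimiter, but not as part of the page name.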
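To illustrate the point about %3f, here is a minimal Python sketch of how a generic URL parser splits the two variants. It is not the ASP.NET pipeline; the helper name `split_like_server` and the shortened URLs are assumptions made for illustration. It only shows that an escaped question mark ends up in the decoded path, which matches the shape of the "Request path" line in the log.

```python
from urllib.parse import urlsplit, unquote

def split_like_server(raw_url):
    """Return (decoded path, raw query) the way a generic URL parser sees them.

    Hypothetical helper for illustration; ASP.NET does its own parsing.
    """
    parts = urlsplit(raw_url)
    # The path is percent-decoded after splitting, so %3F becomes a literal
    # "?" inside the path instead of acting as the query delimiter.
    return unquote(parts.path), parts.query

# Literal "?": the first question mark starts the query string.
literal = "http://gouki.com/Story/?page=8&author=?page=10"

# Escaped "%3F": the first question mark is now part of the path, and the
# second (literal) one becomes the query delimiter.
escaped = "http://gouki.com/Story/%3Fpage=8&author=?page=10"

literal_path, literal_query = split_like_server(literal)
escaped_path, escaped_query = split_like_server(escaped)
```

With the literal form the path is just /Story/, which is why pasting the URL into a browser does not trigger path validation. With the escaped form the decoded path contains ? and &, which are the kind of characters that request path validation rejects.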

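The bot picked up the URL somewhere and perhaps tried to fix it. Make sure you have escaped your URLs correctly; for example, when a URL appears in an attribute of an HTML element, & should be written as &amp;.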
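As a small sketch of that escaping rule, the following Python uses the standard library's `html.escape` to encode a link before placing it in an attribute. The helper name `href_attribute` is hypothetical; in an ASP.NET view you would rely on the framework's own HTML attribute encoding instead.

```python
from html import escape

def href_attribute(url):
    """Build an href attribute with the URL HTML-escaped (illustrative helper)."""
    # quote=True also escapes double and single quotes, so the value cannot
    # terminate the attribute early.
    return 'href="' + escape(url, quote=True) + '"'

link = href_attribute("?page=8&orderby=views&tagged=")
```

Here `link` holds the attribute with every & written as &amp;, which is the form an HTML parser expects to find inside an attribute value.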

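If the page contains a relative link such as ?page=8&orderby=views&tagged=&subject=&author=, the bot may try to build the full URL by combining it with the current page URL, which would explain the doubled set of query parameters. That should normally work, but if the URL escaping is off, the result can get mangled.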
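The difference between correct relative resolution and a naive combination can be sketched in Python. This is only an illustration of the hypothesis above, not a claim about how Googlebot actually resolves links; the `page=10` relative link is an assumed example based on the logged URL.

```python
from urllib.parse import urljoin

# URL of the page currently being crawled (taken from the logged request path).
base = "http://gouki.com/Story/?page=8&orderby=views&tagged=&subject=&author="

# A query-only relative link found on that page (assumed for illustration).
relative = "?page=10&orderby=views"

# Standard resolution (RFC 3986): the new query replaces the old one.
resolved = urljoin(base, relative)

# Naive string concatenation: the old query is kept and the new one is
# appended, producing a URL with two question marks like the one in the log.
concatenated = base + relative
```

`resolved` keeps the /Story/ path with only the new query string, while `concatenated` has the same doubled `?page=...` shape as the logged Request URL.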

See this:

http://geekswithblogs.net/renso/archive/2011/08/26/a-potentially-dangerous-request-value-was-detected-from-the-client.aspx
