urllib2.urlopen and requests freezing

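Edit: I found that I made a mistake. The cause of the problem was not urllib but nltk, which could not handle a long string coming from this page. Sorry about that.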

I don't know why, but whether I use urllib2.urlopen or requests, the program freezes when it reaches one particular URL.

import requests
r = requests.get('SomeURL')
html = r.text
print(html)

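Here is how it behaves. 1) When I run through a loop of 200 URLs, it freezes on the same URL every time. If I don't terminate the program, it stays there for hours. 2) When I run the same code on its own outside the loop, it works. 3) If I blacklist just this one URL, the loop runs through without problems.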

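It does not return any kind of error code and it runs fine outside the loop. I have also set a timeout, but it has no effect. The request still hangs indefinitely.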

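So, is there another way to force an HTTP GET request to stop after a certain time, given that the timeout does not work? Are there libraries other than urllib2 and requests that can do this?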

for i in range(0, mincount):
    html = requests.get(urlist[i]).text  # call the request for urlist[i]
#end

It always works and freezes only when I request this site. If I made 200 requests to Yahoo, for example, it would work. But when I try to go to this particular URL, I cannot.

Edit: It is a standard loop, so there is not much room for error.

I think this is just a very slow page; on my system it takes about 9.7 seconds to load.

If you try to run it in a short loop, it will indeed look frozen.
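To check how long a single step really takes, and to tell a slow download apart from slow processing afterwards, one option is to time each call separately. The following is only a minimal sketch: the helper name `timed` is made up for illustration, and the usage in the comments assumes the requests package and network access.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage (needs the requests package and a reachable URL):
#   import requests
#   response, seconds = timed(requests.get, 'SomeURL', timeout=30)
#   print("Download took {:.1f}s".format(seconds))
```

Wrapping the download and the later processing step in separate `timed` calls would show which of the two is actually consuming the time.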

You can try:

import requests

links = [
    'SomeURL',
    'http://www.google.com/'
]
for link in links:
    try:
        # Give up if the server does not respond within 2 seconds
        html = requests.get(link, timeout=2.).content
        print("Successfully loaded {}".format(link))
    except requests.Timeout:
        print("Timed out loading {}".format(link))

This gives me:

Timed out loading SomeURL
Successfully loaded http://www.google.com/
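One thing to keep in mind: the `timeout` argument in requests limits how long to wait for the connection and for each read from the socket, not the total download time. A server that keeps sending a few bytes at a time can therefore keep a request alive far longer than the timeout value. If a hard upper bound is needed, one possible approach is to stream the response and check an overall deadline between chunks. This is only a sketch under assumptions: the names `DeadlineExceeded`, `read_until_deadline` and `fetch_with_deadline` are made up for illustration, as are the default values.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the overall time budget is used up (hypothetical name)."""


def read_until_deadline(chunks, deadline_at):
    """Join byte chunks from an iterable.

    deadline_at is an absolute time.monotonic() value. The check runs each
    time a chunk arrives, so a single slow chunk can overshoot the deadline
    by at most the per-read timeout of the underlying request.
    """
    parts = []
    for chunk in chunks:
        if time.monotonic() > deadline_at:
            raise DeadlineExceeded("overall deadline exceeded")
        parts.append(chunk)
    return b"".join(parts)


def fetch_with_deadline(url, deadline_seconds=10.0, read_timeout=2.0):
    """Download url, giving up after roughly deadline_seconds in total."""
    # Third-party package; imported here so the helper above has no dependency on it.
    import requests

    deadline_at = time.monotonic() + deadline_seconds
    # stream=True defers the body download so it can be read chunk by chunk
    with requests.get(url, stream=True, timeout=read_timeout) as response:
        return read_until_deadline(response.iter_content(chunk_size=8192), deadline_at)
```

In the loop above, `fetch_with_deadline(link)` could stand in for `requests.get(link, timeout=2.).content`, with `DeadlineExceeded` caught alongside `requests.Timeout`. Note that this only bounds the download; it would not help if the time is spent in later processing of the downloaded text.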
